The term "big data" is in vogue now. What makes data big? In particular, what makes it grow so fast? There have been several leaps forward over the decades: first the computerization of record-keeping, then the move of commerce to the web. The latest is the production and retention of massive amounts of unstructured text - tweets, blog posts, etc. Structuring data places several constraints on its volume - less is retained than you started with, and the work of structuring it means that much raw data does not make it past your filters. Storing unstructured data, especially raw text data, not only relaxes those constraints but spawns whole new data retention requirements for any ad-hoc transformation or re-structuring of the data.
The Institute for Statistics Education has several courses specifically devoted to text analytics, taught by Dr. Nitin Indurkhya. Most courses run 4 weeks and do not require you to be online at any particular time or on any particular day during each weekly session.
Nitin Indurkhya is the founder of Data-Miner Pty Ltd., which has been engaged in data mining, text mining, language technologies consulting and education since 1997. The company developed and markets several data-mining and text-mining toolkits, including the Enterprise Data-Miner toolkit (the first data-mining software to gain the 100% pure Java certification). He has co-authored four best-selling books on data and text analysis: Predictive Rule Discovery from Electronic Health Records (2010), Handbook of Natural Language Processing, Second Edition (2010), Fundamentals of Predictive Text Mining (2010) and Text Mining: Predictive Methods For Analyzing Unstructured Information (2005).
An adjunct faculty member at the School of Computer Science and Engineering at the University of New South Wales in Sydney, Australia, he has been a visiting professor at a variety of universities and research institutions in Australia, Brazil, India, Japan, Malaysia, Portugal, Singapore, Spain, the USA and, most recently, Vietnam. He has also been a Principal Research Scientist at eBay, and has worked at AT&T Bell Labs and Hitachi.
Courses starting this summer
- Text Mining, June 8 - July 6
- Natural Language Processing, July 20 - August 17
- Sentiment Analysis, August 31 - September 21
Analytical methods for text are just the tip of the iceberg. Useful analysis necessarily reduces unstructured text to structured quantitative data, and the rest of the statistical iceberg comes into play.
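To make the idea of reducing text to structured quantitative data concrete, here is a minimal sketch of one common approach, a bag-of-words document-term matrix. It is only an illustration, not the method taught in any particular course: the function names `tokenize` and `document_term_matrix` and the two sample documents are hypothetical, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, rows), with one row of word counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    # The vocabulary is every distinct token seen, in alphabetical order.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    rows = [[count[term] for term in vocab] for count in counts]
    return vocab, rows


# Hypothetical sample documents, for illustration only.
docs = [
    "Big data is big.",
    "Text data is unstructured data.",
]

vocab, rows = document_term_matrix(docs)
print(vocab)  # ['big', 'data', 'is', 'text', 'unstructured']
for row in rows:
    print(row)  # [2, 1, 1, 0, 0] then [0, 2, 1, 1, 1]
```

Each document becomes a row of counts, and once text is in this form, standard statistical methods for numeric data can be applied to it.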
Deepen your understanding and strengthen your skills with certificate programs from the Institute. Take a look at our Programs in Analytics and Statistical Studies (PASS) - particularly Business Analytics and Data Mining.