- More Data Mining with Weka - Jan 30, 2014.
This online course teaches both the principles and the practical techniques of data mining, and lets students work on very big datasets, classify text, experiment with clustering, and much more.
Association Rules, Clustering, Data Mining with Weka, Online Education, Text Classification, Weka
- Determining the Value of Insights - Jan 30, 2014.
With the value of Consumer Insights being questioned on ROI grounds, Market Research professionals need to figure out ways to quantify the value of those insights. Determining the value of insights is no easy task and requires focus on three key components.
Efficiency, Insight Effectiveness, Insight Quality, Market Research
- Viewpoint: Why your company should NOT use “Big Data” - Jan 27, 2014.
Hardcore analytics (and Big Data) can add value, but only marginally and only for companies that have already mastered using the data they already have. The ‘obvious’ information from your own data can get you 90%+ of the total impact, so start there. The hard part is executing the basic insights across the organization.
80/20 Principle, Hardcore Analytics, Pair Search, Quality Score, Sort Order
- Using Data Mining to Predict the Winter Olympics Medal Counts in Sochi - Jan 25, 2014.
Could data mining techniques accurately predict the medal counts at the Olympics? A predictive model could give us an estimate of the number of medals each nation might win; but how close could we get to the actual outcomes? It was a tantalizing project …
Olympics, Russia, Sports
- Split on Data Science Skills: Individual vs Team Approach - Jan 21, 2014.
The results of the latest KDnuggets poll show an almost equal split between those who favor the individual approach and those who favor the team approach. See the counterintuitive regional differences and interesting comments.
Data Science, Poll, Skills, Team
- PAN Competition: Plagiarism Detection, Author Identification, Author Profiling - Jan 15, 2014.
Take part in one of 3 tasks: Plagiarism Detection - given a document, is it an original? Author Identification - given a document, who wrote it? Author Profiling - given a document, what are the author's age and gender?
Author Detection, Author Profiling, Competition, Plagiarism Detection
- Interpreting Model Performance with Cost Functions - Jan 13, 2014.
Cost functions are critical for the correct assessment of performance of data mining and predictive models. This series goes deep into the statistical properties and mathematical understanding of each cost function and explores their similarities and differences.
Cost Function, Model Performance, Online Education, Salford Systems
- MADlib: Big Data Machine Learning in SQL for Data Scientists - Jan 6, 2014.
MADlib is open source with a commercially usable BSD license; it supports Postgres and Pivotal Greenplum DBMS, and provides classification, regression, clustering, topic modeling and other analytics for Big Data.