- Split on Data Science Skills: Individual vs Team Approach - Jan 21, 2014.
The results of the latest KDnuggets poll show an almost equal split between those who favor the individual approach and those who favor the team approach. See the counterintuitive regional differences and interesting comments.
Data Science, Poll, Skills, Team
- PAN Competition: Plagiarism Detection, Author Identification, Author Profiling - Jan 15, 2014.
Take part in one of 3 tasks: Plagiarism Detection - given a document, is it an original? Author Identification - given a document, who wrote it? Author Profiling - given a document, what are the author's age and gender?
Author Detection, Author Profiling, Competition, Plagiarism Detection
- Interpreting Model Performance with Cost Functions - Jan 13, 2014.
Cost functions are critical for correctly assessing the performance of data mining and predictive models. This series goes deep into the statistical properties and mathematical underpinnings of each cost function and explores their similarities and differences.
Cost Function, Model Performance, Online Education, Salford Systems
- MADlib: Big Data Machine Learning in SQL for Data Scientists - Jan 6, 2014.
MADlib is open source under a commercially usable BSD license; it supports Postgres and the Pivotal Greenplum DBMS, and provides classification, regression, clustering, topic modeling and other analytics for Big Data.