[ Gregory PS: This talk will be interesting to many who plan to participate in the Heritage Health Prize, Kaggle, KDD-2011, or other data mining/analytics competitions ]
"Getting In Shape For The Sport Of Data Science"
Talk by Jeremy Howard, Kaggle Chief Data Scientist
Jeremy recently gave a talk to the Melbourne R meetup group, where he gave a brief overview of his "data scientist's toolbox" (using a few Kaggle competitions as practical examples) and also provided an introduction to ensembles of decision trees (including the well-known Random Forest™ algorithm).
A screencast of this talk is now available at media.kaggle.com/MelbURN.html
Among must-have tools, Jeremy lists:
- Data manipulation: Perl, Vim
- Interactive analysis: Excel
- Statistical toolkit: R (must-use R packages: caret, ggplot2)