Top 10 R Packages to be a Kaggle Champion
Kaggle top ranker Xavier Conort shares insights on the “10 R Packages to Win Kaggle Competitions”.
Since R is widely used even outside the data science community (for example, by statisticians and actuaries), this list of 10 powerful R packages may help you in more ways than you might think.
Here are the 10 packages that are particularly powerful for building winning solutions:
Allowing the machine to capture complexity:
 gbm [Gradient Boosting Machine]
 randomForest [Random Forest]
 e1071 [Support Vector Machines]
Taking advantage of high-cardinality categorical or text data:
 glmnet [Lasso and ElasticNet Regularized Generalized Linear Models]
 tau [Text Analysis Utilities]
Making your code more efficient:
 Matrix [Sparse and Dense Matrix Classes and Methods]
 SOAR [Memory management in R by delayed assignments]
 foreach [Foreach looping construct for R]
 doMC [Foreach parallel adaptor for the multicore package]
 data.table [Extension of data.frame]
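To make the list more concrete, here is a minimal sketch of how two of the packages above, Matrix and glmnet, might be combined on a high-cardinality categorical feature. It is not taken from Conort's slides: the toy data, the column name `city`, the number of levels, and the choice of a lasso penalty are all assumptions made for illustration.

```r
library(Matrix)   # sparse matrix classes and sparse.model.matrix()
library(glmnet)   # lasso / elastic-net regularized GLMs

set.seed(42)

# Hypothetical toy data: one categorical feature with up to 200 levels.
n <- 2000
city <- factor(sample(sprintf("city_%03d", 1:200), n, replace = TRUE))

# Assumed signal: the first 20 levels have a higher rate of positive outcomes.
p <- ifelse(as.integer(city) <= 20, 0.8, 0.3)
y <- rbinom(n, size = 1, prob = p)
train <- data.frame(city = city)

# One-hot encode into a sparse matrix. "- 1" removes the intercept,
# so every observed level gets its own indicator column.
x <- sparse.model.matrix(~ city - 1, data = train)

# Cross-validated lasso (alpha = 1) logistic regression on the sparse input.
fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Predicted probabilities at the lambda with the lowest cross-validated error.
pred <- predict(fit, newx = x, s = "lambda.min", type = "response")
```

Keeping the indicators in a sparse matrix avoids materializing a dense matrix that is almost entirely zeros, and the penalty lets the model shrink rare levels toward zero instead of fitting each one freely.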
Expert Advice for Kaggle Competitions: Use your intuition to help the machine by doing the following:
 Always compute differences/ratios of features
 Always consider discarding features that are "too good"
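The two tips above can be sketched in base R. This is only an illustration under stated assumptions: the data are simulated, the column names (`income`, `debt`, `leaky`) are hypothetical, and the correlation screen with a 0.95 cutoff is one simple heuristic for spotting a "too good" feature, not a method described in the talk.

```r
set.seed(1)

# Hypothetical toy data; all column names are invented for illustration.
n <- 500
df <- data.frame(
  income = runif(n, 20000, 90000),
  debt   = runif(n, 0, 40000)
)

# Assumed target: the chance of default depends on the debt-to-income ratio.
df$default <- rbinom(n, 1, plogis(3 * df$debt / df$income - 1))

# A deliberately leaky column: a noisy copy of the target itself.
df$leaky <- df$default + rnorm(n, sd = 0.05)

# Tip 1: add differences and ratios of existing features.
df$income_minus_debt <- df$income - df$debt
df$debt_to_income    <- df$debt / df$income

# Tip 2: screen for features that look "too good" on their own.
# The 0.95 threshold is an arbitrary assumption; tune it for your data.
features <- setdiff(names(df), "default")
abs_cor  <- sapply(df[features], function(col) abs(cor(col, df$default)))
suspicious <- names(abs_cor)[abs_cor > 0.95]

print(round(sort(abs_cor, decreasing = TRUE), 3))
print(suspicious)
```

The constructed `leaky` column is what the screen is meant to catch. In a real competition a flagged feature still needs a manual look, since a near-perfect predictor may encode the target or may simply not be available at prediction time.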
The complete set of slides for this presentation by Xavier Conort: http://www.slideshare.net/DataRobot/final10rxc36610234

