ByteMining, Ryan Rosario, Aug 2011
I am summarizing all of the days together since each talk was short, and I was too exhausted to write a post after each day. Due to the broken-up schedule of the KDD sessions, I group everything together instead of switching back and forth among a dozen different topics. By far the most enjoyable and interesting aspects of the conference were the breakout sessions.
KDD 2011 featured several keynote speeches, spread across three days and scheduled at different times of day. This year's conference had a few big names.
Stephen Boyd, Convex Optimization: From Embedded Real-Time to Large-Scale Distributed. The first keynote, by Stephen Boyd, discussed convex optimization. The goal of convex optimization is to minimize some objective function subject to constraints. The caveat is that the objective function and all of the inequality constraints must be convex ("non-negative curvature" as Boyd said), and any equality constraints must be linear. Linear programming is the best-known special case; the practical goal is to cast a problem in convex form so that it can be solved reliably. We should care about convex optimization because it comes with some beautiful and complete theory, like duality and optimality conditions. I must say that whenever I am chastising statisticians, I often say that all they care about is "beautiful theory," so his comment was humorous to me. Convex optimization is a very intuitive way to think about regression and techniques such as the lasso. Convex optimization has tons of use cases ...
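To make the lasso remark concrete, here is a minimal sketch (not from the talk) that treats the lasso as a convex problem and minimizes it with proximal gradient descent (ISTA). The function names, the synthetic data, and the penalty value `lam=5.0` are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(X, y, w, lam):
    """0.5 * ||Xw - y||^2 + lam * ||w||_1, which is convex in w."""
    r = X @ w - y
    return 0.5 * (r @ r) + lam * np.abs(w).sum()

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize the lasso objective by proximal gradient descent (ISTA)."""
    # Step size 1/L, where L is the Lipschitz constant of the smooth part's gradient.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)  # gradient of the least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data (made up for illustration): only two of five features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

w_hat = lasso_ista(X, y, lam=5.0)
print(np.round(w_hat, 3))
```

Because the objective is convex, any local minimum the iteration reaches is also a global one, which is a large part of the appeal.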
For more information about convex optimization, see the website for Convex Optimization by Boyd and Vandenberghe. The book is available for free as well as lecture slides etc.
Judea Pearl, The Mathematics of Causal Inference. Pearl believes that humans do not communicate with probability, but with causality (I do not agree with this entirely). I appreciated that he mentioned that it takes work to overcome the difference in thinking between probability and causality. In statistics, we use some data and a joint distribution P to make inferences about some quantity or variable. In causality, there is an intentional intervention that changes the joint distribution P into another joint distribution P'. Causality requires new language and mathematics (I do not see it). In order to use causality, one must introduce some untestable hypothesis.
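As a rough illustration of an intervention changing P into P', here is a minimal sketch with a made-up three-variable model: a confounder Z influences both X and Y. All of the probabilities and names below are hypothetical and are not taken from the talk. Conditioning on X=1 in the observed distribution P and forcing X=1 (which yields P') give different answers for Y.

```python
from itertools import product

# Hypothetical model: Z -> X, Z -> Y, X -> Y (all variables binary).
p_z = {0: 0.5, 1: 0.5}                     # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}            # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3,
                 (0, 1): 0.7, (1, 1): 0.9}  # P(Y = 1 | X = x, Z = z)

def joint(x, y, z, do_x=None):
    """Joint probability of (x, y, z).

    With do_x=None this is the observational distribution P. With do_x set,
    X is forced to that value regardless of Z, giving the intervened P'.
    """
    if do_x is None:
        px = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
    else:
        px = 1.0 if x == do_x else 0.0
    py1 = p_y1_given_xz[(x, z)]
    py = py1 if y == 1 else 1 - py1
    return p_z[z] * px * py

def prob_y1_given_x1(do_x=None):
    """P(Y = 1 | X = 1) under P (do_x=None) or under P' (do_x=1)."""
    num = sum(joint(1, 1, z, do_x) for z in (0, 1))
    den = sum(joint(1, y, z, do_x) for y, z in product((0, 1), repeat=2))
    return num / den

observed = prob_y1_given_x1()          # seeing X = 1
intervened = prob_y1_given_x1(do_x=1)  # setting X = 1
print(round(observed, 2), round(intervened, 2))
```

The gap between the two numbers comes entirely from the confounder. The structure of the model (which arrows exist) is the kind of assumption that cannot be checked from the observed data alone.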
Data Mining Competitions
One interesting event during KDD 2011 was the panel Lessons Learned from Contests in Data Mining. This panel featured Jeremy Howard (Kaggle), Yehuda Koren (Yahoo!), Tie-Yan Liu (Microsoft), and Claudia Perlich (Media6Degrees). Both Kaggle and Yahoo! run data mining competitions: Kaggle has its own series of competitions and Yahoo! is a major sponsor of the KDD Cup competition. Perlich has participated in and won many data mining competitions. Liu provided a different insight into data mining competitions as an industry observer.
Jeremy Howard gave some insight into the history of data mining competitions. He credited KDD 97 with the formation of the first data mining competition. He announced to the crowd that companies spend 100 billion dollars every year on data mining products and services (not including in-house costs such as employment) and that there are approximately 2 million Data Scientists. The estimate of the number of Data Scientists was based on the number of times R was downloaded, as reported in a blog post by David Smith (Revolution Computing). I love R, and every Data Scientist should use it, but there are several problems with this estimate. First, not everyone who uses R is a Data Scientist; a large portion of R users are statisticians ("beautiful theory"), teachers, miscellaneous students, etc. Second, not all Data Scientists use R. Some are even more creative and write their own tools or use little-adopted software packages. There are also a lot of Data Scientists who use Python instead of R. Howard also announced that over the next year, Kaggle will be starting 1000s of "invitation only" competitions. Personally, I do not care for this type of exclusion even though their intentions are good.
Claudia Perlich (Media6Degrees) discussed her experience participating in data mining competitions. She has won several contests. She commented on the distinction between sterile/cleaned data and real data, as competitions can include either type. The concept of Occam's Razor applies to data mining competitions; Perlich won most of her competitions using a linear model, but with more complex and creative features. Perlich emphasized that complex features are better than complex models.
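A toy sketch of the "complex features, simple model" idea, using made-up synthetic data rather than anything from Perlich's contests: the target depends on an interaction between two inputs, so a plain linear model on the raw inputs fits poorly, while the same linear model with one engineered feature fits well. The helper `fit_mse` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Assumed ground truth for illustration: y is driven by the product x1 * x2.
y = x1 * x2 + 0.1 * rng.normal(size=n)

def fit_mse(features, y):
    """Fit ordinary least squares with an intercept; return training MSE."""
    A = np.column_stack([np.ones(len(y))] + features)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))

mse_raw = fit_mse([x1, x2], y)                  # linear model, raw features
mse_engineered = fit_mse([x1, x2, x1 * x2], y)  # same model, one extra feature
print(mse_raw, mse_engineered)
```

The model class never changes here; only the feature set does.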