ACM KDD 2013, Chicago: Report by Dirk Van den Poel

Read Dirk Van den Poel's excellent and detailed report and photographs from the KDD-2013 conference in Chicago, August 10-14, one of the prime events in data mining, big data, and data science.

By Dirk Van den Poel, 15 August 2013.

The KDD conference (#kdd2013), held in Chicago from August 10-14, is one of the prime events in the data mining and big data space. This is illustrated by this year's record-breaking attendance of 1200+ data scientists (both researchers and practitioners) from academia, industry, and government. It is organized by ACM's SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining).

This is a personal summary of the event, based on my choice of sessions. All pictures are my own (@dirkvandenpoel). At any given time, many sessions ran in parallel. Let's take an in-depth look at the event in chronological order.


On Saturday, KDD Big Data Camp kicked off the event. About 150 data scientists joined this pre-conference bootcamp. Prof. Dr. Robert Grossman (University of Chicago, Open Data Group) opened with an introduction. He emphasized that both R and Python are gaining traction: R dominates the modelling space, while Python dominates the IT deployment space (mainly because of R's restrictive GPL license). He also highlighted OSDC's public datasets initiative.

Xavier Amatriain on Netflix Genre rows

[Sunday] One of the best talks I heard at KDD this year was by Xavier Amatriain (@xamat, Director of Personalization at Netflix). His presentation was packed with useful content about personalized movie recommendations. He confirmed that the “Top 2 algorithms (of the Netflix prize) are still in production” and noted that “Popularity is a tough benchmark to beat in personalization”. Still, they found some improvements by using more data (ratings) and better models (see picture below).
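To make the “popularity benchmark” concrete, here is a minimal sketch of what such a non-personalized baseline could look like. It is an illustration only: it is not from the talk and not Netflix's code, and the function name `popularity_baseline` and the toy ratings are invented for this example. The idea is simply to recommend the most-rated items a user has not seen yet; a personalized model has to beat this ranking to be worth deploying.

```python
from collections import Counter


def popularity_baseline(ratings, user, k=3):
    """Recommend the k most-rated items that `user` has not rated yet.

    `ratings` is an iterable of (user, item, rating) triples. Popularity is
    measured here as the number of ratings per item, which is one possible
    definition among several (an assumption of this sketch).
    """
    ratings = list(ratings)
    counts = Counter(item for _, item, _ in ratings)
    seen = {item for u, item, _ in ratings if u == user}
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in seen][:k]


# Toy (user, item, rating) triples, invented for illustration only.
ratings = [
    ("ann", "A", 5), ("ann", "B", 3),
    ("bob", "A", 4), ("bob", "C", 2),
    ("cat", "A", 5), ("cat", "C", 4), ("cat", "D", 1),
]

print(popularity_baseline(ratings, "ann"))
```

Note that every user who has seen the same items gets the same list, which is exactly what makes this a baseline rather than a personalized recommender.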

The Geography of Facebook Neighborhoods

Prof. Dr. Jon Kleinberg (Cornell University, 2013 SIGKDD Innovation Awardee) gave his lecture titled “Everyday Life in a Data-Rich World”. He highlighted the fact that some network topologies are unlikely [because of mathematical impossibility, and others] because of how people behave (see picture on the right).

… [Tuesday] Dr. Rayid Ghani (@rayidghani, University of Chicago / Edgeflip) gave a talk on presidential elections, “Using predictive analytics to win elections” (official title: Targeting and Influencing at Scale: From Presidential Elections to Social Good). Again, this industry practice session was very well attended.

He described how machine learning and data mining, along with randomized experiments, were used to target and influence tens of millions of people. It is clear that predictive analytics could not make up a deficit of several percentage points in the elections, but it did make a difference in states where the two candidates were close.

Milind Bhandarkar and Gregory Piatetsky at KDD-2013

Milind Bhandarkar (@techmilind, Pivotal) presented the talk titled “Hadoop: A View from the Trenches” (in the picture you also see Gregory Piatetsky, @kdnuggets, who chaired the session).

Milind started off by giving a brief history of how Hadoop came about. He predicted a convergence of big data (Hadoop), HPC, and databases.
