Mining a Data Mining Conference: Analytics on KDD-2013

We look at interesting analytics and statistics from the KDD-2013 Conference on Knowledge Discovery and Data Mining. Which topics are hot, and which are most likely to be accepted?

By Gregory Piatetsky, Aug 15, 2013.

I have just returned from a very successful KDD-2013 Conference on Knowledge Discovery and Data Mining, held on Aug 11-14, 2013 in Chicago, IL.

KDD continues to be the leading research conference in the field. This year it received 726 papers, of which only 125 were accepted, a 17.2% acceptance ratio.

KDD-2013 had about 1,200 attendees, which makes it the largest peer-reviewed research conference in Data Mining, Data Science, and Knowledge Discovery ever (so far).

The KDD-2013 Program Committee co-chairs, Inderjit Dhillon (U. of Texas at Austin) and Yehuda Koren (Google), have compiled an interesting report on KDD-2013 papers, trends, and topics. Here is an excerpt.

KDD Research Track Papers

KDD is characterized by a healthy mixture of fundamental topics while being in close touch with new applications.
Fundamental topics include: Classification, Clustering, Probabilistic Methods, Rule and pattern mining, active and transfer learning.

New Trends and Applications include social networks, novel statistical techniques for big data, social influence, viral marketing, social media, recommender systems, security & privacy.

Here is a word cloud of the 125 accepted papers.

Figure: KDD-2013 accepted papers word cloud.

The following table shows the acceptance ratio by topic, which I divided into hot (acceptance rate significantly above average), medium (acceptance rate near the average) and cold (acceptance rate below average).

Topic                              % Submissions   Acceptance Ratio
user modeling                      1.72%           33.83%
big data – scalable methods        2.42%           29.43%
unsupervised learning              1.87%           27.46%
supervised learning                1.97%           26.72%
recommender systems                2.72%           25.66%
probabilistic methods              2.52%           24.50%
social and information networks    6.68%           21.85%
classification                     3.61%           20.27%
ALL                                100%            17.2%
web mining                         3.24%           17.07%
graph mining                       5.72%           13.35%
security and privacy               1.86%           8.95%
clustering                         3.92%           8.45%
other                              1.39%           6.72%
information extraction             2.17%           6.67%
feature selection                  1.47%           4.79%

The topics most likely to be accepted were user modeling, big data – scalable methods, unsupervised learning, supervised learning, and recommender systems. Note the different odds for two classic topics: the acceptance rate for classification (20.27%) was about 2.4 times the rate for clustering (8.45%)!
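To make the comparison concrete, here is a small Python sketch that divides each topic's acceptance ratio by the conference-wide 17.2%. The numbers are copied from the table above; the name `lift` is just a label chosen for this illustration and is not a term from the program chairs' report.

```python
# Acceptance ratios (percent) copied from the topic table above.
acceptance = {
    "user modeling": 33.83,
    "big data – scalable methods": 29.43,
    "unsupervised learning": 27.46,
    "supervised learning": 26.72,
    "recommender systems": 25.66,
    "probabilistic methods": 24.50,
    "social and information networks": 21.85,
    "classification": 20.27,
    "web mining": 17.07,
    "graph mining": 13.35,
    "security and privacy": 8.95,
    "clustering": 8.45,
    "other": 6.72,
    "information extraction": 6.67,
    "feature selection": 4.79,
}
OVERALL = 17.2  # the "ALL" row


def lift(topic):
    """Topic acceptance ratio relative to the conference-wide ratio."""
    return acceptance[topic] / OVERALL


# Classification vs. clustering: about 2.40
ratio = acceptance["classification"] / acceptance["clustering"]
print(f"classification / clustering = {ratio:.2f}")

# Topics sorted from most to least favorable relative to the average.
for topic in sorted(acceptance, key=lift, reverse=True):
    print(f"{topic:33s} {lift(topic):.2f}x the overall rate")
```

A value above 1 means the topic did better than the conference as a whole. Since a paper can presumably list more than one topic, these ratios describe topics, not disjoint groups of papers.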

Note to future authors – don’t put “other” as a topic.

The following table shows the acceptance ratio by the number of authors.

Authors   Submissions   Acceptance Ratio
1         45            6.67%
2         179           11.73%
3         217           16.59%
4         167           23.35%
5+        117           22.22%

Having more authors gives an almost linear improvement in acceptance probability, up to 4 authors; with 5 or more authors the rate dips slightly. However, the KDD-2013 best paper, “Simple and Deterministic Matrix Sketching”, was written by one author: Edo Liberty, Yahoo! Labs.
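As a sanity check on the author table, the sketch below reconstructs the number of accepted papers in each row and the step between consecutive acceptance ratios. It assumes that the middle column of the table is the number of submitted papers; the source table has no header, so that reading is my interpretation.

```python
# Rows of the author table: (number of authors, papers, acceptance ratio in %).
# Assumption: the middle column is the number of submitted papers.
rows = [
    ("1", 45, 6.67),
    ("2", 179, 11.73),
    ("3", 217, 16.59),
    ("4", 167, 23.35),
    ("5+", 117, 22.22),
]

# Accepted papers per row, rounded to the nearest whole paper.
accepted = [round(n * pct / 100) for _, n, pct in rows]
total_submitted = sum(n for _, n, _ in rows)
total_accepted = sum(accepted)
overall = 100 * total_accepted / total_submitted

# Change in acceptance ratio (percentage points) for each extra author.
steps = [round(b[2] - a[2], 2) for a, b in zip(rows, rows[1:])]

print("accepted per row:", accepted)
print("totals:", total_accepted, "of", total_submitted)
print(f"overall acceptance: {overall:.1f}%")
print("steps between rows:", steps)
```

Under that assumption the rows add up to 125 accepted papers out of 725, which matches the reported 125 acceptances and is one short of the reported 726 submissions. The steps of roughly 5, 5, and 7 percentage points from 1 to 4 authors are what "almost linear" refers to.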

Comparing with KDD-2005, which was also held in Chicago, we can see many new topics added since then.

Figure: New KDD topics added since 2005.

Finally, the fastest-growing topics were social networks, Twitter, and sampling.

Topic                   Delta Increase since 2012      Delta Increase since 2005
social networks         10% (from 22.8% to 32.8%)      25.90% (from 6.9% to 32.8%)
twitter                 6.30%                          16%
sampling                5.90%                          5.90%
social media            5.30%                          13.60%
diversity               5.20%                          6%
big data                3.90%                          5.60%
drug discovery          3.80%                          3%
EM                      3.60%                          0.65%
crowd sourcing          2.90%                          4.80%
bioinformatics          2.50%                          -0.50%
information diffusion   2.50%                          4%

Thanks to Yehuda Koren, KDD-2013 Program Committee co-chair, for providing the slides from which I extracted the information above.