Top Research Leaders in Data Mining, Data Science, and KDD

We identify the top researchers in Data Mining, Data Science, and KDD. Jiawei Han, Philip Yu, and Christos Faloutsos remain the leaders, but they are joined by many fast-rising young researchers - the leaders of tomorrow.

In honor of the upcoming KDD-2014, the 20th KDD Conference on Knowledge Discovery and Data Mining, we looked at the research in this field using a very good tool, Microsoft Academic Search.

Overall, it finds 37,377 publications and 269,627 citations. Both publications and citations started growing dramatically around 1996. The Microsoft Academic Search data shows that the growth peaked in 2009 and then declined dramatically, to almost zero around 2011. Such a change cannot be explained solely by the term "Data Mining" becoming less popular and being replaced by "Data Science" and "Big Data". Since the total number of conferences and publications in the field has only grown since 2011, it seems that the Microsoft Academic Search data for 2011 and later years is incomplete. See the more detailed analysis at the bottom of this post.

Growth in Data Mining publications and citations

Here is a list of the top 20 researchers in the field, according to Microsoft Academic Search for Data Mining for all time. The authors are ranked by field rating, which is similar to the H-index in that it is based on the number of an author's publications and the distribution of citations to them, but it counts only publications and citations within a specific field, so it shows the scholar's impact within that field.
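The exact field rating formula is not given here, so the following is only a minimal sketch of the h-index idea it is compared to, restricted to a single field. The (field, citations) record layout, the function names, and the sample numbers are assumptions made for illustration; this is not Microsoft Academic Search's actual method.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def field_h_index(papers, field):
    """h-index computed only over the papers tagged with the given field.

    `papers` is a list of (field, citation_count) pairs. This layout is
    hypothetical and chosen only to keep the example short.
    """
    return h_index([count for paper_field, count in papers if paper_field == field])


# Made-up example data, not taken from Microsoft Academic Search.
papers = [
    ("data mining", 10),
    ("data mining", 8),
    ("databases", 50),
    ("databases", 6),
    ("data mining", 5),
    ("data mining", 4),
    ("data mining", 3),
]

overall = h_index([count for _, count in papers])
in_field = field_h_index(papers, "data mining")
print("all fields:", overall)
print("data mining only:", in_field)
```

Restricting the count to one field can only lower or preserve the score, which is why a prolific author in a neighboring field does not automatically rank high here.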

N | Author (Affiliation) | Total Publications | Data Mining Publications | Field Rating
1 | Jiawei Han | | |
2 | Philip S. Yu | | |
3 | Rakesh Agrawal (Microsoft) | 359 | 113 | 48
4 | Christos Faloutsos (CMU) | 484 | 216 | 45
5 | Hans-Peter Kriegel (U. Munich) | 452 | 167 | 37
6 | Eamonn J. Keogh (UCR) | 196 | 118 | 36
7 | George Karypis (U. Minnesota) | 330 | 115 | 36
8 | Heikki Mannila (Aalto U.) | 283 | 130 | 35
9 | Andrew McCallum (U. Mass Amherst) | 256 | 78 | 35
10 | Jian Pei (SFU) | 419 | 151 | 34
11 | Padhraic Smyth (UCI) | 271 | 73 | 34
12 | Rajeev Motwani (Stanford) | 271 | 55 | 34
13 | Mohammed J. Zaki (RPI) | 365 | 130 | 33
14 | Vipin Kumar (U. Minnesota) | 571 | 124 | 33
15 | Hector Garcia-Molina (Stanford) | 605 | 87 | 32
16 | Charu C. Aggarwal (IBM) | 231 | 144 | 31
17 | Bing Liu (UIC) | 533 | 101 | 30
18 | Prabhakar Raghavan (Google) | 294 | 50 | 30
19 | Dimitrios Gunopulos (National U. of Athens) | 202 | 78 | 29
20 | Johannes Gehrke (Cornell) | 228 | 74 | 29

There is little correlation between field rating and total publications (R²=0.26). There is, as expected, good correlation between field rating and the number of Data Mining publications (R²=0.69).

Top Data Mining Researchers, Field Rating vs Num. of Data Mining Publications

We note that most of the correlation is driven by the top 2 researchers, Jiawei Han and Philip Yu, who have over 300 publications each. If we exclude them, the correlation between field rating and Data Mining publications drops to R²=0.26.

Data Mining Research Leaders #3-20, Field Rating vs Num. of Data Mining Publications
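The post does not say how the R2 values were computed. As a rough check, here is a minimal sketch that assumes R2 is the squared Pearson correlation coefficient and applies it to researchers #3-20, using the Data Mining publication counts and field ratings from the table above. The function name is illustrative, and the result may differ slightly from the chart if a different fitting method was used.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# (Data Mining publications, field rating) for researchers #3-20,
# transcribed from the all-time table.
rows_3_to_20 = [
    (113, 48), (216, 45), (167, 37), (118, 36), (115, 36), (130, 35),
    (78, 35), (151, 34), (73, 34), (55, 34), (130, 33), (124, 33),
    (87, 32), (144, 31), (101, 30), (50, 30), (78, 29), (74, 29),
]

dm_pubs = [pubs for pubs, _ in rows_3_to_20]
ratings = [rating for _, rating in rows_3_to_20]

print(f"R^2 for researchers #3-20: {r_squared(dm_pubs, ratings):.2f}")
```

The same function can be run on all 20 rows once the publication counts for the top 2 researchers are filled in.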

Six of these researchers have received the KDD Innovation Award:
  • Jiawei Han, 2004
  • Rakesh Agrawal, 2000
  • Christos Faloutsos, 2010
  • Heikki Mannila, 2003
  • Padhraic Smyth, 2009
  • Vipin Kumar, 2012

and 5 have received the IEEE ICDM Research Contributions Award:
  • Jiawei Han, 2002
  • Philip S. Yu, 2003
  • Christos Faloutsos, 2006
  • Hans-Peter Kriegel, 2013
  • Heikki Mannila, 2009

Next I looked at the top researchers in the last 10 years. Jiawei Han, Philip Yu, and Christos Faloutsos are still at the top, but the rest of the list has many new names, showing the rapid evolution of the field.

N-10 | N-all | Change | Author (Affiliation) | Total Publications | Data Mining Publications | Field Rating
1 | 2 | up 1 | Philip S. Yu (U. of Illinois Chicago) | 788 | 229 | 30
2 | 1 | dn 1 | Jiawei Han (U. of Illinois Urbana-Champaign) | 655 | 207 | 29
3 | 4 | up 1 | Christos Faloutsos (Carnegie Mellon U.) | 484 | 129 | 24
4 | 10 | up 6 | Jian Pei (Simon Fraser U.) | 419 | 102 | 21
5 | 30 | up 25 | Yufei Tao (Chinese U. of Hong Kong) | 140 | 50 | 21
6 | 6 | 0 | Eamonn J. Keogh (U. of California Riverside) | 196 | 72 | 20
7 | 5 | dn 2 | Hans-Peter Kriegel (U. of Munich) | 452 | 85 | 18
8 | 78 | up 70 | Qiang Yang (Hong Kong U. of Science and Technology) | 412 | 79 | 18
9 | 16 | up 7 | Charu C. Aggarwal (IBM) | 231 | 77 | 16
10 | 69 | up 59 | Lise Getoor (U. of Maryland) | 210 | 65 | 16
11 | 87 | up 76 | Jianyong Wang (Tsinghua U.) | 155 | 50 | 16
12 | 14 | up 2 | Vipin Kumar (U. of Minnesota) | 571 | 46 | 16
13 | 109 | up 96 | Xifeng Yan (U. of California Santa Barbara) | 160 | 39 | 16
14 | 55 | up 41 | Jeffrey Xu Yu (Chinese U. of Hong Kong) | 428 | 97 | 15
15 | 23 | up 8 | Ke Wang (Simon Fraser U.) | 546 | 57 | 15
16 | 46 | up 30 | Beng Chin Ooi (National U. of Singapore) | 291 | 54 | 15
17 | 74 | up 57 | Dimitris Papadias (Hong Kong U. of Science and Technology) | 200 | 42 | 15
18 | 7 | dn 11 | George Karypis (U. of Minnesota) | 330 | 42 | 15
19 | 90 | up 71 | Wei-ying Ma (Microsoft) | 335 | 40 | 15
20 | 19 | dn 1 | Dimitrios Gunopulos (National and Kapodistrian U. of Athens) | 202 | 40 | 15

Researchers with the largest gains in field ranking are:
  • up 96, Xifeng Yan, UCSB
  • up 76, Jianyong Wang, Tsinghua University
  • up 71, Wei-ying Ma, Microsoft
  • up 70, Qiang Yang, HKUST
  • up 59, Lise Getoor, U. of Maryland
  • up 57, Dimitris Papadias, HKUST
  • up 41, Jeffrey Xu Yu, Chinese University of Hong Kong
  • up 30, Beng Chin Ooi, National University of Singapore

so we can see that researchers from China (including Hong Kong) are becoming leaders in the data mining field.
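The Change column in the last-10-years table is simply the all-time rank minus the last-10-years rank. A minimal sketch of that arithmetic, with the two rank columns transcribed from the table (the function names are illustrative assumptions):

```python
# (author, rank over the last 10 years, all-time rank), from the table above.
ranks = [
    ("Philip S. Yu", 1, 2), ("Jiawei Han", 2, 1), ("Christos Faloutsos", 3, 4),
    ("Jian Pei", 4, 10), ("Yufei Tao", 5, 30), ("Eamonn J. Keogh", 6, 6),
    ("Hans-Peter Kriegel", 7, 5), ("Qiang Yang", 8, 78),
    ("Charu C. Aggarwal", 9, 16), ("Lise Getoor", 10, 69),
    ("Jianyong Wang", 11, 87), ("Vipin Kumar", 12, 14),
    ("Xifeng Yan", 13, 109), ("Jeffrey Xu Yu", 14, 55), ("Ke Wang", 15, 23),
    ("Beng Chin Ooi", 16, 46), ("Dimitris Papadias", 17, 74),
    ("George Karypis", 18, 7), ("Wei-ying Ma", 19, 90),
    ("Dimitrios Gunopulos", 20, 19),
]


def rank_change(rank_10, rank_all):
    """Positive when the author ranks higher in the last 10 years than all-time."""
    return rank_all - rank_10


def change_label(delta):
    """Format a rank change the way the table does: 'up N', 'dn N', or '0'."""
    if delta > 0:
        return f"up {delta}"
    if delta < 0:
        return f"dn {-delta}"
    return "0"


# Sort by gain, largest first, and print the eight biggest movers.
gains = sorted(
    ((rank_change(r10, r_all), name) for name, r10, r_all in ranks),
    reverse=True,
)
for delta, name in gains[:8]:
    print(f"{change_label(delta)}, {name}")
```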

Microsoft Academic Search shows a decline in papers on "Data Mining" starting around 2010 (chart above). This is seen especially clearly in the Microsoft Academic Search results for the KDD conference on Knowledge Discovery and Data Mining, which is the first and most-cited conference in this field.

Microsoft Academic Search for KDD - Knowledge Discovery and Data Mining

The chart shows that the number of publications in the KDD conference related to Data Mining peaked at 247 in 2009 (probably including workshop papers), but then dropped to 125 publications in 2010, only 3 in 2011, and 1 in 2012.

While some of the change could be due to other terms like "Data Science" or "Large Scale" replacing "Data Mining" in the paper titles and session topics, this cannot explain such a drop, with only 1 "Data Mining" paper for KDD-2012. Since the main focus of the KDD conference is Knowledge Discovery and Data Mining, all KDD papers should be considered relevant to Data Mining.

We also note that the KDD-2011 conference had 126 accepted papers and KDD-2012 had 133 accepted papers, so a lot of data is missing from Microsoft Academic Search for Data Mining, starting at least with 2011.

See also "The decline and fall of Microsoft Academic Search", which shows that after 2011 Microsoft Academic Search is missing a lot of papers. Thus an analysis using the Microsoft Academic Search option "Last 5 years" will be unrepresentative. However, the all-time list of researchers is still very good, as is evident to all who follow the field.