Top Research Leaders in Data Mining, Data Science, and KDD
We identify the top researchers in Data Mining, Data Science, and KDD. Jiawei Han, Philip Yu, and Christos Faloutsos remain the leaders, but they are joined by many fast-rising young researchers, the leaders of tomorrow.
In honor of the upcoming KDD-2014, the 20th KDD Conference on Knowledge Discovery and Data Mining, we looked at the research in this field using a very good tool, Microsoft Academic Search.
Overall, it finds 37,377 publications and 269,627 citations. Both publications and citations started growing dramatically around 1996. The Microsoft Academic Search data shows that growth peaked in 2009 and then declined dramatically to almost zero around 2011. Such a change cannot be explained only by the term "Data Mining" becoming less popular and being replaced by "Data Science" and "Big Data". Since the total number of conferences and publications in the field has only grown since 2011, it seems that the Microsoft Academic Search data for 2011 and later years is incomplete. See the more detailed analysis at the bottom of this post.
Here is a list of the top 20 researchers in the field, according to Microsoft Academic Search for Data Mining for all time. Authors are ranked by field rating, which is similar to the H-index in that it is based on the number of publications by the author and the distribution of citations to those publications, but it counts only publications and citations within a specific field, to show the scholar's impact within that field.
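|Rank||Researcher (Affiliation)||Publications||Data Mining Publications||Field Rating|
|1||Jiawei Han (UIUC)||655|| || |
|2||Philip S. Yu (UIC)||788|| || |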
|3||Rakesh Agrawal (Microsoft)||359||113||48|
|4||Christos Faloutsos (CMU)||484||216||45|
|5||Hans-Peter Kriegel (U. Munich)||452||167||37|
|6||Eamonn J. Keogh (UCR)||196||118||36|
|7||George Karypis (U. Minnesota)||330||115||36|
|8||Heikki Mannila (Aalto U.)||283||130||35|
|9||Andrew McCallum (U. Mass Amherst)||256||78||35|
|10||Jian Pei (SFU)||419||151||34|
|11||Padhraic Smyth (UCI)||271||73||34|
|12||Rajeev Motwani (Stanford)||271||55||34|
|13||Mohammed J. Zaki (RPI)||365||130||33|
|14||Vipin Kumar (U. Minnesota)||571||124||33|
|15||Hector Garcia-Molina (Stanford)||605||87||32|
|16||Charu C. Aggarwal (IBM)||231||144||31|
|17||Bing Liu (UIC)||533||101||30|
|18||Prabhakar Raghavan (Google)||294||50||30|
|19||Dimitrios Gunopulos (National U. of Athens)||202||78||29|
|20||Johannes Gehrke (Cornell)||228||74||29|
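The field rating used for this ranking is described as similar to the H-index, restricted to one field. Microsoft's exact formula is not given here, so the following is only a minimal sketch of the standard H-index, computed once over all of an author's papers and once over the in-field papers only. The paper list and its citation counts are hypothetical, made up purely for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Hypothetical (field, citation count) pairs for one author's papers.
papers = [
    ("data mining", 120),
    ("data mining", 45),
    ("data mining", 9),
    ("databases", 300),
    ("data mining", 3),
    ("databases", 12),
]

# Standard H-index over everything the author has published.
overall = h_index([count for _, count in papers])

# Field-restricted variant: only in-field papers are counted.
# (Assumption: this mirrors the idea of a field rating, not its exact formula.)
in_field = h_index([count for field, count in papers if field == "data mining"])

print(overall, in_field)
```

Restricting the input to one field can only lower or preserve the score, which is why a prolific author with many papers outside Data Mining may rank below a more focused one.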
There is little correlation between field rating and total publications (R² = 0.26). There is, as expected, good correlation between field rating and the number of Data Mining publications (R² = 0.69).
We note that most of this correlation is driven by the top 2 researchers, Jiawei Han and Philip Yu, who have over 300 Data Mining publications each. If we exclude them, the correlation between field rating and Data Mining publications drops to R² = 0.26.
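The post does not say how R² was computed, so as a rough sketch we assume it is the squared Pearson correlation coefficient. The code below applies that to the Data Mining publication counts and field ratings of the researchers ranked 3 to 20 in the table above (that is, with the top 2 excluded). The function name `r_squared` is ours, and the result need not match the quoted figure exactly if the original calculation used different inputs or a different method.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# (Data Mining publications, field rating) for ranks 3-20 of the all-time table.
rows = [
    (113, 48), (216, 45), (167, 37), (118, 36), (115, 36), (130, 35),
    (78, 35), (151, 34), (73, 34), (55, 34), (130, 33), (124, 33),
    (87, 32), (144, 31), (101, 30), (50, 30), (78, 29), (74, 29),
]

dm_publications = [pubs for pubs, _ in rows]
field_ratings = [rating for _, rating in rows]

r2 = r_squared(dm_publications, field_ratings)
print(f"R^2 without the top 2 researchers: {r2:.2f}")
```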
Six of these researchers have received the KDD Innovation Award:
- Jiawei Han, 2004
- Rakesh Agrawal, 2000
- Christos Faloutsos, 2010
- Heikki Mannila, 2003
- Padhraic Smyth, 2009
- Vipin Kumar, 2012
and five have received the IEEE ICDM Research Contributions Award:
- Jiawei Han, 2002
- Philip S. Yu, 2003
- Christos Faloutsos, 2006
- Hans-Peter Kriegel, 2013
- Heikki Mannila, 2009
Next I looked at the top researchers in the last 10 years. Jiawei Han, Philip Yu, and Christos Faloutsos are still at the top, but the rest of the list has many new names, showing the rapid evolution of the field.
|Rank (last 10 years)||All-time Rank||Change||Researcher (Affiliation)||Publications||Data Mining Publications||Field Rating|
|1||2||up 1||Philip S. Yu (U. of Illinois Chicago)||788||229||30|
|2||1||dn 1||Jiawei Han (U. of Illinois Urbana-Champaign)||655||207||29|
|3||4||up 1||Christos Faloutsos (Carnegie Mellon U.)||484||129||24|
|4||10||up 6||Jian Pei (Simon Fraser U.)||419||102||21|
|5||30||up 25||Yufei Tao (Chinese U. of Hong Kong)||140||50||21|
|6||6||0||Eamonn J. Keogh (U. of California Riverside)||196||72||20|
|7||5||dn 2||Hans-Peter Kriegel (U. of Munich)||452||85||18|
|8||78||up 70||Qiang Yang (Hong Kong U. of Science and Technology)||412||79||18|
|9||16||up 7||Charu C. Aggarwal (IBM)||231||77||16|
|10||69||up 59||Lise Getoor (U. of Maryland)||210||65||16|
|11||87||up 76||Jianyong Wang (Tsinghua U.)||155||50||16|
|12||14||up 2||Vipin Kumar (U. of Minnesota)||571||46||16|
|13||109||up 96||Xifeng Yan (U. of California Santa Barbara)||160||39||16|
|14||55||up 41||Jeffrey Xu Yu (Chinese U. of Hong Kong)||428||97||15|
|15||23||up 8||Ke Wang (Simon Fraser U.)||546||57||15|
|16||46||up 30||Beng Chin Ooi (National U. of Singapore)||291||54||15|
|17||74||up 57||Dimitris Papadias (Hong Kong U. of Science & Technology)||200||42||15|
|18||7||dn 11||George Karypis (U. of Minnesota)||330||42||15|
|19||90||up 71||Wei-Ying Ma (Microsoft)||335||40||15|
|20||19||dn 1||Dimitrios Gunopulos (National and Kapodistrian U. of Athens)||202||40||15|
Researchers with the largest gains in field ranking are:
- up 96, Xifeng Yan, UCSB
- up 76, Jianyong Wang, Tsinghua University
- up 71, Wei-Ying Ma, Microsoft
- up 70, Qiang Yang, HKUST
- up 59, Lise Getoor, U. of Maryland
- up 57, Dimitris Papadias, HKUST
- up 41, Jeffrey Xu Yu, Chinese University of Hong Kong
- up 30, Beng Chin Ooi, National University of Singapore
So we can see that researchers at institutions in China (including Hong Kong) are becoming leaders in the data mining field.
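The "up" and "dn" figures are simply the all-time rank minus the rank over the last 10 years. As a small sketch, the code below recomputes them from the ranks in the 10-year table and lists everyone who gained at least 30 places. The threshold of 30 is our assumption, chosen because it matches the list of largest gains above.

```python
# (researcher, all-time rank, rank over the last 10 years), from the table above.
ranks = [
    ("Philip S. Yu", 2, 1),
    ("Jiawei Han", 1, 2),
    ("Christos Faloutsos", 4, 3),
    ("Jian Pei", 10, 4),
    ("Yufei Tao", 30, 5),
    ("Eamonn J. Keogh", 6, 6),
    ("Hans-Peter Kriegel", 5, 7),
    ("Qiang Yang", 78, 8),
    ("Charu C. Aggarwal", 16, 9),
    ("Lise Getoor", 69, 10),
    ("Jianyong Wang", 87, 11),
    ("Vipin Kumar", 14, 12),
    ("Xifeng Yan", 109, 13),
    ("Jeffrey Xu Yu", 55, 14),
    ("Ke Wang", 23, 15),
    ("Beng Chin Ooi", 46, 16),
    ("Dimitris Papadias", 74, 17),
    ("George Karypis", 7, 18),
    ("Wei-Ying Ma", 90, 19),
    ("Dimitrios Gunopulos", 19, 20),
]

MIN_GAIN = 30  # assumed cut-off for "largest gains"

# A positive gain means the researcher ranks higher in the last 10 years
# than in the all-time list.
gains = [(all_time - recent, name) for name, all_time, recent in ranks]
largest = sorted((g for g in gains if g[0] >= MIN_GAIN), reverse=True)

for gain, name in largest:
    print(f"up {gain}, {name}")
```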
Microsoft Academic Search shows a decline in papers on "Data Mining" starting around 2010 (chart above). This is seen especially clearly in the Microsoft Academic Search results for the KDD Conference on Knowledge Discovery and Data Mining, which is the first and most-cited conference in this field.
The chart shows that the number of KDD conference publications related to Data Mining peaked at 247 in 2009 (probably including workshop papers), then dropped to 125 in 2010, only 3 in 2011, and 1 in 2012.
While some of the change could be due to other terms like "Data Science" or "Large Scale" replacing "Data Mining" in paper titles and session topics, this cannot explain such a drop, with only 1 "Data Mining" paper for KDD-2012. Since the main focus of the KDD conference is Knowledge Discovery and Data Mining, all KDD papers should be considered relevant to Data Mining.
We also note that the KDD-2011 conference had 126 accepted papers and KDD-2012 had 133 accepted papers, so a lot of data is missing from Microsoft Academic Search for Data Mining, starting at least with 2011.
See also "The decline and fall of Microsoft Academic Search", which shows that after 2011 Microsoft Academic Search is missing a lot of papers. Thus any analysis using the Microsoft Academic Search option "Last 5 years" will be unrepresentative. However, the all-time list of researchers is still very good, as is evident to all who follow the field.