A Twitter network analysis clusters data scientists and reveals the most influential among them. How does it compare to a human ranking of data scientists on Quora?
Gilad Lotan from Social Flow recently presented a
tutorial at the PyData NYC conference on his work using Python's NetworkX library and the open source graphing tool Gephi.
As part of his analysis, he also used these tools to analyze the Twitter social networks of
pythonistas and data scientists.
Here's the network chart for data scientists, defined as users who have one of these phrases in their Twitter bios: "Data Science, Data Scientist, Machine Learning, Data Strateg*".
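The bio filter described above can be sketched in a few lines of Python. This is only an illustration, not Gilad's actual code: the case-insensitive matching, the reading of "Data Strateg*" as a prefix wildcard, the function name `is_data_scientist`, and the sample profiles are all assumptions.

```python
import re

# Phrases quoted in the article. "Data Strateg*" is treated as a prefix
# wildcard, and matching is case-insensitive (both are assumptions).
BIO_PATTERN = re.compile(
    r"data science|data scientist|machine learning|data strateg\w*",
    re.IGNORECASE,
)


def is_data_scientist(bio):
    """Return True if the bio contains any of the target phrases."""
    return bool(BIO_PATTERN.search(bio or ""))


# Hypothetical sample profiles, made up for illustration only.
profiles = {
    "alice": "Data Scientist at a startup. Coffee enthusiast.",
    "bob": "Python developer and cyclist",
    "carol": "Head of data strategy",
    "dave": "I love MACHINE LEARNING",
}

selected = sorted(name for name, bio in profiles.items() if is_data_scientist(bio))
print(selected)  # ['alice', 'carol', 'dave']
```

A filter like this only sees what people write about themselves, so anyone who does not use one of the phrases in their bio is left out of the network.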
According to Gilad, the biggest nodes include:
- @hmason, Hilary Mason, bitly (purple cluster)
- @johnmyleswhite, John Myles White, author of Machine Learning for Hackers (purple cluster)
- @kaggle, Kaggle (yellow cluster)
- @peteskomoroch, Pete Skomoroch, LinkedIn (blue cluster)
- @DataJunkie, Ryan Rosario, Data Scientist/Research Engineer at Riot Games (blue cluster)
- @dpatil, DJ Patil, formerly LinkedIn, now Greylock (blue cluster)
- @bigdata, Ben Lorica, Chief Data Scientist at O'Reilly Media (blue cluster)
- @ogrisel, Olivier Grisel, contributor to scikit-learn (yellow cluster)
- @kdnuggets, Gregory Piatetsky, KDnuggets (green cluster)
- @revodavid, David Smith, Revolution Analytics (green cluster)
The purple cluster seems to be a mix of East Coast people and academics,
while the dark blue is the West Coast data drinking crew. Yellow looks like West Coast social network folks, while the green cluster is people who have been doing it for a while.
The orange cluster is harder to nail down. Perhaps more academic and applied math, and less tech-scene?
I think the above clustering is interesting but not fully representative. For example,
it does not include people like
Jeff Hammerbacher (@hackingdata), who coined the term "Data Scientist" with DJ Patil,
because his Twitter bio does not include "Data Scientist".
For comparison, Ferenc Huszár, data scientist at PeerIndex, recently gave this interesting
answer to the Quora question
"Who are the most notable and influential data scientists?"
Here is his list of the most influential data scientists. It shows a significant overlap between the human ranking and the Twitter ranking, but about half of the list below is missing from the Twitter map.
Notable and Influential Data Scientists in Industry, according to Quora
- Hilary Mason, bitly
- DJ Patil, LinkedIn, then Greylock Partners
- Jeff Hammerbacher, Cloudera, previously Facebook Data Team, @hackingdata
- Peter Skomoroch, LinkedIn
- Drew Conway
- Olivier Grisel, contributor to scikit-learn
- Amy Heineike, Squid
- Gregory Piatetsky, KDnuggets
- Andreas Weigend, former chief scientist at Amazon, now Stanford
- Tim O'Reilly, not a data scientist, but very important influencer in the area
- Andrew Ng, Coursera and Stanford
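The overlap between the two lists in this article can be checked with a quick sketch. It compares the Quora list only against the biggest nodes named above, not against the full Twitter map (which is not enumerated here), so treat the result as a rough check. Names are normalized by hand ("Peter Skomoroch" is entered as "Pete Skomoroch" so the two lists match), and the variable names are just for illustration.

```python
# Biggest nodes from the Twitter network, as listed in this article.
twitter_biggest_nodes = {
    "Hilary Mason", "John Myles White", "Kaggle", "Pete Skomoroch",
    "Ryan Rosario", "DJ Patil", "Ben Lorica", "Olivier Grisel",
    "Gregory Piatetsky", "David Smith",
}

# Quora list, in the order given above (names normalized by hand).
quora_list = [
    "Hilary Mason", "DJ Patil", "Jeff Hammerbacher", "Pete Skomoroch",
    "Drew Conway", "Olivier Grisel", "Amy Heineike", "Gregory Piatetsky",
    "Andreas Weigend", "Tim O'Reilly", "Andrew Ng",
]

in_both = [name for name in quora_list if name in twitter_biggest_nodes]
missing = [name for name in quora_list if name not in twitter_biggest_nodes]

print(f"{len(in_both)} of {len(quora_list)} appear among the biggest Twitter nodes")
print("Missing:", ", ".join(missing))
```

On these two lists, 5 of the 11 Quora names appear among the biggest Twitter nodes and 6 do not.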