Facebook Data Mining by Stephen Wolfram
Stephen Wolfram mines anonymized Facebook data from over a million people who used Wolfram|Alpha Personal Analytics for Facebook. See what he found and how you can analyze your own Facebook data.
By Gregory Piatetsky-Shapiro, Apr 25, 2013.
Stephen Wolfram is a brilliant scientist, the creator of Mathematica, and more recently of Wolfram|Alpha, which has many intelligent knowledge-based reasoning capabilities. A recent addition is Wolfram|Alpha Personal Analytics for Facebook. You can analyze your own Facebook graph for free, and over a million people have done so.
Wolfram is collecting anonymized statistics, and has also launched a Data Donor program that allows people to contribute data for research purposes. In a recent blog post, Stephen Wolfram analyzes this data and reports interesting findings.
He found that the median number of friends people have on Facebook is 342.
Here is a graph showing the number of friends vs. age.
We note that the number of Facebook friends peaks when a person is around 18-20 years old, and slowly declines afterward. Is it a reflection of social trends or an artifact of Facebook being more popular among younger people? Or something else?
See also the NYTimes perspective: Looking at Facebook's Friend and Relationship Status Through Big Data
Here is a word cloud from KDnuggets Facebook postings, generated by Wolfram|Alpha.
"Big Data", "Data Science", "Data Mining", and "Analytics" are the most popular topics.
The Facebook personal analytics report also includes breakdowns by time, post length, most liked posts, relationships of friends, and much more.
Comments from the web
From Business Analytics LinkedIn group: Tom Barker
The peak at age 20 has to do with the adoption and use of Facebook for what it was originally intended - creating circles to ease the college course work load. The more connections you have in college courses the easier your life will be - notes, previous exams, etc. The decline is from older people generally adding more family and fewer friends to Facebook.
From Advanced Business Analytics, Data Mining and Predictive Modeling LinkedIn group: Dan Rice
I have not read the book, so my comments should be taken with that in mind. But, what strikes me from the example given of one of the most interesting findings (the chart of the number of friends as a function of age) is that this is something that would be found with probably at most 1000-5000 observations instead of 100 million observations, at least with methods that avoid error in predictive modeling and thus easily allow the modeling of nonlinear effects in small samples.
I would be interested in knowing what findings resulted specifically because of the 100 million observations that would not have been possible with small representative samples of data.
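One way to make the sample-size argument above concrete is to look at the standard error of a mean, which shrinks with the square root of the number of observations. The sketch below is only an illustration: the standard deviation of 300 friends is a made-up placeholder, not a figure from Wolfram's data, and the function name `standard_error` is hypothetical.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of a sample mean: std_dev / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return std_dev / math.sqrt(n)


# ASSUMPTION: 300 is a placeholder spread for friend counts,
# chosen only for illustration; it does not come from the article.
ASSUMED_STD_DEV = 300.0

# Sample sizes mentioned in the discussion: a few thousand vs. about a million.
for n in (1_000, 5_000, 1_000_000):
    se = standard_error(ASSUMED_STD_DEV, n)
    print(f"n = {n:>9,}: standard error of the mean = {se:.2f} friends")
```

Under this assumption, moving from a few thousand to a million observations tightens the estimate of an overall mean, but the broad shape of a curve such as friends vs. age may already be visible at the smaller sizes. Very large samples matter more when the data is split into many narrow groups (for example, by exact age), since each group then has far fewer observations.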