SSRN, danah boyd (Microsoft Research) and Kate Crawford (University of New South Wales), Sep 21
Abstract: The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and many others are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing information from Twitter, Google, Verizon, 23andMe, Facebook, Wikipedia, and every space where large groups of people leave digital traces and deposit data. Significant questions emerge. Will large-scale analysis of DNA help cure diseases? Or will it usher in a new wave of medical inequality? Will data analytics help make people's access to information more efficient and effective? Or will it be used to track protesters in the streets of major cities? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what 'research' means? Some or all of the above?
This essay offers six provocations that we hope can spark conversations about the issues of Big Data. Given the rise of Big Data as both a phenomenon and a methodological persuasion, we believe that it is time to start critically interrogating this phenomenon, its assumptions, and its biases.
(This paper was presented at Oxford Internet Institute's "A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society" on September 21, 2011.)
Keywords: Big Data, methodology, sociology, computer science, analysis
Get the paper (17 pages) at
The six provocations include:
1. Automating Research Changes the Definition of Knowledge
In the early decades of the 20th century, Henry Ford devised a manufacturing system of mass production, using specialized machinery and standardized products. Simultaneously, it became the dominant vision of technological progress...
2. Claims to Objectivity and Accuracy are Misleading
'Numbers, numbers, numbers,' writes Latour (2010). 'Sociology has been obsessed by the goal of becoming a quantitative science.' Yet sociology has never reached this goal, in Latour's view, because of where it draws the line between what is and is not quantifiable knowledge in the social domain.
3. Bigger Data are Not Always Better Data
... Twitter has become a popular source for mining Big Data, but working with Twitter data has serious methodological challenges that are rarely addressed by those who embrace it.
5. Just Because it is Accessible Doesn't Make it Ethical
In 2006, a Harvard-based research project started gathering the profiles of 1,700 college-based Facebook users to study how their interests and friendships changed over time (Lewis et al. 2008). This supposedly anonymous data was released to the world, allowing other researchers to explore and analyze it. What other researchers quickly discovered was that it was possible to de-anonymize parts of the dataset, compromising the privacy of students, none of whom were aware their data was being collected (Zimmer 2008).