KDnuggets Home » News » 2015 » Apr » Opinions, Interviews, Reports » Interview: Emmanuel Letouzé, Data-Pop Alliance on Democratizing the Benefits of Big Data ( 15:n13 )

Interview: Emmanuel Letouzé, Data-Pop Alliance on Democratizing the Benefits of Big Data


We discuss the 3 Cs of Big Data, state of ethics in the field of Big Data, and how to ensure that the benefits of Big Data reach the masses.



Emmanuel Letouzé is the director and co-founder of Data-Pop Alliance on Big Data and development, jointly created by the Harvard Humanitarian Initiative (HHI), the MIT Media Lab and the Overseas Development Institute (ODI). He is a Visiting Scholar at MIT Media Lab, a Fellow at HHI and a Senior Research Associate at ODI, as well as a PhD candidate (ABD) at UC Berkeley, writing his dissertation on Big Data and demographic research.

Emmanuel is the author of UN Global Pulse's White Paper "Big Data for Development" (2012), the lead author of the 2013 and 2014 OECD Fragile States reports and a regular contributor on Big Data and development.

He previously worked for UNDP in New York (2006-09) and in Hanoi for the French Ministry of Finance as a technical assistant in public finance and official statistics (2000-04). He holds a BA in Political Science and an MA in Economic Demography from Sciences Po Paris, and an MA in International Affairs from Columbia University, where he was a Fulbright fellow.

He is also a political cartoonist for various publications and media outlets, including Medium and Rue89 in France, and a member of The Cartoon Movement.

First part of interview

Second part of interview

Here is the third part of my interview with him:

Anmol Rajpurohit: Q6. What are the "3 Cs of Big Data"? Why should we care about them?

Emmanuel Letouzé: The 3 Cs of Big Data is essentially a mnemonic framework that I developed to clarify and present my perspective on Big Data: what I think it is and what it is 'about'. The 3 Cs stand for Big Data 'Crumbs', Big Data 'Capacities' and Big Data 'Community'; the framework fundamentally presents Big Data as an ecosystem, a complex system actually, not as data sources, sets or streams. And it is both a reference and a counterpoint to the 3 Vs of Big Data.

I'll get to the 3 Cs in a little bit, but let me start by saying a few words on the 3 Vs and why I feel we should stop using and mentioning them altogether. Anyone who has read a bit about Big Data is probably familiar with the 3 Vs of Big Data (Volume, Velocity and Variety) and the Venn diagram with the three eponymous sets, with Big Data at their intersection. Other people have added further Vs: Value, Veracity, Viscosity, etc.

I always had a problem with the 3 Vs. First, I don't think the novelty of Big Data is primarily quantitative (greater, faster); I think at the core the initial change is primarily qualitative: the data we are talking about are passively emitted by people "as they interact with digital devices and services", as I write regularly. In another paper, Patrick Vinck, Patrick Meier and I described them as "digital translations of human actions and interactions". They are non-sampled data about people's behaviors and beliefs, and while you may know you are producing them, and that they are going to be analyzed, you don't produce them for analytical purposes. This matters because focusing on the quantitative aspect suggests that it is about having and using 'more' information, when it is fundamentally different information. Of course, as Kenn Cukier argues, a video, or very many photos, of a running horse is a case of more data generating, as a result, a qualitative shift. But I would argue that every tiny bit of Big Data is different from survey data. So the bottom line is that Big Data is not about size; it's a really bad name, and I've called it "a misnomer that clouds our thinking".

Then, I also think that Big Data is not just data, no matter how big or different it is considered to be; this is why and where I distinguish Big Data as a field (an ecosystem) from big data as data (new kinds of data). Gary King at Harvard gave a presentation called "Big Data is not about the data"; what he means is that it is also, and perhaps first and foremost, 'about' the analytics: the tools and methods that are used to yield insights, turning the data into information, then perhaps knowledge.

And so my 2nd C of Big Data, Capacities, is largely about that: the tools and methods, the hardware and software requirements and developments, and the human skills. There is an article by Kentaro Toyama that I quote often too, which talks about the importance of "intent and capacity", and on which I wrote a post about four years ago. The gist of this is the need to both consider and develop these capacities, without which these crumbs are irrelevant. But it's not just about skills and chips; it's also about how the whole question is framed. This is of course related to the concept of 'Data Literacy' and the need to become sophisticated users of, and commentators on, data.

The 3rd C, Community, refers to the set of actors, both producers and users of these crumbs and capacities; it's really the human element, and potentially it's the whole world. As I said, everybody is a decision-maker, and everybody is a producer and user of data to make decisions.

[Figure: the 3 Cs of Big Data as concentric circles]
And the resulting concentric circles, with Community as the largest set, form a complex ecosystem with feedback loops between them. For example, new tools and algorithms produce new kinds of data, which may in turn lead to the creation of new startups and capacity needs. The basic point is that Big Data is not big data, and that questions like "how can a national statistical office use Big Data" don't mean much from my perspective; or rather, they miss the point. The really important question is why and how an NSO (National Statistical Office) should engage with Big Data as an ecosystem: partner with some of its actors, become one of its actors, and help shape the future of this ecosystem, including its ethical, legal, technical and political frameworks.

We should care about them because changing the framing, the paradigm, from one where we focus narrowly on Big Data as data, with everything else being pretty much constant, to a systems approach fundamentally changes everything else, I think. I sometimes use the analogy of the industrial revolution, where you would have aristocrats and heads of government wondering how they were going to use coal, and not realizing what was happening outside their windows.