Interview: Emmanuel Letouzé, Data-Pop Alliance on the Role of Big Data in Economic Development

We discuss the emerging Big Data ecosystem, its key players, and the severe consequences of inadequate statistical capabilities across many African nations.

AR: Q5. Can you explain the term "Statistical Tragedy of Africa and other emerging economies"?

EL: It’s indeed a phrase I talk about quite a bit; it was coined, I think, by Shanta Devarajan from the World Bank in a blog post from 2011. It is a reference to the term “growth tragedy” used in the title of an influential paper published in 1997 that described what happened to Africa until the mid to late 1990s, with essentially no or negative per capita growth. What Shanta Devarajan pointed out is the lack of reliable statistics (official statistics) about Africa, and more specifically its economies and populations. As Claire Melamed, from ODI and one of our co-directors, wrote more recently, it’s not just or so much that there are no statistics or ‘development data’, but that “most of what we think of as facts, are actually estimates”, and they are often found out in retrospect to be pretty inaccurate.

The international statistical system is a complex animal: various UN agencies are responsible for collecting, computing and providing statistics from various sources, including national statistical systems; sometimes the same indicator will differ between what the UN system provides and what countries provide; sometimes it will be the result of some computations, etc. When I started working at the UNDP in 2006, one of my first tasks was to conduct an assessment and overview of how the MDG (Millennium Development Goals) indicators were produced; the common analogy between sausage-making and policy-making was pretty telling: you don’t want to know how it’s made.

This applies to most countries, but especially to poor countries, of which there are quite a few in Africa, for obvious reasons: doing a survey is expensive and requires significant technical capacity, and it’s hard to conduct a census where there is a civil war going on. The young staffers who are well-trained and may join their national stats office will most likely soon be offered a better-paying job with the UN, an NGO or a private company.

Hal Varian famously said years ago that the next sexiest job would be statistician. Well, it hasn't really happened; it’s data scientist. Quite a few statisticians are retraining and re-branding themselves as data scientists, and there are no jobs for them in national stats offices, at least not yet.

This isn't a new issue, though; for decades and centuries there were only very partial economic and demographic statistics. The historical demography literature is full of great papers that have discussed and devised ways to work around the lack of statistics. For instance, in a lot of the literature on European development in the 16th and 17th centuries, population size over short periods of time was considered constant, for simplicity, because there was no data and because demographic growth was very slow, near zero. That changed in Europe during the industrial revolution, and then in the developing world; so estimating population size became both more difficult and more needed, because a small change in assumptions about birth and/or death rates made a big difference over a relatively short time span. Ron Lee at UC Berkeley said that it was like “trying to hit a moving target”, almost literally. As a result, though, pretty good methods, known as indirect methods, have been developed to assess population composition and size. What is new is the hope that perhaps ‘Big Data’ could help fix the tragedy, in all or in part. And indeed there is very promising research being done in that space and I think it will become a very hot topic.
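Editor's illustration: the point about small changes in birth or death rate assumptions can be made concrete with a minimal sketch. It assumes simple constant-rate exponential growth, which is not one of the indirect methods mentioned above, and every number in it (starting population, rates, horizon) is hypothetical and chosen only for illustration.

```python
import math


def project_population(initial, birth_rate, death_rate, years):
    """Project population size assuming constant crude birth and death rates.

    Uses continuous exponential growth: P(t) = P0 * exp((b - d) * t).
    This is a deliberate simplification, not an indirect estimation method.
    """
    growth_rate = birth_rate - death_rate
    return initial * math.exp(growth_rate * years)


# Hypothetical inputs, for illustration only.
initial = 10_000_000
years = 30

# Near-zero growth: treating population as constant is a safe simplification.
static = project_population(initial, 0.030, 0.030, years)

# Two scenarios that differ by only half a percentage point in the birth rate.
low = project_population(initial, 0.040, 0.020, years)
high = project_population(initial, 0.045, 0.020, years)

print(f"zero growth:       {static:,.0f}")
print(f"birth rate 4.0%:   {low:,.0f}")
print(f"birth rate 4.5%:   {high:,.0f}")
print(f"gap after {years} years: {high - low:,.0f}")
```

Under these assumed numbers, the two growth scenarios end up millions of people apart after 30 years, while the zero-growth case does not move at all, which is the contrast between the pre-industrial and modern estimation problems described above.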

But let me add two caveats once we have described the obvious. One is that most statistics give a misleading picture of reality: they shrink the human experience into a number, such as GDP per capita, and we have to be reminded of that. I often use Plato’s Allegory of the Cave in my talks; what we ‘know’ comes from statistics that are often by definition misleading reflections of humanity, like the shadows in the cave.

Another question is whether and how not having good statistics really matters. Of course there is a strong correlation between a country’s poverty and the quality of its socioeconomic data. But is this causal? If so, in which direction? Are we suggesting that Niger would become Norway if it had Norway’s statistical apparatus, and vice-versa? Of course it’s more complicated and complex than that and no one is really saying this, but there is a bit of this undertone, as I alluded to before, that with better data we would have better policies and better outcomes, almost mechanistically. I think Norway could go on and thrive for decades without collecting any data at all. At the same time I think that if Niger had a really good statistical system it would make big progress; but in great part because of what it takes to build such a system, as much as if not more because of the policy impact of having good data.
