Big Data: Content and Technology

A discussion of using Big Data to provide insight into the big economic questions, and the big expectations that come along.

By Gio Wiederhold, Stanford.

The processing of voluminous data, now primarily found on the Internet, has been making rapid strides. Relationships among diverse sources are routinely established, and doing so no longer requires experts to embody their knowledge in formalized and awkward schemas. Individual entries are linked with a fair amount of confidence using entity resolution technologies. Those data have varied provenance, and can produce exciting results [deSaRRSWWZ:16].

Big Data

The particular issue I am addressing here is the use of Big Data by folk engaged in analyzing data relevant to national economics and then giving advice to private and public agencies on what should be done to help national and international economies. Having ‘Big Data’ raises high expectations [JagadishEA:14]. The objectives are wonderful: help our economies grow, decrease unemployment, and spread comfort and happiness worldwide.

Imbalanced data

Still, the results of Big Data technology depend on the provenance of the data. And the data available to our economists are woefully imbalanced. Specifically, data needed to measure the processes relevant to high technology enterprises are sparse. That matters, because our high technology industry is a major driver of the current economy. I am concerned about how this imbalance leads to poor advice in governmental decision-making.

Economists have depended mainly on financial data to measure the economy. Corporate production and cost data are aggregated for their think tanks [Brown:09]. Such data are reported down to the pennies by accountants and required to be presented to the world in annual reports. Many of the financial assets of high-technology multinational corporations are held outside of the U.S. in tax havens. However, for high-tech enterprises operating globally, these ‘booked’ values tend to be a fraction, about 20% on average, of the market value that investors assign to the corporations. Economists will also use governmental sources, such as income data derived from tax revenues. However, because taxation is imbalanced, those data mislead as well. Investors have insights that are broader.

The economy of the 20th century depended on much labor and substantial financial capital. Aircraft and automobile plants, as well as the steel mills and machine shops that supplied them, were tangible evidence of economic prowess. These industries were associated with known locations, and their products were costly to ship. Geography was an important factor.

Even studies that purport to analyze innovation mislead. A recent study, cited in Science and intended to provide guidance to U.S. policies, was based on patent data [NagerHEA:16]. Patents are the means for established industries to protect themselves. Ongoing innovation relies on trade secrets [Wiederhold:13, Chap. 3]. It is no surprise that this study is interpreted to show that established industries are very innovative, that women and Asians contribute little, and that one should not “think of Bill Gates” as an example [Malakoff:16].

The world has changed

The post-industrial economy is based on intellectual capital. The Apples, Microsofts, Googles and the many smaller, hipper players that create an ever larger fraction of the goods that people purchase are not strapped for financial capital. Furthermore, the GEs, Intels and similar enterprises that do require costly factories have moved much of the labor-intensive production of their tangible products overseas. The critical intangibles embedded in chips, phones, and computers are transmitted to production facilities from far away. Much research, development, testing, and prototyping, and the equally important market research and promotion activities, remain in the U.S., complemented with laboratories in the EU and Asia.

Intangible products can be copied at negligible cost and shipped freely worldwide over the Internet. Such transfers are not obvious in the Big Data being mined. Containerized shipping has similarly reduced the costs of distributing high-technology tangible products. It costs only about $0.50 per unit to ship a pallet of iPads anywhere in the world. Computerized logistics minimizes inventory investments. Online payment systems allow revenues from worldwide sales to be collected anywhere, preferably in locations that don’t insist on excessive reporting to their government agencies.

One piece of evidence for the mismatch is the difference between the valuations companies show on their books, which are based on financial information, and what investors consider the value of the company to be: the market capitalization (the share price × the number of shares on the market). For a traditional enterprise, say a railroad, the two assessments are close. For a high-technology company, the additional market value due to its intellectual capital is typically 4 times the book value. Check it, but subtract the excess cash held in tax havens first!
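The check suggested above can be sketched in a few lines of Python. This is only an illustration: the function names and the company figures are hypothetical, and it assumes that the excess cash is subtracted from both the market capitalization and the book value, since cash is counted at face value in each.

```python
def market_capitalization(share_price: float, shares_on_market: float) -> float:
    """Share price times the number of shares on the market."""
    return share_price * shares_on_market


def intangible_multiple(market_cap: float, book_value: float,
                        excess_cash: float = 0.0) -> float:
    """Market value in excess of book value, as a multiple of book value.

    Assumption (not from the text): excess cash is removed from both
    the market capitalization and the book value before comparing.
    """
    adjusted_book = book_value - excess_cash
    adjusted_market = market_cap - excess_cash
    if adjusted_book <= 0:
        raise ValueError("book value must exceed the excess cash")
    return (adjusted_market - adjusted_book) / adjusted_book


# Hypothetical high-technology company; all figures are invented.
cap = market_capitalization(share_price=100.0, shares_on_market=50_000_000)
multiple = intangible_multiple(cap, book_value=1.4e9, excess_cash=0.5e9)
print(f"market capitalization: {cap:,.0f}")
print(f"additional market value as a multiple of book value: {multiple:.1f}")
```

For a traditional enterprise the multiple would be near zero; a value near 4 corresponds to a book value of about 20% of market value.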

Data missing now

To model and give advice for modern enterprises, economists need data about the resources and the flow of intellectual capital: the people who create and exploit intellectual property (IP), and the IP itself. Those are the factors that drive modern industry.

If data about the intellectual capital that drives modern enterprises are so important, why don’t the economists who give advice go looking for them? The cycle of data availability and demand is stuck.

Little is being recorded in accessible form by industry, because reporting regulations ignore intellectual property and employee capabilities.

Our leading economists have grown up and been educated at a time when financial capital and cheap labor were the crucial contributors to growth [Nasar:11].

The effect is that economic analyses cannot measure the impact of intellectual capital, the experts and IP, the factors that drive modern industry. Ignoring its contribution in decision-making leads to selection bias [KobieluZ:16]. As a result, the needed infrastructure, including education, training, and levels of immigration, as well as protection against external threats, is short-changed, since there is no documentable path from such investments to the outputs of modern industry. There are many anecdotes, but these cannot be placed into a broad coherent economic model.

All inputs to the modern economy need intellectual capital. But the prominent economists, those who have risen to the level of providing advice to governments, continue to focus on financial capital for their metrics and tools [FurmanO:15]. They struggle to explain the rise in income inequality while using only goodwill, booked when companies are purchased for more than their book value, which is certainly a miserable surrogate for intellectual capital. Still, when goodwill is excluded the return for the top companies is now over 90%, while when goodwill is included the returns for the best companies are less than 30%, which is still great. And those great companies, earning super-normal returns, are the ones that rely on intellectual capital. Other commentators missed that point while reviewing this work and that of many other economists. They concluded that, since in the past those best-performing companies obtained returns on capital of about 25%, the shift is a sign of growing unfair income distribution [Ip:16].
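The arithmetic behind the two return figures can be made explicit with a small sketch. It assumes, as a simplification not stated in the text, that the return is earnings divided by capital and that the only difference between the two figures is whether goodwill is counted in the capital base. The function names are hypothetical, and 90% and 30% are treated as point values although the text gives them only as bounds.

```python
def return_on_capital(earnings: float, other_capital: float,
                      goodwill: float = 0.0) -> float:
    """Earnings divided by the capital base, optionally including goodwill."""
    return earnings / (other_capital + goodwill)


def implied_goodwill_multiple(return_excluding: float,
                              return_including: float) -> float:
    """Goodwill as a multiple of the other capital, implied by the two returns.

    From E/K = r_excl and E/(K + G) = r_incl it follows that
    G/K = r_excl / r_incl - 1.
    """
    return return_excluding / return_including - 1.0


# Illustrative point values taken from the bounds quoted in the text.
ratio = implied_goodwill_multiple(0.90, 0.30)
print(f"implied goodwill is about {ratio:.1f} times the other capital")

# Consistency check with invented amounts: earnings 90, other capital 100.
print(return_on_capital(90.0, 100.0))                  # goodwill excluded
print(return_on_capital(90.0, 100.0, goodwill=200.0))  # goodwill included
```

Under these assumptions, booked goodwill alone would be roughly twice all other recorded capital for such companies, which illustrates how much weight that single surrogate carries in the metric.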

It is clear that, when the focus is on financial capital, a policy such as keeping interest rates low helps primarily the traditional segments of industry, but does very little for high-technology enterprises. Those policy makers fail to realize that the conclusions they derive from historical corporate financial data are ignored by smart investors. Investors in high-technology businesses value enterprises according to future expectations, not by past and current costs. They count on future income due to the smart people and the intellectual property (IP) they generate and exploit to make attractive products [Wiederhold:13]. Predicting the future remains risky, but is critical. Avoiding the collection of data relevant to modern industries because of risks and imprecision is not acceptable.