
MIT CDOIQ Symposium: Where is the Big Data Boundary of Effectiveness?

My report on Day 1 of the MIT CDOIQ Symposium: why MIT seniors may be less smart than freshmen, 6 types of digitization of capital, the main role of the Chief Data Officer, 7-Eleven Japan, and the Big Data Boundary of Effectiveness.

By Gregory Piatetsky, @kdnuggets, Jul 25, 2014.

This week I attended the MIT CDOIQ (Chief Data Officer and Information Quality) Symposium. Parking was easy to find and the traffic seemed lighter than usual - many in the Boston area were probably on a well-deserved summer vacation.

The symposium had an excellent program and was well organized by Richard Y. Wang. The audience's dress was typical of business and management, with most people in suits and ties, ignoring the summer outside.

Jeff Kelly and Dave Vellante from SiliconANGLE brought TheCUBE to Cambridge, and were broadcasting and conducting interviews, which you can see in the MITCDOIQ 2014 playlist.

Check out the informative symposium tweets (including many from @kdnuggets) under the #MITIQ hashtag.

Prof. Stuart Madnick from the MIT Sloan School, one of the luminaries who opened the symposium, emphasized the importance of data interpretation. He told an entertaining anecdote about a study (apocryphal?) done at MIT many years ago, which measured the IQ of a class of freshmen, and then, 4 years later, of the same class as graduating seniors. The study found a small but significant decline in IQ. (See Dave Vellante's tweet/photo: What happens in aggregate to MIT students' IQ over 4 yrs?)

What happened? Were there so many formulas crammed into freshmen's heads that their IQ declined? Was there a special vacuum cleaner that sucked IQ from unsuspecting freshmen?

Madnick's explanation was that the smartest freshmen graduated in less than 4 years (or perhaps dropped out), so they were no longer in the class when the seniors were measured. Another data science lesson - question the assumptions!
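The effect Madnick described can be reproduced with a small simulation. The sketch below is only an illustration under made-up assumptions: the IQ distribution (mean 130, standard deviation 10), the class size, the share of students who leave early, and the function name `simulate_cohort` are all hypothetical and do not come from the study in the anecdote. No individual's score changes, yet the class average drops.

```python
import random
import statistics


def simulate_cohort(n_students=1000, early_fraction=0.1, seed=0):
    """Return (mean IQ of entering freshmen, mean IQ of those still enrolled in year 4).

    All numbers are hypothetical; individual scores never change in this model.
    """
    rng = random.Random(seed)
    # Made-up IQ scores for the entering class.
    freshmen = [rng.gauss(130, 10) for _ in range(n_students)]
    # Assumption: the top-scoring fraction finishes early and is gone by year 4.
    n_early = int(n_students * early_fraction)
    seniors = sorted(freshmen)[: n_students - n_early]
    return statistics.mean(freshmen), statistics.mean(seniors)


freshman_mean, senior_mean = simulate_cohort()
print(f"freshman mean: {freshman_mean:.1f}, senior mean: {senior_mean:.1f}")
```

Setting `early_fraction=0.0` makes the two averages identical, which shows that the apparent decline comes entirely from who is left in the sample.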

U. Georgia Prof. Richard Watson gave an excellent keynote on The Digitization of Capital. He identified 6 forms of capital - Natural, Economic, Human, Organizational, Social, Symbolic - and gave good examples of the digitization of each type of capital.
  • The Digitization of Natural Capital (natural resources): geo-location; e.g. GoldCorp, which shared its proprietary geological data and had people all over the world do gold prospecting.
  • The Digitization of Economic Capital: sensing the changing status of an asset, e.g. parking spaces, gambling chips.
  • The Digitization of Human Capital: digital engagement, e.g. Amazon Mechanical Turk, competition platforms like Kaggle, codifying language, automating knowledge work, automatic translation.
  • The Digitization of Organizational Capital: digitizing procedures and creating databases.
  • The Digitization of Social Capital: electronic communication, e.g. the Obama 2012 campaign.
  • The Digitization of Symbolic Capital: brands are co-invented with consumers and others, e.g. the success of Smart Car, and the disaster of the #AskJPM JP Morgan tweetchat.

The key role of the Chief Data Officer, according to Richard Watson, is to
  • Link organization strategy to capital digitization
  • Develop and execute the plan
  • Promote exploitation of digitized capital

Prof. Jeanne Ross from MIT expanded on her HBR article entitled You May Not Need Big Data After All.

Her point, partly obscured by the catchy title, is that companies need to learn how to use small data effectively before working with Big Data. She also said that everybody now has access to #BigData - what matters is using it smarter than the next guy.

She divided big opportunities for Big Data into
1. extraordinary insights
2. better decisions every day

While there are companies that get extraordinary insights from Big Data, like comScore or UPS, she argued that the biggest impact would come from the second opportunity, making better decisions every day. As an example, she used the 7-Eleven stores in Japan, which are operated independently of the US stores.

There are now over 16,000 stores. Each store is very small, but carries about 1,000 items, with items delivered daily and fresh items delivered up to 3 times a day. The stores use a very analytical approach and analytics counselors to maximize their turnover and to decide what to order, and are among the most profitable (or are the most profitable) retail stores in Japan.

Other presentations in the afternoon discussed the role of Big Data in the US Department of Defense, the Navy, and the Army.

However, my feeling was that something important was missing from the discussion of Big Data's great features.

I am as big an advocate of Big Data benefits as anyone - after all, KDnuggets was voted the best Big Data Twitter account.

However, I am also very concerned about people not understanding when data is effective and when it is not. Big Data is great in cases when there are many similar examples and many small decisions that need to be automated, as with 7-Eleven Japan or increasing ad clicks. Big Data can also create entirely new platforms or products, like Google or Facebook.

However, when there are few similar examples, predictions based on data are not very reliable.

While Nate Silver correctly predicted the 2012 US presidential election result in every state (using the results of many state polls), he was much less accurate in predicting the Oscars or the 2014 World Cup.

No #BigData would have predicted that Brazil would lose to Germany 1:7 in the World Cup semifinal. Steve Jobs did not rely on data in deciding to create the iPhone. There is no data to help the US Air Force reliably decide which planes to order next - that is why we routinely hear about planes that are behind schedule, over budget, and not needed after all.

Even for more trivial cases, like predicting how a person will rate a Netflix movie, the best algorithms still have an error of about 0.86 stars out of 5 - see my HBR blog post Big Data Hype (and Reality).
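To make that error figure concrete, here is a minimal sketch of how such a number can be computed, assuming the error quoted is a root-mean-square error (RMSE), the metric used to score the Netflix Prize. The ratings below are made up for illustration and the helper name `rmse` is my own; neither comes from Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and actual star ratings (1 to 5) for five movies.
predicted = [3.8, 2.9, 4.4, 1.7, 3.5]
actual = [5, 2, 4, 3, 3]

error = rmse(predicted, actual)
print(f"RMSE: {error:.2f} stars")
```

Because the errors are squared before averaging, a few badly missed ratings weigh more heavily than many small misses, so an RMSE near 0.86 still allows individual predictions to be off by more than a full star.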

Data helps in making decisions for situations that are frequent enough and similar enough, but how can this be quantified?

I would like to see more emphasis on, and discussion of, the boundary between where Big Data is effective and where it is not.

What do you think?

See also
  • The SiliconANGLE report on the takeaway from Day 1 of MIT CDOIQ Symposium highlighted the redefinition of the CDO in the corporate structure, and the expected dissolution of the position of CIO in many organizations.