The Next Big Inflection in Big Data: Automated Insights

To keep up with big data and improve our use of information, we need insightful applications that will quickly and inexpensively extract correlations while associating insights with actions.

We are working to improve our ability to analyze data, but face a shortage of data professionals.

To collect and analyze more data without giving up on reports, we began to make broader use of the automated information-extraction approaches offered by machine learning and other AI-based data analysis techniques. However, these approaches required a new type of specialized personnel: data scientists. Even though the number of data scientists in the workforce is surging, we need more, and given the volume of data we are generating, it is unlikely that we will ever produce as many as we need. McKinsey has estimated that by 2018, the U.S. will face a shortage of roughly 140,000 to 190,000 people who possess the deep analytical skills required to extract insights from collected data. We will also be short roughly 1.5 million managers who possess the quantitative skills necessary to make important business decisions based on the big data analyses that data scientists produce.

Machine learning has improved our ability to find correlations in data, even as time to decision is decreasing and data velocity is increasing.

Business intelligence as a field has been around for almost 40 years. Statistical analysis and machine learning techniques have been used for even longer than that. During this period, we have been improving our ability to identify correlations in data sets, even as the time available to make a decision has been decreasing and data velocity has been increasing. For example, a corporate CFO may have a month to create a financial forecast, whereas an automated online advertising platform has only 10 milliseconds to decide which digital ad to show to a particular user (see Figure 3). And while a CFO may be able to arrive at a decision by referencing only a few gigabytes of data, the online advertising system has to work with terabytes of data, most of which is generated in near-real time.

Figure 3. Graphic showing average time-to-decision across various industries. Image courtesy of Evangelos Simoudis.

In some application areas, simply identifying correlations among data sets is sufficient for decision-making. In a few of these areas, where the value and the return on investment are high, it may always be necessary and justified to use data scientists and other specialized personnel to extract information from the available data. Computer security threat detection and credit card fraud detection are two such areas. In both, the time available to make a decision is very short, and the cost of making the “wrong” decision by being overly conservative is, at least initially, not extremely high: flagging a legitimate transaction as fraudulent is an inconvenience for the cardholder, and flagging normal behavior as a security intrusion means some network forensics for a systems administrator. But the cost of failing to detect a real anomaly in an established pattern of behavior can be much higher.
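The cost asymmetry described above can be made concrete with a small expected-cost calculation. The following is a minimal sketch in Python, not a description of how any real fraud or intrusion detection system works. The function names and the dollar figures are hypothetical and chosen only for illustration. It assumes that some upstream model has already produced a probability that a transaction is fraudulent.

```python
def should_flag(p_fraud: float, cost_false_alarm: float, cost_missed_fraud: float) -> bool:
    """Flag a transaction when ignoring it has the higher expected cost.

    p_fraud           -- estimated probability that the transaction is fraudulent
    cost_false_alarm  -- cost of flagging a legitimate transaction (hypothetical)
    cost_missed_fraud -- cost of letting a fraudulent transaction through (hypothetical)
    """
    # If we flag, we only pay when the transaction was actually legitimate.
    expected_cost_if_flagged = (1.0 - p_fraud) * cost_false_alarm
    # If we ignore, we only pay when the transaction was actually fraudulent.
    expected_cost_if_ignored = p_fraud * cost_missed_fraud
    return expected_cost_if_ignored > expected_cost_if_flagged


def break_even_probability(cost_false_alarm: float, cost_missed_fraud: float) -> float:
    """Probability at which flagging and ignoring have equal expected cost."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_fraud)


if __name__ == "__main__":
    # Illustrative numbers only: a false alarm costs 5, a missed fraud costs 500.
    threshold = break_even_probability(5.0, 500.0)
    print(f"Flag when the fraud probability exceeds {threshold:.4f}")
    print(should_flag(0.02, 5.0, 500.0))   # a 2% fraud probability
    print(should_flag(0.005, 5.0, 500.0))  # a 0.5% fraud probability
```

Under these assumed costs, the break-even point sits at about a 1% fraud probability, which is why a conservative system that flags many legitimate transactions can still be the cheaper choice.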

To keep up with big data and improve our use of information, we need applications that will quickly and inexpensively extract correlations while associating insights with actions.

Given the expected shortage of data scientists and business users with strong quantitative skills, and our desire to continually exploit the high volume of data that is collected and managed, we need to become better at developing analytic applications that can generate insights and associated actions. Such applications, which I call insightful applications, go beyond the extraction of correlations from data.

We have made great progress in terms of data comprehension. We have decreased the cost of managing big data while improving our ability to analyze it and extract the key information. But the growth of big data is so massive that we will not be able to keep up through faster or more flexible queries and report writers alone. We need to be able to create actionable insights inexpensively and quickly, particularly through the use of insightful applications. I will explore this topic more fully in the next post.

Bio: Evangelos Simoudis is a seasoned venture investor and senior advisor to global corporations. His investing career started 15 years ago at Apax Partners and continued with Trident Capital. Today Evangelos invests in early- and growth-stage companies focusing on the enterprise in the areas of data and analytics, SaaS applications, and mobility. A recognized thought leader on corporate innovation, big data, cloud computing, and digital marketing platforms, he is a frequent speaker and contributor on these topics.

Originally published at O’