Interview: Arijit Sengupta, CEO, BeyondCore on Advanced Analytics and Big Data

We discuss traditional analytics vs. modern analytics, avoiding over-simplification, human-technology interaction for Big Data, challenges in democratizing analytics and more.

Arijit Sengupta is CEO of BeyondCore and Chair of the SRII Big Data and Advanced Analytics group. Arijit has guest lectured at Stanford; spoken at conferences in a dozen countries; and was written about in The World Is Flat 3.0, New York Times, San Jose Mercury News, Harvard Business Review and The Economist. Arijit held leadership positions at several industry associations and previously worked at Oracle and Microsoft. He has been granted nine patents in the domains of analytics and privacy. Arijit holds an MBA with Distinction from the Harvard Business School and Bachelor's degrees with Distinction in Computer Science and Economics from Stanford University.

Arijit recently delivered a talk at Big Data Innovation Summit 2014 held in Santa Clara on “Advanced Analytics for All”.

Here is my interview with him:

Anmol Rajpurohit: Q1. How do you differentiate traditional analytics from modern analytics? Can you please explain the "Advanced Analytics for All (A3)" approach?

Arijit Sengupta: In traditional analytics, people need significant training and knowledge in analytics to do analytics right. In BeyondCore’s Advanced Analytics for All (A3) approach, the software itself guides the user so that most people without training in Statistics or Computer Science can do analytics themselves. Overall our A3 solution hides the complexity of the analytics / statistics methodology and process so that users can focus on overlaying their human intuition and domain knowledge on the analysis. Areas where A3 may assist users include:

  1. Data setup: A3 solutions introspect the data to figure out the metadata. Analysts don't have to manually specify the structure of the data.
  2. Data cleansing: A3 solutions can evaluate the extent to which data is unclean and still present statistically sound insights despite the data quality issues. They also point users to specific opportunities for improving data quality so that more statistically sound insights may be presented.
  3. Hypothesis generation: A3 solutions automatically generate key hypotheses / questions based on the broader analytical goals of the user. Such solutions guide the user to useful questions that they otherwise might not have even thought to ask. Think of it as the solution automatically looking at millions of graphs and selecting the most useful ones given the broader goal of the analysis.
  4. Analytical processes: A3 solutions conduct multiple statistical tests to confirm the validity of the analysis. They automatically ensure that users are not misled by statistically unsound patterns.
  5. Graph interpretation: A3 solutions point out the key insights in each graph and compare different graphs to point out key insights across graphs. They automatically leverage best practices to ensure users are not misled by visual outliers and visually exciting patterns as opposed to statistically-sound insights.
  6. Presentation generation: A3 solutions simplify the creation of slideshows and dashboards. BeyondCore’s two-minute animated briefing even automates the presentation and walks users through automatically generated slides as an analyst would.
  7. Iterative improvement: Every analyst knows that each analysis involves iteration - you learn things from the initial analysis, adjust the focus of the analysis, and reiterate. A3 solutions guide users to how they can improve their analysis and make it easy for them to make changes in the analysis settings and reiterate.
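
To make the data setup and hypothesis generation steps above more concrete, here is a minimal Python sketch. It is not BeyondCore's method, which the interview does not describe in detail. It assumes a toy in-memory table (the column names and values are hypothetical), treats any all-numeric column as a candidate variable, and uses absolute Pearson correlation as a stand-in for "most useful". The helper names `pearson` and `rank_pairs` are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_pairs(table, top=3):
    """Score every pair of numeric columns and return the strongest ones."""
    # "Data setup": infer which columns are numeric instead of asking the user.
    numeric = [name for name, values in table.items()
               if all(isinstance(v, (int, float)) for v in values)]
    # "Hypothesis generation": evaluate every pair, keep the highest scores.
    scored = [(abs(pearson(table[a], table[b])), a, b)
              for a, b in combinations(numeric, 2)]
    return sorted(scored, reverse=True)[:top]


# Hypothetical toy data, for illustration only.
table = {
    "region": ["N", "S", "E", "W", "N"],
    "ad_spend": [10, 20, 30, 40, 50],
    "revenue": [12, 24, 33, 41, 55],
    "returns": [5, 3, 6, 2, 4],
}

for score, a, b in rank_pairs(table):
    print(f"{a} vs {b}: |r| = {score:.2f}")
```

With only a handful of columns the exhaustive pairwise scan is trivial; the number of pairs grows quadratically with the number of columns, which is why the interview frames this as a task for software rather than for a person.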

AR: Q2. Einstein mentioned "Make things as simple as possible, but not simpler." Given that over-simplification can lead to false conclusions, how do you walk the thin line between making things simpler and avoiding over-simplification, with respect to Big Data solutions?

AS: The main goal is to guide the user but not hide things from the user. For example, BeyondCore grays out parts of the graph it knows are not ‘interesting’ but it doesn’t completely hide those parts of the graph. This way, the user can still overrule the software and see the full graph if they so choose. Similarly, while BeyondCore suggests specific graphs and drill-downs, users can still quickly see whatever graph they choose. Users can also shift to Statistician Mode and see all the complexity they choose. Here context is crucial: BeyondCore makes things simple when appropriate and exposes subtleties when desired.

AR: Q3. In the world of Big Data there are numerous variables, and thus, a tremendous number of combinations of variables to be charted out on multi-dimensional graphs. As it gets humanly impossible to select the right variable combinations for insightful graphs, can the users blindly trust technology to give the right answers? And that too, in the context of users' business problems, competitive landscape and market research? In this human-technology interaction, what role do you expect the users to play?

AS: Users are uniquely positioned to overlay domain knowledge and human intuition on top of an analysis. However, as noted, they are incapable of manually evaluating millions of variable combinations.

Humans are also not very good at following best practices, such as adjusting for confounding effects of other variables, conducting statistical tests or remembering to present graphs so they don’t overstate the significance of patterns in the data.

What differentiates BeyondCore technology is that it takes care of analytical complexity and enforces best practices, while making the automated analysis results completely comprehensible to users. This way, users can focus on the tasks they are uniquely qualified to perform – namely interpreting patterns.

Moreover, we should never ask users to blindly trust technology. It is very hard to evaluate millions of variable combinations to find a specific pattern. However, once a specific pattern is found, it is extremely easy to confirm the pattern manually. For example, what Google search does is difficult to fully understand, but we believe it works, not because we ‘trust’ it, but because when we click on Google search results, they tend to be accurate. Even if one out of the ten links is not relevant to what we were looking for, we quickly skip over that one link because our human domain knowledge tells us to skip it.

Automated analysis cannot be perfect, but it can be far faster and actually more accurate than manual analysis. In manual analysis, we risk missing useful patterns that we never thought to check for (false negatives), and risk wasting time on apparently interesting patterns that we did not conduct the appropriate statistical tests on (false positives). In automated analysis, we may have some cases where the pattern is obvious and thus not useful. But, we don’t have as many false positives and negatives as in the case of manual analysis.
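
One reason automated screening needs statistical discipline is that the more combinations are tested, the more will look significant purely by chance. A common guard is the Bonferroni correction, which divides the significance level by the number of tests. The sketch below illustrates that standard technique; the interview does not say which tests BeyondCore applies, and the pattern names and p-values here are hypothetical.

```python
def bonferroni_filter(p_values, alpha=0.05):
    """Keep only patterns whose p-value survives a Bonferroni correction."""
    if not p_values:
        return {}
    # Each test must clear alpha divided by the total number of tests.
    threshold = alpha / len(p_values)
    return {name: p for name, p in p_values.items() if p <= threshold}


# Hypothetical p-values for four candidate patterns, for illustration only.
candidate_patterns = {
    "ad_spend ~ revenue": 0.0004,
    "region ~ returns": 0.03,
    "weekday ~ revenue": 0.04,
    "discount ~ returns": 0.2,
}

kept = bonferroni_filter(candidate_patterns)
print(kept)
```

Two of the discarded patterns would have passed an uncorrected 0.05 cutoff, which is the kind of apparent finding the answer above calls a false positive. Bonferroni is deliberately conservative; it trades some missed patterns for fewer spurious ones.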

AR: Q4. Ideally, users of data analytics should need just a very basic understanding of statistics or computer science to analyze the data. But, in terms of where we are today, how much (if any) knowledge of statistics or computer science is required for Business Managers to be effective at data-driven decisions? When do you think Technology will reach the state that users with statistical (or computer science) understanding will have no leverage over those who do not have such understanding?

AS: With BeyondCore, data analytics is like driving a car. Most of the time most users do not need expert help to drive the car. However, every once in a while, when the engine light comes on for example, you have to involve experts. For 80% of use cases, BeyondCore makes a typical business user nearly as effective as a Data Scientist. The experts are then brought in to focus on the truly difficult cases where they can deliver the greatest value.

AR: Q5. What are the most critical challenges involved in the democratization of data analytics within an organization?

AS: Cultural change is typically the most difficult part, and that is why it is crucial that senior leaders personally take responsibility for transforming their organizations.

Last week I had a two-hour meeting with the President of a Fortune 500 firm and his direct reports. They are personally taking the lead in democratizing data analytics. The technology already exists; what is necessary is true leadership.

From a technological standpoint, what crucially helped BeyondCore democratize analytics was our core principles of simplicity and speed. Sears confirmed “In less than 5 minutes total, we had been set up on a cloud-based server, were able to load up our uniquely structured data, analyze it with a single click, and see the results of the automated analysis.” (See for details). The key here was that it took a single click and just 5 minutes total to analyze all of their data. When analytics takes a long time and requires significant skill, it can never become ubiquitous. Democratization happens when analytics is as easy as using Google, but does not require the user to know the right questions to ask.

AR: Q6. In your extensive research over the past 10 years at BeyondCore, what were some of your memorable "aha" experiences?

AS: The ‘aha’ experiences that truly matter are when users look at their data for the first time using BeyondCore. Typically, at some point during their first analysis project, they take over the laptop themselves and start diving into the patterns on their own. That is what we call the ‘iPad moment.’ You can spend six months intellectually debating the superiority of an iPad, or you can spend six minutes with an iPad and experience its power and simplicity. Those customer ‘aha’ moments are the foundation of BeyondCore. 20 of the Fortune 100, as well as companies as small as a three-person trading firm, used BeyondCore in the very first year of our production release and had these ‘aha’ moments.

AR: Q7. What do you like to do when you are not working?

AS: Dancing, though I don’t get enough time to dance these days.