Exclusive Interview: Ajay Bhargava, TCS shares the Big Data Mantra: Harness Data and Harvest Value

We discuss how to tame Big Data by harnessing data and harvesting value, the top Big Data priorities in the Insurance sector, the short-term and long-term needs of Healthcare Analytics, and more.

Ajay Bhargava has more than 25 years of industry, research, strategic consulting, and teaching experience in areas relating to databases, enterprise data management, data warehousing, business intelligence, advanced analytics, and Big Data. He has also contributed to the SQL, ODBC and IDAPI database standards. Ajay built and heads the Global Analytics & Big Data practice for TCS Insurance & Healthcare customers.

He has frequently spoken at industry conferences, authored whitepapers, and driven thought leadership in the data industry. In addition, he has taught courses (Analytics, Database Design, Data Mining, etc.) at The University of Texas, Austin, and College of Engineering, Pune, and he mentors high school entrepreneurs for global competitions.

Ajay holds an M.S. in Computer Science and an M.S. in Aerospace Engineering from The University of Texas at Arlington. He obtained his B.Tech in Aeronautical Engineering from the Indian Institute of Technology, Mumbai in 1984.

Here is the first part of my interview with him:

Anmol Rajpurohit: Q1. What do you mean by "Harnessing Data" and "Harvesting Value" for leveraging Big Data?

Ajay Bhargava:

There are two fundamental dimensions along which I look at Big Data. The first is harnessing, which involves the collection, administration, and management of Big Data. The second is harvesting: the skills, techniques, and art of asking the right questions that are required to apply science to data in order to derive actionable, meaningful insight and value from it.

At the most basic level, harnessing is the amassing of Big Data; it relates to how insurers manage Big Data and how they create an ecosystem that can not only create Big Data but sustain it as well. Years ago, harnessing data was much easier than it is today; however, the benefits of using this data were more limited as well. Harnessing Big Data involves accessing a combination of internal and external sources of data, both structured and unstructured (such as social media), as well as newer technology that provides access to data and the ability to analyze it.

Harvesting utilizes technology and algorithms that enable organizations to analyze Big Data, deliver actionable insights, and derive real value from it. Skills such as statistical analysis, data mining, econometrics, business analytics, and visualization techniques are in high demand, as they provide a solid foundation for deriving useful insights from the data. Universities have started trying to fill the gap between supply and demand by offering various graduate programs in business analytics, which aim to provide the next-generation skills needed to mine actionable insights.

Below is a table that compares and contrasts the Harnessing and Harvesting dimensions of Big Data.

[Table: Harnessing versus Harvesting]

For more information, refer to my whitepaper on this subject here.

AR: Q2. What are the top 5 opportunities and challenges posed by Big Data in the Insurance sector?

AB: Whether we look at property (e.g. personal home, commercial building) and casualty (e.g. personal car, fleet of trucks), or life insurers, or even reinsurers, some of the challenges (and hence opportunities) they face are quite common in nature. Here are some examples:
  1. Commercial Property insurers are sitting on a vast amount of internal data, sometimes referred to as “dark data”, which is not fully utilized for risk evaluation. As an example, evaluating risk by looking at the commercial location of a customer, and then superimposing external weather risk, terrorist risk, geological risks, etc. on it, can greatly enhance the ability of the insurer to price the risk more accurately, and can also save them from repeating the analysis for other prospects in the same location or vicinity.
  2. A fair amount of suspicious activity in insurance claims is buried in the text part of the claims, especially in disability claims. In the past, one of the challenges has been that text analytics was not mature enough to really harvest value from this text. Today, insurers are using NLP techniques and a combination of structured and unstructured data not only to look for suspicious behaviors in these enormous volumes of claims, but also to evaluate which risk variables could be identified to better price the risk of prospects and customers.
  3. Today’s insurance customers connect with their insurers through many channels (mobile, agent, online, email, call center, postal mail, social networks, etc.). These interactions often occur around discrete events, either external ones, such as a hurricane or hailstorm, or events in their own lives, such as moving into a new home, purchasing a car, the birth of a child, sending kids to college, marriage, accidents, theft, etc. Insurers often find themselves in a “siloed data” environment, where they do not have a 360-degree view of their customer or household. Connecting with these customers on a personal level and responding to them in a unified manner is one of the biggest challenges that Big Data is trying to solve today.
  4. Reinsurers are looking at different ways to provide a plethora of services to their customers, i.e. insurers. Predicting the onset of diseases so that health insurers can offer preventive care, discovering clusters of insurers based on external financial data, and shortening the prospect-to-customer life cycle when issuing a new life insurance policy are some examples of opportunities in this space.
  5. In a call center, triaging the right claims to the right agent in a timely fashion not only saves insurers a considerable amount of money, but also improves customer sentiment about the service. This area is ripe with opportunities not only in insurance, but in any industry that deploys call centers for servicing customers.

AR: Q3. How do you assess the current state of Healthcare Analytics? What would you term as short-term and long-term needs to derive the most value from Healthcare Analytics?

AB: In the last decade or so, a series of advances has taken place that leaves healthcare analytics poised to take off in this century. Some of these include:
  • The passing of the Affordable Care Act in the US, bringing patient-centric care to the fore
  • Opening up of Electronic Health Records
  • The advancement in data platforms (e.g. Hadoop) and analytical algorithms to harness and harvest value from vast amounts of structured and unstructured data, as well as storage capabilities (e.g. cloud)
  • Advancement in genome sciences
  • For patient care, proliferation of commercial and personal sensors to monitor and sense data in real time from many devices (mobile, or otherwise)
  • Advancement in disease understanding and in the pharma industry, especially in tackling chronic diseases, to name a few.

As a result, there is a huge impetus to break down data and analysis barriers across payers (insurers), providers (doctors, hospitals), and pharmacy benefits management organizations. Hence, in the short term, a lot of these organizations are leapfrogging traditional data warehousing environments to quickly get siloed data onto a big data platform, so that in the long term, analytics can be used to harvest real value by improving the quality of care for patients at a much lower cost.

AR: Q4. How do you differentiate Descriptive, Predictive and Prescriptive Analytics?

AB: Descriptive Analytics (sometimes referred to as Business Intelligence) is a lot more backward looking: it tries to figure out what has happened, or looks at past trends across various dimensions. Usually, these phenomena are visualized via reports, scorecards, dashboards, etc. using simple visualization widgets, such as bar charts or histograms, pie charts, box and whisker plots, scatter plots, trend graphs, etc. The goal is to summarize data and intrinsic relationships, especially between metrics (or facts) computed at the intersection of many attributes. An example of descriptive analytics would be a report (visualized with the help of a pie chart) that shows different types of auto accident injury claims, by severity, frequency, location, and time over the last year in the US state of Texas.

Predictive Analytics tries to predict the likelihood of certain events occurring in the future, based on analyzing data from the past. Often, identifying which variables are key factors in a certain event occurring is an important part of the analysis. Classification and prediction models, along with the discovery of trends, patterns, and relationships, fall under the purview of predictive analytics. A good example of predictive analytics is predicting how likely a claim is to be suspicious.

Prescriptive Analytics, just like predictive, is forward looking as well. But here, a lot more simulations (what-if scenarios) and optimizations are modeled. In simulation, a lot of modeling experiments are performed to understand what might happen (effect) based on different choices of perturbations (causes). In optimization, we look for the most efficient route to certain outcomes. The recommendations from the analysis go further in nudging toward specific actions. In health insurance, for example, various simulations could be performed to show a patient that the onset of the next stage of a chronic disease such as diabetes could occur much sooner if the medication schedule and dosage are not adhered to, at varying degrees of non-adherence.

[Figure: Types of Analytics]

All three kinds of analytics become a lot more useful if actionable insights are derived and acted upon to achieve strategic objectives, and if this process is ingrained as part of an analytics-driven culture.
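
To make the distinction between the three kinds of analytics concrete, here is a minimal sketch in Python on made-up claims data. The claim records, the filed_late flag, the function names, and the cost figures are all illustrative assumptions; none of them come from the interview, and a real model would be far richer than a frequency count.

```python
from collections import Counter

# Hypothetical toy claims: (severity, filed_late, was_suspicious)
claims = [
    ("minor", False, False),
    ("minor", True, True),
    ("major", False, False),
    ("major", True, True),
    ("minor", False, False),
    ("major", True, False),
]


def describe(claims):
    """Descriptive: summarize what happened (claim counts by severity)."""
    return Counter(severity for severity, _, _ in claims)


def predict_suspicious(claims, filed_late):
    """Predictive: estimate P(suspicious | filed_late) from past frequency."""
    matching = [c for c in claims if c[1] == filed_late]
    if not matching:
        return 0.0
    return sum(1 for c in matching if c[2]) / len(matching)


def prescribe(p_suspicious, review_cost=50.0, fraud_loss=400.0):
    """Prescriptive: recommend the action with the lower expected cost.

    review_cost and fraud_loss are assumed, illustrative amounts.
    """
    expected_cost = {
        "review": review_cost,               # always pay for a manual review
        "pay": p_suspicious * fraud_loss,    # risk paying out a bad claim
    }
    return min(expected_cost, key=expected_cost.get)


if __name__ == "__main__":
    print(describe(claims))
    p_late = predict_suspicious(claims, filed_late=True)
    print(p_late, prescribe(p_late))
```

The descriptive step only counts what is already in the data, the predictive step turns past frequency into a likelihood for a new claim, and the prescriptive step compares the expected cost of each available action and recommends one.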

Second part of the interview.