Infinite Data Overlap Detection Arrives to Speed Business Insights

Infinite Data Overlap Detection (IDOD) is a new, Spark-based technology that empowers non-technical business users to automatically discover data patterns and blend any data type for any set of values from multiple sources – both inside and outside the enterprise.

By Tim Howes, CTO of ClearStory Data.

Everyday business users need fast-cycle data analysis to quickly deliver insights for marketing campaigns, sales strategies, operations, and R&D programs. But most companies don’t have enough IT and data experts on hand to blend and accurately analyze the huge volume and variety of data fast enough to meet the growing business need for insights.

In primary research conducted last October, nearly 70 percent of companies reported they need access to refreshed data insights either hourly or daily. Eighty-six percent regularly struggle with this challenge when an analysis involves four or more data sources and file formats. A majority (68 percent) reported they experienced “data blindness” at least once per week because they couldn’t spot “what’s happening now, and why” soon enough, which hurts their ability to make smart decisions and perform their jobs well.

This problem is exacerbated by the growing dearth of data experts. Gartner research predicts that, through 2017, the number of business users doing data analysis will grow five times faster than the number of highly skilled data scientists, and that most business users and analysts will access self-service tools to prepare data for analysis (Source: Harvard Business Review, August 2015).

As a result, business professionals need to do the work themselves and be less reliant on technical data experts. This urgent business need is driving a strong demand for advanced software that automatically discovers, prepares and blends massive amounts of highly dimensional data that often require hundreds of terabytes of space for larger enterprises.

Enter Infinite Data Overlap Detection (IDOD), a new technology that ClearStory Data developed and first released for commercial use in July.

IDOD is a new, Spark-based technology that empowers non-technical business users to automatically discover data patterns and blend any data type for any set of values from multiple sources – both inside and outside the enterprise.

IDOD is smart software that essentially plays the role of a data-modeling advisor for business users. By discovering how disparate data sets overlap and can be combined, IDOD lets business users blend large data sets and discover insights much more quickly without the need for deep technical expertise. IDOD is a breakthrough technology for business users because it automates the preparation and blending of complex data, cutting the work to minutes or hours, whereas manual data modeling can require several days or weeks.

Key features of ClearStory’s new IDOD technology include:

  • Smart Data Inference – IDOD automatically detects and infers the overlap of categorical values for all data types (e.g., geographic location, time series, currency, product names, etc.) based on hundreds, millions and even billions of unique values per attribute across all the source data being analyzed. IDOD primarily handles the structured, relational data sets used by most enterprises today.
  • Infinite Data Types – IDOD places no limit on how many unique custom data types, dimensions or values can be recognized in each source for data inference and harmonization.
  • Highly Extensible for “Plug and Play” Data – IDOD is highly extensible, meaning new data types can be easily patterned and plugged into the technology. This extensibility means a data blending system with IDOD can continually learn industry- and customer-specific attributes that further accelerate the discovery of insights while simplifying the tedious, complex process of manual data modeling.
  • Data Harmonization Scores – Data harmonization scores, originally developed and introduced by ClearStory Data, measure how well the data within any two data sets can be combined. The values within any data type are used to determine the right way to automatically blend data sources together into a holistic, harmonized view. IDOD improves data harmonization by enabling a much more detailed data harmonization score between any two data sets regardless of the size or complexity of the data.
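
To make the idea of value overlap concrete, here is a minimal Python sketch. It is not ClearStory's algorithm, which this article does not describe, and it ignores the Spark-scale machinery entirely. It assumes one simple stand-in measure, containment (the share of the smaller column's distinct values that also appear in the other column), and every name in it (distinct_values, overlap_score, rank_column_pairs, the toy sales and demographics rows) is hypothetical.

```python
from itertools import product


def distinct_values(rows, column):
    """Collect the distinct values of one column, normalized to lowercase text."""
    return {
        str(row[column]).strip().lower()
        for row in rows
        if row.get(column) is not None
    }


def overlap_score(left_values, right_values):
    """Containment: share of the smaller value set that also appears in the other.

    This is an assumed stand-in for a harmonization score, not the real one.
    """
    if not left_values or not right_values:
        return 0.0
    shared = left_values & right_values
    return len(shared) / min(len(left_values), len(right_values))


def rank_column_pairs(left_rows, right_rows):
    """Score every column pair across two data sets, best candidate first."""
    left_cols = sorted({col for row in left_rows for col in row})
    right_cols = sorted({col for row in right_rows for col in row})
    scored = []
    for lc, rc in product(left_cols, right_cols):
        score = overlap_score(
            distinct_values(left_rows, lc), distinct_values(right_rows, rc)
        )
        scored.append((score, lc, rc))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical toy data: two sources with differently named location columns.
sales = [
    {"state": "CA", "sku": "A-100", "units": 12},
    {"state": "NY", "sku": "B-200", "units": 7},
    {"state": "TX", "sku": "A-100", "units": 3},
]
demographics = [
    {"region": "ca", "median_age": 36},
    {"region": "ny", "median_age": 39},
    {"region": "wa", "median_age": 37},
]

best = rank_column_pairs(sales, demographics)[0]
print(best)
```

In this toy case the top-ranked pair is the "state" column of one source and the "region" column of the other, even though the column names differ, because the match is driven by shared values rather than by labels. A production system would need far more than this, such as type inference, fuzzy matching and distributed execution, none of which the sketch attempts.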


Use Cases for IDOD

Consumer packaged goods (CPG), retail, healthcare, manufacturing and logistics companies – and others in hyper-competitive industries driven by highly complex markets – can all use IDOD. This includes companies that use data services from Amazon, Microsoft, Google and IBM because IDOD can be integrated with most major data pipelines used by these enterprises.

One interesting use case for IDOD is to speed CPG makers’ sales and profitability analysis. Brand and production managers need to get a holistic view of how specific products, categories and franchises are performing so they can determine how to allocate marketing budgets, optimize inventory levels across various channels and track against competitors. It’s a difficult challenge that requires processing and understanding data on a daily and weekly basis across thousands of product SKUs and hundreds of categories and franchises.

Another vertical example is using IDOD to analyze subscriber churn for cable TV companies. By analyzing a vast amount of customer interaction data (e.g. account information, call durations, order and cancellation requests) together with data on programming content and demographics tied to device viewing (e.g. live set-top box vs. on-demand streaming vs. mobile), business users can better understand the key factors that contribute to subscriber churn and how to minimize that churn through actions like special discounts or package promotions.

As machine-learning algorithms evolve, IDOD will help provide even deeper visibility into how large, complex data can be blended and harmonized. As a result, business users will be able to gain more refined insights and make better decisions to improve marketing campaigns, optimize sales, reduce operational inefficiencies, make better products, and reduce costs.