Messy Data is Beautiful
Data scientists have always been expected to curate data into ‘aha’ moments and tell stories that can reach a wider business audience. But what is the cost of this curation?
The real signal is in the noise
Tidy data doesn’t help that much.
Every aggregation and pivot performed on a dataset reduces the amount of information left to analyze. That clever NLP topic mining on free-text fields was no doubt very useful, but the raw text is more interesting. And those ‘meaningless’ raw sensor logs may be exactly that, or they may not.
Just a few examples of messy data we’ve seen:
- Spelling mistakes on loan applications
- Error reports from maintenance crews
- Oscillating pressure changes in wells
- Proximity of launderettes to grocery stores
- Broken features in an app causing customer churn
Once these types of data have been cleaned, they do more than show organized data sets. They reveal unlimited possibilities, and AI analytics can surface those possibilities faster and more efficiently than ever before.
Let’s say there’s sensor data that’s difficult to understand. Typically, a sensor array generates a large volume of data, most of it unreadable in its raw form.
Following a detailed investigation, the analytics team has noticed that a sustained high reading on one of the sensors, combined with high variability, seems to predict one type of mechanical fault. As a result, reports now track the 3-hour rolling average and the 1-hour rolling variance for this sensor.
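As a rough illustration of those two reporting metrics, here is a minimal sketch in plain Python. The function name `rolling_stat`, the 30-minute sampling interval, and the readings themselves are assumptions made up for this example, not the team's actual pipeline.

```python
from collections import deque
from statistics import fmean, pvariance

def rolling_stat(samples, window_seconds, stat):
    """Return (timestamp, stat over the trailing window) for each sample.

    samples: list of (timestamp_seconds, value), sorted by timestamp.
    The window covers the interval (ts - window_seconds, ts].
    """
    window = deque()
    out = []
    for ts, value in samples:
        window.append((ts, value))
        # Drop readings that have fallen out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        out.append((ts, stat([v for _, v in window])))
    return out

HOUR = 3600
# Hypothetical readings, one every 30 minutes: calm at first, then oscillating.
values = [10, 10, 10, 10, 14, 6, 15, 5]
readings = [(i * 1800, v) for i, v in enumerate(values)]

avg_3h = rolling_stat(readings, 3 * HOUR, fmean)      # 3-hour rolling average
var_1h = rolling_stat(readings, 1 * HOUR, pvariance)  # 1-hour rolling variance
```

In this made-up series the 3-hour average ends at the same level it started, while the 1-hour variance climbs sharply once the oscillation begins. Two numbers per sensor are easy to report, but they are also all that survives of the raw readings.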
These metrics are easy to explain, and everyone from the senior management team to the repair crews understands what they’re measuring. But what was the cost of curating data like this?
While tidy data delivers a nice, explainable story, it does so at the cost of ruling out hypotheses that may never have been considered. And that's exactly where the actual underlying issue may lie.
Instead, a powerful AI-powered analytics platform can apply a host of functions to this and every other sensor reading: exponential moving averages, roots, and FFTs. Then, an analyst can try a range of threshold values and compare the results with context data sets such as weather or more bespoke domain knowledge.
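To make that idea concrete, the sketch below fans one signal out into several derived features and then sweeps a few thresholds. It is an illustration under stated assumptions, not how any particular platform works: the helper names, the synthetic signal, the smoothing factor, and the threshold values are all invented for the example, and a naive DFT stands in for a real FFT so that the code needs only the standard library.

```python
import cmath
import math

def ema(values, alpha):
    """Exponential moving average with smoothing factor alpha in (0, 1]."""
    out, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out

def dft_magnitudes(values):
    """Magnitude spectrum from a naive O(n^2) DFT.

    An FFT computes the same spectrum, only faster.
    """
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values)))
        for k in range(n // 2 + 1)
    ]

def threshold_sweep(feature, thresholds):
    """Count how many points exceed each candidate threshold."""
    return {th: sum(1 for x in feature if x > th) for th in thresholds}

# Hypothetical signal: 4 full oscillations over 32 samples, baseline of 9.
signal = [9 + 2 * math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]

features = {
    "ema": ema(signal, alpha=0.3),
    "sqrt": [math.sqrt(v) for v in signal],
    "spectrum": dft_magnitudes(signal),
}

# Strongest frequency bin, ignoring bin 0 (the constant baseline).
spectrum = features["spectrum"]
dominant_bin = max(range(1, len(spectrum)), key=spectrum.__getitem__)

# Try several candidate alert thresholds on the raw signal.
sweep = threshold_sweep(signal, thresholds=[8, 10, 12])
```

Here the spectrum picks out the oscillation frequency directly, something neither a rolling average nor a rolling variance would name. Joining the sweep results against context data such as weather would be the next step, and is left out of this sketch.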
Capturing unique insights and revealing hidden patterns buried deep in messy data allows us to spot emerging trends and identify new behaviors and customer needs.