By Lutz Finger, LinkedIn.
How does the latest “Star Wars” movie parallel the data industry? Both have a force that awakens. However, while Rey, the tough scavenger in “Star Wars” (played well by Daisy Ridley), saw her force come to life in less than 30 minutes, the data industry has been waiting half a decade for the same thing to happen. Finally, we seem to be at the tipping point, at least if you believe the latest Gartner report. According to the report, working with data has shifted “from IT-led enterprise reporting to business-led self-service analytics.” In other words, business-focused analytics and data discovery are on the rise.
For a while, the common consensus was that the hardest part of data science was finding actionable insights. A data platform should empower business users: it should offer easy and agile ways of working with data directly, without going through an IT or BI department. Unfortunately, the reality in many companies around the world was far from that. Over the last few years, however, we have seen a lot of innovation. Some tools even let users simply type a business question, which an algorithm then translates into a data query. Let’s look at the areas a data-enabling platform needs to cover.
An easy way to load data
Data scientists often complain that they feel more like “data janitors.” Most of their time is taken up by extracting the data (e.g., from a website), transforming it (e.g., cleaning it up) and loading it into a database they can start working with. Especially in companies whose business is not naturally built on data, this can be a daunting task. As Joe Hellerstein, founder of the platform Trifacta, thoughtfully pointed out, a data platform needs not only to connect to different data sources but also to simplify this process so that the ‘force’ can awaken.
“If 80% of the work is data wrangling, then the biggest productivity gains can be found there. Business users need platforms that let them see their data, assess quality, and transform the data into shape for analysis.” (@joe_hellerstein)
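The extract, transform and load steps described above can be made concrete with a small sketch. It assumes a tiny, made-up CSV export standing in for data extracted from a website and uses only Python’s standard library (`csv`, `sqlite3`); the function names `extract`, `transform` and `load` are illustrative, and this is not how Trifacta or any particular platform works internally.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for data extracted from a website:
# note the stray spaces, inconsistent capitalization and a missing value.
RAW_CSV = """region, month ,revenue
 North ,2015-11, 1200.50
South,2015-11,
 north,2015-12,1340
West,2015-12,  980.25
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim whitespace, normalize case, drop incomplete rows."""
    clean = []
    for row in rows:
        # Strip stray whitespace from both column names and values.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        if not row["revenue"]:
            continue  # skip rows without a revenue figure
        clean.append((row["region"].title(), row["month"], float(row["revenue"])))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQLite table ready for queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (region TEXT, month TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Once loaded, a business question becomes a one-line query.
query = "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
for region, total in conn.execute(query):
    print(region, total)
```

All the messy parts of this toy input (stray whitespace, “North” versus “north”, a missing revenue value) are handled in `transform`, which is the wrangling step where, as the quote suggests, most of the effort tends to concentrate.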
If your sales went down, you would want to know why, right now. And if you have data that no one else has, you will want to experiment with new product ideas.
Agility, a concept we know well from software development, has made its way into the data world. Stefan Groschupf, CEO and founder of Datameer, pointed out that the ‘force’ only awakens under the following condition:
“For real business value, an analyst should be able to dig into their data to understand what’s in it and what new insights it can reveal. A good data platform needs to allow an easy exploration of data with the fastest time to insight without needing engineering skills or knowledge.” (@StefanGroschupf)
Governance and metadata
The easier it is to explore data, the more people will do it. We saw this in the mid-2000s with the rise of social media analytics. Platforms offered so-called insights at the click of a mouse. Suddenly, business folks created a plethora of new metrics, many of them useless, as I pointed out in my book “Ask Measure Learn.”
But high-quality data is unquestionably a prerequisite for sound decision making and the single most important criterion for any organization. Thus, in the last few years, data governance and data lineage have become focal points of the industry. William Kan, Senior Product Manager at LinkedIn, who created a scalable way to define and manage metrics via a unified data platform at LinkedIn, explains:
“People often associate the term governance with unnecessary overhead and processes. Our idea of governance is to put checks and balances in place with the goal to provide the best data quality but at the same time to be as low touch as possible via machine learning and automation.” (@WillKanman)
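Governance as automated checks and balances, rather than manual process, can be illustrated with a minimal sketch. It assumes a new batch of daily revenue values and a short history of past daily totals; the check names, the three-sigma threshold and the numbers are all hypothetical, and the range check is a simple statistical rule standing in for the machine learning Kan mentions, not a description of LinkedIn’s actual system.

```python
from statistics import mean, stdev


def check_no_missing(batch, history):
    """Every record in the new batch must have a value."""
    return all(v is not None for v in batch)


def check_non_negative(batch, history):
    """Revenue-like metrics should never be negative."""
    return all(v >= 0 for v in batch if v is not None)


def check_total_in_range(batch, history, z_max=3.0):
    """The batch total should lie within z_max standard deviations of past totals."""
    if len(history) < 2:
        return True  # not enough history to judge, so do not block the batch
    total = sum(v for v in batch if v is not None)
    mu, sigma = mean(history), stdev(history)
    return abs(total - mu) <= z_max * sigma


# Hypothetical rule set; new rules can be registered without changing run_checks.
CHECKS = {
    "no_missing": check_no_missing,
    "non_negative": check_non_negative,
    "total_in_range": check_total_in_range,
}


def run_checks(batch, history):
    """Run every check and return the names of those that failed (empty = pass)."""
    return [name for name, check in CHECKS.items() if not check(batch, history)]


past_daily_totals = [1000.0, 1040.0, 980.0, 1010.0, 995.0]
print(run_checks([400.0, 350.0, 260.0], past_daily_totals))  # -> []
print(run_checks([400.0, None, -50.0], past_daily_totals))
# -> ['no_missing', 'non_negative', 'total_in_range']
```

The “low touch” aspect comes from the fact that a clean batch passes silently; people only get involved when `run_checks` returns a non-empty list.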
Accounting is a notion that Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel and Thomas W. Oestreich did not mention in their reports. However, data processing comes at a cost, so no one should be surprised that there will soon be a need to ‘regulate’ the usage of data platforms. Not everyone should be able to bring the server down with a few clicks (or our budget, now that the server scales into the cloud). But hold on… didn’t we try very hard to make data accessible? Correct. So this time, rather than making access more complex, we should ask for greater accountability. As Gregory Piatetsky-Shapiro, the well-known data scientist and co-founder of the KDD conferences, said:
“The impact of a more complex machine learning algorithm might not always drive the wanted insight. Organizations will need to balance what is feasible and what is useful.”