Social Media and Machine Learning Transform Self-service Data Prep

Social media and machine learning concepts are transforming self-service data prep into a collaborative data marketplace.

By Jon Pilkington, Datawatch.

The rise of self-service analytics and data preparation technologies has put the power of data in the hands of individuals. But is this a step forward? Six months ago, the answer would have been: "It's complicated."

The fact that business users can access, cleanse, blend and prepare data to make better business decisions and create effective operational processes is a huge improvement. Rather than waiting for beleaguered IT to deliver specific data sets or having to spend the majority of their day trying to access and blend data into a spreadsheet, users can employ self-service tools to create meaningful reports in a matter of minutes, resulting in increased productivity and efficiency across the organization.


However, giving individuals unfettered access to data creates new challenges: duplicated work and a proliferation of ungoverned, insecure data. Suppose a financial domain expert creates a dataset each month detailing revenue figures per department. At the same time, the managers of each of those departments are creating similar reports for their own analysis. These individuals are working in silos, creating datasets and storing them on local machines. This overlap of effort is inefficient. Storing data locally also undermines IT's ability to enforce data governance, compliance and security guidelines.

Evolution of self-service analytics

All of this has precipitated the next evolution of self-service analytics. Applying the core concepts of social media and consumer applications to self-service data preparation removes the main downside of these solutions: duplicated data and effort. Centralized data management platforms combine existing data preparation, data mining and self-service automation capabilities with enhanced machine learning, governance and social media features. The result is a unified data marketplace that speeds up and simplifies data preparation, promotes best practices, increases productivity and extends data governance efforts.

Using a centralized data management platform with data socialization features creates an internal analytics community, allowing data scientists and business users to search for, share and reuse prepared data for true enterprise collaboration and agility. Storing data on this central platform also makes it easier for IT to manage that information, simplifying the complicated data governance process. Firms can harness the "tribal knowledge" of their business users, combine it with data and create best practices and shared resources that drive better business decision-making and operational processes. Productivity also improves because users can search cataloged data to quickly find the information they need for new analysis projects.

New social media features extend insight into the available enterprise data, giving users the immediate access they have come to expect. Individuals can score data quality, follow key users and datasets and receive intelligent data notifications, much as they follow someone on Twitter, rate a business on Yelp or share content on Facebook.


Machine learning is another important aspect of self-service data preparation, because it is tied directly into the data evaluation process. The platform offers smart recommendations based on how colleagues are using the data, along with suggestions of relevant data models and other datasets to consider. With these automatic recommendations, users can better understand how data fits into a broader context, identifying patterns of use and related assets for deeper insights. Individuals learn from each other, forming self-sufficient teams that can produce detailed market evaluations, dashboards and reports for rapid analysis and improved efficiency.

Yet data scientists should not fear that these next-generation self-service analytics tools will replace them. Rather, data scientists use these solutions as they dive deep into data pools to find, extract, cleanse and blend data. They are empowered to become their organization's data stewards, monitoring the centralized platform while creating and certifying new datasets to improve overall data quality across the company.

Problem solved

With this latest evolution of self-service data preparation and analytics tools, the answer to the original question of whether it is good for all individuals to have the power of data is a resounding "yes." Machine learning and data socialization are turning what had been a process fraught with security and governance concerns into a robust solution where business users and data scientists collaborate and leverage each other's work to enhance efficiency and support better business decision-making.

Bio: As chief product officer, Jon Pilkington brings more than two decades of business analytics experience to Datawatch, including 18 years in the business intelligence market. Previously, Jon spent 13 years at Cognos in a variety of executive roles, including VP of BI product management, VP of global solution architects and VP of North American field marketing. He holds a B.S. in Management Information Systems from Bryant University.