

Scaling Your Data Strategy


This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.



By Javier Bosch, Senior Data Scientist at SSENSE

 

A guide for enterprises with data-related growing pains

 

Companies growing at a fast pace enjoy two unique advantages simultaneously. They can rely on small but mighty teams with a can-do attitude to deliver product features to market quickly. At the same time, to maintain rapid expansion, they have a tangible incentive to pay just as much attention to setting up the fundamental practices for sustainable growth. This means managing digital transformation to take advantage of the benefits of new technology, and managing a growing number of simultaneous, interconnected projects. Not to mention that at the intersection of all these projects and processes, there is often a lot of data zipping around! As the volume of data grows, this unique position can be leveraged to craft a robust, long-term data strategy.

At SSENSE, we attempt to address large-scale problems with data-driven solutions. In this context, business relationships can quickly become complex, and identifying patterns and behaviors in your data can become increasingly challenging. If you are regularly starting new projects that are deeply intertwined with existing features and processes at your organization, you can no longer build isolated, one-off, built-from-scratch solutions. Your operation has grown to the point that it pays off to have a corporate data strategy. In this article, I will present my vision for a cohesive data strategy based on my prior professional experiences. While the purpose of this article is not to outline our exact data strategy at SSENSE, many of the main principles draw inspiration from it.

 

Data Strategy

 
No matter where you are in your data-driven journey, having a data strategy helps unlock the power of data and allows your organization to treat data as a critical asset. A data strategy is a plan designed to improve the methods, practices, and processes through which data is used across the organization, and to ensure that data is being used in a sustainable and reproducible way. Because data is generated and used by diverse business units with different practices and responsibilities, having a committee to oversee a data strategy across the organization becomes central to business success. Not having a data strategy can be okay depending on where you are in your journey, but without one it becomes more likely that different departments will solve data issues on their own, resulting in wasted resources, units operating in silos, and a growing lack of cohesiveness in the organization. I managed to encapsulate that grim possibility in one sentence, so let’s focus on positive and constructive ideas henceforth.

 

Components of a Good Data Strategy

 
A robust data strategy should generally cover the following core topics:

 

Semantics — Identify data and define its meaning

 
This involves understanding all the core entities of the organization’s data, such as customers, locations, products, and transactions, along with their relationships. In the early stages of an operation, it would have been okay to represent these relationships in a relational database schema, but as the various data types, processes, and storage formats evolve, it is important to have a managed data catalog with definitions, meanings, and relationships to link disparate systems together.

Newer methods of combining data under a single interface, such as a data lake, help to avoid many of the problems and much of the complexity of managing and using data in organizations. Even so, consolidating the terminology, metadata, and relationships under one catalog makes accessing and using data much simpler. For each field of data, such a catalog might include definitions, sources, locations, domains, use-cases, stakeholders, etc.
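To make the idea of a catalog entry more concrete, here is a minimal sketch in Python. It is only an illustration: the `CatalogEntry` and `DataCatalog` names, their attributes, and the in-memory storage are assumptions made for this example, not a description of any particular catalog product or of the catalog used at SSENSE.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """One field of data as it might be described in a catalog (hypothetical schema)."""
    name: str            # agreed-upon term for the field
    definition: str      # what the field means in business terms
    source: str          # system that generates the data
    location: str        # where the data is stored
    domain: str          # business domain the field belongs to
    use_cases: List[str] = field(default_factory=list)
    stakeholders: List[str] = field(default_factory=list)


class DataCatalog:
    """A toy in-memory catalog keyed by field name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse duplicates so that each term has exactly one definition.
        if entry.name in self._entries:
            raise ValueError(f"'{entry.name}' is already defined in the catalog")
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> Optional[CatalogEntry]:
        # Returns None when the term has not been catalogued yet.
        return self._entries.get(name)

    def by_domain(self, domain: str) -> List[CatalogEntry]:
        # All catalogued fields that belong to one business domain.
        return [e for e in self._entries.values() if e.domain == domain]
```

Rejecting duplicate names is a deliberate choice in this sketch: the value of a catalog comes from each term having a single agreed meaning, so a second definition should be resolved by people rather than silently overwritten.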

 

Governance — Establish and communicate policies and mechanisms for proper data usage

 
Often the idea of data governance seems restricted to users and the analytics environment, but in reality, we are using and generating data in every operation. Once data is decoupled from the application that created it, the organization should define rules and details for the data so that all stakeholders can understand how to make use of it.

A strong governance model outlines security details, access rights, high-level transformation logic, naming conventions, and rules for how the data should be used. In a data-driven organization, a strong governance model should not be thought of as an overwhelming barrier to entry for users, or as a means to limit access to data. Rather, it should empower all members of the organization to use data responsibly and effectively.
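Two of the elements named above, naming conventions and access rights, lend themselves to a small illustration. The sketch below is hypothetical: the snake_case rule, the `DatasetPolicy` structure, and the role names are assumptions chosen for the example, and a real governance model would cover far more than this.

```python
import re
from dataclasses import dataclass, field
from typing import Set

# Assumed convention for this example: lowercase snake_case dataset names.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass
class DatasetPolicy:
    """Who owns a dataset and who may read it (hypothetical structure)."""
    dataset: str
    owner: str
    readers: Set[str] = field(default_factory=set)


def follows_naming_convention(name: str) -> bool:
    # True only when the whole name matches the assumed snake_case rule.
    return SNAKE_CASE.match(name) is not None


def can_read(policy: DatasetPolicy, role: str) -> bool:
    # The owner can always read; other roles must be listed explicitly.
    return role == policy.owner or role in policy.readers
```

Writing the rules down in a form that can be checked automatically is one way to keep governance from becoming a barrier: users get immediate, consistent answers instead of waiting on a manual review.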

 

Storage and Provisioning — Persisting data in a structure that is accessible and intuitive

 
Companies are constantly in a state of digital transformation. We should consider this the norm as we seek to take advantage of the benefits of new technology, and we cannot ignore that we are generating more and more data in different systems across the organization. Data storage is one of the basic capabilities of technology stacks. Identifying the right data store for an application or business process is important to ensure that the storage technology is used correctly for its purpose.

With a proper data catalog, as mentioned above, the organization is able to make sure that there are practical ways of storing data for business applications, while making it easily accessible and shareable to all interested parties.

The exact choice of the type of data storage, whether it be SQL, graph databases, warehouses, etc., should be made on a case-by-case basis, to best fit the needs of the data being stored and its stakeholders.

A good data strategy for storage ensures that all data is being stored efficiently, in a manner that is tailored for its use-case, while laying the foundations for a centrally managed data sharing process.

 

Process — Move and combine data in disparate systems to provide a unified and consistent access point

 
For data and business analysts, raw data generated from applications is a gold mine of knowledge. If the business impact cannot be identified today, it likely makes sense to store and transform the data so that it can be used later on. Processing data is a vital component of a data strategy as it turns raw data into a finished good. As such, it turns data generated from business applications into an asset that can be used for data-driven decision making.

If, for every new project, engineers and analysts needed to process raw data from many sources, this would be an immense waste of time and energy! Just as importantly, an organization would waste precious human resources if it expected its developers to spend enormous amounts of time building logic to match and link entities across many data sources on an ad-hoc basis.

At some point, fast-growing companies that do not prepare ahead of time will come to this harsh realization, and their performance will begin to plateau. Of course, you can mask this behavior by bolstering your technology talent, but that does not erase the problem.

Implementing a cohesive and centralized strategy for processing, cleaning, combining, and transforming analytics data from many disparate systems enables the organization to make proper use of data and maintain its agility. In fact, this part of the data strategy might be most critical as it enables end-users to make quicker use of data.
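As a rough illustration of what centralizing this work can look like, the sketch below links customer records from two systems through a shared, normalized key, so the matching logic is written once rather than on every project. The source names, the column names (`email`, `customer_email`, `order_id`), and the choice of email as the matching key are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable


def normalize_email(email: str) -> str:
    # Trim whitespace and lowercase so the same address matches across systems.
    return email.strip().lower()


def link_customers(
    crm_rows: Iterable[Dict[str, Any]],
    order_rows: Iterable[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Combine CRM and order records into one view keyed by normalized email."""
    linked: Dict[str, Dict[str, Any]] = {}

    for row in crm_rows:
        key = normalize_email(row["email"])
        linked[key] = {"email": key, "name": row["name"], "order_ids": []}

    for row in order_rows:
        key = normalize_email(row["customer_email"])
        # Orders with no CRM match are kept, with the name left unknown.
        record = linked.setdefault(key, {"email": key, "name": None, "order_ids": []})
        record["order_ids"].append(row["order_id"])

    return linked


crm = [{"email": " Ana@Example.com ", "name": "Ana"}]
orders = [
    {"customer_email": "ana@example.com", "order_id": 101},
    {"customer_email": "guest@example.com", "order_id": 102},
]
unified = link_customers(crm, orders)
```

Here `unified` holds a single record for Ana with her order attached, plus a record for the unmatched guest order. Real entity resolution is usually much harder than exact matching on one key, which is part of why it pays to solve it centrally.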

 

A Skeleton for a Data Committee

 
To use data effectively, it is important to gather the right stakeholders so that the organization can generate the most value when driving the data strategy. Depending on your current capabilities and context, this committee should include members from a combination of the following teams:

  1. Tech Direction: Oversees wider architecture implications and changes in technological innovation.
  2. Software Engineering: Implements and assesses the feasibility of new products and/or features.
  3. Product: Manages the scope and complexity of software products and their trade-offs between quick and long-term wins. Identifies how to leverage data to meet business objectives.
  4. Data Operations: Governs data with a deep understanding of the organization’s data, and is responsible for reporting, analytics, and data standards.
  5. Data Science: Defines how to use the data to develop models and generate insights.
  6. Data Engineering: Builds pipelines that extract, collect, and streamline data from all data generating sources, turning it into data the business can use.
  7. Legal: Oversees privacy, ethical concerns, and legal implications of data use.
  8. Executive: Defines the high-level strategy of how data will help the company meet business goals.

If you didn’t have a data committee before, these concerns were likely being handled by smaller units of software or data leaders. This may work for smaller and simpler operations, but it does not scale well. Depending on where you are in your growth, your committee can be much smaller or much bigger. It should, however, represent or cover the concerns of the above teams and their responsibilities.

 

Conclusion

 
Like implementing new business strategies or technical applications, implementing a data strategy is an evolving process. Understanding the need for a data strategy and a committee to oversee its implementation is a good start. From there, members of the organization start to think about data as a strategic asset that enables a better understanding of their business, and about how they can use it to drive value.

Having a clear vision of where you want to be enables members of the data committee to leverage their unique skills and experience in designing a road map that is right for the organization.

The process of identifying all data entities, data sources, definitions, and relationships should be the first point of entry. The next step is defining the governance standards for using data, and a system to link all data sources together. Each process in the data strategy should have key stakeholders to drive that part of the process. For example, a member of the data reporting and operations team is likely best suited to drive the data governance standards. Taking advantage of the expertise of your data committee ensures that you are able to deliver and implement a data strategy quickly and effectively.

 


Editorial reviews by Deanna Chow, Liela Touré, & Prateek Sanyal.


 
Bio: Javier Bosch is a Senior Data Scientist at SSENSE.

Original. Reposted with permission.
