Data Mesh & Its Distributed Data Architecture
Going forward, data professionals have found a new way to address the scalability of sources through data mesh.
The enterprise vision of responding faster and delivering a superlative customer experience requires an overarching remodeling of data management. So far, technology has resolved the issues of storing and processing big data, and it has become capable of putting big data through deep analytics. Meanwhile, the global market for advanced data management solutions is expected to reach USD 122.9 billion by 2025.
However, the increasing diversity in the type and number of data sources continues to obstruct a seamless data lifecycle. Until now, data management landscapes captured and streamed data into a centralized data lake, and the data sets were then processed and cleansed in a fabric solution. Going forward, data professionals have found a new way to address the scalability of sources through data mesh.
What is a Data Mesh?
A Data Mesh is a distributed architecture for the lifecycle management of analytical data. Based on decentralization, the Mesh removes obstructions to data availability and accessibility. It empowers users to capture and operationalize insights from multiple sources regardless of their location and type. It then queries the data where it resides, without having to transport it to a centralized data lake.
The distributed architecture of a mesh hands ownership of the data to each business domain. This means every domain controls the quality, privacy, freshness, accuracy and compliance of its data for analytical and operational use cases.
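To make the idea of domain-owned data that is queried in place more concrete, here is a minimal Python sketch. The `Domain` class, the `query_mesh` function and the sample records are hypothetical names made up for illustration; they do not come from any particular mesh product.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Domain:
    """A business domain that owns its records and answers queries locally."""
    name: str
    records: list[dict] = field(default_factory=list)

    def query(self, predicate: Callable[[dict], bool]) -> list[dict]:
        # The filter runs inside the domain; only matching rows leave it,
        # tagged with the name of the domain that produced them.
        return [dict(r, domain=self.name) for r in self.records if predicate(r)]


def query_mesh(domains: list[Domain], predicate: Callable[[dict], bool]) -> list[dict]:
    """Fan the same query out to every domain and merge the answers."""
    results: list[dict] = []
    for domain in domains:
        results.extend(domain.query(predicate))
    return results


# Hypothetical sample data held by two separate domains.
sales = Domain("sales", [{"customer": "c1", "amount": 120}, {"customer": "c2", "amount": 40}])
support = Domain("support", [{"customer": "c1", "tickets": 3}])

c1_view = query_mesh([sales, support], lambda r: r["customer"] == "c1")
```

Nothing is copied into a shared store before the query runs. Each domain filters its own records and returns only the matches, which is the property the mesh relies on.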
Migrating from a Centralized Data Lake to a Distributed Mesh
As the number of data sources continues to grow, data lakes struggle to perform on-demand integration. With data mesh, dumping large volumes of data into lakes may become a practice on the verge of extinction.
The new data management framework ensures collaborative participation from all nodes, each serving a specific business unit. It does so by following the principle of data-as-a-product: every data set is treated as a digital product that is clean, complete and conclusive, and that can be delivered to anyone, anywhere, on demand. For a rapidly growing data management ecosystem, Mesh is an instrumental approach for delivering organizational data insights.
The decentralization of ownership reduces the dependency on data engineers and data scientists, because every business unit controls its own domain-specific data. However, every domain still depends on centrally standardized policies for data modeling, security protocols and governance compliance.
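One way to picture data-as-a-product is a data set that refuses incomplete records at publish time. The sketch below assumes a hypothetical `DataProduct` class with a `required_fields` contract; the names and the sample rows are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data set published by a domain under an explicit contract."""
    name: str
    owner_domain: str
    required_fields: tuple[str, ...]
    rows: list[dict] = field(default_factory=list)

    def publish(self, row: dict) -> None:
        # Reject incomplete rows so consumers only ever see complete records.
        missing = [f for f in self.required_fields if row.get(f) is None]
        if missing:
            raise ValueError(f"{self.name}: missing fields {missing}")
        self.rows.append(row)


orders = DataProduct("orders", "sales", ("order_id", "customer_id", "total"))
orders.publish({"order_id": 1, "customer_id": "c1", "total": 99.5})

# A row with a missing customer_id is refused instead of being stored.
try:
    orders.publish({"order_id": 2, "customer_id": None, "total": 10.0})
    rejected = False
except ValueError:
    rejected = True
```

The completeness check lives with the owning domain, so quality is enforced before the data reaches any consumer.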
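The split between local ownership and central standards can be sketched as a shared policy that every domain's product metadata is checked against. The policy keys, the classification values and the `check_compliance` function below are assumptions chosen for illustration, not an established standard.

```python
# Central standards that every domain must follow (hypothetical example values).
CENTRAL_POLICY = {
    "required_metadata": {"owner", "schema_version", "classification"},
    "allowed_classifications": {"public", "internal", "restricted"},
}


def check_compliance(product_metadata: dict, policy: dict = CENTRAL_POLICY) -> list[str]:
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    missing = policy["required_metadata"] - set(product_metadata)
    for key in sorted(missing):
        violations.append(f"missing metadata: {key}")
    classification = product_metadata.get("classification")
    if classification is not None and classification not in policy["allowed_classifications"]:
        violations.append(f"unknown classification: {classification}")
    return violations


# Metadata as two different domains might declare it.
sales_orders = {"owner": "sales", "schema_version": "1.0", "classification": "internal"}
hr_payroll = {"owner": "hr", "classification": "secret"}

sales_report = check_compliance(sales_orders)
hr_report = check_compliance(hr_payroll)
```

Each domain still decides what its data contains. The central policy only constrains how the product is described and classified.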
Using Data Mesh and Fabric
Any discussion of data management is incomplete if it leaves out the fabric architecture. There’s a myth that data fabrics and mesh compete with each other. That’s untrue. Gartner has discussed both side by side and cleared the air.
A data fabric is an established yet still relevant architecture that drives continuous and optimal use of data across industries. It automatically discovers and proposes a management architecture, thereby streamlining the entire data lifecycle. It also supports validating data objects and the contextual references needed to reuse those objects. A Mesh does this differently, drawing on current subject matter expertise to prepare solutions for data objects.
Far from competing, fabrics could be instrumental in extracting optimal value from the Mesh architecture.
Implementing Data Mesh with an Entity-Based Data Fabric
Consider K2View’s entity-based data fabric architecture. It stores the data for every business entity in its own micro-DB and thus supports hundreds of thousands of these databases. By fusing the concepts of ‘business entity’ and ‘data as a product’, the fabric supports the implementation of the data mesh design pattern. Here, the fabric creates an integration layer of connected data sets from multiple sources, which gives operational and analytical workloads a holistic view of the landscape.
The entity-based fabric standardizes the semantic definition of all data products. In accordance with regulations, it establishes the data ingestion methods and governance policies that secure the data sets. Given such support from the fabric, the mesh pattern performs better with entity-level storage.
So, for every business domain in a distributed mesh network, a dedicated fabric node is deployed. Each domain, specific to a particular business entity, has local control of the services and pipelines through which consumers access its products.
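The idea of one small database per business entity can be illustrated with Python's built-in `sqlite3` module. This is only a toy sketch of the general concept, not K2View's actual implementation or API. The `EntityStore` class, the `facts` table and the sample sources are invented for this example.

```python
import sqlite3


class EntityStore:
    """Keeps one small in-memory SQLite database per business entity."""

    def __init__(self) -> None:
        self._dbs: dict[str, sqlite3.Connection] = {}

    def _db_for(self, entity_id: str) -> sqlite3.Connection:
        # Create the entity's private database on first use.
        if entity_id not in self._dbs:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE facts (source TEXT, key TEXT, value TEXT)")
            self._dbs[entity_id] = conn
        return self._dbs[entity_id]

    def ingest(self, entity_id: str, source: str, key: str, value: str) -> None:
        """Store one fact about an entity, remembering which source it came from."""
        self._db_for(entity_id).execute(
            "INSERT INTO facts VALUES (?, ?, ?)", (source, key, value)
        )

    def view(self, entity_id: str) -> dict[str, str]:
        """Return everything known about one entity, across all sources."""
        rows = self._db_for(entity_id).execute(
            "SELECT key, value FROM facts ORDER BY rowid"
        )
        return dict(rows.fetchall())


store = EntityStore()
store.ingest("customer-1", "crm", "name", "Ada")
store.ingest("customer-1", "billing", "plan", "premium")
store.ingest("customer-2", "crm", "name", "Grace")

customer_1 = store.view("customer-1")
```

Because each entity has its own database, a query for one customer never touches another customer's data, and facts from different sources land in a single connected view.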
Decentralized Data Ownership Model
Enterprises have had to import multiple data types from multiple sources into a centralized repository such as a data lake. There, data processing normally consumes a lot of effort and is prone to errors, and querying such heterogeneous data sets for analytics drives up cost. Data professionals have therefore been looking for an alternative to this centralized approach. With Mesh’s distributed architecture, they can decentralize ownership for every business entity. Such a model reduces the time needed to generate quality insights and thus serves the core purpose: access data quickly and influence key business decisions.
The decentralization approach addresses further issues. For example, the query method in traditional data management may lose efficiency as data volume grows uncontrollably. That growth forces changes across the entire pipeline, which may ultimately fail to respond. As a result, response time slows drastically as the number of data sources increases, which hampers the agility needed to extract value from data and scale business outcomes.
With decentralization, the Mesh distributes ownership to different domains so that each can cope with its incoming data volume and run queries at its own level, on its own relevant data sets. As a result, the architecture narrows the gap between an event and the analysis that consumes it, and enterprises are able to improve key decision-making.
By provisioning a data-as-a-service architecture, mesh brings agility to business operations. Not only does it reduce the IT backlog, it also empowers data teams to work only on lean and relevant data streams.
Authorized consumers can therefore easily access their respective data sets without having to deal with the underlying complexity.
Moving beyond digital data, Web 3.0 is committed to decentralizing enterprise processes, and data management is an important use case in this direction. It is clear that a centralized authority fails, beyond a certain limit, to accommodate the explosion of incoming data. Watch for 2022 to put Data Mesh architecture at the forefront.
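A rough sketch of how consumers might be shielded from that complexity: they ask a catalog for a product by name, and the catalog checks their authorization before handing back the rows. The `MeshCatalog` class, its methods and the consumer names are hypothetical and exist only for illustration.

```python
class MeshCatalog:
    """A hypothetical front door that hides where each product lives."""

    def __init__(self) -> None:
        self._products: dict[str, list[dict]] = {}
        self._grants: dict[str, set[str]] = {}

    def register(self, product: str, rows: list[dict], allowed: set[str]) -> None:
        """A domain registers its product and names the consumers allowed to read it."""
        self._products[product] = rows
        self._grants[product] = set(allowed)

    def read(self, consumer: str, product: str) -> list[dict]:
        # Consumers ask by product name; storage details stay inside the domain.
        if consumer not in self._grants.get(product, set()):
            raise PermissionError(f"{consumer} may not read {product}")
        return list(self._products[product])


catalog = MeshCatalog()
catalog.register("orders", [{"order_id": 1, "total": 99.5}], allowed={"analytics"})

analytics_rows = catalog.read("analytics", "orders")

# A consumer without a grant is turned away.
try:
    catalog.read("marketing", "orders")
    denied = False
except PermissionError:
    denied = True
```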
Yash Mehta is an IoT and Big Data enthusiast who has contributed many articles to publications such as IDG, IEEE and Entrepreneur. He co-developed platforms like Getlua, which lets users easily merge multiple files together. He also founded a research platform that generates actionable insights from experts.