Is OLAP Dead?
OLAP enables citizen analysts to quickly, efficiently, and cost-effectively uncover new business insights at a reduced time-to-value.
On Google, it's easy to find information about OLAP (online analytical processing) and OLAP-related concepts such as multi-dimensional analysis and OLAP cubes. That is because OLAP is a well-known and mature concept; but in this age of high-powered analytics, is it "yesterday's technology," or even obsolete? That question seems to be on the minds of many, as a search for "OLAP is dead" returns more than 2 million results (See Fig. 1).
Fig. 1 Search "OLAP is dead" on Google
Is OLAP Really Dead?
Before the cloud era, when enterprise data warehouses (EDW) enjoyed wide adoption, OLAP databases such as Oracle Essbase and Microsoft SSAS (SQL Server Analysis Services) were mandatory components in the data mart layer. But when data warehouses moved to the cloud and the data lake rose in prominence, OLAP became old school and uncool; many even argued that OLAP, along with other legacy approaches like extract, transform, and load (ETL) and data modeling, was dead.
New, cloud-centric technologies and techniques would transform analytics. Enterprises that didn’t adopt the new ways would be left behind. It was time to adapt to innovative approaches, like:
- Multi-cloud architecture: With lots of enterprises going global and building products and services on different cloud platforms and in different regions, data was being stored in many places rather than consolidated in a single database or data warehouse. Data management philosophy focused on connected, not collected. (See Fig. 2)
- Cloud-native architecture: To accommodate cloud adoption, the software industry focused on building cloud-native products to allow users of their software to fully leverage the unlimited resources of the cloud, manage them with elasticity and flexibility, and enjoy a lower total cost of ownership (TCO) on IT infrastructure.
- Modern data stack: With the rise of the modern data stack, enterprises have more choices for data management and analytics. Many easy-to-adopt cloud data lakes and data-warehouse-as-a-service offerings are available out-of-the-box in the cloud, helping organizations of every kind maximize the value of their data for a wide range of purposes.
Fig. 2 Data from collected to connected
Given the hype and the speed with which these trends and technologies are evolving, users might be excused for thinking that a data warehouse and a data lake are all that is needed on a data analytics platform. They could also be excused for believing that OLAP is strictly optional. But is that true? Let's take a closer look.
Citizen Analytics is Commonplace
It wasn't that long ago that data consumers operated mostly at the management level. Today, however, first-line office workers such as store managers or marketers rely heavily on data as a daily part of their jobs. They are citizen analysts, and they are everywhere. For example, during the pandemic, many insurance companies released products covering safety and convenience based on insights and analyses derived from COVID-19 datasets. Without that data, product managers may have rushed new products to market that were unneeded or poorly designed, resulting in losses.
With many organizations adopting digital transformation, every business is now a data consumer seeking ways to leverage data to gain new insights, make better decisions, and operate more competitively.
Meeting the needs of citizen analysts means delivering usage and performance expectations that make citizen analytics possible, including:
- Faster time-to-insight from data: Time is the key. Business users require quick turnaround to ensure maximal value from new insights, and to gain an edge by capitalizing on emerging market trends derived from data delivered by the data engineering team.
- Easy-to-use, self-service interface: SQL is one of the best languages for working with data, but it is still not user-friendly because data is presented as raw tables and columns. Citizen analysts want simple tools that allow them to use data without an engineering degree.
- Single source of truth: A major pain point for citizen analysts is that it is often hard to find the right data. Sales metrics may exist in both a CRM dashboard and a financial application, and the data may be in different formats. Citizen analysts need a single source of truth, and a metrics store populated with data they can trust makes it easy.
From In-House Analytics to Internet Services
It is said that, in the digital era, every product is a data product. That is because B2B companies not only sell their technologies but also build new products and services based on data those products generate and collect. That data is not only used for in-house analytics, but also for internet services. Building data products and data services based on a data lake or data warehouse is a step toward maximizing the value of data and generating new revenue streams for the business.
For example, the company AppZen supports corporate finance teams by using artificial intelligence and automation to streamline common processes like expense approvals, invoicing, fraud detection, and more. But the company also has a data product called Mastermind Analytics that provides insights to help finance auditors reduce spending, comply with policy, and streamline processes.
Mastermind Analytics operates on the data derived from its flagship product. The synergy between AppZen and Mastermind Analytics illustrates how the data collected and generated by one product can become Data-as-a-Service, and provide the basis for innovation and an entirely new product.
Technical Debt Keeps Growing
By recognizing how data is generated, analyzed, and consumed by businesses today, and by understanding the urgency for delivering new insights and innovations, it is easy to see how the problem of technical debt could continue to grow. When the speed of product development is prioritized over writing excellent code, the cycle can outpace the ability to complete needed revisions. Avoiding this situation is where we make the case that OLAP is not only not dead, but should be a vital part of any data platform.
The Debt of Flat Tables and ETL Jobs
The most common way to build a data pipeline for serving data analytics is to generate flat tables. Overuse of this method, however, causes a proliferation of flat tables. For example, one internet company in China has more than 5,700 source tables in its data lake, but after a long period of use in data analytics that number has exploded to nearly one million flat and aggregated tables, forcing the data team to actively govern and manage the excess. The team faces the challenge not only of managing data quality and consistency but also of controlling the ever-growing operating cost of the proliferating data. What's worse, each flat table is generated by at least one ETL job, and each job's script takes extra effort to maintain and extra computing resources to run.
Fig. 3 Flat Tables Explosion
How much computation and storage must be invested for each source table? And how much usage does each flat table actually get? It is difficult to know, so costs are not only high but unpredictable. And once a flat table is generated, it is difficult for others to reuse because its processing logic is hard to understand. The problem is exacerbated by poor lifecycle management. As a result, flat tables and ETL jobs become technical debt.
The Debt of Business Intelligence Dashboards
Business intelligence (BI) dashboards are regarded as requisite deliverables from data engineers and analysts to business users. Even in smaller organizations, it is common to see lots of BI dashboards and reports in use. The core value of data is the metrics generated by the dashboards, but too many dashboards can be confusing and costly. It is more efficient to use fewer dashboards to maintain a consistent user experience, lower costs, and still provide the necessary tools for users across the organization.
One Chinese financial services firm built a metrics store to maintain nearly 10,000 metrics in total while reducing the number of dashboards curated and generated by data engineers and analysts. This saves cost and makes life easier for both users and data teams; by contrast, a proliferation of BI dashboards leads to data silos and business misalignment, with a detrimental effect on processes and results. Once again, the effect is more technical debt.
The Solution? Bring OLAP to the Cloud
This is where OLAP can support greater process efficiency and drive more consistent results. That is because OLAP is an approach designed to process analytics queries involving multiple dimensions. Leveraging the core concept of multidimensional data modeling (MDM), OLAP enables the "slicing and dicing" of data from different perspectives for a streamlined query experience.
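To make "slicing and dicing" concrete, here is a minimal Python sketch over an in-memory fact table. The dimensions, measure, and data are invented for the example and do not represent any particular OLAP engine's implementation.

```python
# Minimal illustration of OLAP-style "slice and dice" over an
# in-memory fact table. Dimensions, measure, and rows are hypothetical.
from collections import defaultdict

# Fact rows: dimensions (region, product, quarter) plus a measure (sales).
facts = [
    {"region": "EMEA", "product": "A", "quarter": "Q1", "sales": 120},
    {"region": "EMEA", "product": "B", "quarter": "Q1", "sales": 80},
    {"region": "APAC", "product": "A", "quarter": "Q1", "sales": 200},
    {"region": "APAC", "product": "A", "quarter": "Q2", "sales": 150},
]

def dice(rows, **filters):
    """Keep only rows matching every dimension filter (a 'dice');
    a single filter is a 'slice'."""
    return [r for r in rows if all(r[d] == v for d, v in filters.items())]

def rollup(rows, by, measure="sales"):
    """Aggregate the measure along the chosen dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in by)] += r[measure]
    return dict(totals)

# Slice to Q1, then roll sales up by region.
q1 = dice(facts, quarter="Q1")
print(rollup(q1, by=["region"]))  # {('EMEA',): 200, ('APAC',): 200}
```

The same two operations, filtering on dimension values and aggregating a measure along chosen dimensions, are what an OLAP engine performs at scale.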
Take Apache Kylin as an example. It has evolved into a cloud-native architecture and is positioned as OLAP on the data lake. Once the data is stored in cloud storage, such as Amazon S3 or Azure Data Lake Storage, it supports self-service analytics for business users.
Fig. 4 Apache Kylin Architecture
Here are some of the key features of Apache Kylin that enable more efficient self-service data analytics in the cloud:
- Multi-dimensional data modeling: This is the core concept of Apache Kylin. A model is designed by joining tables into a star or snowflake schema and defining the model's dimensions and measures. This gives users an easier interface for accessing and analyzing data and delivers the data in a format business users can more readily understand.
- Precomputation: By precomputing source data into cubes, query performance and concurrency can be optimized in multiple scenarios. Especially in the cloud age, storage is much cheaper than computing and network resources. So precomputing helps by making processes more efficient and saving costs.
- Query push-down: With query push-down, Apache Kylin can route some queries to the data source or to another SQL engine. As a result, even if the source data resides in different locations, data models can be created in one place and connected to the source data in a unified view.
- Cloud-native architecture: Using Apache Spark to compute, Apache Kylin can be deployed easily in the cloud and integrated with your cloud data lake. Elastic scaling means the cluster can scale out on demand according to workload requirements.
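The precomputation idea above can be sketched in a few lines of Python: materialize an aggregate for every combination of dimensions (the "cuboids" of a cube) up front, so queries become lookups instead of scans. This is a toy model of the concept only, not Apache Kylin's actual implementation, and the data and dimension names are invented.

```python
# Toy sketch of cube precomputation: build SUM(sales) for every
# subset of dimensions ahead of time, then answer queries by lookup.
from collections import defaultdict
from itertools import combinations

facts = [
    {"region": "EMEA", "product": "A", "sales": 120},
    {"region": "EMEA", "product": "B", "sales": 80},
    {"region": "APAC", "product": "A", "sales": 200},
]
DIMENSIONS = ("region", "product")

def build_cube(rows):
    """Precompute SUM(sales) for every combination of dimensions."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            agg = defaultdict(int)
            for r in rows:
                agg[tuple(r[d] for d in dims)] += r["sales"]
            cube[dims] = dict(agg)
    return cube

cube = build_cube(facts)

def query(cube, **filters):
    """Answer an aggregate query from the precomputed cuboids."""
    dims = tuple(sorted(filters, key=DIMENSIONS.index))
    return cube[dims][tuple(filters[d] for d in dims)]

print(query(cube))                              # grand total: 400
print(query(cube, region="EMEA"))               # 200
print(query(cube, region="APAC", product="A"))  # 200
```

The trade-off is exactly the one described above: the cube consumes extra (cheap) storage at build time, while every query afterward avoids touching the raw rows.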
With OLAP evolving for use in the cloud, other common challenges can be addressed as well, including:
Use governed data marts to reduce flat tables
Once a multidimensional data model is created and precomputation is finished, an end-to-end data pipeline is complete and business users can begin operationalizing their data. No ETL jobs or flat tables need to be created and managed for data aggregation. Precomputation results use fewer storage resources than flat tables, are governed under the umbrella of data models, and can be managed flexibly according to query patterns, reducing the debt normally caused by flat tables.
Use as a metrics store for citizen analytics
With all the business metrics stored in multidimensional data models, users can easily find metrics in their data models rather than searching for them across disparate dashboards, and business users can use whatever tools they prefer to access the metrics. For example, the finance or operations team can use Excel to access the metrics defined in the data models, so they don't need to learn any new technology. This reduces the debt created by too many BI dashboards.
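The core of a metrics store is that each metric is defined once, in a governed registry, and every consuming tool resolves the same definition. The following Python sketch illustrates that idea with an invented metric name and formula; it is not the API of any real metrics-store product.

```python
# Toy sketch of a metrics store: metric definitions live in one
# governed registry instead of being re-derived inside each dashboard.
# The metric name and formula are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    compute: Callable[[List[dict]], float]

REGISTRY = {}

def register(metric: Metric):
    """Add a metric definition to the single source of truth."""
    REGISTRY[metric.name] = metric

register(Metric(
    name="average_order_value",
    description="Total revenue divided by number of orders",
    compute=lambda orders: sum(o["revenue"] for o in orders) / len(orders),
))

# Any consumer (BI tool, Excel plug-in, API) resolves the same definition.
orders = [{"revenue": 120.0}, {"revenue": 80.0}]
print(REGISTRY["average_order_value"].compute(orders))  # 100.0
```

Because every dashboard and spreadsheet pulls from the registry rather than embedding its own formula, two teams can no longer arrive at two different "average order values."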
With a unified query interface and underlying engine, OLAP enables Data-as-a-Service (DaaS) to allow enterprises to expose services for processed multi-source data through standard APIs. OLAP functions as the service layer, and also provides standard capabilities such as data access control, encryption, and obfuscation. Most importantly, high performance and high concurrency are the keys to making this happen. Lastly, OLAP can scale out to adapt to future developments and be deployed on private, public, and hybrid clouds to fit a variety of enterprise IT architectures.
OLAP is far from dead. It remains relevant, even in the cloud era, storing data in multidimensional structures, providing semantic definitions, and taking on an important role in data analytics and management on the data lake. Moreover, OLAP enables citizen analysts to quickly, efficiently, and cost-effectively uncover new business insights at a reduced time-to-value.
- Every product will be a data product: https://medium.com/kyligence/every-product-will-be-a-data-product-19e648f0333
- AppZen: https://www.appzen.com/company/?hsLang=en
- AppZen Launches Mastermind Analytics to Deliver AI-Powered On-Demand Finance Benchmarking: https://www.appzen.com/newsroom/appzen-launches-mastermind-analytics-to-deliver-ai-powered-on-demand-finance-benchmarking
- BI Dashboards are Creating a Technical Debt Black Hole: https://medium.com/@LoriLu/bi-dashboards-are-creating-a-technical-debt-black-hole-31be41ee96f
- News of the Death of OLAP Has Been Greatly Exaggerated: https://kyligence.io/blog/news-of-the-death-of-olap-has-been-greatly-exaggerated/
Dong Li is the Founding Member and VP of Growth at Kyligence, an Apache Kylin Core Developer (Committer) and member of the Project Management Committee (PMC) where he focuses on big data technology development. Previously, he was a Senior Engineer in eBay's Global Analytics Infrastructure Department, a Software Development Engineer for Microsoft Cloud Computing and Enterprise Products, and a core member of the Microsoft Business Products Dynamics Asia Pacific team where he participated in the development of a new generation of cloud-based ERP solutions.