GPU-accelerated, In-database Analytics for Operationalizing AI
By Amit Vij, CEO and co-founder, Kinetica.
Organizations often have custom algorithms, functions and libraries for analyzing the large datasets feeding artificial intelligence applications. Data scientists also use open source machine learning and deep learning libraries, such as TensorFlow, Caffe and Torch, to help gain insights. Because the typical architecture is a data pipeline consisting of multiple analytical processes that all require the outputs to be persisted, operationalizing AI applications can be quite challenging.
This article explores how the massive parallel processing power of the graphics processing unit (GPU) is able to unify the entire AI pipeline on a single platform, and how this is both necessary and sufficient for overcoming the challenges to operationalizing AI.
For in-memory databases, a common platform for AI applications, I/O is no longer the biggest bottleneck, because read/write access to system RAM is dramatically faster than access to direct-attached storage (about 100 nanoseconds vs. 10 milliseconds). Instead, for those applications that must ingest and analyze large volumes of high-velocity data, the new performance bottleneck is the CPU.
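To put the two latency figures quoted above on the same scale, the short Python calculation below converts both to nanoseconds and takes their ratio. The constant names are illustrative, and the values are the article's round numbers rather than measurements.

```python
# Latency figures quoted in the text, converted to a common unit (nanoseconds).
RAM_ACCESS_NS = 100                  # about 100 nanoseconds for system RAM
STORAGE_ACCESS_NS = 10 * 1_000_000   # about 10 milliseconds for direct-attached storage

# How many RAM accesses fit in the time taken by one storage access.
speedup = STORAGE_ACCESS_NS // RAM_ACCESS_NS
print(f"RAM access is about {speedup:,}x faster than direct-attached storage")
```

With a gap of five orders of magnitude removed from the data path, the time spent computing on the data, rather than fetching it, becomes the limiting factor.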
The default solution to achieving satisfactory performance is to scale clusters of servers both up and out. The problem with this approach is that, after 50 years of achieving steady gains in price/performance, Moore’s Law has finally reached a practical limit for the x86 CPU. The geometries now needed to achieve higher performance are so costly to manufacture that price/performance actually decreases.
A far more cost-effective approach for compute-bound applications is to supplement the CPU with one or more GPUs. Thanks to the massively parallel processing power of the GPU, such configurations are able to analyze data up to 100 times faster than those containing CPUs alone. Some GPUs, for example, contain upwards of 5,000 cores, roughly 200 times more than the 16-32 cores found in today’s more powerful CPUs.
Another reason for the dramatic improvement in performance is that the GPU’s architecture is particularly well suited to processing the types of vector and matrix operations found in the machine learning and deep learning algorithms used in AI applications. In addition, by storing data in system memory in vectorized columns, the architecture is able to optimize processing across all available GPU cores.
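The idea of vectorized columns can be sketched in a few lines of Python. This is only an illustration of the data layout: it runs sequentially on the CPU, the sample rows and the `multiply_columns` helper are hypothetical, and it does not represent how any particular GPU database is implemented.

```python
from array import array

# Row-oriented layout: one record per tuple (hypothetical sample data).
rows = [(1, 2.0, 10.0), (2, 3.5, 20.0), (3, 1.5, 30.0)]

# Column-oriented layout: one contiguous, typed array per field.
ids = array("q", (r[0] for r in rows))
price = array("d", (r[1] for r in rows))
qty = array("d", (r[2] for r in rows))

def multiply_columns(a, b):
    """Element-wise product of two columns.

    Each output element depends only on the matching input elements,
    so on a GPU every element could be handed to a different core.
    Here the loop simply runs one element at a time on the CPU.
    """
    return array("d", (x * y for x, y in zip(a, b)))

revenue = multiply_columns(price, qty)
total = sum(revenue)
print(list(revenue), total)
```

Because each column is a contiguous block of one type, an operation such as the product above touches only the fields it needs, which is the property that maps well onto many cores working in parallel.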
These and other advantages of the GPU database make it possible to unify and operationalize the entire AI pipeline.
Unifying the AI Pipeline
There are three basic machine learning (ML) steps or processes normally used in AI applications: data generation, model training and model serving. For reasons ranging from organizational silos to throughput performance, these steps are often implemented on separate platforms. Such an architecture is inherently inefficient owing to the need to transport and often transpose data across disparate systems.
With open architectures and support for both open source and commercial software now making it easier to bridge the gaps between organizational silos, the main remaining impediment to unifying the AI pipeline is performance. With its ability to ingest and analyze data on a single platform and in real time, the GPU database is able to deliver the performance needed to unify all three steps in the ML pipeline, as shown in Fig. 1.
Fig. 1. GPU databases deliver the performance needed to unify and operationalize the entire ML pipeline on a single platform to facilitate faster model development and deployment
The first step, data generation, involves acquiring, preparing and persisting the datasets needed to train the models. GPU databases offer advantages in all three data generation tasks:
- Data acquisition uses connectors for both data-in-motion and data-at-rest that are capable of acquiring millions of rows of data across multiple sources in seconds.
- Data preparation delivers millisecond response times using popular languages like SQL, C++, Java and Python, making it easier to explore even the most massive datasets.
- Data persistence provides the ability to store and manage multi-structured data types in a single GPU database to make all data readily accessible to all ML algorithms.
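The three data generation tasks listed above can be sketched end to end as follows. As a loud assumption, an in-memory SQLite database stands in for the GPU database purely so the example runs anywhere; SQLite is a CPU database, and the table names and sample rows are hypothetical.

```python
import sqlite3

# ASSUMPTION: in-memory SQLite is only a stand-in for a GPU database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

# 1. Acquisition: load a (hypothetical) batch arriving from a stream or file.
batch = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", None)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", batch)

# 2. Preparation: drop missing values and aggregate with plain SQL.
# 3. Persistence: keep the prepared features as a table in the same database,
#    so the training step can read them without an export.
conn.execute(
    "CREATE TABLE features AS "
    "SELECT sensor, AVG(value) AS mean_value, COUNT(value) AS n "
    "FROM readings WHERE value IS NOT NULL GROUP BY sensor"
)

features = conn.execute(
    "SELECT sensor, mean_value, n FROM features ORDER BY sensor"
).fetchall()
print(features)
```

The point of the sketch is that the raw rows and the prepared features live side by side, so nothing has to be transported or transposed between systems before training.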
Step two, model training, is the most resource-intensive step in the ML pipeline, making it the biggest potential bottleneck, particularly when the platform must support the plug-in custom code and open source machine learning libraries needed for in-line model training. GPU databases maximize performance in three ways:
- Acceleration: Massive parallel processing makes GPUs well-suited for compute-intensive model training workloads on large datasets, which eliminates the need for data sampling and expensive, resource-intensive tuning.
- Distributed, scale-out architecture: Clustered GPU databases distribute data across multiple database shards, enabling model training to be parallelized while still being unified on a common platform.
- Vector and matrix operations: GPU databases use purpose-built data structures and process optimizations to take full advantage of the GPU’s parallel processing power.
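How training can be parallelized across shards while remaining one logical job can be sketched with a simple least-squares line fit. In this sketch each shard computes partial sums independently and the partial results are then combined. The shards, the `partial_stats` and `fit_line` helpers, and the use of threads in place of database nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds a slice of (x, y) rows where y = 2x + 1.
shards = [
    [(float(x), 2.0 * x + 1.0) for x in range(0, 4)],
    [(float(x), 2.0 * x + 1.0) for x in range(4, 8)],
]

def partial_stats(shard):
    """Per-shard sums needed for a least-squares line fit."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)

def fit_line(all_shards):
    """Compute the partial sums in parallel, then combine them into one model."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, all_shards))
    # Add the per-shard sums column by column.
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*parts))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

slope, intercept = fit_line(shards)
print(slope, intercept)
```

Only the small tuples of sums cross shard boundaries, not the rows themselves, which is what allows the work to scale out with the data.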
The third and final step, model serving, benefits from the ability to operationalize AI by bundling the ML framework(s) and deploying the models in the same GPU database used for data generation and model training. By unifying the entire ML pipeline, models can be assessed in-line for faster scoring and more accurate predictions. The AI database can then be operationalized by running it continuously in a production environment, where it is able to deliver the performance and persistence needed for business-critical applications.
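In-line scoring and assessment can be pictured with the minimal sketch below, in which a model is applied to rows where they already reside and then checked against known labels. The `table`, the stand-in `model` function and both helpers are hypothetical; a list of dictionaries stands in for a database table.

```python
# Hypothetical table held in the same store that supplied the training data.
table = [
    {"x": 1.0, "label": 3.0},
    {"x": 2.0, "label": 5.5},
    {"x": 3.0, "label": 7.0},
]

def model(x):
    """Stand-in for a trained model (assumed form: y = 2x + 1)."""
    return 2.0 * x + 1.0

def score_in_place(rows, predict):
    """Attach a prediction to each row without exporting the data."""
    for row in rows:
        row["prediction"] = predict(row["x"])

def mean_absolute_error(rows):
    """Assess the model against the labels stored alongside the rows."""
    return sum(abs(r["prediction"] - r["label"]) for r in rows) / len(rows)

score_in_place(table, model)
mae = mean_absolute_error(table)
print([r["prediction"] for r in table], mae)
```

Keeping scoring and assessment next to the data means a degrading model can be detected from the same tables that feed it, without a separate export step.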
Making Artificial Intelligence Real
The GPU-accelerated in-memory database is destined to become an increasingly popular platform for unifying and operationalizing AI applications. Only the GPU can deliver the performance and price/performance needed to put AI affordably within reach of most organizations. The open designs of GPU databases also make it easy to incorporate them into virtually any existing data architecture, and to integrate them with open source, commercial and/or custom data analytics frameworks.
Operationalization is made even easier with solutions that support user-defined functions. UDFs are able to receive filtered data, perform arbitrary computations and save the output to a separate table—all in real time on a single GPU database platform. And by making it possible to leverage existing algorithms, models and libraries almost effortlessly, UDFs provide a practical way to bridge the gaps across the organizational silos of data scientists, data analysts, programmers, IT staff and business analysts.
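The UDF pattern described above (filtered data in, arbitrary computation, output saved to a separate table) might look like the following sketch. It uses Python's built-in `sqlite3` module and its `create_function` call as a stand-in, which is an assumption for illustration and not the UDF interface of any GPU database; the `risk_score` function, table names and rows are hypothetical.

```python
import sqlite3

def risk_score(amount, n_prior):
    """Hypothetical custom computation, e.g. one supplied by a data scientist."""
    return amount / (1 + n_prior)

# ASSUMPTION: in-memory SQLite stands in for the database hosting the UDF.
conn = sqlite3.connect(":memory:")
conn.create_function("risk_score", 2, risk_score)

conn.execute("CREATE TABLE txns (id INTEGER, amount REAL, n_prior INTEGER)")
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?)",
    [(1, 50.0, 4), (2, 900.0, 0), (3, 300.0, 2)],
)
conn.execute("CREATE TABLE txn_scores (id INTEGER, score REAL)")

# Filtered rows go in, the UDF runs on each one, and the results are
# saved to a separate output table.
conn.execute(
    "INSERT INTO txn_scores "
    "SELECT id, risk_score(amount, n_prior) FROM txns WHERE amount >= 100"
)

result = conn.execute("SELECT id, score FROM txn_scores ORDER BY id").fetchall()
print(result)
```

The existing Python function is registered unchanged, which mirrors the article's point that UDFs let teams reuse algorithms they already have.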
Bio: Amit Vij is a co-founder, board member and CEO of Kinetica. Prior to Kinetica, Amit was a subject matter expert on geospatial intelligence with General Dynamics AIS and had been chief architect for several Department of Defense and Department of Homeland Security contracts. Amit received a B.S. in Computer Engineering from the University of Maryland with concentrations in Computer Science, Electrical Engineering and Mathematics.