
KDnuggets News, March 2018 (18:n10)

For GPU Databases of today, the big challenge is doing JOINS


While some GPU database problems have been solved, one challenge remains that only one vendor has tackled properly and that is fast SQL joins on GPU.



Sponsored Post.
By Richard Heyns, CEO of Brytlyt, GPU Database & Analytics Platform.


Recent years have seen massive shifts in both technology and use cases as databases have reinvented themselves. From clustered servers to in-memory solutions and NoSQL, the emphasis has been largely on analytics. The growth of datasets, along with the pressing need to reduce query times, has triggered yet another major evolution in how data is processed.

The trend today is to use hardware accelerators such as Graphics Processing Units (GPUs), which can run SQL queries on multi-billion-row data sets in milliseconds.

GPU Accelerated Database - Easy Concept, Tough Implementation:

The concept is simple enough: use the massive parallelism of GPUs to accelerate data processing. That is fine in theory, but the reality is more complex. The need for data to reside in GPU memory limits database size, and shuttling data on and off the GPU undoes much of the performance gain.

Today, most GPU database vendors have largely solved these problems. But one challenge remains that only one vendor has tackled properly: fast SQL joins on GPU.

JOINS – The Achilles Heel of GPU Databases

The biggest hurdle for GPU databases has been achieving parallel processing, and that is extremely challenging for JOIN operations. JOINs establish a relationship between two tables of data and are critical to meaningful analytics.

Traditional approaches for running JOINs were designed years ago for single-core CPUs and are not well suited for the hundreds of thousands of cores in a GPU system. Figuring out how to effectively parallelize JOINs has been the holy grail of GPU database development in the last several years.
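To make the starting point concrete, here is a minimal sketch of one classic single-threaded approach, a hash join, in plain Python. The table contents, column names, and the `hash_join` helper are hypothetical and chosen only for illustration; this is not a description of how any particular database implements JOINs.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner join of two lists of dicts where left[left_key] == right[right_key]."""
    # Build phase: index the left table by its join key.
    index = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)

    # Probe phase: look up each right-hand row, one at a time, on one thread.
    joined = []
    for row in right:
        for match in index.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 3, "total": 15.0},
]

# Only orders 10 and 11 have a matching customer, so two rows come back.
result = hash_join(customers, orders, "customer_id", "customer_id")
```

Both phases walk their input row by row, which suits a single core but gives thousands of cores nothing to share unless the work is explicitly divided.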

To understand the challenge and the solution, note that the cores of a GPU do not operate independently of one another. Instead, cores are grouped in chunks, typically on the order of 64 at a time, with every core in a chunk running the same instructions.
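A rough way to see why this grouping matters is a toy cost model, sketched below in Python. It assumes that a chunk finishes only when its slowest core finishes and that chunks are simply added up; both are simplifying assumptions, and `lockstep_steps` is a hypothetical helper rather than anything from a GPU toolkit.

```python
def lockstep_steps(work_per_core, group_size=64):
    """Total steps when the cores in each group advance together.

    Toy model: a group takes as long as its busiest core, and the
    groups are summed as if they ran one after another.
    """
    total = 0
    for start in range(0, len(work_per_core), group_size):
        group = work_per_core[start:start + group_size]
        total += max(group)
    return total


# 128 cores with identical work: two groups of 64, 10 steps each.
uniform = [10] * 128
uniform_steps = lockstep_steps(uniform)   # 20

# Same cores, but one of them has 1000 steps of work.
skewed = [10] * 128
skewed[0] = 1000
skewed_steps = lockstep_steps(skewed)     # 1010
```

In this model a single overloaded core holds up the other 63 in its chunk, which is why an algorithm that hands every core a similar amount of work fits the hardware better.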

Brytlyt’s Unique Approach to JOINS

Brytlyt approached the parallelism challenge by devising a patent-pending method that recursively separates rows containing a hit from rows that do not. It breaks the first data set into equal blocks and distributes them across the many cores used for searching. For example, on a 2,000-core GPU, a dataset of 400,000 rows would be broken into 2,000 blocks of 200 rows each. Each GPU core then searches its own block in parallel with all the other cores, giving a huge boost in performance over a traditional CPU.
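The block arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of equal-sized partitioning with one search per block: the thread pool stands in for GPU cores, the row values and `wanted` keys are made up, and nothing here reflects Brytlyt's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(rows, num_blocks):
    """Split rows into num_blocks contiguous blocks of near-equal size."""
    if not rows:
        return []
    size = -(-len(rows) // num_blocks)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def search_block(block, wanted):
    """Return the rows of one block whose value appears in wanted."""
    return [row for row in block if row in wanted]


# Hypothetical data: 400,000 row keys and three keys to look for.
rows = list(range(400_000))
wanted = {7, 199_999, 399_999}

# 400,000 rows over 2,000 "cores" gives 2,000 blocks of 200 rows.
blocks = split_into_blocks(rows, 2000)

# Each block is searched independently; threads stand in for GPU cores.
with ThreadPoolExecutor(max_workers=8) as pool:
    per_block = list(pool.map(lambda b: search_block(b, wanted), blocks))

hits = [row for found in per_block for row in found]
```

Because every block has the same size, every worker gets the same amount of scanning to do, which matches the lockstep behaviour of GPU cores described earlier.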

Empty blocks (those with no hits) are discarded and the process is repeated with the remaining blocks, over and over, until only the relevant blocks remain. This process scales easily, and the importance of that cannot be overstated. Ten billion rows could be distributed over 100 GPUs, keeping the same block size, and so achieve the same cycle time.
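One possible reading of this discard-and-repeat loop is sketched below in Python. The details of the patent-pending method are not given here, so the re-splitting rule, the `shrink` factor, and the `narrow_to_hits` helper are all assumptions made for illustration only.

```python
def narrow_to_hits(rows, wanted, block_size=200, shrink=10):
    """Repeatedly drop blocks with no hit, then re-split the survivors.

    Returns the matching rows and the number of rows still in play at
    the start of each round. Assumed scheme, not the vendor's method.
    """
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    rows_in_play = []
    while True:
        rows_in_play.append(sum(len(b) for b in blocks))
        # Keep only the blocks that contain at least one hit.
        survivors = [b for b in blocks if any(r in wanted for r in b)]
        if block_size == 1:
            break
        # Split the surviving blocks into smaller blocks and go again.
        block_size = max(1, block_size // shrink)
        blocks = [
            b[i:i + block_size]
            for b in survivors
            for i in range(0, len(b), block_size)
        ]
    return [b[0] for b in survivors], rows_in_play


# Hypothetical data: 400,000 row keys, three of which are hits.
rows = list(range(400_000))
wanted = {7, 199_999, 399_999}
hits, rows_in_play = narrow_to_hits(rows, wanted)
# rows_in_play is [400000, 600, 60, 6]: most rows drop out after round one.
```

With only a few hits, almost all of the data is eliminated in the first pass, so the later rounds touch a tiny fraction of the original rows.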

In Conclusion:

Companies looking to get the most out of their GPU database need to understand how important JOINs are and whether the vendors they are considering are proficient in this area.

An independent benchmark comparing Brytlyt with other GPU-based solutions found Brytlyt more than four times as fast as the next vendor. In fact, a billion-row table was queried in only a few milliseconds, not only beating the competition by a good margin but also leaving traditional databases far behind, with run times many hundreds of times slower.

This benchmark indicates the huge value of the Brytlyt solution. Millisecond queries change the way a user perceives responsiveness; they are well below the “threshold of irritation,” where response latency exceeds the user’s comfort level (typically 2 to 5 seconds). Applied to mobile marketing, for example, this might mean the difference between a sale and a turnoff.

Brytlyt has a very scalable solution, capable of running JOINs on much larger data sets than the competition. It is PostgreSQL-based, which means it is very easy to use and feature rich. And it runs on GPU instances in the cloud, so running a trial or setting up Brytlyt for your company is fast and easy!

