Leveraging the Power of GPUs with CuPy in Python

Whether you're doing machine learning, scientific computing, or working with huge datasets, CuPy is an absolute game-changer.



What is CuPy?


CuPy is a Python library for GPU-accelerated computing whose array API is compatible with NumPy and SciPy. By replacing NumPy calls with their CuPy equivalents, you can run your code on NVIDIA CUDA or AMD ROCm platforms. Array operations then execute on the GPU, which typically results in faster processing of large arrays.

By swapping out just a few lines of code, you can take advantage of the massive parallel processing power of GPUs to significantly speed up array operations like indexing, normalization, and matrix multiplication. 

CuPy also gives you access to low-level CUDA features. You can pass ndarrays to existing CUDA C/C++ code using RawKernels, overlap work using Streams, and call CUDA Runtime APIs directly.
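To make the RawKernel idea concrete, here is a minimal sketch that wraps a small CUDA C kernel and launches it on CuPy arrays. The names vec_add, KERNEL_SRC, and HAS_GPU are made up for this illustration, and the sketch assumes 1-D, non-empty float32 inputs of equal length. When CuPy or a GPU is not available, it falls back to plain NumPy so the snippet still runs.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable GPU was found
    cp, HAS_GPU = None, False

# CUDA C source for an element-wise addition kernel (illustrative only).
KERNEL_SRC = r'''
extern "C" __global__
void vec_add(const float* a, const float* b, float* out, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}
'''

def vec_add(a, b):
    """Add two 1-D float32 vectors, on the GPU when one is available."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if not HAS_GPU:
        return a + b  # CPU fallback so the example runs anywhere

    kernel = cp.RawKernel(KERNEL_SRC, "vec_add")
    a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)
    out_gpu = cp.empty_like(a_gpu)
    n = a_gpu.size
    threads = 128
    blocks = (n + threads - 1) // threads  # enough blocks to cover n elements
    # Launch arguments: grid size, block size, kernel arguments.
    # n is passed as np.int32 to match the `int n` parameter in the kernel.
    kernel((blocks,), (threads,), (a_gpu, b_gpu, out_gpu, np.int32(n)))
    return cp.asnumpy(out_gpu)

result = vec_add([1, 2, 3], [10, 20, 30])
print(result)
```

For simple element-wise work like this, ordinary CuPy expressions such as a_gpu + b_gpu are usually enough; a RawKernel is mainly useful when you already have CUDA C code you want to reuse.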


Installing CuPy


You can install CuPy using pip, but first you need to find out which CUDA version you have using the command below. The leading ! runs a shell command from a notebook cell; omit it in a regular terminal.

!nvcc --version


nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


The output shows that this Google Colab runtime is using CUDA version 11.8. Therefore, we will install the cupy-cuda11x package.
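If you want to automate this lookup, a small helper along these lines could read the release number out of the nvcc output and suggest a package name. The function name cupy_wheel_for is hypothetical, and the mapping is an assumption based on CuPy's wheel naming around version 12 (cupy-cuda11x for CUDA 11.2 and later, cupy-cuda12x for CUDA 12.x); check the CuPy installation guide for your release.

```python
import re

def cupy_wheel_for(nvcc_output):
    """Suggest a CuPy wheel name from the text printed by `nvcc --version`.

    Returns None when the release cannot be parsed or is not covered by
    this simplified (assumed) mapping.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 12:
        return "cupy-cuda12x"
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    return None  # older CUDA releases: look up the package in the CuPy docs

# Line taken from the nvcc output shown above.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(cupy_wheel_for(sample))
```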

If you are running on an older CUDA version, I have provided a table below to help you determine the appropriate CuPy package to install. 


Table: CuPy package name for each CUDA version (image from CuPy 12.2.0)


After selecting the right version, we will install the Python package using pip. 

pip install cupy-cuda11x


You can also use the conda command to automatically detect and install the correct version of the CuPy package if you have Anaconda installed.

conda install -c conda-forge cupy


Basics of CuPy


In this section, we will compare the syntax of CuPy with NumPy. The two are nearly identical: in most cases you simply replace np with cp.

We will first create a NumPy array and a CuPy array from a Python list. After that, we will calculate the L2 norm of the vector. 

import cupy as cp
import numpy as np

x = [3, 4, 5]
x_np = np.array(x)
x_cp = cp.array(x)
l2_np = np.linalg.norm(x_np)
l2_cp = cp.linalg.norm(x_cp)
print("Numpy: ", l2_np)
print("Cupy: ", l2_cp)


As we can see, both libraries return the same result. 

Numpy:  7.0710678118654755
Cupy:  7.0710678118654755
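Because the two APIs line up so closely, you can also write a single function that works with either kind of array. The sketch below uses cp.get_array_module, which returns cupy or numpy depending on the argument it receives. The helper name l2_norm and the HAS_GPU flag are made up for this example, and the code falls back to NumPy when CuPy or a GPU is not available.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable GPU was found
    cp, HAS_GPU = None, False

def l2_norm(x):
    """Compute the L2 norm with whichever array library x belongs to."""
    xp = cp.get_array_module(x) if HAS_GPU else np
    return xp.sqrt(xp.sum(x * x))

x = [3, 4, 5]
print("NumPy:", float(l2_norm(np.array(x))))
if HAS_GPU:
    # Same function, but the computation now runs on the GPU.
    print("CuPy: ", float(l2_norm(cp.array(x))))
```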


To convert a NumPy array to a CuPy array, you can simply use cp.asarray().

x_array = np.array([10, 22, 30])
x_cp_array = cp.asarray(x_array)




To convert a CuPy array back to a NumPy array, use .get().

x_np_array = x_cp_array.get()
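Putting both directions together, a typical round trip moves data to the GPU, computes there, and copies the result back to the host. Here is a minimal sketch of that pattern; to_device and to_host are hypothetical helper names, and cp.asnumpy is the function form of .get(). Without a GPU the helpers simply return NumPy arrays.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable GPU was found
    cp, HAS_GPU = None, False

def to_device(a):
    """Copy a host array to the GPU when one is available."""
    return cp.asarray(a) if HAS_GPU else np.asarray(a)

def to_host(a):
    """Copy an array back to host memory as a NumPy array."""
    return cp.asnumpy(a) if HAS_GPU else np.asarray(a)

x_host = np.array([10, 22, 30])
x_dev = to_device(x_host)   # host -> device copy
y_dev = x_dev * 2           # computed on the GPU when available
y_host = to_host(y_dev)     # device -> host copy
print(type(y_host).__name__, y_host)
```

Each transfer copies data between host and GPU memory, so it usually pays to keep arrays on the GPU across several operations rather than converting back and forth.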




Performance Comparison 


In this section, we will compare the performance of NumPy and CuPy.

We will use time.time() to measure execution time. Then, we will create a 3D NumPy array and perform some mathematical operations on it. 

import time

# NumPy and CPU Runtime
s = time.time()
x_cpu = np.ones((1000, 100, 1000))
np_result = np.sqrt(np.sum(x_cpu**2, axis=-1))
e = time.time()
np_time = e - s
print("Time consumed by NumPy: ", np_time)


Time consumed by NumPy: 0.5474584102630615


Similarly, we will create a 3D CuPy array, perform mathematical operations, and time it for performance. 

# CuPy and GPU Runtime
s = time.time()
x_gpu = cp.ones((1000, 100, 1000))
cp_result = cp.sqrt(cp.sum(x_gpu**2, axis=-1))
e = time.time()
cp_time = e - s
print("\nTime consumed by CuPy: ", cp_time)


Time consumed by CuPy: 0.001028299331665039


To calculate the speedup, we divide the NumPy time by the CuPy time. In this run, CuPy appears to be more than 500X faster. 

diff = np_time/cp_time
print(f'\nCuPy is {diff: .2f} X time faster than NumPy')


CuPy is 532.39 X time faster than NumPy


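Note: For more reliable results, run a few warm-up iterations before timing to reduce fluctuations. Also keep in mind that CuPy launches GPU kernels asynchronously, so a measurement taken without synchronizing the device may not include the full GPU execution time.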
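A timing helper along the following lines performs warm-up runs and waits for the GPU to finish before stopping the clock, since CuPy queues kernels asynchronously. This is a sketch, not the benchmark used above: the names timed and workload are made up, and the array shape is smaller than the one in the article so that it runs quickly.

```python
import time
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable GPU was found
    cp, HAS_GPU = None, False

def timed(fn, sync=None, warmup=2, repeat=5):
    """Return the best wall-clock time of fn() over `repeat` timed runs.

    `sync`, if given, is called after fn() so that pending GPU work is
    finished before the clock stops.
    """
    def run():
        fn()
        if sync is not None:
            sync()

    for _ in range(warmup):  # warm-up runs are not timed
        run()
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best

SHAPE = (200, 100, 200)  # deliberately smaller than the article's array

def workload(xp):
    """Same computation as in the article, for either numpy or cupy."""
    x = xp.ones(SHAPE)
    return xp.sqrt(xp.sum(x ** 2, axis=-1))

np_time = timed(lambda: workload(np))
print(f"NumPy best time: {np_time:.6f} s")

if HAS_GPU:
    cp_time = timed(lambda: workload(cp),
                    sync=cp.cuda.Stream.null.synchronize)
    print(f"CuPy best time:  {cp_time:.6f} s")
    print(f"Speedup: {np_time / cp_time:.2f}X")
```

The speedup you measure this way depends on the GPU, the array size, and the data type, so it may differ considerably from the figure reported above.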


Beyond its speed advantage, CuPy also supports multi-GPU setups, letting you spread work across several GPUs in the same machine.

You can also check out my Colab notebook if you want to compare the results.
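As a rough illustration of working with several GPUs, the sketch below splits an array into one chunk per device and selects each device with the cp.cuda.Device context manager. The function name sum_of_squares is hypothetical. The loop converts each partial result to a Python float inside the device context, which keeps the code simple but makes the devices run one after another; overlapping their work would need more care. On a machine without a GPU it falls back to NumPy.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable GPU was found
    cp, HAS_GPU = None, False

def sum_of_squares(x_host):
    """Sum of squared elements, with one chunk of the input per GPU."""
    x_host = np.asarray(x_host, dtype=np.float64)
    if not HAS_GPU:
        return float(np.sum(x_host ** 2))  # CPU fallback

    n_devices = cp.cuda.runtime.getDeviceCount()
    total = 0.0
    for dev_id, chunk in enumerate(np.array_split(x_host, n_devices)):
        with cp.cuda.Device(dev_id):  # arrays created here live on this GPU
            chunk_gpu = cp.asarray(chunk)
            # float() copies the scalar result back to the host.
            total += float(cp.sum(chunk_gpu ** 2))
    return total

print(sum_of_squares(np.arange(10)))
```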




In conclusion, CuPy provides a simple way to accelerate NumPy code on NVIDIA GPUs. By making just a few modifications to swap out NumPy for CuPy, you can often see order-of-magnitude speedups on array computations. This performance boost allows you to work with much larger datasets and models, enabling more advanced machine learning and scientific computing.





Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.