One Simple Trick for Speeding up your Python Code with Numpy
Looping over Python arrays, lists, or dictionaries, can be slow. Thus, vectorized operations in Numpy are mapped to highly optimized C code, making them much faster than their standard Python counterparts.
Python is huge.
Over the past several years the popularity of Python has grown rapidly. A big part of that has been the rise of Data Science, Machine Learning, and AI, all of which have highlevel Python libraries to work with!
When using Python for those types of work, it’s often necessary to work with very large datasets. Those large datasets get read directly into memory, and are stored and processed as Python arrays, lists, or dictionaries.
Working with such huge arrays can be time consuming; really that’s just the nature of the problem. You have thousands, millions, or even billions of data points. Every microsecond added to the processing of a single one of those points can drastically slow you down as a result of the large scale of the data you’re working with.
The slow way
The slow way of processing large datasets is by using raw Python. We can demonstrate this with a very simple example.
The code below multiplies the value of 1.0000001 by itself, 5 million times!
import time start_time = time.time() num_multiplies = 5000000 data = range(num_multiplies) number = 1 for i in data: number *= 1.0000001 end_time = time.time() print(number) print("Run time = {}".format(end_time  start_time))
I have a pretty decent CPU at home, Intel i7–8700k plus 32GB of 3000MHz RAM. Yet still, multiplying those 5 million data points took 0.21367 seconds. If instead I change the value of num_multiplies
to 1 billion times, the process took 43.24129 seconds!
Let’s try another one with an array.
We’ll build a Numpy array of size 1000x1000 with a value of 1 at each and again try to multiple each element by a float 1.0000001. The code is shown below.
On the same machine, multiplying those array values by 1.0000001 in a regular floating point loop took 1.28507 seconds.
import time import numpy as np start_time = time.time() data = np.ones(shape=(1000, 1000), dtype=np.float) for i in range(1000): for j in range(1000): data[i][j] *= 1.0000001 data[i][j] *= 1.0000001 data[i][j] *= 1.0000001 data[i][j] *= 1.0000001 data[i][j] *= 1.0000001 end_time = time.time() print("Run time = {}".format(end_time  start_time))
What is Vectorization?
Numpy is designed to be efficient with matrix operations. More specifically, most processing in Numpy is vectorized.
Vectorization involves expressing mathematical operations, such as the multiplication we’re using here, as occurring on entire arrays rather than their individual elements (as in our forloop).
With vectorization, the underlying code is parallelized such that the operation can be run on multiply array elements at once, rather than looping through them one at a time. As long as the operation you are applying does not rely on any other array elements, i.e a “state”, then vectorization will give you some good speed ups.
Looping over Python arrays, lists, or dictionaries, can be slow. Thus, vectorized operations in Numpy are mapped to highly optimized C code, making them much faster than their standard Python counterparts.
The fast way
Here’s the fast way to do things — by using Numpy the way it was designed to be used.
There’s a couple of points we can follow when looking to speed things up:
 If there’s a forloop over an array, there’s a good chance we can replace it with some builtin Numpy function
 If we see any type of math, there’s a good chance we can replace it with some builtin Numpy function
Both of these points are really focused on replace nonvectorized Python code with optimised, vectorized, lowlevel C code.
Check out the fast version of our first example from before, this time with 1 billion multiplications.
We’ve done something very simple: we saw that we had a forloop in which we were repeating the same mathematical operation many times. That should trigger immediately that we should go look for a Numpy function that can replace it.
We found one — the power
function which simply applies a certain power to an input value. The dramatically sped of the code to run in 7.6293e6 seconds — that’s a
import time import numpy as np start_time = time.time() num_multiplies = 1000000000 data = range(num_multiplies) number = 1 number *= np.power(1.0000001, num_multiplies) end_time = time.time() print(number) print("Run time = {}".format(end_time  start_time))
It’s a very similar idea with multiplying values into Numpy arrays. We see that we’re using a double forloop and should immediately recognised that there should be a faster way.
Conveniently, Numpy will automatically vectorise our code if we multiple our 1.0000001 scalar directly. So, we can write our multiplication in the same way as if we were multiplying by a Python list.
The code below demonstrates this and runs in 0.003618 seconds — that’s a 355X speedup!
import time import numpy as np start_time = time.time() data = np.ones(shape=(1000, 1000), dtype=np.float) for i in range(5): data *= 1.0000001 end_time = time.time() print("Run time = {}".format(end_time  start_time))
Like to learn?
Follow me on twitter where I post all about the latest and greatest AI, Technology, and Science! Connect with me on LinkedIn too!
Recommended Reading
Want to learn more about coding in Python? The Python Crash Course book is the best resource out there for learning how to code in Python!
And just a heads up, I support this blog with Amazon affiliate links to great books, because sharing great books helps everyone! As an Amazon Associate I earn from qualifying purchases.
Bio: George Seif is a Certified Nerd and AI / Machine Learning Engineer.
Original. Reposted with permission.
Related:
 Why You Should Start Using .npy Files More Often
 Why You Should Forget ‘forloop’ for Data Science Code and Embrace Vectorization
 Working With Numpy Matrices: A Handy First Reference
Top Stories Past 30 Days

