Making Python Programs Blazingly Fast

Let’s look at the performance of our Python programs and see how to make them up to 30% faster!



By Martin Heinz, DevOps Engineer at IBM

Python’s critics often say that one of the reasons they don’t want to use it is that it’s slow. Well, whether a specific program is fast or slow, regardless of the programming language used, depends very much on the developer who wrote it and on their skill at writing optimized, fast programs.

So, let’s prove some people wrong and see how we can improve the performance of our Python programs and make them really fast!

Figure: photo by @veri_ivanova on Unsplash

 

Timing and Profiling

 
Before we start optimizing anything, we first need to find out which parts of our code actually slow down the whole program. Sometimes the bottleneck is obvious, but if you don’t know where it is, here are the options you have for finding out:

Note: This is the program I will be using for demonstration purposes. It computes e to the power of x (taken from the Python docs):
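
A minimal sketch of such a program, adapted from the exp recipe in the decimal module documentation. The inputs at the bottom are illustrative assumptions, kept small so the script finishes quickly:

```python
from decimal import Decimal, getcontext


def exp(x):
    """Return e raised to the power of x, summing a Taylor series."""
    getcontext().prec += 2  # extra digits for the intermediate steps
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:  # stop once a new term no longer changes the sum
        lasts = s
        i += 1
        fact *= i
        num *= x
        s += num / fact
    getcontext().prec -= 2
    return +s  # unary plus rounds the result to the restored precision


# Illustrative inputs (assumed); larger values take noticeably longer.
exp(Decimal(1))
exp(Decimal(150))
exp(Decimal(400))
```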

 

The Laziest “Profiling”

 
First off, the simplest and honestly very lazy solution: the Unix time command, which you put in front of the interpreter invocation (for example, time python your_script.py).

This could work if you just want to time your whole program, which is usually not enough…

 

The Most Detailed Profiling

 
On the other end of the spectrum is cProfile, which will give you too much information:

Here, we ran the testing script with the cProfile module and the time sort argument (-s time), so that lines are ordered by internal time (tottime). This gives us a lot of information; the lines you can see above are about 10% of the actual output. From this, we can see that the exp function is the culprit (surprise, surprise), and now we can get a little more specific with timing and profiling...
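
The same kind of report can also be produced from inside a program. The sketch below is only one way to do it: profile_call is a hypothetical helper name, the exp function is a copy of the decimal documentation recipe, and limiting the report to five rows is an arbitrary choice.

```python
import cProfile
import io
import pstats
from decimal import Decimal, getcontext


def exp(x):
    """e to the power of x (recipe from the decimal module docs)."""
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        s += num / fact
    getcontext().prec -= 2
    return +s


def profile_call(func, *args):
    """Run func(*args) under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # SortKey.TIME orders rows by internal time (the tottime column).
    stats.sort_stats(pstats.SortKey.TIME).print_stats(5)
    return stream.getvalue()


report = profile_call(exp, Decimal(150))
print(report)
```

Sorting by pstats.SortKey.CUMULATIVE instead would order the rows by cumulative time, which includes time spent in callees.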

 

Timing Specific Functions

 
Now that we know where to direct our attention, we might want to time the slow function without measuring the rest of the code. For that we can use a simple decorator:
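
A minimal sketch of such a decorator and how it might be applied. The names timeit_wrapper and slow_sum are assumptions made for illustration, and the printed format is just one possibility:

```python
import time
from functools import wraps


def timeit_wrapper(func):
    """Print how long each call to func takes."""
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__module__}.{func.__name__}: {elapsed:.6f} s")
        return result
    return wrapper


@timeit_wrapper
def slow_sum(n):
    """Hypothetical function under test."""
    return sum(i * i for i in range(n))


total = slow_sum(100_000)
```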

This decorator can then be applied to the function under test, after which every decorated call reports how long it took.

One thing to consider is what kind of time we actually (want to) measure. The time module provides time.perf_counter and time.process_time. The difference is that perf_counter measures elapsed wall-clock time, which includes time when your Python process is sleeping or not running, so it can be affected by machine load. On the other hand, process_time returns the sum of the user and system CPU time of the current process only, and excludes time spent sleeping.
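
The difference between the two clocks is easiest to see with a call that waits without using the CPU. A small sketch, where measure is a hypothetical helper and the 0.2 second sleep is an arbitrary choice:

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent while running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Sleeping takes wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```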

 

Making It Faster

 
Now, for the fun part. Let’s make your Python programs run faster. I’m (mostly) not going to show you hacks, tricks and code snippets that will magically solve your performance issues. This is more about general ideas and strategies which, when used, can make a huge impact on performance, in some cases up to a 30% speed-up.

 

Use Built-in Data Types

 
This one is pretty obvious. Built-in data types are very fast, especially in comparison to our custom types like trees or linked lists. That’s mainly because the built-ins are implemented in C, which we can’t really match in speed when coding in Python.
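
As a rough illustration, the sketch below compares a hand-written linked-list stack with a plain list used as a stack. The LinkedStack class and the workload sizes are assumptions, and the measured gap will vary by machine and Python version:

```python
import timeit


class Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedStack:
    """Hand-written stack, for comparison only."""

    def __init__(self):
        self.head = None

    def push(self, value):
        self.head = Node(value, self.head)

    def pop(self):
        node = self.head
        self.head = node.next
        return node.value


def fill_and_drain_custom(n):
    stack = LinkedStack()
    for i in range(n):
        stack.push(i)
    return [stack.pop() for _ in range(n)]


def fill_and_drain_builtin(n):
    stack = []  # the built-in list already behaves as a stack
    for i in range(n):
        stack.append(i)
    return [stack.pop() for _ in range(n)]


custom = timeit.timeit(lambda: fill_and_drain_custom(10_000), number=20)
builtin = timeit.timeit(lambda: fill_and_drain_builtin(10_000), number=20)
print(f"custom: {custom:.3f} s, built-in list: {builtin:.3f} s")
```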

 

Caching/Memoization with lru_cache

 
I have already shown this one in a previous blog post here, but I think it’s worth repeating with a simple example:

The function above simulates heavy computation using time.sleep. When called for the first time with parameter 1, it waits for 2 seconds and only then returns the result. When called again, the result is already cached, so it skips the body of the function and returns the result immediately. For a more real-life example, see previous blog posts here.
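
A minimal sketch of such a function. The name slow_func and the cache size of 12 are assumptions made for illustration:

```python
import functools
import time


@functools.lru_cache(maxsize=12)  # keep up to 12 distinct results
def slow_func(x):
    time.sleep(2)  # simulate a long computation
    return x


slow_func(1)  # waits about 2 seconds before returning
slow_func(1)  # served from the cache, returns immediately
print(slow_func.cache_info())
```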

 

Use Local Variables

 
This has to do with the speed of variable lookup in each scope. I say each scope because it’s not just about local vs. global variables. There’s actually a difference in lookup speed even between, say, a local variable in a function (fastest), an attribute accessed on an object (e.g. self.name, slower) and a global, for example an imported function like time.time (slowest).

You can improve performance by using seemingly unnecessary (straight-up useless) assignments like this:
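
A small sketch of the idea, using a hypothetical Scaler class and a hypothetical random_values function. Each "fast" variant copies a name into a local variable once, before the loop:

```python
import random


class Scaler:
    def __init__(self, factor):
        self.factor = factor

    def scale_slow(self, values):
        # self.factor is looked up again for every item
        return [v * self.factor for v in values]

    def scale_fast(self, values):
        factor = self.factor  # one attribute lookup, then a fast local
        return [v * factor for v in values]


def random_values(n):
    rand = random.random  # local alias for the module-level function
    return [rand() for _ in range(n)]


scaler = Scaler(3)
print(scaler.scale_fast(range(5)))
print(len(random_values(1000)))
```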

 

Use Functions

 
This might seem counter-intuitive, as calling a function puts more stuff onto the stack and creates overhead from function returns, but it relates to the previous point. If you just put your whole code into one file without wrapping it in a function, it will be much slower because of global variables. Therefore, you can speed up your code just by wrapping the whole code in a main function and calling it once, like so:
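
A minimal sketch of the pattern. The loop inside main is a placeholder workload standing in for whatever used to live at module level:

```python
def main():
    # Everything that used to be module-level code goes here, so that
    # names such as total and i become fast local variables.
    total = 0
    for i in range(1_000_000):
        total += i
    return total


if __name__ == "__main__":
    print(main())
```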

 

Don’t Access Attributes

 
Another thing that might slow down your programs is the dot operator (.), which is used when accessing object attributes. This operator triggers a dictionary lookup through __getattribute__, which creates extra overhead in your code. So, how can we actually avoid (or at least limit) using it?
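
One possible answer is to import the function itself rather than reaching through the module on every call. In the sketch below, PATTERN, LINE and both function names are illustrative assumptions:

```python
import re
from re import findall

PATTERN = r"\d+"
LINE = "a1 b22 c333"


def with_dot(n):
    matches = []
    for _ in range(n):
        matches = re.findall(PATTERN, LINE)  # attribute lookup every time
    return matches


def without_dot(n):
    matches = []
    for _ in range(n):
        matches = findall(PATTERN, LINE)  # plain name lookup, no dot
    return matches


print(with_dot(1000) == without_dot(1000))
```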

 

Beware of Strings

 
Operations on strings can get quite slow when run in a loop using, for example, modulus (%s) or .format(). What better options do we have? Based on a recent tweet from Raymond Hettinger, the only thing we should be using is the f-string: it’s the most readable, concise AND fastest method. So, based on that tweet, this is the list of methods you can use, fastest to slowest:
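
To check the relative speed on your own machine, a sketch like the following times several common ways of building the same string. The candidate set and the variable names s and t are assumptions, and no particular ordering of the results is claimed here:

```python
import timeit
from string import Template

s, t = "hello", "world"

# Each candidate builds the same "hello world" string in a different way.
CANDIDATES = {
    "f-string": lambda: f"{s} {t}",
    "concatenation": lambda: s + " " + t,
    "join": lambda: " ".join((s, t)),
    "% operator": lambda: "%s %s" % (s, t),
    "str.format": lambda: "{} {}".format(s, t),
    "Template": lambda: Template("$s $t").substitute(s=s, t=t),
}

for name, build in CANDIDATES.items():
    seconds = timeit.timeit(build, number=100_000)
    print(f"{name:>14}: {seconds:.4f} s")
```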

Generators

Generators are not inherently faster, as they were made to allow for lazy computation, which saves memory rather than time. However, the saved memory can cause your program to actually run faster. How? Well, if you have a large dataset and you don’t use generators (iterators), then the data might overflow the CPU’s L1 cache, which will slow down lookup of values in memory significantly.

When it comes to performance, it’s very important that the CPU can keep all the data it’s working on as close as possible, which means in the cache. You can watch Raymond Hettinger’s talk, where he mentions these issues.
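
The memory side of this is easy to observe. The sketch below compares the size of a fully built list with the size of an equivalent generator object; N is an arbitrary choice, and sys.getsizeof reports only the container itself, not the integers it refers to:

```python
import sys

N = 1_000_000

squares_list = [i * i for i in range(N)]  # materializes every value up front
squares_gen = (i * i for i in range(N))   # produces values one at a time

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# The generator can still be consumed like any other iterable.
total = sum(squares_gen)
```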

 

Conclusion

 
The first rule of optimization is to not do it. But if you really have to, then I hope these few tips help you with that. However, be mindful when optimizing your code, as it might end up making your code hard to read and therefore hard to maintain, which might outweigh the benefits of optimization.

Note: This was originally posted at martinheinz.dev

 
Bio: Martin Heinz is a DevOps Engineer at IBM. A software developer, Martin is passionate about computer security, privacy and cryptography, focused on cloud and serverless computing, and is always ready to take on a new challenge.

Original. Reposted with permission.
