How to Write Efficient Python Code Even If You’re a Beginner

You don’t need to be a Python pro to write fast, clean code. Just a few smart coding habits can go a long way.




 

When you're starting out with Python, getting your code to work correctly is your first priority. But as you grow as a developer, you'll want your code to be not just correct, but also efficient.

Efficient code runs faster, uses less memory, and scales better when handling larger datasets. The good news is that you don't need years of experience to start writing more efficient Python: a few simple techniques will take you a long way, even as a beginner.

In this article, I'll walk you through practical techniques to make your Python code more efficient. For each technique, you'll see a clear comparison between the less-than-efficient approach and the more efficient alternative.

🔗 You can find the code on GitHub

 

Use Built-In Functions Instead of Manual Implementations

 
Python comes with many built-in functions that cover common operations on collections. These functions are already optimized, so they typically outperform equivalent hand-written loops.

Instead of this:

def process_sales_data(sales):
    highest_sale = sales[0]
    for sale in sales:
        if sale > highest_sale:
            highest_sale = sale
    
    total_sales = 0
    for sale in sales:
        total_sales += sale
    
    return highest_sale, total_sales, total_sales / len(sales)

 
This approach iterates through the list twice: once to find the highest value and once to compute the total. That's unnecessary work.

Do this:

def process_sales_data(sales):
    total_sales = sum(sales)
    return max(sales), total_sales, total_sales / len(sales)

 
This approach uses Python's built-in max() and sum() functions, which are optimized for these exact operations. This version is not only faster (especially for larger datasets) but also more readable and less prone to errors.

So whenever you find yourself writing loops to perform common operations on data collections, check if there's a built-in function that could do the job more efficiently.
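A few other built-ins worth reaching for, shown here on a hypothetical list of order totals:

```python
# Built-ins that replace common hand-written loops
orders = [120, 45, 300, 75]

print(min(orders))                    # 45: smallest value
print(sorted(orders))                 # [45, 75, 120, 300]: a new sorted list
print(any(o > 250 for o in orders))   # True: at least one order exceeds 250
print(all(o > 0 for o in orders))     # True: every order is positive
```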

 

Use List Comprehensions, But Keep Them Readable

 
List comprehensions are the go-to way to create lists from existing lists and other iterables. They are more concise than equivalent for loops and are often faster, too.

Instead of this:

def get_premium_customer_emails(customers):
    premium_emails = []
    for customer in customers:
        if customer['membership_level'] == 'premium' and customer['active']:
            email = customer['email'].lower().strip()
            premium_emails.append(email)
    return premium_emails

 
This creates an empty list, then repeatedly calls .append() inside a loop. Each append operation comes with some overhead.

Do this:

def get_premium_customer_emails(customers):
    return [
        customer['email'].lower().strip()
        for customer in customers
        if customer['membership_level'] == 'premium' and customer['active']
    ]

 
The list comprehension expresses the entire operation in one statement. The result is code that runs faster while also being more readable once you're familiar with the pattern.

🔖 List comprehensions work best when the transformation is straightforward. If your logic gets complex, consider breaking it into simpler steps or using a traditional loop for clarity.

For further reading, see Why You Should Not Overuse List Comprehensions in Python.
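One way to keep a comprehension readable is to move the condition and the transformation into small named helpers. This sketch reuses the customer example above; the helper names are my own:

```python
def is_active_premium(customer):
    # The filtering condition, given a name
    return customer['membership_level'] == 'premium' and customer['active']

def clean_email(customer):
    # The transformation, given a name
    return customer['email'].lower().strip()

def get_premium_customer_emails(customers):
    # The comprehension stays short because the logic lives in the helpers
    return [clean_email(c) for c in customers if is_active_premium(c)]

customers = [
    {'email': ' Alice@Example.com ', 'membership_level': 'premium', 'active': True},
    {'email': 'bob@example.com', 'membership_level': 'basic', 'active': True},
]
print(get_premium_customer_emails(customers))  # ['alice@example.com']
```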

 

Use Sets and Dictionaries for Fast Lookups

 

When you need to check if an item exists in a collection or perform frequent lookups, sets and dictionaries are far more efficient than lists. They provide nearly constant-time operations regardless of size, while list lookups get slower as the list grows.

Instead of this:

def has_permission(user_id, permitted_users):
    # permitted_users is a list of user IDs
    for p_user in permitted_users:
        if p_user == user_id:
            return True
    return False

permitted_users = [1001, 1023, 1052, 1076, 1088, 1095, 1102, 1109]
print(has_permission(1088, permitted_users))

 
This checks each element in the list until it finds a match, which is linear time O(n).

Do this:

def has_permission(user_id, permitted_users):
    # permitted_users is now a set of user IDs
    return user_id in permitted_users

permitted_users = {1001, 1023, 1052, 1076, 1088, 1095, 1102, 1109}
print(has_permission(1088, permitted_users))

 
The second approach uses a set (note the curly braces instead of square brackets). Sets in Python use hash tables internally, which allow for very fast lookups.

When you check if an item is in a set, you get the answer almost instantly, regardless of the set's size. On average, this is constant-time complexity (O(1)).

For small collections, the difference might be negligible. But as your data grows, the set approach pulls far ahead.
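Dictionaries give you the same hash-based lookups and also map keys to values. A small sketch with a hypothetical user-to-role mapping:

```python
# Hypothetical mapping of user IDs to roles
user_roles = {1001: 'admin', 1023: 'editor', 1088: 'viewer'}

def get_role(user_id):
    # dict.get returns a default instead of raising KeyError for unknown IDs
    return user_roles.get(user_id, 'guest')

print(get_role(1023))  # 'editor'
print(get_role(9999))  # 'guest'
```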

 

Use Generators to Process Large Data Efficiently

 

When working with large datasets, trying to load everything into memory at once can cause your program to slow down or crash. Generators provide a memory-efficient solution by producing values one at a time, on demand.

Instead of this:

def find_errors(log_file):
    with open(log_file, 'r') as file:
        lines = file.readlines()
    
    error_messages = []
    for line in lines:
        if '[ERROR]' in line:
            timestamp = line.split('[ERROR]')[0].strip()
            message = line.split('[ERROR]')[1].strip()
            error_messages.append((timestamp, message))
    
    return error_messages

 
This reads the entire file into memory with readlines() before processing any data. If the log file is very large (several gigabytes, for example), this could use a lot of memory and potentially cause your program to crash.

Do this:

def find_errors(log_file):
    with open(log_file, 'r') as file:
        for line in file:
            if '[ERROR]' in line:
                timestamp = line.split('[ERROR]')[0].strip()
                message = line.split('[ERROR]')[1].strip()
                yield (timestamp, message)

# Usage:
for timestamp, message in find_errors('application.log'):
    print(f"Error at {timestamp}: {message}")

 
Here we use a generator function: note the yield keyword in place of return. It reads and processes just one line at a time, handing back each result as it's found. This means:

  1. Memory usage stays low regardless of file size
  2. You start getting results immediately without waiting for the entire file to be processed
  3. If you only need to process part of the data, you can stop early and save time

Generators are great for processing large files, web streams, database queries, or any data source that might be too large to fit comfortably in memory all at once.
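For simple cases, a generator expression gives you the same laziness in a single line:

```python
# Like a list comprehension, but with parentheses: values are produced lazily
squares_of_evens = (n * n for n in range(10) if n % 2 == 0)

# Nothing is computed until something iterates; sum() pulls one value at a time
print(sum(squares_of_evens))  # 0 + 4 + 16 + 36 + 64 = 120
```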

 

Don't Repeat Expensive Operations in Loops

 
A simple but powerful optimization is to avoid performing the same expensive calculation repeatedly in a loop. If an operation doesn't depend on the loop variable, do it only once outside the loop.

Instead of this:

import re
from datetime import datetime

def find_recent_errors(logs):
    recent_errors = []
    
    for log in logs:
        # This regex compilation happens on every iteration
        timestamp_pattern = re.compile(r'\[(.*?)\]')
        timestamp_match = timestamp_pattern.search(log)
        
        if timestamp_match and '[ERROR]' in log:
            # The datetime parsing happens on every iteration
            log_time = datetime.strptime(timestamp_match.group(1), '%Y-%m-%d %H:%M:%S')
            current_time = datetime.now()
            
            # Check if the log is from the last 24 hours
            time_diff = (current_time - log_time).total_seconds() / 3600
            if time_diff <= 24:
                recent_errors.append(log)
    
    return recent_errors

 
The first approach has two operations inside the loop that don't need to be repeated:

  1. Compiling a regular expression with re.compile() on every iteration
  2. Getting the current time with datetime.now() on every iteration

Since these values don't change during the loop execution, calculating them repeatedly is wasteful.

Do this:

import re
from datetime import datetime

def find_recent_errors(logs):
    recent_errors = []
    
    # Compile the regex once
    timestamp_pattern = re.compile(r'\[(.*?)\]')
    # Get the current time once
    current_time = datetime.now()
    
    for log in logs:
        timestamp_match = timestamp_pattern.search(log)
        
        if timestamp_match and '[ERROR]' in log:
            log_time = datetime.strptime(timestamp_match.group(1), '%Y-%m-%d %H:%M:%S')
            
            # Check if the log is recent (last 24 hours)
            time_diff = (current_time - log_time).total_seconds() / 3600
            if time_diff <= 24:
                recent_errors.append(log)
    
    return recent_errors

 
In this second approach, we move the expensive operations outside the loop so they're performed just once.

This simple change can significantly improve performance, especially for loops that run many times. The savings grow proportionally with the number of iterations: with thousands of log entries, you avoid thousands of unnecessary operations.
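A related tool, not shown above: when the expensive operation does depend on loop data but the same inputs recur, functools.lru_cache can memoize the results. This sketch uses the same timestamp format as the earlier examples:

```python
from functools import lru_cache
from datetime import datetime

@lru_cache(maxsize=None)
def parse_timestamp(raw):
    # Each distinct timestamp string is parsed once, then served from the cache
    return datetime.strptime(raw, '%Y-%m-%d %H:%M:%S')

stamps = ['2024-01-01 10:00:00', '2024-01-01 10:00:00', '2024-01-01 11:00:00']
parsed = [parse_timestamp(s) for s in stamps]
print(parse_timestamp.cache_info())  # 1 hit, 2 misses: one parse was skipped
```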

 

Don't Use += on Strings in Loops

 
When building strings incrementally, using += in a loop is inefficient. Each operation creates a new string object, which becomes increasingly expensive as the string grows larger. Instead, collect string parts in a list and join them at the end.

Instead of this:

def generate_html_report(data_points):
    html = "<html><body><h1>Data Report</h1><ul>"
    
    for point in data_points:
        # This creates a new string object on each iteration
        html += f"<li>{point['name']}: {point['value']} ({point['timestamp']})</li>"
    
    html += "</ul></body></html>"
    return html

 

The problem with the first approach is that strings in Python are immutable: they can't be changed after creation. When you use += on a string, Python:

  1. Creates a new string large enough to hold both strings
  2. Copies all the characters from the original string
  3. Adds the new content
  4. Discards the old string

As your string grows larger, this process becomes expensive.

Do this:

def generate_html_report(data_points):
    parts = ["<html><body><h1>Data Report</h1><ul>"]
    
    for point in data_points:
        parts.append(f"<li>{point['name']}: {point['value']} ({point['timestamp']})</li>")
    
    parts.append("</ul></body></html>")
    return "".join(parts)

 

The second approach builds a list of string fragments with the .append() method, then joins them all at once at the end. This avoids creating and destroying multiple intermediate string objects.

This pattern becomes particularly important when building long strings iteratively, such as when generating reports, concatenating file contents, or building large XML or HTML documents.
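Note that str.join also accepts a generator expression, so for simple cases you can skip the explicit list entirely. The benefit here is brevity rather than extra speed, since join still materializes the values internally:

```python
def generate_html_report(data_points):
    # The fragments are produced by a generator expression and joined once
    items = "".join(
        f"<li>{p['name']}: {p['value']} ({p['timestamp']})</li>"
        for p in data_points
    )
    return f"<html><body><h1>Data Report</h1><ul>{items}</ul></body></html>"

report = generate_html_report(
    [{'name': 'CPU', 'value': 42, 'timestamp': '10:00'}]
)
print(report)
```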

 

Wrapping Up

 
Writing efficient Python code doesn't require advanced knowledge. It's often about knowing which approach to use in common situations. The techniques covered in this guide focus on practical patterns that can make a real difference in your code's performance:

  • Using built-in functions instead of manual implementations
  • Choosing list comprehensions for clear and efficient transformations
  • Selecting the right data structure (sets and dictionaries) for lookups
  • Using generators to process large data efficiently
  • Moving invariant operations out of loops
  • Building strings efficiently by joining lists

Remember that code readability should still be a priority. Fortunately, many of these efficient approaches also lead to cleaner, more expressive code, giving you programs that are both easy to understand and performant.

I hope these tips help you on your journey to becoming a better Python programmer. Keep coding!

 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

