Getting Started with Python Generators

Learn about Python generators and write memory-efficient and Pythonic code.



[Image: Getting Started with Python Generators. Image by Author]

 

Learning how to work with Python generators can help you write more Pythonic and efficient code. Using generators can be especially useful when you need to work with large sequences.

In this tutorial, you’ll learn how to use generators in Python by defining generator functions and generator expressions. You’ll then learn how using generators can be a memory-efficient choice. 

 

Defining Generator Functions in Python

 

To understand how a generator function is different from a normal Python function, let's start with a regular Python function and then rewrite it as a generator function.

Consider the following function get_cubes(). It takes a number num as its argument and returns a list of the cubes of the numbers 0, 1, 2, up to num - 1:

def get_cubes(num):
    cubes = []
    for i in range(num):
        cubes.append(i**3)
    return cubes

 

The above function works by looping over the numbers 0, 1, 2, up to num - 1 and appending the cube of each number to the cubes list. Finally, it returns the cubes list.

You can already tell this is not the recommended Pythonic way to create a new list. Instead of looping through using a for loop and using the append() method, you can use a list comprehension expression. 

Here is the equivalent of the function get_cubes() that uses list comprehension instead of an explicit for loop and the append() method:

def get_cubes(num):
    cubes = [i**3 for i in range(num)]
    return cubes

 

Next let’s rewrite this function as a generator function. The following code snippet shows how the get_cubes() function can be rewritten as a generator function get_cubes_gen():

def get_cubes_gen(num):
    for i in range(num):
        yield i**3

 

From the function definition, you can tell the following differences:

  • We have the yield keyword instead of the return keyword.
  • We are not returning a sequence or populating an iterable such as a Python list to get the sequence.

So how does the generator function work? To understand, let’s call the above-defined functions and take a closer look.

 

Understanding Function Calls

 

Let us call the get_cubes() and get_cubes_gen() functions and see the differences in the respective function calls.

When we call the get_cubes() function with the number 6 as the argument, we get the list of cubes as expected.

cubes = get_cubes(6)
print(cubes)

 

Output >> [0, 1, 8, 27, 64, 125]

 

Now call the generator function with the same number 6 as the argument and see what happens. You can call the generator function get_cubes_gen() just the way you would call a normal Python function.

cubes_gen = get_cubes_gen(6)
print(cubes_gen)

 

If you print out the value of cubes_gen, you’ll get a generator object as opposed to the entire resultant list that contains the cube of each of the numbers.

Output >> <generator object get_cubes_gen at 0x011B6530>
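To make the distinction concrete, here is a minimal sketch that inspects what the call returns. It redefines get_cubes_gen() so the snippet is self-contained and uses the standard library inspect module; the variable names is_gen_func, is_gen_obj, and all_cubes are only illustrative.

```python
import inspect

def get_cubes_gen(num):
    # Same generator function as in the tutorial
    for i in range(num):
        yield i**3

cubes_gen = get_cubes_gen(6)

# The function is a generator function; calling it returns a generator object
is_gen_func = inspect.isgeneratorfunction(get_cubes_gen)
is_gen_obj = inspect.isgenerator(cubes_gen)
print(is_gen_func, is_gen_obj)

# A generator object is its own iterator
print(iter(cubes_gen) is cubes_gen)

# If you do need every value at once, you can materialize the sequence
all_cubes = list(cubes_gen)
print(all_cubes)
```

Note that list() consumes the generator, so this is only worthwhile when you really need all the values in memory.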

 

So how do you access the elements of the sequence? To code along, start a Python REPL and import the generator function. Here, I have my code in the gen_example.py file, so I’m importing the get_cubes_gen() function from the gen_example module.

>>> from gen_example import get_cubes_gen
>>> cubes_gen = get_cubes_gen(6)

 

You can call next() with the generator object as the argument. Doing so returns 0, the first element in the sequence:

>>> next(cubes_gen)
0

 

Now when you call next() again, you’ll get the next element in the sequence, which is 1.

>>> next(cubes_gen)
1

 

To access the subsequent elements in the sequence, you can continue to call next(), as shown: 

>>> next(cubes_gen)
8
>>> next(cubes_gen)
27
>>> next(cubes_gen)
64
>>> next(cubes_gen)
125

 

For num = 6, the resultant sequence contains the cubes of the numbers 0, 1, 2, 3, 4, and 5. Now that we’ve reached 125, the cube of 5, what happens when you call next() again?

We see that a StopIteration exception is raised. 

>>> next(cubes_gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
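If you would rather not handle the exception yourself, the built-in next() also accepts a default value as a second argument, which is returned once the generator is exhausted. The sketch below redefines get_cubes_gen() to stay self-contained; the names first, second, third, and raised are only illustrative.

```python
def get_cubes_gen(num):
    # Same generator function as in the tutorial
    for i in range(num):
        yield i**3

cubes_gen = get_cubes_gen(2)

first = next(cubes_gen)    # cube of 0
second = next(cubes_gen)   # cube of 1

# The generator is now exhausted: with a default, next() returns it
# instead of raising StopIteration
third = next(cubes_gen, None)

# Without a default, StopIteration is raised and can be caught explicitly
try:
    next(cubes_gen)
    raised = False
except StopIteration:
    raised = True

print(first, second, third, raised)
```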

 

Under the hood, the generator function executes until it reaches the yield statement, and control returns to the call site along with the yielded value. However, unlike a normal Python function, which finishes once it reaches the return statement, a generator function only suspends execution. It keeps track of its state (its local variables and the point where it paused), so each subsequent call to next() resumes right after the yield and produces the next element.
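One way to watch this suspend-and-resume behavior is to record what the function body does between calls to next(). The following is a small sketch; traced_cubes() and the events list are hypothetical names introduced for illustration, not part of the tutorial's code.

```python
events = []

def traced_cubes(num):
    # Hypothetical variant of get_cubes_gen() that logs its progress
    events.append("started")
    for i in range(num):
        events.append(f"before yield {i}")
        yield i**3
        events.append(f"resumed after {i}")
    events.append("finished")

gen = traced_cubes(2)
# Calling the generator function does not run any of the body yet
events_after_call = list(events)

first = next(gen)
# The body ran up to the first yield and then paused there
events_after_first = list(events)

second = next(gen)
# Execution resumed right after the first yield and paused at the second
events_after_second = list(events)

print(events_after_call)
print(events_after_first)
print(events_after_second)
```

The log stays empty until the first next() call, and each later call picks up exactly where the previous one paused.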

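You can also loop through the generator object using a for loop. The loop calls next() for you and exits when the StopIteration exception is raised (that’s how for loops work under the hood). Because the generator object from the REPL session above is already exhausted, create a new one first:

cubes_gen = get_cubes_gen(6)
for cube in cubes_gen:
    print(cube)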


# Output
0
1
8
27
64
125
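To see what the for loop does on your behalf, here is a rough hand-written equivalent that uses iter(), next(), and an explicit StopIteration handler. It is a sketch for illustration rather than how you would normally write the loop; collected and leftover are illustrative names.

```python
def get_cubes_gen(num):
    # Same generator function as in the tutorial
    for i in range(num):
        yield i**3

cubes_gen = get_cubes_gen(6)

# Roughly what "for cube in cubes_gen:" does under the hood
iterator = iter(cubes_gen)
collected = []
while True:
    try:
        cube = next(iterator)
    except StopIteration:
        break  # the for loop exits silently at this point
    collected.append(cube)

# A second pass over the same generator object yields nothing
leftover = list(cubes_gen)

print(collected)
print(leftover)
```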

 


Generator Expressions in Python

 

Another common way to use generators is using generator expressions. Here’s the generator expression equivalent of the get_cubes_gen() function for a given value of num:

cubes_gen = (i**3 for i in range(num))

 

The above generator expression may look similar to a list comprehension, except for the use of () in place of []. However, the following key differences hold:

  • A list comprehension expression generates the entire list and stores it in memory.
  • The generator expression, on the other hand, yields the elements of the sequence on demand.
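The two differences above can be sketched as follows. The example assumes num = 6, as in the earlier function calls, and uses itertools.count() and itertools.islice() from the standard library to show that a generator expression only computes the values that are actually requested.

```python
import itertools

num = 6
cubes_list = [i**3 for i in range(num)]   # built in full, stored in memory
cubes_gen = (i**3 for i in range(num))    # nothing computed yet

list_type = type(cubes_list).__name__
gen_type = type(cubes_gen).__name__
print(list_type, gen_type)

# A generator expression can be passed directly to functions that consume
# an iterable; the extra parentheses can be dropped for a sole argument
total = sum(i**3 for i in range(num))
print(total)

# Because values are produced on demand, the source can even be unbounded:
# only the first three cubes are ever computed here
first_three = list(itertools.islice((i**3 for i in itertools.count()), 3))
print(first_three)
```

The same expression written as a list comprehension over itertools.count() would never finish, since it would try to build the entire list first.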

 

Python Generators vs. Lists: Understanding Performance Improvements

 

In the sample function call in the previous section, we generated a sequence of cubes of the numbers zero through five. For such small sequences, using a generator may not give you significant performance gains. However, generators are certainly a memory-efficient choice when you work with longer sequences. 

To see this in action, generate the sequence of cubes for values of num over a wider range:

import sys

size_l = []
size_g = []

# run for various values of num
for i in [10, 100, 1000, 10000, 100000, 1000000]:
    cubes_l = [j**3 for j in range(i)]
    cubes_g = (j**3 for j in range(i))
    # get the sizes of static list and generator expression
    size_l.append(sys.getsizeof(cubes_l))
    size_g.append(sys.getsizeof(cubes_g))

 

Now let us print out the sizes in memory of the static list and the generator object for each value of num (as in the snippet above):

print(f"size_l: {size_l}")
print(f"size_g: {size_g}")

 

From the output, we see that the generator object has a constant memory footprint, unlike the list, whose size grows with num. This is because a generator performs lazy evaluation and yields the subsequent values in the sequence on demand. It does not compute all the values ahead of time. The exact byte counts depend on your Python version and platform.

# Output
size_l: [92, 452, 4508, 43808, 412228, 4348728]
size_g: [56, 56, 56, 56, 56, 56]
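Keep in mind that sys.getsizeof() reports only the size of the container object itself, not the integers it refers to. As a complementary sketch, the standard library tracemalloc module can measure peak memory while the sequence is actually consumed. The helper peak_bytes() and the choice of n = 100_000 are assumptions made for illustration, and the numbers you see will vary by Python version and platform.

```python
import tracemalloc

def peak_bytes(make_iterable, n):
    """Return (total, peak traced bytes) while summing make_iterable(n).

    Hypothetical helper for this sketch, not part of the tutorial's code.
    """
    tracemalloc.start()
    total = sum(make_iterable(n))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return total, peak

n = 100_000

# The list comprehension builds all n cubes before sum() sees any of them
total_l, peak_l = peak_bytes(lambda k: [j**3 for j in range(k)], n)

# The generator expression hands sum() one cube at a time
total_g, peak_g = peak_bytes(lambda k: (j**3 for j in range(k)), n)

print(f"list:      total={total_l}, peak={peak_l} bytes")
print(f"generator: total={total_g}, peak={peak_g} bytes")
```

Both versions compute the same total, but the list version has to hold every cube in memory at once, so its peak is expected to be far higher.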

 

To get a better idea of how the sizes of the static list and generator change with change in the value of num, we can plot the values of num and the sizes of the list and the generators, as shown below: 

 

[Figure: size in memory of the list and the generator object, plotted against num]

 

In the graph above, we see that as num increases, the size of the generator stays constant, whereas the size of the list keeps growing with num.

 

Conclusion

 

In this tutorial, you’ve learned how generators work in Python. The next time you need to work with a large file or dataset, you can consider using generators to iterate efficiently over it. When you use generators, you can iterate over the generator object, read in a line or a small chunk, and process it or apply transformations as needed, without having to store the original dataset in memory. However, keep in mind that a generator can be iterated over only once and does not keep the values it has already yielded. If you need to access the values again later, you’ll have to store them, for example in a list.
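As a rough sketch of the file-processing idea, the snippet below defines a generator function that yields one line at a time. The helper read_lines() and the temporary sample file are hypothetical and exist only so that the example is self-contained; in practice you would point it at your own file.

```python
import os
import tempfile

def read_lines(path):
    """Yield one line at a time, without the trailing newline.

    Hypothetical helper for this sketch.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# Create a small temporary file to stand in for a large dataset
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

try:
    # Only one line is held in memory at a time while iterating
    lengths = [len(line) for line in read_lines(path)]
finally:
    os.remove(path)

print(lengths)
```

Here only the computed lengths are stored; the lines themselves are discarded as soon as they have been processed.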
 
 
Bala Priya C is a technical writer who enjoys creating long-form content. Her areas of interest include math, programming, and data science. She shares her learning with the developer community by authoring tutorials, how-to guides, and more.