Getting Started with Python Generators

Learn about Python generators and write memory-efficient and Pythonic code.


Learning how to work with Python generators can help you write more Pythonic and efficient code. Using generators can be especially useful when you need to work with large sequences.

In this tutorial, you’ll learn how to use generators in Python by defining generator functions and generator expressions. You’ll then learn how using generators can be a memory-efficient choice.

Defining Generator Functions in Python

To understand how a generator function is different from a normal Python function, let's start with a regular Python function and then rewrite it as a generator function.

Consider the following function `get_cubes()`. It takes a number `num` as its argument and returns a list of the cubes of the numbers 0, 1, 2, up to `num - 1`:

```python
def get_cubes(num):
    cubes = []
    for i in range(num):
        cubes.append(i**3)
    return cubes
```

The above function loops through the numbers 0, 1, 2, up to `num - 1` and appends the cube of each number to the `cubes` list. Finally, it returns the `cubes` list.

This isn't the most Pythonic way to create a new list, though. Instead of looping with an explicit for loop and calling the `append()` method, you can use a list comprehension.

Here is the equivalent of the function `get_cubes()` that uses a list comprehension instead of an explicit for loop and the `append()` method:

```python
def get_cubes(num):
    cubes = [i**3 for i in range(num)]
    return cubes
```

Next, let’s rewrite this function as a generator function. The following code snippet shows how the `get_cubes()` function can be rewritten as a generator function `get_cubes_gen()`:

```python
def get_cubes_gen(num):
    for i in range(num):
        yield i**3
```

From the function definition, you can tell the following differences:

• The function uses the `yield` keyword instead of the `return` keyword.
• It doesn't build up and return a sequence stored in an iterable such as a Python list.
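One way to confirm that `yield` changes the nature of the function is the standard library's `inspect` module. The sketch below redefines both functions so it runs on its own; it checks that only `get_cubes_gen()` is a generator function, and that calling it returns a generator object.

```python
import inspect
import types

def get_cubes(num):
    return [i**3 for i in range(num)]

def get_cubes_gen(num):
    for i in range(num):
        yield i**3

# A function whose body contains `yield` is a generator function
print(inspect.isgeneratorfunction(get_cubes))      # False
print(inspect.isgeneratorfunction(get_cubes_gen))  # True

# Calling it returns a generator object instead of running the loop
print(isinstance(get_cubes_gen(6), types.GeneratorType))  # True
```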

So how does the generator function work? To understand, let’s call the above-defined functions and take a closer look.

Understanding Function Calls

Let us call the `get_cubes()` and `get_cubes_gen()` functions and see the differences in the respective function calls.

When we call the `get_cubes()` function with the number 6 as the argument, we get the list of cubes as expected.

```python
cubes = get_cubes(6)
print(cubes)
```

``Output >> [0, 1, 8, 27, 64, 125]``

Now call the generator function with the same number 6 as the argument and see what happens. You can call the generator function `get_cubes_gen()` just the way you would call a normal Python function.

```python
cubes_gen = get_cubes_gen(6)
print(cubes_gen)
```

If you print out the value of `cubes_gen`, you’ll get a generator object rather than the full list of cubes.

``Output >> <generator object get_cubes_gen at 0x011B6530>``

So how do you access the elements of the sequence? To code along, start a Python REPL and import the generator function. Here, my code is in the gen_example.py file, so I’m importing the `get_cubes_gen()` function from the `gen_example` module.

```python
>>> from gen_example import get_cubes_gen
>>> cubes_gen = get_cubes_gen(6)
```

You can call `next()` with the generator object as the argument. Doing so returns 0, the first element in the sequence:

```python
>>> next(cubes_gen)
0
```

Now when you call `next()` again, you’ll get the next element in the sequence, which is 1.

```python
>>> next(cubes_gen)
1
```

To access the subsequent elements in the sequence, you can continue to call `next()`, as shown:

```python
>>> next(cubes_gen)
8
>>> next(cubes_gen)
27
>>> next(cubes_gen)
64
>>> next(cubes_gen)
125
```

For `num = 6`, the resultant sequence is the cubes of the numbers 0, 1, 2, 3, 4, and 5. Now that we’ve reached 125, the cube of 5, what happens when you call `next()` again?

We see that a `StopIteration` exception is raised.

```python
>>> next(cubes_gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
```
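If you'd rather not handle the exception yourself, the built-in `next()` also accepts an optional second argument: a default value that it returns once the iterator is exhausted, instead of raising `StopIteration`. A short sketch, using a smaller generator so it runs out quickly:

```python
def get_cubes_gen(num):
    for i in range(num):
        yield i**3

cubes_gen = get_cubes_gen(2)
print(next(cubes_gen, None))  # 0
print(next(cubes_gen, None))  # 1
print(next(cubes_gen, None))  # None: the generator is exhausted
```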

Under the hood, calling a generator function doesn't run its body right away. Each call to `next()` runs the body until execution reaches a `yield` statement, and the yielded value is handed back to the caller. Unlike a normal Python function, which finishes and discards its local state once it reaches a `return` statement, a generator function only suspends execution at `yield`. It keeps track of its state (its local variables and where it paused), so the next call to `next()` resumes right where it left off and produces the subsequent element.
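To watch the suspend-and-resume behavior directly, here is a hypothetical variant of `get_cubes_gen()`, named `get_cubes_gen_verbose()` for illustration only, that prints a message at each step. The comments show what each `next()` call prints.

```python
def get_cubes_gen_verbose(num):
    # Hypothetical variant of get_cubes_gen() that prints each step
    print("Starting the generator body")
    for i in range(num):
        print(f"About to yield {i**3}")
        yield i**3
        print(f"Resumed after yielding {i**3}")
    print("Generator body finished")


cubes_gen = get_cubes_gen_verbose(2)  # prints nothing: the body hasn't started yet

first = next(cubes_gen)
# Starting the generator body
# About to yield 0

second = next(cubes_gen)
# Resumed after yielding 0
# About to yield 1

print(first, second)  # 0 1
```

A third call to `next()` would print `Resumed after yielding 1` and `Generator body finished`, and then raise `StopIteration`.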

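You can also loop through a generator object using a for loop. The loop calls `next()` behind the scenes and exits when the `StopIteration` exception is raised (that’s how for loops work under the hood). Because the generator above is already exhausted, create a fresh one first: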

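```python
# The previous generator is exhausted, so create a new one
cubes_gen = get_cubes_gen(6)

for cube in cubes_gen:
    print(cube)

# Output
# 0
# 1
# 8
# 27
# 64
# 125
```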
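To make that concrete, here is a rough sketch of what the for loop does with the generator, written out as an explicit `while` loop. It is a simplified model for illustration, not CPython's actual implementation.

```python
def get_cubes_gen(num):
    for i in range(num):
        yield i**3

cubes = []
cubes_gen = get_cubes_gen(6)
iterator = iter(cubes_gen)  # a generator is its own iterator
while True:
    try:
        cube = next(iterator)
    except StopIteration:
        break  # a for loop exits quietly at this point
    cubes.append(cube)

print(cubes)  # [0, 1, 8, 27, 64, 125]
```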


Generator Expressions in Python

Another common way to create generators is with generator expressions. Here’s the generator expression equivalent of the `get_cubes_gen()` function:

``cubes_gen = (i**3 for i in range(num))``

The above generator expression looks similar to a list comprehension, except that it uses parentheses `()` in place of square brackets `[]`. However, as discussed, the following key differences hold:

• A list comprehension expression generates the entire list and stores it in memory.
• The generator expression, on the other hand, yields the elements of the sequence on demand.
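The sketch below puts the two side by side. Because the list keeps every element, you can index it and iterate over it as often as you like; the generator hands out each value once, so a second pass over it finds nothing left.

```python
num = 6
cubes_list = [i**3 for i in range(num)]
cubes_gen = (i**3 for i in range(num))

# A list stores all its elements: index it and reuse it freely
print(cubes_list[2])                      # 8
print(sum(cubes_list), sum(cubes_list))   # 225 225

# A generator produces each value once; the second sum sees no values
print(sum(cubes_gen))  # 225
print(sum(cubes_gen))  # 0
# cubes_gen[2] would raise TypeError: generators don't support indexing

# When a generator expression is the sole argument, the extra parentheses can be dropped
print(max(i**3 for i in range(num)))  # 125
```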

Python Generators vs. Lists: Understanding Performance Improvements

In the sample function call in the previous section, we generated a sequence of cubes of the numbers zero through five. For such small sequences, using a generator may not give you significant performance gains. However, generators are certainly a memory-efficient choice when you work with longer sequences.

To see this in action, generate the sequence of cubes for values of `num` across a wider range:

```python
import sys

size_l = []
size_g = []

# run for various values of num
for i in [10, 100, 1000, 10000, 100000, 1000000]:
    cubes_l = [j**3 for j in range(i)]  # list: all values computed up front
    cubes_g = (j**3 for j in range(i))  # generator: values computed on demand
    # get the sizes of the list and the generator object
    size_l.append(sys.getsizeof(cubes_l))
    size_g.append(sys.getsizeof(cubes_g))
```

Now let’s print the in-memory sizes of the list and the generator object for each value of `num` used in the snippet above:

```python
print(f"size_l: {size_l}")
print(f"size_g: {size_g}")
```

From the output, we see that the generator object has a constant memory footprint, unlike the list, whose size grows with `num`. This is because a generator performs lazy evaluation and yields the subsequent values in the sequence on demand; it does not compute all the values ahead of time. (The exact numbers depend on your Python version and platform.)

```
# Output
size_l: [92, 452, 4508, 43808, 412228, 4348728]
size_g: [56, 56, 56, 56, 56, 56]
```

To get a better idea of how the sizes of the list and the generator change as the value of `num` changes, we can plot the values of `num` against the sizes of the list and the generator.

Such a plot shows that as `num` increases, the size of the generator stays constant, whereas the size of the list grows in proportion to `num` and quickly becomes very large.
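Note that `sys.getsizeof()` measures only the container object itself, not the integers a list refers to. To compare the memory actually used while computing a result, one option is the standard library's `tracemalloc` module. The sketch below is an illustration rather than a benchmark: `peak_memory()` is a hypothetical helper, and the exact byte counts will vary by Python version and platform.

```python
import tracemalloc

def peak_memory(func):
    """Hypothetical helper: run func() and return the peak traced memory in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak


n = 100_000

# Builds the full list of cubes first, then sums it
peak_list = peak_memory(lambda: sum([j**3 for j in range(n)]))

# Sums the cubes one at a time as the generator produces them
peak_gen = peak_memory(lambda: sum(j**3 for j in range(n)))

print(f"list comprehension peak:   {peak_list:,} bytes")
print(f"generator expression peak: {peak_gen:,} bytes")
```

Both expressions compute the same total; only the list version needs to hold every cube in memory at once.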

Conclusion

In this tutorial, you’ve learned how generators work in Python. The next time you need to work with a large file or dataset, consider using generators to iterate over it efficiently. With a generator, you can read in a line or a small chunk at a time and process it or apply transformations as needed, without having to store the entire dataset in memory. Keep in mind, however, that a generator doesn't retain the values it has already produced and can be iterated over only once. If you need to keep the values around for processing later, store them in a list instead.
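As a minimal sketch of that pattern, the hypothetical generator function `read_clean_lines()` below reads a text file one line at a time, strips surrounding whitespace, and skips blank lines; a generator expression then transforms each line lazily. The file name `data.txt` is a placeholder created just so the example runs on its own.

```python
def read_clean_lines(path):
    # Hypothetical helper: yields stripped, non-empty lines from a text file
    with open(path, encoding="utf-8") as f:
        for line in f:  # file objects yield one line at a time
            line = line.strip()
            if line:
                yield line


# Create a small placeholder file so the example is self-contained
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("alpha\n\nbeta\n  gamma  \n")

# Chain a generator expression to transform each line on demand
upper_lines = (line.upper() for line in read_clean_lines("data.txt"))
for line in upper_lines:
    print(line)  # ALPHA, then BETA, then GAMMA
```

At no point does the pipeline hold more than one line in memory, which is what makes the same approach work for files far larger than this one.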

Bala Priya C is a technical writer who enjoys creating long-form content. Her areas of interest include math, programming, and data science. She shares her learning with the developer community by authoring tutorials, how-to guides, and more.