Python Generators

Python generators are well suited to operations that would otherwise need a large amount of memory, because they produce values one at a time instead of building the whole result up front.

Let us start with a simple example. The function below generates an infinite sequence of numbers.

In [1]:
def generator_example1():
    count = 0
    while True:
        yield count
        count += 1
In [2]:
g = generator_example1()
In [3]:
next(g)
Out[3]:
0
In [4]:
next(g)
Out[4]:
1
In [5]:
next(g)
Out[5]:
2

and so on...
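Because generator_example1() never stops on its own, a plain for loop over it would run forever. One way to take only a bounded number of values is the standard library's itertools.islice (not part of the original example). A minimal sketch:

```python
from itertools import islice

def generator_example1():
    count = 0
    while True:
        yield count
        count += 1

# islice stops after the first 5 values, so the infinite loop never becomes a problem
first_five = list(islice(generator_example1(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```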

Python Yield

Let us revisit our function 'generator_example1()'. What is happening in the code below?

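Inside the while loop there is a 'yield' statement. 'yield' does not break out of the loop; it pauses the function, hands the current value of count back to whoever called next(), and lets the function resume from that same point on the next call. Calling generator_example1() does not execute the body at all: in the statement 'g = generator_example1()', g is simply a generator object, as shown below.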

In [6]:
def generator_example1():
    count = 0
    while True:
        yield count
        count += 1
In [7]:
g = generator_example1()
In [8]:
g
Out[8]:
<generator object generator_example1 at 0x7f3334416e08>

Once you have a generator object, you can step through it with the built-in next() function. Because generator_example1() contains an infinite 'while' loop, we can call next() on it as many times as we want. Each time we call next(), the generator resumes execution where it last paused and returns a new value.
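To watch this pausing and resuming directly, here is a small sketch with a hypothetical traced_counter() that behaves like generator_example1() but records each step in a list (events, also hypothetical) instead of printing it:

```python
events = []  # records which parts of the generator body have run so far

def traced_counter():
    events.append("started")
    count = 0
    while True:
        events.append(f"yielding {count}")
        yield count                        # execution pauses here
        events.append(f"resumed after {count}")
        count += 1

g = traced_counter()
print(events)   # [] : creating the generator runs none of its body
print(next(g))  # 0
print(events)   # ['started', 'yielding 0']
print(next(g))  # 1
print(events)   # ['started', 'yielding 0', 'resumed after 0', 'yielding 1']
```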

Python Generator Expression

Generators can also be created without writing a function or using 'yield', by means of a generator expression. Check out the example below.

In [9]:
g = (x for x in range(10))
In [10]:
g
Out[10]:
<generator object <genexpr> at 0x7f3334416f68>

The expression (x for x in range(10)) is a generator expression, and evaluating it gives a generator object. The syntax is similar to a Python list comprehension, except that generator expressions use round brackets instead of square brackets. As before, once we have a generator object, we can call next() on it to get the values one at a time, as shown below.

In [11]:
next(g)
Out[11]:
0
In [12]:
next(g)
Out[12]:
1
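Two properties of generator expressions are worth seeing side by side: a generator can be consumed only once, and when a generator expression is the sole argument to a function such as sum(), its own parentheses can be dropped. A short sketch (the name squares is just for illustration):

```python
squares = (x * x for x in range(5))

first_pass = list(squares)   # consumes every value
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# A generator expression passed as the only argument needs no extra parentheses
total = sum(x * x for x in range(5))
print(total)  # 30
```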

Python Generator StopIteration

A generator raises a 'StopIteration' exception when next() is called and there are no more values to return.

Let us look at the following example.

In [13]:
def range_one():
    for x in range(0,1):
        yield x
In [14]:
g = range_one()
In [15]:
next(g)
Out[15]:
0
In [16]:
next(g)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-16-e734f8aca5ac> in <module>
----> 1 next(g)

StopIteration: 

To avoid the error above, we can catch the exception and stop iterating, like this:

In [17]:
g = range_one()
In [18]:
try:
    print(next(g))
except StopIteration:
    print('Iteration Stopped')
0
In [19]:
try:
    print(next(g))
except StopIteration:
    print('Iteration Stopped')
Iteration Stopped
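In everyday code you rarely need to catch StopIteration yourself. A for loop calls next() for you and ends quietly when StopIteration is raised, and the built-in next() accepts a default value to return instead of raising. A minimal sketch reusing range_one():

```python
def range_one():
    for x in range(0, 1):
        yield x

# A for loop handles StopIteration automatically
for value in range_one():
    print(value)  # prints 0, then the loop ends without an error

# next(iterator, default) returns the default instead of raising StopIteration
g = range_one()
first = next(g, None)
second = next(g, None)
print(first, second)  # 0 None
```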

Python Generator send()

We can pass a value into a running generator using its send() method.

In [20]:
def increment_no():
    while True:
        x = yield      # pause here; send() resumes with the value sent in
        yield x + 1    # hand back the incremented value
In [21]:
g = increment_no()    # Create our generator
In [22]:
next(g)    # Run up to the first yield so the generator can accept a value
In [23]:
print(g.send(7))    # 7 is assigned to x, then the second yield returns 8
8
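Because the generator above alternates between two yield statements, after g.send(7) returns 8 it is paused at the second yield; an immediate second send() would have its value discarded and would return None. A common alternative keeps a single yield that both hands a value out and receives the next one. A sketch, using a hypothetical running_total():

```python
def running_total():
    total = 0
    while True:
        # yield hands back the current total; send() resumes here with a new value
        value = yield total
        total += value

acc = running_total()
print(next(acc))     # 0 (prime the generator: run it to the first yield)
print(acc.send(5))   # 5
print(acc.send(10))  # 15
```

Every send() now returns a meaningful result, so values can be sent back to back.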

Python Recursive Generator

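Generators can also be recursive. In the function below, "yield from generator_factorial(n - 1)" delegates to a recursive call of generator_factorial(): every value the inner generator yields is passed straight through to our caller, and the inner generator's return value becomes the value of the 'yield from' expression, which is assigned to a.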

In [24]:
def generator_factorial(n):
    if n == 1:
        f = 1
    else:
        a = yield from generator_factorial(n - 1)
        f = n * a
    yield f
    return f
In [25]:
g = generator_factorial(3)
In [26]:
next(g)
Out[26]:
1
In [27]:
next(g)
Out[27]:
2
In [28]:
next(g)
Out[28]:
6
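A more typical use of a recursive generator is walking a nested structure. The sketch below, with a hypothetical flatten() that treats only Python lists as nested, uses 'yield from' to pass along every value produced by the recursive call:

```python
def flatten(items):
    for item in items:
        if isinstance(item, list):
            # Delegate to a recursive generator for the nested list
            yield from flatten(item)
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
```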

Python Generator throw() Error

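Continuing with the example above, suppose we want to reject numbers of 100 or more. A generator's throw() method raises an exception inside the generator, at the 'yield' where it is currently paused; if the generator does not handle it, the exception propagates back to the caller, as the traceback below shows. (Since Python 3.12, passing the exception type and message as separate arguments is deprecated; g.throw(ValueError('Only numbers less than 100 are allowed')) is the preferred form.)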

In [29]:
n  = 100
if n >= 100:
    g.throw(ValueError, 'Only numbers less than 100 are allowed')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-bf449f9fafac> in <module>
      1 n  = 100
      2 if n >= 100:
----> 3     g.throw(ValueError, 'Only numbers less than 100 are allowed')

<ipython-input-24-e76bd978ab03> in generator_factorial(n)
      5         a = yield from generator_factorial(n - 1)
      6         f = n * a
----> 7     yield f
      8     return f

ValueError: Only numbers less than 100 are allowed
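throw() does not have to end the generator. If the generator catches the exception at the paused yield and keeps going, throw() returns the next value it yields. A minimal sketch, using a hypothetical resilient_counter() that resets itself when a ValueError is thrown in:

```python
def resilient_counter():
    count = 0
    while True:
        try:
            yield count
            count += 1
        except ValueError:
            # throw() raised ValueError at the paused yield; handle it and start over
            count = 0

c = resilient_counter()
print(next(c))                       # 0
print(next(c))                       # 1
print(c.throw(ValueError("reset")))  # 0: throw() returns the next yielded value
print(next(c))                       # 1
```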

Python Generators Are Memory Efficient

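Python generators use very little memory. Let us look at the following two examples, which compare the size in bytes that sys.getsizeof() reports for a Python list and for a Python generator. Note that sys.getsizeof() counts only the container object itself (for the list, its array of references, not the integer objects it points to), so the list's real footprint is even larger, while the generator's size does not grow with the length of the range. The exact numbers vary between Python versions.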

In [30]:
import sys
In [31]:
#Python List comprehension
sequence = [x for x in range(1,1000000)]
sys.getsizeof(sequence)
Out[31]:
8697464
In [32]:
#Python Generators
sequence = (x for x in range(1,1000000))
sys.getsizeof(sequence)
Out[32]:
88
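Since sys.getsizeof() only measures the container, a rough way to compare what each approach actually allocates while it runs is the standard library's tracemalloc module. This is a sketch, not the original author's method: peak_allocation() is a hypothetical helper, and the range is smaller than above to keep the traced run quick. Exact byte counts depend on the Python version, so none are shown.

```python
import tracemalloc

def peak_allocation(func):
    """Return the peak number of bytes traced by tracemalloc while func() runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak

N = 200_000
list_peak = peak_allocation(lambda: sum([x for x in range(1, N)]))
gen_peak = peak_allocation(lambda: sum(x for x in range(1, N)))
print(f"list comprehension peak:   {list_peak} bytes")
print(f"generator expression peak: {gen_peak} bytes")
```

The list version must hold every integer at once, while the generator version only ever holds the current value and the running sum.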

Python Generator Performance

One thing to note is that generators can be slower than list comprehensions when there is enough memory to build the whole list. Let us compare the two examples below from a performance perspective.

In [33]:
#Python List comprehension
import cProfile
cProfile.run('sum([x for x in range(1,10000000)])')
         5 function calls in 0.455 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.327    0.327    0.327    0.327 <string>:1(<listcomp>)
        1    0.073    0.073    0.455    0.455 <string>:1(<module>)
        1    0.000    0.000    0.455    0.455 {built-in method builtins.exec}
        1    0.054    0.054    0.054    0.054 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


In [34]:
#generators
import cProfile
cProfile.run('sum((x for x in range(1,10000000)))')
         10000004 function calls in 1.277 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000    0.655    0.000    0.655    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    1.277    1.277 <string>:1(<module>)
        1    0.000    0.000    1.277    1.277 {built-in method builtins.exec}
        1    0.622    0.622    1.277    1.277 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


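Compare the number of function calls and the time the generator needed to compute the sum with those of the list comprehension. The generator is resumed once per value, plus one final time when it raises StopIteration, which is why cProfile records 10,000,000 <genexpr> calls.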
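cProfile adds bookkeeping to every function call it records, and it counts each resumption of the generator as a separate call, so the profile above likely overstates the gap. A fairer comparison, sketched here with the standard library's timeit module, times both expressions without a profiler attached. The results depend on your machine and Python version, so no numbers are shown.

```python
import timeit

N = 1_000_000

# number=3 runs each statement three times and returns the total elapsed seconds
list_time = timeit.timeit(lambda: sum([x for x in range(1, N)]), number=3)
gen_time = timeit.timeit(lambda: sum(x for x in range(1, N)), number=3)

print(f"list comprehension:   {list_time:.3f} s")
print(f"generator expression: {gen_time:.3f} s")
```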

Data Pipeline with Python Generator

Let us wrap up this tutorial with data pipelines. Generators are a natural fit for pipelines because each stage pulls items from the previous one only when it needs them.

Let us open a CSV file and iterate through it using a generator.

In [41]:
def generator_read_csv_file():
    # 'with' closes the file once the generator is exhausted or closed
    with open('stock.csv') as f:
        for entry in f:
            yield entry
In [42]:
g = generator_read_csv_file()
In [43]:
next(g)
Out[43]:
'Date,Open,High,Low,Close,Adj Close,Volume\n'
In [44]:
next(g)
Out[44]:
'1996-08-09,14.250000,16.750000,14.250000,16.500000,15.324463,1601500\n'

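Suppose we want to replace the commas in each line of the CSV file with spaces. We can build a pipeline for this by chaining two generator expressions: g1 yields the raw lines, and g2 transforms each line as it is requested.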

In [45]:
g1 = (entry for entry in open('stock.csv'))
In [46]:
g2 = (row.replace(","," ") for row in g1)
In [47]:
next(g2)
Out[47]:
'Date Open High Low Close Adj Close Volume\n'
In [48]:
next(g2)
Out[48]:
'1996-08-09 14.250000 16.750000 14.250000 16.500000 15.324463 1601500\n'
In [50]:
next(g2)
Out[50]:
'1996-08-12 16.500000 16.750000 16.375000 16.500000 15.324463 260900\n'
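The same idea extends to longer pipelines built from generator functions. In this sketch, which is an illustration rather than part of the original, io.StringIO holding the three lines shown above stands in for stock.csv, and read_lines(), parse_rows() and volumes() are hypothetical stage names. It uses the standard library's csv module to split fields and sums the Volume column; nothing is read until sum() starts pulling values.

```python
import csv
import io

# Stand-in for open('stock.csv', newline=''): the three lines shown above
SAMPLE = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "1996-08-09,14.250000,16.750000,14.250000,16.500000,15.324463,1601500\n"
    "1996-08-12,16.500000,16.750000,16.375000,16.500000,15.324463,260900\n"
)

def read_lines(f):
    # Stage 1: yield raw lines from any file-like object
    yield from f

def parse_rows(lines):
    # Stage 2: csv.reader accepts any iterable of strings and consumes it lazily
    reader = csv.reader(lines)
    header = next(reader)
    for fields in reader:
        yield dict(zip(header, fields))

def volumes(rows):
    # Stage 3: keep only the Volume column, as integers
    for row in rows:
        yield int(row["Volume"])

pipeline = volumes(parse_rows(read_lines(SAMPLE)))
total_volume = sum(pipeline)
print(total_volume)  # 1862400
```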

Wrap Up:

It takes a little practice to get the hang of Python generators, but once mastered, they are very useful not only for building data pipelines but also for handling large data operations such as reading a large file.