Iterators and Generators

In this notebook, we will explore the difference between iterators and generators, how to use them, and the best use cases for each.

Iterators

As the name suggests, a Python iterator is an object you can iterate over: it returns one value at a time, which lets us traverse all the values of a collection. Iterators are used implicitly by for loops, and Python generators are themselves iterators.

The functions iter() and next(), which we will learn more about later in this tutorial, make up the iterator protocol.

The objects which we can iterate upon are called iterables. The following are examples of iterables:

Lists.
Strings.
Tuples.

Iterator Objects and Protocols

# myList is a Python list which, as we learned before, is also an iterable.
myList = [1,3,5,7]

We then apply the iter() function to create a Python iterator object.

iterator_obj = iter(myList)
iterator_obj

<list_iterator at 0x7fdc36ab2bb0>

As we can see, we now have a list iterator object. What about accessing the values of our iterable? This is where the second function of the iterator protocol, next(), comes in.

Each call to next() returns the next value in line from the iterator object. So at first it will return 1, then when we call it again it will return 3, then 5, then 7. But let's explore what happens once the last value of the iterator object has been reached.

next(iterator_obj)

1

next(iterator_obj)

3

next(iterator_obj)

5

next(iterator_obj)

7

next(iterator_obj)

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-9-31379ae42bad> in <module>
----> 1 next(iterator_obj)

StopIteration:

As you can see, once the last element of the iterator object has been consumed, calling next() again raises a StopIteration exception. Calling next() by hand means dealing with that exception ourselves, which is why looping over an iterable is usually the better and more convenient way to access its values.
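If you do want to call next() by hand, a minimal sketch of one way to avoid the exception is the optional second argument of next(), which is returned instead of raising StopIteration. The names values and leftover are only illustrative.

```python
# Build a fresh iterator over the same values used above.
iterator_obj = iter([1, 3, 5, 7])

# Pull the four values out one by one.
values = [next(iterator_obj) for _ in range(4)]
print(values)  # [1, 3, 5, 7]

# The iterator is now exhausted. Passing a default as the second
# argument makes next() return it instead of raising StopIteration.
leftover = next(iterator_obj, None)
print(leftover)  # None
```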

FOR loop implementation

Internally, the for loop creates an iterator object and accesses its values one by one until the StopIteration exception is raised. This is roughly how a for loop is implemented internally.

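iterable = [1, 3, 5, 7]  # any iterable would do
iter_obj = iter(iterable)
while True:
    try:
        # Fetch the next element (the loop variable of the for loop)
        element = next(iter_obj)
    except StopIteration:
        # The iterator is exhausted, so the loop ends
        break
    print(element)  # the body of the for loop runs here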

As you can see, the for loop internally uses the iterator protocol, together with exception handling, to iterate over an iterable and access its values.

Creating our first Python Iterator

Now that we know what the iterator protocol functions are and how they work, we can finally create our own Python iterators. Let's create our very first one, which will be responsible for squaring integers.

class MyIterator:
    # Instantiation method stores the class input in max attribute
    # to act as check later on
    def __init__(self, max = 0):
        self.max = max
    
    # Sets n to zero
    def __iter__(self):
        self.n = 0
        return self
    
    # Checks whether n has gone past max. If it hasn't, squares the number
    # stored in n, increments n by one and returns the square.
    def __next__(self):
        if self.n <= self.max:
            res = self.n ** 2
            self.n += 1
            return res
        else:
            raise StopIteration

So our iterator has two main attributes, max and n.

max - an attribute that stores the user input and acts as the upper bound to check n against
n - an attribute that holds the current number; it is checked against max and incremented each time a square is returned.

Now that we wrote our first iterator, let's try it out.

a = MyIterator(4)
a

<__main__.MyIterator at 0x7fdc36ab2ee0>

# We now use the __iter__ method we defined previously to initiate
# the attribute n with zero.
a.__iter__()
a.n

0

a.__next__()

0

a.n

1

As we can see in the previous two code blocks, the first value squared was zero and then the value of n was incremented by 1. If we keep calling the __next__() method that we defined, we'll find that our iterator works as intended.

print("2nd Iteration")
print("n: %d, squared: %d" % (a.n, a.__next__()))
print("New value for n: ", a.n)
print("3rd Iteration")
print("n: %d, squared: %d" % (a.n, a.__next__()))
print("New value for n: ", a.n)

2nd Iteration
n: 1, squared: 1
New value for n:  2
3rd Iteration
n: 2, squared: 4
New value for n:  3
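Because the class implements both protocol methods, it should also work directly in a for loop, with no manual calls needed. The following is a small self-contained sketch that repeats the class from above; the list name squares is only illustrative.

```python
class MyIterator:
    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        # Reset the counter every time iteration starts.
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            res = self.n ** 2
            self.n += 1
            return res
        raise StopIteration


# The for loop calls iter() once and next() repeatedly for us.
squares = []
for value in MyIterator(4):
    squares.append(value)

print(squares)  # [0, 1, 4, 9, 16]
```

Note that the check uses <=, so the square of max itself is included.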

Benefits of Iterators

Saving Resources: Iterators are best known for saving resources. Only one element is held in memory at a time. Had we used a list instead, all the values would have been stored at once, which takes more memory and is less efficient.

This can come in handy at almost all types of applications, ranging from web applications to AI and neural network models. Whenever we are thinking about minimizing memory usage, we can always resort to iterators.
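As a rough illustration of the saving, the sketch below compares a fully built list of squares with a lazy map() iterator over the same values. It assumes CPython and uses sys.getsizeof, which reports only the size of the container object itself (not the integers it refers to), so treat the numbers as indicative rather than exact.

```python
import sys

n = 100_000

# Eager: every square is computed and stored up front.
squares_list = [i ** 2 for i in range(n)]

# Lazy: map() returns an iterator that computes one square per next() call.
squares_lazy = map(lambda i: i ** 2, range(n))

list_bytes = sys.getsizeof(squares_list)
lazy_bytes = sys.getsizeof(squares_lazy)

print("list object:", list_bytes, "bytes")
print("lazy iterator object:", lazy_bytes, "bytes")

# Both produce the same values; the lazy one just produces them on demand.
same_total = sum(squares_lazy) == sum(squares_list)
print(same_total)  # True
```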

Exercise

Now that we know what iterables, iterators and the iterator protocol are, let's write another custom iterator that iterates over an iterable in reverse.

class ReverseIterator:
    
    # Instantiation method taking in a list and storing it in an attribute called data to iterate upon.
    # The index attribute starts at the length of the collection, which is one past the last index.
    # __next__ decrements it before each access, so we start at the last element and go backwards.

    def __init__(self, collection):
        self.data = collection
        self.index = len(self.data)
    
    def __iter__(self):
        return self
    
    # The __next__ method checks if the index has reached 0 (i.e. every element has already been returned).
    # If so, it raises a StopIteration exception since there is nothing left to iterate on.
    # Otherwise, it reduces the index by 1 to get to the preceding element
    # and returns the element at that index from the collection.
    
    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
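To check the exercise, here is a short self-contained sketch that repeats the class and runs it over a list and a string. The sample inputs are arbitrary; the built-in reversed() is used only as a reference to compare against.

```python
class ReverseIterator:
    def __init__(self, collection):
        self.data = collection
        self.index = len(self.data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index -= 1
        return self.data[self.index]


reversed_numbers = list(ReverseIterator([1, 3, 5, 7]))
print(reversed_numbers)  # [7, 5, 3, 1]

# Any sequence that supports len() and indexing works, e.g. a string.
reversed_letters = list(ReverseIterator("abc"))
print(reversed_letters)  # ['c', 'b', 'a']

# Same result as the built-in reversed() for this input.
print(reversed_numbers == list(reversed([1, 3, 5, 7])))  # True
```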

Generators

Python generators are closely related to iterators. A generator function lets you declare a function that behaves like an iterator, which makes it a fast, easy, and clean way to create one. The resemblance can be confusing at times, but the main difference in how they are written is that a class-based iterator hands back each value with return inside its __next__ method, while a generator function uses the keyword yield instead. We'll get to that in a minute.

Generators in Python are dedicated to generating a sequence of values of any data type. They let us process one value at a time without storing the entire sequence in memory. This can be very useful when dealing with very large ranges of numbers or big files.

The use of yield is what gives generators their edge over hand-written iterators. The yield keyword allows the generator function to pause and keep the state of its current variables so that we can resume the function any time we need the next value. Keeping that state is also why a generator object is itself somewhat larger than a simple iterator object, as the memory comparison later in this notebook shows. Please refer to the examples below.
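The pause-and-resume behaviour can be made visible with a small sketch. The function two_steps and the events log are hypothetical names used only for illustration; the log records which parts of the function body have run so far.

```python
events = []  # records how far the generator body has run


def two_steps():
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2
    events.append("finished")


g = two_steps()
# Calling the function only creates the generator; the body has not run yet.
before_first = list(events)        # []

first = next(g)                    # runs up to the first yield, then pauses
after_first = list(events)         # ['started']

second = next(g)                   # resumes right after the first yield
remaining = list(g)                # runs the rest of the body; nothing left to yield

print(first, second, remaining)    # 1 2 []
print(events)                      # ['started', 'resumed', 'finished']
```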

Comparison between iterators and generators

With iterators, we need to implement the iterator protocol methods (__iter__() and __next__(), which iter() and next() call), but generators are simpler as we only need to write a function.
Generators use yield, class-based iterators don't.
Implementing our own iterator requires writing a class, as shown earlier; generators don't need classes in Python.
Generators are quicker to write and typically run faster than an equivalent class-based iterator, but a simple iterator object, such as a list iterator, is smaller in memory than a generator object.
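The points above can be seen side by side in the following sketch, which writes the same countdown twice: once as a class implementing the protocol and once as a generator function. Countdown and countdown are illustrative names, not part of any library.

```python
class Countdown:
    """Class-based iterator: state lives in attributes."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function: state lives in the local variable."""
    while start > 0:
        yield start
        start -= 1


from_class = list(Countdown(3))
from_generator = list(countdown(3))
print(from_class)      # [3, 2, 1]
print(from_generator)  # [3, 2, 1]
```

Both objects answer to iter() and next() in the same way; the generator version simply lets Python write the protocol methods for us.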

Writing your first generator function

Since our first iterator implementation squared integers, our first generator function will do the same so that you can see how much easier generators are to write and implement.

def gen(n):
    for i in range(n):
        yield i**2

That's right, that's it. The yield here simply pauses the function after each value, since generators let us process one value of the sequence at a time. Now let's try running this generator function.

g = gen(100000)
g

<generator object gen at 0x7f86cc3e49e0>

As we can see, a generator object has been created and stored in g. Now we can iterate over this object and get the values of the squares.

for i in g:
    print(i)

Depending on your resources, building a full list for a very large number, such as the one we passed to the generator function in the example above, could consume a lot of your memory, or even all of it. You can try that out by first building a list of the squares with a normal loop and then using the generator again to see the difference.

We can also use the next() function to step through the generator object.
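A minimal sketch of stepping through a generator with next(), using a small argument so the whole sequence is easy to follow. Like any iterator, the generator raises StopIteration once it has nothing left to yield.

```python
def gen(n):
    for i in range(n):
        yield i ** 2


g = gen(3)
first = next(g)
second = next(g)
third = next(g)
print(first, second, third)  # 0 1 4

# A fourth call finds the generator exhausted.
try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True

print(exhausted)  # True
```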

Comparing memory efficiency of iterators and generators

Generator

def func():
    i=1
    while i>0:
        yield i
        i-=1
print(func().__sizeof__())

96

Iterator

iter([1,2]).__sizeof__()

32

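As you can see above, the generator and the iterator provide similar functionality but consume different amounts of memory: the iterator object uses less memory than the generator object. Keep in mind that these figures cover only the objects themselves; the list iterator also relies on the list it iterates over, and the exact numbers depend on the Python version.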
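To put those two numbers in context, the sketch below also counts the list that a list iterator walks over. It assumes CPython and sys.getsizeof, which measures only the object passed to it, and count_up is a hypothetical generator written for this comparison.

```python
import sys

n = 100_000

# Iterator route: the data must already exist as a list.
data = list(range(n))
list_iter = iter(data)


# Generator route: values are produced on demand, no list is needed.
def count_up(limit):
    i = 0
    while i < limit:
        yield i
        i += 1


gen_obj = count_up(n)

iterator_total = sys.getsizeof(list_iter) + sys.getsizeof(data)
generator_total = sys.getsizeof(gen_obj)

print("list iterator alone:", sys.getsizeof(list_iter), "bytes")
print("list iterator + its list:", iterator_total, "bytes")
print("generator object:", generator_total, "bytes")
```

The iterator object on its own is tiny, but once the underlying list is included the generator comes out far smaller for a large sequence.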

Benefits of generators

Working with data streams or large files - For large CSV files, for example, we would usually use a reader such as csv_reader. However, if the whole file is loaded at once, an extremely large file would probably exceed your memory resources. Suppose we want to process the rows of the file one by one or count them: an approach that holds every row in memory will probably fail on a very large number of rows, but with a generator using the yield statement it is a rather trivial task.
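A minimal sketch of that idea, assuming a plain text CSV file: read_rows is a hypothetical generator that yields one line at a time, and a small temporary file stands in for the large file so the example is self-contained.

```python
import os
import tempfile


def read_rows(path):
    """Yield the file's lines one at a time instead of loading them all."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Create a tiny sample file (a stand-in for a very large one).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("id,name\n1,a\n2,b\n3,c\n")
    sample_path = tmp.name

# Count the rows while holding only one line in memory at a time.
row_count = sum(1 for _ in read_rows(sample_path))
print(row_count)  # 4 (header line included)

os.remove(sample_path)
```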

Generating Infinite Sequences - Since your computer's memory is finite, an infinite sequence can never be stored in full, which is why we would use generators for this task. Here is a little snippet that generates an infinite sequence.
```
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
```
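Looping over this generator directly would never finish, so in practice we take only as many values as we need. One way to do that, sketched below, is itertools.islice from the standard library; the name first_five is illustrative.

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


# islice stops after five values, so the infinite loop is never exhausted.
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```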

Example - Generating Fibonacci Numbers

def fibonacci(limit):
    # Initializing the first fibonacci numbers
    a, b = 0, 1
    
    # We need the generator to yield fibonacci values one by one
    # until the limit is reached.
    while a < limit:
        yield a
        # Notice that the yield takes place before the upcoming
        # number is calculated, so when the generator is resumed,
        # it comes back to this point and continues from here.
        a, b = b, a+b

Now let's try it out!

x = fibonacci(30)

next(x)

0

next(x)

1

next(x)

1

next(x)

2

next(x)

3

for i in x:
    print(i)

5
8
13
21
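Notice that the for loop picked up where the next() calls left off. A generator can be consumed only once, as the following sketch shows: a second pass over the same object yields nothing, and a fresh call to the function is needed to start over. The variable names are illustrative.

```python
def fibonacci(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b


x = fibonacci(30)
first_pass = list(x)
second_pass = list(x)          # x is already exhausted
fresh = list(fibonacci(30))    # a new generator starts from the beginning

print(first_pass)   # [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(second_pass)  # []
print(fresh)        # [0, 1, 1, 2, 3, 5, 8, 13, 21]
```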