Python Yield

In this notebook, we discuss what Python's yield keyword is, how to use it, and its pros and cons.

What is Python Yield

Yield is a Python keyword used to return a value from a function without destroying the state of its local variables. When a function containing the yield keyword is resumed, its execution continues right after the yield statement from which it previously returned. In contrast, a function that uses a return statement starts from the beginning on every call, with no history of its previous state.

The yield statement suspends the function's execution and sends a value back to the caller, but retains the current state so that the function can resume where it left off. Upon resuming, the function continues execution immediately after the last yield statement. This allows us to produce a series of values over time rather than computing them all at once.

This property explains the difference between yield and return: the former can produce a sequence of values, while the latter sends a single value back to the caller and ends the function.
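
The difference can be sketched with two small functions. The names squares_with_return and squares_with_yield are made up for this illustration and are not part of any library.

```python
def squares_with_return(n):
    # Builds the whole list in memory, then hands it back in one go.
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_with_yield(n):
    # Hands back one value at a time; the loop pauses at each yield.
    for i in range(n):
        yield i * i


as_list = squares_with_return(4)
as_generator = squares_with_yield(4)

print(as_list)             # [0, 1, 4, 9]
print(list(as_generator))  # [0, 1, 4, 9]
```

Both versions produce the same values. The first returns them all at once, while the second produces each one only when the caller asks for it.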

How Python Yield works

Let us go through a few examples to illustrate how yield works.

In [ ]:
def foo(x):
  print("First step.")
  yield x
  print("Second step")
  yield x * 2
  print("Last step")
  yield x ** 2

The function above accepts an argument and then takes three steps, each yielding a value related to that input. These steps are:

  • First Step: Prints "First step." and yields the number itself.
  • Second Step: Prints "Second step" and yields the number, doubled.
  • Last Step: Prints "Last step" and yields the number, squared.

Let us call this function now and see the result.

In [ ]:
y = foo(2)
y
Out[ ]:
<generator object foo at 0x7f86f4952d50>

Calling foo does not run its body; instead, we get a generator object. As discussed in earlier tutorials, we can retrieve the generated values one at a time using the built-in next() function.

In [ ]:
next(y)
First step.
Out[ ]:
2

As you can see above, because a yield statement follows the first print, the first generated value is the one we passed to foo, and the generator paused after that. To proceed to the next stage, we call next() again to get the next value of the sequence.

In [ ]:
next(y)
Second step
Out[ ]:
4

As you can see, upon calling next() on the generator object again, the function resumed right after the last yield, printed "Second step", yielded the number 4, and then paused again.
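
The remaining behaviour can be sketched as follows. A third next() call runs the last step, and a fourth raises StopIteration because no yield statements are left. The function foo is repeated here only so that the snippet runs on its own.

```python
def foo(x):
    print("First step.")
    yield x
    print("Second step")
    yield x * 2
    print("Last step")
    yield x ** 2


y = foo(2)
values = [next(y), next(y), next(y)]  # runs all three steps in order
print(values)  # [2, 4, 4]

try:
    next(y)  # no yield statements are left
    exhausted = False
except StopIteration:
    exhausted = True
    print("Generator is exhausted")

# A for loop calls next() for us and stops cleanly on StopIteration.
for value in foo(3):
    print(value)
```

In practice, a for loop is the usual way to consume a generator, since it handles StopIteration automatically.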

Applications of yield

  • Yield is a good option when working with large datasets, because values are produced on demand and processing can start before the whole dataset is available.

  • If the amount of data to be returned is massive, it's better to use yield.

  • Yield can produce an infinite stream of data. You can't do that with a list, because building it would never finish and would eventually exhaust memory (a MemoryError). The following snippet shows an example of an infinite stream of even numbers.

    def even_nums():
        n = 0
        while True:
            yield n
            n += 2

  • For repeated calls to a function, we can make use of the fact that yield pauses the function and resumes it from where the last yield statement stopped.

  • Example - A normal function that returns a sequence creates the entire sequence in memory before returning the result. Using yield, we can start consuming the sequence immediately.
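
One way to consume an infinite generator such as even_nums safely is to take only a fixed number of items from it. The sketch below uses itertools.islice from the standard library for that; the choice of five items is arbitrary.

```python
from itertools import islice


def even_nums():
    n = 0
    while True:
        yield n
        n += 2


# islice stops after five items, so the infinite loop never runs away.
first_five = list(islice(even_nums(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```

Calling list(even_nums()) directly would never finish, which is why the number of items has to be limited by the caller.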

Advantages and Disadvantages of yield

Advantages

  • With yield, values are produced one at a time instead of being stored as a complete sequence, so a generator typically requires far less memory.

  • Each time, the code execution doesn't start from the beginning since the previous state is retained.
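
For a rough feel of the memory advantage, the sketch below compares the size of a list with the size of a generator object using sys.getsizeof. The function doubled and the figure of one million items are assumptions made for this illustration. Note that sys.getsizeof reports only the size of the container itself, not of the integers it refers to, so the real gap is even larger.

```python
import sys


def doubled(limit):
    # Yields limit values, one at a time.
    for n in range(limit):
        yield n * 2


numbers_list = list(doubled(1_000_000))  # every value held in memory at once
numbers_gen = doubled(1_000_000)         # nothing computed yet

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)

print(f"List object:      {list_size} bytes")
print(f"Generator object: {gen_size} bytes")
```

The exact numbers depend on the Python version and platform, but the generator object stays small no matter how many values it will eventually produce.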

Disadvantages

  • Yield can reduce memory usage and, in some cases, running time, but the code itself can become less readable and a bit more difficult to understand.

Examples of Yield

Reading Large Files

A common use of generators and yield is working with large files or massive data streams. Example - Let us count the rows of a CSV file.

Usually, our code without yield and generators would look something like this.

# Traditional method of reading files in Python

def csv_reader(file_name):
    # Read the whole file into memory and split it into a list of lines
    with open(file_name) as file:
        result = file.read().split("\n")
    return result

csv_gen = csv_reader("some_file.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

The above is a very typical approach to reading CSV files. The function 'csv_reader' reads the entire file into memory and then splits it on the newline character ('\n') to form a list of lines. This approach works fine for small files, but if the file or data stream is massive, both the time to read it and the memory needed to store it increase substantially.

If the file contains 1000 lines, for example, this works fine on modern computers. But if the file contains 10 million records, the task may become impractical on a normal laptop or PC: the machine might slow down to the point that we need to terminate the program.

Here, the yield keyword comes in handy. If we turn the csv_reader function into a generator using yield, the results are quite different. This is what our new code snippet looks like with yield.

def csv_reader(file_name):
    # Yield one row at a time instead of loading the whole file
    with open(file_name, "r") as file:
        for row in file:
            yield row

csv_gen = csv_reader("some_file.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

If we run our code now, then regardless of file size, the program uses minimal memory: only what is required to read one line at a time, each time the generator is asked for its next value.
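
To try the generator version without a real data file, the sketch below writes a small temporary file first. The file contents and the use of tempfile are assumptions made for this illustration, not part of the original example.

```python
import os
import tempfile


def csv_reader(file_name):
    # The file is read lazily, one row per request.
    with open(file_name, "r") as file:
        for row in file:
            yield row


# Write a small throwaway file so the example can run anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("name,price\nBig Mac,3.99\nBig Tasty,4.99\n")
    path = tmp.name

# Count rows without ever holding more than one row in memory.
row_count = sum(1 for _ in csv_reader(path))
print(f"Row count is {row_count}")  # Row count is 3

os.remove(path)
```

The expression sum(1 for _ in ...) is a compact alternative to the explicit counting loop and consumes the generator in the same lazy way.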

Pagination

Let us use a RESTful API for our next example. APIs usually return collections of data. Consider the following JSON data as an example:

[
      {
        "name": "Big Mac",
        "price": 3.99
      },
      {
        "name": "Big Tasty",
        "price": 4.99
      },
      {
        "name": "Chicken Mcdo",
        "price": 3.99
      }
      ...

Assume that the API query returned 1000 results. Having the client receive 1000 results in one JSON object is not a good user experience, and it can also cause performance issues. So we resort to pagination. There are multiple ways of paginating results, but let us use 'yield' for this example and load only 10 results per page for a seamless user experience and lighter data transfer.

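def result_pagination(get_queryset, page=0):
    # get_queryset is a placeholder for whatever callable fetches one page
    current_page = page
    while current_page >= 0:
        results = get_queryset(page=current_page)
        if not results:  # stop once a page comes back empty
            return
        yield results
        current_page += 1

The pseudocode above yields one page of results each time the caller asks for the next value, starting from the requested page. Here, get_queryset stands for whatever function fetches a single page; it is not defined in this notebook.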
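
As a runnable sketch of the same idea, the example below paginates an in-memory list instead of calling a real API. The function paginate and the results list are stand-ins invented for this illustration.

```python
def paginate(items, page_size=10):
    # Yield successive slices of items, one page per next() call.
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


# Stand-in for the 1000 API results mentioned in this section.
results = [{"id": i} for i in range(1000)]
pages = paginate(results, page_size=10)

first_page = next(pages)
second_page = next(pages)
print(len(first_page), first_page[0])    # 10 {'id': 0}
print(len(second_page), second_page[0])  # 10 {'id': 10}
```

Each call to next() produces only the page that was asked for, and the generator remembers where the previous page ended.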

The pagination solution might not be the best use case for Python's yield, but the above example illustrates how we can use the yield keyword in almost any problem where we are dealing with massive amounts of data and limited memory or compute resources.