source: http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained
The Python yield keyword explained
To understand what yield does, you must understand what generators are. And before generators come iterables.
Iterables
When you create a list, you can read its items one by one, and it's called iteration:
>>> mylist = [1, 2, 3]
>>> for i in mylist:
...     print(i)
1
2
3
mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:
>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...     print(i)
0
1
4
Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables are handy because you can read them as much as you wish, but all the values are stored in memory, and that's not always what you want when you have a lot of values.
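For example, a string is iterable in exactly the same way:
>>> for letter in "abc":
...     print(letter)
a
b
c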
Generators
Generators are iterators, but you can only iterate over them once. This is because they do not store all the values in memory; they generate the values on the fly:
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4
It is just the same except you used () instead of []. BUT, you cannot iterate over mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finish by calculating 4, one by one.
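You can check this by consuming the same generator twice:
>>> mygenerator = (x*x for x in range(3))
>>> list(mygenerator)  # the first pass consumes all the values
[0, 1, 4]
>>> list(mygenerator)  # the generator is now exhausted
[]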
Yield
Yield is a keyword that is used like return, except the function will return a generator.
>>> def createGenerator():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4
This example in itself is not very useful, but it's handy when you know your function will return a huge set of values that you will only need to read once.
To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object; this is a bit tricky :-)
Then, your code will be run each time the for uses the generator.
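You can see this for yourself with a throwaway demo function (a small added demonstration; the name demo is just for illustration):
>>> def demo():
...     print("body starts now")
...     yield 1
...
>>> gen = demo()  # nothing is printed: the function body has not run yet
>>> next(gen)     # the body only runs now, up to the first yield
body starts now
1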
Now the hard part:
The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each subsequent call will run the loop you have written in the function one more time and return the next value, until there is no value to return.
The generator is considered empty once the function runs but does not hit yield anymore. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.
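You can watch this mechanism directly by driving the generator by hand with next() (which is what the for loop does under the hood; in Python 2.5 and earlier it was the .next() method):
>>> mygenerator = createGenerator()
>>> next(mygenerator)  # runs the body until the first yield
0
>>> next(mygenerator)  # resumes right after the yield, loops once more
1
>>> next(mygenerator)
4
>>> next(mygenerator)  # the loop is over: StopIteration means "empty"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration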
-------------------------------------------------------EXAMPLE---------------------------------------------------
source: http://code.activestate.com/recipes/137270-use-generators-for-fetching-large-db-record-sets/
Use generators for fetching large db record sets (Python recipe)
When using the Python DB API, it's tempting to always use a cursor's fetchall() method so that you can easily iterate through a result set. For very large result sets though, this could be expensive in terms of memory (and time to wait for the entire result set to come back). You can use fetchmany() instead, but then you have to manage looping through the intermediate result sets. Here's a generator that simplifies that for you.
# This code requires Python 2.2.1 or later
from __future__ import generators # needs to be at the top of your module
def ResultIter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result
To iterate through the result of a query, you often see code like this:
# where con is a DB API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
for result in cursor.fetchall():
    doSomethingWith(result)
This is fine if fetchall() returns a small result set, but not so great if the query result is very large, or takes a long time to return. 'Very large' and 'long time' are relative, of course, but in any case it's easy to see that cursor.fetchall() is going to need enough memory to store the entire result set at once. In addition, the doSomethingWith function isn't going to get called until the entire query finishes.
Doing it one row at a time with cursor.fetchone() is an option, but it doesn't take advantage of the database's efficiency when returning multiple records for a single query (as opposed to multiple queries).
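For comparison, a one-row-at-a-time loop looks like this (a sketch, assuming the same con connection object and doSomethingWith function as above; per DB API 2.0, fetchone() returns None when no rows remain):
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
while True:
    result = cursor.fetchone()
    if result is None:  # no rows left
        break
    doSomethingWith(result)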
To address this, there's a cursor.fetchmany() method that returns the next 'n' rows of the query, allowing you to strike a time/space compromise between the other two options. The ResultIter function shown here provides a generator-based implementation that lets you take advantage of fetchmany(), but still use the simple notation of fetchall().
ResultIter would be used like so:
...
# where con is a DB-API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
for result in ResultIter(cursor):
    doSomethingWith(result)
This looks similar to the code above, but internally the ResultIter generator is chunking the database calls into a series of fetchmany() calls. The default here is that 1000 records at a time are fetched, but you can change that according to your own requirements (either by changing the default, or just by using the second parameter to ResultIter()). As always, trying different values with the profiler is probably a good idea... performance could vary based on schema, database type, and/or choice of Python DB API 2.0 module.
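For example, to fetch 5000 records per round trip instead of the default 1000:
for result in ResultIter(cursor, arraysize=5000):
    doSomethingWith(result)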