기본 콘텐츠로 건너뛰기

[python] yield keyword

source: http://stackoverflow.com/questions/231767/the-python-yield-keyword-explained

The Python yield keyword explained


To understand what yield does, you must understand what generators are. And before generators come iterables.

Iterables

When you create a list, you can read its items one by one, and it's called iteration:
>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3
Mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:
>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4
Everything you can use "for... in..." on is an iterable: lists, strings, files... These iterables are handy because you can read them as much as you wish, but you store all the values in memory and it's not always what you want when you have a lot of values.

Generators

Generators are iterators, but you can only iterate over them once. It's because they do not store all the values in memory, they generate the values on the fly:
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4
It is just the same except you used () instead of []. BUT, you can not perform for i in mygenerator a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.

Yield

Yield is a keyword that is used like return, except the function will return a generator.
>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4
Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.
To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-)
Then, your code will be run each time the for uses the generator.
Now the hard part:
The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each other call will run the loop you have written in the function one more time, and return the next value, until there is no value to return.
The generator is considered empty once the function runs but does not hit yield anymore. It can be because the loop had come to an end, or because you do not satisfy a "if/else" anymore.

-------------------------------------------------------EXAMPLE---------------------------------------------------
Use generators for fetching large db record sets (Python recipe)
When using the python DB API, it's tempting to always use a cursor's fetchall() method so that you can easily iterate through a result set. For very large result sets though, this could be expensive in terms of memory (and time to wait for the entire result set to come back). You can use fetchmany() instead, but then have to manage looping through the intemediate result sets. Here's a generator that simplifies that for you.
Python, 11 lines  
Copy to clipboard
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# This code require Python 2.2.1 or later
from __future__ import generators    # needs to be at the top of your module

def ResultIter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result
To iterate through the result of a query, you often see code like this:
# where con is a DB API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
for result in cursor.fetchall():
    doSomethingWith(result)
This is fine if fetchall() returns a small result set, but not so great if the query result is very large, or takes a long time to return. 'very large' and 'long time' is relative of course, but in any case it's easy to see that cursor.fetchall() is going to need to allocate enough memory to store the entire result set in memory at once. In addition, the doSomethingWith function isn't going to get called until that entire query finishes as well.
Doing it one at a time with cursor.fetchone() is an option, but doesn't take advantage of the database's efficiency when returning multiple records for a single (as opposed to multiple) queries.
To address this, there's a cursor.fetchmany() method that returns the next 'n' rows of the query, allowing you to strike a time/space compromise between the other two options. The ResultIter function shown here provides a generator-based implementation that lets you take advantage of fetchmany(), but still use the simple notation of fetchall()
ResultIter would be used like so:
...
# where con is a DB-API 2.0 database connection object
cursor = con.cursor()
cursor.execute('select * from HUGE_TABLE')
for result in ResultIter(cursor):
    doSomethingWith(result)
This looks similar to code above, but internally the ResultIter generator is chunking the database calls into a series of fetchmany() calls. The default here is that a 1000 records at a time are fetched, but you can change that according to your own requirements (either by changing the default, or just using the second parameter to ResultIter(). As always, trying different values with the profiler is probably a good idea...performance could vary based on schema, database type, and/or choice of python DB API 2.0 module.

댓글

이 블로그의 인기 게시물

[인코딩] MS949부터 유니코드까지

UHC = Unified Hangul Code = 통합형 한글 코드 = ks_c_5601-1987 이는 MS사가 기존 한글 2,350자밖에 지원하지 않던 KS X 1001이라는 한국 산업 표준 문자세트를 확장해 만든 것으로, 원래 문자세트의 기존 내용은 보존한 상태로 앞뒤에 부족한 부분을 채워넣었다. (따라서 KS X 1001에 대한 하위 호환성을 가짐) 그럼, cp949는 무엇일까? cp949는 본래 코드 페이지(code page)라는 뜻이라 문자세트라 생각하기 십상이지만, 실제로는 인코딩 방식이다. 즉, MS사가 만든 "확장 완성형 한글 ( 공식명칭 ks_c_5601-1987 ) "이라는 문자세트를 인코딩하는 MS사 만의 방식인 셈이다. cp949 인코딩은 표준 인코딩이 아니라, 인터넷 상의 문자 송수신에 사용되지는 않는다. 하지만, "확장 완성형 한글" 자체가 "완성형 한글"에 대한 하위 호환성을 고려해 고안됐듯, cp949는 euc-kr에 대해 (하위) 호환성을 가진다. 즉 cp949는 euc-kr을 포괄한다. 따라서, 윈도우즈에서 작성되어 cp949로 인코딩 되어있는 한글 문서들(txt, jsp 등등)은 사실, euc-kr 인코딩 방식으로 인터넷 전송이 가능하다. 아니, euc-kr로 전송해야만 한다.(UTF-8 인코딩도 있는데 이것은 엄밀히 말해서 한국어 인코딩은 아니고 전세계의 모든 문자들을 한꺼번에 인코딩하는 것이므로 euc-kr이 한국어 문자세트를 인코딩할 수 있는 유일한 방식임은 변하지 않는 사실이다.) 물론 이를 받아보는 사람도 euc-kr로 디코딩을 해야만 문자가 깨지지 않을 것이다. KS X 1001을 인코딩하는 표준 방식은 euc-kr이며 인터넷 상에서 사용 가능하며, 또한 인터넷상에서 문자를 송수신할때만 사용.(로컬하드에 저장하는데 사용하는 인코딩방식으로는 쓰이지 않는 듯하나, *nix계열의 운영체제에서는 LANG을 euc-kr로 설정 가능하기도 한걸...

세션아이디 session Id

source : http://jinuine.blogspot.kr/ A session ID is a unique number that a Web site's server assigns a specific user for the duration of that user's visit ( session ). The session ID can be stored as a cookie , form field, or URL (Uniform Resource Locator). Some Web servers generate session IDs by simply incrementing static numbers. However, most servers use algorithm s that involve more complex methods, such as factoring in the date and time of the visit along with other variable s defined by the server administrator. Every time an Internet user visits a specific Web site, a new session ID is assigned. Closing a browser and then reopening and visiting the site again generates a new session ID. However, the same session ID is sometimes maintained as long as the browser is open, even if the user leaves the site in question and returns. In some cases, Web servers terminate a session and assign a new session ID after a few minutes of inactivity. Session IDs, in t...

[맞춤법] 안돼(o) vs 안되(x); 안돼요(o) 안되요(x); 안되지(o) vs 안돼지(x);

source:  http://k.daum.net/qna/view.html?qid=0FKVD&l_cid=Q&l_st=1 쉽게 구분하는 방법만 말씀드리겠습니다.   안돼요는 안되어요가 줄어든 말입니다. 예를 들어보겠습니다.   당신이 그러면 안되지. 당신이 그러면 안돼지.   첫번째 문장이 맞고 두번째 문장이 틀립니다. 두번째 문장을 '당신이 그러면 안되어지'로 바꾸면 말이 이상하지요. '안돼지'로 쓸 수 있는 것은 '안되어지'로 쓸 수 있는 것입니다.   요즘 사업이 잘 안(돼서 되서) 죄송합니다. '돼서'와 '되서' 가운데 어떤 것이 맞을까요? 사업이 잘 '안되어서'가 말이 되니까 '안돼서'가 맞습니다.   즉, 이 두가지를 쉽게 구분하는 방법은 '돼' 자리에 '되어'를 넣어봐서 말이 되면 '돼'고 말이 안되면 '되'를 쓰면 됩니다.   아니면   되 자리에 하를 넣어보고   돼 자리에 해를 넣어서 어색하지 않으면 그대로 쓰면 됩니다.   안되요는 안하요가 되니 틀린 말이고   안돼요는 안해요가 되니 맞는 말입니다.   결국 안돼요가 표준어입니다.