
[python] What is a metaclass in Python?

source:http://stackoverflow.com/questions/100003/what-is-a-metaclass-in-python

Classes as objects

Before understanding metaclasses, you need to master classes in Python. And Python has a very peculiar idea of what classes are, borrowed from the Smalltalk language.
In most languages, classes are just pieces of code that describe how to produce an object. That's kinda true in Python too:
>>> class ObjectCreator(object):
...       pass
... 

>>> my_object = ObjectCreator()
>>> print(my_object)
<__main__.ObjectCreator object at 0x8974f2c>
But classes are more than that in Python. Classes are objects too.
Yes, objects.
As soon as you use the keyword class, Python executes it and creates an OBJECT. The instruction
>>> class ObjectCreator(object):
...       pass
... 
creates in memory an object with the name "ObjectCreator".
This object (the class) is itself capable of creating objects (the instances), and this is why it's a class.
But still, it's an object, and therefore:
  • you can assign it to a variable
  • you can copy it
  • you can add attributes to it
  • you can pass it as a function parameter
e.g.:
>>> print(ObjectCreator) # you can print a class because it's an object
<class '__main__.ObjectCreator'>
>>> def echo(o):
...       print(o)
... 
>>> echo(ObjectCreator) # you can pass a class as a parameter
<class '__main__.ObjectCreator'>
>>> print(hasattr(ObjectCreator, 'new_attribute'))
False
>>> ObjectCreator.new_attribute = 'foo' # you can add attributes to a class
>>> print(hasattr(ObjectCreator, 'new_attribute'))
True
>>> print(ObjectCreator.new_attribute)
foo
>>> ObjectCreatorMirror = ObjectCreator # you can assign a class to a variable
>>> print(ObjectCreatorMirror.new_attribute)
foo
>>> print(ObjectCreatorMirror())
<__main__.ObjectCreator object at 0x8997b4c>

Creating classes dynamically

Since classes are objects, you can create them on the fly, like any object.
First, you can create a class in a function using class:
>>> def choose_class(name):
...     if name == 'foo':
...         class Foo(object):
...             pass
...         return Foo # return the class, not an instance
...     else:
...         class Bar(object):
...             pass
...         return Bar
...     
>>> MyClass = choose_class('foo') 
>>> print(MyClass) # the function returns a class, not an instance
<class '__main__.Foo'>
>>> print(MyClass()) # you can create an object from this class
<__main__.Foo object at 0x89c6d4c>
But it's not so dynamic, since you still have to write the whole class yourself.
Since classes are objects, they must be generated by something.
When you use the class keyword, Python creates this object automatically. But as with most things in Python, it gives you a way to do it manually.
Remember the function type? The good old function that lets you know what type an object is:
>>> print(type(1))
<type 'int'>
>>> print(type("1"))
<type 'str'>
>>> print(type(ObjectCreator))
<type 'type'>
>>> print(type(ObjectCreator()))
<class '__main__.ObjectCreator'>
Well, type has a completely different ability: it can also create classes on the fly. type can take the description of a class as parameters, and return a class.
(I know, it's silly that the same function can have two completely different uses according to the parameters you pass to it. It's an issue due to backwards compatibility in Python)
type works this way:
type(name of the class,
     tuple of the parent classes (for inheritance, can be empty),
     dictionary containing attribute names and values)
e.g.:
>>> class MyShinyClass(object):
...       pass
can be created manually this way:
>>> MyShinyClass = type('MyShinyClass', (), {}) # returns a class object
>>> print(MyShinyClass)
<class '__main__.MyShinyClass'>
>>> print(MyShinyClass()) # create an instance with the class
<__main__.MyShinyClass object at 0x8997cec>
You'll notice that we use "MyShinyClass" as the name of the class and as the variable to hold the class reference. They can be different, but there is no reason to complicate things.
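To make the point about the name concrete, here is a minimal sketch in which the two differ. The variable name Alias is made up for illustration; the string passed to type is what ends up in the class's __name__.

```python
# The first argument to type() becomes the class's __name__;
# the variable on the left is only a reference to the class object.
Alias = type('MyShinyClass', (), {})

print(Alias.__name__)        # MyShinyClass
instance = Alias()
print(type(instance) is Alias)  # True
```

The mismatch is legal but confusing, which is why the two names are normally kept identical.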
type accepts a dictionary to define the attributes of the class. So:
>>> class Foo(object):
...       bar = True
Can be translated to:
>>> Foo = type('Foo', (), {'bar':True})
And used as a normal class:
>>> print(Foo)
<class '__main__.Foo'>
>>> print(Foo.bar)
True
>>> f = Foo()
>>> print(f)
<__main__.Foo object at 0x8a9b84c>
>>> print(f.bar)
True
And of course, you can inherit from it, so:
>>> class FooChild(Foo):
...       pass
would be:
>>> FooChild = type('FooChild', (Foo,), {})
>>> print(FooChild)
<class '__main__.FooChild'>
>>> print(FooChild.bar) # bar is inherited from Foo
True
Eventually you'll want to add methods to your class. Just define a function with the proper signature and assign it as an attribute.
>>> def echo_bar(self):
...       print(self.bar)
... 
>>> FooChild = type('FooChild', (Foo,), {'echo_bar': echo_bar})
>>> hasattr(Foo, 'echo_bar')
False
>>> hasattr(FooChild, 'echo_bar')
True
>>> my_foo = FooChild()
>>> my_foo.echo_bar()
True
You see where we are going: in Python, classes are objects, and you can create a class on the fly, dynamically.
This is what Python does when you use the keyword class, and it does so by using a metaclass.
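As a rough sketch of why this is useful, type can be wrapped in a function that builds a class from data known only at runtime. The helper make_record_class and its field list are hypothetical names invented for this illustration, not part of any library.

```python
def make_record_class(class_name, field_names):
    """Build a class whose __init__ accepts the given fields (hypothetical helper)."""

    def __init__(self, **kwargs):
        # Store each declared field on the instance, defaulting to None.
        for field in field_names:
            setattr(self, field, kwargs.get(field))

    def describe(self):
        # Render the fields in declaration order.
        return ', '.join('%s=%r' % (f, getattr(self, f)) for f in field_names)

    # Same three arguments as before: name, parents, attribute dictionary.
    return type(class_name, (object,), {'__init__': __init__,
                                        'describe': describe})

Point = make_record_class('Point', ['x', 'y'])
p = Point(x=1, y=2)
print(p.describe())  # x=1, y=2
```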

What are metaclasses (finally)

Metaclasses are the 'stuff' that creates classes.
You define classes in order to create objects, right?
But we learned that Python classes are objects.
Well, metaclasses are what create these objects. They are the classes' classes; you can picture them this way:
MyClass = MetaClass()
MyObject = MyClass()
You've seen that type lets you do something like this:
MyClass = type('MyClass', (), {})
It's because the function type is in fact a metaclass. type is the metaclass Python uses to create all classes behind the scenes.
Now you wonder why the heck it is written in lowercase, and not Type?
Well, I guess it's a matter of consistency with str, the class that creates string objects, and int, the class that creates integer objects. type is just the class that creates class objects.
You see that by checking the __class__ attribute.
Everything, and I mean everything, is an object in Python. That includes ints, strings, functions and classes. All of them are objects. And all of them have been created from a class:
>>> age = 35
>>> age.__class__
<type 'int'>
>>> name = 'bob'
>>> name.__class__
<type 'str'>
>>> def foo(): pass
>>> foo.__class__
<type 'function'>
>>> class Bar(object): pass
>>> b = Bar()
>>> b.__class__
<class '__main__.Bar'>
Now, what is the __class__ of any __class__ ?
>>> age.__class__.__class__
<type 'type'>
>>> name.__class__.__class__
<type 'type'>
>>> foo.__class__.__class__
<type 'type'>
>>> b.__class__.__class__
<type 'type'>
So, a metaclass is just the stuff that creates class objects.
You can call it a 'class factory' if you wish.
type is the built-in metaclass Python uses, but of course, you can create your own metaclass.

The __metaclass__ attribute

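In Python 2, you can add a __metaclass__ attribute when you write a class (Python 3 ignores this attribute and takes a metaclass=... keyword in the class statement instead):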
class Foo(object):
  __metaclass__ = something...
  [...]
If you do so, Python will use the metaclass to create the class Foo.
Careful, it's tricky.
You write class Foo(object) first, but the class object Foo is not created in memory yet.
Python will look for __metaclass__ in the class definition. If it finds it, it will use it to create the object class Foo. If it doesn't, it will use type to create the class.
Read that several times.
When you do:
class Foo(Bar):
  pass
Python does the following:
Is there a __metaclass__ attribute in Foo?
If yes, create in memory a class object (I said a class object, stay with me here), with the name Foo by using what is in __metaclass__.
If Python can't find __metaclass__ in the class body, it will use the metaclass of Bar (the parent class), so the metaclass is effectively inherited.
If the class has no parent at all, Python will look for a __metaclass__ at the MODULE level, and try to do the same.
Then if it can't find any __metaclass__ at all, it will use type to create the class object.
Now the big question is, what can you put in __metaclass__ ?
The answer is: something that can create a class.
And what can create a class? type, or anything that subclasses or uses it.
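The lookup can be checked directly. The sketch below assumes Python 3, where the metaclass is given as a keyword in the class statement rather than as a __metaclass__ attribute; the class names Meta and Plain are made up for the example.

```python
class Meta(type):
    """A do-nothing metaclass, used only to see which metaclass built a class."""

class Bar(metaclass=Meta):   # Python 3 spelling of `__metaclass__ = Meta`
    pass

class Foo(Bar):              # no metaclass given: taken from the parent Bar
    pass

class Plain:                 # no metaclass anywhere: falls back to type
    pass

print(type(Bar) is Meta)    # True
print(type(Foo) is Meta)    # True
print(type(Plain) is type)  # True
```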

Custom metaclasses

The main purpose of a metaclass is to change the class automatically, when it's created.
You usually do this for APIs, where you want to create classes matching the current context.
Imagine a stupid example, where you decide that all classes in your module should have their attributes written in uppercase. There are several ways to do this, but one way is to set __metaclass__ at the module level.
This way, all classes of this module will be created using this metaclass, and we just have to tell the metaclass to turn all attributes to uppercase.
Luckily, __metaclass__ can actually be any callable; it doesn't need to be a formal class (I know, something with 'class' in its name doesn't need to be a class, go figure... but it's helpful).
So we will start with a simple example, by using a function.
# the metaclass will automatically get passed the same argument
# that you usually pass to `type`
def upper_attr(future_class_name, future_class_parents, future_class_attr):
  """
    Return a class object, with its attribute names turned
    into uppercase.
  """

  # pick up any attribute that doesn't start with '__' and uppercase it
  uppercase_attr = {}
  for name, val in future_class_attr.items():
      if not name.startswith('__'):
          uppercase_attr[name.upper()] = val
      else:
          uppercase_attr[name] = val

  # let `type` do the class creation
  return type(future_class_name, future_class_parents, uppercase_attr)

__metaclass__ = upper_attr # this will affect all classes in the module

class Foo(): # global __metaclass__ won't work with "object" though
  # but we can define __metaclass__ here instead to affect only this class
  # and this will work with "object" children
  bar = 'bip'

print(hasattr(Foo, 'bar'))
# Out: False
print(hasattr(Foo, 'BAR'))
# Out: True

f = Foo()
print(f.BAR)
# Out: bip
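The listing above relies on the Python 2 module-level __metaclass__. For comparison, here is a sketch of the same function-based metaclass under Python 3, assuming the metaclass= keyword is used on the class itself. The dictionary comprehension is a stylistic substitution for the loop, not a change in behavior.

```python
def upper_attr(future_class_name, future_class_parents, future_class_attr):
    """Return a class whose non-dunder attribute names are uppercased."""
    uppercase_attr = {
        (name if name.startswith('__') else name.upper()): val
        for name, val in future_class_attr.items()
    }
    # let `type` do the class creation
    return type(future_class_name, future_class_parents, uppercase_attr)

class Foo(metaclass=upper_attr):  # any callable is accepted here
    bar = 'bip'

print(hasattr(Foo, 'bar'))  # False
print(hasattr(Foo, 'BAR'))  # True
print(Foo().BAR)            # bip
```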
Now, let's do exactly the same, but using a real class for a metaclass:
# remember that `type` is actually a class like `str` and `int`
# so you can inherit from it
class UpperAttrMetaclass(type): 
    # __new__ is the method called before __init__
    # it's the method that creates the object and returns it
    # while __init__ just initializes the object passed as parameter
    # you rarely use __new__, except when you want to control how the object
    # is created.
    # here the created object is the class, and we want to customize it
    # so we override __new__
    # you can do some stuff in __init__ too if you wish
    # some advanced use involves overriding __call__ as well, but we won't
    # see this
    def __new__(upperattr_metaclass, future_class_name, 
                future_class_parents, future_class_attr):

        uppercase_attr = {}
        for name, val in future_class_attr.items():
            if not name.startswith('__'):
                uppercase_attr[name.upper()] = val
            else:
                uppercase_attr[name] = val

        return type(future_class_name, future_class_parents, uppercase_attr)
But this is not really OOP. We call type directly and we don't override or call the parent __new__. Let's do it:
class UpperAttrMetaclass(type): 

    def __new__(upperattr_metaclass, future_class_name, 
                future_class_parents, future_class_attr):

        uppercase_attr = {}
        for name, val in future_class_attr.items():
            if not name.startswith('__'):
                uppercase_attr[name.upper()] = val
            else:
                uppercase_attr[name] = val

        # reuse the type.__new__ method
        # this is basic OOP, nothing magic in there
        return type.__new__(upperattr_metaclass, future_class_name, 
                            future_class_parents, uppercase_attr)
You may have noticed the extra argument upperattr_metaclass. There is nothing special about it: __new__ always receives the class being instantiated as its first parameter (here, the metaclass itself), just as ordinary methods receive the instance as self.
Of course, the names I used here are long for the sake of clarity, but like for self, all the arguments have conventional names. So a real production metaclass would look like this:
class UpperAttrMetaclass(type): 

    def __new__(cls, clsname, bases, dct):

        uppercase_attr = {}
        for name, val in dct.items():
            if not name.startswith('__'):
                uppercase_attr[name.upper()] = val
            else:
                uppercase_attr[name] = val

        return type.__new__(cls, clsname, bases, uppercase_attr)
We can make it even cleaner by using super, which will ease inheritance (because yes, you can have metaclasses, inheriting from metaclasses, inheriting from type):
class UpperAttrMetaclass(type): 

    def __new__(cls, clsname, bases, dct):

        uppercase_attr = {}
        for name, val in dct.items():
            if not name.startswith('__'):
                uppercase_attr[name.upper()] = val
            else:
                uppercase_attr[name] = val

        return super(UpperAttrMetaclass, cls).__new__(cls, clsname, bases, uppercase_attr)
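To see the class-based version at work, here is a sketch assuming Python 3, which means the metaclass= keyword and the zero-argument form of super. The subclass Child is added only to show that, unlike with the function version, the metaclass carries over to subclasses.

```python
class UpperAttrMetaclass(type):

    def __new__(cls, clsname, bases, dct):
        uppercase_attr = {
            (name if name.startswith('__') else name.upper()): val
            for name, val in dct.items()
        }
        return super().__new__(cls, clsname, bases, uppercase_attr)

class Foo(metaclass=UpperAttrMetaclass):
    bar = 'bip'

class Child(Foo):   # inherits the metaclass from Foo
    baz = 1

print(Foo.BAR)                          # bip
print(type(Foo) is UpperAttrMetaclass)  # True
print(hasattr(Child, 'BAZ'))            # True
```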
That's it. There is really nothing more about metaclasses.
Code that uses metaclasses tends to be complex, but not because of metaclasses themselves: it's because you usually use metaclasses to do twisted stuff relying on introspection, manipulating inheritance, vars such as __dict__, etc.
Indeed, metaclasses are especially useful to do black magic, and therefore complicated stuff. But by themselves, they are simple:
  • intercept a class creation
  • modify the class
  • return the modified class

Why would you use metaclass classes instead of functions?

Since __metaclass__ can accept any callable, why would you use a class since it's obviously more complicated?
There are several reasons to do so:
  • The intention is clear. When you read UpperAttrMetaclass(type), you know what's going to follow
  • You can use OOP. Metaclass can inherit from metaclass, override parent methods. Metaclasses can even use metaclasses.
  • You can structure your code better. You never use metaclasses for something as trivial as the above example. It's usually for something complicated. Having the ability to make several methods and group them in one class is very useful to make the code easier to read.
  • You can hook on __new__, __init__ and __call__, which lets you do different things at different stages. Even if you can usually do it all in __new__, some people are just more comfortable using __init__.
  • These are called metaclasses, damn it! It must mean something!
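The three hooks mentioned in the list can be seen firing in order with a small tracing metaclass. This is only a sketch: TracingMeta and its events list are hypothetical names, and Python 3 syntax is assumed.

```python
class TracingMeta(type):
    """Hypothetical metaclass that records which hook runs, and when."""
    events = []

    def __new__(mcs, name, bases, dct):
        # Runs when the class statement is executed: builds the class object.
        TracingMeta.events.append('__new__ ' + name)
        return super().__new__(mcs, name, bases, dct)

    def __init__(cls, name, bases, dct):
        # Runs right after __new__: initializes the freshly built class.
        TracingMeta.events.append('__init__ ' + name)
        super().__init__(name, bases, dct)

    def __call__(cls, *args, **kwargs):
        # Runs every time the class is called to create an instance.
        TracingMeta.events.append('__call__ ' + cls.__name__)
        return super().__call__(*args, **kwargs)

class Foo(metaclass=TracingMeta):
    pass

Foo()
print(TracingMeta.events)
# ['__new__ Foo', '__init__ Foo', '__call__ Foo']
```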

Why the hell would you use metaclasses?

Now the big question. Why would you use some obscure, error-prone feature?
Well, usually you don't:
"Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't (the people who actually need them know with certainty that they need them, and don't need an explanation about why)."
(Python guru Tim Peters)
The main use case for a metaclass is creating an API. A typical example of this is the Django ORM.
It allows you to define something like this:
class Person(models.Model):
  name = models.CharField(max_length=30)
  age = models.IntegerField()
But if you do this:
guy = Person(name='bob', age='35')
print(guy.age)
It won't return an IntegerField object. It will return an int, and can even take it directly from the database.
This is possible because models.Model defines __metaclass__ and it uses some magic that will turn the Person you just defined with simple statements into a complex hook to a database field.
Django makes something complex look simple by exposing a simple API and using metaclasses, recreating code from this API to do the real job behind the scenes.
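The idea can be imitated in a few lines. The following is a toy sketch, not Django's actual implementation: Field, ModelMeta and Model are hypothetical stand-ins, Python 3 syntax is assumed, and fields inherited from parent models are ignored for brevity.

```python
class Field:
    """Hypothetical stand-in for a column type; not Django's Field."""
    def __init__(self, convert):
        self.convert = convert  # callable turning raw input into a value

class ModelMeta(type):
    def __new__(mcs, name, bases, dct):
        # Pull the Field declarations out of the class body and remember them.
        fields = {k: v for k, v in dct.items() if isinstance(v, Field)}
        for k in fields:
            del dct[k]
        cls = super().__new__(mcs, name, bases, dct)
        cls._fields = fields
        return cls

class Model(metaclass=ModelMeta):
    def __init__(self, **kwargs):
        # Store plain converted values on the instance, not Field objects.
        for field_name, field in self._fields.items():
            setattr(self, field_name, field.convert(kwargs[field_name]))

class Person(Model):
    name = Field(str)
    age = Field(int)

guy = Person(name='bob', age='35')
print(guy.age)                   # 35
print(isinstance(guy.age, int))  # True
```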

The last word

First, you know that classes are objects that can create instances.
Well in fact, classes are themselves instances. Of metaclasses.
>>> class Foo(object): pass
>>> id(Foo)
142630324
Everything is an object in Python, and they are all either instances of classes or instances of metaclasses.
Except for type.
type is actually its own metaclass. This is not something you could reproduce in pure Python, and is done by cheating a little bit at the implementation level.
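The self-reference is easy to confirm from the interpreter; a minimal check:

```python
# type is an instance of itself...
print(type(type) is type)        # True
# ...and type and object are each an instance of the other.
print(isinstance(type, object))  # True
print(isinstance(object, type))  # True
```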
Secondly, metaclasses are complicated. You may not want to use them for very simple class alterations. You can change classes by using two different techniques:
  • monkey patching
  • class decorators
99% of the time you need class alteration, you are better off using these.
But 99% of the time, you don't need class alteration at all.
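For comparison, here is a sketch of both techniques, assuming Python 3. The names shout, upper_attrs, Patched and Decorated are made up for this example; the decorator repeats the uppercase trick from earlier without any metaclass.

```python
# Monkey patching: attach a new method to an existing class.
class Patched:
    bar = 'bip'

def shout(self):
    return self.bar.upper()

Patched.shout = shout
print(Patched().shout())  # BIP

# Class decorator: receive the finished class, alter it, return it.
def upper_attrs(cls):
    for name, val in list(vars(cls).items()):
        # Skip dunder names and names that are already uppercase.
        if not name.startswith('__') and name != name.upper():
            setattr(cls, name.upper(), val)
            delattr(cls, name)
    return cls

@upper_attrs
class Decorated:
    bar = 'bip'

print(hasattr(Decorated, 'bar'))  # False
print(Decorated.BAR)              # bip
```

Unlike a metaclass, the decorator is not inherited: subclasses of Decorated keep their attribute names unless they are decorated too.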
