Thanks @Hanaasagi for pointing out the errors in this article (#19)
- related file
- how does attribute access work in python?
- method_descriptor
- how to change the behavior of attribute access?
- read more
- cpython/Objects/descrobject.c
- cpython/Include/descrobject.h
- cpython/Objects/object.c
- cpython/Include/object.h
- cpython/Objects/typeobject.c
- cpython/Include/cpython/object.h
let's look at an example first, before we look into how descriptor objects are implemented
print(type(str.center)) # <class 'method_descriptor'>
print(type("str".center)) # <class 'builtin_function_or_method'>
what is the type method_descriptor? why does str.center return a method_descriptor object, while "str".center returns a builtin_function_or_method? how does attribute access work in python?
this is the definition of inspect.ismethoddescriptor and inspect.isdatadescriptor
def ismethoddescriptor(object):
    if isclass(object) or ismethod(object) or isfunction(object):
        # mutual exclusion
        return False
    tp = type(object)
    return hasattr(tp, "__get__") and not hasattr(tp, "__set__")

def isdatadescriptor(object):
    """Return true if the object is a data descriptor.
    Data descriptors have both a __get__ and a __set__ attribute. Examples are
    properties (defined in Python) and getsets and members (defined in C).
    Typically, data descriptors will also have __name__ and __doc__ attributes
    (properties, getsets, and members have both of these attributes), but this
    is not guaranteed."""
    if isclass(object) or ismethod(object) or isfunction(object):
        # mutual exclusion
        return False
    tp = type(object)
    return hasattr(tp, "__set__") and hasattr(tp, "__get__")
according to the docstring and the code, a data descriptor has both __set__ and __get__ defined, while a method descriptor has only __get__ defined
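The two predicates can be tried directly. This is only a minimal sketch: the class `Point` and its property `x` are made up for illustration, everything else comes from the standard library.

```python
import inspect


class Point:
    """Hypothetical class, used only for this illustration."""

    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        return self._x


# str.center (looked up on the type) has __get__ but no __set__
center_is_method_descr = inspect.ismethoddescriptor(str.center)
# "str".center is already bound; its type has no __get__ at all
bound_is_method_descr = inspect.ismethoddescriptor("str".center)
# a property has both __get__ and __set__
prop_is_data_descr = inspect.isdatadescriptor(Point.__dict__["x"])
# str.center has no __set__, so it is not a data descriptor
center_is_data_descr = inspect.isdatadescriptor(str.center)

print(center_is_method_descr, bound_is_method_descr)
print(prop_is_data_descr, center_is_data_descr)
```

The property is fetched through `Point.__dict__` on purpose, so that the descriptor object itself is inspected rather than the value it would return.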
if we disassemble print(type(str.center)) with the dis module
./python.exe -m dis test.py
1 0 LOAD_NAME 0 (print)
2 LOAD_NAME 1 (type)
4 LOAD_NAME 2 (str)
6 LOAD_ATTR 3 (center)
8 CALL_FUNCTION 1
10 CALL_FUNCTION 1
12 POP_TOP
14 LOAD_CONST 0 (None)
16 RETURN_VALUE
we can see that the core opcode is LOAD_ATTR; following LOAD_ATTR into Python/ceval.c, we can find its definition
case TARGET(LOAD_ATTR): {
PyObject *name = GETITEM(names, oparg);
PyObject *owner = TOP();
PyObject *res = PyObject_GetAttr(owner, name);
Py_DECREF(owner);
SET_TOP(res);
if (res == NULL)
goto error;
DISPATCH();
}
// PyObject_GetAttr is defined in Objects/object.c
PyObject *
PyObject_GetAttr(PyObject *v, PyObject *name)
{
PyTypeObject *tp = Py_TYPE(v);
if (!PyUnicode_Check(name)) {
PyErr_Format(PyExc_TypeError,
"attribute name must be string, not '%.200s'",
name->ob_type->tp_name);
return NULL;
}
/* first call the tp_getattro function in C level */
if (tp->tp_getattro != NULL)
return (*tp->tp_getattro)(v, name);
/* if there's not a tp_getattro C function, try to call the tp_getattr function */
if (tp->tp_getattr != NULL) {
const char *name_str = PyUnicode_AsUTF8(name);
if (name_str == NULL)
return NULL;
return (*tp->tp_getattr)(v, (char *)name_str);
}
/* If there's not a tp_getattr C function either, raise an exception */
PyErr_Format(PyExc_AttributeError,
"'%.50s' object has no attribute '%U'",
tp->tp_name, name);
return NULL;
}
tp_getattro is an optional pointer to the get-attribute function.
it accepts two parameters, PyObject *o and PyObject *attr_name
tp_getattr is an optional pointer to the get-attribute-string function. This field is deprecated. When it is defined, it should point to a function that acts the same as the tp_getattro function, but taking a C string instead of a Python string object to give the attribute name.
it accepts two parameters, PyObject *o and char *attr_name
The only difference between them is the second parameter, which is of type PyObject * in tp_getattro and of type char * in tp_getattr
From the above PyObject_GetAttr function, we can learn that tp_getattro has a higher priority than tp_getattr
we can see how type str is defined in Objects/unicodeobject.c
PyTypeObject PyUnicode_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"str", /* tp_name */
sizeof(PyUnicodeObject), /* tp_basicsize */
0, /* tp_itemsize */
/* Slots */
(destructor)unicode_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
unicode_repr, /* tp_repr */
&unicode_as_number, /* tp_as_number */
&unicode_as_sequence, /* tp_as_sequence */
&unicode_as_mapping, /* tp_as_mapping */
(hashfunc) unicode_hash, /* tp_hash*/
0, /* tp_call*/
(reprfunc) unicode_str, /* tp_str */
PyObject_GenericGetAttr, /* tp_getattro */
0, /* tp_setattro */
...
it uses PyObject_GenericGetAttr, a C function widely used in cpython, as its tp_getattro; the function is defined in Objects/object.c
PyObject *
PyObject_GenericGetAttr(PyObject *obj, PyObject *name)
{
return _PyObject_GenericGetAttrWithDict(obj, name, NULL, 0);
}
PyObject *
_PyObject_GenericGetAttrWithDict(PyObject *obj, PyObject *name,
PyObject *dict, int suppress)
{
/* Make sure the logic of _PyObject_GetMethod is in sync with
this method.
When suppress=1, this function suppress AttributeError.
*/
PyTypeObject *tp = Py_TYPE(obj);
PyObject *descr = NULL;
PyObject *res = NULL;
descrgetfunc f;
Py_ssize_t dictoffset;
PyObject **dictptr;
if (!PyUnicode_Check(name)){
PyErr_Format(PyExc_TypeError,
"attribute name must be string, not '%.200s'",
name->ob_type->tp_name);
return NULL;
}
Py_INCREF(name);
if (tp->tp_dict == NULL) {
if (PyType_Ready(tp) < 0)
goto done;
}
/* look for a name through the MRO */
descr = _PyType_Lookup(tp, name);
f = NULL;
if (descr != NULL) {
/* the name in MRO can be found */
Py_INCREF(descr);
/* get the tp_descr_get field of descr object, which is defined as __get__ in python level */
f = descr->ob_type->tp_descr_get;
/* check if the object is data descriptor, PyDescr_IsData is defined as #define PyDescr_IsData(d) (Py_TYPE(d)->tp_descr_set != NULL) */
if (f != NULL && PyDescr_IsData(descr)) {
/* check success, means the object define both __get__ and __set__ */
/* if it's data descriptor, try to call the __get__ function and stores the result in variable res */
res = f(descr, obj, (PyObject *)obj->ob_type);
if (res == NULL && suppress &&
PyErr_ExceptionMatches(PyExc_AttributeError)) {
PyErr_Clear();
}
/* finish */
goto done;
}
}
/* down here, the descr is NULL(means not able to find the name through the MRO) or descr is not NULL, but without __set__ defined */
if (dict == NULL) {
/* find the __dict__ variables inside the object */
/* Inline _PyObject_GetDictPtr */
dictoffset = tp->tp_dictoffset;
if (dictoffset != 0) {
if (dictoffset < 0) {
Py_ssize_t tsize;
size_t size;
tsize = ((PyVarObject *)obj)->ob_size;
if (tsize < 0)
tsize = -tsize;
size = _PyObject_VAR_SIZE(tp, tsize);
_PyObject_ASSERT(obj, size <= PY_SSIZE_T_MAX);
dictoffset += (Py_ssize_t)size;
_PyObject_ASSERT(obj, dictoffset > 0);
_PyObject_ASSERT(obj, dictoffset % SIZEOF_VOID_P == 0);
}
dictptr = (PyObject **) ((char *)obj + dictoffset);
dict = *dictptr;
}
}
if (dict != NULL) {
/* if the __dict__ is not NULL, try to find whether the name is stored inside the __dict__ */
Py_INCREF(dict);
res = PyDict_GetItem(dict, name);
if (res != NULL) {
Py_INCREF(res);
Py_DECREF(dict);
goto done;
}
Py_DECREF(dict);
}
/* down here, either __dict__ is NULL or name is not stored inside the __dict__ or __set__ not defined in descr */
if (f != NULL) {
/* __get__ is defined, but __set__ is not defined */
/* try to call the __get__ function */
res = f(descr, obj, (PyObject *)Py_TYPE(obj));
if (res == NULL && suppress &&
PyErr_ExceptionMatches(PyExc_AttributeError)) {
PyErr_Clear();
}
goto done;
}
/* down here, the descr doesn't define __get__ method and can't find name inside __dict__*/
if (descr != NULL) {
/* if name is found in MRO, return the found object */
res = descr;
descr = NULL;
goto done;
}
if (!suppress) {
PyErr_Format(PyExc_AttributeError,
"'%.50s' object has no attribute '%U'",
tp->tp_name, name);
}
done:
Py_XDECREF(descr);
Py_DECREF(name);
return res;
}
we can draw the process according to the code above
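The lookup order in the C code above can be approximated in pure Python. This is a rough sketch and not the real implementation: `generic_getattr`, `lookup_in_mro` and the class `Demo` are hypothetical names, and the instance `__dict__` is fetched with ordinary `getattr` for brevity, whereas the C code reads it through `tp_dictoffset`.

```python
_MISSING = object()


def lookup_in_mro(tp, name):
    """Rough stand-in for _PyType_Lookup: scan each class __dict__ along the MRO."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(obj, name):
    """Approximate the priority order of _PyObject_GenericGetAttrWithDict."""
    tp = type(obj)
    descr = lookup_in_mro(tp, name)
    getter = _MISSING
    if descr is not _MISSING:
        descr_tp = type(descr)
        getter = lookup_in_mro(descr_tp, "__get__")
        # tp_descr_set is filled when __set__ or __delete__ is defined
        is_data = (lookup_in_mro(descr_tp, "__set__") is not _MISSING
                   or lookup_in_mro(descr_tp, "__delete__") is not _MISSING)
        if getter is not _MISSING and is_data:
            # 1. a data descriptor wins over everything else
            return getter(descr, obj, tp)
    instance_dict = getattr(obj, "__dict__", None)
    if instance_dict is not None and name in instance_dict:
        # 2. then the instance __dict__
        return instance_dict[name]
    if getter is not _MISSING:
        # 3. then a non-data descriptor (only __get__ defined)
        return getter(descr, obj, tp)
    if descr is not _MISSING:
        # 4. then a plain object found through the MRO
        return descr
    raise AttributeError(
        "%r object has no attribute %r" % (tp.__name__, name))


class Demo:
    plain = "class attribute"

    @property
    def prop(self):
        return "from property"

    def method(self):
        return "from method"


d = Demo()
d.__dict__["prop"] = "shadow attempt"     # loses against the data descriptor
d.__dict__["method"] = "instance value"   # wins against the non-data descriptor

for attr in ("prop", "method", "plain"):
    print(attr, "->", generic_getattr(d, attr))
print(type(generic_getattr("str", "center")).__name__)
```

Each result can be compared with the built-in `getattr(d, attr)`, which goes through the real C function.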
up to this point I made a mistake: the tp_getattro in PyUnicode_Type is not called for str.center, it is only called for "str".center. you can tell the difference from the following code
>>> type("str")
<class 'str'>
>>> "str".center # it calls the tp_getattro in PyUnicode_Type
<built-in method center of str object at 0x10360f500>
>>> type("str".center)
<class 'builtin_function_or_method'>
>>> type(str)
<class 'type'>
>>> str.center # it calls the tp_getattro in a type named PyType_Type
<method 'center' of 'str' objects>
>>> type(str.center)
<class 'method_descriptor'>
so, the procedure above describes the attribute access of "str".center
let's find the definition of <class 'type'> and see how exactly str.center works (mostly the same as "str".center)
for the type <class 'type'>, LOAD_ATTR calls type_getattro
PyTypeObject PyType_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"type", /* tp_name */
sizeof(PyHeapTypeObject), /* tp_basicsize */
sizeof(PyMemberDef), /* tp_itemsize */
(destructor)type_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_reserved */
(reprfunc)type_repr, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
(ternaryfunc)type_call, /* tp_call */
0, /* tp_str */
(getattrofunc)type_getattro, /* tp_getattro */
(setattrofunc)type_setattro, /* tp_setattro */
0, /* tp_as_buffer */
...
}
static PyObject *
type_getattro(PyTypeObject *type, PyObject *name)
{
/*
logic mostly same as _PyObject_GenericGetAttrWithDict,
for those who are interested, read Objects/typeobject.c directly
*/
}
from the pictures and code above, we know that even though the tp_getattro functions used for str and "str" are different, they both call the same tp_descr_get (the C-level slot behind __get__) defined in the type named method_descriptor
/* in cpython/Objects/descrobject.c */
static PyObject *
method_get(PyMethodDescrObject *descr, PyObject *obj, PyObject *type)
{
PyObject *res;
/* descr_check returns true when obj is NULL (the attribute was accessed through the type, res is set to descr itself) or when obj is not an instance of the descriptor's type (res is set to NULL and a TypeError is raised) */
if (descr_check((PyDescrObject *)descr, obj, &res))
/* str.center goes into this branch, returns the PyMethodDescrObject itself */
return res;
/* while "str".center goes into this branch, returns a new PyCFunction object bound to obj */
return PyCFunction_NewEx(descr->d_method, obj, NULL);
}
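The two branches of method_get can be observed from Python by calling `__get__` on the descriptor by hand. A small sketch (the variable names are made up for illustration; passing `None` as the instance stands in for access through the type):

```python
# the method_descriptor stored in the type's __dict__
descr = str.__dict__["center"]

# no instance: roughly what happens for str.center
unbound = descr.__get__(None, str)

# with an instance: roughly what happens for "abc".center
bound = descr.__get__("abc", str)

# an instance of the wrong type is rejected
try:
    descr.__get__(1, int)
    rejected = False
except TypeError:
    rejected = True

print(unbound is descr)        # the descriptor itself comes back
print(type(bound).__name__)    # builtin_function_or_method
print(bound(7, "*"))           # **abc**
print(rejected)                # True
```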
now we have the answers to
- why does str.center return a method_descriptor object, while "str".center returns a builtin_function_or_method?
- how does attribute access work in python?
Assume you've read the following two parts
From the above analysis, we can learn that the key point is the tp_getattro defined in type(instance). The tp_getattro function of built-in types is predefined in C, while a user-defined type is created on the fly, and some function is attached to the tp_getattro field of the newly created type
class A(object):
    def __getattr__(self, item):
        pass

class B(object):
    def __getattribute__(self, item):
        pass

class C(object):
    pass
What is in the tp_getattro field of class A, class B and class C?
The following code shows snippets from the creation of a type
/* cpython/Objects/typeobject.c */
static slotdef slotdefs[] = {
/* ... */
TPSLOT("__getattribute__", tp_getattro, slot_tp_getattr_hook,
wrap_binaryfunc,
"__getattribute__($self, name, /)\n--\n\nReturn getattr(self, name)."),
TPSLOT("__getattr__", tp_getattro, slot_tp_getattr_hook, NULL, ""),
TPSLOT("__setattr__", tp_setattro, slot_tp_setattro, wrap_setattr,
/* ... */
}
static PyObject *
type_new(PyTypeObject *metatype, PyObject *args, PyObject *kwds)
{
/* ... */
/* Put the proper slots in place */
fixup_slot_dispatchers(type);
/* ... */
}
/* Store the proper functions in the slot dispatches at class (type)
definition time, based upon which operations the class overrides in its
dict. */
static void
fixup_slot_dispatchers(PyTypeObject *type)
{
slotdef *p;
init_slotdefs();
for (p = slotdefs; p->name; )
p = update_one_slot(type, p);
}
static slotdef *
update_one_slot(PyTypeObject *type, slotdef *p)
{
PyObject *descr;
PyWrapperDescrObject *d;
void *generic = NULL, *specific = NULL;
int use_generic = 0;
int offset = p->offset;
int error;
void **ptr = slotptr(type, offset);
if (ptr == NULL) {
do {
++p;
} while (p->offset == offset);
return p;
}
assert(!PyErr_Occurred());
do {
descr = find_name_in_mro(type, p->name_strobj, &error);
if (descr == NULL) {
if (error == -1) {
PyErr_Clear();
}
if (ptr == (void**)&type->tp_iternext) {
specific = (void *)_PyObject_NextNotImplemented;
}
continue;
}
if (Py_TYPE(descr) == &PyWrapperDescr_Type &&
((PyWrapperDescrObject *)descr)->d_base->name_strobj == p->name_strobj) {
void **tptr = resolve_slotdups(type, p->name_strobj);
if (tptr == NULL || tptr == ptr)
generic = p->function;
d = (PyWrapperDescrObject *)descr;
if (d->d_base->wrapper == p->wrapper &&
PyType_IsSubtype(type, PyDescr_TYPE(d)))
{
if (specific == NULL ||
specific == d->d_wrapped)
specific = d->d_wrapped;
else
use_generic = 1;
}
}
else if (Py_TYPE(descr) == &PyCFunction_Type &&
PyCFunction_GET_FUNCTION(descr) ==
(PyCFunction)(void(*)(void))tp_new_wrapper &&
ptr == (void**)&type->tp_new)
{
/* ... */
specific = (void *)type->tp_new;
/* ... */
}
else if (descr == Py_None &&
ptr == (void**)&type->tp_hash) {
/* ... */
specific = (void *)PyObject_HashNotImplemented;
/* ... */
}
else {
use_generic = 1;
generic = p->function;
}
} while ((++p)->offset == offset);
if (specific && !use_generic)
*ptr = specific;
else
*ptr = generic;
return p;
}
We can learn that for every predefined slot attribute in slotdefs, update_one_slot installs the proper function/object into the newly created type
__getattribute__ and __getattr__ have the same offset in every newly created type (you can see this from the slotdefs structure)
For class A, when installing __getattribute__, offset is 144 and descr is <slot wrapper '__getattribute__' of 'object' objects>. PyType_IsSubtype(type, PyDescr_TYPE(d)) is true, so specific will be d->d_wrapped, which is PyObject_GenericGetAttr. In the next iteration of the do-while loop, when installing __getattr__, offset is also 144, and this time descr is <function A.__getattr__ at 0x1013260c0>. The final else branch inside the loop sets use_generic to 1 and generic to p->function, which is slot_tp_getattr_hook. The loop then terminates, and because use_generic is set, offset 144 stores slot_tp_getattr_hook
For class B, when installing __getattribute__, offset is 144 and descr is <function B.__getattribute__ at 0x103ad6140>, so the final else branch inside the loop sets use_generic to 1 and generic to p->function, which is slot_tp_getattr_hook. In the next iteration, when installing __getattr__, offset is also 144, and this time descr is a null pointer, so the continue statement jumps to the loop condition, which fails because the next slotdef has a different offset. The loop terminates and offset 144 stores slot_tp_getattr_hook
For class C, when installing __getattribute__, offset is 144 and descr is <slot wrapper '__getattribute__' of 'object' objects>. PyType_IsSubtype(type, PyDescr_TYPE(d)) is true, so specific will be d->d_wrapped, which is PyObject_GenericGetAttr. In the next iteration, when installing __getattr__, offset is also 144, and this time descr is a null pointer, so the continue statement jumps to the loop condition, which fails because the next slotdef has a different offset. The loop terminates, and because specific is set and use_generic is not, offset 144 stores PyObject_GenericGetAttr
slot_tp_getattr_hook is defined as
/* cpython/Objects/typeobject.c */
static PyObject *
slot_tp_getattro(PyObject *self, PyObject *name)
{
PyObject *stack[1] = {name};
return call_method(self, &PyId___getattribute__, stack, 1);
}
static PyObject *
slot_tp_getattr_hook(PyObject *self, PyObject *name)
{
PyTypeObject *tp = Py_TYPE(self);
PyObject *getattr, *getattribute, *res;
_Py_IDENTIFIER(__getattr__);
getattr = _PyType_LookupId(tp, &PyId___getattr__);
if (getattr == NULL) {
/* No __getattr__ hook: use a simpler dispatcher */
tp->tp_getattro = slot_tp_getattro;
return slot_tp_getattro(self, name);
}
Py_INCREF(getattr);
getattribute = _PyType_LookupId(tp, &PyId___getattribute__);
if (getattribute == NULL ||
(Py_TYPE(getattribute) == &PyWrapperDescr_Type &&
((PyWrapperDescrObject *)getattribute)->d_wrapped ==
(void *)PyObject_GenericGetAttr))
res = PyObject_GenericGetAttr(self, name);
else {
Py_INCREF(getattribute);
res = call_attribute(self, getattribute, name);
Py_DECREF(getattribute);
}
if (res == NULL && PyErr_ExceptionMatches(PyExc_AttributeError)) {
PyErr_Clear();
res = call_attribute(self, getattr, name);
}
Py_DECREF(getattr);
return res;
}
For slot_tp_getattr_hook
- If there's no __getattr__ method, slot_tp_getattr_hook replaces itself with slot_tp_getattro, which only calls __getattribute__ directly
- If __getattr__ is defined, slot_tp_getattr_hook calls __getattribute__ first; if that produces no result and sets PyExc_AttributeError, it clears the error and calls __getattr__
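The two cases above can be mimicked in Python. This is a hedged sketch of the control flow only: `getattr_hook`, `WithFallback` and `NoFallback` are hypothetical names, and the real function works on C slots and cached lookups rather than on `getattr`.

```python
def getattr_hook(obj, name):
    """Rough Python rendering of the flow in slot_tp_getattr_hook."""
    tp = type(obj)
    try:
        # always try __getattribute__ first
        return tp.__getattribute__(obj, name)
    except AttributeError:
        fallback = getattr(tp, "__getattr__", None)
        if fallback is None:
            # no __getattr__ defined: the AttributeError propagates
            raise
        # __getattr__ is consulted only after the normal lookup failed
        return fallback(obj, name)


class WithFallback:
    existing = 1

    def __getattr__(self, name):
        return "fallback:" + name


class NoFallback:
    existing = 1


print(getattr_hook(WithFallback(), "existing"))
print(getattr_hook(WithFallback(), "missing"))
try:
    getattr_hook(NoFallback(), "missing")
except AttributeError as exc:
    print("AttributeError:", exc)
```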
If you override either __getattribute__ or __getattr__, tp_getattro of the newly created class will be slot_tp_getattr_hook
If neither __getattribute__ nor __getattr__ is overridden, tp_getattro of the newly created class will be PyObject_GenericGetAttr (same as built-in types)
Because type(newly_created_class) returns <class 'type'> (unless a custom metaclass is used), and the tp_getattro of <class 'type'> is predefined in C and cannot be customized, accessing an attribute on the newly created class itself behaves the same as class attribute access on built-in types
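A small sketch of what this means in practice: a `__getattr__` defined in a class affects lookups on its instances, but not lookups on the class itself, because those go through the type of the class. The names `A`, `Meta` and `B` are hypothetical, and the metaclass part is an addition for contrast that goes beyond the text above.

```python
class A:
    def __getattr__(self, name):
        return "instance fallback for " + name


# lookup on an instance uses the tp_getattro of A
instance_result = A().missing

# lookup on the class uses the tp_getattro of type(A), which is type
try:
    A.missing
    class_lookup_failed = False
except AttributeError:
    class_lookup_failed = True


class Meta(type):
    # defined on the metaclass, so it applies to lookups on classes
    def __getattr__(cls, name):
        return "class fallback for " + name


class B(metaclass=Meta):
    pass


meta_result = B.missing

print(instance_result)
print(class_lookup_failed)
print(meta_result)
```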
let's find the answer to
- what is the type method_descriptor?
it's defined in Include/descrobject.h
we can see that when you access "str".center and str.center, they both share the same PyMethodDescrObject, which is a wrapper of the PyMethodDef object
for those who are interested in PyMethodDef, please refer to method
>>> str.center
descr->d_common->d_type: 0x10fdefc10, descr->d_common->d_type.tp_name: str, repr(d_name): 'center', repr(d_qualname): NULL, PyMethodDef: 0x10fdf0370
>>> "str".center
descr->d_common->d_type: 0x10fdefc10, descr->d_common->d_type.tp_name: str, repr(d_name): 'center', repr(d_qualname): NULL, PyMethodDef: 0x10fdf0370
there are various other descriptor types
PyMemberDescrObject: wrapper of PyMemberDef
PyGetSetDescrObject: wrapper of PyGetSetDef
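These descriptor types are visible from Python through the `types` module. A minimal sketch, assuming CPython; the class `Slotted` is made up for illustration, and `int.real` is used as one example of a getset defined in C.

```python
import inspect
import types


class Slotted:
    """Hypothetical class: each name in __slots__ gets a member descriptor."""
    __slots__ = ("x",)


method_descr = str.__dict__["center"]   # wraps a PyMethodDef
member_descr = Slotted.__dict__["x"]    # wraps a PyMemberDef
getset_descr = int.__dict__["real"]     # wraps a PyGetSetDef

for d in (method_descr, member_descr, getset_descr):
    print(type(d).__name__, inspect.isdatadescriptor(d))
```

Members and getsets report as data descriptors, while the method descriptor does not, which matches the docstring of inspect.isdatadescriptor.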
we know that when you access an attribute of an object, the python virtual machine will
1. execute the opcode LOAD_ATTR
2. LOAD_ATTR will try to call the tp_getattro function of the object's type; on success, go to 4
3. raise an exception
4. return what's returned
the default tp_getattro is installed during the creation of the new type, and which function is installed depends on which methods the user overrides. generally, the default tp_getattro is PyObject_GenericGetAttr, which implements the descriptor protocol (as we learned above from the source code)
when we define a python object and need to change the behavior of attribute access, we are not able to change the behavior of the opcode LOAD_ATTR, since it's written in C
instead, we can provide our own __getattribute__ and __getattr__ to change the function installed in the tp_getattro slot of the newly created type
note that providing your own __getattribute__ may violate the descriptor protocol, so I don't recommend doing that (usually we only need to define our own __getattr__)
class A(object):
    def __getattribute__(self, item):
        print("in __getattribute__", item)
        if item in ("noA", "noB"):
            raise AttributeError
        return "__getattribute__" + str(item)

    def __getattr__(self, item):
        print("in __getattr__", item)
        if item == "noB":
            raise AttributeError
        return "__getattr__" + str(item)
>>> a = A()
>>> a.x
in __getattribute__ x
'__getattribute__x'
>>> a.noA
in __getattribute__ noA
in __getattr__ noA
'__getattr__noA'
>>> a.noB
in __getattribute__ noB
in __getattr__ noB
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 11, in __getattr__
AttributeError
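The earlier caution about __getattribute__ and the descriptor protocol can be seen in a small sketch. The classes `Plain`, `Overridden` and `Delegating` are hypothetical; the point is only that a custom __getattribute__ which does not delegate to the default implementation never reaches the property's __get__.

```python
class Plain:
    @property
    def value(self):
        return 42


class Overridden:
    @property
    def value(self):
        return 42

    def __getattribute__(self, name):
        # never consults the class, so the property is ignored
        return "intercepted"


class Delegating:
    @property
    def value(self):
        return 42

    def __getattribute__(self, name):
        # delegating to object keeps the descriptor protocol intact
        return super().__getattribute__(name)


print(Plain().value)
print(Overridden().value)
print(Delegating().value)
```

Delegating to `super().__getattribute__` is one way to add behavior while still letting descriptors work.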