[C API] PEP 674: Disallow using macros (Py_TYPE and Py_SIZE) as l-value #89639
The Python C API provides "AS" macros that convert an object to another type, like PyFloat_AS_DOUBLE(). These macros can be abused as l-values: "PyFloat_AS_DOUBLE(obj) = new_value;". This prevents changing the PyFloat implementation and makes life harder for Python implementations other than CPython. I propose converting these macros to static inline functions to disallow using them as l-values. I made a similar change for Py_REFCNT(), Py_TYPE() and Py_SIZE(). For those functions, I added "SET" variants: Py_SET_REFCNT(), Py_SET_TYPE(), Py_SET_SIZE(). Here, I don't think that the l-value use case is legitimate, so I see no need to add a way to *set* a value. For example, I don't think that PyFloat_SET_DOUBLE(obj, value) would make sense: a Python float object is supposed to be immutable. |
You could simply document, "don't do that". Also, if there is a need to make an assignment, you're now going to have to create a new setter function to fill the need. We really don't have to go onto thin ice, converting to functions that might or might not be inlined depending on compiler-specific nuances. AFAICT, no one has ever had problems with these being macros. There really isn't a problem to be solved, and the "solution" may in fact introduce new problems that we didn't have before. Put me down for a -1 on these blanket macro-to-inline-function rewrites. The premise is almost entirely a matter of opinion, "macros are bad, functions are good", plus a naive assumption that "inline functions" always inline. |
Raymond:

This issue is about the API of PyFloat_AS_DOUBLE(). Implementing it as a macro or a static inline function is an implementation detail which doesn't matter. But I don't know how to disallow "PyFloat_AS_DOUBLE(obj) = value" if it is defined as a macro. Have a look at the Facebook "nogil" project, which is incompatible with accessing the PyObject.ob_refcnt member directly.

Raymond:

Documentation doesn't work. Developers easily fall into traps when it's possible to fall. See bpo-30459 for such a trap with the PyList_SET_ITEM() and PyCell_SET() macros: they were misused by two Python projects.

Raymond:

Do you have a concrete example where a static inline function is not inlined, whereas it was inlined when it was a macro? So far, I'm not aware of any performance issue like that. There were attempts to use __attribute__((always_inline)) (Py_ALWAYS_INLINE), but so far, using it was not a clear win. |
I searched for the "PyFloat_AS_DOUBLE.*=" regex in the PyPI top 5000 projects. I couldn't find any project doing that. I only found perfectly safe comparisons, for example in traits/ctraits.c:

if (PyFloat_AS_DOUBLE(value) <= PyFloat_AS_DOUBLE(low)) { |
I am with Raymond on this one. If "protecting against wrong use" is the only reason to go down the slippery path of starting to rely on compiler optimizations for performance critical operations, the argument is not good enough. If people do use macros in l-value mode, it's their problem when their code breaks, not ours. Please don't forget that we are operating under the consenting adults principle: we expect users of the CPython API to use it as documented and expect them to take care of the fallout, if things break when they don't. We don't need to police developers into doing so. |
For PyObject, I converted Py_REFCNT(), Py_TYPE() and Py_SIZE() to static inline functions to enforce the usage of Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE(). Only a minority of C extensions are affected by these changes. Also, there is more pressure from recent optimization projects to abstract accesses to PyObject members. I agree that it doesn't seem that "AS" functions are abused to *set* the inner string.
Again, I'm not aware of any performance issue caused by short static inline functions like Py_TYPE() or the proposed PyFloat_AS_DOUBLE(). If there is a problem, it should be addressed, since Python uses more and more static inline functions. Static inline functions are a common feature of the C language; I'm not sure where your doubts about bad performance come from. Using static inline functions has other advantages: it helps debugging and profiling, since the function name can be retrieved by debuggers and profilers when analysing the machine code. It also avoids macro pitfalls (like abusing a macro to use it as an l-value ;-)). |
On 15.10.2021 11:43, STINNER Victor wrote:
Inlining is something that is completely under the control of the compiler. The reason why we have those macros is because we want the developers to be in control. If the developer wants to pass control over to the compiler, s/he can use [...]
Perhaps, but then I never had to profile macro use in the past. Instead, [...] That said, the macros you have inlined so far were all really trivial. [...] Perhaps we ought to have a threshold for making such decisions, e.g. [...] A blanket "static inline is always better than a macro" is not a good [...] Esp. in PGO-driven optimizations, the compiler could opt for using [...] |
If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an lvalue, why not use the comma operator to ensure that the result is an rvalue? The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I imagine there is similar text in other versions of the standard. The idea would be to define a helper macro like this:

/* As expr, but can only be used as an rvalue. */
#define Py_RVALUE(expr) ((void)0, (expr))

and then use the helper where needed, for example:

#define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval) |
Oh, that's a clever trick! I wrote #73162 which uses it. |
I created bpo-45490: "[meta][C API] Avoid C macro pitfalls and usage of static inline functions" to discuss macros and static inline functions more generally. |
Marc-Andre:
I checked the following C snippet on gcc.godbolt.org using GCC 4.1.2 and Clang 3.0.0 with <no flags>/-O0/-O1/-Os, and both compilers inline a function marked as static inline:

static inline int foo(int a)
{
return a * 2;
}
int bar(int a)
{
return foo(a) < 0;
}

So even with -O0, GCC from 2007 and Clang from 2011 perform inlining. Though, old versions of Clang leave a dangling original copy of foo() for some reason; I hope the linker removes it later. As for other compilers, I believe that if somebody specifies -O0, that person has a sound reason to do so (like per-line debugging, building precise flame graphs, or another specific scenario where execution performance does not matter), so inlining would interfere there anyway. |
On 15.11.2021 08:54, Oleg Iarygin wrote:
That's a great website :-) Thanks for sharing. However, even with x86-64 gcc 11.2 at -O0, I get assembler which does not inline foo(). Only with -O1 does the site report inlining foo().
Sure, but my point was a different one: even with higher optimization levels, [...] With macros, the compiler has no choice; we are in control, and even [...] |
I wrote PEP-670 "Convert macros to functions in the Python C API" for this issue: |
I don't understand what you are trying to prove about compilers not inlining when you explicitly ask them... not to inline. The purpose of the -O0 option is to minimize the build time, with a trade-off: don't expect the built executable to be fast. If you care about Python performance... well, don't use -O0? Python's ./configure --with-pydebug builds Python with -Og, which is not -O0. The -Og level is special: it's a different trade-off between compiler build time and Python runtime performance. If you want a Python debug build (Py_DEBUG macro defined, ./configure --with-pydebug), it's perfectly fine to build it with -O2 or -O3 to make sure that static inline functions are inlined. You can also enable LTO and PGO on a debug build. See the GCC -Og option.

I prefer to use gcc -O0 when I develop on Python because the build time matters a lot in my very specific use case, and gcc -O0 is the best to debug Python in a debugger. See my article.

On RHEL 8, the Python 3.9 debug build is now built with -O0 to be fully usable in gdb (to debug C extensions). In RHEL, the main motivation to use -O0 rather than -Og was to get a fully working gdb debugger on C extensions. With -Og, we get too many <optimized out> values, which block debugging :-( |
On 15.11.2021 10:54, STINNER Victor wrote:
I'm not trying to prove anything, Victor. I'm only stating the fact that by switching from macros to inline functions [...]

I've heard all your arguments against macros, but don't believe the [...]

I also don't believe that we should assume that Python C extension [...]

IMO, conversion to inline functions should only happen when a) the core language implementation has a direct benefit, and b) it is very unlikely that compilers will not inline the code.

Overall, I think PEP-670 should get some more attention from the [...]

BTW: Thanks for the details about -O0 vs. -Og. |
I decided to exclude macros which can be used as l-values from PEP-670, since the motivation for disallowing their use as l-values is different, and I prefer to restrict the PEP-670 scope. Disallowing the use of macros as l-values is more about hiding implementation details and improving compatibility with Python implementations other than CPython, like PyPy or RustPython. PEP-670 is restricted to the advantages and disadvantages of converting macros to functions (static inline or regular functions). |
PyBytes_AS_STRING() and PyByteArray_AS_STRING() are used to modify string characters, but are not used directly as l-values. Search in the PyPI top 5000 packages:

$ ./search_pypi_top_5000.sh '(PyByteArray|PyBytes)_AS_.*[^!<>=]=[^=]'
pypi-top-5000_2021-08-17/plyvel-1.3.0.tar.gz
pypi-top-5000_2021-08-17/numexpr-2.7.3.tar.gz
pypi-top-5000_2021-08-17/Cython-0.29.24.tar.gz

numexpr-2.7.3, numexpr/numexpr_object.cpp:

PyBytes_AS_STRING(constsig)[i] = 'b';

plyvel-1.3.0, plyvel/_plyvel.cpp:

PyByteArray_AS_STRING(string)[i] = (char) v;

Cython-0.29.24:

$ grep -E '(PyByteArray|PyBytes)_AS_.*[^!<>=]=[^=]' -R .
./Cython/Utility/StringTools.c: PyByteArray_AS_STRING(string)[i] = (char) v;
./Cython/Utility/StringTools.c: PyByteArray_AS_STRING(string)[i] = (char) v;
./Cython/Utility/StringTools.c: PyByteArray_AS_STRING(bytearray)[n] = value; |
I created this issue to disallow using macros like PyFloat_AS_DOUBLE() and PyDict_GET_SIZE() as l-values. It seems like this change by itself is controversial. I proposed one way to implement it: convert the macros to static inline functions. I didn't expect that this conversion would also be controversial. For now, I abandon the static inline approach to focus on the implementation which keeps macros: modify the macros to use _Py_RVALUE() => PR 28976. Once PR 28976 is merged and PEP-670 is accepted, we can reconsider converting these macros to static inline functions; then it should be non-controversial. |
I wrote PEP-674 "Disallow using macros as l-value" for this change: https://python.github.io/peps/pep-0674/ |
In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG:
M2Crypto-0.38.0/src/SWIG/_m2crypto_wrap.c and mecab-python3-1.0.4/src/MeCab/MeCab_wrap.cpp contain the function:

SWIGINTERN PyGetSetDescrObject *
... /* function name and parameter list elided */
{
PyGetSetDescrObject *descr;
descr = (PyGetSetDescrObject *)PyType_GenericAlloc(SwigPyStaticVar_Type(), 0);
assert(descr);
Py_XINCREF(type);
PyDescr_TYPE(descr) = type;
PyDescr_NAME(descr) = PyString_InternFromString(getset->name);
descr->d_getset = getset;
if (PyDescr_NAME(descr) == NULL) {
Py_DECREF(descr);
descr = NULL;
}
return descr;
} |
I found 4 projects using "Py_TYPE(obj) = new_type;" in the PyPI top 5000:

mypy-0.910:
recordclass-0.16.3:
pysha3-1.0.2:
datatable-1.0.0.tar.gz:
|
In the PyPI top 5000 projects, I found 32 projects using "Py_SIZE(obj) = new_size":

8 projects using "Py_SIZE(obj) = new_size" directly:

24 projects using "Py_SIZE(obj) = new_size" in code generated by an outdated Cython:
|
Attached pep674_regex.py generates a regex to search for code incompatible with PEP-674. To download the PyPI top 5000, you can use my script. To grep a regex in tarballs and ZIP archives, you can use the rg command:

$ rg -zl REGEX DIRECTORY/*.{zip,gz,bz2,tgz}

Or you can try my script. |
I updated my ./search_pypi_top_5000.py script to ignore files generated by Cython. In the PyPI top 5000, I only found 16 projects impacted by PEP-674 (16/5000 = 0.3%):
I manually ignored two false positives in 3 projects:
|
Oops, sorry, pycurl-7.44.1 and PyGObject-3.42.0 are not affected, they only define Py_SET_TYPE() macro for backward compatibility. So right now, only 14 projects are affected. |
I proposed a fix upstream: sergey-dryabzhinsky/python-zstd#70 |
If I understand correctly, this module is compatible with PEP-674: it only has to copy the Python 3.11 header files once Python 3.11 is released to port the project to Python 3.11. The incompatible code is not part of the "frozendict" implementation, but only in copies of the Python header files (Python 3.6, 3.7, 3.8, 3.9 and 3.10). For example, it contains the frozendict/src/3_10/cpython_src/Include/object.h header: a copy of the CPython Include/object.h file. Source code: https://github.com/Marco-Sulla/python-frozendict |
This module must not be used on Python 3.6 and newer, which have built-in support for SHA-3 hash functions. Example: $ python3.6
Python 3.6.15 (default, Sep 5 2021, 00:00:00)
>>> import hashlib
>>> h=hashlib.new('sha3_224'); h.update(b'hello'); print(h.hexdigest())
b87f88c72702fff1748e58b87e9141a42c0dbedc29a78cb0d4a5cd81

By the way, building pysha3 on Python 3.11 now fails with:
The pystrhex.h header file has been removed in Python 3.11 by bpo-45434. But I don't think that it's worth trying to port pysha3 to Python 3.11 if the module must not be used on Python 3.6 and newer. Environment markers can be used to skip the pysha3 dependency on Python 3.6 and newer. Example: "pysha3; python_version < '3.6'" |
The affected code is only code generated by Cython: the project only has to regenerate the code with a recent Cython version. |
I proposed a fix: python/mypy#11652 |
I proposed a first PR for Py_TYPE(): |
I proposed a fix: intake/python-snappy#114 |
I created bpo-46538 "[C API] Make the PyDescrObject structure opaque" to handle PyDescr_NAME() and PyDescr_TYPE() macros. But IMO it's not really worth it to make the PyDescrObject structure opaque. It's just too much work, whereas PyDescrObject is not performance sensitive. It's ok to continue exposing this structure in public for now. I will exclude PyDescr_NAME() and PyDescr_TYPE() from the PEP-674. |
I created h2oai/datatable#3231 |
This project is a backport targeting Python 3.7 and older. I'm not sure if it makes sense to update it to Python 3.11. It's the same for pysha3, which targets Python <= 3.5. |
I created: zhuyifei1999/guppy3#40 |
This is a vendored copy of the Boost.org python module, which has already been fixed in Boost 1.78.0 (commit: January 2021). scipy should just update its scipy/_lib/boost copy. |
I created: https://bitbucket.org/intellimath/recordclass/pull-requests/1/python-311-support-use-py_set_size |
Technically, zodbpickle works on Python 3.11 and is not impacted by the Py_SIZE() change: _pickle_33.c redefines the Py_SIZE() macro to continue using it as an l-value. I proposed a PR to use Py_SET_SIZE() explicitly. |
Strictly speaking, [...] However, since pystrhex.h was removed from Python 3.11, pycryptodome is the only legit library supporting Ethereum Keccak. |
Technically, these functions are still exported and remain available in the internal C API: Include/internal/pycore_strhex.h. If someone wants a public C API for these, it should be requested. But by default, the private functions must not be used outside Python itself: https://docs.python.org/dev/c-api/stable.html. It's not hard to copy Python/pystrhex.c (174 lines) or reimplement it. |
That's not practical because [...] For anyone searching for these errors, just use [...] |
https://peps.python.org/pep-0674/ was deferred by the SC. For now, I prefer to close this issue. |