
[C API] PEP 674: Disallow using macros (Py_TYPE and Py_SIZE) as l-value #89639

Closed
vstinner opened this issue Oct 14, 2021 · 45 comments
Labels
3.11 (only security fixes), topic-C-API

Comments

@vstinner
Member

BPO 45476
Nosy @malemburg, @rhettinger, @vstinner, @gareth-rees, @erlend-aasland, @arhadthedev
PRs
  • bpo-45476: Convert PyFloat_AS_DOUBLE() to static inline #28961
  • bpo-45476: Disallow using PyFloat_AS_DOUBLE() as l-value #28976
  • bpo-45476: Add _Py_RVALUE() macro #29860
  • bpo-45476: Disallow using asdl_seq_GET() as l-value #29866
  • Files
  • pep674_regex.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2021-10-14.21:17:31.326>
    labels = ['expert-C-API', '3.11']
    title = '[C API] PEP 674: Disallow using macros (Py_TYPE and Py_SIZE) as l-value'
    updated_at = <Date 2022-01-27.20:29:40.417>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2022-01-27.20:29:40.417>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['C API']
    creation = <Date 2021-10-14.21:17:31.326>
    creator = 'vstinner'
    dependencies = []
    files = ['50462']
    hgrepos = []
    issue_num = 45476
    keywords = ['patch']
    message_count = 41.0
    messages = ['403950', '403956', '403961', '403965', '403990', '403995', '403999', '404001', '404010', '404039', '406343', '406344', '406345', '406346', '406348', '406996', '407283', '407358', '407374', '407375', '407395', '407410', '407412', '407416', '407417', '407456', '407463', '407528', '407529', '407531', '407532', '407536', '407872', '407875', '411774', '411802', '411803', '411826', '411827', '411831', '411921']
    nosy_count = 6.0
    nosy_names = ['lemburg', 'rhettinger', 'vstinner', '[email protected]', 'erlendaasland', 'arhadthedev']
    pr_nums = ['28961', '28976', '29860', '29866']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue45476'
    versions = ['Python 3.11']

    @vstinner
    Member Author

    The Python C API provides "AS" macros to convert an object to a C value, like PyFloat_AS_DOUBLE(). These macros can be abused to be used as l-value: "PyFloat_AS_DOUBLE(obj) = new_value;". This prevents changing the PyFloat implementation and makes life harder for Python implementations other than CPython.

    I propose to convert these macros to static inline functions to disallow using them as l-value.

    I made a similar change for Py_REFCNT(), Py_TYPE() and Py_SIZE(). For these functions, I added "SET" variants: Py_SET_REFCNT(), Py_SET_TYPE(), Py_SET_SIZE(). Here, I don't think that the l-value case is legit, and so I don't see the need to add a way to *set* a value.

    For example, I don't think that PyFloat_SET_DOUBLE(obj, value) would make sense. A Python float object is supposed to be immutable.

    @vstinner vstinner added 3.11 only security fixes topic-C-API labels Oct 14, 2021
    @rhettinger
    Contributor

    These macros can be abused to be used as l-value

    You could simply document, "don't do that". Also, if there is a need to make an assignment, you're now going to have to create a new setter function to fill it.

    We really don't have to go on thin ice converting to functions that might or might not be inlined depending on compiler specific nuances.

    AFAICT, no one has ever had problems with these being macros. There really isn't a problem to be solved, and the "solution" may in fact introduce new problems that we didn't have before.

    Put me down as a -1 on these blanket macro-to-inline-function rewrites. The premise is almost entirely a matter of opinion, "macros are bad, functions are good", plus the naive assumption that "inline" functions always inline.

    @vstinner
    Member Author

    Raymond:

    AFAICT, no one has ever had problems with these being macros.

    This issue is about the API of PyFloat_AS_DOUBLE(). Implementing it as a macro or a static inline function is an implementation detail which doesn't matter. But I don't know how to disallow "PyFloat_AS_DOUBLE(obj) = value" if it is defined as a macro.

    Have a look at the Facebook "nogil" project which is incompatible with accessing directly the PyObject.ob_refcnt member:
    "Extensions must use Py_REFCNT and Py_SET_REFCNT instead of directly accessing reference count fields"
    https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

    Raymond:

    You could simply document, "don't do that".

    Documentation doesn't work. Developers easily fall into traps whenever it's possible to fall into them. See bpo-30459 for such a trap with the PyList_SET_ITEM() and PyCell_SET() macros. They were misused by two Python projects.

    Raymond:

    We really don't have to go on thin ice converting to functions that might or might not be inlined depending on compiler specific nuances.

    Do you have a concrete example where a static inline function is not inlined, whereas it was inlined when it was a macro? So far, I'm not aware of any performance issue like that.

    There were attempts to use __attribute__((always_inline)) (Py_ALWAYS_INLINE), but so far, using it was not a clear win.

    @vstinner
    Member Author

    I searched for "PyFloat_AS_DOUBLE.*=" regex in the PyPI top 5000 projects. I couldn't find any project doing that.

    I only found perfectly safe comparisons:

    traits/ctraits.c: if (PyFloat_AS_DOUBLE(value) <= PyFloat_AS_DOUBLE(low)) {
    traits/ctraits.c: if (PyFloat_AS_DOUBLE(value) >= PyFloat_AS_DOUBLE(high)) {
    c/_cffi_backend.c: return PyFloat_AS_DOUBLE(ob) != 0.0;
    pandas/_libs/src/klib/khash_python.h: ( PyFloat_AS_DOUBLE(a) == PyFloat_AS_DOUBLE(b) );

    @malemburg
    Member

    I am with Raymond on this one.

    If "protecting against wrong use" is the only reason to go down the slippery path of starting to rely on compiler optimizations for performance critical operations, the argument is not good enough.

    If people do use macros in l-value mode, it's their problem when their code breaks, not ours. Please don't forget that we are operating under the consenting adults principle: we expect users of the CPython API to use it as documented and expect them to take care of the fallout, if things break when they don't.

    We don't need to police developers into doing so.

    @vstinner
    Member Author

    For PyObject, I converted Py_REFCNT(), Py_TYPE() and Py_SIZE() to static inline functions to enforce the usage of Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE(). Only a minority of C extensions are affected by these changes. Also, there is more pressure from recent optimization projects to abstract accesses to PyObject members.

    I agree that it doesn't seem that "AS" functions are abused to *set* the inner value:

    • PyByteArray_AS_STRING()
    • PyBytes_AS_STRING()
    • PyFloat_AS_DOUBLE()

    If "protecting against wrong use" is the only reason to go down the slippery path of starting to rely on compiler optimizations for performance critical operations, the argument is not good enough.

    Again, I'm not aware of any performance issue caused by short static inline functions like Py_TYPE() or the proposed PyFloat_AS_DOUBLE(). If there is a problem, it should be addressed, since Python uses more and more static inline functions.

    Static inline functions are a common feature of the C language. I'm not sure where your doubts about bad performance come from.

    Using static inline functions has other advantages. It helps debugging and profiling, since the function name can be retrieved by debuggers and profilers when analysing the machine code. It also avoids macro pitfalls (like abusing a macro to use it as an l-value ;-)).

    @malemburg
    Member

    On 15.10.2021 11:43, STINNER Victor wrote:

    Again, I'm not aware of any performance issue caused by short static inline functions like Py_TYPE() or the proposed PyFloat_AS_DOUBLE(). If there is a problem, it should be addressed, since Python uses more and more static inline functions.

    static inline functions is a common feature of C language. I'm not sure where your doubts of bad performance come from.

    Inlining is something that is completely under the control of the used compilers. Compilers are free to not inline functions marked for inlining, which can result in significant slowdowns on platforms which are e.g. restricted in RAM and thus emphasize small code size, or where the CPUs have small caches or not enough registers (think micro-controllers).

    The reason why we have those macros is because we want the developers to be
    able to make a conscious decision "please inline this code unconditionally
    and regardless of platform or compiler". The developer will know better
    what to do than the compiler.

    If the developer wants to pass control over to the compiler s/he can use
    the corresponding C function, which is usually available (and then, in many
    cases, also provides error handling).

    Using static inline functions has other advantages. It helps debugging and profiling, since the function name can be retrieved by debuggers and profilers when analysing the machine code. It also avoids macro pitfalls (like abusing a macro to use it as an l-value ;-)).

    Perhaps, but then I never had to profile macro use in the past. Instead,
    what I typically found was that using macros results in faster code when
    used in inner loops, so profiling usually guided me to use macros instead
    of functions.

    That said, the macros you have inlined so far were all really trivial,
    so a compiler will most likely always inline them (the number of machine
    code instructions for the call would be more than needed for
    the actual operation).

    Perhaps we ought to have a threshold for making such decisions, e.g.
    number of machine code instructions generated for the macro or so, to
    not get into discussions every time :-)

    A blanket "static inline is always better than a macro" is not a good
    enough argument, though.

    Esp. in PGO driven optimizations the compiler could opt for using
    the function call rather than inlining if it finds that the code
    in question is not used much and it needs to save space to have
    loops fit into CPU caches.

    @gareth-rees
    Mannequin

    gareth-rees mannequin commented Oct 15, 2021

    If the problem is accidental use of the result of PyFloat_AS_DOUBLE() as an lvalue, why not use the comma operator to ensure that the result is an rvalue?

    The C99 standard says "A comma operator does not yield an lvalue" in §6.5.17; I imagine there is similar text in other versions of the standard.

    The idea would be to define a helper macro like this:

        /* As expr, but can only be used as an rvalue. */
        #define Py_RVALUE(expr) ((void)0, (expr))

    and then use the helper where needed, for example:

        #define PyFloat_AS_DOUBLE(op) Py_RVALUE(((PyFloatObject *)(op))->ob_fval)

    @vstinner
    Member Author

    #define Py_RVALUE(expr) ((void)0, (expr))

    Oh, that's a clever trick!

    I wrote #73162 which uses it.

    @vstinner vstinner changed the title [C API] Convert "AS" functions, like PyFloat_AS_DOUBLE(), to static inline functions [C API] Disallow using PyFloat_AS_DOUBLE() as l-value Oct 15, 2021
    @vstinner
    Member Author

    I created bpo-45490: "[meta][C API] Avoid C macro pitfalls and usage of static inline functions" to discuss macros and static inline functions more generally.

    @arhadthedev
    Member

    Marc-Andre:

    Inlining is something that is completely under the control of the
    used compilers. Compilers are free to not inline function marked for
    inlining [...]

    I checked the following C snippet on gcc.godbolt.org using GCC 4.1.2 and Clang 3.0.0 with <no flags>/-O0/-O1/-Os, and both compilers inline a function marked as static inline:

        static inline int foo(int a)
        {
            return a * 2;
        }
    
        int bar(int a)
        {
            return foo(a) < 0;
        }

    So even with -O0, GCC from 2007 and Clang from 2011 perform inlining. Though, old versions of Clang leave a dangling original copy of foo for some reason. I hope a linker removes it later.

    As for other compilers, I believe that if somebody specifies -O0, that person has a sound reason to do so (like per-line debugging, building precise flame graphs, or another specific scenario where execution performance does not matter), so forced inlining would only interfere there anyway.

    @malemburg
    Member

    On 15.11.2021 08:54, Oleg Iarygin wrote:


    Marc-Andre:
    > Inlining is something that is completely under the control of the
    used compilers. Compilers are free to not inline function marked for
    inlining [...]

    I checked the following C snippet on gcc.godbolt.org using GCC 4.1.2 and Clang 3.0.0 with <no flags>/-O0/-O1/-Os, and both compilers inline a function marked as static inline:

    static inline int foo(int a)
    {
        return a * 2;
    }
    
    int bar(int a)
    {
    return foo(a) < 0;
    }
    

    So even with -O0, GCC from 2007 and Clang from 2011 perform inlining. Though, old versions of Clang leave a dangling original copy of foo for some reason. I hope a linker removes it later.

    That's a great website :-) Thanks for sharing.

    However, even with x86-64 gcc 11.2, I get assembler which does not inline
    foo() without compiler options or with -O0: https://gcc.godbolt.org/z/oh6qnffh7

    Only with -O1, the site reports inlining foo().

    As for other compilers, I believe that if somebody specifies -O0, that person has a sound reason to do so (like per-line debugging, building precise flame graphs, or other specific scenario where execution performance does not matter), so inlining interferes here anyway.

    Sure, but my point was a different one: even with higher optimization
    levels, the compiler can decide whether or not to inline. We expect
    the compiler to inline, but cannot be sure.

    With macros the compiler has no choice and we are in control and even
    when using -O0, you will still want e.g. Py_INCREF() and Py_DECREF()
    inlined.

    @vstinner
    Member Author

    I wrote PEP-670 "Convert macros to functions in the Python C API" for this issue:
    https://www.python.org/dev/peps/pep-0670/

    @vstinner
    Member Author

    I don't understand what you are trying to prove about compilers not inlining when you explicitly ask them... not to inline.

    The purpose of the -O0 option is to minimize the build time, with a trade-off: don't expect the built executable to be fast. If you care about Python performance... well, don't use -O0? Python ./configure --with-pydebug builds Python with -Og, which is not -O0. The -Og level is special: it's a different trade-off between compile time and Python runtime performance.

    If you want a Python debug build (Py_DEBUG macro defined, ./configure --with-pydebug), it's perfectly fine to build it with -O2 or -O3 to make sure that static inline functions are inlined. You can also enable LTO and PGO on a debug build.

    GCC -Og option:
    """
    -Og

    Optimize debugging experience. -Og should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience. It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0.
    
    Like -O0, -Og completely disables a number of optimization passes so that individual options controlling them have no effect. Otherwise -Og enables all -O1 optimization flags except for those that may interfere with debugging:
    
    -fbranch-count-reg  -fdelayed-branch 
    -fdse  -fif-conversion  -fif-conversion2  
    -finline-functions-called-once 
    -fmove-loop-invariants  -fmove-loop-stores  -fssa-phiopt 
    -ftree-bit-ccp  -ftree-dse  -ftree-pta  -ftree-sra
    

    """
    https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

    I prefer to use gcc -O0 when I develop on Python because the build time matters a lot in my very specific use case, and gcc -O0 is the best to debug Python in a debugger. See my article:
    https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb

    On RHEL8, the Python 3.9 debug build is now built with -O0 to be fully usable in gdb (to debug C extensions).

    In RHEL, the main motivation to use -O0 rather than -Og was to get a fully working gdb debugger on C extensions. With -Og, we get too many <optimized out> values which are blocking debugging :-(

    @malemburg
    Member

    On 15.11.2021 10:54, STINNER Victor wrote:

    I don't understand what you are trying to prove about compilers not inlining when you explicitly ask them... not to inline.

    I'm not trying to prove anything, Victor.

    I'm only stating the fact that by switching from macros to inline
    functions we are giving away control to the compilers and should not
    be surprised that Python now suddenly runs a lot slower on systems
    which either have inlining optimizations switched off or where the
    compiler (wrongly) assumes that creating more assembler would result
    in slower code.

    I've heard all your arguments against macros, but don't believe the
    blanket approach to convert to inline functions is warranted in all
    cases, in particular not for code which is private to the interpreter
    and where we know that we need the code inlined to not result in
    unexpected performance regressions.

    I also don't believe that we should assume that Python C extension
    authors will unintentionally misuse Python API macros. If they do,
    it's their business to sort out any issues, not ours. If we document
    that macros may not be used as l-values, that's clear enough. We don't
    need to use compiler restrictions to impose such limitations.

    IMO, conversion to inline functions should only happen, when

    a) the core language implementation has a direct benefit, and

    b) it is very unlikely that compilers will not inline the code
    with -O2 settings, e.g. perhaps using a threshold of LOCs
    or by testing with the website Oleg mentioned.

    Overall, I think PEP-670 should get some more attention from the
    SC, to have a guideline to use as a basis for deciding whether or not
    to use the static inline function approach. That way we could avoid
    these discussions :-)

    BTW: Thanks for the details about -O0 vs. -Og.

    @vstinner
    Member Author

    I decided to exclude macros which can be used as l-value from the PEP-670, since the motivation to disallow using them as l-value is different, and I prefer to restrict PEP-670 scope.

    Disallowing using macros as l-values is more about hiding implementation details and improving compatibility with Python implementations other than CPython, like PyPy or RustPython.

    The PEP-670 is restricted to the advantages and disadvantages of converting macros to functions (static inline or regular functions).

    @vstinner
    Member Author

    PyBytes_AS_STRING() and PyByteArray_AS_STRING() are used to modify string characters, but are not used directly as l-values.

    Search in PyPI top 5000 packages:

    $ ./search_pypi_top_5000.sh '(PyByteArray|PyBytes)_AS_.*[^!<>=]=[^=]'
    pypi-top-5000_2021-08-17/plyvel-1.3.0.tar.gz
    pypi-top-5000_2021-08-17/numexpr-2.7.3.tar.gz
    pypi-top-5000_2021-08-17/Cython-0.29.24.tar.gz

    numexpr-2.7.3, numexpr/numexpr_object.cpp:

    PyBytes_AS_STRING(constsig)[i] = 'b';
    PyBytes_AS_STRING(constsig)[i] = 'i';
    PyBytes_AS_STRING(constsig)[i] = 'l';
    PyBytes_AS_STRING(constsig)[i] = 'f';
    PyBytes_AS_STRING(constsig)[i] = 'd';
    PyBytes_AS_STRING(constsig)[i] = 'c';
    PyBytes_AS_STRING(constsig)[i] = 's';

    plyvel-1.3.0, plyvel/_plyvel.cpp:

    PyByteArray_AS_STRING(string)[i] = (char) v;
    PyByteArray_AS_STRING(string)[i] = (char) v;

    Cython-0.29.24:

    $ grep -E '(PyByteArray|PyBytes)_AS_.*[^!<>=]=[^=]' -R .
    ./Cython/Utility/StringTools.c:            PyByteArray_AS_STRING(string)[i] = (char) v;
    ./Cython/Utility/StringTools.c:        PyByteArray_AS_STRING(string)[i] = (char) v;
    ./Cython/Utility/StringTools.c:            PyByteArray_AS_STRING(bytearray)[n] = value;

    @vstinner
    Member Author

    New changeset c19c3a0 by Victor Stinner in branch 'main':
    bpo-45476: Add _Py_RVALUE() macro (GH-29860)
    c19c3a0

    @vstinner
    Member Author

    I created this issue to disallow macros like PyFloat_AS_DOUBLE() and PyDict_GET_SIZE() as l-value. It seems like this change by itself is controversial.

    I proposed one way to implement this change: convert macros to static inline functions. I didn't expect that this conversion would also be controversial. For now, I abandon the static inline approach to focus on the implementation which keeps macros: modify macros to use _Py_RVALUE() => PR 28976.

    Once PR 28976 is merged and PEP-670 is accepted, we can reconsider converting these macros to static inline functions; then it should be non-controversial.

    @vstinner
    Member Author

    New changeset 4b97d97 by Victor Stinner in branch 'main':
    bpo-45476: Disallow using asdl_seq_GET() as l-value (GH-29866)
    4b97d97

    @vstinner
    Member Author

    I wrote PEP-674 "Disallow using macros as l-value" for this change: https://python.github.io/peps/pep-0674/

    @vstinner vstinner changed the title [C API] Disallow using PyFloat_AS_DOUBLE() as l-value [C API] PEP 674: Disallow using macros as l-value Nov 30, 2021
    @vstinner
    Member Author

    In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG:

    M2Crypto-0.38.0/src/SWIG/_m2crypto_wrap.c and mecab-python3-1.0.4/src/MeCab/MeCab_wrap.cpp contain the function:

    SWIGINTERN PyGetSetDescrObject *
    SwigPyStaticVar_new_getset(PyTypeObject *type, PyGetSetDef *getset) {

      PyGetSetDescrObject *descr;
      descr = (PyGetSetDescrObject *)PyType_GenericAlloc(SwigPyStaticVar_Type(), 0);
      assert(descr);
      Py_XINCREF(type);
      PyDescr_TYPE(descr) = type;
      PyDescr_NAME(descr) = PyString_InternFromString(getset->name);
      descr->d_getset = getset;
      if (PyDescr_NAME(descr) == NULL) {
        Py_DECREF(descr);
        descr = NULL;
      }
      return descr;
    }

    @vstinner
    Member Author

    I found 4 projects using "Py_TYPE(obj) = new_type;" in the PyPI top 5000:

    mypy-0.910:

    • mypyc/lib-rt/misc_ops.c: Py_TYPE(template_) = &PyType_Type;
    • mypyc/lib-rt/misc_ops.c: Py_TYPE(t) = metaclass;

    recordclass-0.16.3:

    • lib/recordclass/_dataobject.c: Py_TYPE(op) = type;
    • lib/recordclass/_dataobject.c: Py_TYPE(op) = type;
    • lib/recordclass/_litetuple.c: // Py_TYPE(ob) = &PyLiteTupleType_Type;

    pysha3-1.0.2:

    datatable-1.0.0.tar.gz:

    • src/core/python/namedtuple.cc: Py_TYPE(v) = type.v;
    • src/core/python/tuple.cc: Py_TYPE(v_new) = v_type;

    @vstinner
    Member Author

    vstinner commented Dec 1, 2021

    In the PyPI top 5000 projects, I found 32 projects using "Py_SIZE(obj) =
    new_size": 8 of them are written manually, 24 use Cython.

    8 projects using "Py_SIZE(obj) = new_size":

    • guppy3-3.1.2: src/sets/bitset.c and src/sets/nodeset.c
    • mypy-0.910: list_resize() in mypyc/lib-rt/pythonsupport.h
    • pickle5-0.0.12: pickle5/_pickle.c
    • python-snappy-0.6.0: maybe_resize() in snappy/snappymodule.cc
    • recordclass-0.16.3: lib/recordclass/_dataobject.c + code generated by Cython
    • scipy-1.7.3: scipy/_lib/boost/boost/python/object/make_instance.hpp
    • zodbpickle-2.2.0: src/zodbpickle/_pickle_33.c
    • zstd-1.5.0.2: src/python-zstd.c

    24 projects using "Py_SIZE(obj) = new_size" generated by an outdated Cython:

    • Naked-0.1.31
    • Shapely-1.8.0
    • dedupe-hcluster-0.3.8
    • fastdtw-0.3.4
    • fuzzyset-0.0.19
    • gluonnlp-0.10.0
    • hdbscan-0.8.27
    • jenkspy-0.2.0
    • lightfm-1.16
    • neobolt-1.7.17
    • orderedset-2.0.3
    • ptvsd-4.3.2
    • py_spy-0.3.11
    • pyemd-0.5.1
    • pyhacrf-datamade-0.2.5
    • pyjq-2.5.2
    • pypcap-1.2.3
    • python-crfsuite-0.9.7
    • reedsolo-1.5.4
    • tables-3.6.1
    • thriftpy-0.3.9
    • thriftrw-1.8.1
    • tinycss-0.4
    • triangle-20200424

    @vstinner
    Member Author

    vstinner commented Dec 1, 2021

    The attached pep674_regex.py generates a regex to search for code incompatible with the PEP-674.

    To download PyPI top 5000, you can use my script:
    https://github.com/vstinner/misc/blob/main/cpython/download_pypi_top.py

    To grep a regex in tarball and ZIP archives, you can use the rg command:

    $ rg -zl REGEX DIRECTORY/*.{zip,gz,bz2,tgz}

    Or you can try my script:
    https://github.com/vstinner/misc/blob/main/cpython/search_pypi_top.py

    @vstinner
    Member Author

    vstinner commented Dec 1, 2021

    I updated my ./search_pypi_top_5000.py script to ignore files generated by Cython.

    On PyPI top 5000, I only found 16 projects impacted by the PEP-674 (16/5000 = 0.3%):

    • datatable-1.0.0
    • frozendict-2.1.1
    • guppy3-3.1.2
    • M2Crypto-0.38.0
    • mecab-python3-1.0.4
    • mypy-0.910
    • Naked-0.1.31
    • pickle5-0.0.12
    • pycurl-7.44.1
    • PyGObject-3.42.0
    • pysha3-1.0.2
    • python-snappy-0.6.0
    • recordclass-0.16.3
    • scipy-1.7.3
    • zodbpickle-2.2.0
    • zstd-1.5.0.2

    I manually ignored two false positives in 3 projects:

    • "#define __Pyx_SET_SIZE(obj, size) Py_SIZE(obj) = (size)" in Cython
    • "* Py_TYPE(obj) = new_type must be replaced with Py_SET_TYPE(obj, new_type)": comment in psycopg2 and psycopg2-binary

    @vstinner
    Member Author

    vstinner commented Dec 1, 2021

    Oops, sorry, pycurl-7.44.1 and PyGObject-3.42.0 are not affected: they only define the Py_SET_TYPE() macro for backward compatibility. So right now, only 14 projects are affected.

    @vstinner
    Member Author

    vstinner commented Dec 2, 2021

    • zstd-1.5.0.2: src/python-zstd.c

    I proposed a fix upstream: sergey-dryabzhinsky/python-zstd#70

    @vstinner
    Member Author

    vstinner commented Dec 2, 2021

    frozendict-2.1.1

    If I understand correctly, this module is compatible with the PEP-674: it only has to copy the Python 3.11 header files once Python 3.11 is released to port the project to Python 3.11.

    The incompatible code is not part of the "frozendict" implementation, but only in copies of the Python header files (Python 3.6, 3.7, 3.8, 3.9 and 3.10). For example, it contains the frozendict/src/3_10/cpython_src/Include/object.h header: a copy of CPython's Include/object.h file.

    Source code: https://github.com/Marco-Sulla/python-frozendict

    @vstinner
    Member Author

    vstinner commented Dec 2, 2021

    pysha3-1.0.2

    This module must not be used on Python 3.6 and newer, which have built-in support for SHA-3 hash functions. Example:

    $ python3.6
    Python 3.6.15 (default, Sep  5 2021, 00:00:00) 
    >>> import hashlib
    >>> h=hashlib.new('sha3_224'); h.update(b'hello'); print(h.hexdigest())
    b87f88c72702fff1748e58b87e9141a42c0dbedc29a78cb0d4a5cd81

    By the way, building pysha3 on Python 3.11 now fails with:

    Modules/_sha3/backport.inc:78:10: fatal error: pystrhex.h: No such file or directory

    The pystrhex.h header file has been removed in Python 3.11 by bpo-45434. But I don't think that it's worth trying to port pysha3 to Python 3.11 if the module must not be used on Python 3.6 and newer.

    Environment markers can be used to skip the pysha3 dependency on Python 3.6 or newer.

    Example: "pysha3; python_version < '3.6'"

    @vstinner
    Member Author

    vstinner commented Dec 2, 2021

    Naked-0.1.31

    The affected code is only code generated by Cython: the project only has to regenerate its code with a recent Cython version.

    @vstinner
    Member Author

    vstinner commented Dec 2, 2021

    mypy-0.910

    I proposed a fix: python/mypy#11652

    @vstinner
    Member Author

    vstinner commented Dec 6, 2021

    In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG

    I proposed a first PR for Py_TYPE():
    swig/swig#2116

    @vstinner
    Member Author

    vstinner commented Dec 6, 2021

    python-snappy-0.6.0: maybe_resize() in snappy/snappymodule.cc

    I proposed a fix: intake/python-snappy#114

    @vstinner
    Member Author

    In the PyPI top 5000, I found two projects using PyDescr_TYPE() and PyDescr_NAME() as l-value: M2Crypto and mecab-python3. In both cases, it was code generated by SWIG

    I created bpo-46538 "[C API] Make the PyDescrObject structure opaque" to handle PyDescr_NAME() and PyDescr_TYPE() macros. But IMO it's not really worth it to make the PyDescrObject structure opaque. It's just too much work, whereas PyDescrObject is not performance sensitive. It's ok to continue exposing this structure in public for now.

    I will exclude PyDescr_NAME() and PyDescr_TYPE() from the PEP-674.

    @vstinner
    Member Author

    datatable-1.0.0.tar.gz

    I created h2oai/datatable#3231

    @vstinner
    Member Author

    pickle5-0.0.12: pickle5/_pickle.c

    This project is a backport targeting Python 3.7 and older. I'm not sure if it makes sense to update it for Python 3.11.

    It's the same for pysha3 which targets Python <= 3.5.

    @vstinner
    Member Author

    • guppy3-3.1.2: src/sets/bitset.c and src/sets/nodeset.c

    I created: zhuyifei1999/guppy3#40

    @vstinner
    Member Author

    • scipy-1.7.3: scipy/_lib/boost/boost/python/object/make_instance.hpp

    This is a vendored copy of the Boost.Python module, which has already been fixed in Boost 1.78.0 (commit: January 2021) by:
    boostorg/python@500194e

    scipy should just update its scipy/_lib/boost copy.

    @vstinner
    Member Author

    recordclass-0.16.3: lib/recordclass/_dataobject.c + code generated by Cython

    I created: https://bitbucket.org/intellimath/recordclass/pull-requests/1/python-311-support-use-py_set_size

    @vstinner
    Member Author

    • zodbpickle-2.2.0: src/zodbpickle/_pickle_33.c

    Technically, zodbpickle works on Python 3.11 and is not impacted by the Py_SIZE() change.

    _pickle_33.c redefines the Py_SIZE() macro to continue using it as an l-value:
    zopefoundation/zodbpickle@8d99afc

    I proposed a PR to use Py_SET_SIZE() explicitly:
    zopefoundation/zodbpickle#64

    @vstinner vstinner changed the title [C API] PEP 674: Disallow using macros as l-value [C API] PEP 674: Disallow using macros (Py_TYPE and Py_SIZE) as l-value Jan 27, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kigawas

    kigawas commented Jul 27, 2022

    pysha3-1.0.2


    Strictly speaking, pysha3 is not equivalent to hashlib because keccak functions are different. If the improvement in #83720 is done, pysha3 may be completely replaced.

    However, since pystrhex.h was removed in Python 3.11, pycryptodome is the only legit library supporting Ethereum's Keccak.

    @vstinner
    Member Author

    vstinner commented Aug 3, 2022

    However, since pystrhex.h was removed in Python 3.11

    Technically, these functions are still exported and remain available in the internal C API: Include/internal/pycore_strhex.h

    If someone wants a public C API for these functions, they should ask for it. But by default, the private functions must not be used outside Python itself: https://docs.python.org/dev/c-api/stable.html

    It's not hard to copy Python/pystrhex.c (174 lines) or reimplement it.

    @kigawas

    kigawas commented Aug 4, 2022

    @vstinner

    That's not practical because pysha3 is not actively maintained. Many downstream libraries will break on 3.11.

    For anyone searching the errors, just use pycryptodome instead to save your time.

    @vstinner
    Member Author

    vstinner commented Nov 3, 2022

    https://peps.python.org/pep-0674/ was deferred by the SC. For now, I prefer to close this issue.

    @vstinner vstinner closed this as completed Nov 3, 2022