Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Copy on Writes from reference counting and immortal objects #84436

Closed
eduardo-elizondo mannequin opened this issue Apr 11, 2020 · 71 comments
Closed

Fixing Copy on Writes from reference counting and immortal objects #84436

eduardo-elizondo mannequin opened this issue Apr 11, 2020 · 71 comments
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-multiprocessing type-feature A feature request or enhancement

Comments

@eduardo-elizondo
Copy link
Mannequin

eduardo-elizondo mannequin commented Apr 11, 2020

BPO 40255
Nosy @tim-one, @nascheme, @gpshead, @pitrou, @vstinner, @carljm, @DinoV, @methane, @markshannon, @ericsnowcurrently, @zooba, @corona10, @pablogsal, @eduardo-elizondo, @remilapeyre, @shihai1991
PRs
  • gh-84436: Implement Immortal Objects #19474
  • bpo-43503: Make limited API objects effectively immutable. #24828
  • bpo-40255: Implement Immortal Instances - Optimization 1 #31488
  • bpo-40255: Implement Immortal Instances - Optimization 2 #31489
  • bpo-40255: Implement Immortal Instances - Optimization 3 #31490
  • bpo-40255: Implement Immortal Instances - Optimizations Combined #31491
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-04-11.15:40:21.809>
    labels = ['interpreter-core', 'type-feature', '3.10']
    title = 'Fixing Copy on Writes from reference counting and immortal objects'
    updated_at = <Date 2022-02-22.16:42:58.428>
    user = 'https://github.com/eduardo-elizondo'

    bugs.python.org fields:

    activity = <Date 2022-02-22.16:42:58.428>
    actor = 'eric.snow'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2020-04-11.15:40:21.809>
    creator = 'eelizondo'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40255
    keywords = ['patch']
    message_count = 69.0
    messages = ['366216', '366221', '366224', '366325', '366326', '366328', '366329', '366330', '366331', '366341', '366344', '366345', '366387', '366388', '366390', '366402', '366403', '366404', '366406', '366407', '366408', '366412', '366413', '366416', '366422', '366423', '366428', '366456', '366518', '366519', '366520', '366536', '366561', '366565', '366566', '366573', '366577', '366578', '366579', '366593', '366605', '366623', '366624', '366625', '366628', '366629', '366744', '366818', '366825', '366826', '366828', '366834', '366835', '366837', '366852', '379644', '379871', '379917', '388525', '388526', '388760', '410592', '410613', '412956', '413080', '413116', '413693', '413694', '413717']
    nosy_count = 16.0
    nosy_names = ['tim.peters', 'nascheme', 'gregory.p.smith', 'pitrou', 'vstinner', 'carljm', 'dino.viehland', 'methane', 'Mark.Shannon', 'eric.snow', 'steve.dower', 'corona10', 'pablogsal', 'eelizondo', 'remi.lapeyre', 'shihai1991']
    pr_nums = ['19474', '24828', '31488', '31489', '31490', '31491']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40255'
    versions = ['Python 3.10']

    Linked PRs

    @eduardo-elizondo
    Copy link
    Mannequin Author

    eduardo-elizondo mannequin commented Apr 11, 2020

    Copy on writes are a big problem in large Python application that rely on multiple processes sharing the same memory.

    With the implementation of gc.freeze, we've attenuated the problem by removing the CoW coming from the GC Head. However, reference counting still causes CoW.

    This introduces Immortal Instances which allows the user to bypass reference counting on specific objects and avoid CoW on forked processes.

    Immortal Instances are specially useful for applications that cache heap objects in shared memory which live throughout the entire execution (i.e modules, code, bytecode, etc.)

    @eduardo-elizondo eduardo-elizondo mannequin added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 11, 2020
    @gpshead
    Copy link
    Member

    gpshead commented Apr 11, 2020

    Thanks for sharing! It's good to have a patch implementing for others who might need it to try out.

    We experimented with an implementation of this on CPython 2.7 that we called Eternal Refcounts for YouTube many years ago. For the same reason (saving memory in forked workers by not dirtying pages of known needed objects via refcount changes).

    I don't have that old patch handy right now, but looking at yours it was the same concept: A magic refcount value to branch based on.

    We wound up not deploying it or pushing for it in CPython because the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high. I'd have to go digging to find numbers but it _believe_ it was on the order of a 10% slowdown on real world application code. (microbenchmarks don't tell a good story, the python performance test suite should)

    Given that most people's applications don't fork workers, I do not expect to see such an implementation ever become the default. It is a very appropriate hack, but the use case is limited to applications that don't use threads, are on POSIX, and always fork() many workers.

    Also note that this is an ABI change as those INCREF and DECREF definitions are intentionally public .h file macros or inlines and thus included within any compiledcode rather than being an expensive function call elsewhere (vstinner's new alternate higher level API proposal presumably changes this). If an interpreter with this loaded a binary using the CPython API built against headers without it, your refcounts could get messed up.

    @gpshead gpshead added type-feature A feature request or enhancement labels Apr 11, 2020
    @eduardo-elizondo
    Copy link
    Mannequin Author

    eduardo-elizondo mannequin commented Apr 12, 2020

    the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high.

    Yeah, makes sense. I guess it really depends on the specific profile of your application.

    For Instagram this was an overall net positive change and we still have it in prod. The amount of page faults from Copy on Writes was so large that reducing it resulted in a net CPU win (even with the added branching). And of course, a huge reduction in memory usage.

    Microbenchmarks don't tell a good story, the python performance test suite should

    Agreed. I only added the Richards benchmark as a reference. I'm hoping someone can pick it up and have more concrete numbers on an average Python workload.

    Given that most people's applications don't fork workers, I do not expect to see such an implementation ever become the default.

    In any case, I gated this change with ifdefs. In case we don't have it by default, we can always can easily enable it with a simple -DPy_IMMORTAL_INSTANCES flag to the compiler.

    Also note that this is an ABI change as those INCREF and DECREF definitions are intentionally public .h file

    This has indeed been a bit of a pain for us as well. Due to how our tooling is set-up, there's a small number of third party libraries that are still causing Copy on Writes. Fortunately, that's the only drawback. Even if your immortalized object goes through an extension that has a different Py_{DECREF,INCREF} implementation, the reference count will still be so large that it will never be deallocated.

    @pablogsal
    Copy link
    Member

    I run the pyperformance test suite with PGO + LTO + full cpu isolation in the speed.python.org machine and these were the results:

    +-------------------------+--------------------------------------+---------------------------------------+
    | Benchmark | master | immortal_refs_patch |
    +=========================+======================================+=======================================+
    | pidigits | 289 ms | 290 ms: 1.01x slower (+1%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | pickle | 20.2 us | 20.3 us: 1.01x slower (+1%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | xml_etree_parse | 253 ms | 258 ms: 1.02x slower (+2%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | json_loads | 52.0 us | 53.1 us: 1.02x slower (+2%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | scimark_fft | 708 ms | 723 ms: 1.02x slower (+2%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | python_startup_no_site | 7.83 ms | 8.04 ms: 1.03x slower (+3%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | scimark_sparse_mat_mult | 9.28 ms | 9.57 ms: 1.03x slower (+3%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | unpickle | 28.0 us | 28.9 us: 1.03x slower (+3%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | regex_effbot | 4.70 ms | 4.86 ms: 1.03x slower (+3%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | xml_etree_iterparse | 178 ms | 184 ms: 1.04x slower (+4%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | python_startup | 11.5 ms | 12.0 ms: 1.04x slower (+4%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | scimark_monte_carlo | 212 ms | 221 ms: 1.04x slower (+4%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | pathlib | 38.6 ms | 40.5 ms: 1.05x slower (+5%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sqlite_synth | 7.85 us | 8.26 us: 1.05x slower (+5%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sqlalchemy_imperative | 62.9 ms | 66.2 ms: 1.05x slower (+5%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | richards | 138 ms | 145 ms: 1.05x slower (+5%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | crypto_pyaes | 199 ms | 210 ms: 1.06x slower (+6%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | nbody | 249 ms | 265 ms: 1.06x slower (+6%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | raytrace | 937 ms | 995 ms: 1.06x slower (+6%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | deltablue | 13.4 ms | 14.2 ms: 1.06x slower (+6%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | tornado_http | 276 ms | 295 ms: 1.07x slower (+7%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | scimark_lu | 247 ms | 264 ms: 1.07x slower (+7%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sqlalchemy_declarative | 280 ms | 300 ms: 1.07x slower (+7%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | dulwich_log | 143 ms | 153 ms: 1.07x slower (+7%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | chaos | 210 ms | 226 ms: 1.07x slower (+7%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | 2to3 | 594 ms | 639 ms: 1.08x slower (+8%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | unpickle_list | 6.60 us | 7.10 us: 1.08x slower (+8%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | float | 213 ms | 229 ms: 1.08x slower (+8%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | pyflate | 1.27 sec | 1.37 sec: 1.08x slower (+8%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | go | 481 ms | 522 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | pickle_pure_python | 966 us | 1.05 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | meteor_contest | 175 ms | 191 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | genshi_xml | 148 ms | 161 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | chameleon | 20.6 ms | 22.5 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | nqueens | 191 ms | 208 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | telco | 14.6 ms | 16.0 ms: 1.09x slower (+9%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | logging_format | 22.1 us | 24.2 us: 1.10x slower (+10%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | regex_compile | 322 ms | 355 ms: 1.10x slower (+10%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | hexiom | 16.4 ms | 18.1 ms: 1.10x slower (+10%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | json_dumps | 25.6 ms | 28.3 ms: 1.10x slower (+10%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sympy_sum | 397 ms | 439 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sympy_integrate | 42.7 ms | 47.2 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | django_template | 120 ms | 133 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sympy_str | 633 ms | 702 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | xml_etree_generate | 162 ms | 180 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | spectral_norm | 244 ms | 272 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | xml_etree_process | 134 ms | 149 ms: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | logging_simple | 19.3 us | 21.5 us: 1.11x slower (+11%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | mako | 28.2 ms | 31.5 ms: 1.12x slower (+12%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | fannkuch | 853 ms | 954 ms: 1.12x slower (+12%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | sympy_expand | 983 ms | 1.11 sec: 1.13x slower (+13%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | genshi_text | 64.0 ms | 72.9 ms: 1.14x slower (+14%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | unpack_sequence | 123 ns | 142 ns: 1.16x slower (+16%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | logging_silent | 311 ns | 364 ns: 1.17x slower (+17%) |
    +-------------------------+--------------------------------------+---------------------------------------+
    | unpickle_pure_python | 581 us | 683 us: 1.17x slower (+17%) |
    +-------------------------+--------------------------------------+---------------------------------------+

    Not significant (2): pickle_list; regex_dna; pickle_dict; scimark_sor; regex_v8

    @pablogsal
    Copy link
    Member

    System state
    ============

    CPU: use 12 logical CPUs: 1,3,5,7,9,11,13,15,17,19,21,23
    Perf event: Maximum sample rate: 1 per second
    ASLR: Full randomization
    Linux scheduler: Isolated CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
    Linux scheduler: RCU disabled on CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
    CPU Frequency: 0,2,4,6,8,10,12,14,16,18,20,22=min=1600 MHz, max=3333 MHz; 1,3,5,7,9,13,15,17,19,21,23=min=max=3333 MHz
    Turbo Boost (MSR): CPU 1,3,5,7,9,11,13,15,17,19,21,23: disabled
    IRQ affinity: irqbalance service: inactive
    IRQ affinity: Default IRQ affinity: CPU 0,2,4,6,8,10,12,14,16,18,20,22
    IRQ affinity: IRQ affinity: IRQ 0,2=CPU 0-23; IRQ 1,3-15,17,20,22-23,28-51=CPU 0,2,4,6,8,10,12,14,16,18,20,22

    @zooba
    Copy link
    Member

    zooba commented Apr 13, 2020

    If it's easy to set up, it might be interesting to see if there's a difference if (e.g.) all interned strings are marked as immortal, or all small ints. Or type objects. Maybe set a flag during initialization and treat everything created before that as immortal.

    There's definitely going to be overhead, but assuming the branch is there we may as well use it to optimise our own operations.

    @pablogsal
    Copy link
    Member

    Thanks, Eddie for sharing the patch. I think this is an interesting discussion: for instance, in the core dev sprint at Microsoft I recall that we had a quick discussion about having immortal references.

    After analyzing the patch, and even assuming that we add conditional compilation, here are some of the things I feel uneasy about:

    • Anything that is touched by the immortal object will be leaked. This can also happen in obscure ways
      if reference cycles are created.

    • As Eddie mentions in the PR, this does not fully cover all cases as objects that become tracked by the GC
      after they are modified (for instance, dicts and tuples that only contain immutable objects). Those objects will
      still, participate in reference counting after they start to be tracked.

    • As Gregory mentions, if immortal objects are handed to extension modules compiled with the other version of the
      macros, the reference count can be corrupted. This may break the garbage collector algorithm that relies on the
      the balance between strong references between objects and its reference count to do the calculation of the isolated cycles.
      This without mentioning that this defies the purpose of the patch in those cases, as a single modification of the reference
      the count will trigger the copy of the whole memory page where the object lives.

    • The patch modifies very core mechanics of Python for the benefit of a restricted set of users (as Gregory mentions): no
      threads, POSIX and forking.

    • Is not clear to me that leaking memory is the best way to solve this problem compared to other solutions (like having the
      the reference count of the objects separated from them).
      objects separated from them.

    There is a trend right now to try to remove immortal objects (started by Victor and others) like static types and transforming them to heap types. I am afraid that having the restriction that the interpreter and the gc needs to be aware
    of immortal types can raise considerably the maintenance costs of already complicated pieces of the CPython VM.

    @pablogsal
    Copy link
    Member

    If it's easy to set up, it might be interesting to see if there's a difference if (e.g.) all interned strings are marked as immortal, or all small ints. Or type objects. Maybe set a flag during initialization and treat everything created before that as immortal.

    I can look into it, the main problem is that every run of the pyperformance test suite takes several hours, even more with --full-pgo :( I will try to set up some automation for that when I have some free cycles

    @zooba
    Copy link
    Member

    zooba commented Apr 13, 2020

    There is a trend right now to try to remove immortal objects (started by Victor and others) like static types and transforming them to heap types.

    This isn't actually about removing immortal objects, but removing *mutable* objects that would be shared between subinterpreters. Making some objects immortal, such as interned strings or stateless singletons, would actually benefit this work, as they could then be safely shared between subinterpreters.

    But as you say, the added complexity here is starting to seem like it'll cost more than it's worth...

    @vstinner
    Copy link
    Member

    The idea of immortal objects was proposed to solve the bpo-39511 problem.

    I dislike immortable for different reasons. Here are some.

    Gregory:

    We wound up not deploying it or pushing for it in CPython because the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high.

    Exactly. Copy-on-Write problem affects a minority of users. If PR 19474 is enabled by default, all users will pay the overhead even if they never call fork() without exec().

    1.17x slower on logging_silent or unpickle_pure_python is a very expensive price :-(

    Pablo:

    Anything that is touched by the immortal object will be leaked. This can also happen in obscure ways if reference cycles are created.

    gc.freeze() has a similar issue, no? This function also comes from Instagram specific use case ;-)

    Steve:

    This isn't actually about removing immortal objects, but removing *mutable* objects that would be shared between subinterpreters. Making some objects immortal, such as interned strings or stateless singletons, would actually benefit this work, as they could then be safely shared between subinterpreters.

    From what I understood, we simply cannot share any object (mutable or not) between two subinterpreters. Otherwise, we will need to put a lock on all Py_INCREF/Py_DECREF operation which would kill performances of running multiple interpreters in parallel. Join bpo-39511 discussion.

    @pablogsal
    Copy link
    Member

    gc.freeze() has a similar issue, no? This function also comes from Instagram specific use case ;-)

    Not really because this patch is much more impacting. gc.freeze() moves objects into a permanent generation making them not participate in the GC so the only leaks are cycle-based. With this patch immortal object will leak every strong reference that they have even if they don't participate in cycles (but also if they participate in cycles).

    @vstinner
    Copy link
    Member

    I would be more interested by an experiment to move ob_refcnt outside PyObject to solve the Copy-on-Write issue, especially to measure the impact on performances.

    @zooba
    Copy link
    Member

    zooba commented Apr 14, 2020

    > This isn't actually about removing immortal objects, but removing *mutable*
    > objects that would be shared between subinterpreters. Making some objects
    > immortal, such as interned strings or stateless singletons, would actually
    > benefit this work, as they could then be safely shared between
    > subinterpreters.

    From what I understood, we simply cannot share any object (mutable or not)
    between two subinterpreters. Otherwise, we will need to put a lock on all
    Py_INCREF/Py_DECREF operation which would kill performances of running
    multiple interpreters in parallel. Join bpo-39511 discussion.

    That thread consists of everyone else telling you what I've just told you. Not sure that me telling you as well is going to help ;)

    @pitrou
    Copy link
    Member

    pitrou commented Apr 14, 2020

    I run the pyperformance test suite with PGO + LTO + full cpu isolation in the speed.python.org machine and these were the results:

    Be mindful that synthetic benchmarks are probably a gentle case for branch prediction. Real-world performance on irregular workloads might be worse still (or perhaps better, who knows :-)).

    @pitrou
    Copy link
    Member

    pitrou commented Apr 14, 2020

    As a separate discussion, I would be interested to know whether the original use case (Eddie's) could be satisfied differently. It probably doesn't belong to this issue, though.

    (Eddie, if you want to discuss this, feel free to e-mail me privately. I think Davin Potts - co-maintainer of multiprocessing who recently added some facilities around shared memory - might be interested as well.)

    @carljm
    Copy link
    Member

    carljm commented Apr 14, 2020

    Anything that is touched by the immortal object will be leaked. This can also happen in obscure ways if reference cycles are created.

    I think this is simply expected behavior if you choose to create immortal objects, and not really an issue. How could you have an immortal object that doesn't keep its strong references alive?

    this does not fully cover all cases as objects that become tracked by the GC after they are modified (for instance, dicts and tuples that only contain immutable objects). Those objects will still participate in reference counting after they start to be tracked.

    I think the last sentence here is not quite right. An immortalized object will never start participating in reference counting again after it is immortalized.

    There are two cases. If at the time of calling immortalize_heap() you have a non-GC-tracked object that is also not reachable from any GC-tracked container, then it will not be immortalized at all, so will be unaffected. This is a side effect of the PR using the GC to find objects to immortalize.

    If the non-GC-tracked object is reachable from a GC-tracked object (I believe this is by far the more common case), then it will be immortalized. If it later becomes GC-tracked, it will start participating in GC (but the immortal bit causes it to appear to the GC to have a very high reference count, so GC will never collect it or any cycle it is part of), but that will not cause it to start participating in reference counting again.

    if immortal objects are handed to extension modules compiled with the other version of the macros, the reference count can be corrupted

    I think the word "corrupted" makes this sound worse than it is in practice. What happens is just that the object is still effectively immortal (because the immortal bit is a very high bit), but the copy-on-write benefit is lost for the objects touched by old extensions.

    1.17x slower on logging_silent or unpickle_pure_python is a very expensive price

    Agreed. It seems the only way this makes sense is under an ifdef and off by default. CPython does a lot of that for debug features; this might be the first case of doing it for a performance feature?

    I would be more interested by an experiment to move ob_refcnt outside PyObject to solve the Copy-on-Write issue

    It would certainly be interesting to see results of such an experiment. We haven't tried that for refcounts, but in the work that led to gc.freeze() we did try relocating the GC header to a side location. We abandoned that because the memory overhead of adding a single indirection pointer to every PyObject was too large to even consider the option further. I suspect that this memory overhead issue and/or likely cache locality problems will make moving refcounts outside PyObject look much worse for performance than this immortal-instances patch does.

    @carljm
    Copy link
    Member

    carljm commented Apr 14, 2020

    An immortalized object will never start participating in reference counting again after it is immortalized.

    Well, "passed to an extension compiled with no-immortal headers" is an exception to this.

    But for the "not GC tracked but later becomes GC tracked" case, it will not re-enter reference counting, only the GC.

    @carljm
    Copy link
    Member

    carljm commented Apr 14, 2020

    This may break the garbage collector algorithm that relies on the balance between strong references between objects and its reference count to do the calculation of the isolated cycles.

    I don't think it really breaks anything. What happens is that the immortal object appears to the GC to have a very large reference count, even after adjusting for within-cycle references. So cycles including an immortal object are always kept alive, which is exactly the behavior one should expect from an immortal object.

    @DinoV
    Copy link
    Contributor

    DinoV commented Apr 14, 2020

    I think there's other cases of performance related features being hidden under an ifdef. Computed gotos show up that way, although probably more because it's a compiler extension that's not supported everywhere. Pymalloc is also very similar in that it implies an ABI change as well.

    I wonder if it would be worth it to introduce an ABI flag for this as well? On the one hand is it a slightly different contract, on the other hand using extensions that don't support the immortalization actually work just fine.

    @pablogsal
    Copy link
    Member

    I think this is simply expected behavior if you choose to create immortal objects, and not really an issue. How could you have an immortal object that doesn't keep its strong references alive?

    I think I didn't express myself very well: I am saying that I perceive this as a problem because when you immortalize the heap at a given time you will also immortalize any strong reference that these objects made afterwards. The immortal property will be "infecting" other objects as strong references are created.

    I think the last sentence here is not quite right. An immortalized object will never start participating in reference counting again after it is immortalized.

    Yeah, but the objects I mention will not be immortalized. These are objects that were not tracked when the immortalization is done and are not reachable from tracked objects at the moment and become tracked afterwards (for instance, some dictionary that only had immutable objects when the immortalization was done). And this is always very impacting: a single object that participates in refcounting will copy an entire page of memory. Although I agree this is a rare thing is a possible thing that can happen.

    I think the word "corrupted" makes this sound worse than it is in practice. What happens is just that the object is still effectively immortal (because the immortal bit is a very high bit), but the copy-on-write benefit is lost for the objects touched by old extensions.

    Yeah, I agree that corrupted may sound a bit more dramatic than it should but I think this is still a problem because when it happens (as mentioned before) it affects entire pages and not the single object that we consider.

    But for the "not GC tracked but later becomes GC tracked" case, it will not re-enter reference counting, only the GC.

    But if said objects (isolated and untracked before and now tracked) acquire strong references to immortal objects, those objects will be visited when the gc starts calculating the isolated cycles and that requires a balanced reference count to work.

    @zooba
    Copy link
    Member

    zooba commented Apr 14, 2020

    There hasn't been much said about which kind of objects would be immortal. Could this be a creation-time option? Or a type object (similar to a no-op tp_dealloc)?

    If it's something that is known when the object is created, then some of the "infection" concern can be reduced - but if you wanted to arbitrarily make arbitrary objects immortal dynamically then it's a different matter.

    Another benefit of knowing at creation time is that immortal objects could be allocated to a different page (or potentially even to read-only shared memory, if that was appropriate).

    @pablogsal
    Copy link
    Member

    There hasn't been much said about which kind of objects would be immortal.

    The current patch makes *all objects that are tracked by the gc* immortal, independently of their type (unless I am missing something).

    @pablogsal
    Copy link
    Member

    all objects that are tracked by the gc

    also, all the objects that are reachable from objects tracked by the gc.

    @gpshead
    Copy link
    Member

    gpshead commented Apr 14, 2020

    Marking everything as immortal/eternal after you've loaded your common code & data but before you do your forking is thenorm for this kind of serving system.

    You already presumably have a defined lifetime for the processes so they'll likely be set to die within N hours or days. Memory leaks of too many things persisting is a non-issue in this kind of system.

    The alternative of trying to pick and choose exactly what (and anything they reference) sticks around is a maintenance nightmare exercise in futility.

    ericsnowcurrently pushed a commit that referenced this issue Apr 22, 2023
    This is the implementation of PEP683
    
    Motivation:
    
    The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime.
    
    Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
    corona10 added a commit to corona10/cpython that referenced this issue Apr 28, 2023
    carljm added a commit to carljm/cpython that referenced this issue May 1, 2023
    * main: (463 commits)
      pythongh-104057: Fix direct invocation of test_super (python#104064)
      pythongh-87092: Expose assembler to unit tests (python#103988)
      pythongh-97696: asyncio eager tasks factory (python#102853)
      pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054)
      pythongh-104057: Fix direct invocation of test_module (pythonGH-104059)
      pythongh-100458: Clarify Enum.__format__() change of mixed-in types in the whatsnew/3.11.rst (pythonGH-100387)
      pythongh-104018: disallow "z" format specifier in %-format of byte strings (pythonGH-104033)
      pythongh-104016: Fixed off by 1 error in f string tokenizer (python#104047)
      pythonGH-103629: Update Unpack's repr in compliance with PEP 692 (python#104048)
      pythongh-102799: replace sys.exc_info by sys.exception in inspect and traceback modules (python#104032)
      Fix typo in "expected" word in few source files (python#104034)
      pythongh-103824: fix use-after-free error in Parser/tokenizer.c (python#103993)
      pythongh-104035: Do not ignore user-defined `__{get,set}state__` in slotted frozen dataclasses (python#104041)
      pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030)
      pythongh-104036: Fix direct invocation of test_typing (python#104037)
      pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761)
      pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897)
      Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021)
      pythongh-88496: Fix IDLE test hang on macOS (python#104025)
      Improve int test coverage (python#104024)
      ...
    corona10 added a commit to corona10/cpython that referenced this issue May 2, 2023
    corona10 added a commit to corona10/cpython that referenced this issue May 2, 2023
    carljm added a commit to carljm/cpython that referenced this issue May 2, 2023
    * main:
      pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827)
      pythongh-65022: Fix description of tuple return value in copyreg (python#103892)
      pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526)
      pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962)
      pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744)
      pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999)
      pythonGH-103484: Fix redirected permanently URLs (python#104001)
      Improve assert_type phrasing (python#104081)
      pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998)
      pythonGH-103472: close response in HTTPConnection._tunnel (python#103473)
      pythongh-88496: IDLE - fix another test on macOS (python#104075)
      pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074)
      pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072)
      pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071)
    carljm added a commit to carljm/cpython that referenced this issue May 2, 2023
    * main: (760 commits)
      pythonGH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (pythonGH-104103)
      pythonGH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (pythonGH-104105)
      pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827)
      pythongh-65022: Fix description of tuple return value in copyreg (python#103892)
      pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526)
      pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962)
      pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744)
      pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999)
      pythonGH-103484: Fix redirected permanently URLs (python#104001)
      Improve assert_type phrasing (python#104081)
      pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998)
      pythonGH-103472: close response in HTTPConnection._tunnel (python#103473)
      pythongh-88496: IDLE - fix another test on macOS (python#104075)
      pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074)
      pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072)
      pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071)
      pythongh-104057: Fix direct invocation of test_super (python#104064)
      pythongh-87092: Expose assembler to unit tests (python#103988)
      pythongh-97696: asyncio eager tasks factory (python#102853)
      pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054)
      ...
    iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 1, 2023
    iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 1, 2023
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 21, 2023
    …ecoming immortal (pythonGH-105195)
    
    (cherry picked from commit a239272)
    
    Co-authored-by: Irit Katriel <[email protected]>
    bentasker pushed a commit to bentasker/cpython that referenced this issue Jun 23, 2023
    kumaraditya303 pushed a commit that referenced this issue Jun 26, 2023
    …becoming immortal (GH-105195) (#105977)
    
    gh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis becoming immortal (GH-105195)
    (cherry picked from commit a239272)
    
    Co-authored-by: Irit Katriel <[email protected]>
    @bhack
    Copy link

    bhack commented Mar 3, 2024

    Any update here? What is the current status?

    @methane
    Copy link
    Member

    methane commented Mar 4, 2024

    I think we can close this issue because we have immortal instance already.

    @itamaro itamaro closed this as completed Mar 4, 2024
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-multiprocessing type-feature A feature request or enhancement
    Projects
    Development

    No branches or pull requests