Fixing Copy on Writes from reference counting and immortal objects #84436

eduardo-elizondo · 2020-04-11T15:40:22Z

BPO	40255
Nosy	@tim-one, @nascheme, @gpshead, @pitrou, @vstinner, @carljm, @DinoV, @methane, @markshannon, @ericsnowcurrently, @zooba, @corona10, @pablogsal, @eduardo-elizondo, @remilapeyre, @shihai1991
PRs	gh-84436: Implement Immortal Objects #19474 bpo-43503: Make limited API objects effectively immutable. #24828 bpo-40255: Implement Immortal Instances - Optimization 1 #31488 bpo-40255: Implement Immortal Instances - Optimization 2 #31489 bpo-40255: Implement Immortal Instances - Optimization 3 #31490 bpo-40255: Implement Immortal Instances - Optimizations Combined #31491

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-04-11.15:40:21.809>
labels = ['interpreter-core', 'type-feature', '3.10']
title = 'Fixing Copy on Writes from reference counting and immortal objects'
updated_at = <Date 2022-02-22.16:42:58.428>
user = 'https://github.com/eduardo-elizondo'

bugs.python.org fields:

activity = <Date 2022-02-22.16:42:58.428>
actor = 'eric.snow'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2020-04-11.15:40:21.809>
creator = 'eelizondo'
dependencies = []
files = []
hgrepos = []
issue_num = 40255
keywords = ['patch']
message_count = 69.0
messages = ['366216', '366221', '366224', '366325', '366326', '366328', '366329', '366330', '366331', '366341', '366344', '366345', '366387', '366388', '366390', '366402', '366403', '366404', '366406', '366407', '366408', '366412', '366413', '366416', '366422', '366423', '366428', '366456', '366518', '366519', '366520', '366536', '366561', '366565', '366566', '366573', '366577', '366578', '366579', '366593', '366605', '366623', '366624', '366625', '366628', '366629', '366744', '366818', '366825', '366826', '366828', '366834', '366835', '366837', '366852', '379644', '379871', '379917', '388525', '388526', '388760', '410592', '410613', '412956', '413080', '413116', '413693', '413694', '413717']
nosy_count = 16.0
nosy_names = ['tim.peters', 'nascheme', 'gregory.p.smith', 'pitrou', 'vstinner', 'carljm', 'dino.viehland', 'methane', 'Mark.Shannon', 'eric.snow', 'steve.dower', 'corona10', 'pablogsal', 'eelizondo', 'remi.lapeyre', 'shihai1991']
pr_nums = ['19474', '24828', '31488', '31489', '31490', '31491']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue40255'
versions = ['Python 3.10']

Linked PRs

eduardo-elizondo · 2020-04-11T15:40:22Z

Copy on writes are a big problem in large Python application that rely on multiple processes sharing the same memory.

With the implementation of gc.freeze, we've attenuated the problem by removing the CoW coming from the GC Head. However, reference counting still causes CoW.

This introduces Immortal Instances which allows the user to bypass reference counting on specific objects and avoid CoW on forked processes.

Immortal Instances are specially useful for applications that cache heap objects in shared memory which live throughout the entire execution (i.e modules, code, bytecode, etc.)

gpshead · 2020-04-11T22:37:58Z

Thanks for sharing! It's good to have a patch implementing for others who might need it to try out.

We experimented with an implementation of this on CPython 2.7 that we called Eternal Refcounts for YouTube many years ago. For the same reason (saving memory in forked workers by not dirtying pages of known needed objects via refcount changes).

I don't have that old patch handy right now, but looking at yours it was the same concept: A magic refcount value to branch based on.

We wound up not deploying it or pushing for it in CPython because the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high. I'd have to go digging to find numbers but it _believe_ it was on the order of a 10% slowdown on real world application code. (microbenchmarks don't tell a good story, the python performance test suite should)

Given that most people's applications don't fork workers, I do not expect to see such an implementation ever become the default. It is a very appropriate hack, but the use case is limited to applications that don't use threads, are on POSIX, and always fork() many workers.

Also note that this is an ABI change as those INCREF and DECREF definitions are intentionally public .h file macros or inlines and thus included within any compiledcode rather than being an expensive function call elsewhere (vstinner's new alternate higher level API proposal presumably changes this). If an interpreter with this loaded a binary using the CPython API built against headers without it, your refcounts could get messed up.

eduardo-elizondo · 2020-04-12T01:01:33Z

the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high.

Yeah, makes sense. I guess it really depends on the specific profile of your application.

For Instagram this was an overall net positive change and we still have it in prod. The amount of page faults from Copy on Writes was so large that reducing it resulted in a net CPU win (even with the added branching). And of course, a huge reduction in memory usage.

Microbenchmarks don't tell a good story, the python performance test suite should

Agreed. I only added the Richards benchmark as a reference. I'm hoping someone can pick it up and have more concrete numbers on an average Python workload.

Given that most people's applications don't fork workers, I do not expect to see such an implementation ever become the default.

In any case, I gated this change with ifdefs. In case we don't have it by default, we can always can easily enable it with a simple -DPy_IMMORTAL_INSTANCES flag to the compiler.

Also note that this is an ABI change as those INCREF and DECREF definitions are intentionally public .h file

This has indeed been a bit of a pain for us as well. Due to how our tooling is set-up, there's a small number of third party libraries that are still causing Copy on Writes. Fortunately, that's the only drawback. Even if your immortalized object goes through an extension that has a different Py_{DECREF,INCREF} implementation, the reference count will still be so large that it will never be deallocated.

pablogsal · 2020-04-13T19:23:27Z

I run the pyperformance test suite with PGO + LTO + full cpu isolation in the speed.python.org machine and these were the results:

+-------------------------+--------------------------------------+---------------------------------------+
| Benchmark | master | immortal_refs_patch |
+=========================+======================================+=======================================+
| pidigits | 289 ms | 290 ms: 1.01x slower (+1%) |
+-------------------------+--------------------------------------+---------------------------------------+
| pickle | 20.2 us | 20.3 us: 1.01x slower (+1%) |
+-------------------------+--------------------------------------+---------------------------------------+
| xml_etree_parse | 253 ms | 258 ms: 1.02x slower (+2%) |
+-------------------------+--------------------------------------+---------------------------------------+
| json_loads | 52.0 us | 53.1 us: 1.02x slower (+2%) |
+-------------------------+--------------------------------------+---------------------------------------+
| scimark_fft | 708 ms | 723 ms: 1.02x slower (+2%) |
+-------------------------+--------------------------------------+---------------------------------------+
| python_startup_no_site | 7.83 ms | 8.04 ms: 1.03x slower (+3%) |
+-------------------------+--------------------------------------+---------------------------------------+
| scimark_sparse_mat_mult | 9.28 ms | 9.57 ms: 1.03x slower (+3%) |
+-------------------------+--------------------------------------+---------------------------------------+
| unpickle | 28.0 us | 28.9 us: 1.03x slower (+3%) |
+-------------------------+--------------------------------------+---------------------------------------+
| regex_effbot | 4.70 ms | 4.86 ms: 1.03x slower (+3%) |
+-------------------------+--------------------------------------+---------------------------------------+
| xml_etree_iterparse | 178 ms | 184 ms: 1.04x slower (+4%) |
+-------------------------+--------------------------------------+---------------------------------------+
| python_startup | 11.5 ms | 12.0 ms: 1.04x slower (+4%) |
+-------------------------+--------------------------------------+---------------------------------------+
| scimark_monte_carlo | 212 ms | 221 ms: 1.04x slower (+4%) |
+-------------------------+--------------------------------------+---------------------------------------+
| pathlib | 38.6 ms | 40.5 ms: 1.05x slower (+5%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sqlite_synth | 7.85 us | 8.26 us: 1.05x slower (+5%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sqlalchemy_imperative | 62.9 ms | 66.2 ms: 1.05x slower (+5%) |
+-------------------------+--------------------------------------+---------------------------------------+
| richards | 138 ms | 145 ms: 1.05x slower (+5%) |
+-------------------------+--------------------------------------+---------------------------------------+
| crypto_pyaes | 199 ms | 210 ms: 1.06x slower (+6%) |
+-------------------------+--------------------------------------+---------------------------------------+
| nbody | 249 ms | 265 ms: 1.06x slower (+6%) |
+-------------------------+--------------------------------------+---------------------------------------+
| raytrace | 937 ms | 995 ms: 1.06x slower (+6%) |
+-------------------------+--------------------------------------+---------------------------------------+
| deltablue | 13.4 ms | 14.2 ms: 1.06x slower (+6%) |
+-------------------------+--------------------------------------+---------------------------------------+
| tornado_http | 276 ms | 295 ms: 1.07x slower (+7%) |
+-------------------------+--------------------------------------+---------------------------------------+
| scimark_lu | 247 ms | 264 ms: 1.07x slower (+7%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sqlalchemy_declarative | 280 ms | 300 ms: 1.07x slower (+7%) |
+-------------------------+--------------------------------------+---------------------------------------+
| dulwich_log | 143 ms | 153 ms: 1.07x slower (+7%) |
+-------------------------+--------------------------------------+---------------------------------------+
| chaos | 210 ms | 226 ms: 1.07x slower (+7%) |
+-------------------------+--------------------------------------+---------------------------------------+
| 2to3 | 594 ms | 639 ms: 1.08x slower (+8%) |
+-------------------------+--------------------------------------+---------------------------------------+
| unpickle_list | 6.60 us | 7.10 us: 1.08x slower (+8%) |
+-------------------------+--------------------------------------+---------------------------------------+
| float | 213 ms | 229 ms: 1.08x slower (+8%) |
+-------------------------+--------------------------------------+---------------------------------------+
| pyflate | 1.27 sec | 1.37 sec: 1.08x slower (+8%) |
+-------------------------+--------------------------------------+---------------------------------------+
| go | 481 ms | 522 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| pickle_pure_python | 966 us | 1.05 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| meteor_contest | 175 ms | 191 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| genshi_xml | 148 ms | 161 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| chameleon | 20.6 ms | 22.5 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| nqueens | 191 ms | 208 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| telco | 14.6 ms | 16.0 ms: 1.09x slower (+9%) |
+-------------------------+--------------------------------------+---------------------------------------+
| logging_format | 22.1 us | 24.2 us: 1.10x slower (+10%) |
+-------------------------+--------------------------------------+---------------------------------------+
| regex_compile | 322 ms | 355 ms: 1.10x slower (+10%) |
+-------------------------+--------------------------------------+---------------------------------------+
| hexiom | 16.4 ms | 18.1 ms: 1.10x slower (+10%) |
+-------------------------+--------------------------------------+---------------------------------------+
| json_dumps | 25.6 ms | 28.3 ms: 1.10x slower (+10%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sympy_sum | 397 ms | 439 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sympy_integrate | 42.7 ms | 47.2 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| django_template | 120 ms | 133 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sympy_str | 633 ms | 702 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| xml_etree_generate | 162 ms | 180 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| spectral_norm | 244 ms | 272 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| xml_etree_process | 134 ms | 149 ms: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| logging_simple | 19.3 us | 21.5 us: 1.11x slower (+11%) |
+-------------------------+--------------------------------------+---------------------------------------+
| mako | 28.2 ms | 31.5 ms: 1.12x slower (+12%) |
+-------------------------+--------------------------------------+---------------------------------------+
| fannkuch | 853 ms | 954 ms: 1.12x slower (+12%) |
+-------------------------+--------------------------------------+---------------------------------------+
| sympy_expand | 983 ms | 1.11 sec: 1.13x slower (+13%) |
+-------------------------+--------------------------------------+---------------------------------------+
| genshi_text | 64.0 ms | 72.9 ms: 1.14x slower (+14%) |
+-------------------------+--------------------------------------+---------------------------------------+
| unpack_sequence | 123 ns | 142 ns: 1.16x slower (+16%) |
+-------------------------+--------------------------------------+---------------------------------------+
| logging_silent | 311 ns | 364 ns: 1.17x slower (+17%) |
+-------------------------+--------------------------------------+---------------------------------------+
| unpickle_pure_python | 581 us | 683 us: 1.17x slower (+17%) |
+-------------------------+--------------------------------------+---------------------------------------+

Not significant (2): pickle_list; regex_dna; pickle_dict; scimark_sor; regex_v8

pablogsal · 2020-04-13T19:31:50Z

System state
============

CPU: use 12 logical CPUs: 1,3,5,7,9,11,13,15,17,19,21,23
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: Isolated CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
Linux scheduler: RCU disabled on CPUs (12/24): 1,3,5,7,9,11,13,15,17,19,21,23
CPU Frequency: 0,2,4,6,8,10,12,14,16,18,20,22=min=1600 MHz, max=3333 MHz; 1,3,5,7,9,13,15,17,19,21,23=min=max=3333 MHz
Turbo Boost (MSR): CPU 1,3,5,7,9,11,13,15,17,19,21,23: disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0,2,4,6,8,10,12,14,16,18,20,22
IRQ affinity: IRQ affinity: IRQ 0,2=CPU 0-23; IRQ 1,3-15,17,20,22-23,28-51=CPU 0,2,4,6,8,10,12,14,16,18,20,22

zooba · 2020-04-13T19:35:16Z

If it's easy to set up, it might be interesting to see if there's a difference if (e.g.) all interned strings are marked as immortal, or all small ints. Or type objects. Maybe set a flag during initialization and treat everything created before that as immortal.

There's definitely going to be overhead, but assuming the branch is there we may as well use it to optimise our own operations.

pablogsal · 2020-04-13T20:06:53Z

Thanks, Eddie for sharing the patch. I think this is an interesting discussion: for instance, in the core dev sprint at Microsoft I recall that we had a quick discussion about having immortal references.

After analyzing the patch, and even assuming that we add conditional compilation, here are some of the things I feel uneasy about:

Anything that is touched by the immortal object will be leaked. This can also happen in obscure ways
if reference cycles are created.
As Eddie mentions in the PR, this does not fully cover all cases as objects that become tracked by the GC
after they are modified (for instance, dicts and tuples that only contain immutable objects). Those objects will
still, participate in reference counting after they start to be tracked.
As Gregory mentions, if immortal objects are handed to extension modules compiled with the other version of the
macros, the reference count can be corrupted. This may break the garbage collector algorithm that relies on the
the balance between strong references between objects and its reference count to do the calculation of the isolated cycles.
This without mentioning that this defies the purpose of the patch in those cases, as a single modification of the reference
the count will trigger the copy of the whole memory page where the object lives.
The patch modifies very core mechanics of Python for the benefit of a restricted set of users (as Gregory mentions): no
threads, POSIX and forking.
Is not clear to me that leaking memory is the best way to solve this problem compared to other solutions (like having the
the reference count of the objects separated from them).
objects separated from them.

There is a trend right now to try to remove immortal objects (started by Victor and others) like static types and transforming them to heap types. I am afraid that having the restriction that the interpreter and the gc needs to be aware
of immortal types can raise considerably the maintenance costs of already complicated pieces of the CPython VM.

pablogsal · 2020-04-13T20:10:09Z

If it's easy to set up, it might be interesting to see if there's a difference if (e.g.) all interned strings are marked as immortal, or all small ints. Or type objects. Maybe set a flag during initialization and treat everything created before that as immortal.

I can look into it, the main problem is that every run of the pyperformance test suite takes several hours, even more with --full-pgo :( I will try to set up some automation for that when I have some free cycles

zooba · 2020-04-13T20:27:52Z

There is a trend right now to try to remove immortal objects (started by Victor and others) like static types and transforming them to heap types.

This isn't actually about removing immortal objects, but removing *mutable* objects that would be shared between subinterpreters. Making some objects immortal, such as interned strings or stateless singletons, would actually benefit this work, as they could then be safely shared between subinterpreters.

But as you say, the added complexity here is starting to seem like it'll cost more than it's worth...

vstinner · 2020-04-13T22:45:41Z

The idea of immortal objects was proposed to solve the bpo-39511 problem.

I dislike immortable for different reasons. Here are some.

Gregory:

We wound up not deploying it or pushing for it in CPython because the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high.

Exactly. Copy-on-Write problem affects a minority of users. If PR 19474 is enabled by default, all users will pay the overhead even if they never call fork() without exec().

1.17x slower on logging_silent or unpickle_pure_python is a very expensive price :-(

Pablo:

Anything that is touched by the immortal object will be leaked. This can also happen in obscure ways if reference cycles are created.

gc.freeze() has a similar issue, no? This function also comes from Instagram specific use case ;-)

Steve:

This isn't actually about removing immortal objects, but removing *mutable* objects that would be shared between subinterpreters. Making some objects immortal, such as interned strings or stateless singletons, would actually benefit this work, as they could then be safely shared between subinterpreters.

From what I understood, we simply cannot share any object (mutable or not) between two subinterpreters. Otherwise, we will need to put a lock on all Py_INCREF/Py_DECREF operation which would kill performances of running multiple interpreters in parallel. Join bpo-39511 discussion.

pablogsal · 2020-04-13T22:51:55Z

gc.freeze() has a similar issue, no? This function also comes from Instagram specific use case ;-)

Not really because this patch is much more impacting. gc.freeze() moves objects into a permanent generation making them not participate in the GC so the only leaks are cycle-based. With this patch immortal object will leak every strong reference that they have even if they don't participate in cycles (but also if they participate in cycles).

vstinner · 2020-04-13T22:54:02Z

I would be more interested by an experiment to move ob_refcnt outside PyObject to solve the Copy-on-Write issue, especially to measure the impact on performances.

zooba · 2020-04-14T13:55:01Z

> This isn't actually about removing immortal objects, but removing *mutable*
> objects that would be shared between subinterpreters. Making some objects
> immortal, such as interned strings or stateless singletons, would actually
> benefit this work, as they could then be safely shared between
> subinterpreters.

From what I understood, we simply cannot share any object (mutable or not)
between two subinterpreters. Otherwise, we will need to put a lock on all
Py_INCREF/Py_DECREF operation which would kill performances of running
multiple interpreters in parallel. Join bpo-39511 discussion.

That thread consists of everyone else telling you what I've just told you. Not sure that me telling you as well is going to help ;)

pitrou · 2020-04-14T14:23:59Z

I run the pyperformance test suite with PGO + LTO + full cpu isolation in the speed.python.org machine and these were the results:

Be mindful that synthetic benchmarks are probably a gentle case for branch prediction. Real-world performance on irregular workloads might be worse still (or perhaps better, who knows :-)).

pitrou · 2020-04-14T14:25:48Z

As a separate discussion, I would be interested to know whether the original use case (Eddie's) could be satisfied differently. It probably doesn't belong to this issue, though.

(Eddie, if you want to discuss this, feel free to e-mail me privately. I think Davin Potts - co-maintainer of multiprocessing who recently added some facilities around shared memory - might be interested as well.)

carljm · 2020-04-14T17:02:27Z

Anything that is touched by the immortal object will be leaked. This can also happen in obscure ways if reference cycles are created.

I think this is simply expected behavior if you choose to create immortal objects, and not really an issue. How could you have an immortal object that doesn't keep its strong references alive?

this does not fully cover all cases as objects that become tracked by the GC after they are modified (for instance, dicts and tuples that only contain immutable objects). Those objects will still participate in reference counting after they start to be tracked.

I think the last sentence here is not quite right. An immortalized object will never start participating in reference counting again after it is immortalized.

There are two cases. If at the time of calling immortalize_heap() you have a non-GC-tracked object that is also not reachable from any GC-tracked container, then it will not be immortalized at all, so will be unaffected. This is a side effect of the PR using the GC to find objects to immortalize.

If the non-GC-tracked object is reachable from a GC-tracked object (I believe this is by far the more common case), then it will be immortalized. If it later becomes GC-tracked, it will start participating in GC (but the immortal bit causes it to appear to the GC to have a very high reference count, so GC will never collect it or any cycle it is part of), but that will not cause it to start participating in reference counting again.

if immortal objects are handed to extension modules compiled with the other version of the macros, the reference count can be corrupted

I think the word "corrupted" makes this sound worse than it is in practice. What happens is just that the object is still effectively immortal (because the immortal bit is a very high bit), but the copy-on-write benefit is lost for the objects touched by old extensions.

1.17x slower on logging_silent or unpickle_pure_python is a very expensive price

Agreed. It seems the only way this makes sense is under an ifdef and off by default. CPython does a lot of that for debug features; this might be the first case of doing it for a performance feature?

I would be more interested by an experiment to move ob_refcnt outside PyObject to solve the Copy-on-Write issue

It would certainly be interesting to see results of such an experiment. We haven't tried that for refcounts, but in the work that led to gc.freeze() we did try relocating the GC header to a side location. We abandoned that because the memory overhead of adding a single indirection pointer to every PyObject was too large to even consider the option further. I suspect that this memory overhead issue and/or likely cache locality problems will make moving refcounts outside PyObject look much worse for performance than this immortal-instances patch does.

carljm · 2020-04-14T17:12:21Z

An immortalized object will never start participating in reference counting again after it is immortalized.

Well, "passed to an extension compiled with no-immortal headers" is an exception to this.

But for the "not GC tracked but later becomes GC tracked" case, it will not re-enter reference counting, only the GC.

carljm · 2020-04-14T17:27:00Z

This may break the garbage collector algorithm that relies on the balance between strong references between objects and its reference count to do the calculation of the isolated cycles.

I don't think it really breaks anything. What happens is that the immortal object appears to the GC to have a very large reference count, even after adjusting for within-cycle references. So cycles including an immortal object are always kept alive, which is exactly the behavior one should expect from an immortal object.

DinoV · 2020-04-14T17:30:24Z

I think there's other cases of performance related features being hidden under an ifdef. Computed gotos show up that way, although probably more because it's a compiler extension that's not supported everywhere. Pymalloc is also very similar in that it implies an ABI change as well.

I wonder if it would be worth it to introduce an ABI flag for this as well? On the one hand is it a slightly different contract, on the other hand using extensions that don't support the immortalization actually work just fine.

pablogsal · 2020-04-14T17:42:58Z

I think this is simply expected behavior if you choose to create immortal objects, and not really an issue. How could you have an immortal object that doesn't keep its strong references alive?

I think I didn't express myself very well: I am saying that I perceive this as a problem because when you immortalize the heap at a given time you will also immortalize any strong reference that these objects made afterwards. The immortal property will be "infecting" other objects as strong references are created.

I think the last sentence here is not quite right. An immortalized object will never start participating in reference counting again after it is immortalized.

Yeah, but the objects I mention will not be immortalized. These are objects that were not tracked when the immortalization is done and are not reachable from tracked objects at the moment and become tracked afterwards (for instance, some dictionary that only had immutable objects when the immortalization was done). And this is always very impacting: a single object that participates in refcounting will copy an entire page of memory. Although I agree this is a rare thing is a possible thing that can happen.

I think the word "corrupted" makes this sound worse than it is in practice. What happens is just that the object is still effectively immortal (because the immortal bit is a very high bit), but the copy-on-write benefit is lost for the objects touched by old extensions.

Yeah, I agree that corrupted may sound a bit more dramatic than it should but I think this is still a problem because when it happens (as mentioned before) it affects entire pages and not the single object that we consider.

But for the "not GC tracked but later becomes GC tracked" case, it will not re-enter reference counting, only the GC.

But if said objects (isolated and untracked before and now tracked) acquire strong references to immortal objects, those objects will be visited when the gc starts calculating the isolated cycles and that requires a balanced reference count to work.

zooba · 2020-04-14T17:50:14Z

There hasn't been much said about which kind of objects would be immortal. Could this be a creation-time option? Or a type object (similar to a no-op tp_dealloc)?

If it's something that is known when the object is created, then some of the "infection" concern can be reduced - but if you wanted to arbitrarily make arbitrary objects immortal dynamically then it's a different matter.

Another benefit of knowing at creation time is that immortal objects could be allocated to a different page (or potentially even to read-only shared memory, if that was appropriate).

pablogsal · 2020-04-14T17:51:33Z

There hasn't been much said about which kind of objects would be immortal.

The current patch makes *all objects that are tracked by the gc* immortal, independently of their type (unless I am missing something).

pablogsal · 2020-04-14T17:52:38Z

all objects that are tracked by the gc

also, all the objects that are reachable from objects tracked by the gc.

gpshead · 2020-04-14T18:09:21Z

Marking everything as immortal/eternal after you've loaded your common code & data but before you do your forking is thenorm for this kind of serving system.

You already presumably have a defined lifetime for the processes so they'll likely be set to die within N hours or days. Memory leaks of too many things persisting is a non-issue in this kind of system.

The alternative of trying to pick and choose exactly what (and anything they reference) sticks around is a maintenance nightmare exercise in futility.

This is the implementation of PEP683 Motivation: The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime. Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.

…-104054) This also does some cleanup.

* main: (463 commits) pythongh-104057: Fix direct invocation of test_super (python#104064) pythongh-87092: Expose assembler to unit tests (python#103988) pythongh-97696: asyncio eager tasks factory (python#102853) pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054) pythongh-104057: Fix direct invocation of test_module (pythonGH-104059) pythongh-100458: Clarify Enum.__format__() change of mixed-in types in the whatsnew/3.11.rst (pythonGH-100387) pythongh-104018: disallow "z" format specifier in %-format of byte strings (pythonGH-104033) pythongh-104016: Fixed off by 1 error in f string tokenizer (python#104047) pythonGH-103629: Update Unpack's repr in compliance with PEP 692 (python#104048) pythongh-102799: replace sys.exc_info by sys.exception in inspect and traceback modules (python#104032) Fix typo in "expected" word in few source files (python#104034) pythongh-103824: fix use-after-free error in Parser/tokenizer.c (python#103993) pythongh-104035: Do not ignore user-defined `__{get,set}state__` in slotted frozen dataclasses (python#104041) pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030) pythongh-104036: Fix direct invocation of test_typing (python#104037) pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761) pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897) Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021) pythongh-88496: Fix IDLE test hang on macOS (python#104025) Improve int test coverage (python#104024) ...

* main: pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827) pythongh-65022: Fix description of tuple return value in copyreg (python#103892) pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526) pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962) pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744) pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999) pythonGH-103484: Fix redirected permanently URLs (python#104001) Improve assert_type phrasing (python#104081) pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998) pythonGH-103472: close response in HTTPConnection._tunnel (python#103473) pythongh-88496: IDLE - fix another test on macOS (python#104075) pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074) pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072) pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071)

* main: (760 commits) pythonGH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (pythonGH-104103) pythonGH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (pythonGH-104105) pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827) pythongh-65022: Fix description of tuple return value in copyreg (python#103892) pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526) pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962) pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744) pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999) pythonGH-103484: Fix redirected permanently URLs (python#104001) Improve assert_type phrasing (python#104081) pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998) pythonGH-103472: close response in HTTPConnection._tunnel (python#103473) pythongh-88496: IDLE - fix another test on macOS (python#104075) pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074) pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072) pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071) pythongh-104057: Fix direct invocation of test_super (python#104064) pythongh-87092: Expose assembler to unit tests (python#103988) pythongh-97696: asyncio eager tasks factory (python#102853) pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054) ...

…ecoming immortal

…g immortal (#105195)

…ecoming immortal (pythonGH-105195) (cherry picked from commit a239272) Co-authored-by: Irit Katriel <[email protected]>

…ecoming immortal (python#105195)

…becoming immortal (GH-105195) (#105977) gh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis becoming immortal (GH-105195) (cherry picked from commit a239272) Co-authored-by: Irit Katriel <[email protected]>

bhack · 2024-03-03T23:39:07Z

Any update here? What is the current status?

methane · 2024-03-04T02:13:39Z

I think we can close this issue because we have immortal instance already.

eduardo-elizondo mannequin added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 11, 2020

gpshead added type-feature A feature request or enhancement labels Apr 11, 2020

This was referenced Apr 6, 2023

gh-84436: Add codemod for _PyVarObject_HEAD_IMMORTAL_INIT #103308

Closed

gh-84436: Add static flag in PyLongObject's lv_tag #103403

Closed

This was referenced Apr 25, 2023

gh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() #103817

Closed

gh-84436: Improve Immortalization for Builtin Types #103823

Draft

corona10 added a commit to corona10/cpython that referenced this issue Apr 28, 2023

pythongh-84436 Add intergration C API tests for immortal objects

44b5ac4

This was referenced Apr 28, 2023

gh-84436 Add integration C API tests for immortal objects #103962

Merged

gh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() #104054

Merged

ericsnowcurrently added a commit that referenced this issue May 1, 2023

gh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (gh…

59bc36a

…-104054) This also does some cleanup.

corona10 added a commit to corona10/cpython that referenced this issue May 2, 2023

pythongh-84436 Add intergration C API tests for immortal objects

aea3c18

corona10 added a commit to corona10/cpython that referenced this issue May 2, 2023

pythongh-84436 Add intergration C API tests for immortal objects

644e883

corona10 added a commit that referenced this issue May 2, 2023

gh-84436: Add integration C API tests for immortal objects (gh-103962)

d81ca7e

iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 1, 2023

pythongh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis b…

07355e9

…ecoming immortal

iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 1, 2023

pythongh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis b…

34f3010

…ecoming immortal

bedevere-bot mentioned this issue Jun 1, 2023

gh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis becoming immortal #105195

Merged

bedevere-bot mentioned this issue Jun 21, 2023

[3.12] gh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis becoming immortal (GH-105195) #105977

Merged

iritkatriel added a commit that referenced this issue Jun 21, 2023

gh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis becomin…

a239272

…g immortal (#105195)

bentasker pushed a commit to bentasker/cpython that referenced this issue Jun 23, 2023

pythongh-84436: update docs on Py_None/Py_True/Py_False/Py_Ellipsis b…

40a23a1

…ecoming immortal (python#105195)

bedevere-bot mentioned this issue Aug 3, 2023

GH-84436: Skip refcounting for known immortals #107605

Merged

brandtbucher added a commit that referenced this issue Aug 4, 2023

GH-84436: Skip refcounting for known immortals (GH-107605)

05a824f

This was referenced Mar 3, 2024

Memory Leak Found in Persistent DataLoader pytorch/pytorch#62066

Open

DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes pytorch/pytorch#13246

Open

itamaro closed this as completed Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing Copy on Writes from reference counting and immortal objects #84436

Fixing Copy on Writes from reference counting and immortal objects #84436

eduardo-elizondo mannequin commented Apr 11, 2020 •

edited by bedevere-bot

Loading

eduardo-elizondo mannequin commented Apr 11, 2020

gpshead commented Apr 11, 2020

eduardo-elizondo mannequin commented Apr 12, 2020

pablogsal commented Apr 13, 2020

pablogsal commented Apr 13, 2020

zooba commented Apr 13, 2020

pablogsal commented Apr 13, 2020

pablogsal commented Apr 13, 2020

zooba commented Apr 13, 2020

vstinner commented Apr 13, 2020

pablogsal commented Apr 13, 2020

vstinner commented Apr 13, 2020

zooba commented Apr 14, 2020

pitrou commented Apr 14, 2020

pitrou commented Apr 14, 2020

carljm commented Apr 14, 2020

carljm commented Apr 14, 2020

carljm commented Apr 14, 2020

DinoV commented Apr 14, 2020

pablogsal commented Apr 14, 2020

zooba commented Apr 14, 2020

pablogsal commented Apr 14, 2020

pablogsal commented Apr 14, 2020

gpshead commented Apr 14, 2020

bhack commented Mar 3, 2024

methane commented Mar 4, 2024

Fixing Copy on Writes from reference counting and immortal objects #84436

Fixing Copy on Writes from reference counting and immortal objects #84436

Comments

eduardo-elizondo mannequin commented Apr 11, 2020 • edited by bedevere-bot Loading

Linked PRs

eduardo-elizondo mannequin commented Apr 11, 2020

gpshead commented Apr 11, 2020

eduardo-elizondo mannequin commented Apr 12, 2020

pablogsal commented Apr 13, 2020

pablogsal commented Apr 13, 2020

zooba commented Apr 13, 2020

pablogsal commented Apr 13, 2020

pablogsal commented Apr 13, 2020

zooba commented Apr 13, 2020

vstinner commented Apr 13, 2020

pablogsal commented Apr 13, 2020

vstinner commented Apr 13, 2020

zooba commented Apr 14, 2020

pitrou commented Apr 14, 2020

pitrou commented Apr 14, 2020

carljm commented Apr 14, 2020

carljm commented Apr 14, 2020

carljm commented Apr 14, 2020

DinoV commented Apr 14, 2020

pablogsal commented Apr 14, 2020

zooba commented Apr 14, 2020

pablogsal commented Apr 14, 2020

pablogsal commented Apr 14, 2020

gpshead commented Apr 14, 2020

bhack commented Mar 3, 2024

methane commented Mar 4, 2024

eduardo-elizondo mannequin commented Apr 11, 2020 •

edited by bedevere-bot

Loading