-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing Copy on Writes from reference counting and immortal objects #84436
Comments
Copy on writes are a big problem in large Python application that rely on multiple processes sharing the same memory. With the implementation of This introduces Immortal Instances which allows the user to bypass reference counting on specific objects and avoid CoW on forked processes. Immortal Instances are specially useful for applications that cache heap objects in shared memory which live throughout the entire execution (i.e modules, code, bytecode, etc.) |
Thanks for sharing! It's good to have a patch implementing for others who might need it to try out. We experimented with an implementation of this on CPython 2.7 that we called Eternal Refcounts for YouTube many years ago. For the same reason (saving memory in forked workers by not dirtying pages of known needed objects via refcount changes). I don't have that old patch handy right now, but looking at yours it was the same concept: A magic refcount value to branch based on. We wound up not deploying it or pushing for it in CPython because the CPU performance implications of adding a branch instruction to Py_INCREC and Py_DECREF were, unsurprisingly, quite high. I'd have to go digging to find numbers but it _believe_ it was on the order of a 10% slowdown on real world application code. (microbenchmarks don't tell a good story, the python performance test suite should) Given that most people's applications don't fork workers, I do not expect to see such an implementation ever become the default. It is a very appropriate hack, but the use case is limited to applications that don't use threads, are on POSIX, and always fork() many workers. Also note that this is an ABI change as those INCREF and DECREF definitions are intentionally public .h file macros or inlines and thus included within any compiledcode rather than being an expensive function call elsewhere (vstinner's new alternate higher level API proposal presumably changes this). If an interpreter with this loaded a binary using the CPython API built against headers without it, your refcounts could get messed up. |
Yeah, makes sense. I guess it really depends on the specific profile of your application. For Instagram this was an overall net positive change and we still have it in prod. The amount of page faults from Copy on Writes was so large that reducing it resulted in a net CPU win (even with the added branching). And of course, a huge reduction in memory usage.
Agreed. I only added the Richards benchmark as a reference. I'm hoping someone can pick it up and have more concrete numbers on an average Python workload.
In any case, I gated this change with ifdefs. In case we don't have it by default, we can always can easily enable it with a simple
This has indeed been a bit of a pain for us as well. Due to how our tooling is set-up, there's a small number of third party libraries that are still causing Copy on Writes. Fortunately, that's the only drawback. Even if your immortalized object goes through an extension that has a different Py_{DECREF,INCREF} implementation, the reference count will still be so large that it will never be deallocated. |
I run the pyperformance test suite with PGO + LTO + full cpu isolation in the speed.python.org machine and these were the results: +-------------------------+--------------------------------------+---------------------------------------+ Not significant (2): pickle_list; regex_dna; pickle_dict; scimark_sor; regex_v8 |
System state CPU: use 12 logical CPUs: 1,3,5,7,9,11,13,15,17,19,21,23 |
If it's easy to set up, it might be interesting to see if there's a difference if (e.g.) all interned strings are marked as immortal, or all small ints. Or type objects. Maybe set a flag during initialization and treat everything created before that as immortal. There's definitely going to be overhead, but assuming the branch is there we may as well use it to optimise our own operations. |
Thanks, Eddie for sharing the patch. I think this is an interesting discussion: for instance, in the core dev sprint at Microsoft I recall that we had a quick discussion about having immortal references. After analyzing the patch, and even assuming that we add conditional compilation, here are some of the things I feel uneasy about:
There is a trend right now to try to remove immortal objects (started by Victor and others) like static types and transforming them to heap types. I am afraid that having the restriction that the interpreter and the gc needs to be aware |
I can look into it, the main problem is that every run of the pyperformance test suite takes several hours, even more with --full-pgo :( I will try to set up some automation for that when I have some free cycles |
This isn't actually about removing immortal objects, but removing *mutable* objects that would be shared between subinterpreters. Making some objects immortal, such as interned strings or stateless singletons, would actually benefit this work, as they could then be safely shared between subinterpreters. But as you say, the added complexity here is starting to seem like it'll cost more than it's worth... |
The idea of immortal objects was proposed to solve the bpo-39511 problem. I dislike immortable for different reasons. Here are some. Gregory:
Exactly. Copy-on-Write problem affects a minority of users. If PR 19474 is enabled by default, all users will pay the overhead even if they never call fork() without exec(). 1.17x slower on logging_silent or unpickle_pure_python is a very expensive price :-( Pablo:
gc.freeze() has a similar issue, no? This function also comes from Instagram specific use case ;-) Steve:
From what I understood, we simply cannot share any object (mutable or not) between two subinterpreters. Otherwise, we will need to put a lock on all Py_INCREF/Py_DECREF operation which would kill performances of running multiple interpreters in parallel. Join bpo-39511 discussion. |
Not really because this patch is much more impacting. gc.freeze() moves objects into a permanent generation making them not participate in the GC so the only leaks are cycle-based. With this patch immortal object will leak every strong reference that they have even if they don't participate in cycles (but also if they participate in cycles). |
I would be more interested by an experiment to move ob_refcnt outside PyObject to solve the Copy-on-Write issue, especially to measure the impact on performances. |
That thread consists of everyone else telling you what I've just told you. Not sure that me telling you as well is going to help ;) |
Be mindful that synthetic benchmarks are probably a gentle case for branch prediction. Real-world performance on irregular workloads might be worse still (or perhaps better, who knows :-)). |
As a separate discussion, I would be interested to know whether the original use case (Eddie's) could be satisfied differently. It probably doesn't belong to this issue, though. (Eddie, if you want to discuss this, feel free to e-mail me privately. I think Davin Potts - co-maintainer of multiprocessing who recently added some facilities around shared memory - might be interested as well.) |
I think this is simply expected behavior if you choose to create immortal objects, and not really an issue. How could you have an immortal object that doesn't keep its strong references alive?
I think the last sentence here is not quite right. An immortalized object will never start participating in reference counting again after it is immortalized. There are two cases. If at the time of calling If the non-GC-tracked object is reachable from a GC-tracked object (I believe this is by far the more common case), then it will be immortalized. If it later becomes GC-tracked, it will start participating in GC (but the immortal bit causes it to appear to the GC to have a very high reference count, so GC will never collect it or any cycle it is part of), but that will not cause it to start participating in reference counting again.
I think the word "corrupted" makes this sound worse than it is in practice. What happens is just that the object is still effectively immortal (because the immortal bit is a very high bit), but the copy-on-write benefit is lost for the objects touched by old extensions.
Agreed. It seems the only way this makes sense is under an ifdef and off by default. CPython does a lot of that for debug features; this might be the first case of doing it for a performance feature?
It would certainly be interesting to see results of such an experiment. We haven't tried that for refcounts, but in the work that led to |
Well, "passed to an extension compiled with no-immortal headers" is an exception to this. But for the "not GC tracked but later becomes GC tracked" case, it will not re-enter reference counting, only the GC. |
I don't think it really breaks anything. What happens is that the immortal object appears to the GC to have a very large reference count, even after adjusting for within-cycle references. So cycles including an immortal object are always kept alive, which is exactly the behavior one should expect from an immortal object. |
I think there's other cases of performance related features being hidden under an ifdef. Computed gotos show up that way, although probably more because it's a compiler extension that's not supported everywhere. Pymalloc is also very similar in that it implies an ABI change as well. I wonder if it would be worth it to introduce an ABI flag for this as well? On the one hand is it a slightly different contract, on the other hand using extensions that don't support the immortalization actually work just fine. |
I think I didn't express myself very well: I am saying that I perceive this as a problem because when you immortalize the heap at a given time you will also immortalize any strong reference that these objects made afterwards. The immortal property will be "infecting" other objects as strong references are created.
Yeah, but the objects I mention will not be immortalized. These are objects that were not tracked when the immortalization is done and are not reachable from tracked objects at the moment and become tracked afterwards (for instance, some dictionary that only had immutable objects when the immortalization was done). And this is always very impacting: a single object that participates in refcounting will copy an entire page of memory. Although I agree this is a rare thing is a possible thing that can happen.
Yeah, I agree that corrupted may sound a bit more dramatic than it should but I think this is still a problem because when it happens (as mentioned before) it affects entire pages and not the single object that we consider.
But if said objects (isolated and untracked before and now tracked) acquire strong references to immortal objects, those objects will be visited when the gc starts calculating the isolated cycles and that requires a balanced reference count to work. |
There hasn't been much said about which kind of objects would be immortal. Could this be a creation-time option? Or a type object (similar to a no-op tp_dealloc)? If it's something that is known when the object is created, then some of the "infection" concern can be reduced - but if you wanted to arbitrarily make arbitrary objects immortal dynamically then it's a different matter. Another benefit of knowing at creation time is that immortal objects could be allocated to a different page (or potentially even to read-only shared memory, if that was appropriate). |
The current patch makes *all objects that are tracked by the gc* immortal, independently of their type (unless I am missing something). |
also, all the objects that are reachable from objects tracked by the gc. |
Marking everything as immortal/eternal after you've loaded your common code & data but before you do your forking is thenorm for this kind of serving system. You already presumably have a defined lifetime for the processes so they'll likely be set to die within N hours or days. Memory leaks of too many things persisting is a non-issue in this kind of system. The alternative of trying to pick and choose exactly what (and anything they reference) sticks around is a maintenance nightmare exercise in futility. |
This is the implementation of PEP683 Motivation: The PR introduces the ability to immortalize instances in CPython which bypasses reference counting. Tagging objects as immortal allows up to skip certain operations when we know that the object will be around for the entire execution of the runtime. Note that this by itself will bring a performance regression to the runtime due to the extra reference count checks. However, this brings the ability of having truly immutable objects that are useful in other contexts such as immutable data sharing between sub-interpreters.
…-104054) This also does some cleanup.
* main: (463 commits) pythongh-104057: Fix direct invocation of test_super (python#104064) pythongh-87092: Expose assembler to unit tests (python#103988) pythongh-97696: asyncio eager tasks factory (python#102853) pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054) pythongh-104057: Fix direct invocation of test_module (pythonGH-104059) pythongh-100458: Clarify Enum.__format__() change of mixed-in types in the whatsnew/3.11.rst (pythonGH-100387) pythongh-104018: disallow "z" format specifier in %-format of byte strings (pythonGH-104033) pythongh-104016: Fixed off by 1 error in f string tokenizer (python#104047) pythonGH-103629: Update Unpack's repr in compliance with PEP 692 (python#104048) pythongh-102799: replace sys.exc_info by sys.exception in inspect and traceback modules (python#104032) Fix typo in "expected" word in few source files (python#104034) pythongh-103824: fix use-after-free error in Parser/tokenizer.c (python#103993) pythongh-104035: Do not ignore user-defined `__{get,set}state__` in slotted frozen dataclasses (python#104041) pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030) pythongh-104036: Fix direct invocation of test_typing (python#104037) pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761) pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897) Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021) pythongh-88496: Fix IDLE test hang on macOS (python#104025) Improve int test coverage (python#104024) ...
* main: pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827) pythongh-65022: Fix description of tuple return value in copyreg (python#103892) pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526) pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962) pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744) pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999) pythonGH-103484: Fix redirected permanently URLs (python#104001) Improve assert_type phrasing (python#104081) pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998) pythonGH-103472: close response in HTTPConnection._tunnel (python#103473) pythongh-88496: IDLE - fix another test on macOS (python#104075) pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074) pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072) pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071)
* main: (760 commits) pythonGH-104102: Optimize `pathlib.Path.glob()` handling of `../` pattern segments (pythonGH-104103) pythonGH-104104: Optimize `pathlib.Path.glob()` by avoiding repeated calls to `os.path.normcase()` (pythonGH-104105) pythongh-103822: [Calendar] change return value to enum for day and month APIs (pythonGH-103827) pythongh-65022: Fix description of tuple return value in copyreg (python#103892) pythonGH-103525: Improve exception message from `pathlib.PurePath()` (pythonGH-103526) pythongh-84436: Add integration C API tests for immortal objects (pythongh-103962) pythongh-103743: Add PyUnstable_Object_GC_NewWithExtraData (pythonGH-103744) pythongh-102997: Update Windows installer to SQLite 3.41.2. (python#102999) pythonGH-103484: Fix redirected permanently URLs (python#104001) Improve assert_type phrasing (python#104081) pythongh-102997: Update macOS installer to SQLite 3.41.2. (pythonGH-102998) pythonGH-103472: close response in HTTPConnection._tunnel (python#103473) pythongh-88496: IDLE - fix another test on macOS (python#104075) pythongh-94673: Hide Objects in PyTypeObject Behind Accessors (pythongh-104074) pythongh-94673: Properly Initialize and Finalize Static Builtin Types for Each Interpreter (pythongh-104072) pythongh-104016: Skip test for deeply neste f-strings on wasi (python#104071) pythongh-104057: Fix direct invocation of test_super (python#104064) pythongh-87092: Expose assembler to unit tests (python#103988) pythongh-97696: asyncio eager tasks factory (python#102853) pythongh-84436: Immortalize in _PyStructSequence_InitBuiltinWithFlags() (pythongh-104054) ...
…ecoming immortal (pythonGH-105195) (cherry picked from commit a239272) Co-authored-by: Irit Katriel <[email protected]>
Any update here? What is the current status? |
I think we can close this issue because we have immortal instance already. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: