Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak on executables embedded with 3.13 #113055

Open
neonene opened this issue Dec 13, 2023 · 20 comments
Open

Memory leak on executables embedded with 3.13 #113055

neonene opened this issue Dec 13, 2023 · 20 comments
Labels
pending The issue will be closed if no feedback is provided topic-subinterpreters type-bug An unexpected behavior, bug, or error

Comments

@neonene
Copy link
Contributor

neonene commented Dec 13, 2023

Bug report

Bug description:

Since 67807cf, _testembed.exe (x64 Release) increases the memory usage in the cycle of Py_Initialize and Py_Finalize.

I monitored the Task Manager (taskmgr.exe) on Windows10/11, with a change against Programs/_testembed.c:

- #define INIT_LOOPS 4
+ #define INIT_LOOPS 200

Commands:

  • _testembed.exe test_repeated_simple_init or
  • python -m test test_embed -m test_simple_initialization_api (no log)

With RADIX TREE (as-is)

Loop 67807cf main 3.11.7
100 67MB 210MB 4MB
200 135MB 406MB 4MB

No RADIX TREE (as-is)

Loop 67807cf main 3.11.7
100 10MB 168MB 4MB
200 16MB 336MB 4MB

Recent obmalloc.c looks ready for finalization. Just adding my rough (invalid?) experiment made the leaks even. So, I hope that they share the same issue. Otherwise, ea2c001 or 15d4c9f has another leak. #98359 (comment)

patch
void
_PyInterpreterState_FinalizeAllocatedBlocks(PyInterpreterState *interp)
{
    if (has_own_state(interp)) {
        Py_ssize_t leaked = _PyInterpreterState_GetAllocatedBlocks(interp);
        assert(has_own_state(interp) || leaked == 0);
        interp->runtime->obmalloc.interpreter_leaks += leaked;

+       OMState *state = &interp->obmalloc;
+       for (uint i = 0; i < maxarenas; ++i) {
+           if (allarenas[i].address == 0) {
+               continue;
+           }
+           _PyObject_Arena.free(_PyObject_Arena.ctx,
+                                (void *)allarenas[i].address, ARENA_SIZE);
+           allarenas[i].address = 0;
+           --narenas_currently_allocated;
+       }
+       PyMem_RawFree(allarenas);
    }
}

With RADIX TREE (patched)

Loop 67807cf main
100 42MB 40MB
200 80MB 80MB

No RADIX TREE (patched)

Loop 67807cf main
100 3.5M 3.3M
200 3.7M 3.4M

cc @ericsnowcurrently @nascheme

CPython versions tested on:

3.12, 3.13, CPython main branch: 3aea6c4

Operating systems tested on:

Windows

Linked PRs

@neonene neonene added the type-bug An unexpected behavior, bug, or error label Dec 13, 2023
@nascheme
Copy link
Member

That doesn't look so nice, I can investigate. How are you measuring memory usage and what's the OS? In the previous versions of Python the obmalloc radix tree data structures were in the BSS and so most of RAM used by them was actually virtual and not real. I'm not sure if the move to interpreter state structures lost that behaviour but it could be cause for an increase in (real) memory use. Maybe there is also a memory leak there, e.g. malloc without free.

@neonene
Copy link
Contributor Author

neonene commented Dec 13, 2023

How are you measuring memory usage and what's the OS?

Only the Task Manager on Windows 10 and 11 for me.
The microsoft.com offers VMMap as a small alternative.

@neonene neonene changed the title Memory leak on embedded Windows executable with 3.12 and later Memory leak on Windows executable embedded with 3.12 and later Dec 15, 2023
@nascheme
Copy link
Member

@ericsnowcurrently

I spent some time looking into this today. I suspect the fix is not going to be simple, unfortunately. What was previously global variables in the obmalloc.c module has become part of the PyInterpreterState structure. When an interpreter is freed, memory allocated by obmalloc doesn't get freed. I think that's the leak you are seeing. Previous to 67807cfc it didn't get freed either but it was global across interpreters, got allocated only once and didn't grow.

A possible way to fix this: when free_interpreter() gets called, walk through obmalloc arenas and free them. The pymalloc_print_stats() function is an example of how to walk over the arenas. The obmalloc radix tree nodes would also have to be walked and deallocated.

If you free the arenas and there is allocated blocks inside of them, you can't call pymalloc_free() on those memory blocks. It is maybe not a problem since the interpreter is being destroyed. If an object is passed from one interpreter to another, that would definitely break. If you want to share objects across interpreters, they would need to be allocated from a memory allocator that's global across interpreters.

@nascheme
Copy link
Member

nascheme commented Dec 16, 2023

Building gh-113218 with the address sanitizer and then running ./Programs _testembed test_repeated_simple_init shows it doesn't leak anymore. I'm not sure my fix is totally correct but I think it's something close to what we need to fix this.

@neonene neonene changed the title Memory leak on Windows executable embedded with 3.12 and later Memory leak on executables embedded with 3.12 and later Dec 16, 2023
@neonene
Copy link
Contributor Author

neonene commented Dec 17, 2023

_testembed test_repeated_simple_init shows it doesn't leak anymore

I can also personally confirm that on Windows, including the 32-bit version of Python.

@nascheme
Copy link
Member

I think I've come up with a better fix. See gh-113412.

@neonene
Copy link
Contributor Author

neonene commented Dec 23, 2023

ea2c001 has a memory leak to be fixed, which is found at gh-113412.
I've seen a few reports about likely suspects: #110411 (comment), #113190 (comment)

@nascheme
Copy link
Member

Your test_repeated_simple_init case leaks on Linux as well. With gh-113412 the leak is slower but it leaks without bound, it seems. Using valgrind, I suspect it (at least partially) due to immortal interned strings. I generated the trace with:

PYTHONMALLOC=malloc valgrind --leak-check=full --show-leak-kinds=all --log-file=valgrind.log --num-callers=20 ./Programs/_testembed test_repeated_simple_init

Trace log:

https://gist.github.com/nascheme/fac2dd566f09fe3d75c98134f18c6170

E.g.


==1275635== 502,840 bytes in 9,800 blocks are definitely lost in loss record 421 of 422
==1275635==    at 0x48407B4: malloc (vg_replace_malloc.c:381)
==1275635==    by 0x2A0F00: _PyMem_RawMalloc (obmalloc.c:56)
==1275635==    by 0x2A2568: PyObject_Malloc (obmalloc.c:917)
==1275635==    by 0x2EF536: PyUnicode_New (unicodeobject.c:1238)
==1275635==    by 0x2FBAC3: unicode_decode_utf8 (unicodeobject.c:4701)
==1275635==    by 0x2FC017: PyUnicode_DecodeUTF8Stateful (unicodeobject.c:4834)
==1275635==    by 0x2F0E38: PyUnicode_FromString (unicodeobject.c:1893)
==1275635==    by 0x3135D9: PyUnicode_InternFromString (unicodeobject.c:14943)
==1275635==    by 0x289EBA: PyObject_SetAttrString (object.c:1062)
==1275635==    by 0x28504F: _add_methods_to_object (moduleobject.c:174)
==1275635==    by 0x2857D1: PyModule_FromDefAndSpec2 (moduleobject.c:384)
==1275635==    by 0x3FB5C6: create_builtin (import.c:1396)
==1275635==    by 0x4002F6: _imp_create_builtin (import.c:3400)
==1275635==    by 0x284483: cfunction_vectorcall_O (methodobject.c:512)
==1275635==    by 0x21B736: _PyVectorcall_Call (call.c:273)
==1275635==    by 0x21B94C: _PyObject_Call (call.c:348)
==1275635==    by 0x21BA23: PyObject_Call (call.c:373)
==1275635==    by 0x3856F4: _PyEval_EvalFrameDefault (generated_cases.c.h:1250)
==1275635==    by 0x37E2B1: _PyEval_EvalFrame (pycore_ceval.h:115)
==1275635==    by 0x3AA774: _PyEval_Vector (ceval.c:1771)

And


==1275635== 339,080 bytes in 6,920 blocks are definitely lost in loss record 420 of 422
==1275635==    at 0x48407B4: malloc (vg_replace_malloc.c:381)
==1275635==    by 0x2A0F00: _PyMem_RawMalloc (obmalloc.c:56)
==1275635==    by 0x2A2568: PyObject_Malloc (obmalloc.c:917)
==1275635==    by 0x2EF536: PyUnicode_New (unicodeobject.c:1238)
==1275635==    by 0x2F11FE: _PyUnicode_FromUCS1 (unicodeobject.c:2025)
==1275635==    by 0x2F16F9: PyUnicode_FromKindAndData (unicodeobject.c:2096)
==1275635==    by 0x414B4E: r_object (marshal.c:1157)
==1275635==    by 0x414DC0: r_object (marshal.c:1222)
==1275635==    by 0x415AC6: r_object (marshal.c:1399)
==1275635==    by 0x414DC0: r_object (marshal.c:1222)
==1275635==    by 0x415A8A: r_object (marshal.c:1393)
==1275635==    by 0x416028: read_object (marshal.c:1524)
==1275635==    by 0x416339: PyMarshal_ReadObjectFromString (marshal.c:1641)
==1275635==    by 0x3FCDF0: unmarshal_frozen_code (import.c:2062)
==1275635==    by 0x4008B1: _imp_get_frozen_object_impl (import.c:3575)
==1275635==    by 0x3F8E6B: _imp_get_frozen_object (import.c.h:282)
==1275635==    by 0x284098: cfunction_vectorcall_FASTCALL (methodobject.c:425)
==1275635==    by 0x21B736: _PyVectorcall_Call (call.c:273)
==1275635==    by 0x21B94C: _PyObject_Call (call.c:348)
==1275635==    by 0x21BA23: PyObject_Call (call.c:373)

@nascheme
Copy link
Member

The following change seems to greatly reduce the leaks. The _Py_hashtable_set(INTERNED_STRINGS, s, s) call looks like a problem to me.

--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -14870,6 +14870,7 @@ _PyUnicode_InternInPlace(PyInterpreterState *interp, PyObject **p)
         return;
     }
 
+#if 0
     /* Look in the global cache first. */
     PyObject *r = (PyObject *)_Py_hashtable_get(INTERNED_STRINGS, s);
     if (r != NULL && r != s) {
@@ -14885,6 +14886,7 @@ _PyUnicode_InternInPlace(PyInterpreterState *interp, PyObject **p)
         }
         return;
     }
+#endif
 
     /* Look in the per-interpreter cache. */
     PyObject *interned = get_interned_dict(interp);

@neonene
Copy link
Contributor Author

neonene commented Dec 24, 2023

My memory leak seems to come from _PyUnicode_ClearInterned() when the Py_DEBUG flag is not specified. I'm not sure of the root cause, though.

cpython/Objects/unicodeobject.c

Lines 14952 to 14973 in 0c57454

_PyUnicode_ClearInterned(PyInterpreterState *interp)
{
PyObject *interned = get_interned_dict(interp);
if (interned == NULL) {
return;
}
assert(PyDict_CheckExact(interned));
/* TODO:
* Currently, the runtime is not able to guarantee that it can exit without
* allocations that carry over to a future initialization of Python within
* the same process. i.e:
* ./python -X showrefcount -c 'import itertools'
* [237 refs, 237 blocks]
*
* Therefore, this should remain disabled for until there is a strict guarantee
* that no memory will be left after `Py_Finalize`.
*/
#ifdef Py_DEBUG
/* For all non-singleton interned strings, restore the two valid references
to that instance from within the intern string dictionary and let the
normal reference counting process clean up these instances. */

@neonene
Copy link
Contributor Author

neonene commented Dec 29, 2023

For the record, my memory gain reduced from 2M bytes to 200K bytes per cycle with gh-113412 (non-debug).

Regarding the crash after re-initialization due to the full cleanup by _PyUnicode_ClearInterned():

main
(leaks)
#113218 #113412
datetime (current) crash crash crash
datetime (isolated)
ctypes crash crash crash
_ctypes crash? crash
_lsprof
faulthandler
tracemalloc
_testcapi crash

A dangling pointer in a non-isolable (small) extension could be worked around by making the interned string statically allocated, if any.

Original issue: #100911

@nascheme
Copy link
Member

My previous comment about disabling the INTERNED_STRINGS global hash table was incorrect. I must have built with Py_DEBUG enabled and so I erroneously thought it fixed the leak. You seem to be correct about _PyUnicode_ClearInterned(). There is this comment:

    /* TODO:
     * Currently, the runtime is not able to guarantee that it can exit without
     * allocations that carry over to a future initialization of Python within
     * the same process. i.e:
     *   ./python -X showrefcount -c 'import itertools'
     *   [237 refs, 237 blocks]
     *
     * Therefore, this should remain disabled for until there is a strict guarantee
     * that no memory will be left after `Py_Finalize`.

If Py_DEBUG is enabled, then that function will free interned strings with the flag SSTATE_INTERNED_IMMORTAL. That comment was from Eddie's immortal refcnt change (gh-19474, ea2c001).

@eduardo-elizondo
Copy link
Contributor

eduardo-elizondo commented Dec 31, 2023

Yes, I believe a good portion of the memory leaks are coming from not cleaning up interned strings. This was intentional given that during the time that gh-19474 was merged, there were PyObjects leaks that would be referenced in the next Py_Initialize. Therefore, we couldn't safely clean up these interned strings.

Given that these PyObjects leaks have now been largely fixed, it should be safe to re-enable this code: gh-113601.

cc @ericsnowcurrently

@ericsnowcurrently
Copy link
Member

Context:

(expand)

runtime state and lifecyle

  • CPython's runtime relies on a large volume of global state
  • some global runtime state is statically initialized, but most is initialized when the runtime is (by Py_Initialize()) or is set lazily on first use
  • during runtime finalization (Py_Finalize()) most global runtime state is cleaned up, with resources like memory/files released
  • ideally all resources would be released, especially all allocated memory should be freed (but we haven't been rigorous about it)
  • i.e. the state of the process after finalization should be as close as possible to what the state was before initialization
  • any state that is not cleaned up during finalization effectively leaks
  • likewise, any resources not released during finalization effectively leak (e.g. the blocks of memory obmalloc had gotten from the system)
  • for normal Python usage (i.e. running the "python3" executable) this is not a problem since such leaks only lasts the short time between finalization and when the process terminates
  • it's a different story for embedders

embedders:

  • embedders (i.e. very advanced users) are able to execute arbitrary C code (unrelated to the C-API) both before runtime initialization and after finalization
  • they are also able to go through the initialize/finalize cycle as many times as they want, again with arbitrary C code in between
  • we make a reasonable effort to accommodate such usage (though there are plenty of use cases that need to be ironed out)
  • the state that we don't clean up, and resources we don't release, during finalization impact such users--particularly when we leak memory
  • pre-3.12:
    • some of that un-finalized runtime state was only initialized during the first runtime initialization (e.g. statically initialized globals) and never re-initialized during subsequent runtime initialization
    • thus, such state from the first init would be used as-is after the subsequent init
    • any un-released resources managed by such state would likewise be used as-is after the subsequent init
  • 3.12+:
    • all global runtime state is initialized from scratch during each runtime initialization
    • any state/resources that finalization leaks will not be used after a subsequent init; it will genuinely leak

interpreter lifecycle:

  • the runtime also relies on a fair amount of state specific to the execution context, e.g. PyInterpreterState and PyThreadState (which have been around for well over 20 years)
  • interpreter init/fini mostly follows the same pattern as the global runtime
  • the leak situation applies just as much to interpreters
  • however, we're more thorough about interpreter finalization since they are essentially heap-allocated rather than stored in C globals (with, technically, the exception of the main interpreter in 3.12+)
  • the main interpreter (often the only interpreter) is a special case since it is initialized during global runtime init, finalized during global runtime fini, and has a few unique jobs

consolidating global runtime state

  • over the years, the CPython code base had accrued many thousands of C global variables holding the global runtime state
  • in 2014, when I started working on my multi-core Python project (e.g. eventually PEP 684), I realized all that global runtime state was my main obstacle
  • having global variables spread throughout the code base made finding a solution challenging
  • so in 2017 I created the _PyRuntimeState with the intention of consolidating (pretty much) all the global state there
  • consolidation had a number of other benefits that made the effort justifiable outside my multi-core Python project
  • in 2017, when I added _PyRuntimeState, I also moved an initial set of key runtime state there
  • since then I've been slowly eliminating all the C global variables
  • that effort mostly completed in late 2022
  • pre-3.12:
    • unconsolidated state was initialized the same as it always had been
    • consolidated state was initialized the same way as it had been, but against a struct field instead of a global variable
  • 3.12+:
    • in 2022(?) I introduced a static initializer for _PyRuntime and, on any runtime init after the first one, applied that initializer on _PyRuntime dynamically
    • that made it so that the global runtime state is completely reset during every initialization (any resources leaked by fini would no longer be re-used by a subsequent init)
    • the main interpreter is now embedded in _PyRuntimeState and so is statically allocated and (partially) statically initialized with the rest of _PyRuntime

interpreter isolation

  • the solution I settled on in 2014 for my multi-core Python project was to move the GIL to each interpreter (PEP 684)
  • doing so required that we isolate interpreters from each other as much as possible
  • mostly, this meant following through with consolidating the C global variables and then moving nearly all of them down into each interpreter (PyInterpreterState)
  • we started doing so fairly early on (2018?) and were mostly done by 2022, but I saved some key global state for last (e.g. obmalloc, the GIL)
  • the situation with leaks in global runtime fini translated down to interpreter fini as we moved the relevant state down (e.g. obmalloc)
  • the situation for the main interpreter is mostly the same since its lifecycle is so closely coupled with the global runtime's lifecycle
  • for subinterpreters the leaks are strictly the genuine variety since they are dynamically allocated, freed during finalization, and thus never re-used

immortal objects (PEP 683):

  • the addition of immortal objects in 3.12 solved some isolation problems for us, particularly relative to the static builtin types
  • any object may be made immortal, whether statically allocated or heap-allocated
  • lifetime:
    • the lifetime of an "immortal" object isn't necessarily bound to the process; it may be viable only in the current interpreter or the current runtime
    • using an immortal object outside its supported execution context will result in a segfault (access violation) or
    • however, none of that isn't explicitly detailed anywhere nor supported by the C-API
  • we've made all interned strings immortal (but haven't been cleaning them up during finalization)

extension modules

global state and interpreter isolation:

  • extension modules may store data/objects in C global variables (including indirectly, e.g. an object stored in a dict which is stored in the struct field of a static variable)
  • static types actually fall into this category, so the situation is relatively widespread in the community
  • the data/objects in those globals often end up shared between interpreters, meaning the extension violates interpreter isolation
  • this only matters when subinterpreters are being used (limited currently but potentially much more in the future)
  • the addition of heap types (PEP 384) helped with the case of static types, though the transition has been slow (partly due to performance issues that have been worked out only recently)
  • the addition of module state (PEP 3121) also helped support per-interpreter isolation, but the transition has likewise been slow (since there wasn't much motivation for people to do it)

single-phase init ("legacy" extensions):

  • the extension init function returns the initialized module object
  • there is no expectation/requirement that the extension define heap types or use module state or otherwise avoid using C globals
  • the init function must take care of the extension's PyModuleDef
  • the extension has no opportunity to clean up after itself, neither when the module object is destroyed nor during interpreter/runtime finalization
  • we never close the extension module's handle (dlclose(), FreeLibrary())
  • trying to import the module in an isolated (per-interpreter GIL) subinterpreter causes an ImportError

multi-phase init:

  • multi-phase init (PEP 489, inspired partly by PEP 451) mostly takes us the rest of the way to per-interpreter isolation
  • the extension init function returns the module's PyModuleDef, which the import machinery then uses to load the module
  • by implementing multi-phase init, the extension promises that it will not store any state/objects in C globals, but will define only heap types and only use module state
  • we still never close the extension module's handle (dlclose(), FreeLibrary())

challenges:

  • there is no way to tell if an extension is multi-phase init or single-phase init without calling the init function (e.g. there is no distinct "SOABI" tag for a distinct filename)
  • the init function can allocate memory, create/store objects, start threads, and register callbacks (incl. in C libraries, e.g. readline)
  • multi-phase init functions are not supposed to do any of that but we have no guarantees
  • we cannot blindly close an extension module's handle (especially single-phase init) since it might have stored allocated data or an object in a C global, might have started a thread running some of its C code, or might have registered a C callback
  • an extension (even multi-phase init) may use a problematic C/etc. library:
    • a library that stores state (incl. callbacks) in global variables, thereby breaking interpreter isolation
    • a library that isn't thread-safe (e.g. relative to temporary state stored in static variables), since there may be races under a per-interpreter GIL
  • static types are only partially cleaned up during finalization (e.g. they sometimes(?) hold on to some objects)
  • loading the first time only in the main interpreter might mitigate some of the problems

obmalloc state

  • we have not (ever?) been freeing the system-allocated pages (stored in the obmalloc state) during runtime finalization
  • pre-3.12
    • obmalloc state was initialized statically and runtime finalization did not free the pages
    • all init/fini cycles would share the global obmalloc state and thus system-allocated blocks of memory, so even in embedding scenarios those blocks would mostly not "leak"
    • however, any objects still alive after each fini would never be deallocated, essentially leaking (except where the object was stashed away in a global variable, e.g. inside a static type)
  • 3.12+
    • early in the globals consolidation effort (2018?), I moved the obmalloc state from static global variables to _PyRuntimeState, but it caused problems so we reverted the change
    • in late 2022 I finally circled back and moved the obmalloc state to the global runtime state, in _PyRuntimeState.obmalloc (67807cf)
    • at that point, more or less, we started initializing the obmalloc state from scratch at each init--one global obmalloc state is no longer shared across init/fini cycles and all memory pages from each cycle would fully leak
    • in early 2023, not long before the feature freeze, I isolated the obmalloc state to each interpreter, in PyInterpreterState.obmalloc (df3173d)
    • at this point each interpreter was now leaking (after interp fini) all pages it got from the system (the situation for the main interpreter didn't change much since its lifetime is so coupled to the global runtime's)

With all that in mind, here's where I understand we're at now:

  • the problem started in 3.12.0
  • moving the obmalloc state from static globals to _PyRuntimeState alone would have preserved the pre-3.12 status quo as long as there was a single process-global static initialization
  • however, we started massively leaking between init/fini cycles once we started forcibly initializing it during every global init
  • the move to PyInterpreterState is similarly problematic but only when using subinterpreters
  • simply freeing the system-allocated blocks during finalization isn't sufficient because any extension modules might be holding on to an object that doesn't get replaced during the next init/fini cycle (e.g. a str object stored somewhere in a static type
  • this is all exacerbated by the fact that we can't properly clean up single-phase init modules
  • it's also problematic that interned strings aren't getting properly cleaned up

From there, something along the lines of what @nascheme has proposed seems reasonable. We also need to clean up interned strings as @eduardo-elizondo has proposed.

It would be nice if we could be a bit more proactive about cleaning up single-phase init modules (at the least static types defined therein), but that might not be feasible.

One other option: use a process-global allocator for immortal, immutable objects (e.g. str) we know have process-lifetime. However, that seems like overkill for accommodating a use case (single-phase init modules) that we expect to eventually go away.

@neonene
Copy link
Contributor Author

neonene commented Jan 25, 2024

Regarding interned strings cleanup, see also:

nascheme added a commit that referenced this issue Jan 27, 2024
For interpreters that share state with the main interpreter, this points
to the same static memory structure.  For interpreters with their own
obmalloc state, it is heap allocated.  Add free_obmalloc_arenas() which
will free the obmalloc arenas and radix tree structures for interpreters
with their own obmalloc state.

Co-authored-by: Eric Snow <[email protected]>
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
)

For interpreters that share state with the main interpreter, this points
to the same static memory structure.  For interpreters with their own
obmalloc state, it is heap allocated.  Add free_obmalloc_arenas() which
will free the obmalloc arenas and radix tree structures for interpreters
with their own obmalloc state.

Co-authored-by: Eric Snow <[email protected]>
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
)

For interpreters that share state with the main interpreter, this points
to the same static memory structure.  For interpreters with their own
obmalloc state, it is heap allocated.  Add free_obmalloc_arenas() which
will free the obmalloc arenas and radix tree structures for interpreters
with their own obmalloc state.

Co-authored-by: Eric Snow <[email protected]>
@neonene neonene changed the title Memory leak on executables embedded with 3.12 and later Memory leak on executables embedded with 3.13 Oct 15, 2024
@neonene
Copy link
Contributor Author

neonene commented Oct 15, 2024

I think this can be closed after 3203a74 is backported to 3.13. #113190 is opened for 3.12.

@erlend-aasland
Copy link
Contributor

I think this can be closed after 3203a74 is backported to 3.13. #113190 is opened for 3.12.

What about #113218 then, @nascheme?

@nascheme
Copy link
Member

GH-118618 would fix the most serious memory leak in 3.12. For the last bugfix release of 3.12, @Yhg1s decided it was too large/disruptive to merge a backport and so that PR is closed. Assuming we are not changing our minds on merging that, I think the memory leak due to the obmalloc state will remain unfixed in 3.12. In theory there might be a less invasive and simpler fix for 3.12, I'd have to research that though.

It's getting confusing to keep track of these embedding bugs and what fixes have been backported. There are two classes of them: memory leaks and crashers. The crashers are the more serious ones. I believe we have fixes the known crashers (see GH-124865, GH-124536, GH-125314). For the memory leaks, the obmalloc state one is the worst. The leaks due to interned strings also depend on the mortal string change (49f6beb). Running the 3.12 head with ASAN enabled would show what's still leaking.

In the long term, i think it would be nice to have CPython running with no ASAN warnings. Then we have a build-bot to ensure we don't regress on that.

@Yhg1s
Copy link
Member

Yhg1s commented Oct 21, 2024

FYI, 3.12.7 was not the last bugfix release of 3.12. We have scheduled bugfix releases of 3.12 until April 2025.

@vstinner
Copy link
Member

I cannot reproduce this issue anymore on the main branch:

diff --git a/Programs/_testembed.c b/Programs/_testembed.c
index d15dd519dbf..3f9455a197f 100644
--- a/Programs/_testembed.c
+++ b/Programs/_testembed.c
@@ -38,7 +38,7 @@ char **main_argv;
 #define PROGRAM_NAME L"./_testembed"
 #define PROGRAM_NAME_UTF8 "./_testembed"
 
-#define INIT_LOOPS 4
+#define INIT_LOOPS 100
 
 // Ignore Py_DEPRECATED() compiler warnings: deprecated functions are
 // tested on purpose here.
@@ -220,6 +220,10 @@ static int test_repeated_simple_init(void)
         _testembed_Py_Initialize();
         Py_Finalize();
         printf("Finalized\n"); // Give test_embed some output to check
+
+        char cmd[1000];
+        sprintf(cmd, "grep ^VmRSS /proc/%i/status", getpid());
+        system(cmd);
     }
     return 0;
 }

Output:

--- Loop #1 ---
Finalized
VmRSS:	    9424 kB
--- Loop #2 ---
Finalized
VmRSS:	    9932 kB
--- Loop #3 ---
Finalized
VmRSS:	    9952 kB
(...)
--- Loop #98 ---
Finalized
VmRSS:	    9588 kB
--- Loop #99 ---
Finalized
VmRSS:	    9588 kB
--- Loop #100 ---
Finalized
VmRSS:	    9576 kB

Since 3.12 branch is not going to be fixed, can we close the issue?

@picnixz picnixz added the pending The issue will be closed if no feedback is provided label Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided topic-subinterpreters type-bug An unexpected behavior, bug, or error
Projects
Status: Todo
Development

No branches or pull requests

8 participants