Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Isolate PyModuleDef to Each Interpreter for Extension/Builtin Modules #101758

Closed
ericsnowcurrently opened this issue Feb 9, 2023 · 10 comments
Closed
Assignees
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API topic-subinterpreters type-feature A feature request or enhancement

Comments

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented Feb 9, 2023

Typically each PyModuleDef for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating each PyModuleDef is worth doing, especially if you consider we've already run into problems1 because of m_copy.

The main focus here is on PyModuleDef.m_base.m_copy2 specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization3 (likely the vast majority).

(expand for more context)

PyModuleDef for an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception is m_copy, which is problematic in some cases for modules imported in multiple interpreters.

Note that m_copy is only relevant for legacy (single-phase init) modules, whether builtin and an extension, and only if the module does not support repeated initialization3. It is never relevant for multi-phase init (PEP 489) modules.

  • initialization
    • m_copy is only set by _PyImport_FixupExtensionObject() (and thus indirectly _PyImport_FixupBuiltin() and _imp.create_builtin())
    • _PyImport_FixupExtensionObject() is called by _PyImport_LoadDynamicModuleWithSpec()` when a legacy (single-phase init) extension module is loaded
  • usage
    • m_copy is only used in import_find_extension(), which is only called by _imp.create_builtin() and _imp.create_dynamic() (via the respective importers)

When such a legacy module is imported for the first time, m_copy is set to a new copy of the just-imported module's __dict__, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted from sys.modules and then imported), a new empty module is created and m_copy is [shallow] copied into that object's __dict__.

When m_copy is originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case the m_copy from the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for quite a while.

The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decref's the old m_copy, but that object is "owned" by the first interpreter. This is a problem1.

Furthermore, even if the decref-in-the-wrong-interpreter problem was gone. When m_copy is copied into the new module's __dict__ on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in other interpreters than the first one, would end up with its __dict__ filled with objects not owned by the correct interpreter.


Here are some possible approaches to isolating each module's PyModuleDef to the interpreter that imports it:

  1. keep a copy of PyModuleDef for each interpreter (would _PyRuntimeState.imports.extensions need to move to the interpreter?)
  2. keep just m_copy for/on each interpreter
  3. fix _PyImport_FixupExtensionObject() some other way...

Linked PRs

Footnotes

  1. see https://github.com/python/cpython/pull/101660#issuecomment-1424507393 2

  2. We should probably consider isolating PyModuleDef.m_base.m_index, but for now we simply sync the modules_by_index list of each interpreter. (Also, modules_by_index and m_index are only used for single-phase init modules.)

  3. specifically def->m_size == -1; multi-phase init modules always have def->m_size >= 0; single-phase init modules can also have a non-negative m_size 2

@encukou
Copy link
Member

encukou commented Feb 13, 2023

Well, this is the main reason why single-phase-init modules don't support multiple interpreters well.
Last time I touched this, it was a Pandora's box of worms, and the best solution we could come up with was PEP 489 itself -- designing multi-phase init, and starting to switch to that.
The situation might be better now, but, tread carefully. And note that lots of extensions might have other process-global state, so switching to the multi-phase init API is often the easy part of making things multi-interpreter-friendly.

ericsnowcurrently added a commit that referenced this issue Feb 14, 2023
The new test exercises the most important variants for single-phase init extension modules. We also add some explanation about those variants to import.c.

#101758
ericsnowcurrently added a commit that referenced this issue Feb 15, 2023
This change is almost entirely moving code around and hiding import state behind internal API.  We introduce no changes to behavior, nor to non-internal API.  (Since there was already going to be a lot of churn, I took this as an opportunity to re-organize import.c into topically-grouped sections of code.)  The motivation is to simplify a number of upcoming changes.

Specific changes:

* move existing import-related code to import.c, wherever possible
* add internal API for interacting with import state (both global and per-interpreter)
* use only API outside of import.c (to limit churn there when changing the location, etc.)
* consolidate the import-related state of PyInterpreterState into a single struct field (this changes layout slightly)
* add macros for import state in import.c (to simplify changing the location)
* group code in import.c into sections
*remove _PyState_AddModule()

#101758
ericsnowcurrently added a commit that referenced this issue Feb 15, 2023
…preters (gh-101920)

The test verifies the behavior of single-phase init modules when loaded in multiple interpreters.

#101758
ericsnowcurrently added a commit that referenced this issue Feb 16, 2023
@encukou
Copy link
Member

encukou commented Feb 16, 2023

#101919 removed _PyState_AddModule, which is part of the stable ABI. Please add it back.

@encukou
Copy link
Member

encukou commented Feb 16, 2023

Or can we be absolutely sure no stable ABI extension since Python 3.2 could use it?

@ericsnowcurrently
Copy link
Member Author

tl;dr I'm fine with adding it back (and will do so today).

Thanks for bringing this up! In my mind, it's exceptionally unlikely that anyone is using the function, but obviously not impossible. (more explanation below)

Also, I left this footnote on the PR:

I've also removed _PyState_AddModule() from the stable ABI manifest. It was part of "public" API since PEP 3121 was implemented 15 years ago (before a stable ABI existed) but was removed from limited API 6 years ago and then moved to internal API 4 years ago. Presumably it inadvertently slipped into the stable ABI manifest (with PEP 652) due to PC/python3dll.c.

("6 years ago" was in time for the 3.6 release, meaning the function was exposed as technically public API from 3.2 to 3.5.)

Let me also explain why it's unlikely anyone is using the function. It's an undocumented "private" function corresponding to a similarly named documented public function, where the only differences are that _PyState_AddModule() takes an explicit PyThreadState argument and PyState_AddModule() does a check for if the module is already added. However, it is certainly possible that someone out there is using it in an extension they built against Python 3.2-3.5. I'm always surprised (but shouldn't be any more) by the otherwise-highly-unlikely (and inadvisable) ways in which users use CPython, especially the C-API. 😄

I'll also point out that PEP 384 says "All functions starting with _Py are not available to applications.", which I'm guessing is why @serhiy-storchaka moved _PyState_AddMethod() to Include/cpython/pystate.h in 2016 (gh-71087). I'm sure this point from PEP 384 has been covered since then (e.g. by the PEP 652 discussion and the recent discussion about "unstable" API) but at the least it is clear what the intention was with the stable ABI.

All that said, while there's a limit to how conservative we can be with the stable ABI, from a practicality perspective, in this case there really is no motivation to remove this function from the stable ABI other than to clean up. So I don't mind adding it back. I'll do that today.

ericsnowcurrently added a commit that referenced this issue Feb 16, 2023
We're adding the function back, only for the stable ABI symbol and not as any form of API. I had removed it yesterday.

This undocumented "private" function was added with the implementation for PEP 3121 (3.0, 2007) for internal use and later moved out of the limited API (3.6, 2016) and then into the internal API (3.9, 2019). I removed it completely yesterday, including from the stable ABI manifest (where it was added because the symbol happened to be exported). It's unlikely that anyone is using _PyState_AddModule(), especially any stable ABI extensions built against 3.2-3.5, but we're playing it safe.

#101758
ericsnowcurrently added a commit that referenced this issue Feb 17, 2023
…h-101969)

gh-101891 is causing failures under `$> ./python -m test test_imp -R 3:3`.  Furthermore, with that fixed, "test_singlephase_variants" is leaking references.  This change addresses the first part, but skips the leaking tests until we can follow up with a fix.

#101758
@gpshead
Copy link
Member

gpshead commented Feb 17, 2023

Are the test_imp failures in test_singlephase_variants tests on the Refleaks buildbots related to this work?

@ericsnowcurrently
Copy link
Member Author

Yeah. The failures should be fixed now though.

@ericsnowcurrently
Copy link
Member Author

To be clear, I've skipped the leaky/broken tests in test_imp. If there are other refleak failures, they should not be related to the work I've been doing.

@encukou
Copy link
Member

encukou commented Feb 20, 2023

So I don't mind adding it back. I'll do that today.

Thank you!
Sorry for my terse comments before, I was short on time. I'll add some general thoughts.

Let me also explain why it's unlikely anyone is using the function

I do think it's possible that nothing uses this, and maybe we can even prove nothing does, but proving it is harder than keeping the function.
And if we're wrong, we might not hear about it -- some engineer with a proprietary codebase will grumble about Python not keeping promises, and stay on an old CPython version.

Anyway, here's a plausible breaking scenario: something like a language binding gets the list of exposed ABI symbols, and re-exports all of them, regardless of whether they're usable. Removing any symbol will break them.

I'll also point out that PEP 384 says "All functions starting with _Py are not available to applications.",

Yeah, but the trouble is that if they were called e.g. by a public macro, they need to stay in the ABI. Who knows how it's been used over the years -- it's possible to find out, but the archaeology is usually more trouble than keeping the function in.

For _PyState_AddMethod specifically, it does look like we didn't keep the stable ABI promise in the past. But that's no reason to do it again.

there's a limit to how conservative we can be with the stable ABI,

FWIW, in cases like these, turning the function into a stup that always raises is acceptable (as a last resort of course), since that doesn't technically break the ABI. (It would break specific code paths that the new CPython presumably can't support, rather than the entire extension being not importable due to a missing symbol.)

@serhiy-storchaka
Copy link
Member

Did you meant _PyState_AddModule? There is no _PyState_AddMethod.

Note that the current _PyState_AddModule differs from _PyState_AddModule in 3.2.

In 3.2:

PyAPI_FUNC(int) _PyState_AddModule(PyObject*, struct PyModuleDef*);

Current:

PyAPI_FUNC(int) _PyState_AddModule(PyThreadState*, PyObject*, PyModuleDef*);

I am sure _PyState_AddModule was not in the stable ABI defined in PEP 384. The fact that its declaration was visible to users was a bug, fixed years ago. And there were no any complains about breaking programs.

@ericsnowcurrently
Copy link
Member Author

There really isn't much we can do to solve this, so I'm closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API topic-subinterpreters type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

4 participants