Faster access to per-interpreter globals. #692

Open
markshannon opened this issue Jul 30, 2024 · 4 comments

Comments

@markshannon
Member

markshannon commented Jul 30, 2024

We recently saw a big performance regression on the telco benchmark when the decimal module was moved to multi-phase init.
Accessing state is now much slower than before.
Anecdotally, accessing a global now takes 7 dependent loads instead of 1. (@mdboom do you have a link for this?)

If we observe that C global variables should be replaced by per-interpreter variables, not per-module ones, we can design an API that needs far fewer indirections.

This API is largely stolen from HPy with a few tweaks for better performance. https://docs.hpyproject.org/en/stable/api-reference/hpy-global.html

```c
typedef struct { uintptr_t index; } PyGlobal;
/* Index 0 means "not yet initialized"; PyGlobal_Init() assigns a non-zero index. */
#define PY_GLOBAL_INIT { 0 }
/* Declare a global */
#define PyGLOBAL_DECLARE(NAME) PyGlobal NAME = PY_GLOBAL_INIT

/* Initialize the global. This must be called at least once per process;
 * it is idempotent, so it can be called whenever a module is loaded. */
void PyGlobal_Init(PyGlobal *name);

PyObject *PyGlobal_Load(PyGlobal name);
void PyGlobal_Store(PyGlobal name, PyObject *value);
```
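Since PyGlobal_Init() must be idempotent and must hand out a non-zero index, one Python-free way to sketch it is shown here. It assumes index 0 is reserved to mean "not yet initialized" and that indices come from a hypothetical process-wide counter, next_global_index. This sketch is not thread-safe; a real implementation would need a lock or atomics.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uintptr_t index; } PyGlobal;

/* Assumption: index 0 means "not yet initialized". */
#define PY_GLOBAL_INIT { 0 }
#define PyGLOBAL_DECLARE(NAME) PyGlobal NAME = PY_GLOBAL_INIT

/* Hypothetical process-wide counter; slot 0 is never handed out. */
static uintptr_t next_global_index = 1;

/* Idempotent: only the first call assigns an index, later calls are no-ops.
 * Not thread-safe: two threads racing here could assign two indices. */
void
PyGlobal_Init(PyGlobal *name)
{
    if (name->index == 0) {
        name->index = next_global_index++;
    }
    assert(name->index != 0);
}
```

Because calling PyGlobal_Init() again leaves the index unchanged, a module can call it from every load without tracking whether another interpreter already did.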

Implementation

Each interpreter state holds a reference to an array of PyObject * pointers.
PyGlobal_Init() assigns the global some non-zero index and makes sure that every interpreter has a table large enough to hold that index.
Then load and store can be implemented as follows:

```c
PyObject *
PyGlobal_Load(PyGlobal name)
{
    /* Py_XNewRef: the slot is NULL until the global is first stored. */
    return Py_XNewRef(_PyThreadState_GET()->globals_table[name.index]);
}

void
PyGlobal_Store(PyGlobal name, PyObject *value)
{
    PyObject **table = _PyThreadState_GET()->globals_table;
    PyObject *tmp = table[name.index];
    table[name.index] = Py_NewRef(value);  /* take the new reference first... */
    Py_XDECREF(tmp);                       /* ...then release the old one */
}
```
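To make the table handling concrete, here is a Python-free mock. It assumes a hypothetical Interp struct that owns the table, and a stand-in Obj type whose only field is a reference count (the mock never frees anything). ensure_capacity() plays the role of the "table large enough" step that PyGlobal_Init() would run for each interpreter; global_store() takes the new reference before dropping the old one, which is what makes re-storing the value already in a slot safe under real reference counting.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct { long refcnt; } Obj;

static Obj *new_ref(Obj *o) { o->refcnt++; return o; }
static Obj *xnew_ref(Obj *o) { if (o != NULL) o->refcnt++; return o; }
/* The mock only decrements; a real DECREF would free at zero. */
static void xdecref(Obj *o) { if (o != NULL) o->refcnt--; }

/* Hypothetical per-interpreter state owning the globals table. */
typedef struct {
    Obj **globals_table;
    size_t capacity;
} Interp;

/* Grow the table so that `index` is a valid slot. realloc() does not
 * zero memory, so new slots are cleared to NULL explicitly. */
static int
ensure_capacity(Interp *interp, uintptr_t index)
{
    if (index < interp->capacity) {
        return 0;
    }
    size_t new_cap = interp->capacity ? interp->capacity : 8;
    while (new_cap <= index) {
        new_cap *= 2;
    }
    Obj **t = realloc(interp->globals_table, new_cap * sizeof(Obj *));
    if (t == NULL) {
        return -1;
    }
    memset(t + interp->capacity, 0, (new_cap - interp->capacity) * sizeof(Obj *));
    interp->globals_table = t;
    interp->capacity = new_cap;
    return 0;
}

/* Returns a new reference, or NULL if the slot was never stored. */
static Obj *
global_load(Interp *interp, uintptr_t index)
{
    assert(index < interp->capacity);
    return xnew_ref(interp->globals_table[index]);
}

static void
global_store(Interp *interp, uintptr_t index, Obj *value)
{
    assert(index < interp->capacity);
    Obj **table = interp->globals_table;
    Obj *old = table[index];
    table[index] = new_ref(value);  /* take the new reference first... */
    xdecref(old);                   /* ...then drop the old one */
}
```

Doubling the capacity keeps growth amortized and confined to initialization, so the hot load and store paths never need to check or resize the table.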
@markshannon
Member Author

The above API handles individual objects. We would need something different, but similar, if we want to handle arbitrary blocks of C data (e.g. module state).

For arbitrary data we can't use Py_DECREF, so we need a cleanup callback. That gives this API:

```c
/* Placeholder for the cleanup callback type; assumed to receive the old data. */
typedef void (*funcptr)(void *);

typedef struct { uintptr_t index; funcptr cleanup; } PyGlobal;
#define PY_GLOBAL_INIT { 0, NULL }
/* Declare a global */
#define PyGLOBAL_DECLARE(NAME) PyGlobal NAME = PY_GLOBAL_INIT

/* Initialize the global. This must be called at least once per process;
 * it is idempotent, so it can be called whenever a module is loaded. */
void PyGlobal_Init(PyGlobal *name, funcptr cleanup);

void *PyGlobal_LoadData(PyGlobal name);
void PyGlobal_StoreData(PyGlobal name, void *data);

void *
PyGlobal_LoadData(PyGlobal name)
{
    return _PyThreadState_GET()->globals_table[name.index];
}

void
PyGlobal_StoreData(PyGlobal name, void *data)
{
    void **table = _PyThreadState_GET()->globals_table;
    void *tmp = table[name.index];
    table[name.index] = data;
    if (tmp != NULL) {
        name.cleanup(tmp);  /* release the data being replaced */
    }
}
```
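The cleanup callback's behavior can be checked with a small self-contained mock. It assumes funcptr is void (*)(void *), replaces the per-interpreter table with a hypothetical fixed-size static array, and makes one extra design choice not in the proposal: the callback is skipped when the same pointer is stored again, so re-storing the current data does not free it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed callback signature: receives the data being replaced. */
typedef void (*funcptr)(void *);

typedef struct { uintptr_t index; funcptr cleanup; } PyGlobal;

/* Hypothetical fixed-size table standing in for the per-interpreter one. */
#define TABLE_SIZE 16
static void *globals_table[TABLE_SIZE];

void *
PyGlobal_LoadData(PyGlobal name)
{
    assert(name.index < TABLE_SIZE);
    return globals_table[name.index];
}

void
PyGlobal_StoreData(PyGlobal name, void *data)
{
    assert(name.index < TABLE_SIZE);
    void *old = globals_table[name.index];
    globals_table[name.index] = data;
    /* Clean up only real data, and never the value that was just stored. */
    if (old != NULL && old != data && name.cleanup != NULL) {
        name.cleanup(old);
    }
}

/* Example cleanup: count the calls, then free the block. */
static int cleanup_calls = 0;

static void
free_state(void *p)
{
    cleanup_calls++;
    free(p);
}
```

Storing NULL acts as an explicit "clear", which is also how interpreter shutdown could release each slot.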

@erlend-aasland

@neonene

neonene commented Aug 9, 2024

I'm not a fan of the TLS version of _PyThreadState_GET(), since it has slowed things down on non-Linux OSes, including Windows, which I use. (IIRC, get_state() in obmalloc.c has been one of the bottlenecks.)

@neonene

neonene commented Aug 31, 2024

Also, please consider enhancing METH_METHOD C function calls: python/cpython#123500.
