Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for getting the current context (and do we even want/need it) #268

Open
steve-s opened this issue Dec 2, 2021 · 5 comments
Open

API for getting the current context (and do we even want/need it) #268

steve-s opened this issue Dec 2, 2021 · 5 comments
Milestone

Comments

@steve-s
Copy link
Contributor

steve-s commented Dec 2, 2021

Edit: see summary of the discussion so far: #268 (comment)

Use-cases for an API that would get the current context:

  • legacy (temporary) code where the current context is not available
    • example: numpy C functions exposed to other packages -- unless the calling package is already converted HPy, it does not have a HPy context at hand
    • sharing internal helper functions between already ported and not yet ported code, e.g., for int myhelper(PyObject* x) one would create int myhelper_hpy(HPyContext *ctx, HPy x) used in the new HPy code and int myhelper(PyObject* x) would delegate to that
  • preserving API (long term)
    • pybind11
      • HPy "pure" solution: change the API so that it takes context where necessary (maybe not that many places?)
      • don't change the API, but then we need a way to access the current context
  • other
    • I remember discussion about tp_dealloc?

Some things to consider:

  • should the API acquire GIL (for Pythons with GIL) or assume it is acquired by the caller

Maybe we should first focus on the use-cases. Collect all the relevant use-cases and better understand them.


Some preliminary suggestions:

HPy_GetContext(): simple, but may look like it's OK to "cache" the context somewhere in some global state. May encourage not passing the context around as a parameter to helper functions and such, which should be the best practice.

HPyContext *ctx = HPy_EnterContext();
...work with the ctx...
HPy_LeaveContext(ctx);

Still simple, but I think it communicates that the context should be used for limited period of time. Allows the Python engine to dispose the context in case it's needed or to take whatever action it needs (unpin some objects, ...). Debug mode can check that the contexts created with this API are properly left, but not via the current debug context mechanism, it would have to be a different mechanism, because HPy_EnterContext does not dispatch on the context.

Trampolines:

// inside some regular HPy code, where context is available
mystate->trampoline = HPy_GetContextTrampoline(ctx);

// later on in some different code, where context is not available
mystate->trampoline(myhelper, arg);

// ...
void* myhelper(HPyContext *ctx, void* arg) { ... }

This is more complicated, and requires some boilerplate code, but that could be an advantage, because we want to discourage abuse of this API. Advantage of this approach is that even when we "out of thin air" need to call into HPy code, we are still calling via a trampoline connected with some context, so there does not have to be one global context, i.e., subinterpreters could run on the same thread and we can still distinguish in which subinterpreter we are supposed to run the HPy code. Disadvantage: there has to be some place where to store the trampoline.

All in all I think that the trampolines would be useful only if we wanted to some longer term "get current context" API. If the API is supposed to be available only in some "legacy" mode and discouraged everywhere, maybe we do not need such complex solution.

Alternative would be getting some token from a context and passing that to the HPy_EnterContext:

// inside some regular HPy code, where context is available
mystate->token = HPy_GetContextReference();

// later on in some different code, where context is not available
HPyContext *ctx = HPy_EnterContext(mystate->token);
...work with the ctx...
HPy_LeaveContext(ctx);
@steve-s
Copy link
Contributor Author

steve-s commented Feb 17, 2022

Taking the inspiration from JNI (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html), I have a new proposal: new structure with pointer to function to get the current context, e.g. HPyVM. One will be able to get an instance of HPyVM from HPyContext during module initialization and store it in a C global to be able to later on get the current context through it. Advantages:

  • no linking with some HPy_GetCurrentContext(void) symbol
  • HPy implementation can detect that there is another incompatible HPy implementation already loaded and can fail with meaningful message explaining what's going on

Some code skeleton to better illustrate this:

// in HPy header files:
struct HPyVM;
typedef HPyContext* (*open_context_t)(HPyVM *vm);
typedef void (*close_context_t)(HPyVM *vm, HPyContext *ctx);

typedef struct {
  uintptr_t _data;
  open_context_t open_context;
  close_context_t close_context;
} HPyVM;

HPyContext* HPyVM_OpenContext(HPyVM *vm) { return vm->open_context(vm); }
void HPyVM_CloseContext(HPyVM *vm, HPyContext *ctx) { return vm->close_context(vm, ctx); }

// Additional context function
void HPy_GetVM(HPyContext *ctx, HPyVM **storage) { ... }

// Implementation
void ctx_GetVM(HPyContext *ctx, HPyVM **storage) {
   if (*storage == NULL) {
      // not the initial value, check that there is no VM conflict
      if (*storage->_data != MY_MAGIC_NUMBER)
          HPyErr_Fatal(ctx, "cannot use HPy_GetVM with multiple incompatible Python runtimes within one process");
   } else {
      *storage = my_hpyvm_impl;
   }
}
   
// ------------------------
// ------------------------
// Usage:

HPyVM *hpy_vm = NULL;

static void for_example_some_callback_that_cannot_get_context_easily(void) {
   HPyContext *ctx = HPyVM_OpenContext(hpy_vm);
   // ...
   HPyVM_CloseContext(hpy_vm, ctx);
}

HPy_MODINIT("foo")
static HPy init_foo_impl(HPyContext *ctx) {
   HPy_GetVM(ctx, &hpy_vm);
   // ...
}

Maybe HPyVM should not be a pointer, but passed by value everywhere. It may be OK because it will be small and it would solve issues with management of the memory allocated for it.

The recommendation would still be to pass the context around if possible.

@hodgestar
Copy link
Contributor

We probably need to define more clearly when a context can re-used. E.g. This approach will likely break sub-interpreters because there will be global C state storing a single context, but each interpreter will have it's own. It will also potentially interfere with GIL-less operation within a single interpreter where one can imagine it might be very useful to have one context per thread.

This is already an issue for static types, but it would be good not to duplicate those issues in HPy.

@steve-s
Copy link
Contributor Author

steve-s commented Mar 4, 2022

We probably need to define more clearly when a context can re-used.

The idea was that you never "reuse" context. The context returned from HPyVM_OpenContext should be short-lived and explicitly closed by HPyVM_CloseContext. JVM even differentiates these two situations:

  • I know I am inside a Java thread, I just could not pass the context (JNIEnv) all the way to here (possibly some callback or such)
  • I could be in some thread that the Java VM is not aware of yet

Maybe this could be useful for us too. Or maybe we want to disallow running HPy API from non Python threads.

This approach will likely break sub-interpreters because there will be global C state storing a single context

The global state will be implementation detail of the given HPy implementation. I envisage a thread local global variable and if subinterpreters are migrated between threads, then the value of variable would have to be updated, otherwise the getting a context would be relatively cheap read from this global variable. Additionally, the "global" context could be pointing to different set of functions that may do some extra work or that use some other indirection.

@steve-s
Copy link
Contributor Author

steve-s commented May 4, 2023

Summary of the suggestions so far:

  1. Simple to use and implement, but dangerous and introduces link time dependency (mentioning it here just for completeness):
HPyContext *ctx = HPy_EnterContext();
...work with the ctx...
HPy_LeaveContext(ctx);
  1. Trampolines/closures: runtime gives user a closure that is callable from anywhere. HPyContext is required to create the closure, HPyContext will be not be required to call it, but the user code inside the closure would receive HPyContext (that is valid only for the duration of the closure call). Some possible shape of this API:
// inside some regular HPy code, where context is available
// trampoline is `void* (*trampoline_t)(void*);`
mystate->trampoline = HPy_GetContextTrampoline(ctx);

// later on in some different code, where context is not available
void* result = mystate->trampoline(myhelper, arg);

// ...
void* myhelper(HPyContext *ctx, void* arg) { ... }
  1. JNI inspired API: like 1), but removes the link time dependency:
HPyVM *hpy_vm = NULL;

static void for_example_some_callback_that_cannot_get_context_easily(void) {
   HPyContext *ctx = HPyVM_OpenContext(hpy_vm);
   // ...
   HPyVM_CloseContext(hpy_vm, ctx);
}

HPy_MODINIT("foo")
static HPy init_foo_impl(HPyContext *ctx) {
   HPy_GetVM(ctx, &hpy_vm);
   // ...
}
// more details in a comment below

Open questions:

  • will it be valid to call the closure/HPyVM_OpenContext from anywhere? Should we have some graceful error handling (or debug mode check) for places where it is not allowed to be called. Some situations that come to mind:
    • non-Python threads? -- how will it work with HPyGlobal?
    • destructors?
    • when embedding Python: after the interpreter is finished
  • what to do with a module state
    • another implicit argument to the closure/API to get "current" module context
    • what is the current module anyway, can we figure that out in all the situations we want to support to call the API to "get context".

Current module or any HPy handle can be argument to the closure creation function:

// inside some regular HPy code, where context is available
// trampoline is `void* (*trampoline_t)(void*);`
mystate->trampoline = HPy_GetContextTrampoline(ctx, my_handle);

// later on in some different code, where context is not available
// the runtime will not only pass HPyContext to myhlper, but also a handle
// to the Python object to which the "my_handle" was pointing to when the
// closure was created
void* result = mystate->trampoline(myhelper, arg);

// ...
void* myhelper(HPyContext *ctx, HPy handle, void* arg) { ... }

For now I think the API should return NULL as an indication of an error when called in place where it is not allowed and cause segfault in debug mode. Alternatively "release" mode behavior will be undefined in such case and debug mode behavior would be segfault with a nice message and a stacktrace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@hodgestar @steve-s @fangerer and others