Support for Task Subprocesses #627

Open
lightsighter opened this issue Oct 5, 2019 · 8 comments
Labels: best effort, enhancement, Legion, planned, Realm

Comments

@lightsighter
Contributor

This is our issue for tracking progress on support for task subprocesses. We want to run individual tasks in subprocesses for two reasons. First, subprocesses give us isolation for terrible libraries (like OpenMP and OpenBLAS, see OpenMathLib/OpenBLAS#2146) and interpreters (like Python) that have unprotected global variables that are subject to races when multiple instances of the library or interpreter are used in the same process. By putting different tasks in subprocesses we know that these global variables will be isolated from each other. Second, subprocesses give us the ability to mprotect memory so that tasks can only access the instances that they have mapped. This prevents them from corrupting other application data or runtime metadata structures. We are adding support to Realm for multiple heaps and for running tasks in subprocesses. Legion needs to be modified to ensure that all data structures it creates, both for itself and for the application, end up in the proper heap.
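To make the two motivations concrete, here is a minimal POSIX sketch of the general technique. It is not Realm or Legion code. The names `library_global`, `IsolationResult`, and `run_task_in_subprocess` are hypothetical, and the two page-sized "instances" are stand-ins for real instances. It assumes a Linux-like system with `fork`, `mmap`, and `mprotect`.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <csignal>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a library's unprotected global state.
static int library_global = 0;

struct IsolationResult {
  int parent_global;   // library_global as seen by the parent afterwards
  int mapped_value;    // contents of the instance the task was allowed to use
  int unmapped_value;  // contents of the instance the task was not allowed to use
  bool child_faulted;  // true if the task died on a protection fault
};

// Runs a toy "task" in a forked subprocess. The task has instance 0 "mapped";
// instance 1 is made inaccessible in the child with mprotect.
IsolationResult run_task_in_subprocess(bool touch_unmapped) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  // Two page-sized instances in memory shared between parent and child.
  void* base = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) { perror("mmap"); exit(1); }
  int* inst0 = static_cast<int*>(base);
  int* inst1 = reinterpret_cast<int*>(static_cast<char*>(base) + page);
  *inst0 = 0;
  *inst1 = 7;

  pid_t pid = fork();
  if (pid < 0) { perror("fork"); exit(1); }
  if (pid == 0) {
    // Child: revoke access to the instance this task has not mapped.
    // The protection change applies to the child's address space only.
    if (mprotect(inst1, page, PROT_NONE) != 0) _exit(2);
    library_global = 99;  // modifies the child's private copy only
    *inst0 = 42;          // allowed: shared and still writable
    if (touch_unmapped) {
      *reinterpret_cast<volatile int*>(inst1) = 13;  // protection fault
    }
    _exit(0);
  }

  // Parent: wait for the task and inspect what it could and could not change.
  int status = 0;
  pid_t waited = waitpid(pid, &status, 0);
  assert(waited == pid);
  IsolationResult r;
  r.parent_global = library_global;
  r.mapped_value = *inst0;
  r.unmapped_value = *inst1;
  r.child_faulted = WIFSIGNALED(status) &&
                    (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
  munmap(base, 2 * page);
  return r;
}
```

Calling `run_task_in_subprocess(false)` leaves the parent's copy of `library_global` untouched while the write to the mapped instance is visible to the parent. Calling it with `true` kills the child on the stray write and leaves the protected instance intact. A real implementation would also need the multiple-heap support described above so that runtime metadata lands in memory with the right protection.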

@lightsighter added the enhancement, Legion, and Realm labels on Oct 5, 2019
@lightsighter added the question and planned labels and removed the question label on Oct 8, 2019
@streichler added this to the 20.06 milestone on Oct 9, 2019
@pmccormick
Contributor

A similar but slightly different view (an extension) on this would be to add the concept of a "lightweight" processor target that does not require, desire, or have the resources needed to run the full runtime feature set. This could support aspects of a subprocess but could also cover the notion of a remote processor in the more traditional distributed-computing sense: geographically distributed, as opposed to HPC's more tightly interconnected model. These processors should be "map-able", but that may require us to consider a slightly different set of semantics for handling them in the overall model; perhaps they are only suitable as targets for leaf tasks? Restrictions could also force some rethinking of how tasks are "delivered" to such nodes, e.g. RPC or JIT (avoiding the need for a fat binary across all systems in the map-able set). Some targets could support only a fixed set of tasks (an appliance), but at the same time general-purpose capabilities would certainly provide more flexibility...

@streichler
Contributor

The piece of this that puts Legion's runtime metadata into an arena so that we can share it across subprocesses and/or reason about "lightweight" processors that can't see the metadata has been carved out into #690.

@streichler added the best effort label on Apr 1, 2020
@streichler modified the milestones: 20.06, 20.09 on May 31, 2020
@lightsighter
Contributor Author

What is the story going to be for background worker threads that call back into a Python interpreter? We have Python CFFI callback functions in Legate Dask that are invoked on background worker threads. How are they going to be able to specify which interpreter to use when calling PyGILState_Ensure and then doing their callbacks?

@streichler
Contributor

What do these callbacks look like from the perspective of Realm? I can't think of any case right now where a Realm worker thread calls application-provided code (unless you're hiding stuff in reduction ops, which you shouldn't do).

@lightsighter
Contributor Author

I should be precise: these are actually on utility processor threads, and I'm not sure whether you count those as background worker threads. I could probably move these into a Realm C++ task if I had to, but then I would need the ability to run a Realm C++ task on a Python processor so that it can do its callback into the interpreter.

@streichler
Contributor

Yes, these need to run on the Python proc, and I'll make it possible to register C functions on the Python proc.

@lightsighter
Contributor Author

Ok, let me know when you do that and I'll update the callbacks to run on the appropriate processor.

@lightsighter
Contributor Author

This depends on #690, which is slated for the 21.06 release, so put this in the following release.

@lightsighter modified the milestones: 20.12, 21.09 on Dec 16, 2020
@streichler removed this from the 21.09 milestone on Apr 2, 2022

4 participants