realm: unify all the various "application cpu processors" #680
Comments
Another very important use case for this issue is having Python processors with a GPU property, so that we know they have an associated CUDA context.
Yes, please. I would need this to run e.g. Numba with CUDA support on a Python processor.
I had a little bit of insight today on this issue that is worth documenting. While I think we should definitely move in the direction of reducing the number of processor kinds, I don't think we can quite get to the place where there is only a single "application" processor kind with lots of different properties. Specifically, I think we still need processors that name hardware computational resources, so the processor kinds would be things like "CPU", "GPU", "TPU", etc. The "properties" of these processors would then describe what kinds of software you can run on them. For example, a "python" property could be applied to both the "CPU" and "GPU" processor kinds, but a "CUDA" property could be applied only to NVIDIA GPUs.
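A minimal C++ sketch of that split, assuming Realm's realm.h header: the kind-based query is existing Realm API, while the property filter is the hypothetical extension being proposed here.

```cpp
// Sketch only: querying by hardware kind is real Realm API; the
// "property" filter is a hypothetical extension from this discussion.
#include "realm.h"

using namespace Realm;

Processor find_gpu_for_python(void)
{
  Machine::ProcessorQuery query(Machine::get_machine());
  query.only_kind(Processor::TOC_PROC);  // the kind names hardware: a GPU
  // Hypothetical extension: further restrict by software property, e.g.
  //   query.has_property("python");    // not a real Realm call today
  // A "python" property could apply to CPU and GPU kinds alike, while a
  // "cuda" property would apply only to NVIDIA GPUs.
  return query.first();  // Processor::NO_PROC if nothing matched
}
```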
Specifically for Python, before it can become a property that can be applied to more than one processor, we need support for subprocesses (#627), so that we can have multiple copies of the Python interpreter under the same runtime. Until then, "python" will have to function more like an "application" processor, of which there can only be one per runtime. However, as @streichler noted, even if there are restrictions on the number of "application" processors based on the computational resources they manage (e.g. one "GPU" processor per physical device), two or more of them can be joined into one processor that has access to both resources (e.g. a single processor that manages both the Python interpreter and a GPU), with the caveat that if a task running on this processor doesn't use one of the resources (e.g. a Python task that doesn't use the GPU), then that resource sits idle while the task is running.
I don't think that's necessarily true. You can have just one Python interpreter that is used by all the processors with that property. You would have to serialize access to it, and it would become a sequential bottleneck if multiple processors all tried to use it concurrently, but I believe it would work correctly.
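A self-contained sketch of that arrangement, assuming the CPython embedding API rather than any Realm code: several worker threads stand in for processors with the "python" property, and the interpreter's lock serializes their access, which is exactly the sequential bottleneck described above.

```cpp
// Build with the usual CPython embedding flags (e.g. `python3-config
// --cflags --embed --ldflags`). Not Realm code; a stand-in illustration.
#include <Python.h>
#include <thread>
#include <vector>

// One "task" per worker; all of them share the single interpreter.
static void run_python_task(const char *stmt)
{
  // PyGILState_Ensure blocks until this thread holds the interpreter
  // lock, so concurrent callers are serialized, not run in parallel.
  PyGILState_STATE gstate = PyGILState_Ensure();
  PyRun_SimpleString(stmt);
  PyGILState_Release(gstate);
}

int main()
{
  Py_Initialize();
  // Release the GIL in the main thread so the workers can acquire it.
  PyThreadState *main_state = PyEval_SaveThread();

  std::vector<std::thread> workers;
  for(int i = 0; i < 4; i++)
    workers.emplace_back(run_python_task, "x = sum(range(1000))");
  for(std::thread &t : workers)
    t.join();

  PyEval_RestoreThread(main_state);
  Py_Finalize();
  return 0;
}
```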
Exactly. At the moment, we have a guarantee that only one Python thread is running in an interpreter at a time. This is actually kind of nice because it means you don't need to worry about synchronization like you normally would in concurrent programs. (Python's GIL is not sufficient for this, as it provides no guarantee about where in a function it may break execution; i.e., code under the GIL is still concurrent even if it is not parallel.)
In the long run, I think it will actually be an open configuration question for users how they want to pair Python interpreters with processors. You can imagine a case where you want a separate Python interpreter for each processor so you can run as many parallel Python tasks as possible, but you can also imagine a case where you want just one or a few Python interpreters so that tasks can share common data structures inside the interpreter (e.g. a Legate runtime object).
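To make the per-processor end of that spectrum concrete, here is a sketch using CPython sub-interpreters (Py_NewInterpreter), each with its own module and global state. One caveat: before Python 3.12, all sub-interpreters still share a single GIL, so this buys isolation rather than parallelism, which is part of why subprocess support (#627) comes up above.

```cpp
// Sketch: one isolated interpreter state per "processor", using the
// CPython embedding API. Pre-3.12, these still share a single GIL.
#include <Python.h>

int main()
{
  Py_Initialize();
  PyThreadState *main_ts = PyThreadState_Get();

  // Two isolated interpreters: each gets its own globals and modules.
  PyThreadState *ts1 = Py_NewInterpreter();
  PyRun_SimpleString("x = 'interp 1'");

  PyThreadState *ts2 = Py_NewInterpreter();
  PyRun_SimpleString("x = 'interp 2'");  // does not see interp 1's x

  // Tear down in reverse order; Py_EndInterpreter needs its state current.
  Py_EndInterpreter(ts2);
  PyThreadState_Swap(ts1);
  Py_EndInterpreter(ts1);

  PyThreadState_Swap(main_ts);
  Py_Finalize();
  return 0;
}
```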
Realm now has 3 different kinds of "application cpu processors": LOC_PROC, PROC_SET, and OMP_PROC. Remove the latter two and just have the normal LOC_PROC carry a notion of having more than one assigned core and (depending on build settings) potentially supporting things like OpenMP or Kokkos in task bodies.
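For reference, a small sketch using only existing Realm API to enumerate the three kinds this issue proposes to collapse into one:

```cpp
// Counts the three current "application cpu" kinds on the local machine.
// Assumes Realm's realm.h; after the proposed change, only LOC_PROC
// (with a core count and optional OpenMP/Kokkos support) would remain.
#include "realm.h"
#include <cstdio>

using namespace Realm;

static size_t count_kind(Machine machine, Processor::Kind kind)
{
  Machine::ProcessorQuery query(machine);
  query.only_kind(kind);
  return query.count();
}

void report_app_cpu_processors(void)
{
  Machine machine = Machine::get_machine();
  printf("LOC_PROC: %zu\n", count_kind(machine, Processor::LOC_PROC));
  printf("PROC_SET: %zu\n", count_kind(machine, Processor::PROC_SET));
  printf("OMP_PROC: %zu\n", count_kind(machine, Processor::OMP_PROC));
}
```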