Expose sub-device exposed by ZE_AFFINITY_MASK as devices #1
On oneAPI at least 2 products implemented this feature:
So it looks like multiple projects need to expose a tile as a device.
FYI: we have a debug key in compute-runtime that provides that functionality: DECLARE_DEBUG_VARIABLE(int32_t, ReturnSubDevicesAsApiDevices, -1, "Expose each subdevice as a separate device during clGetDeviceIDs or zeDeviceGet API call")
Can confirm this at least enumerates correctly with L0. Needed the
ReturnSubDevicesAsApiDevices currently works only with NEOReadDebugKeys=1.
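For anyone who wants to try the two keys mentioned above, here is a minimal sketch of building the environment for a child process. It assumes the debug keys are read from environment variables and that a value of 1 turns ReturnSubDevicesAsApiDevices on (the declaration above only shows the default of -1). The helper name `neo_debug_env` and the application path in the comment are made up for illustration.

```python
import os


def neo_debug_env(base=None):
    """Return a copy of the environment with the two debug keys set.

    Assumption: compute-runtime reads these keys from the environment,
    and "1" enables ReturnSubDevicesAsApiDevices (default is -1).
    """
    env = dict(os.environ if base is None else base)
    env["NEOReadDebugKeys"] = "1"              # required for debug keys to be honored
    env["ReturnSubDevicesAsApiDevices"] = "1"  # assumed "on" value
    return env


# Hypothetical usage (not executed here):
#   import subprocess
#   subprocess.run(["./my_level_zero_app"], env=neo_debug_env())
```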
jandres742
pushed a commit
to jandres742/level-zero-spec
that referenced
this issue
Jun 27, 2023
Resolves: oneapi-src#1 Signed-off-by: Jaime Arteaga <[email protected]>
wdamon-intel
pushed a commit
that referenced
this issue
Jul 7, 2023
Resolves: #1 Signed-off-by: Jaime Arteaga <[email protected]>
Moved from oneapi-src/level-zero#86
jandres742 commented 5 days ago
From customer feedback:
Currently, on a device with two sub-devices, the following mask exposes the root device and both sub-devices:
ZE_AFFINITY_MASK=0.0,0.1
Request is to have these exposed as two separate root devices. In other words, that each sub-device exposed in the mask is presented by Level Zero as a device, with no sub-devices.
@jandres742
Author
jandres742 commented 5 days ago
When you use the affinity mask, we expose the parent device when at least 2 sub-devices are selected with the mask. From https://spec.oneapi.io/level-zero/latest/core/PROG.html?highlight=affinity#affinity-mask: for a system with four sub-devices per device, when the mask contains 1.3 and 1.0, we expose the root device and two sub-devices for it (see below).
The following examples demonstrate proper usage for a system configuration of two devices, each with four sub-devices:
• …
• 0.2, 1.3, 1.0, 0.3: both device 0 and 1 are reported; device 0 reports sub-devices 2 and 3 as sub-devices 0 and 1, respectively; device 1 reports sub-devices 0 and 3 as sub-devices 0 and 1, respectively; the order is unchanged.
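The renumbering in the quoted example can be modeled with a short sketch. This is only a toy model of the two-level case described in the spec excerpt, not the driver's implementation; the function name `model_affinity_mask` and its parameters are made up here, and it does not cover the case where only a single sub-device of a device is selected.

```python
from collections import defaultdict


def model_affinity_mask(mask, subdevices_per_device):
    """Toy model: map each root device to the physical sub-devices it exposes.

    The position of a physical index in the returned list is the
    sub-device index the application would see.
    """
    selected = defaultdict(set)
    for entry in mask.split(","):
        parts = entry.strip().split(".")
        root = int(parts[0])
        if len(parts) == 1:
            # A bare root index selects all of its sub-devices.
            selected[root].update(range(subdevices_per_device))
        else:
            selected[root].add(int(parts[1]))
    # Physical order is kept, regardless of the order in the mask.
    return {root: sorted(subs) for root, subs in sorted(selected.items())}


exposed = model_affinity_mask("0.2, 1.3, 1.0, 0.3", subdevices_per_device=4)
for root, subs in exposed.items():
    for reported, physical in enumerate(subs):
        print(f"device {root}: physical sub-device {physical} "
              f"is reported as sub-device {reported}")
```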
Now, the reason we do that, instead of exposing 1.3 and 1.0 as separate devices is threefold:
Flexibility:
a. It exposes everything to the application, letting it decide what to use and what not to use. If the application wants to see each sub-device as a device, then a middleware library (DPC++, OpenMP) or the application itself can use the sub-device handles; if another application wants to use the hierarchy of root and sub-device handles, that is also available. Always exposing sub-devices as devices would limit applications that want to see the hierarchy.
Implicit scaling:
a. By exposing the root device, we allow implicit scaling to be supported with a subset of tiles. In the example above, we would have implicit scaling with two of the four tiles, 1.3 and 1.0. The application would then decide whether to use the root device with 2T implicit scaling or to use the tiles directly. If we exposed each sub-device as a device, implicit scaling wouldn't be possible with a subset of tiles.
Scalability:
a. In the future we could have further levels in the device hierarchy, with sub-devices inside sub-devices. In this case, it would become difficult to decide what a device is. Imagine the case where you have this:
1 root device
2 tiles
Each tile with 4 sub-sub-devices.
Now imagine the user passes this mask:
MASK=0.0,0.1.2,0.1.3
In this case, if we exposed each mask entry as a device, each device would have a different set of capabilities, which may further complicate things. However, by exposing the hierarchy in this case
MASK=0.0,0.1.2,0.1.3 =>
root device handle 0
  sub-device handle 0: representing 0.0
  sub-device handle 1: representing 0.1
    sub-device handle 0: representing 0.1.2
    sub-device handle 1: representing 0.1.3
it is clear, and easier for the application, to traverse the device hierarchy and understand what each device handle represents.
In this case, if the application, DPC++, or OpenMP wants to see 0.0, 0.1.2, and 0.1.3 as separate devices, it can do so by selecting the right-most leaves of the tree; if another application wants to see the whole hierarchy and use implicit scaling, it can use whichever device handle it needs.
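The "select the leaves" idea can be sketched as follows. This is a toy model only: three-level masks are hypothetical in this discussion, and `build_tree` and `leaves` are illustrative helpers, not Level Zero calls. A real application would walk the handles returned by the driver instead of parsing the mask itself.

```python
def build_tree(mask):
    """Build a nested dict from dotted mask entries.

    Example: "0.1.2" contributes {0: {1: {2: {}}}}.
    """
    tree = {}
    for entry in mask.split(","):
        node = tree
        for index in entry.strip().split("."):
            node = node.setdefault(int(index), {})
    return tree


def leaves(tree, prefix=()):
    """Yield the dotted path of every node that has no children."""
    for index, children in sorted(tree.items()):
        path = prefix + (index,)
        if children:
            yield from leaves(children, path)
        else:
            yield ".".join(str(i) for i in path)


tree = build_tree("0.0,0.1.2,0.1.3")
flat_devices = list(leaves(tree))
print(flat_devices)
```

A layer that wants the flat view takes `flat_devices`; a layer that wants implicit scaling keeps an inner node of `tree` (for example 0 or 0.1) instead.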
Now, one proposal from customers is to either change the meaning of the affinity mask, or to define a new one, like ZE_VISIBLE_DEVICES, which allows for this model.
@servesh
servesh commented 4 days ago
@jandres742 Would it make sense to be more pragmatic in the way root devices are shown to the programming layer above?
The current issue seems to stem from, "we expose the parent device when at least 2 sub-devices are selected with the mask"
My thinking here is,
MASK=0.0,0.1.2,0.1.3=>
sub device handle 0: representing 0.0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
sub-device handle 0: representing 0.1.2
sub-device handle 1: representing 0.1.3
MASK=0,0.0,0.1,0.1.2,0.1.3=>
root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e. 0's global memory)
  sub-device handle 0: representing 0.0
  sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e. 0.1's global memory)
    sub-device handle 0: representing 0.1.2
    sub-device handle 1: representing 0.1.3
And if the application chooses a device handle with subdevice, then implicitly scale the workload across its subdevices.
@jandres742
Author
jandres742 commented 4 days ago
Thanks @servesh. I think what you are saying is the same as what I am saying, no? The way the affinity mask is defined lets users programmatically select the device handle in the hierarchy that fits their needs, depending on the mask passed. The behavior you showed in your example is exactly that: we would expose several device handles in the hierarchy, do implicit scaling, and color the allocations accordingly, and, as you say, the application can programmatically select the handle it wants.
MASK=0,0.0,0.1,0.1.2,0.1.3=>
root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e. 0's global memory)
  sub-device handle 0: representing 0.0
  sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e. 0.1's global memory)
    sub-device handle 0: representing 0.1.2
    sub-device handle 1: representing 0.1.3
If we instead exposed each of these comma-separated mask entries as a single device, then neither memory coloring nor implicit scaling would be possible.
That's why I don't think we should change the meaning of the mask. The previous suggestion from your team, to have a separate environment variable that says whether or not the comma-separated mask entries should be exposed as devices, might be more viable.
@TApplencourt
TApplencourt commented 4 days ago •
That's why I don't think we should change the meaning of the mask, and previous suggestion from your team about having a separate environment variable to say whether or not we want the comma-separated masks as devices might be more viable.
I agree. Having ZE_VISIBLE_DEVICES or another ENV will maybe be more tractable. It looks like both behaviors (visibility and masking) are needed.
Some users definitely want the same behavior as ROCR_VISIBLE_DEVICES: not giving a mask, just "expose what I pass you as a device".
So having 2 different ENV seems to be a good idea!
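To make the two behaviors concrete, here is a small sketch contrasting them for a two-level mask. ZE_VISIBLE_DEVICES is only a proposal in this thread, so the flat semantics below are an assumption about how it might behave; both function names are made up for illustration.

```python
def expose_hierarchical(mask):
    """Masking behavior: root devices stay visible, with their selected sub-devices."""
    roots = {}
    for entry in mask.split(","):
        entry = entry.strip()
        root, _, sub = entry.partition(".")
        subs = roots.setdefault(root, [])
        if sub:
            subs.append(entry)
    return roots


def expose_flat(visible):
    """Assumed visibility behavior: every entry becomes a device with no sub-devices."""
    return {entry.strip(): [] for entry in visible.split(",")}


print(expose_hierarchical("0.0,0.1"))  # one root device with two sub-devices
print(expose_flat("0.0,0.1"))          # two devices, no sub-devices
```

In the hierarchical result the root entry still has sub-devices, so implicit scaling across them stays possible; in the flat result no device has sub-devices.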