
Add support for flexible device hierarchy model #169

Merged
merged 4 commits into from
Jul 7, 2023

Conversation

jandres742

Resolves: #1

@jandres742
Author

@wdamon-intel , @MichalMrozek: please review

Contributor

@wdamon-intel wdamon-intel left a comment


LGTM, mostly, just a few minor comments to address.


With a value of `0`, ${x}DeviceGet returns all the devices that do not have a root-device. Traversing the device hierarchy is possible by querying sub-devices with ${x}DeviceGetSubDevices and root-devices with ${x}DeviceGetRootDevice. Driver implementation may perform implicit optimizations to submissions and allocations done in the root-devices.

With a value of `1`, ${x}DeviceGet returns all the devices that do not have sub-devices. Traversing the device hierarchy is **not** possible, with ${x}DeviceGetSubDevices returning always a count of 0 device handles and and ${x}DeviceGetRootDevice returning nullptr. This mode allows Level Zero driver implementations to optimize execution and memory allocations by removing any overhead required to account for simultaneous use of root-devices and sub-devices in the same application.
Contributor


[...] 0 device handles and and [...]
"and and"

scripts/core/PROG.rst (outdated)

${X}_FLAT_DEVICE_HIERARCHY allows users to select the device hierarchy model with which the underlying hardware is exposed and the types of devices returned with ${x}DeviceGet.

With a value of `0`, ${x}DeviceGet returns all the devices that do not have a root-device. Traversing the device hierarchy is possible by querying sub-devices with ${x}DeviceGetSubDevices and root-devices with ${x}DeviceGetRootDevice. Driver implementation may perform implicit optimizations to submissions and allocations done in the root-devices.


In mode 0, DeviceGetRootDevice is not allowed.

Author


@MichalMrozek

You mean that in mode 0, which is what we have currently, the application cannot query for the root device? Could you elaborate on why that would be? Some customers have asked for this several times: a way to query the parent device of a sub-device. From an implementation point of view, this doesn't change much, as the Level Zero driver implementation already has that information; it would just be a matter of returning it.


The API was only meant for mode 2, when you start to flatten the hierarchy.
If the hierarchy is not flattened (as in mode 0), then getRoot is not supported.

That's the whole difference between mode 2 and the other modes: it allows this new API.

Author


@MichalMrozek: Thanks. Yes, the API was originally meant for mode 2, but there's no functional or performance reason not to have it available in mode 0; would that be correct? By having it in mode 0 as well, we give users more flexibility and a broader set of APIs to use.


Let's say you are in mode 0.
The app sets ZE_AFFINITY_MASK=0.0.
The app calls getRoot on such a device. What is returned?

Author

@jandres742 jandres742 Jun 28, 2023


@MichalMrozek It returns nullptr. But if you are in mode 0 and use ZE_AFFINITY_MASK=0, then you can call getRootDevice on any handle obtained with getSubDevices and get the parent device.

It is the same for getSubDevices. With MASK=0, getSubDevices for a root device returns a list of sub-devices, and with MASK=0.0, getSubDevices returns a count of 0. The spec doesn't forbid the use of getSubDevices with MASK=0.0; it just returns the appropriate value depending on the situation.
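
The mode 0 behavior described in this reply can be sketched as a small model. This is only an illustration, not Level Zero code: the `Device` class, the function names, and the single root device with two sub-devices are assumptions made for the example, and `None` stands in for nullptr.

```python
from typing import List, Optional


class Device:
    """Hypothetical stand-in for a device handle."""

    def __init__(self, name: str, parent: Optional["Device"] = None):
        self.name = name
        self.parent = parent
        self.children: List["Device"] = []
        if parent is not None:
            parent.children.append(self)


def device_get(mask: str) -> List[Device]:
    """Model device enumeration in mode 0 for one root device with two sub-devices."""
    if mask == "0":
        # The whole root device is exposed, with its sub-devices underneath.
        root = Device("0")
        Device("0.0", parent=root)
        Device("0.1", parent=root)
        return [root]
    if mask == "0.0":
        # A single sub-device is exposed as a device of its own:
        # it has no sub-devices and no root device to report.
        return [Device("0.0")]
    raise ValueError(f"mask not covered by this sketch: {mask}")


def get_sub_devices(device: Device) -> List[Device]:
    return list(device.children)


def get_root_device(device: Device) -> Optional[Device]:
    return device.parent  # None models nullptr
```

With the mask set to `0`, every handle from `get_sub_devices` reports the enumerated device as its root. With `0.0`, both queries are still legal; they just return an empty list and `None`.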


But in mode 1, we will return all sub-devices as root devices, and getRoot will return nullptr?

Author


@MichalMrozek Correct. In mode 1, getRootDevice returns nullptr, and getSubDevices returns a count of 0.
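
To make the difference between the two modes concrete, here is a rough model of the enumeration rules agreed on in this thread. It is a sketch under stated assumptions, not driver code: the `Device` class, the helper names, and the system layout (two root devices with two sub-devices each) are all hypothetical.

```python
class Device:
    """Hypothetical node in the device hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_system():
    """Assumed layout: two root devices, each with two sub-devices."""
    devices = []
    for r in range(2):
        root = Device(str(r))
        devices.append(root)
        for s in range(2):
            devices.append(Device(f"{r}.{s}", parent=root))
    return devices


def device_get(devices, flat):
    if flat == 0:
        # Mode 0: all devices that do not have a root-device.
        return [d for d in devices if d.parent is None]
    # Mode 1: all devices that do not have sub-devices.
    return [d for d in devices if not d.children]


def get_sub_devices(device, flat):
    # Mode 1 hides the hierarchy: the count is always 0.
    return [] if flat == 1 else list(device.children)


def get_root_device(device, flat):
    # Mode 1 hides the hierarchy: None models nullptr.
    return None if flat == 1 else device.parent
```

In this model, mode 0 enumerates `0` and `1` and allows traversal in both directions, while mode 1 enumerates the four leaves and reports no hierarchy at all.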

same system with two devices and four sub-devices.

- `0, 1, 2, 4`: all sub-devices are reported by ${x}DeviceGet (same as default)
- `0`: only sub-device 0 in the first device is reported


There are no sub-devices when FLAT_DEVICE_HIERARCHY is "1".

Author


Thanks, @MichalMrozek. I have expanded this section to make it clearer and added some tables showing what is being exposed. It would look something like this when rendered:

[Image: screenshot of the rendered section]

Jaime Arteaga added 2 commits June 27, 2023 10:10
Contributor

@wdamon-intel wdamon-intel left a comment


LGTM

@jandres742
Author

@MichalMrozek: do you have further comments, or do you think the PR is good to go?

@MichalMrozek

The initial plan was for the new getRoot API to be available only in mode 2.
Right now it will also be enabled in mode 0, so this would be an addition to the existing model.
We need a clear explanation that resources created for sub-devices will not work on extracted root devices unless the root device is part of the context; the same goes for programs, kernels, etc.
Right now we allow resources created on root devices to be available in sub-devices, even if they are not in the context, so we need to be careful not to set the same expectations with the new getRoot API.

@jandres742
Author

The initial plan was for the new getRoot API to be available only in mode 2. Right now it will also be enabled in mode 0, so this would be an addition to the existing model. We need a clear explanation that resources created for sub-devices will not work on extracted root devices unless the root device is part of the context; the same goes for programs, kernels, etc. Right now we allow resources created on root devices to be available in sub-devices, even if they are not in the context, so we need to be careful not to set the same expectations with the new getRoot API.

@MichalMrozek: currently, we allow sub-devices to share the resources of the root device, that's correct. You are also right that the same cannot be assumed when a resource has been created on a sub-device and getRootDevice is then called.

So we could add something like this to getRootDevice:

"To allow the device handle returned by getRootDevice to access the resources created by the sub-device handle, these resources need to be created with a context that explicitly contains both handles."

@jandres742
Author

Added clarification.
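
The asymmetry discussed above (root-device resources reach sub-devices implicitly, while sub-device resources reach the root device only through an explicit context) can be sketched as a simple access check. This is a simplified model, not the driver's logic: `can_access`, the string device names, and the `parent_of` mapping are hypothetical, and the set of devices in the context is treated as the unit of sharing.

```python
def can_access(device, context_devices, parent_of):
    """Return True if `device` may use a resource created with a context
    that contains `context_devices` (simplified model)."""
    # A device explicitly in the context always has access.
    if device in context_devices:
        return True
    # Existing rule: a sub-device implicitly shares its root device's
    # resources, even if the sub-device is not in the context.
    parent = parent_of.get(device)
    if parent is not None and parent in context_devices:
        return True
    # No reverse rule: a root device obtained with getRootDevice does not
    # inherit access from its sub-devices.
    return False


# Assumed topology: root device "0" with sub-devices "0.0" and "0.1".
PARENT_OF = {"0.0": "0", "0.1": "0"}
```

For example, `can_access("0", {"0.0"}, PARENT_OF)` is false, while adding the root to the context, as in `can_access("0", {"0.0", "0"}, PARENT_OF)`, makes it true.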

@wdamon-intel wdamon-intel merged commit 05e8e15 into oneapi-src:master Jul 7, 2023

Successfully merging this pull request may close these issues.

Expose sub-device exposed by ZE_AFFINITY_MASK as devices
3 participants