Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only output bo needs to be synced from device after result is available #849

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ artifacts/

# Local cache files
llvm-external-projects/iree-dialects/.cache
.build-cache

# pkgci artifacts
artifacts/
Expand Down
12 changes: 7 additions & 5 deletions runtime/src/iree-amd-aie/driver/xrt/direct_command_buffer.cc
Original file line number Diff line number Diff line change
Expand Up @@ -333,11 +333,10 @@ static iree_status_t iree_hal_xrt_direct_command_buffer_dispatch(
// Third argument is the number of LX6 instructions.
run.set_arg(arg_index++, kernel_params.num_instr);

xrt::bo ofm_bo;

// Copy descriptors from all sets to the end of the current segment for later
// access.
// TODO(jornt): hack to ensure that the output buffer is synced by syncing all
// buffers after the run.
std::vector<xrt::bo> bos;
// TODO(max): do we need multiple descriptor sets ever for AIE?
uint32_t set = 0;
IREE_RETURN_AND_END_ZONE_IF_ERROR(
Expand All @@ -348,8 +347,11 @@ static iree_status_t iree_hal_xrt_direct_command_buffer_dispatch(
xrt::bo(*command_buffer->descriptor_sets[set].bindings[j],
command_buffer->descriptor_sets[set].lengths[j],
command_buffer->descriptor_sets[set].offsets[j]);
bos.push_back(arg_buffer);
run.set_arg(arg_index + j, arg_buffer);
bool not_ofm = (bindings.values[j].buffer->memory_type & IREE_HAL_MEMORY_TYPE_HOST_VISIBLE) &&
(bindings.values[j].buffer->allowed_usage & IREE_HAL_BUFFER_USAGE_MAPPING);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IREE_HAL_BUFFER_USAGE_MAPPING is also not the correct flag:

  // WARNING: mapping can be extremely expensive, use limited hardware
  // resources, introduce data hazards, and synchronize host and device
  // execution. Unless an application knows that such issues will not arise
  // (as in tests where there's never concurrent usage) mapping should be used
  // judiciously: do not assume mapping is a high-performance technique!

https://github.com/iree-org/iree/blob/66342abbfaaee707e27ecc7d8151ad9e357ca0da/runtime/src/iree/hal/buffer.h#L389

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to this comment:
#849 (comment)

if (!not_ofm)
ofm_bo = arg_buffer;
}

run.start();
Expand All @@ -360,7 +362,7 @@ static iree_status_t iree_hal_xrt_direct_command_buffer_dispatch(
return iree_make_status(IREE_STATUS_UNKNOWN, e.what());
}

for (xrt::bo& bo : bos) bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
ofm_bo.sync(XCL_BO_SYNC_BO_FROM_DEVICE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because today we only have one output, doesn't mean tomorrow we will only have one output.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far all NPU products only use one output buffer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has absolutely nothing to do with NPU and everything to do with the model.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has absolutely nothing to do with NPU and everything to do with the model.

Do we have a model running on NPU with multiple output bo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indeed doesn't have anything to do with NPU and whether we currently have a model like this or not. We shouldn't make the assumption that there is a single output buffer.


IREE_TRACE_ZONE_END(z0);
return iree_ok_status();
Expand Down
Loading