Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Hexagon] Generalize builtin for Nd memory alloc with storage scope and add lowering for VTCM / Hexagon #10558

Merged
merged 44 commits into from
Mar 14, 2022

Conversation

adstraw
Copy link
Contributor

@adstraw adstraw commented Mar 10, 2022

This PR modifies the texture2d_alloca builtin --- a specialized intrinsic for allocating texture memory --- to a more general form called nd_mem_alloc_with_scope and provides a lowering for that builtin for both texture memory / OpenCL and VTCM / Hexagon.

This change allows for allocations from cache_read and cache_write schedule primitives with VTCM storage scope to be lowered to specialized interfaces in the Hexagon Device API to allocate and free VTCM: AllocVtcmWorkspace and FreeVtcmWorkspace. Where previously those allocations were lowered to AllocWorkspace and FreeWorkspace which allocate and free DDR.

This PR makes possible but does not address a standard (rather than specialized) AllocWorkspace with storage scope in the Device API.

adstraw and others added 30 commits March 2, 2022 09:53
Breakage was caused by apache#9727, which
didn't account for the new `builtin::mem_copy()` when computing the
stack size in `StackSizeChecker`.
@adstraw adstraw changed the title [Hexagon] Add lowering from AllocateNd to VTCM workspace allocation in Hexagon Device API [Hexagon] Generalize builtin for Nd memory alloc with storage scope and add lowering for VTCM / Hexagon Mar 10, 2022
@@ -163,6 +170,63 @@ TVM_REGISTER_GLOBAL("device_api.hexagon.mem_copy").set_body([](TVMArgs args, TVM
*rv = static_cast<int32_t>(0);
});

std::map<void*, HexagonBuffer*> vtcmallocs;

TVM_REGISTER_GLOBAL("device_api.hexagon.AllocNd").set_body([](TVMArgs args, TVMRetValue* rv) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a lambda for the body, can we move this to a separate function? It tends to make debugging easier later on, and becomes a template for a generalized function we can propose adding to c_runtime_api.cc.

Copy link
Contributor Author

@adstraw adstraw Mar 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR creates a separate function called AllocVtcmWorkspace in the Hexagon Device API that is called from the lambda and which mirrors AllocTextureWorkspace in the OpenCL device API. These two APIs are the template for a more generalized Device API to "allocate a workspace with storage scope" but this should occur in a follow-up PR, in my opinion.

Note that the lambda must construct Device and DLDataType arguments for AllocVtcmWorkspace from primitive types passed through the builtin before making the call to AllocVtcmWorkspace.

src/tir/transforms/lower_tvm_builtin.cc Show resolved Hide resolved
Copy link
Contributor

@csullivan csullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great with few minor suggestions!

src/runtime/hexagon/hexagon/hexagon_device_api_v2.cc Outdated Show resolved Hide resolved
src/tir/transforms/lower_vtcm_alloc.cc Outdated Show resolved Hide resolved
src/tir/transforms/texture_flatten.cc Outdated Show resolved Hide resolved
@adstraw adstraw requested review from Lunderberg and csullivan March 14, 2022 19:34
@adstraw
Copy link
Contributor Author

adstraw commented Mar 14, 2022

@kparzysz-quic please let me know if you have any feedback, this is very nearly ready to merge

Copy link
Contributor

@csullivan csullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kparzysz-quic
Copy link
Contributor

My only comment is about naming conventions in device_api: there is snake_case (e.g. device_api.hexagon.mem_copy), but here we're introducing CamelCase: device_api.hexagon.AllocNd. Should we have a uniform naming style?

@adstraw
Copy link
Contributor Author

adstraw commented Mar 14, 2022

My only comment is about naming conventions in device_api: there is snake_case (e.g. device_api.hexagon.mem_copy), but here we're introducing CamelCase: device_api.hexagon.AllocNd. Should we have a uniform naming style?

Good point. If it's all right with you let's clean this up in the subsequent PR that will be needed to clean up the TODO items and enable nd allocations through the Hexagon Device API.

@tmoreau89 tmoreau89 merged commit 2f7bb58 into apache:main Mar 14, 2022
@csullivan
Copy link
Contributor

Thanks @adstraw!

@tmoreau89
Copy link
Contributor

Forgot to add the usual message :)
Thank you @adstraw @csullivan @kparzysz-quic @elvin-n the PR has been merged!

@adstraw adstraw deleted the hexagon-alloc-nd-lower-vtcm branch April 6, 2022 16:31
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
…nd add lowering for VTCM / Hexagon (apache#10558)

* repurpose texture flatten for vtcm; TIR lowering correct

* clean up remaining code in texture flatten pass

* add Alloc and FreeTexture, but failing to run over rpc

* test passing with malloc in the device api

* cleanup

* fails in very reliable way with memory corruption

* working with non-HexagonBuffer vtcm alloc

* cleanup

* do not pass scope through mem_copy api

* [Hexagon] Resolve breakage in test_hexagon/test_cache_read_write

Breakage was caused by apache#9727, which
didn't account for the new `builtin::mem_copy()` when computing the
stack size in `StackSizeChecker`.

* use HexagonBuffer in Alloc and Free packed funcs

* Added comment indicating need for StackSizeChecker::MakeMemCopy.

* add AllocVtcmWorkspace and FreeVtcmWorkspace

* cleanup

* Updated unittests to run all contrib/test_hexagon at CI.

* create separate vtcm alloc lowering pass and transform

* reset texture_flatten.cc

* comments

* CI bump

* Fix lint formatting error.

* Updated fix to remove StackSizeChecker entirely.

* pass device and type to device api

* Bugfix, verify the precheck's allocations, not own.

* Bugfix, pass context information to the precheck.

* pass order and shape to device api

* working

* fix up types and arg passing

* pass scope to device api

* common builtin for texture / vtcm

* add scope to freend api

* format and lint

* fixed missed format error

* restart ci

* fix test random value issue + code review feedback

* fix test hang

* restructure lower vtcm pass per code review feedback (option a)

* format error

* global.vtcm + tvm_stack_make_shape

Co-authored-by: Eric Lunderberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants