-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[microNPU] enable USMP #10022
[microNPU] enable USMP #10022
Conversation
ff3ca9e
to
a854d61
Compare
a854d61
to
d709b14
Compare
* increase test coverage in variants Change-Id: I42a8d7edf3ed92b9643ea01821dbf2f5f97b3e4f
* zephyr cmsis needs more overhead workspace Change-Id: Ifd0071360df4096528d3b1e562e4fa82b1bcb36f
* rebase fixes -- PoolInfo is moved out of usmp namespace Change-Id: Idcf3b4a9a08d8d430eebee9a748ef0f0878379b0
* added more docs Change-Id: I856917523f3d1094d6b5a2335cfd89028f512e37
a33ddf5
to
b7e5d6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -105,22 +195,27 @@ def translate(tir_module, params): | |||
base_addresses : List[util.BaseAddress] | |||
base addresses to be used by the driver | |||
""" | |||
|
|||
candidate_regions_for_scratch = [5, 2, 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ekalda , I did the change as this PR got conflicted. See if it looks good now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -105,22 +195,27 @@ def translate(tir_module, params): | |||
base_addresses : List[util.BaseAddress] | |||
base addresses to be used by the driver | |||
""" | |||
|
|||
candidate_regions_for_scratch = [5, 2, 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding comment (and the other docstrings as well)! I suppose "regions ranging from 0-6" can be interpreted as both, total of 6 or 7 regions, depending on the counting philosophy :D I think it is clear enough what is meant there though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly worried by the hack I left in being further hacked, thoughts on why that is @manupa-arm ? Otherwise a few naming things that can be looked into later 😸
|
||
tvm.tir.stmt_functor.post_order_visit(primfunc.body, analyze_pool_access) | ||
|
||
tvmbaw_region = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than referencing TVM APIs it's probably better to use workspace_region
and workspace_size
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I was looking for a word differentiate a runtime allocation that is serviced outside of the codegen. Any suggestion?
(workspace_region and workspace_size seems ambigous in that sense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe dynamic_allocation_region ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, though low priority 😸
@@ -34,7 +34,7 @@ extern float output_storage[12]; | |||
|
|||
extern const size_t output_len; | |||
|
|||
static uint8_t g_crt_workspace[TVMGEN_DEFAULT_WORKSPACE_SIZE + 256]; | |||
static uint8_t g_crt_workspace[TVMGEN_DEFAULT_WORKSPACE_SIZE + 512]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I add this? This looks like a hack to increase the workspace size for over-allocation, if we still need this is something broken in USMP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea -- I was puzzled by this too.
In fact, USMP is not enabled for cmsis-nn (yet -- its coming in the next PR) and this is only needed for Zephyr -- other cmsis-nn tests are fine. I am not familiar with Zephyr related impacts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. when USMP is enabled we dont need this workspace altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was surfaced when I removed this :
tvm/python/tvm/micro/model_library_format.py
Lines 217 to 219 in c54a3dd
# TODO(Mousius) - Remove this massive hack when Targets are unified | |
if target.kind.name in external_codegens: | |
device_max_workspace[main_target] += int(workspace_size) |
Which seems to be adding the workspace again for external_codegens.
I think we need to investigate this further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zephyr should have no real impact on the memory pre-allocated here as it's just a block in flash, this is deeply worrying as the allocator is configured here:
StackMemoryManager_Init(&app_workspace, g_crt_workspace, TVMGEN_DEFAULT_WORKSPACE_SIZE); |
Thus the allocator itself should never go over if it's performing properly, something is very weird here but I agree we should investigate further when we've integrated USMP fully.
( | ||
scratch_region_map, | ||
tvmbaw_workspace_size, | ||
tvmbaw_region, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here, TVMBAW is a detail of how Allocates are lowered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to align with the name we end up deciding above...
Thanks @manupa-arm 😸 A great step towards unification! |
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
As a follow up to apache#10022, this is a follow PR to perform name change of the region as discussed in that PR. Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR. Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR. Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
As a follow up to apache#10022, this is a follow PR to perform name change of the region as discussed in that PR. Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
As a follow up to apache#10022, this is a follow PR to perform name change of the region as discussed in that PR. Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR. Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR. Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
Following an investigation from #10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR.
As a follow up to apache#10022, this is a follow PR to perform name change of the region as discussed in that PR. Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
As a follow up to #10022, this is a follow PR to perform name change of the region as discussed in that PR.
This commit enables USMP in the microNPU codegen and tests. The microNPU codegen is modified to support Let nodes that are produced as from USMP.
Following an investigation from apache#10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR.
As a follow up to apache#10022, this is a follow PR to perform name change of the region as discussed in that PR.
This commit enables USMP in the microNPU codegen and tests.