Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[microNPU] enable USMP #10022

Merged
merged 6 commits into from
Feb 21, 2022
Merged

[microNPU] enable USMP #10022

merged 6 commits into from
Feb 21, 2022

Conversation

manupak
Copy link
Contributor

@manupak manupak commented Jan 21, 2022

This commit enables USMP in the microNPU codegen and tests.

@manupak
Copy link
Contributor Author

manupak commented Jan 21, 2022

This is blocked on #9929 and #9951

@manupak manupak force-pushed the micronpu_usmp_v2 branch 2 times, most recently from ff3ca9e to a854d61 Compare January 28, 2022 18:15
@manupak manupak marked this pull request as ready for review January 28, 2022 18:16
* increase test coverage in variants

Change-Id: I42a8d7edf3ed92b9643ea01821dbf2f5f97b3e4f
* zephyr cmsis needs more overhead workspace

Change-Id: Ifd0071360df4096528d3b1e562e4fa82b1bcb36f
* rebase fixes -- PoolInfo is moved out of usmp namespace

Change-Id: Idcf3b4a9a08d8d430eebee9a748ef0f0878379b0
* added more docs

Change-Id: I856917523f3d1094d6b5a2335cfd89028f512e37
Copy link
Contributor Author

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've addressed all the comments.

Waiting for CI 🤞

cc: @leandron @Mousius

@@ -105,22 +195,27 @@ def translate(tir_module, params):
base_addresses : List[util.BaseAddress]
base addresses to be used by the driver
"""

candidate_regions_for_scratch = [5, 2, 1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekalda , I did the change as this PR got conflicted. See if it looks good now.

Copy link
Contributor

@ekalda ekalda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -105,22 +195,27 @@ def translate(tir_module, params):
base_addresses : List[util.BaseAddress]
base addresses to be used by the driver
"""

candidate_regions_for_scratch = [5, 2, 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding comment (and the other docstrings as well)! I suppose "regions ranging from 0-6" can be interpreted as both, total of 6 or 7 regions, depending on the counting philosophy :D I think it is clear enough what is meant there though.

Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly worried by the hack I left in being further hacked, thoughts on why that is @manupa-arm ? Otherwise a few naming things that can be looked into later 😸


tvm.tir.stmt_functor.post_order_visit(primfunc.body, analyze_pool_access)

tvmbaw_region = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than referencing TVM APIs it's probably better to use workspace_region and workspace_size ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I was looking for a word differentiate a runtime allocation that is serviced outside of the codegen. Any suggestion?
(workspace_region and workspace_size seems ambigous in that sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe dynamic_allocation_region ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, though low priority 😸

@@ -34,7 +34,7 @@ extern float output_storage[12];

extern const size_t output_len;

static uint8_t g_crt_workspace[TVMGEN_DEFAULT_WORKSPACE_SIZE + 256];
static uint8_t g_crt_workspace[TVMGEN_DEFAULT_WORKSPACE_SIZE + 512];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I add this? This looks like a hack to increase the workspace size for over-allocation, if we still need this is something broken in USMP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea -- I was puzzled by this too.

In fact, USMP is not enabled for cmsis-nn (yet -- its coming in the next PR) and this is only needed for Zephyr -- other cmsis-nn tests are fine. I am not familiar with Zephyr related impacts here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. when USMP is enabled we dont need this workspace altogether.

Copy link
Contributor Author

@manupak manupak Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was surfaced when I removed this :

# TODO(Mousius) - Remove this massive hack when Targets are unified
if target.kind.name in external_codegens:
device_max_workspace[main_target] += int(workspace_size)
.

Which seems to be adding the workspace again for external_codegens.

I think we need to investigate this further

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zephyr should have no real impact on the memory pre-allocated here as it's just a block in flash, this is deeply worrying as the allocator is configured here:

StackMemoryManager_Init(&app_workspace, g_crt_workspace, TVMGEN_DEFAULT_WORKSPACE_SIZE);

Thus the allocator itself should never go over if it's performing properly, something is very weird here but I agree we should investigate further when we've integrated USMP fully.

(
scratch_region_map,
tvmbaw_workspace_size,
tvmbaw_region,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here, TVMBAW is a detail of how Allocates are lowered

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to align with the name we end up deciding above...

@manupak
Copy link
Contributor Author

manupak commented Feb 17, 2022

@Mousius ,

As we discussed offline, lets tackle the CMSIS issue in the next PR : #10224 .

For the name suggestion, lets use dynamic_allocation_region and dynamic_allocation_size for which I will do a follow up.

@Mousius Mousius merged commit 2f93780 into apache:main Feb 21, 2022
@Mousius
Copy link
Member

Mousius commented Feb 21, 2022

Thanks @manupa-arm 😸 A great step towards unification!

manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 21, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 21, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 21, 2022
As a follow up to apache#10022, this is a follow PR to
perform name change of the region as discussed in
that PR.

Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 22, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

This change also removes the confusing
"run_model" which has semantics identitical
to "__tvm_main__" in TIR.

Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 22, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

This change also removes the confusing
"run_model" which has semantics identitical
to "__tvm_main__" in TIR.

Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 22, 2022
As a follow up to apache#10022, this is a follow PR to
perform name change of the region as discussed in
that PR.

Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 22, 2022
As a follow up to apache#10022, this is a follow PR to
perform name change of the region as discussed in
that PR.

Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 22, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

This change also removes the confusing
"run_model" which has semantics identitical
to "__tvm_main__" in TIR.

Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
manupak added a commit to manupak/incubator-tvm that referenced this pull request Feb 23, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

This change also removes the confusing
"run_model" which has semantics identitical
to "__tvm_main__" in TIR.

Change-Id: I5202d9cc7c6a8c00c73791b82df062a8e13dd224
Mousius pushed a commit that referenced this pull request Feb 23, 2022
Following an investigation from #10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

This change also removes the confusing
"run_model" which has semantics identitical
to "__tvm_main__" in TIR.
manupak added a commit to manupak/incubator-tvm that referenced this pull request Mar 16, 2022
As a follow up to apache#10022, this is a follow PR to
perform name change of the region as discussed in
that PR.

Change-Id: Ifc5244ed2a3b3cc31b089e422c4b3068cd7e26e8
Mousius pushed a commit that referenced this pull request Mar 18, 2022
As a follow up to #10022, this is a follow PR to
perform name change of the region as discussed in
that PR.
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
This commit enables USMP in the microNPU codegen
and tests. The microNPU codegen is modified to
support Let nodes that are produced as from USMP.
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
Following an investigation from apache#10022,
it turns out, currently the workspace
calculation assumes there would be a single
lowered PrimFunc could be produced per
primitive Relay Function.

However, the exception turned out to
be the CMSIS-NN codegen that produces
multiple calls/PrimFuncs in the place
of a single call to single relay PrimFunc.

This commit adds changes to workspace
calculation to be done on lowered IRModule.

Additionally, changes the test utils to
not to generate any stack allocator code
when USMP is used to make the tests more
strict.

This change also removes the confusing
"run_model" which has semantics identitical
to "__tvm_main__" in TIR.
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022
As a follow up to apache#10022, this is a follow PR to
perform name change of the region as discussed in
that PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants