[microNPU][2c] Add performance modelling to cascader #9778

Merged: 3 commits, Jan 17, 2022

jacobbohlin

RFC: apache/tvm-rfcs#37
Issue: #9429

NOTE: This PR builds on top of #9469 and #9471 and therefore includes those changes. This PR will remain as 'draft' until both dependencies are merged.

The algorithm described in the RFC uses two metrics for Pareto culling: performance and memory usage. This commit addresses the former and introduces the basis of performance estimation for Parts. It also includes performance estimation code that is specific to ethosu_conv2d.

The output of the performance model is only meant to be consumed by the cascader.
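
For readers unfamiliar with Pareto culling over two metrics, the idea can be sketched as follows. This is only an illustration and not the cascader's implementation: `Plan`, `cycles`, `memory_bytes` and `ParetoCull` are hypothetical names, and the numbers used with it are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: an estimated cycle count and a memory footprint.
struct Plan {
  int64_t cycles;
  int64_t memory_bytes;
};

// Keep only plans that are not dominated. A plan is dominated when another
// plan is no worse on both metrics and strictly better on at least one.
// Exact duplicates are collapsed to a single entry.
std::vector<Plan> ParetoCull(std::vector<Plan> plans) {
  // Sort by cycles ascending, breaking ties by memory ascending.
  std::sort(plans.begin(), plans.end(), [](const Plan& a, const Plan& b) {
    if (a.cycles != b.cycles) return a.cycles < b.cycles;
    return a.memory_bytes < b.memory_bytes;
  });
  std::vector<Plan> frontier;
  for (const Plan& p : plans) {
    // Every plan already kept has cycles <= p.cycles, so p survives only if
    // it uses strictly less memory than the most recently kept plan.
    if (frontier.empty() || p.memory_bytes < frontier.back().memory_bytes) {
      frontier.push_back(p);
    }
  }
  return frontier;
}
```

With the made-up candidates (100, 50), (80, 70), (120, 60), (80, 90) and (150, 10), written as (cycles, memory), the surviving plans are (80, 70), (100, 50) and (150, 10). A better cycle estimate therefore changes which plans survive, which is why the performance model matters to the cascader.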

@jacobbohlin

Comment on lines -71 to 121
-    if (!is_rolling) {
-      num_blocks *= output_stripe_config->GetShape()[i] * output_stripe_config->GetStripes()[i] /
+    if (buffer_mode == BufferMode::RECOMPUTE) {
+      num_blocks *= static_cast<float>(output_stripe_config->GetShape()[i] *
+                                       output_stripe_config->GetStripes()[i]) /
                     block_shape[i];
     } else {
-      num_blocks *= output_stripe_config->GetExtent()[i] / block_shape[i];
+      num_blocks *= static_cast<float>(output_stripe_config->GetExtent()[i]) / block_shape[i];
     }
   }

Just to mention that this logic is placeholder and will be replaced in a later patch.
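
For context on the `static_cast<float>` added in the diff above: when both operands are integers, C++ truncates the quotient before it is multiplied into the running total. A minimal sketch of the difference, using hypothetical helper names and made-up shapes rather than the cascader's own types:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integer division: each per-axis quotient is truncated before it is
// multiplied in, even though the accumulator itself is a float.
float CountBlocksTruncating(const std::vector<int>& extent,
                            const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < extent.size(); i++) {
    num_blocks *= extent[i] / block_shape[i];
  }
  return num_blocks;
}

// Casting one operand to float keeps the fractional part of each quotient.
float CountBlocksFractional(const std::vector<int>& extent,
                            const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < extent.size(); i++) {
    num_blocks *= static_cast<float>(extent[i]) / block_shape[i];
  }
  return num_blocks;
}
```

For an extent of {10, 6} and a block shape of {4, 4}, the truncating version yields 2 * 1 = 2, while the fractional version yields 2.5 * 1.5 = 3.75. With an extent smaller than the block (for example 3 against 4), the truncating version collapses the whole product to 0.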

@mbaret

mbaret commented Jan 12, 2022

Just a quick note on the test coverage of this feature. The results of the performance model are not explicitly tested against the FVP because we don’t have performance instrumentation available in CI. We will however be testing this component downstream where such instrumentation is available.

* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
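
The last bullet above, choosing from a pre-computed list the block config that reads the least data for a given stripe, can be sketched roughly as below. The cost function is a stand-in (the number of elements touched when the stripe is covered by whole blocks) and is an assumption, not the model in this PR; `Shape`, `PaddedVolume` and `SelectBlockConfig` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Stand-in cost (assumption): elements touched when the stripe is covered by
// whole blocks, i.e. the stripe rounded up to a multiple of the block per axis.
int64_t PaddedVolume(const Shape& stripe, const Shape& block) {
  assert(stripe.size() == block.size());
  int64_t volume = 1;
  for (std::size_t i = 0; i < stripe.size(); i++) {
    int64_t blocks = (stripe[i] + block[i] - 1) / block[i];  // ceiling division
    volume *= blocks * block[i];
  }
  return volume;
}

// Return the index of the candidate with the smallest cost for this stripe.
// The earliest candidate wins ties.
std::size_t SelectBlockConfig(const Shape& stripe,
                              const std::vector<Shape>& candidates) {
  assert(!candidates.empty());
  std::size_t best = 0;
  int64_t best_cost = PaddedVolume(stripe, candidates[0]);
  for (std::size_t i = 1; i < candidates.size(); i++) {
    int64_t cost = PaddedVolume(stripe, candidates[i]);
    if (cost < best_cost) {
      best = i;
      best_cost = cost;
    }
  }
  return best;
}
```

For a made-up stripe of {10, 12} and candidates {4, 8}, {8, 8} and {2, 4}, the costs are 192, 256 and 120 respectively, so the third candidate is selected. Because the candidate list does not depend on the stripe, generating it once and passing it in, as the bullet describes for EthosuParts, avoids repeating that work for every stripe config.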
@mbaret

mbaret commented Jan 17, 2022

cc @manupa-arm could you take a look and merge if everything's OK? Thanks


@manupak manupak left a comment

LGTM

@manupak manupak merged commit 133bb9c into apache:main Jan 17, 2022
@manupak

manupak commented Jan 17, 2022

Thanks! @jacobbohlin @mbaret

yuanfz98 pushed a commit to yuanfz98/tvm that referenced this pull request Jan 24, 2022
* [microNPU][2c] Initial Performance Model

* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.

* Add test guards

* Extended block config testing