[microNPU][2c] Add performance modelling to cascader #9778
-    if (!is_rolling) {
-      num_blocks *= output_stripe_config->GetShape()[i] * output_stripe_config->GetStripes()[i] /
+    if (buffer_mode == BufferMode::RECOMPUTE) {
+      num_blocks *= static_cast<float>(output_stripe_config->GetShape()[i] *
+                                       output_stripe_config->GetStripes()[i]) /
                     block_shape[i];
     } else {
-      num_blocks *= output_stripe_config->GetExtent()[i] / block_shape[i];
+      num_blocks *= static_cast<float>(output_stripe_config->GetExtent()[i]) / block_shape[i];
     }
   }
Just to mention that this logic is a placeholder and will be replaced in a later patch.
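For readers following the diff, the effect of the added `static_cast<float>` can be shown with a small standalone sketch. It assumes the operands are integers and that `num_blocks` is a float, which the snippet suggests but does not state; `NumBlocksTruncating` and `NumBlocksFractional` are hypothetical names used only for illustration and are not part of TVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; they are not TVM functions.

// Integer division runs first, so any fractional block is dropped
// before the result is converted to float.
float NumBlocksTruncating(const std::vector<int>& extent,
                          const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < extent.size(); ++i) {
    assert(block_shape[i] > 0);
    num_blocks *= extent[i] / block_shape[i];  // int / int, e.g. 10 / 4 == 2
  }
  return num_blocks;
}

// Casting the numerator to float first keeps the fractional part,
// mirroring the pattern introduced by the diff.
float NumBlocksFractional(const std::vector<int>& extent,
                          const std::vector<int>& block_shape) {
  assert(extent.size() == block_shape.size());
  float num_blocks = 1.0f;
  for (std::size_t i = 0; i < extent.size(); ++i) {
    assert(block_shape[i] > 0);
    num_blocks *= static_cast<float>(extent[i]) / block_shape[i];  // 10 / 4 == 2.5
  }
  return num_blocks;
}
```

With an extent of `{10, 8}` and a block shape of `{4, 8}`, the truncating version yields 2 blocks while the cast version yields 2.5. When an extent is smaller than its block, the truncating version collapses the whole product to 0.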
Just a quick note on the test coverage of this feature. The results of the performance model are not explicitly tested against the FVP because we don’t have performance instrumentation available in CI. We will, however, be testing this component downstream, where such instrumentation is available.
* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
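As a rough illustration of the last item, selecting the block config that reads the least data, here is a minimal sketch. The cost function is a stand-in assumption (the stripe area padded up to whole blocks), not the model used in this PR, and `BlockShape`, `PaddedElementsRead` and `SelectBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2D block shape; the real block configs carry more fields.
struct BlockShape {
  int height;
  int width;
};

// Toy cost (an assumption): elements touched when the stripe is padded up
// to a whole number of blocks along each axis.
long PaddedElementsRead(int stripe_h, int stripe_w, const BlockShape& block) {
  assert(block.height > 0 && block.width > 0);
  long blocks_h = (stripe_h + block.height - 1) / block.height;  // ceil division
  long blocks_w = (stripe_w + block.width - 1) / block.width;
  return blocks_h * block.height * blocks_w * block.width;
}

// Returns the index of the candidate with the lowest toy cost.
// Ties are resolved in favour of the earliest candidate.
std::size_t SelectBlock(int stripe_h, int stripe_w,
                        const std::vector<BlockShape>& candidates) {
  assert(!candidates.empty());
  std::size_t best = 0;
  long best_cost = std::numeric_limits<long>::max();
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    long cost = PaddedElementsRead(stripe_h, stripe_w, candidates[i]);
    if (cost < best_cost) {
      best_cost = cost;
      best = i;
    }
  }
  return best;
}
```

For a 10x10 stripe and candidates 8x8, 4x16 and 5x5, the toy costs are 256, 192 and 100 elements, so the 5x5 block is chosen. Because the candidate list is passed in, it can be generated once up front, in the spirit of the pre-computed block configs described above.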
cc @manupa-arm could you take a look and merge if everything's OK? Thanks
LGTM
Thanks! @jacobbohlin @mbaret
* [microNPU][2c] Initial Performance Model
* Added the pre-computed performance modelling per block.
* Added the aggregation of cycles given a stripe config.
* Implemented the op-specific performance code for conv2d.
* Created a DeviceConfig class to hold constant performance related data that is dependent on the accelerator configuration
* Added generation of all valid block configs. This is pre-computed and given as an argument when constructing EthosuParts.
* Implemented selection of the block config that gives the least amount of data read given a StripeConfig.
* Add test guards
* Extended block config testing
RFC: apache/tvm-rfcs#37
Issue: #9429
NOTE: This PR builds on top of #9469 and #9471 and therefore includes those changes. This PR will remain as 'draft' until both dependencies are merged.
The algorithm described in the RFC uses two metrics for Pareto culling: performance and memory usage. This commit addresses the former and introduces the basis of performance estimation for the Parts. It also includes performance estimation code that is specific to ethosu_conv2d.
The output of the performance model is only meant to be consumed by the cascader.
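To make the two-metric Pareto culling mentioned above concrete, here is a minimal sketch of culling over a cycle estimate and a memory figure. It is a generic quadratic-time filter written for illustration, not the cascader's implementation; `Candidate`, `Dominates` and `ParetoCull` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: lower is better for both metrics.
struct Candidate {
  long cycles;        // estimated performance cost
  long memory_bytes;  // estimated memory usage
};

// a dominates b if a is no worse on both metrics and strictly better on one.
bool Dominates(const Candidate& a, const Candidate& b) {
  return a.cycles <= b.cycles && a.memory_bytes <= b.memory_bytes &&
         (a.cycles < b.cycles || a.memory_bytes < b.memory_bytes);
}

// Keeps every candidate that no other candidate dominates.
// Exact duplicates do not dominate each other, so all copies are kept.
std::vector<Candidate> ParetoCull(const std::vector<Candidate>& candidates) {
  std::vector<Candidate> frontier;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    bool dominated = false;
    for (std::size_t j = 0; j < candidates.size(); ++j) {
      if (i != j && Dominates(candidates[j], candidates[i])) {
        dominated = true;
        break;
      }
    }
    if (!dominated) {
      frontier.push_back(candidates[i]);
    }
  }
  return frontier;
}
```

Given candidates (100 cycles, 10 bytes), (50, 20) and (70, 30), the third is dominated by the second and is culled, while the first two survive because each is best on one metric. A performance model supplies the cycle figure; the memory figure would come from elsewhere.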