rfc: new benchmark tool #9893

Open
stas00 opened this issue Jan 29, 2021 · 2 comments
Labels: Benchmarks, WIP

stas00 commented Jan 29, 2021

This issue is to collect notes and ideas on creating a new benchmarking tool.

This is not about the other speed/memory regression project we have been discussing elsewhere.

This is about integration and various comparisons that we need to run in order to give users the best advice on how to deploy transformers in the most efficient way.

Please share your ideas/suggestions/concerns/needs in the comments, and I will compile them here.

  • important: not part of examples - the goal is performance and integration tooling, not a user-facing script - totally different needs and priorities
  • the command line has to keep working the same way months later, so that old benchmarks can be re-run - it's OK to change the interface as long as a back-compat option lets old benchmarks still be re-validated and compared against
  • ideally work with any transformers model - a single tool to rule them all
  • minimal number of arguments - just the important ones
  • ability to generate markdown table entries directly, plus json files that contain not just the outcome but also the key variables that are being tested
  • the report should include critical hardware/software params in a compact form and allow recordings to be merged - i.e. if the hw/sw are the same, they can be merged into a single report (a rough sketch of such a record follows this list). will need to figure out how to record hardware nuances
    • e.g. the same DDP test with 2 gpus connected w/ NVLink gives dramatically different results than the same 2 gpus w/o NVLink
    • not sure how to record CPU capacity, free RAM, etc., since all of these impact the outcome
  • crucial to be able to truncate the dataset
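
To make the json-report / mergeable hw-sw-fingerprint idea concrete, here is a minimal sketch of what one record and the merge step could look like. The field names (software, hardware, variables, metrics) are purely illustrative assumptions, not an existing format in transformers.

```python
# Illustrative sketch only: the record layout and merge rule are assumptions,
# not an existing transformers benchmark format.
import json
import platform
from typing import Dict, List

import torch
import transformers


def make_record(variables: Dict, metrics: Dict) -> Dict:
    """Capture one benchmark run together with the hw/sw it ran on."""
    return {
        "software": {
            "transformers": transformers.__version__,
            "torch": torch.__version__,
            "python": platform.python_version(),
        },
        "hardware": {
            "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu",
            "n_gpu": torch.cuda.device_count(),
            # open question from the list above: NVLink topology, CPU capacity, free RAM
        },
        "variables": variables,  # the knobs being compared, e.g. {"fp16": True, "batch_size": 16}
        "metrics": metrics,      # the outcome, e.g. {"train_samples_per_second": 123.4}
    }


def merge_records(records: List[Dict]) -> List[Dict]:
    """Fold records whose hw/sw fingerprints are identical into a single report."""
    merged: Dict[str, Dict] = {}
    for rec in records:
        key = json.dumps({"software": rec["software"], "hardware": rec["hardware"]}, sort_keys=True)
        merged.setdefault(key, {"software": rec["software"], "hardware": rec["hardware"], "runs": []})
        merged[key]["runs"].append({"variables": rec["variables"], "metrics": rec["metrics"]})
    return list(merged.values())


def markdown_row(run: Dict) -> str:
    """Render one run as a markdown table row, for pasting directly into an issue/PR."""
    cells = list(run["variables"].values()) + list(run["metrics"].values())
    return "| " + " | ".join(str(c) for c in cells) + " |"
```

Keying the merge on the serialized hw/sw dicts keeps the "same environment" decision in one place, so it can later be relaxed (e.g. to ignore patch versions) without touching the rest of the tool.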
stas00 self-assigned this Jan 29, 2021
stas00 added the Benchmarks label Jan 29, 2021
stas00 added the WIP label Mar 18, 2021

bhadreshpsavani commented Jun 16, 2021

I was thinking about one possible feature.

How about this: when we run an example script and the user passes an optional argument, a benchmarking script is automatically run as well and its results are stored in one file.

When the user uploads the model to the model hub, we could then sort models directly based on the benchmarking results file.
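
A rough sketch of how such an optional argument might look at the end of an example script, assuming a hypothetical --benchmark_output flag and a JSON-lines results file (neither exists in the examples today):

```python
# Hypothetical sketch: --benchmark_output is not a real flag in the example scripts.
import argparse
import json
import time

parser = argparse.ArgumentParser()
parser.add_argument(
    "--benchmark_output",
    default=None,
    help="if set, append the timing results of this run to the given JSON-lines file",
)
args, _ = parser.parse_known_args()


def run_example():
    # stand-in for the example's actual train/eval logic
    ...


start = time.perf_counter()
run_example()
elapsed = time.perf_counter() - start

if args.benchmark_output is not None:
    with open(args.benchmark_output, "a") as f:
        f.write(json.dumps({"script": "run_example", "runtime_sec": round(elapsed, 2)}) + "\n")
```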


stas00 commented Jun 16, 2021

All data files on the model hub for the same model arch will give the same speed results, since they are just data points.

Therefore it's the model code that needs to be benchmarked (and the trainer, if there is more than one).

And given that we currently have only one implementation of each model, there is nothing to compare it to.

The main idea of this issue is to do regression testing, to ensure that we don't accidentally make models slower while changing the code. For an example of this happening, please see: #11218
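
As a minimal sketch of the kind of guard this issue is after, assuming a stored baseline JSON with a hypothetical metrics layout like the one above, a regression check could be as simple as:

```python
# Hypothetical sketch: compare a fresh measurement against a stored baseline
# and fail if throughput regressed beyond a noise tolerance.
import json


def check_regression(baseline_path: str, current_samples_per_sec: float, tolerance: float = 0.05) -> None:
    with open(baseline_path) as f:
        baseline = json.load(f)
    baseline_sps = baseline["metrics"]["train_samples_per_second"]
    # allow e.g. 5% of run-to-run noise before declaring a regression
    if current_samples_per_sec < baseline_sps * (1 - tolerance):
        raise AssertionError(
            f"speed regression: {current_samples_per_sec:.1f} samples/s vs "
            f"baseline {baseline_sps:.1f} samples/s"
        )
```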
