Resurrect ao benchmark on AWS A100 runner #2561

Closed
wants to merge 32 commits into from


@huydhn huydhn commented Dec 17, 2024

I'm bringing back some example models first, one for each set (TIMM, HF, TorchBench), to have some data to unblock our TorchAO ClickHouse migration. More can be added later if we decide to keep this workflow.

Testing

https://github.com/pytorch/benchmark/actions/runs/12388956432/job/34581035274

The results are now available in the `oss_ci_benchmark_v3` table:

`select * from oss_ci_benchmark_v3 where workflow_id = 12388956432`
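For anyone who wants to look up a different run, the query above can be generated from a workflow run ID. The following is a minimal sketch, not part of this PR: the helper name `build_workflow_query` is hypothetical, and only the table name and the `workflow_id` column come from the query quoted above. It produces the SQL text only; how that text is sent to ClickHouse (client, credentials, endpoint) is left out because it is not described here.

```python
def build_workflow_query(workflow_id: int, table: str = "oss_ci_benchmark_v3") -> str:
    """Return a SELECT statement for all benchmark rows of one workflow run.

    Hypothetical helper for illustration; only the table and column names
    are taken from the PR description.
    """
    # Coerce to int so a non-numeric value fails loudly instead of being
    # pasted into the SQL text.
    run_id = int(workflow_id)
    if run_id <= 0:
        raise ValueError("workflow_id must be a positive integer")
    # Accept only a plain identifier as the table name.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return f"select * from {table} where workflow_id = {run_id}"


if __name__ == "__main__":
    # The run ID is the one from the Testing link in this PR.
    print(build_workflow_query(12388956432))
```

The run ID is the numeric part of the GitHub Actions run URL under Testing.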

xuzhao9 and others added 8 commits June 17, 2024 15:13
Summary:
Attempt to fix dependencies: this is no longer compatible with the latest huggingface_hub; see the failing test at https://github.com/pytorch/pytorch/actions/runs/11445304501/job/31843081598

Pull Request resolved: #2523

Reviewed By: huydhn

Differential Revision: D64711662

Pulled By: wdvr

fbshipit-source-id: eed9143e6e0531840a53ba5ab3fad04894727272
Summary:
Some fixes for pytorch/pytorch#137602

Pull Request resolved: #2514

Reviewed By: xuzhao9

Differential Revision: D64628614

Pulled By: mikaylagawarecki

fbshipit-source-id: edebf25cc6648919d5673a3baeaffdac26e5b91f
Summary: Pull Request resolved: #2524

Reviewed By: kit1980

Differential Revision: D64771621

Pulled By: mikaylagawarecki

fbshipit-source-id: 545f3d528cfbe2668c8d37e98e99423cd77a8e8e
@huydhn huydhn temporarily deployed to docker-s3-upload December 18, 2024 08:17 — with GitHub Actions Inactive
@huydhn huydhn temporarily deployed to docker-s3-upload December 18, 2024 08:18 — with GitHub Actions Inactive
@huydhn huydhn changed the title Resurrect ao benchmark Resurrect ao benchmark on Dev Infra AWS A100 runner Dec 18, 2024
@huydhn huydhn changed the title Resurrect ao benchmark on Dev Infra AWS A100 runner Resurrect ao benchmark on AWS A100 runner Dec 18, 2024
@huydhn huydhn marked this pull request as ready for review December 18, 2024 08:45
@facebook-github-bot
Contributor

@huydhn has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@huydhn merged this pull request in 65789d4.

huydhn added a commit to pytorch/test-infra that referenced this pull request Dec 27, 2024
After pytorch/benchmark#2561, TorchBench AO
benchmark data is now available to query and we can finally use that
dashboard again. If this proves useful, the next steps would be:

1. pytorch/benchmark#2561 only brings back one
example model for each suite (TorchBench, HF, TIMM). We need to add
more.
2. The TorchBench AO dashboard shares its code with the TorchInductor
dashboard. While the former has been migrated to the new benchmark
database, the latter hasn't. I will need to do that and clean this up in
a later PR.
3. Looking at the results on the dashboard, it seems that `autoquant`
works, but `int8dynamic` and `int8weightonly` do not. I'm not sure if they
are still relevant, but if they are, the ao team should know how to fix
them (cc @jerryzh168). The TorchBench run is at
https://github.com/pytorch/benchmark/actions/workflows/torchao.yml

### Testing

The two metrics, speedup and absolute execution time, are now showing up at:
https://torchci-git-fork-huydhn-ch-migrate-torchao-queries-fbopensource.vercel.app/benchmark/torchao?dashboard=torchao&startTime=Sun%2C%2015%20Dec%202024%2011%3A06%3A45%20GMT&stopTime=Sun%2C%2022%20Dec%202024%2011%3A06%3A45%20GMT&granularity=hour&mode=inference&dtype=autoquant&deviceName=cuda%20(a100)&lBranch=main&lCommit=07e6ef43fca2e95bc6cf59f97ba6251e618ef0e3&rBranch=main&rCommit=c03fa7c6c1bd03242a9de1fddb77a9c778106afd
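
The dashboard link above packs its filters into percent-encoded query parameters, which makes it hard to see what is being compared. As a rough sketch (the helper name `dashboard_params` is hypothetical; only the URL itself comes from the message above), the standard library can decode them:

```python
from urllib.parse import parse_qs, urlsplit

# The preview-deployment URL quoted above, split across lines for readability.
DASHBOARD_URL = (
    "https://torchci-git-fork-huydhn-ch-migrate-torchao-queries-fbopensource.vercel.app"
    "/benchmark/torchao?dashboard=torchao"
    "&startTime=Sun%2C%2015%20Dec%202024%2011%3A06%3A45%20GMT"
    "&stopTime=Sun%2C%2022%20Dec%202024%2011%3A06%3A45%20GMT"
    "&granularity=hour&mode=inference&dtype=autoquant"
    "&deviceName=cuda%20(a100)"
    "&lBranch=main&lCommit=07e6ef43fca2e95bc6cf59f97ba6251e618ef0e3"
    "&rBranch=main&rCommit=c03fa7c6c1bd03242a9de1fddb77a9c778106afd"
)


def dashboard_params(url: str) -> dict[str, str]:
    """Decode the query string of a dashboard URL into a flat dict."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; each key appears once here,
    # so keep only the first value.
    return {name: values[0] for name, values in parse_qs(query).items()}


if __name__ == "__main__":
    for name, value in dashboard_params(DASHBOARD_URL).items():
        print(f"{name} = {value}")
```

Decoded, the link selects the `autoquant` dtype in inference mode on the A100 device and compares two commits on `main` over the week of 15 to 22 December 2024.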
6 participants