
Use the full dataset for accuracy validation in the integration test #655

Open
anhappdev opened this issue Feb 15, 2023 · 7 comments
Labels
domain:ci Something related to continuous integration

Comments

@anhappdev
Collaborator

Our integration test includes a check that validates the accuracy results of the benchmarks. Currently, the app uses only a tiny subset of the full dataset, so its results may not be reliable.

We should use the full dataset instead. Since this test is run exclusively on our CI, the full dataset can be stored privately on Google Cloud.
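To illustrate why a tiny subset can make the accuracy check unreliable, here is a rough sketch using the binomial standard error of an accuracy estimate, sqrt(p(1 - p) / n). The accuracy of 0.75 and the 100-image subset size are hypothetical values chosen for illustration only; 50,000 is the size of the ImageNet validation set. The function name `accuracy_std_error` is likewise hypothetical and not part of the project.

```python
import math


def accuracy_std_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy measured on n_samples independent samples."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Hypothetical comparison: a 100-image subset vs. the 50,000-image ImageNet validation set.
for n in (100, 50_000):
    se = accuracy_std_error(0.75, n)
    # 1.96 standard errors gives an approximate 95% interval half-width.
    print(f"n={n:>6}: accuracy 0.75 +/- {1.96 * se:.4f} (approx. 95% interval)")
```

Under these assumptions, the 100-image subset gives a half-width of roughly 8.5 percentage points, while the full 50,000-image set narrows it to about 0.4 points, which is why a tolerance-based accuracy check is far more meaningful on the full dataset.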

@anhappdev anhappdev added the domain:ci Something related to continuous integration label Feb 15, 2023
@freedomtan
Contributor

@Mostelk to check whether we can get storage space from what Bruno is planning to buy. 20 GiB should be a safe bet for this purpose.
Access frequency: maybe 50 times per month.

@freedomtan
Contributor

@Mostelk to check with Bruno on the progress.

For public data, it's pretty easy.

The main issue: ImageNet

What we need: 8.x GiB for the full validation set.
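As a rough sizing check combining the figures in this thread: if each of the estimated ~50 accesses per month downloaded the full ImageNet validation set (an assumption; caching or partial downloads would lower this), the monthly download volume would be at least

$$
50\ \tfrac{\text{accesses}}{\text{month}} \times 8\ \tfrac{\text{GiB}}{\text{access}} = 400\ \tfrac{\text{GiB}}{\text{month}}
$$

with the real figure somewhat higher, since the set is 8.x GiB rather than exactly 8 GiB.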

@anhappdev
Collaborator Author

Another option would be to use GitHub Release, provided each file is under 2 GB.
The SNUSR dataset was released this way:
https://github.com/mlcommons/mobile_models/releases
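If a dataset archive exceeds the per-file limit, one way to fit it into a release is to split it into numbered parts and concatenate them again after download. The sketch below assumes the limit is 2 GiB (pass a smaller `limit_bytes` for a safety margin); the function names `split_for_release` and `join_parts` and the file name `dataset.tar` are hypothetical and not part of any existing tooling in this project.

```python
import shutil
from pathlib import Path

# Assumption: treat the "under 2 GB" per-file release limit as 2 GiB.
DEFAULT_LIMIT_BYTES = 2 * 1024**3
# Stream in 64 MiB reads so memory use stays bounded for large archives.
READ_BLOCK_BYTES = 64 * 1024**2


def split_for_release(src: Path, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> list[Path]:
    """Split src into <name>.part000, <name>.part001, ... each strictly under limit_bytes."""
    max_part = limit_bytes - 1  # "under" the limit, so stay strictly below it
    if max_part < 1:
        raise ValueError("limit_bytes must be at least 2")
    parts: list[Path] = []
    with src.open("rb") as f:
        index = 0
        while True:
            part_path = src.with_name(f"{src.name}.part{index:03d}")
            written = 0
            with part_path.open("wb") as out:
                while written < max_part:
                    buf = f.read(min(READ_BLOCK_BYTES, max_part - written))
                    if not buf:
                        break  # reached end of the source file
                    out.write(buf)
                    written += len(buf)
            if written == 0:
                part_path.unlink()  # nothing left to write; drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts


def join_parts(parts: list[Path], dest: Path) -> None:
    """Reassemble the original file by concatenating the parts in name order."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            with part.open("rb") as f:
                shutil.copyfileobj(f, out)
```

For example, `split_for_release(Path("dataset.tar"))` would produce `dataset.tar.part000`, `dataset.tar.part001`, and so on, which could be uploaded as release assets; the CI job would then download them and call `join_parts` to restore the archive. Zero-padded part numbers keep the name order identical to the byte order.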

@freedomtan
Contributor

@freedomtan to check whether we can put datasets other than ImageNet on GitHub Release.

@freedomtan
Contributor

@freedomtan
Contributor

Let's use GitHub Release for datasets other than ImageNet.

@anhappdev
Collaborator Author

Waiting until #707 is merged so that we save time and bandwidth when downloading the dataset.
