Add checksum checks for all MLPerf inference dataset and model downloads #167

Closed
9 tasks done
Tracked by #20 ...
arjunsuresh opened this issue Aug 16, 2024 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@arjunsuresh
Contributor

arjunsuresh commented Aug 16, 2024

We need to add checksum checks for all models and datasets used in MLPerf inference. For folder downloads, we can use a checksum file, as done here

  • ResNet50
  • DLRMv2
  • Stable Diffusion
  • LLAMA2
  • RetinaNet
  • Bert
  • 3d-unet
  • Mixtral
  • GPTJ
@gfursin
Contributor

gfursin commented Aug 16, 2024

Hi Arjun. Just a quick question: does the recent MD5SUM check mechanism in CM support macOS and Windows? Do we have tests for Windows and macOS in GitHub Actions (I think I saw somewhere that GitHub now has Windows Server support in workflows)? Thanks!

@arjunsuresh
Contributor Author

Hi @gfursin, the code that does the check is written in Python and should work on Windows too, but we currently have no Windows-specific tests for it. We do have Windows and macOS GitHub Actions for CM installation and ABTF inference, which cover the checksum check for individual files, but nothing yet for folders.

@gfursin
Contributor

gfursin commented Aug 16, 2024

> Hi @gfursin, the code that does the check is written in Python and should work on Windows too, but we currently have no Windows-specific tests for it. We do have Windows and macOS GitHub Actions for CM installation and ABTF inference, which cover the checksum check for individual files, but nothing yet for folders.

Cool! Thank you! That's already a very good starting point!

@anandhu-eng
Contributor

Hi @arjunsuresh, do the env variables `CM_EXTRACT_EXTRACTED_CHECKSUM_FILE` here and `CM_DOWNLOAD_CHECKSUM_FILE` here serve the same purpose, namely holding the path to the checksum file for extracted files?

@arjunsuresh
Contributor Author

Yes

@arjunsuresh
Contributor Author

@anandhu-eng sorry, I was not clear in the earlier reply.

`CM_EXTRACT_EXTRACTED_CHECKSUM_FILE` is the file containing the checksums for the files inside the extracted folder.

Although we mostly download a single file, tools such as rclone download a whole folder. `CM_DOWNLOAD_CHECKSUM_FILE` is the file containing the checksums for the files inside such a downloaded folder.
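
To illustrate what a folder-level check of this kind might look like, here is a minimal sketch. It is not the actual CM implementation: the function names `md5_of_file` and `verify_folder` are hypothetical, and it assumes the checksum file uses the common `md5sum` layout of one `<hex digest>  <relative path>` entry per line.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_folder(folder, checksum_file):
    """Check each entry of an md5sum-style file against files under `folder`.

    Hypothetical helper (assumption, not part of CM). Returns the list of
    relative paths that are missing or whose digest does not match.
    """
    failures = []
    for line in Path(checksum_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = Path(folder) / name
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(name)
    return failures
```

Calling `verify_folder("downloaded_folder", "checksums.md5")` (both paths are placeholders) returns an empty list when every listed file is present and matches. The sketch relies only on `hashlib` and `pathlib`, so it should behave the same on Linux, macOS and Windows.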

@anandhu-eng
Contributor

Closing the issue, as checksums for the listed models and their datasets have been added.


3 participants