Added huggingface integration - DEPRECATED #880
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Waiting to hear from @NielsRogge about this PR. Also, is there any way to add unit tests for this?
from typing import List, Union
...
def push_to_model_hub(
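(For orientation only: a minimal sketch of what a push command along these lines could look like if it simply wraps huggingface_hub. The function name mirrors the one under review, but the parameters and internals below are assumptions, not the PR's actual implementation.)

```python
from typing import Optional

from huggingface_hub import HfApi


def push_to_model_hub(repo_id: str, folder_path: str, token: Optional[str] = None) -> None:
    """Hypothetical wrapper: create the target repo (if needed) and upload a
    trained GaNDLF model folder to the Hugging Face Hub."""
    api = HfApi(token=token)
    # exist_ok=True keeps the call idempotent when the repo already exists
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # upload_folder pushes every file in folder_path to the repo in one commit
    api.upload_folder(
        repo_id=repo_id,
        folder_path=folder_path,
        repo_type="model",
        commit_message="Upload GaNDLF model",
    )
```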
So, as I understand it, you'll provide a CLI tool to push and pull models from the 🤗 hub. cc @Wauplin, wondering if it's possible to programmatically add tags?
By default, this won't track any download numbers for the various models or add specific tags which ease model discoverability (like "image-classification", etc.). For that, we can open a PR similar to this one: huggingface/huggingface.js#669, where you specify an extension so that downloads are tracked each time a file with the specified extension is downloaded.
Next to that, one can add code snippets that appear when users click the "use in GaNDLF" button (which you can add), as described in this guide: https://huggingface.co/docs/hub/en/models-adding-libraries.
@pranayasinghcsmpl @sarthakpati do you have an example of a GaNDLF repo pushed to the Hub already? Agree with @NielsRogge, having it as an "officially compatible library" on the Hub would be awesome! It enables download counts + better discoverability + the possibility to have code snippets.
> wondering if it's possible to programmatically add tags?

Of course it is! 🤗 Check out the ModelCard guide.
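(For illustration, assuming the huggingface_hub ModelCard/ModelCardData utilities are the route taken: the snippet below builds a card whose YAML header carries a gandlf tag and pushes it. The repo id and license value are placeholders.)

```python
from huggingface_hub import ModelCard, ModelCardData

# Metadata that ends up in the YAML header of the repo's README.md;
# library_name and the "gandlf" tag are what make repos filterable on the Hub.
card_data = ModelCardData(
    library_name="gandlf",
    tags=["gandlf"],
    license="apache-2.0",  # placeholder license
)
content = f"---\n{card_data.to_yaml()}\n---\n\n# Example GaNDLF model\n"
ModelCard(content).push_to_hub("pranayasingh/test_upload1")  # placeholder repo id
```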
I have uploaded a test model made using GaNDLF. You can find it here: https://huggingface.co/pranayasingh/test_upload1/tree/main
Thanks for the prompt responses @NielsRogge, @Wauplin and @pranayasinghcsmpl!
So I see this uploads various checkpoints to a single repository. Typically we recommend creating a single repository per model checkpoint; see the recent example of YOLOv10, where ONNX checkpoints are stored in separate repositories. However, I think it makes sense to keep the initial and latest PyTorch checkpoints (along with the best one) in a single repository.
For download stats to work, we could add a "gandlf" tag to the model cards, along with a PR like this one, where you can customize which files should contribute to incrementing the download counter. See also this guide: https://huggingface.co/docs/hub/en/models-download-stats.
This will allow you to track all models that people trained using GaNDLF (and see which ones are downloaded the most).
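(To make the tracking point concrete: a small, hedged sketch of how one could list Hub repos carrying such a tag, assuming the tag is literally "gandlf".)

```python
from huggingface_hub import HfApi

api = HfApi()
# List public models tagged "gandlf", most downloaded first.
for model in api.list_models(filter="gandlf", sort="downloads", direction=-1, limit=20):
    print(model.id, model.downloads)
```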
So, from an organizational and requirements perspective, we want to enable anyone training models using GaNDLF to upload to HF using their own org information, and not specifically under GaNDLF itself. For example, if organization A is training a model X-1, they should be able to upload this model (including the hashes of the code, and whatever else is needed to "deploy" the model) to huggingface.co/A/X-1 (or whatever else makes sense).
The main reason behind this is that MLCommons does not want to be a curator of models, and only wants to enable other organizations to train their own models.
That's exactly what huggingface-cli / huggingface_hub are for! The suggestion from @NielsRogge is to add a gandlf tag to the model card metadata when generating it. This way, all repos pushed to the Hub with GaNDLF will be tagged as such and searchable. It does not require you to manually curate those models yourself. The download count happens per repo, allowing you and your users to get an idea of the traction you get on the Hub.
Thanks for the ping @NielsRogge :) I'm a maintainer of the huggingface_hub library and sometimes help with integrations 👋
Note that currently the two added commands are very similar to huggingface-cli upload <repo-id> <path-to-local-folder> and huggingface-cli download <repo-id>, without adding new features on top. Those two commands offer more flexibility to users and are officially supported and maintained. In the long run, having two aliases in the GaNDLF repo can become more of a maintenance burden than a real benefit. But I'm leaving it to the repo owners, who know the broader context :)
)
...
def download_from_hub(
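(As context for the discussion below: if the command is essentially an alias for huggingface_hub.snapshot_download, a thin wrapper could look like the sketch here; names and defaults are illustrative, not the PR's actual code.)

```python
from typing import Optional

from huggingface_hub import snapshot_download


def download_from_hub(repo_id: str, revision: Optional[str] = None,
                      local_dir: Optional[str] = None) -> str:
    """Hypothetical wrapper: download a full model repo snapshot and return
    the local path where the files were materialized."""
    return snapshot_download(
        repo_id=repo_id,
        revision=revision,    # branch, tag, or commit hash; None -> default branch
        local_dir=local_dir,  # None -> files land in the shared HF cache
        repo_type="model",
    )
```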
From an external pair of eyes, I'm not sure I understand why you define download_from_hub, which ends up being an alias for snapshot_download?
@Wauplin, I did this to stay consistent with the implementations of the other CLI subcommands.
Fine with it!
@pranayasinghcsmpl - are there any specific benefits to using the new commands you have introduced versus those currently offered through HF? Would it make more sense to use them directly, and have HF as a dependency for GaNDLF going forward?
@sarthakpati, the HF CLI offers more features than this implementation. However, incorporating a CLI subcommand for HF Hub might encourage more users to fully utilize it. Maybe we could explore integrating the HF CLI into our CLI subcommands if we require all its features.
@sarthakpati, I think you're right. Also using the HF CLI might resolve issues mentioned in ref.
Precisely!
I am going to mark this PR as a draft until it is ready, @pranayasinghcsmpl.
@NielsRogge & @Wauplin - would it be okay to tag you for additional feedback once @pranayasinghcsmpl is done from his end?
Yes sounds good!
Of course! 🤗
Thank you! 👍🏽
2 points from me:
@sarthakpati, sure, I'll start working on the tests. Also, I'll add the dependencies in the next commit.
The mlcube-docker errors in Actions are related to mlcommons/mlcube#360. Changing the pip version in the workflow should help - please put this change in a separate PR, @pranayasinghcsmpl. Edit: done in #887
Force-pushed from 58d5508 to 5ad21de
@pranayasinghcsmpl - the latest changes from the base branch should make things easier. However, we are still missing updates to …
Fixes #727
Proposed Changes
Checklist
- CONTRIBUTING guide has been followed.
- typing is used to provide type hints, including and not limited to using Optional if a variable has a pre-defined value.
- If a separate pip install step is needed for the PR to be functional, please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].