Added huggingface integration - DEPRECATED #880

pranayasinghcsmpl · 2024-06-04T11:13:06Z

Fixes #727

Proposed Changes

Add a new option for Hugging Face in the CLI
Add model upload functionality to a Hugging Face repo
Add model download functionality from a Hugging Face repo

Checklist

github-actions · 2024-06-04T11:13:21Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

sarthakpati · 2024-06-04T15:07:59Z

Thanks for the PR! I have changed the base to the "new-apis" branch (instead of master), and some conflicts need to be resolved. Can you take a look?

sarthakpati · 2024-06-05T14:35:51Z

Waiting to hear from @NielsRogge about this PR. Also, is there any way to add unit tests for this?

NielsRogge · 2024-06-05T19:58:32Z

GANDLF/cli/huggingface_hub_handler.py

+from typing import List, Union
+
+
+def push_to_model_hub(


So, as I understand it, you'll provide a CLI tool to push and pull models from the 🤗 hub. cc @Wauplin wondering if it's possible to programmatically add tags?

By default, this won't track any download numbers for the various models or add specific tags that ease model discoverability (like "image-classification" etc.). For that we can open a PR similar to this one: huggingface/huggingface.js#669, where you specify an extension which tracks downloads each time a file with the specified extension is downloaded.

In addition, you can add a "use in GaNDLF" button that shows code snippets when users click it, as shown in this guide: https://huggingface.co/docs/hub/en/models-adding-libraries.

@pranayasinghcsmpl @sarthakpati do you have an example of a GaNDLF repo pushed to the Hub already? Agree with @NielsRogge, having it as an "officially compatible library" on the Hub would be awesome! It enables download counts, better discoverability, and code snippets.

wondering if it's possible to programmatically add tags?

Of course it is! 🤗 Check out the ModelCard guide.

I have uploaded a test model made using GaNDLF. You can find it here: https://huggingface.co/pranayasingh/test_upload1/tree/main

Thanks for the prompt responses @NielsRogge, @Wauplin and @pranayasinghcsmpl!

So I see this uploads various checkpoints to a single repository. Typically we recommend creating a single repository per model checkpoint; see a recent example of YOLOv10, where ONNX checkpoints are stored in separate repositories. However, I think it makes sense to keep the initial and latest PyTorch checkpoints (along with the best) in a single repository.

For download stats to work, we could add a "gandlf" tag to the model cards, along with a PR like this one, where you can customize which files should contribute to incrementing the download counter. See also this guide: https://huggingface.co/docs/hub/en/models-download-stats.

This will allow you to track all models that people trained using GaNDLF (and see which ones are downloaded the most).

So, from an organizational and requirements perspective, we want to enable anyone training models using GaNDLF to upload to HF using their own org information, and not specifically under GaNDLF itself. For example, if organization A is training a model X-1, they should be able to upload this model (including the hashes of the code, and whatever else is needed to "deploy" the model) to huggingface.co/A/X-1 (or whatever else makes sense).

The main reason behind this is that MLCommons does not want to be a curator of models, and only wants to enable other organizations to train their own models.

That's exactly what the huggingface-cli/huggingface_hub are for! The suggestion from @NielsRogge is to add a gandlf tag to the model card metadata when generating it. This way all repos pushed to the Hub with GaNDLF will be tagged as such and searchable. You do not need to manually curate those models yourself. Downloads are counted per repo, allowing you and your users to get an idea of the traction you get on the Hub.
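For illustration, here is a minimal sketch of what tagging at card-generation time could look like. It relies only on the fact that Hub model card metadata lives in a YAML front matter block at the top of the repo's README.md. The helper name `build_model_card` and the card body are hypothetical, not part of this PR, and the "image-classification" tag is just the example mentioned earlier in the thread.

```python
from typing import List


def build_model_card(tags: List[str], body: str = "# Model card\n") -> str:
    """Return README.md text whose YAML front matter lists the given tags.

    Hypothetical helper: the name and the default body are assumptions.
    """
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]  # one YAML list item per tag
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body


# The "gandlf" tag is what would make GaNDLF-trained repos searchable.
card_text = build_model_card(["gandlf", "image-classification"])
print(card_text)
```

The resulting text would be written to README.md in the folder that gets uploaded. The ModelCard guide mentioned above covers the huggingface_hub helpers for the same job.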

Wauplin

Thanks for the ping @NielsRogge :) I'm a maintainer of the huggingface_hub library and sometimes help with integrations 👋

Note that currently it seems that the 2 added commands are very similar to

huggingface-cli upload <repo-id> <path-to-local-folder>

and

huggingface-cli download <repo-id>

without adding new features to them. Those two commands offer more flexibility to users and are officially supported and maintained. In the long run, having 2 aliases in the GaNDLF repo can become more of a maintenance burden than a real benefit. But I'm leaving it to the repo owners, who know the broader context :)
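As a rough sketch of the alternative, a project could delegate to the official commands instead of reimplementing them. The two helpers below only build the argument lists for the `huggingface-cli upload` and `huggingface-cli download` invocations quoted above. The function names and the example repo id and folder are hypothetical.

```python
import shlex
from typing import List


def hf_upload_cmd(repo_id: str, local_folder: str) -> List[str]:
    """Build the argv for `huggingface-cli upload <repo-id> <path-to-local-folder>`."""
    return ["huggingface-cli", "upload", repo_id, local_folder]


def hf_download_cmd(repo_id: str) -> List[str]:
    """Build the argv for `huggingface-cli download <repo-id>`."""
    return ["huggingface-cli", "download", repo_id]


if __name__ == "__main__":
    # Print the command line rather than running it, so no network is needed.
    print(shlex.join(hf_upload_cmd("org-a/x-1", "./model_dir")))
```

To actually run a command, the list can be passed to `subprocess.run(cmd, check=True)`, which assumes `huggingface-cli` is installed and on the PATH.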

GANDLF/cli/huggingface_hub_handler.py

GANDLF/entrypoints/subcommands.py

Wauplin · 2024-06-06T08:51:40Z

GANDLF/cli/huggingface_hub_handler.py

+    )
+
+
+def download_from_hub(


From an external pair of eyes, I'm not sure I understand why download_from_hub is defined when it ends up being an alias for snapshot_download?

@Wauplin, I did this to stay consistent with the implementations of other CLI subcommands.

Fine with it!
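For readers without the diff at hand, a thin wrapper of the kind discussed here might look roughly like the sketch below. The parameter list is an assumption, since the actual signature in this PR is not shown in the thread. `snapshot_download` is the real huggingface_hub function, and it returns the local path of the downloaded snapshot.

```python
from typing import Optional


def download_from_hub(
    repo_id: str,
    revision: Optional[str] = None,
    local_dir: Optional[str] = None,
) -> str:
    """Download a full repo snapshot and return its local path.

    Sketch only: the parameters are assumed, not taken from the PR.
    """
    # Imported lazily so this module imports even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=revision, local_dir=local_dir)
```

Keeping the wrapper this thin preserves a consistent entry point for the CLI subcommand while leaving all download behavior to huggingface_hub.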

@pranayasinghcsmpl - are there any specific benefits to using the new commands you have introduced versus those currently offered through HF? Would it make more sense to use them directly, and have HF as a dependency for GaNDLF going forward?

@sarthakpati, the HF CLI offers more features than this implementation. However, incorporating a CLI subcommand for HF Hub might encourage more users to fully utilize it. Maybe we could explore integrating the HF CLI into our CLI subcommands if we require all its features.

@sarthakpati, I think you're right. Also using the HF CLI might resolve issues mentioned in ref.

Precisely!

I am going to mark this PR as draft until this gets ready, @pranayasinghcsmpl.

@NielsRogge & @Wauplin - would it be okay to tag you to take a look at this PR once @pranayasinghcsmpl is done from his end to provide additional feedback?

Yes sounds good!

Of course! 🤗

Thank you! 👍🏽

sarthakpati · 2024-06-19T13:40:45Z

2 points from me:

Tests need to be added. See the entrypoint tests here.
I am guessing hf needs to be added as a dependency to setup.py.
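One way such a test could avoid touching the network is to mock the call that reaches the Hub. The sketch below is not based on the existing GaNDLF entrypoint tests. `hf_download` is a hypothetical wrapper that shells out to the official CLI, and the test replaces `subprocess.run` with a mock.

```python
import subprocess
from unittest import mock


def hf_download(repo_id: str) -> int:
    """Hypothetical wrapper: run `huggingface-cli download` and return its exit code."""
    completed = subprocess.run(["huggingface-cli", "download", repo_id], check=False)
    return completed.returncode


def test_hf_download_invokes_cli() -> None:
    # Stand-in for the CompletedProcess that subprocess.run would return.
    fake_result = mock.Mock(returncode=0)
    with mock.patch("subprocess.run", return_value=fake_result) as run:
        assert hf_download("org-a/x-1") == 0
    # The wrapper should call the CLI exactly once with the expected arguments.
    run.assert_called_once_with(
        ["huggingface-cli", "download", "org-a/x-1"], check=False
    )
```

The same pattern would apply to an upload wrapper, and the test function can be collected by pytest as written.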

pranayasinghcsmpl · 2024-06-20T05:35:14Z

@sarthakpati, sure, I'll start working on the tests. I'll also add the dependencies in the next commit.

sarthakpati · 2024-06-27T15:25:03Z

The mlcube-docker errors in actions are related to mlcommons/mlcube#360

Changing the pip version in the workflow should help - please put this change in a separate PR, @pranayasinghcsmpl

Edit: done in #887

sarthakpati · 2024-07-01T20:28:27Z

@pranayasinghcsmpl - the latest changes from the base branch should make things easier. However, we are still missing updates to setup.py.

added huggingface cli option

58f8638

pranayasinghcsmpl requested a review from a team as a code owner June 4, 2024 11:13

sarthakpati changed the base branch from master to new-apis_v0.1.0-dev June 4, 2024 15:06

sarthakpati mentioned this pull request Jun 4, 2024

Make sure Hugging Face downloads work, better discoverability #878

Closed

pranayasinghcsmpl added 2 commits June 5, 2024 15:11

Merge branch 'new-apis_v0.1.0-dev' into hf_cli2

cb07ad6

added clack linting

038c52f

NielsRogge reviewed Jun 5, 2024

Wauplin reviewed Jun 6, 2024

removed print statements

ad200ae

sarthakpati marked this pull request as draft June 6, 2024 19:36

added huggingface cli

5ad21de

pranayasinghcsmpl force-pushed the hf_cli2 branch 2 times, most recently from 58d5508 to 5ad21de July 1, 2024 10:01

pranayasinghcsmpl and others added 3 commits July 1, 2024 16:37

changed pip version

7fdc12c

Merge branch 'new-apis_v0.1.0-dev' into hf_cli2

2317daa

Merge branch 'new-apis_v0.1.0-dev' into hf_cli2

0268654

pranayasinghcsmpl added 2 commits July 2, 2024 10:18

updated setup.py

939c354

added black linting

58abb29

sarthakpati mentioned this pull request Jul 2, 2024

Added huggingface integration #892

Closed

10 tasks

sarthakpati changed the title ~~Added huggingface integration~~ Added huggingface integration - DEPRECATED Jul 2, 2024

Merge branch 'new-apis_v0.1.0-dev' into hf_cli2

5aaa286

sarthakpati deleted the branch mlcommons:new-apis_v0.1.0-dev July 31, 2024 14:21

sarthakpati closed this Jul 31, 2024

pranayasinghcsmpl deleted the hf_cli2 branch August 14, 2024 06:47

pranayasinghcsmpl restored the hf_cli2 branch August 14, 2024 06:47

NielsRogge Jun 5, 2024

Wauplin Jun 6, 2024

pranayasinghcsmpl Jun 6, 2024

sarthakpati Jun 6, 2024

NielsRogge Jun 6, 2024 (edited)

sarthakpati Jun 6, 2024

Wauplin Jun 6, 2024

Wauplin left a comment

Wauplin Jun 6, 2024

pranayasinghcsmpl Jun 6, 2024

Wauplin Jun 6, 2024

sarthakpati Jun 6, 2024

pranayasinghcsmpl Jun 6, 2024

pranayasinghcsmpl Jun 6, 2024

sarthakpati Jun 6, 2024

NielsRogge Jun 7, 2024

Wauplin Jun 7, 2024

sarthakpati Jun 7, 2024
