ENH: Integration with Hugging Face Hub #46000
Comments
We're very wary of adding dependencies and extending an already-overstuffed API. Is something like |
Agreed with the hesitancy about adding this directly to pandas. For context, pandas-datareader (a public/private data-sourcing feature in a similar spirit) used to be packaged with pandas but was spun off into its own package: https://pandas-datareader.readthedocs.io/en/latest/ Given that, I think this would be best implemented as a third-party package and included in the ecosystem docs. |
Pandas already supports many protocols thanks to fsspec (writing/loading to AWS, GCS, ...). If you manage to integrate the "Hugging Face Hub protocol" in fsspec, you get pandas support for free :) edit: this would take care of the transmission from a user to your hub, but the format might not be what you want (unless you are fine with a csv/json/pickle/excel version of a dataframe). |
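To illustrate the point about fsspec, here is a minimal sketch of how pandas hands protocol-prefixed URLs to fsspec. It uses fsspec's built-in in-memory filesystem (memory://) purely as a stand-in so the example runs without network access or credentials; the path demo/data.csv is made up for the example. Any other protocol registered with fsspec should be usable from pandas in the same way.

```python
import pandas as pd  # requires fsspec to be installed alongside pandas

# A small frame to round-trip through an fsspec-backed URL.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# "memory://" is fsspec's in-memory filesystem, used here only as a stand-in
# for a remote protocol. The path is hypothetical.
url = "memory://demo/data.csv"

# pandas recognizes the protocol prefix and delegates the I/O to fsspec.
df.to_csv(url, index=False)
roundtrip = pd.read_csv(url)

print(roundtrip)
```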
@twoertwein that's a pretty cool idea! |
Some early documentation is now available:

    import pandas as pd
    df = pd.read_parquet("hf://datasets/username/my_dataset/data.parquet")

And there are automatic code snippets on HF as well. |
Hi Pandas devs and Pandas community 🤗
I am reaching out to you to see if you would be interested in an integration with the Hugging Face Hub. We have been hosting datasets on the hub for a while and are now close to 3000 public datasets not counting all the private datasets.
In both the models and datasets areas of the Hugging Face ecosystem we use the push_to_hub functionality to upload datasets and models to the Hub in one line. Similarly, these assets can be loaded from the Hub in a single line with the load_dataset and from_pretrained functions, respectively.
We wanted to ask whether you would be interested in adding the huggingface_hub dependency so that any DataFrame could be pushed to and pulled from the Hub.
Here are a few use cases where such functionality would add value:
git-lfs in the background)
Here is what such an integration could look like:
Here is the documentation on publishing files on the Hugging Face Hub using the huggingface_hub library: https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub#publish-files-to-the-hub
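As a rough sketch of what publishing a DataFrame with the huggingface_hub library might involve (this is not the integration proposed in this issue): serialize the frame to bytes, then upload it as a file. The helper names dataframe_to_csv_bytes and push_dataframe, the repo id, and the file name are hypothetical. The upload step assumes HfApi.upload_file, an existing dataset repository, and a configured access token, so it is defined here but not called.

```python
import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 encoded CSV bytes (hypothetical helper)."""
    return df.to_csv(index=False).encode("utf-8")


def push_dataframe(df: pd.DataFrame, repo_id: str, path_in_repo: str = "data.csv") -> None:
    """Upload a DataFrame as a CSV file to a dataset repo (hypothetical helper).

    Assumes the repository already exists and a Hugging Face token is configured.
    """
    # Imported lazily so the serialization helper works without huggingface_hub.
    from huggingface_hub import HfApi

    HfApi().upload_file(
        path_or_fileobj=dataframe_to_csv_bytes(df),
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
payload = dataframe_to_csv_bytes(df)
print(len(payload), "bytes ready to upload")
# push_dataframe(df, "username/my_dataset")  # needs network access and a token
```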
I am curious to hear what you think about this and please let me know if I can clarify anything!
cc @osanseviero @julien-c