Export dataset to HF Hub from the UI #2520

Closed
Tracked by #31
dvsrepo opened this issue Mar 13, 2023 · 7 comments
Labels
status: stale Indicates that there is no activity on an issue or pull request type: enhancement Indicates new feature requests

Comments

@dvsrepo
Member

dvsrepo commented Mar 13, 2023

Is your feature request related to a problem? Please describe.
I want an action in the UI to push the dataset to the Hub. We currently can push two "types" of datasets to the Hub:

  1. Prepared for training: which applies pre-processing to align the dataset format with "standard" training sets and now can also split the dataset into train and test
  2. Argilla Dataset: this pushes the full dataset with an "Argilla" format that can be used to import back the dataset using rg.load

At first, I would say the most relevant is type 2, as I think the main goal at first is to let users keep a copy of their datasets (especially crucial for Argilla HF Spaces) and be able to restore them, as we do for most of our tutorials. But I'm open to additional export "types", as long as they include the option to export type 2 as well.

Another thing to take into account (maybe for further iterations, if this is a blocker) is that users might want to push to a private dataset, so we need to ask them for an HF token.

Describe the solution you'd like
I would like to have a button and modal box somewhere in the UI to push the dataset to the Hub. In this modal box we can require the users to input:

  1. HF organization target: where to push the dataset
  2. Dataset name: we can propose the same name as in Argilla but let users change this
  3. A query to filter the dataset to push? Not sure we need/desire this but might be a good feat.
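
The three modal inputs above, plus the optional HF token for private datasets, could be modeled as a small request object that the backend validates before pushing. This is only a sketch: `HubExportRequest`, its field names, and the name-checking regex are hypothetical and not part of Argilla or the Hub API, and the Hub's real naming rules may differ.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Conservative pattern for illustration only; not the Hub's official rule.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


@dataclass(frozen=True)
class HubExportRequest:
    """Hypothetical payload collected by the export modal."""

    organization: str            # HF organization (or user) to push to
    dataset_name: str            # pre-filled with the Argilla dataset name
    query: Optional[str] = None  # optional filter; None exports everything
    private: bool = False        # private repos require a token
    token: Optional[str] = None  # HF token supplied by the user

    @property
    def repo_id(self) -> str:
        """Target repository in "<organization>/<dataset_name>" form."""
        return f"{self.organization}/{self.dataset_name}"

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the input is fine."""
        errors = []
        if not _NAME_RE.match(self.organization):
            errors.append(f"Invalid organization name: {self.organization!r}")
        if not _NAME_RE.match(self.dataset_name):
            errors.append(f"Invalid dataset name: {self.dataset_name!r}")
        if self.private and not self.token:
            errors.append("A Hugging Face token is required to push a private dataset.")
        return errors
```

Returning a list of messages rather than raising on the first problem lets the modal show every issue at once.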

If something went wrong when pushing the dataset, provide a clear message to the user of what went wrong.

This will require implementing the push logic in the backend and adding `datasets` as a dependency, at least for the server, but I don't see an issue there.
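
A minimal sketch of what that backend logic might look like, assuming the Argilla 1.x Python client (`rg.load` and `to_datasets()`) and the `push_to_hub` method from `datasets`. The function name `export_dataset_to_hub` and the `ExportResult` type are hypothetical. Every failure is turned into a message the UI can show to the user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExportResult:
    """Hypothetical result type returned to the UI."""

    ok: bool
    message: str


def export_dataset_to_hub(
    name: str,
    repo_id: str,
    query: Optional[str] = None,
    token: Optional[str] = None,
    private: bool = False,
) -> ExportResult:
    """Push an Argilla dataset ("type 2" export) to the Hub and report the outcome."""
    try:
        # Imported lazily so a missing dependency becomes a readable message.
        import argilla as rg
    except ImportError:
        return ExportResult(False, "The `argilla` package is not installed on the server.")

    try:
        dataset = rg.load(name, query=query)  # optionally filtered by query
        dataset.to_datasets().push_to_hub(repo_id, token=token, private=private)
    except Exception as exc:  # surface any failure to the user
        return ExportResult(False, f"Could not push '{name}' to '{repo_id}': {exc}")

    return ExportResult(True, f"Pushed '{name}' to '{repo_id}'.")
```

A server endpoint could call this function and return `message` in the response body, so the modal can display it whether the push succeeded or failed.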

Describe alternatives you've considered
Using listeners or the Python client. This is already possible, but it leaves non-developers out of this feature.

Additional context
We have something along these lines in export-to-hub (https://huggingface.co/spaces/argilla/argilla-streamlit-customs) but, as I said, the first use case is pushing the dataset even without preparing it for training.

@dvsrepo dvsrepo added the type: enhancement Indicates new feature requests label Mar 13, 2023
@dvsrepo dvsrepo changed the title from "Export dataset to Hugging Hub from the UI" to "Export dataset to HF Hub from the UI" Mar 13, 2023
@davidberenstein1957
Member

davidberenstein1957 commented Mar 24, 2023

So, ideally, I was wondering if it would be possible to create a similar card where you add `title`, `description`, `buttonText` and `buttonLink`, and potentially also `required_environment_var`, `endpoint`, `request_type`, and `payload`/`form`.

  • it should only show when the `required_environment_var` is not `None`.
  • there should be a way to define the payload: either the `required_environment_var`, a form with customizable fields, or both. (I would like to use a form to keep it as flexible as possible for potential other integrations.)
  • there should be a FastAPI Server endpoint which does something with this request and payload, like exporting data to the hub, csv, or excel.
  • in this way, we could potentially also let users import data from the hub or potentially via a form upload.
  • In the future, similar configurations might be used to start training on AutoTrain.
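
One way to picture the card configuration described in the list above is as a plain data object. This is a sketch under stated assumptions: the class `IntegrationCard` and its methods are hypothetical, and "is not `None`" is read here as "the named environment variable is set on the server", which is only one possible interpretation.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class IntegrationCard:
    """Hypothetical configuration for one integration card in the UI."""

    title: str
    description: str
    button_text: str
    button_link: str
    required_environment_var: Optional[str] = None  # e.g. "HF_TOKEN"
    endpoint: Optional[str] = None                  # server route to call
    request_type: str = "POST"
    form_fields: List[str] = field(default_factory=list)

    def is_visible(self, env: Mapping[str, str] = os.environ) -> bool:
        """Show the card only when the required environment variable is set."""
        if self.required_environment_var is None:
            return False
        return bool(env.get(self.required_environment_var))

    def build_payload(
        self,
        form_values: Mapping[str, str],
        env: Mapping[str, str] = os.environ,
    ) -> Dict[str, str]:
        """Combine the declared form fields with the environment variable value."""
        payload = {
            name: form_values[name]
            for name in self.form_fields
            if name in form_values
        }
        if self.required_environment_var:
            payload[self.required_environment_var] = env.get(
                self.required_environment_var, ""
            )
        return payload
```

Because the form fields are declared per card, the same structure could describe other integrations, such as importing from the Hub or exporting to CSV, by pointing `endpoint` at a different server route.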

@github-actions

This issue is stale because it has been open for 90 days with no activity.

@github-actions github-actions bot added the status: stale Indicates that there is no activity on an issue or pull request label Jun 23, 2023
@nataliaElv
Member

Importing and exporting directly from the UI is a popular request in our community, so I'm removing this from stale.

@nataliaElv nataliaElv removed the status: stale Indicates that there is no activity on an issue or pull request label Jun 23, 2023
@github-actions

This issue is stale because it has been open for 90 days with no activity.

@github-actions github-actions bot added the status: stale Indicates that there is no activity on an issue or pull request label Oct 18, 2023
@github-actions

This issue was closed because it has been inactive for 30 days since being marked as stale.

@nataliaElv nataliaElv reopened this Nov 17, 2023
@github-actions github-actions bot removed the status: stale Indicates that there is no activity on an issue or pull request label Nov 18, 2023
@github-actions

This issue is stale because it has been open for 90 days with no activity.

@github-actions github-actions bot added the status: stale Indicates that there is no activity on an issue or pull request label Jul 28, 2024
@github-actions

This issue was closed because it has been inactive for 30 days since being marked as stale.


4 participants