Is there a way to retrieve all dataset refs in a project? #3053

Closed
akiyamasho opened this issue Nov 22, 2024 · 3 comments

Comments

@akiyamasho

In the Datasets documentation, there is a way to publish and get by ref, but there seems to be no way to get a list of all existing dataset refs in a project.

(NOTE: I also checked the codebase, and it doesn't seem to include any way to retrieve a list of dataset refs from a project:
https://github.com/wandb/weave/blob/master/weave/flow/dataset.py
https://github.com/wandb/weave/blob/master/weave/trace/api.py)

@TeoZosa

TeoZosa commented Nov 23, 2024

For context, @akiyamasho and I are deploying Weave over multiple projects. Grabbing datasets programmatically would make that easier vs. now having to hard-code dataset refs as they're published for each project. 🙇

@m-rgba
Contributor

m-rgba commented Nov 23, 2024

I don't believe this is currently in the SDK, but you can query the API to retrieve the datasets, like the following:

import requests

### Your Info:
team_id = ""
project_id = ""
wandb_token = ""
###

url = "https://trace.wandb.ai/objs/query"
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}
payload = {
    "project_id": f"{team_id}/{project_id}",
    # Only return objects whose base class is Dataset
    "filter": {
        "base_object_classes": ["Dataset"],
    },
    # Return metadata only, not the dataset contents
    "metadata_only": True,
}
response = requests.post(
    url, headers=headers, json=payload, auth=("api", wandb_token)
)
response.raise_for_status()  # Fail loudly on auth or request errors
data = response.json()
print(data)

Depending on your use case, you may want to return the actual datasets by setting metadata_only to False.

You can find more information on the request object here:
https://weave-docs.wandb.ai/reference/service-api/objs-query-objs-query-post
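
Since the original question was about refs rather than raw query results, here is a minimal sketch of turning the parsed response into ref URIs. It rests on assumptions not stated in this thread: that the response JSON has a top-level "objs" list, that each entry carries "project_id", "object_id", "digest", and "is_latest" fields, and that refs follow the weave:///<entity>/<project>/object/<name>:<digest> form. The helper name dataset_ref_uris is hypothetical. Check the request/response reference linked above before relying on it.

```python
from typing import Any


def dataset_ref_uris(data: dict[str, Any], latest_only: bool = True) -> list[str]:
    """Build ref URI strings from a parsed objs/query response.

    Assumed (not confirmed in this thread): `data` looks like
    {"objs": [{"project_id": "team/project", "object_id": "name",
               "digest": "...", "is_latest": 1}, ...]}
    """
    uris = []
    for obj in data.get("objs", []):
        # Optionally keep only the newest version of each dataset
        if latest_only and not obj.get("is_latest"):
            continue
        uris.append(
            f"weave:///{obj['project_id']}/object/{obj['object_id']}:{obj['digest']}"
        )
    return uris


# Hand-written sample in the assumed response shape (not real API output)
sample = {
    "objs": [
        {"project_id": "my-team/my-project", "object_id": "train-set",
         "digest": "abc123", "is_latest": 0},
        {"project_id": "my-team/my-project", "object_id": "train-set",
         "digest": "def456", "is_latest": 1},
    ]
}
refs = dataset_ref_uris(sample)
print(refs)
```

If the URI form matches what your project publishes, each string could presumably be passed to the get-by-ref path described in the Datasets documentation, which would avoid hard-coding refs per project.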

@akiyamasho
Author

Thank you @gtarpenning!
