Example of a data provisioning specification #174

Open
mih opened this issue May 6, 2024 · 0 comments · May be fixed by #175
mih commented May 6, 2024

Background: datalad/datalad-remake#12

The twist here is the focus on (partial) access to information in a dataset, rather than a (full) description of a dataset. It should enable precise instructions on what to obtain, allow for smart decisions on how to obtain it, and do all that with a lean data specification.

Decision making: I can declare a download_url for a dataset and use that, but if it is 1 GB and I only need a 1 MB file from it, a full download is not smart. That dataset may, however, be a DataLad dataset, in which case we may be able to clone it and get that individual file separately.
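To make the trade-off concrete, here is a minimal sketch of such a decision in Python. Everything in it is an assumption for illustration: the `AccessOptions` container, the `choose_strategy` function, the strategy labels, and the `max_overhead` threshold are hypothetical and not part of any schema or DataLad API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessOptions:
    """Hypothetical summary of how a dataset can be reached."""
    download_url: Optional[str] = None  # full-dataset download
    clone_url: Optional[str] = None     # set if the dataset can be cloned
    dataset_size: Optional[int] = None  # total size in bytes, None if unknown


def choose_strategy(opts: AccessOptions, needed_size: int,
                    max_overhead: float = 2.0) -> str:
    """Return 'clone-and-get', 'download', or 'unavailable'.

    Prefer cloning plus a targeted file retrieval whenever a full
    download would transfer much more than what is actually needed.
    """
    if opts.clone_url is not None:
        # No download option or unknown size: partial access is the safe bet.
        if opts.download_url is None or opts.dataset_size is None:
            return "clone-and-get"
        # A full download would be wasteful compared to the needed content.
        if opts.dataset_size > max_overhead * needed_size:
            return "clone-and-get"
    if opts.download_url is not None:
        return "download"
    return "unavailable"


# The case from above: a 1 GB dataset, but only a 1 MB file is needed.
opts = AccessOptions(
    download_url="https://example.org/dataset.zip",
    clone_url="https://example.org/dataset.git",
    dataset_size=10**9,
)
strategy = choose_strategy(opts, needed_size=10**6)
```

The point is only that the decision needs both pieces of information to be discoverable in the record: that a clone is possible at all, and how large the full download would be.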

This means that we need to be able to declare a clone_url in a way that is recognizable. And we should not start declaring additional attributes like clone_url without thinking really hard, because in no time we will have 1k additional attributes, one for each special case.

I am thinking to go via QualifiedAccess and have a DataService that is some kind of Git service...

I think the best approach here is to try to write down a small, clean example of a record, and then get it to be compliant with the schema.
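As a starting point, a record along these lines could be sketched as a plain Python dictionary. This is not schema-compliant and not meant to be: every property name (`qualified_access`, `service_type`, `endpoint_url`), the `git_endpoints` helper, and all URLs are hypothetical placeholders.

```python
# Hypothetical record: none of these property names are taken from the schema.
record = {
    "id": "example:dataset",
    # Full download of the whole dataset.
    "download_url": "https://example.org/dataset.zip",
    # Additional, typed ways to access the dataset.
    "qualified_access": [
        {
            "type": "DataService",
            "service_type": "git",
            "endpoint_url": "https://example.org/dataset.git",
        },
    ],
}


def git_endpoints(rec: dict) -> list[str]:
    """Collect endpoint URLs of all access entries declared as Git services."""
    return [
        qa["endpoint_url"]
        for qa in rec.get("qualified_access", [])
        if qa.get("type") == "DataService" and qa.get("service_type") == "git"
    ]


clone_candidates = git_endpoints(record)
```

In this shape, the clone URL is found by looking for a service of a recognizable type, so no dedicated clone_url attribute is needed, and other special cases would become new service types rather than new attributes.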
