Register annex key on dataverse as file-metadata for lookup #188

Closed
mih opened this issue Mar 6, 2023 · 3 comments

mih commented Mar 6, 2023

I asked the following in the dataverse chat. I think registering the annex key as file metadata would be the best way to make EXPORT mode work reliably, without having to depend on path matching.

Hey, I am looking at the possibilities for file-metadata. My use case is adding a non-dataverse file identifier as an additional metadata field in order to enable unambiguous file-lookups (because we need to mangle file/path name info on upload due to name restrictions). Looking at the docs for adding files or updating file metadata I see possibilities for the fields "description", "provFreeform", "categories", "restrict".

Q1: Is there a way to add a custom field and set it? I am specifically asking about what is possible on any stock dataverse deployment, not about customizations that could be made to a particular instance.

Q2: If the answer to Q1 is no, it seems prov-metadata could fit (we essentially derive the dataverse representation of a file from its original (name)). Looking at the docs for prov-upload, I get the impression that provstore terminology and structure are desired. However, the endpoint is called "freeform". Does that mean that no particular structure is mandatory?

Maybe one more clarification: Even if prov is the preferred way to express such things, ultimately this is about lookup. The use case requires being able to determine whether a particular file identity is present in a dataverse dataset or not, and if so, to download its content.
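For illustration, the lookup asked about here could be sketched as below. This is only a sketch under assumptions: it supposes the identifier (for example an annex key) were stored verbatim in a per-file metadata field such as "description", and that a file listing is a list of records with a nested "dataFile" entry carrying the numeric ID. The function name `find_by_identifier`, the sample records, and the key string are made up for the example.

```python
from typing import Optional


def find_by_identifier(
    files: list[dict], identifier: str, field: str = "description"
) -> Optional[int]:
    """Return the file ID of the record whose `field` equals `identifier`.

    Assumption: each record looks like
    {"label": ..., "description": ..., "dataFile": {"id": ...}}.
    Returns None when no record carries the identifier.
    """
    for record in files:
        if record.get(field) == identifier:
            return record.get("dataFile", {}).get("id")
    return None


# Made-up listing; the key string is a placeholder, not a real annex key.
files = [
    {"label": "data.bin", "description": "MD5E-s4--aaaa.bin", "dataFile": {"id": 42}},
    {"label": "other.txt", "description": "", "dataFile": {"id": 43}},
]

file_id = find_by_identifier(files, "MD5E-s4--aaaa.bin")
is_present = file_id is not None
```

With an ID in hand, the download step would go through whatever file-access call the client already uses; that part is not shown here.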

mih added a commit to mih/datalad-dataverse that referenced this issue Mar 6, 2023
This fix is needed after ripping out the special casing for XDLRA
keys. It intentionally only addresses the non-export case, to make
clear what is important for which mode.

This fix was developed by @christian-monch as part of
#1

It fixes the situation where a git-clone from dataverse cannot know
the fileId of a repository export (datalad-annex git remote helper)
that, by definition, needs to be packed up and uploaded _before_
a fileId can be known and registered in the repository that is
already finalized and uploaded.

In order to break this chicken-and-egg-problem, `_remove_file()` now
uniformly falls back on determining the fileId via path matching.

But see datalad#189 and datalad#188 for related aspects of this general issue.

mih commented Mar 6, 2023

We could deposit side-car files with metadata on dataverse: https://guides.dataverse.org/en/5.13/developers/aux-file-support.html


mih commented Mar 6, 2023

TODO here is to check if provFreeform shows up in the listing returned by https://pydataverse.readthedocs.io/en/latest/reference.html#pyDataverse.api.NativeApi.get_dataset and report back to dataverse chat.
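A rough way to run that check, once the dataset JSON is in hand, might look like the sketch below. It assumes the response body has the layout data → latestVersion → files, with per-file metadata either on the record itself or on its nested "dataFile" entry. The helper name `files_with_field` and the sample data are hypothetical, and whether "provFreeform" actually appears in a real listing is exactly what is still to be verified.

```python
def files_with_field(dataset_json: dict, field: str) -> list[str]:
    """List labels of file records that expose `field`.

    Assumption: `dataset_json` is shaped like
    {"data": {"latestVersion": {"files": [{"label": ..., "dataFile": {...}}]}}}.
    """
    version = dataset_json.get("data", {}).get("latestVersion", {})
    return [
        record.get("label", "")
        for record in version.get("files", [])
        if field in record or field in record.get("dataFile", {})
    ]


# Made-up response body. With pyDataverse, the real one would come from
# NativeApi(...).get_dataset(...).json().
sample = {
    "data": {
        "latestVersion": {
            "files": [
                {"label": "a.dat", "description": "x", "dataFile": {"id": 1}},
                {"label": "b.dat", "dataFile": {"id": 2, "provFreeform": "note"}},
            ]
        }
    }
}

with_prov = files_with_field(sample, "provFreeform")
```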


mih commented Mar 13, 2023

I am closing this, together with #201, because after some explorations and back-and-forth with upstream no superior alternative materialized.

The present implementation continues to store dataverse fileIds in the local annex key STATE. It will attempt to obtain such an ID by matching a (mangled) remote file path against a listing of files in the latest (or across all) dataverse dataset versions.
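As a minimal sketch of such path matching (not the actual datalad-dataverse code): it assumes each listing record carries an optional "directoryLabel", a "label" (the file name), and a nested "dataFile" entry with the ID, and that the path passed in has already been mangled the same way as on upload. The names `remote_path` and `file_id_by_path` are made up for the example.

```python
from pathlib import PurePosixPath
from typing import Optional


def remote_path(record: dict) -> PurePosixPath:
    """Build the remote path of a record from its directory and file name."""
    return PurePosixPath(record.get("directoryLabel", "")) / record["label"]


def file_id_by_path(files: list[dict], path: str) -> Optional[int]:
    """Return the ID of the first record whose remote path equals `path`."""
    target = PurePosixPath(path)
    for record in files:
        if remote_path(record) == target:
            return record["dataFile"]["id"]
    return None


# Made-up listing for illustration.
listing = [
    {"directoryLabel": "sub/dir", "label": "file.dat", "dataFile": {"id": 7}},
    {"label": "top.txt", "dataFile": {"id": 8}},
]

found = file_id_by_path(listing, "sub/dir/file.dat")
```

Searching across all dataset versions would amount to running the same match over each version's listing in turn.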

@mih mih closed this as completed Mar 13, 2023