Add support in datasets API for persistent id (doi) #1837
Comments
From #1717: However, the DOI naming scheme contains slashes, so it does not play well with a REST API. We could introduce a scheme where the DOI is escaped, or the slashes are replaced with dashes, or maybe the whole identifier is base64-encoded. I'm not sure any of these is a good idea; at the least, none of them is very intuitive. We could offer another endpoint that converts global IDs to local ones. |
I definitely have a need to figure out internal dataset ID numbers when working with APIs. For a long time I've been doing this at https://github.com/IQSS/dataverse/blob/master/scripts/search/assumptions export FIRST_FINCH_DATASET_ID= More recently I've been using an undocumented feature of the Search API to expose database IDs (looking them up by globalId/persistentId/DOI), but this requires turning on an experimental feature I haven't fully implemented at #1299. Anyway, my point is that this is an important endpoint for sure. /cc @rliebz |
Hi, This is a blocker issue for my project: without the IDs I can't perform metadata updates, and I can't get the IDs because the get_contents() call takes too long to complete. I will try some of the workarounds described here, so thank you to the folks who posted those! One suggestion for a simple solution would be to URL-escape the DOIs and then use them in the REST format as usual, so you'd get something like https://dataverse.harvard.edu/api/datasets/doi%3A10.7910%2FDVN%2FUXTXA/versions/:latest Anyway, if anyone has additional suggestions for how to find the IDs, or how to perform metadata updates using only a DOI, I would love to hear them! Thanks, Garth |
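For anyone who wants to try the URL-escaping idea from the comment above, here is a minimal Python sketch. The helper name `escaped_dataset_url` is hypothetical, and the code only builds the URL string; nothing in this thread confirms that the server accepts an escaped DOI in the path, and some web servers and proxies reject or rewrite `%2F` inside a path segment.

```python
from urllib.parse import quote, unquote


def escaped_dataset_url(base_url: str, doi: str, version: str = ":latest") -> str:
    """Build a datasets URL with the DOI percent-encoded as a single path segment.

    Hypothetical helper for illustration; not part of any Dataverse client.
    """
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    escaped = quote(doi, safe="")
    return f"{base_url}/api/datasets/{escaped}/versions/{version}"


doi = "doi:10.7910/DVN/UXTXA"
url = escaped_dataset_url("https://dataverse.harvard.edu", doi)
print(url)

# Decoding the escaped segment gives back the original DOI.
print(unquote(quote(doi, safe="")) == doi)
```

The default `safe="/"` of `quote()` would leave the slashes in place, which is exactly the problem described in this issue, so the explicit `safe=""` matters here.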
Disabled because we still need a way to find a dataset id based on a DOI: IQSS/dataverse#1837
Right, get_contents is a method @garthg is calling from https://github.com/IQSS/dataverse-client-python and the corresponding issue about this slowness on the API side is #2122 |
Without this functionality of being able to look up datasets via DOI, the native "datasets" API ( http://guides.dataverse.org/en/4.0/api/native-api.html#datasets ) is way less useful. An example use case today from @aawinburn was "How do I get the file ID this PDF in my unpublished dataset?" Good question and #1795 was supposed to be the answer but you have to know the database id of the dataset. I've also answered this question at https://groups.google.com/d/msg/dataverse-community/fFrJi7NnBus/JUdOlOmhtQgJ encouraging people (for now) to get a list of file IDs via the SWORD statement ( http://guides.dataverse.org/en/latest/api/sword.html#display-a-dataset-statement ) mostly because SWORD operates via DOIs. See also infsci2711/MultiDBs-FilesAPIs2DBs-WebClient#6 |
As I just mentioned in a thread on the Dataverse Google Group, #2416 was opened recently which is about how hard it is to discover file IDs from the GUI. In addition #2438 is a new issue about what persistent IDs we could/should use for files. |
Developers of the Dataverse client for Python would like the ability to use DOIs (not just database IDs) to operate on the native API. IQSS/dataverse-client-python#28 has some discussion on this. |
This would also be useful for the R client. |
I should elaborate: there's a tension between the Native API's ability to get versions of a dataset (but only by dataset ID) and the SWORD API's ability to retrieve a dataset by DOI. It would be nice for these to be able to play together, particularly given that the Native API doesn't require an API key to view the contents of a public dataset, but the SWORD API does. |
This is a blocker for my project as well, and I do not see why the Search API does not expose dataset IDs by default. As it turns out, several Dataverse installations I've tested do provide the IDs when the 'show_entity_ids=true' parameter is passed in the URL. However, this feature is not documented in the API docs. |
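A rough Python sketch of how a search URL carrying that parameter might be built. Only `show_entity_ids=true` comes from the comment above; the `/api/search` path, the `q` parameter, the host name, and the helper name `search_url_with_ids` are assumptions made for illustration, and the parameter is undocumented, so it may not work on every installation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url_with_ids(base_url: str, query: str) -> str:
    """Build a search URL that asks for database IDs in the results.

    Hypothetical helper; the endpoint path and "q" parameter are assumptions.
    """
    params = {"q": query, "show_entity_ids": "true"}
    # urlencode() percent-encodes the values, so a DOI with ":" and "/" is safe to pass.
    return f"{base_url}/api/search?{urlencode(params)}"


# "dataverse.example.org" is a placeholder host, not a real installation.
url = search_url_with_ids("https://dataverse.example.org", "finch")
print(url)

# Parsing the query string back shows what the server would receive.
print(parse_qs(urlsplit(url).query))
```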
See also #1717 which spawned this ticket. I think @michbarsinai @scolapasta and I need to get together and decide on an approach to try. Options include:
@garthg means well when he suggests escaping the DOI in the URL like Another approach would be to put the DOI at the end of the URL, like we do with SWORD ( Whatever we decide on we would, of course, continue to support the old way for a while. And I think we should continue to support looking up a dataset by id, even if we use a query parameter ( |
Another option is to have a DOI endpoint. This would also allow pointing to different types of items from a DOI, which is, I think, one of the main goals of the DOI project. Something along the lines of:
Not sure how to deal with versions there - we could append them ( |
@RinkeHoekstra In case it's helpful, I wrote some Python that does cached lookup of dataverse IDs to make it slightly easier to manage this issue. Some code is on pastebin at: http://pastebin.com/ipdhEPXA . Obviously that's not a substitute for proper implementation through the API, but I wanted to pass it along just in case it's helpful. |
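To illustrate the general idea of a cached ID lookup, here is a minimal Python sketch. It is not the pastebin code mentioned above. The class `DatasetIdCache`, the `fake_lookup` function, and the ID value 42 are all made up for the example; a real lookup function would call whichever workaround is available to resolve a DOI to a database ID.

```python
from typing import Callable, Dict, List


class DatasetIdCache:
    """Remember DOI -> database ID results so each DOI is resolved only once.

    Hypothetical class for illustration only.
    """

    def __init__(self, lookup: Callable[[str], int]) -> None:
        self._lookup = lookup
        self._ids: Dict[str, int] = {}

    def get(self, doi: str) -> int:
        # Only call the (possibly slow) lookup when the DOI has not been seen yet.
        if doi not in self._ids:
            self._ids[doi] = self._lookup(doi)
        return self._ids[doi]


calls: List[str] = []


def fake_lookup(doi: str) -> int:
    """Stand-in for a slow API call; records each call and returns a made-up ID."""
    calls.append(doi)
    return 42


cache = DatasetIdCache(fake_lookup)
first = cache.get("doi:10.7910/DVN/UXTXA")
second = cache.get("doi:10.7910/DVN/UXTXA")
print(first, second, len(calls))
```

Passing the lookup in as a function keeps the cache independent of how the ID is actually found, so the workaround can be swapped out once the API supports DOIs directly.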
@garthg thanks! I found similar code somewhere on Github and now have a workaround. A separate issue is that the search API is rather picky as to how the DOI is quoted. For instance Python requests always quotes the query parameters in a GET request, but the API then searches for the quoted string rather than unquoting it first. But that is a separate issue ... |
URL scheme for external persistent ids:
|
…Also updated the native API guide (#1837)
@scolapasta this is one of the issues I mentioned this morning for which code has been pushed to a branch made from 4.2.3, and a decision should be made whether to merge it into the 4.2.3 branch or not. |
Most recently, this issue is affecting this user:
I'm replying with workarounds but really we should just fix this issue. @michbarsinai implemented a fix at #1837 (comment) and it has since become pull request #2893. |
Issue #1837 implemented and ready to be merged.
Tested and merged. |
You can see the fix in production at https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI (That's the dataset @monogan said we could test with at IQSS/dataverse-client-r#2 (comment) .) Docs at http://guides.dataverse.org/en/4.3/api/native-api.html#datasets |
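A small Python sketch of building a URL in the shape shown above, with `:persistentId` as a literal path segment and the DOI passed in the `persistentId` query parameter. The helper name `persistent_id_url` is hypothetical. Note that `urlencode()` percent-encodes the DOI, so the string differs from the unescaped URL in the comment; it decodes to the same value, but whether a given server treats both forms identically is an assumption here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def persistent_id_url(base_url: str, persistent_id: str) -> str:
    """Build a datasets URL that identifies the dataset by persistent ID.

    Hypothetical helper for illustration; only the URL shape comes from the thread.
    """
    query = urlencode({"persistentId": persistent_id})
    # ":persistentId" is a literal placeholder segment in the path, not a template variable.
    return f"{base_url}/api/datasets/:persistentId?{query}"


url = persistent_id_url("https://dataverse.harvard.edu", "doi:10.7910/DVN/ARKOTI")
print(url)

# Decoding the query string recovers the DOI with its slashes intact.
print(parse_qs(urlsplit(url).query)["persistentId"][0])
```

Moving the DOI into the query string sidesteps the slash problem raised at the top of this issue, since slashes in a query value do not affect path routing.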
The APIs currently identify datasets by database ID, but we also need to support persistent IDs.