Investigate APIs for various repositories #58

Closed
briri opened this issue Sep 22, 2023 · 10 comments
Collaborator

briri commented Sep 22, 2023

Here’s the list. We don’t need it to be exhaustive; just some examples would be helpful.

Particularly thinking about:

  • Are they sharing the data we’d need to connect to DMP-IDs? In general yes, they all include DOIs for the object
  • Is the format something we could work with? Yes, but some are more difficult in their current state than others
  • Are there additional elements they could share that would be helpful? ORCIDs and RORs would be most helpful along with the ability to search by those 2 identifiers

APIs:

briri self-assigned this Sep 22, 2023

Collaborator Author

briri commented Sep 22, 2023

Mendeley analysis:

Mendeley includes ORCIDs and RORs 🎉

The API does not appear to support filtering/searching by ORCID or ROR, though. It only allows searching by internal Mendeley ids (institution and profile).

The API also only supports discovery of public datasets. How would we discover metadata about private outputs?

The API uses OAuth2 (client_credentials here) so you must create an account with Elsevier and then add your application in the dev tools section.

# AUTH
# ---------------------------------
curl -X POST -H "Content-Type: application/x-www-form-urlencoded" -u "[id]:[secret]" -d "grant_type=client_credentials&scope=all" https://api.mendeley.com/oauth/token

# SEARCH
# ---------------------------------
curl -H 'Authorization: Bearer [token]' "https://api.mendeley.com/datasets?type=software&limit=2"

# Results
# ---------------------------------
{
  "results": [
    {
      "id":"000000000",
      "doi":{
        "id":"10.12345/000000000.3",
        "status":"allocated",
        "prefix":"10.12345"
      },
      "name":"Example name of the output",
      "description":"Description of the output that deals with Paleomagnetism",
      "version":3,
      "contributors":[
        {
          "profile_id":"12345",
          "first_name":"Mickey",
          "last_name":"Mouse"
        },{
          "profile_id":"123456",
          "first_name":"Donald",
          "last_name":"Duck",
          "orcid_id":"0000-0000-0000-0000"
        },{
          "first_name":"Minnie",
          "last_name":"Mouse"
        }
      ],
      "versions":[
        {
          "version":3,
          "available":true,
          "publish_date":"2022-06-27T15:43:33.294Z"
        },{
          "version":2,
          "available":true,
          "publish_date":"2022-05-30T07:01:56.404Z"
        },{
          "version":1,
          "available":true,
          "publish_date":"2021-12-23T15:21:20.402Z"
        }
      ],
      "articles":[],
      "categories":[
        {
          "id":"data.elsevier.com/vocabulary/OmniScience/Concept-210636270",
          "label":"Paleomagnetism"
        }
      ],
      "institutions":[
        {
          "id":"999999999",
          "name":"Universidad Nacional Autonoma de Mexico"
        },{
          "id":"88888888888",
          "name":"Universidad de Sonora",
          "ror_id":"https://ror.org/00c32gy34"
        }
      ],
      "available":true,
      "size":0,
      "owner":{
        "profile_id":"12345",
        "first_name":"Mickey",
        "last_name":"Mouse"
      },
      "channel":"WEB",
      "owner_id":"12345",
      "publish_date":"2022-06-27T15:43:33.294Z",
      "data_licence":{
        "id":"01d9c749-3c4d-4431-9df3-620b2dcfe144",
        "description":"You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.",
        "url":"http://creativecommons.org/licenses/by/4.0",
        "category":"Creative",
        "short_name":"CC BY 4.0",
        "full_name":"Creative Commons Attribution 4.0 International"
      },
      "related_links":[],
      "funders":[],
      "customer_id":"555555555",
      "modified_on":"2022-06-26T01:28:14.149Z",
      "confidential":false,
      "links":{
        "view":"https://data.mendeley.com/datasets/000000000"
      },
      "repository":{
        "id":"MENDELEY_DATA",
        "name":"Mendeley Data"
      }
    }
  ]
}
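
Since the API does not appear to filter by ORCID or ROR, one option is to fetch a page of results and filter on our side. Below is a minimal sketch of that idea, assuming the response shape shown above (`results[].contributors[].orcid_id` and `results[].institutions[].ror_id`). The `filter_datasets` helper is hypothetical and not part of any Mendeley client.

```python
from typing import Any, Optional


def filter_datasets(
    results: list[dict[str, Any]],
    orcid: Optional[str] = None,
    ror: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep datasets that mention the given ORCID and/or ROR (hypothetical helper)."""
    matches = []
    for dataset in results:
        # Both identifiers are optional in the sample, hence .get()
        orcids = {c.get("orcid_id") for c in dataset.get("contributors", [])}
        rors = {i.get("ror_id") for i in dataset.get("institutions", [])}
        if orcid is not None and orcid not in orcids:
            continue
        if ror is not None and ror not in rors:
            continue
        matches.append(dataset)
    return matches


# Trimmed-down copy of the sample response above
page = {
    "results": [
        {
            "doi": {"id": "10.12345/000000000.3"},
            "contributors": [
                {"first_name": "Mickey", "last_name": "Mouse"},
                {"first_name": "Donald", "last_name": "Duck",
                 "orcid_id": "0000-0000-0000-0000"},
            ],
            "institutions": [
                {"name": "Universidad de Sonora",
                 "ror_id": "https://ror.org/00c32gy34"},
            ],
        }
    ]
}

hits = filter_datasets(page["results"], ror="https://ror.org/00c32gy34")
print([d["doi"]["id"] for d in hits])
```

This only sees whatever pages we have already fetched, so it does not replace a real server-side search by ORCID or ROR.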

Collaborator Author

briri commented Sep 22, 2023

OSF is a bit opaque. We need to review the API docs in more detail since it uses a lot of graphDB language like GET /nodes/.

It's possible, though, to have multiple entry points. For example, searching for preprints by title and then walking to the list of contributors, OR starting with a search for a contributor and then walking their preprints and nodes. Either way it will require multiple API calls and some of our own algorithms to verify matches.

Like Mendeley, it would be super useful if you could search by ROR or ORCID although I am not sure if their metadata contains those identifiers.

Need to use Postman or something similar since you need to obtain an OAuth2 authorization code before fetching a token. Didn't see a way to just use client_credentials.

Example of result from the preprints API: https://api.test.osf.io/v2/preprints. The available filters do not allow searching for a contributor name or institution. We would need to fetch results and then do our own filtering.

{
  "data": [
      {
          "id": "12345",
          "type": "preprints",
          "attributes": {
              "date_created": "2023-09-22T15:35:53.459096",
              "date_modified": "2023-09-22T15:37:17.496732",
              "date_published": "2023-09-22T15:37:16.381453",
              "original_publication_date": null,
              "doi": null,
              "title": "Test edit",
              "description": "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
              "is_published": true,
              "is_preprint_orphan": false,
              "license_record": {
                  "copyright_holders": [
                      ""
                  ],
                  "year": "2023"
              },
              "tags": [],
              "preprint_doi_created": null,
              "date_withdrawn": null,
              "current_user_permissions": [],
              "public": true,
              "reviews_state": "pending",
              "date_last_transitioned": "2023-09-22T15:37:16.381453",
              "has_coi": false,
              "conflict_of_interest_statement": null,
              "has_data_links": "no",
              "why_no_data": null,
              "data_links": [],
              "has_prereg_links": "no",
              "why_no_prereg": null,
              "prereg_links": [],
              "prereg_link_info": "",
              "subjects": [
                  [
                      {
                          "id": "59552881da3e240081ba3203",
                          "text": "Arts and Humanities"
                      }
                  ]
              ]
          },
          "relationships": {
              "contributors": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/contributors/",
                          "meta": {}
                      }
                  }
              },
              "bibliographic_contributors": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/bibliographic_contributors/",
                          "meta": {}
                      }
                  }
              },
              "citation": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/citation/",
                          "meta": {}
                      }
                  },
                  "data": {
                      "id": "8n27h",
                      "type": "preprints"
                  }
              },
              "identifiers": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/identifiers/",
                          "meta": {}
                      }
                  }
              },
              "node": {
                  "links": {
                      "self": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/relationships/node/",
                          "meta": {}
                      }
                  }
              },
              "license": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/licenses/000000000000000000/",
                          "meta": {}
                      }
                  },
                  "data": {
                      "id": "000000000000000000",
                      "type": "licenses"
                  }
              },
              "provider": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/providers/preprints/osf/",
                          "meta": {}
                      }
                  },
                  "data": {
                      "id": "osf",
                      "type": "preprint-providers"
                  }
              },
              "files": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/files/",
                          "meta": {}
                      }
                  }
              },
              "primary_file": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/files/000000000000000000/",
                          "meta": {}
                      }
                  },
                  "data": {
                      "id": "650db45d3cbde5000ad3eca2",
                      "type": "files"
                  }
              },
              "review_actions": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/review_actions/",
                          "meta": {}
                      }
                  }
              },
              "requests": {
                  "links": {
                      "related": {
                          "href": "https://api.test.osf.io/v2/preprints/12345/requests/",
                          "meta": {}
                      }
                  }
              }
          },
          "links": {
              "self": "https://api.test.osf.io/v2/preprints/12345/",
              "html": "https://test.osf.io/12345/",
              "preprint_doi": "https://doi.org/10.12345/ABC123.io/12345"
          }
      }
    ]
  }
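
To make the "multiple API calls" point concrete, here is a rough sketch of the first step of the walk: pulling the contributor link and DOI out of each preprint so a follow-up request can be made per record. It assumes the JSON shape shown above. `contributor_links` is a hypothetical helper, and the follow-up requests themselves (which need a token) are left out.

```python
from typing import Any


def contributor_links(page: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect what we'd need for the follow-up contributor lookups (hypothetical)."""
    rows = []
    for item in page.get("data", []):
        relationships = item.get("relationships", {})
        href = (
            relationships.get("contributors", {})
            .get("links", {})
            .get("related", {})
            .get("href")
        )
        rows.append(
            {
                "id": item.get("id"),
                "title": item.get("attributes", {}).get("title"),
                # In the sample, attributes.doi is null but links.preprint_doi is set
                "doi": item.get("links", {}).get("preprint_doi"),
                "contributors_href": href,
            }
        )
    return rows


# Trimmed-down copy of the sample response above
page = {
    "data": [
        {
            "id": "12345",
            "attributes": {"title": "Test edit", "doi": None},
            "relationships": {
                "contributors": {
                    "links": {
                        "related": {
                            "href": "https://api.test.osf.io/v2/preprints/12345/contributors/",
                            "meta": {},
                        }
                    }
                }
            },
            "links": {"preprint_doi": "https://doi.org/10.12345/ABC123.io/12345"},
        }
    ]
}

rows = contributor_links(page)
print(rows[0]["contributors_href"])
```

Each `contributors_href` would then need its own GET, which is where the call count adds up.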

Collaborator Author

briri commented Sep 22, 2023

The Dataverse Search API allows access to published datasets.

Dataverse is an open source codebase and there are many installations out in the wild (e.g. Harvard), so we will likely need a table to store the target URLs and the searchable fields (see this issue discussing that).

I am not seeing ORCID or ROR identifiers in the output, so we would need to do some messy text matching on names.
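
As a rough idea of what the table of installations could look like in code, here is a sketch that keeps one entry per installation and builds the search URL from it. The `DataverseInstallation` class and the field names listed are illustrative assumptions; only the `/api/search` path and the `q` and `type` parameters come from the queries in this comment.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class DataverseInstallation:
    """One row of the hypothetical installations table."""
    name: str
    base_url: str
    searchable_fields: list[str] = field(default_factory=list)

    def search_url(self, query: str, item_type: str = "dataset") -> str:
        # urlencode handles escaping of the query text
        params = urlencode({"q": query, "type": item_type})
        return f"{self.base_url.rstrip('/')}/api/search?{params}"


installations = [
    # searchable_fields values are placeholders, not a verified list
    DataverseInstallation("Demo Dataverse", "https://demo.dataverse.org",
                          ["title", "authorName"]),
]

urls = [inst.search_url("trees") for inst in installations]
print(urls)
```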

# EXAMPLE QUERIES:
# -------------------------------------
curl "https://demo.dataverse.org/api/search?q=*&type=dataset"
curl "https://demo.dataverse.org/api/search?q=trees"

# Example result
# -------------------------------------
{
  "status":"OK",
  "data":{
    "q":"*",
    "total_count":2792,
    "start":0,
    "spelling_alternatives":{},
    "items":[
      {
        "name":"test dataset #2",
        "type":"dataset",
        "url":"https://doi.org/10.12345/ABC/ZYXWVUT",
        "global_id":"doi:10.12345/ABC/ZYXWVUT",
        "description":"test creating dataset",
        "published_at":"2022-09-01T16:05:03Z",
        "publisher":"ABC",
        "citationHtml":"DOE, JANE, 2022, \"test dataset #2\", <a href=\"https://doi.org/10.12345/ABC/ZYXWVUT\" target=\"_blank\">https://doi.org/10.12345/ABC/ZYXWVUT</a>, Demo Dataverse, V1",
        "identifier_of_dataverse":"ABC",
        "name_of_dataverse":"ABC",
        "citation":"DOE, JANE, 2022, \"test dataset #2\", https://doi.org/10.12345/ABC/ZYXWVUT, Demo Dataverse, V1",
        "storageIdentifier":"s3://10.12345/ABC/ZYXWVUT",
        "subjects":["Arts and Humanities"],
        "fileCount":0,
        "versionId":220004,
        "versionState":"RELEASED",
        "majorVersion":1,
        "minorVersion":0,
        "createdAt":"2022-09-01T16:04:43Z",
        "updatedAt":"2022-09-01T16:05:03Z",
        "contacts":[{
          "name":"DOE, JANE",
          "affiliation":"University of California, Los Angeles"
        }],
        "publications":[{}],
        "authors":["DOE, JANE"]
      }
    ]
  }
}
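
For the text matching on names, a very rough starting point might be to normalize both names before comparing them. The sketch below is only an illustration. `name_key` and `same_person` are hypothetical helpers, and the approach ignores initials, middle names and common-name collisions.

```python
import unicodedata


def name_key(name: str) -> tuple[str, ...]:
    """Reduce a personal name to a sorted tuple of lowercase ASCII tokens."""
    # Split accented characters into base letter + combining mark, then drop the marks
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace punctuation (e.g. the comma in "DOE, JANE") with spaces
    cleaned = "".join(ch if ch.isalnum() else " " for ch in ascii_only.lower())
    return tuple(sorted(cleaned.split()))


def same_person(a: str, b: str) -> bool:
    """True when both names contain the same tokens, in any order."""
    return name_key(a) == name_key(b)


print(same_person("DOE, JANE", "Jane Doe"))
print(same_person("DOE, JANE", "John Doe"))
```

A match on name alone is weak evidence, so we would likely also want to compare the affiliation before linking anything to a DMP-ID.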

Collaborator Author

briri commented Sep 22, 2023

Zenodo allows searching for 'published' records.

They have some other APIs in beta currently that allow searching for funders, grants, communities (e.g. 'dryad') and licenses.

Their funder list uses Crossref funder DOIs currently.

The grants API returns info about a grant, but I'm not seeing any connection to the awardee.

curl "https://zenodo.org/api/grants/"

{
  "created":"2023-04-12T13:47:31.657945+00:00",
  "id":"10.13039/501100000780::101103476",
  "links":{"self":"https://zenodo.org/api/grants/10.13039/501100000780::101103476"},
  "metadata":{
    "$schema":"http://zenodo.org/schemas/grants/grant-v1.0.0.json",
    "acronym":"ERA TALENT",
    "code":"101103476",
    "enddate":"2026-02-28",
    "funder":{
      "$schema":"http://zenodo.org/schemas/funders/funder-v1.0.0.json",
      "acronyms":[],
      "country":"",
      "doi":"10.13039/501100000780",
      "identifiers":{"oaf":"ec__________::EC"},
      "name":"European Commission",
      "parent":{},
      "remote_created":"2011-06-08T16:00:03.000000",
      "remote_modified":"2019-07-19T16:49:12.000000",
      "subtype":"national government",
      "type":"gov"
    },
    "identifiers":{
      "eurepo":"info:eu-repo/grantAgreement/EC/HE/101103476/",
      "oaf":"corda_____he::becfdc0f5223e4577c583857048ffcf2",
      "purl":null
    },
    "internal_id":"10.13039/501100000780::101103476",
    "legacy_id":"10.13039/501100000780::101103476",
    "program":"HE",
    "remote_modified":"2021-04-27",
    "startdate":"2023-03-01",
    "suggest":{
      "contexts":{
        "funder":["10.13039/501100000780"]
      },
      "input":["101103476","ERA TALENT","ERA TALENT Platform for career development of researchers in Europe"]
    },
    "title":"ERA TALENT Platform for career development of researchers in Europe",
    "url":""
  },
  "updated":"2023-04-12T13:47:31.657960+00:00"
}

The records (aka datasets) are pretty good (at least the ones provided by Dryad for NIH). The grant/award id is buried in the 'notes' field, but may be searchable/filterable.

Records allow for ORCID 🎉 but I am not seeing RORs.

Here is an example:

# Records that were funded by NIH
curl "https://zenodo.org/api/records/?q=grants.funder.doi:doi.org%2F10.13039%2F100000002"

{
  "conceptrecid":"1234567",
  "created":"2000-01-01T13:14:15.077983+00:00",
  "doi":"10.12345/ABC123",
  "files":[{
    "bucket":"0000000000000000000000",
    "checksum":"md5:abcdefghijklmnop",
    "key":"Biological_data.fcs",
    "links":{"self":"https://zenodo.org/api/files/00000000000000/Biological_data.fcs"},
    "size":22599873,
    "type":"fcs"
  }],
  "id":11111111,
  "links":{
    "badge":"https://zenodo.org/badge/doi/10.12345/ABC123.svg",
    "bucket":"https://zenodo.org/api/files/000000000000000000000",
    "doi":"https://doi.org/10.12345/ABC123",
    "html":"https://zenodo.org/record/1234567",
    "latest":"https://zenodo.org/api/records/1234567",
    "latest_html":"https://zenodo.org/record/1234567",
    "self":"https://zenodo.org/api/records/1234567"
  },
  "metadata":{
    "access_right":"open",
    "access_right_category":"success",
    "communities":[{"id":"dryad"}],
    "creators":[{
      "affiliation":"Example University",
      "name":"Doe, Jane",
      "orcid":"0000-0000-0000-0000"
    }],
    "description":"<p>Research data about biological stuff</p>",
    "doi":"10.12345/ABC123",
    "keywords":["human immunodeficiency virus (HIV)","cell death","apoptosis","pyroptosis","lymphoid tissues"],
    "license":{"id":"CC0-1.0"},
    "method":"<p>mass cytometry; single-cell RNA-seq</p>\n<p>mass cytometry data has been pre-gated on live singlets</p>",
    "notes":"<p>Funding provided by: National Institutes of Health<br>Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000002<br>Award Number: A12 0A123456</p>",
    "publication_date":"2000-01-01",
    "related_identifiers":[{
      "identifier":"10.98765/journal.2000.99999","relation":"isCitedBy","scheme":"doi"
    }],
    "relations":{
      "version":[{
        "count":1,
        "index":0,
        "is_last":true,
        "last_child":{"pid_type":"recid","pid_value":"8888888"},
        "parent":{"pid_type":"recid","pid_value":"7777777"}
      }]
    },
    "resource_type":{
      "title":"Dataset",
      "type":"dataset"
    },
    "title":"Data from: my research about biological stuff."
  },
  "owners":[00000],
  "revision":2,
  "stats":{
    "downloads":3.0,
    "unique_downloads":3.0,
    "unique_views":7.0,
    "version_downloads":3.0,
    "version_unique_downloads":3.0,
    "version_unique_views":7.0,
    "version_views":7.0,
    "version_volume":581366663.0,
    "views":7.0,
    "volume":581366663.0
  },
  "updated":"2000-01-01T14:15:16.325605+00:00"
}
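
Because the funder and award number sit inside the HTML of the `notes` field, we would probably need to parse them out ourselves. Here is a minimal sketch, assuming the notes always follow the "Label: value" layout seen in this one example (which is not guaranteed). `parse_funding_notes` is a hypothetical helper.

```python
import html
import re


def parse_funding_notes(notes: str) -> dict[str, str]:
    """Pull funder name, funder id and award number out of a notes blob."""
    # Turn <br> and </p> boundaries into newlines, then strip remaining tags
    text = re.sub(r"(?i)<br\s*/?>|</p>", "\n", notes)
    text = html.unescape(re.sub(r"<[^>]+>", "", text))
    labels = {
        "Funding provided by": "funder_name",
        "Crossref Funder Registry ID": "funder_id",
        "Award Number": "award_number",
    }
    found = {}
    for line in text.splitlines():
        # partition splits at the first colon only, so URLs in the value survive
        label, sep, value = line.partition(":")
        key = labels.get(label.strip())
        if sep and key and value.strip():
            found[key] = value.strip()
    return found


notes = (
    "<p>Funding provided by: National Institutes of Health<br>"
    "Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000002<br>"
    "Award Number: A12 0A123456</p>"
)
print(parse_funding_notes(notes))
```

If a record lists several funders, this keeps only the last one of each label, so a real harvester would need to collect them as a list.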

Collaborator Author

briri commented Sep 22, 2023

Dryad allows searching for 'published' datasets and allows you to filter the results by author affiliation.

Dryad of course contains RORs and ORCIDs 🎉

There does not seem to be a way to search by ORCID.

For example:

curl -X 'GET' \
  'https://datadryad.org/api/v2/search?q=molecular&affiliation=https%3A%2F%2Fror.org%2F01an7q238' \
  -H 'accept: application/json'

# Result
# ----------------------
{
  "_links": {
    "self": {
      "href": "/api/v2/datasets/doi%3A10.12345%2Fdryad.abc12"
    },
    "stash:versions": {
      "href": "/api/v2/datasets/doi%3A10.12345%2Fdryad.abc12/versions"
    },
    "stash:version": {
      "href": "/api/v2/versions/1234"
    },
    "stash:download": {
      "href": "/api/v2/datasets/doi%3A10.12345%2Fdryad.abc12/download"
    },
    "curies": [
      {
        "name": "stash",
        "href": "https://github.com/CDL-Dryad/stash/blob/main/stash_api/link-relations.md#{rel}",
        "templated": "true"
      }
    ]
  },
  "identifier": "doi:10.12345%2Fdryad.abc12",
  "id": 12345,
  "storageSize": 1032385573,
  "relatedPublicationISSN": "1234-123X",
  "title": "Data from: Measuring ectoplasm amounts left by Slimer",
  "authors": [
    {
      "firstName": "Jane C.",
      "lastName": "Doe",
      "affiliation": "University of Minnesota",
      "affiliationROR": "https://ror.org/017zqws13"
    },
    {
      "firstName": "John Jacob",
      "lastName": "Jingle Hymer-Smith",
      "email": "[email protected]",
      "affiliation": "University of California, Berkeley",
      "affiliationROR": "https://ror.org/01an7q238",
      "orcid": "0000-0000-0000-0000"
    }
  ],
  "abstract": "Dispersal plays a prominent role in how gross it feels to be slimed.",
  "keywords": [
    "dispersal limitation",
    "Metacommunities"
  ],
  "usageNotes": "Use with caution!",
  "relatedWorks": [
    {
      "relationship": "primary_article",
      "identifierType": "DOI",
      "identifier": "https://doi.org/10.1234/j.4567zyx.2000.9876.a"
    }
  ],
  "versionNumber": 1,
  "versionStatus": "submitted",
  "curationStatus": "Published",
  "versionChanges": "none",
  "publicationDate": "2000-01-01",
  "lastModificationDate": "2000-01-01",
  "visibility": "public",
  "sharingLink": "https://datadryad.org/stash/share/0000000000aaaaaaaaaaaaa",
  "userId": 12345,
  "license": "https://creativecommons.org/publicdomain/zero/1.0/"
}
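
Since there does not seem to be a way to search by ORCID, one workaround would be to search by affiliation (ROR) and then check the authors ourselves. The sketch below assumes the dataset shape shown above. `search_url` and `has_orcid` are hypothetical helpers, and the network call itself is omitted.

```python
from typing import Any
from urllib.parse import urlencode

DRYAD_SEARCH = "https://datadryad.org/api/v2/search"


def search_url(query: str, ror: str) -> str:
    """Build the affiliation-filtered search URL used in the curl example."""
    params = urlencode({"q": query, "affiliation": ror})
    return f"{DRYAD_SEARCH}?{params}"


def has_orcid(dataset: dict[str, Any], orcid: str) -> bool:
    """True when any author on the dataset carries the given ORCID."""
    return any(a.get("orcid") == orcid for a in dataset.get("authors", []))


# Trimmed-down copy of the sample dataset above
dataset = {
    "identifier": "doi:10.12345%2Fdryad.abc12",
    "authors": [
        {"firstName": "Jane C.", "lastName": "Doe",
         "affiliationROR": "https://ror.org/017zqws13"},
        {"firstName": "John Jacob", "lastName": "Jingle Hymer-Smith",
         "affiliationROR": "https://ror.org/01an7q238",
         "orcid": "0000-0000-0000-0000"},
    ],
}

print(search_url("molecular", "https://ror.org/01an7q238"))
print(has_orcid(dataset, "0000-0000-0000-0000"))
```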

Collaborator Author

briri commented Oct 2, 2023

Review complete. Leaving this one open though so we can reference it when we decide to implement integrations for these APIs.


pdurbin commented Oct 7, 2023

> I am not seeing ORCID or ROR identifiers in the output

@briri hi, thanks for kicking the tires on the Dataverse Search API! 🎉

It's not very intuitive but you can get ORCIDs out of the Search API. If you pass metadata_fields=citation:author, for example (docs), you can get more details about that field (author). Below is an example where you can see an ORCID. We need to make this easier, obviously. 😅

We have ROR support for our author affiliation field but haven't rolled it out to our demo server yet. You can track this here:

For a list of searchable fields, this might help: https://demo.dataverse.org/api/metadatablocks/citation

Please feel free to ask questions at https://chat.dataverse.org or https://groups.google.com/g/dataverse-community

curl 'https://demo.dataverse.org/api/search?q=F8QXRU&metadata_fields=citation:author'

{
  "status": "OK",
  "data": {
    "q": "F8QXRU",
    "total_count": 1,
    "start": 0,
    "spelling_alternatives": {},
    "items": [
      {
        "name": "The History of Coffee",
        "type": "dataset",
        "url": "https://doi.org/10.70122/FK2/F8QXRU",
        "global_id": "doi:10.70122/FK2/F8QXRU",
        "description": "Description text",
        "published_at": "2023-01-12T19:31:16Z",
        "publisher": "Dataverse de Exemplo Lepidus",
        "citationHtml": "admin, admin; Castanheiras, Iris, 2023, \"The History of Coffee\", <a href=\"https://doi.org/10.70122/FK2/F8QXRU\" target=\"_blank\">https://doi.org/10.70122/FK2/F8QXRU</a>, Demo Dataverse, V1, UNF:6:dEgtc5Z1MSF3u7c+kF4kXg== [fileUNF]",
        "identifier_of_dataverse": "dataverseDeExemplo",
        "name_of_dataverse": "Dataverse de Exemplo Lepidus",
        "citation": "admin, admin; Castanheiras, Iris, 2023, \"The History of Coffee\", https://doi.org/10.70122/FK2/F8QXRU, Demo Dataverse, V1, UNF:6:dEgtc5Z1MSF3u7c+kF4kXg== [fileUNF]",
        "storageIdentifier": "s3://10.70122/FK2/F8QXRU",
        "keywords": [
          "Documentary"
        ],
        "subjects": [
          "Agricultural Sciences"
        ],
        "fileCount": 1,
        "versionId": 224817,
        "versionState": "RELEASED",
        "majorVersion": 1,
        "minorVersion": 0,
        "createdAt": "2023-01-12T14:26:17Z",
        "updatedAt": "2023-01-12T19:31:16Z",
        "contacts": [
          {
            "name": "Conta de Desenvolvimento para Testes",
            "affiliation": ""
          }
        ],
        "publications": [
          {
            "citation": "admin, a., &amp; Castanheiras, I. (2023). <em>The History of Coffee</em>. Lepidus"
          }
        ],
        "metadataBlocks": {
          "citation": {
            "displayName": "Citation Metadata",
            "fields": [
              {
                "typeName": "author",
                "multiple": true,
                "typeClass": "compound",
                "value": [
                  {
                    "authorName": {
                      "typeName": "authorName",
                      "multiple": false,
                      "typeClass": "primitive",
                      "value": "admin, admin"
                    }
                  },
                  {
                    "authorName": {
                      "typeName": "authorName",
                      "multiple": false,
                      "typeClass": "primitive",
                      "value": "Castanheiras, Iris"
                    },
                    "authorAffiliation": {
                      "typeName": "authorAffiliation",
                      "multiple": false,
                      "typeClass": "primitive",
                      "value": "Lepidus"
                    },
                    "authorIdentifierScheme": {
                      "typeName": "authorIdentifierScheme",
                      "multiple": false,
                      "typeClass": "controlledVocabulary",
                      "value": "ORCID"
                    },
                    "authorIdentifier": {
                      "typeName": "authorIdentifier",
                      "multiple": false,
                      "typeClass": "primitive",
                      "value": "0000-0002-1825-0097"
                    }
                  }
                ]
              }
            ]
          }
        },
        "authors": [
          "admin, admin",
          "Castanheiras, Iris"
        ]
      }
    ],
    "count_in_response": 1
  }
}
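
For reference, here is a small sketch of how the ORCIDs could be pulled out of that nested `metadataBlocks` structure. It assumes the response shape shown above. `author_orcids` is a hypothetical helper, not something the Dataverse API provides.

```python
from typing import Any


def author_orcids(item: dict[str, Any]) -> dict[str, str]:
    """Map author name -> ORCID for one search result item (hypothetical helper)."""
    orcids = {}
    fields = item.get("metadataBlocks", {}).get("citation", {}).get("fields", [])
    for fld in fields:
        if fld.get("typeName") != "author":
            continue
        for author in fld.get("value", []):
            scheme = author.get("authorIdentifierScheme", {}).get("value")
            identifier = author.get("authorIdentifier", {}).get("value")
            name = author.get("authorName", {}).get("value")
            # Authors without an identifier, or with a non-ORCID scheme, are skipped
            if scheme == "ORCID" and identifier and name:
                orcids[name] = identifier
    return orcids


# Trimmed-down copy of the sample item above
item = {
    "metadataBlocks": {
        "citation": {
            "fields": [
                {
                    "typeName": "author",
                    "value": [
                        {"authorName": {"value": "admin, admin"}},
                        {
                            "authorName": {"value": "Castanheiras, Iris"},
                            "authorIdentifierScheme": {"value": "ORCID"},
                            "authorIdentifier": {"value": "0000-0002-1825-0097"},
                        },
                    ],
                }
            ]
        }
    }
}

print(author_orcids(item))
```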

Collaborator Author

briri commented Oct 9, 2023

thanks @pdurbin this is very helpful!

Collaborator Author

briri commented Jul 22, 2024

somewhat related to #77

Collaborator Author

briri commented Jul 22, 2024

closing as our investigation is done. will create new tickets if we decide to build integrations/harvesters

@briri briri closed this as completed Jul 22, 2024