OMIDs for agent roles instead of responsible agents in Meta CSV "author" field #26

Open
eliarizzetto opened this issue Jul 9, 2024 · 0 comments
Labels
bug Something isn't working

In version 8 (https://doi.org/10.6084/m9.figshare.21747461.v8) and version 9 (https://doi.org/10.6084/m9.figshare.21747461.v9) of the OpenCitations Meta CSV dump, the author field of some of the resources erroneously contains OMIDs of agent roles (prefixed by "/ar") instead of OMIDs of responsible agents (prefixed by "/ra").
For example, the following row, storing metadata for br/06602041963, contains three agent roles in the author field:

| id | title | author | issue | volume | venue | page | pub_date | type | publisher | editor |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| omid:br/06602041963 doi:10.33029/9704-6031-3-2021-1-432 openalex:W4244840417 | CLINICAL PHARMACOLOGY. Obstetrics. Gynecology. Infertile Marriage | [omid:ar/06609023674]; [omid:ar/06609023675]; [omid:ar/06609023673] | | | CLINICAL PHARMACOLOGY. Obstetrics. Gynecology. Infertile Marriage [omid:br/06602042953] | 1-432 | 2021 | book chapter | Geotar-Media Publishing Group [omid:ra/0610116993 crossref:18453] | Radzinskiy, E.V. [omid:ra/06606217946]; Shykh, E.V. [omid:ra/06606217947] |

The number of CSV rows with a faulty value in the author field is 7,607,734 in version 8 and 8,105,378 in version 9. These errors are not observable when the same data is accessed via the API (see e.g. https://opencitations.net/meta/api/v1/metadata/omid:br/06602041963).
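For anyone who wants to reproduce the counts, here is a minimal sketch of how faulty rows could be detected. It assumes the dump files are comma-separated with a header row containing an "author" column, as in the example above; the function name and the sample data are made up for illustration (the second sample row is entirely hypothetical).

```python
import csv
import io
import re

# Matches the OMID of an agent role, e.g. "omid:ar/06609023674".
AGENT_ROLE_OMID = re.compile(r"\bomid:ar/\d+")


def count_faulty_author_rows(csv_file):
    """Count rows whose "author" field mentions at least one agent-role OMID."""
    reader = csv.DictReader(csv_file)
    return sum(
        1 for row in reader if AGENT_ROLE_OMID.search(row.get("author") or "")
    )


# Tiny in-memory sample: the first row mirrors the faulty author field reported
# above, the second row is a hypothetical well-formed record.
SAMPLE = (
    "id,title,author\n"
    '"omid:br/06602041963","CLINICAL PHARMACOLOGY",'
    '"[omid:ar/06609023674]; [omid:ar/06609023675]; [omid:ar/06609023673]"\n'
    '"omid:br/1","A hypothetical title","Doe, Jane [omid:ra/2]"\n'
)

print(count_faulty_author_rows(io.StringIO(SAMPLE)))  # prints 1
```

To run it on a real dump file, pass a file object opened with `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.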

Based on a study of randomly sampled cases (including the one mentioned above), my guess is that the following conditions also hold for the rest of the affected rows:

  • no author is available for the bibliographic resource (neither in the CSV file nor in the triplestore accessible via the SPARQL endpoint);
  • the OMIDs of the roles mistakenly mentioned in the author field are the OMIDs of the roles of the other contributors linked to the bibliographic resource, i.e. the publisher(s) and the editors (see all the contributors of br/06602041963 as they are recorded in the triplestore);
  • the publisher and the editor fields contain the correct data, i.e. the name and IDs (including OMID) of the publisher(s) and the editor(s), despite the fact that the entities of the roles, which should not be represented at all in the CSV dump, are stored in the author field. E.g., the role of publisher of br/06602041963, identified by ar/06609023673, is rightly linked to the agent ra/0610116993 in the triplestore, which is correctly mentioned in the CSV row and is the actual publisher of the publication in question.
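The conditions above can be partially checked on the CSV side alone. The sketch below is only a rough heuristic under stated assumptions: it tests that the author field holds nothing but bare agent-role OMIDs, that every publisher and editor entry carries a responsible-agent OMID, and that the number of roles equals the number of publishers plus editors. The count comparison is my own assumption, derived from the single example row; confirming that each role really belongs to those contributors would still require querying the triplestore. The function names are hypothetical.

```python
import re

BARE_AGENT_ROLE = re.compile(r"\[omid:ar/\d+\]")
RESPONSIBLE_AGENT_OMID = re.compile(r"omid:ra/\d+")


def split_field(value):
    """Split a multi-valued CSV field on '; ' style separators."""
    return [item.strip() for item in value.split(";") if item.strip()]


def matches_hypothesis(row):
    """Heuristic check of the guessed conditions on a single CSV row (a dict)."""
    authors = split_field(row["author"])
    # Every "author" entry is a bare bracketed agent-role OMID, with no name.
    only_roles = bool(authors) and all(
        BARE_AGENT_ROLE.fullmatch(a) for a in authors
    )
    others = split_field(row["publisher"]) + split_field(row["editor"])
    # Publisher and editor entries look correct: each has a responsible-agent OMID.
    others_ok = all(RESPONSIBLE_AGENT_OMID.search(c) for c in others)
    # Assumption: one leaked role per publisher/editor, as in the example row.
    return only_roles and others_ok and len(authors) == len(others)


# The relevant fields of the row for br/06602041963 quoted above.
ROW = {
    "author": "[omid:ar/06609023674]; [omid:ar/06609023675]; [omid:ar/06609023673]",
    "publisher": "Geotar-Media Publishing Group [omid:ra/0610116993 crossref:18453]",
    "editor": "Radzinskiy, E.V. [omid:ra/06606217946]; Shykh, E.V. [omid:ra/06606217947]",
}

print(matches_hypothesis(ROW))  # prints True
```

Rows for which this returns False would be the interesting ones to inspect by hand, since they would contradict the guess.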