Bubble-up publications #3

mslw · 2023-09-15T09:03:00Z

This is a nice-to-have eventually but not a priority at the time of writing.

The SFB catalog is structured in the following way:

landing page
└── Project (project metadata here)
    └── Research dataset (dataset metadata here)

Currently, publications can be added to a Project dataset (via user-submitted RIS / nbib or web-scraped json files) or to a Research dataset (e.g. via a tabby record). There is an expectation that tabby records would declare at least one publication.

It could be neat if the publications added to a Research dataset could be automatically reported one level up (at the Project). This of course goes beyond the "each (sub)dataset is standalone" approach of the catalog itself, but reflects the logical nature of reality (a dataset-related publication is also a publication of its project).

As a side note, and an extended problem, making a reverse connection has also been proposed: sfb1451/metadata-catalog#24

When adding things up in the hierarchy, there is duplication to be dealt with: a publication can be listed in both a project and its subdataset (maybe even in 100% of cases, which would make this issue irrelevant). The catalog does not handle this on its own, so it would be up to the script here. Can the DOIs be reliably used as identifiers to deduplicate (considering that we treat them as optional for tabby)?
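To make the deduplication question concrete, here is a minimal sketch of DOI-based bubbling up. It assumes publications are plain dicts with an optional `doi` key; that field name and the `bubble_up` helper are hypothetical and not taken from the catalog schema or the existing script. Entries without a DOI cannot be compared this way, so the sketch simply keeps them.

```python
import re


def normalize_doi(doi):
    """Return a lowercase bare DOI, or None if nothing usable is given."""
    if not doi:
        return None
    doi = doi.strip().lower()
    # Strip common prefixes such as "https://doi.org/" or "doi:"
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None


def bubble_up(project_pubs, dataset_pubs):
    """Merge dataset publications into the project list, skipping DOI duplicates.

    Publications without a DOI are always appended (assumption: no other
    identifier is available at this point).
    """
    merged = list(project_pubs)
    seen = {normalize_doi(p.get("doi")) for p in merged} - {None}
    for pub in dataset_pubs:
        key = normalize_doi(pub.get("doi"))
        if key is not None and key in seen:
            continue  # already reported at the project level
        if key is not None:
            seen.add(key)
        merged.append(pub)
    return merged
```

The obvious weakness is the DOI-less case: two records of the same paper without DOIs would both survive, which is why a fallback identifier would be needed if DOIs stay optional.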


jsheunis · 2023-10-06T09:47:20Z

I think two approaches are relevant:

1. As you say, the script can be updated to let publications bubble up to (grand)parent datasets. Wrt identifying publications, I guess a step-wise approach could work, i.e. first try using DOI and if that is unavailable try some sort of text matching in the citation or title (if that exists). Or maybe there's an API that can take a citation and return a DOI?
2. In the context of generating a catalog from linked data (see "Look at catalog rendering concept from semantic data view", metadata-catalog#46, and "Accept and render JSON-LD metadata", datalad/datalad-catalog#341), the whole concept of the catalog schema (and in the current case specifically, a publication that is a property of a dataset) will likely change. It's likely that publications will stand on their own as entities with ontology-based definitions, and that they will have many possible relationships (in the sense of semantic data triples) to other semantic entities such as datasets. In this scenario, the "bubbling up" process would probably translate to adding another triple in the graph / metadata.
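The step-wise identification from the first approach could look roughly like the sketch below: compare DOIs when both records have one, otherwise fall back to fuzzy title matching. The `doi` / `title` keys, the `same_publication` helper, and the 0.9 threshold are all assumptions chosen for illustration; the text matching here uses `difflib` from the standard library, which is only one of many possible choices.

```python
from difflib import SequenceMatcher


def _clean(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(kept.split())


def same_publication(a, b, threshold=0.9):
    """Decide whether two publication records refer to the same work."""
    doi_a, doi_b = a.get("doi"), b.get("doi")
    if doi_a and doi_b:
        # Step 1: both have a DOI, so trust it and ignore the titles
        return doi_a.strip().lower() == doi_b.strip().lower()
    title_a, title_b = a.get("title"), b.get("title")
    if title_a and title_b:
        # Step 2: fall back to approximate title comparison
        ratio = SequenceMatcher(None, _clean(title_a), _clean(title_b)).ratio()
        return ratio >= threshold
    # Nothing comparable: treat the records as distinct
    return False
```

Treating two records with different DOIs as distinct even when the titles agree is deliberate here, since a preprint and its published version typically share a title but not a DOI.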

Regarding timelines, I think option 1 is more sensible for a short-term deliverable.

mslw · 2023-10-06T10:33:07Z

> Or maybe there's an API that can take a citation and return a DOI?

There is api.crossref.org/works?query.bibliographic, which I used through the habanero Python package here to scrape the non-standardized list of SFB publications. It's surprisingly good, but requires some processing: because citations are free-form, it returns matches with scores, and sometimes e.g. a publication and its preprint score similarly and have to be distinguished by type. See the docstrings in that file to get a clue.
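As an illustration of the kind of post-processing meant here, the sketch below builds the query URL and picks a DOI from a list of Crossref result items, each expected to carry `DOI`, `score`, and `type` fields. It is not the code from the linked file: the helper names, the score cutoff, and the tie margin are hypothetical values, and the only type-based rule is to prefer anything over `posted-content` (the Crossref type used for preprints) when scores are nearly equal.

```python
from urllib.parse import urlencode


def crossref_query_url(citation, rows=5):
    """Build a Crossref works URL for a free-form citation string."""
    params = {"query.bibliographic": citation, "rows": rows}
    return "https://api.crossref.org/works?" + urlencode(params)


def _score(item):
    return item.get("score", 0.0)


def pick_match(items, min_score=60.0, tie_margin=1.0):
    """Return the DOI of the most plausible match, or None.

    min_score and tie_margin are illustrative values, not tuned ones.
    """
    candidates = [i for i in items if _score(i) >= min_score]
    if not candidates:
        return None
    best = max(_score(i) for i in candidates)
    # Keep everything scoring about as well as the best hit
    top = [i for i in candidates if best - _score(i) <= tie_margin]
    # Among near-ties, put preprints last, then order by descending score
    top.sort(key=lambda i: (i.get("type") == "posted-content", -_score(i)))
    return top[0].get("DOI")
```

The `items` list would come from the `message.items` part of the JSON response; fetching it is left out so the selection logic can be tried on saved responses.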

> Regarding timelines, I think option 1 is more sensible for a short-term deliverable.

Yes, but I am not 100% sure that we need to go for this deliverable right now.

jsheunis mentioned this issue Oct 6, 2023

Allow publication to link to a dataset? sfb1451/metadata-catalog#24

Closed
