issue #5277: first implementation step for exporting related publication to DataCite #8357
Hi @pkiraly - thanks! I know that this is a long-standing issue, and I appreciate the PR to start conversations around a potential solution. I think any solution would need to let users select the relationship, and that would require some UI changes. I don't think we'll review and merge this in its current form, and this is one of my last few days with the project, but:
Hi @djbrooke - I'm aware of these considerations, and we discussed them within the issue and related issues. We agree that we have to think about the relationship type, but that leads to changing the underlying metadata structure of the core metadata block. It also involves the other "Related" fields. Moreover, DataCite's RelatedItems connect identifiers, so we should make sure that every related field contains an identifier, not a citation-like string. These fields should therefore have a structure similar to that of Related publications (free text, identifier scheme, identifier value, relation type). As I see it, this might be a big change that probably cannot be implemented in baby steps, only in larger steps involving metadata block changes, database retrieval changes, UI changes, and maybe API changes. So this PR is meant as a preliminary, somewhat temporary suggestion that prepares the landscape for further changes.
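To make the point about identifiers concrete, here is a minimal Python sketch (not Dataverse code, which is written in Java) of a "Related ..." entry with the four parts listed above, and of how it might be serialized as a DataCite `relatedIdentifier` element. The class and function names are hypothetical; only `relatedIdentifier`, `relatedIdentifierType`, and `relationType` come from the DataCite Metadata Schema. Entries without an identifier are skipped rather than exported as a citation string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelatedItem:
    """Hypothetical model of one "Related ..." entry (not an actual Dataverse class)."""
    citation: str                 # free-text citation, kept for display only
    id_type: Optional[str]        # identifier scheme, e.g. "DOI" or "URL"
    id_value: Optional[str]       # identifier value, e.g. "10.1234/abcd" (made up)
    relation_type: Optional[str]  # DataCite relationType; None means "use the default"


def to_related_identifier(item: RelatedItem, default_relation: str) -> Optional[ET.Element]:
    """Return a DataCite <relatedIdentifier> element, or None if the entry has no identifier.

    DataCite relates identifiers to identifiers, so a bare citation string is not exported.
    """
    if not item.id_type or not item.id_value:
        return None
    element = ET.Element(
        "relatedIdentifier",
        {
            "relatedIdentifierType": item.id_type,
            "relationType": item.relation_type or default_relation,
        },
    )
    element.text = item.id_value
    return element


# Usage: one entry with a DOI, and one free-text-only entry that gets skipped.
entries = [
    RelatedItem("Doe, J. (2020). An example article.", "DOI", "10.1234/abcd", None),
    RelatedItem("Talk given at a workshop", None, None, None),
]
for entry in entries:
    el = to_related_identifier(entry, default_relation="IsCitedBy")
    if el is not None:
        print(ET.tostring(el, encoding="unicode"))
```

Keeping the relation type optional per entry lets a single default cover legacy metadata while still allowing a curator-selected type once the UI supports it.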
I think we'd want to move forward with this PR because it's a small step toward functionality that's been on our radar since at least 2015, but it would also be best to agree on some details about the next steps. I've always thought that QDR's and Madrono's repositories took this PR's approach because they figured that doing something to reap the benefits of sending this metadata to DataCite (such as contributing to the Make Data Count effort, whose metrics they were also early to implement in their repositories) was better than potentially waiting years for a longer-lasting solution, which is what wound up happening. If this work were done as part of the NIH work or the Harvard Data Commons work, would it be another year or longer before Dataverse repositories could start sending related publication metadata to DataCite?

The Make Data Count database considers several relationTypes when tracking and reporting relationships between resources, so in practice, when Make Data Count says that a dataset has 3 "citation counts," that might not mean that three other resources formally cited that dataset. When looking at Make Data Count's data, I think we can reliably say only that the dataset has some sort of relationship with three other resources (and we know that we can't rely on the formal citation of datasets for measuring the influence of data). I heard in a DataCite webinar last year that researchers have been analyzing the Make Data Count data collected so far, and maybe they're seeing patterns in publishers' uses of relationship types that can help DataCite and Crossref better define, and maybe somehow enforce, relationship type definitions. But right now I don't think that the benefits of specifying relationship types have been realized, while I think there are already some benefits to reporting to DataCite that a dataset is somehow related to another resource. And Dataverse repositories have been collecting this information for years.
Many Dataverse repositories don't know, and I think would be hard pressed to figure out, which relationship types to apply to the Related Publication metadata they already have. To me that means that repositories will need to apply some general relationship type by default, and I think that's what this PR does. Maybe the term IsCitedBy is too specific and shouldn't be used as a general "somehow related to" term. When we were talking about relationTypes in #2778, in a Google Doc I proposed using IsReferencedBy, which DataCite defines as "indicates A is used as a source of information by B". But if we do use IsCitedBy to say that a dataset is somehow related, we should make sure that we all know this, so that it's easier later to update the metadata with another term for "somehow related" if we want to reserve IsCitedBy for "this dataset appears in a citation in this publication". The more consistently the Dataverse community uses these terms, even if other publishing communities (like Zenodo) aren't using them the same way, the easier it'll be to apply changes if, for example, DataCite and/or CrossRef settle on fewer relationship types or revise their definitions based on how they're actually being used.
@TaniaSchlatter @scolapasta Did you have a chance to take a look at this issue? For us it is quite urgent, and it seems nothing has happened in the last 2 months. Do you think this is out of scope or of no interest for the majority of the Dataverse community, or does it at least have a low priority?
My sense is that one reason this is stuck is the choice of isCitedBy. As discussed in #2778, isCitedBy is usually reserved for cases when a dataset is in the reference section of a paper. QDR decided on 'IsSupplementTo' in its fork as an incremental step while waiting for a broader range of relationships to be supported (e.g. by adding a relationship type field to the related publications entry). (That said, it sounds like there is another site using isCitedBy.) While recognizing that support for multiple types is of interest, perhaps a way forward, assuming there's consensus to do something ~now, would be to select one of the vaguer relationship types (IsSupplementTo, IsReferencedBy) as a default and poll the community (e.g. via email) on a few questions:
Hopefully that would provide enough feedback to quickly tweak this PR (e.g. by adding a setting(s) to select the type and/or turn off relationship reporting) as needed.
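One way to act on this suggestion is a single installation-level setting that picks the default relation type or turns reporting off. The Python sketch below assumes a hypothetical setting value passed in as a string; the fallback `IsCitedBy` mirrors what this PR currently sends, and the allowed values are only the ones discussed in this thread, not the full DataCite relationType vocabulary.

```python
from typing import Optional

# The relation type this PR currently sends for every related publication.
PR_DEFAULT = "IsCitedBy"

# Only the candidates discussed in this thread; DataCite's vocabulary is larger.
CANDIDATE_DEFAULTS = {"IsCitedBy", "IsReferencedBy", "IsSupplementTo"}


def resolve_default_relation(setting: Optional[str]) -> Optional[str]:
    """Interpret a hypothetical installation setting for the default relationType.

    Returns the relationType to use, or None if relationship reporting is turned off.
    """
    if setting is None or not setting.strip():
        return PR_DEFAULT  # unset: keep the PR's current behavior
    value = setting.strip()
    if value.lower() == "off":
        return None  # do not send related publications to DataCite at all
    if value not in CANDIDATE_DEFAULTS:
        raise ValueError(f"unsupported default relationType: {value!r}")
    return value


# Usage
for raw in (None, "IsSupplementTo", "off"):
    print(raw, "->", resolve_default_relation(raw))
```

Raising an error for unknown values, rather than silently falling back, would surface a misconfigured installation early; that is a design choice of this sketch, not something the thread settled.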
I'm just noting that we're still discussing what to send to DataCite and how: https://groups.google.com/g/dataverse-community/c/vxAw8vs7_K8/m/DRppjQjnEAAJ Also, @jggautier made a nice list of related issues with regard to sending more and better data to DataCite: https://docs.google.com/document/d/1z6pPVT4_fc833thD9MyhoAMy2lwKDmCFX5QtKq8I04Y Finally, some related discussion here:
If you are still interested in this PR, can you please merge the latest from develop and resolve any merge conflicts? If so, we can prioritize reviewing and QAing the changes. If we don't hear from you by May 22, 2024, we'll go ahead and close this PR (it can always be reopened after that date if there is still interest).
I wrote that my proposal for merging the DataCite and OpenAIRE exports will do what @pkiraly proposed in this PR, and I wrote that I thought this PR could be closed (or I guess deleted?) after the DataCite and OpenAIRE exports are merged. @pkiraly if you're available, I'm hoping you can check out that proposal if you haven't already. @qqmyers said he'd work on it, too. The proposal also references work that we're planning, outside of that proposal, about defining how datasets are related to other research objects. |
@jggautier Is it OK with you if I do it early next week?
Definitely, that's OK with me. Thanks!
@jggautier I read your plan, and it is absolutely fine with me.
Thanks again @pkiraly!
It sounds like there's agreement to close this PR in favor of the proposal above. Great. Closing.
What this PR does / why we need it:
It was a feature request from our users, and, as I found in the issue queue, others have also noticed that related publications and the other "Related ..." metadata fields are not exported to the DataCite metadata.
Which issue(s) this PR closes:
It is a "baby step" towards a better solution. Its intention is to move the discussion forward and to highlight theoretical and practical topics.
Special notes for your reviewer:
Suggestions on how to test this:
in the <relatedIdentifiers> section.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
It changes the XML output of the DataCite export - see previous point.
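The testing suggestion is to look for related publication identifiers in the <relatedIdentifiers> section of the DataCite export. A minimal Python sketch of that check, assuming you already have the exported XML as a string (fetching it is not shown), might look like this. The sample document and DOI are made up for illustration; `http://datacite.org/schema/kernel-4` is the DataCite kernel-4 namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# DataCite Metadata Schema 4.x namespace.
NS = {"dc": "http://datacite.org/schema/kernel-4"}


def related_identifiers(datacite_xml: str) -> List[Tuple[Optional[str], Optional[str], str]]:
    """List (relatedIdentifierType, relationType, value) for each <relatedIdentifier>."""
    root = ET.fromstring(datacite_xml)
    found = []
    for el in root.findall("dc:relatedIdentifiers/dc:relatedIdentifier", NS):
        found.append((el.get("relatedIdentifierType"), el.get("relationType"), (el.text or "").strip()))
    return found


# Made-up sample for illustration; a real export contains many more elements.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsCitedBy">10.1234/abcd</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""

print(related_identifiers(SAMPLE))
```

In practice you would compare the returned identifiers against those entered in the dataset's Related Publication field.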
Is there a release notes update needed for this change?:
If the community accepts this approach, it would be worth mentioning in the release notes, since I believe it increases the number of connections between papers and datasets in the PID network and thus might somewhat increase discoverability.
Additional documentation:
Related to #5277