10982 Request identifier support for oai dc harvesting #11010

Open
wants to merge 2 commits into
base: develop
from

Conversation

stevenferey
Contributor

What this PR does / why we need it:

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@coveralls

Coverage Status

coverage: 21.892% (+0.04%) from 21.856%
when pulling 7519acc on Recherche-Data-Gouv:10982-header-id-support-for-oai-dc-harvesting
into b28812b on IQSS:develop.

@landreev
Contributor

landreev commented Nov 8, 2024

Hi, thank you for the PR. However, this is likely a duplicate of what I am currently working on as part of #10909. We expect this feature (being able to use the OAI identifier, i.e., the <identifier> from the <header> section, as the persistent identifier for the harvested dataset) to be in 6.5, due in December.

@stevenferey
Contributor Author

Thank you for your contribution!
I note that this pull request overlaps with work already underway in PR #11011.
Please feel free to review and reuse any code I've developed if you find it helpful for your own work.

@landreev
Contributor

@stevenferey Hi, I am in the process of finalizing this feature for the v6.5 release.
There are certain differences in how you and I approached implementing the use of this OAI-level identifier as the pid for the imported dataset. So, I'd like to reconcile our approaches in a way that will work for all of our use cases.

  • In my current implementation, it is a boolean flag on the HarvestingClient level - it is up to the admin whether to use the identifier from the <header> section or not. It is off by default; when enabled, the import code will attempt to use the OAI identifier as the only choice, i.e., it'll fail if the <header> does not contain what looks like a valid persistent id.
  • In your implementation, the code will always attempt to use the <header>-level identifier, as one of the possible options. Notably, you add it to the end of the list of the pid candidates, so it'll only be used if there are no other pids found in the body of the oai_dc record.

So, I'm thinking of a hybrid solution. The only part of my current implementation that is truly important for me is that I need to be able to (optionally) use the OAI identifier from the <header> as the FIRST choice for the pid. This is to accommodate my main use case, being able to harvest from DataCite directly. But in all other situations, I'm happy to adopt your approach. What I'm proposing therefore will work as follows:

  • I will keep the harvesting client-level boolean flag, as currently implemented in my PR #11011 (10909 Support for OAI-PMH harvesting from DataCite). When set to true via the API, the import will use the OAI identifier from the <header> as the first candidate for the pid.
  • When the flag is NOT set, i.e., the default behavior, the import will do what your code does in this PR: it will attempt to use the OAI identifier as the last choice.
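
A rough sketch of the ordering described in the two bullets above, in Java. This is not the actual Dataverse import code: the class `PidCandidateOrder`, its methods, and the `useOaiIdentifierAsPid` parameter name are made up for illustration, and `looksLikePid` is only a placeholder for whatever persistent identifier validation the real import performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from the Dataverse code base.
final class PidCandidateOrder {

    // Builds the ordered list of pid candidates from the identifiers found in the
    // oai_dc body and the <identifier> from the OAI <header>.
    static List<String> orderCandidates(List<String> bodyIdentifiers,
                                        String headerIdentifier,
                                        boolean useOaiIdentifierAsPid) {
        List<String> candidates = new ArrayList<>(bodyIdentifiers);
        if (headerIdentifier != null && !headerIdentifier.trim().isEmpty()) {
            if (useOaiIdentifierAsPid) {
                // Flag set on the harvesting client: header identifier is the FIRST choice.
                candidates.add(0, headerIdentifier);
            } else {
                // Default behavior: header identifier is the LAST choice.
                candidates.add(headerIdentifier);
            }
        }
        return candidates;
    }

    // Placeholder check (an assumption): accepts a few common DOI and Handle forms.
    static boolean looksLikePid(String candidate) {
        return candidate.startsWith("doi:")
                || candidate.startsWith("hdl:")
                || candidate.startsWith("https://doi.org/")
                || candidate.startsWith("https://hdl.handle.net/");
    }

    // Returns the first candidate, in the order above, that looks like a valid pid.
    static Optional<String> selectPid(List<String> bodyIdentifiers,
                                      String headerIdentifier,
                                      boolean useOaiIdentifierAsPid) {
        return orderCandidates(bodyIdentifiers, headerIdentifier, useOaiIdentifierAsPid)
                .stream()
                .filter(PidCandidateOrder::looksLikePid)
                .findFirst();
    }
}
```

With the flag set, a pid-like header identifier wins even when the body also contains a pid. Without it, the header identifier is used only when nothing in the body qualifies.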

Does this sound reasonable?

@landreev
Contributor

@stevenferey
Thank you again for the PR. And specifically, for adding an import test. I'm very happy to incorporate it.

@stevenferey
Contributor Author

@landreev Thanks for your feedback.

I agree, it's a good idea to integrate the two approaches. Thanks for that.

Labels
None yet
Projects
Status: 🚧 Dev by Recherche Data Gouv
Development

Successfully merging this pull request may close these issues.

Feature Request: Request identifier support for OAI_DC harvesting
3 participants