Update old DataCite schema records to 4.5 #540
Rushiraj created ticket #559 on a similar topic, with information on how to retrieve DataCite records by schema version. To get stats on IDs with Schema 3 versions for a specific repository (e.g. cdl.cdl):
Jing created a related/duplicate ticket #556 with additional info. Copy the additional info over and close the duplicate ticket.
We received an email from DataCite regarding the Schema 3 deprecation schedule and a request to update metadata to Schema 4.
From: Kelly Stathis [email protected]
I'm writing to share that DataCite plans to deprecate Schema 3 on January 1, 2025, and to request your assistance with communicating this change to the Consortium Organizations within your consortium. You can read more about what will change here: https://support.datacite.org/docs/updating-from-schema-3-to-schema-4.
Once we deprecate Schema 3, repositories will be required to use Schema 4 for DOI registration and metadata updates.
There are 8 Repositories in your consortium with at least one Schema 3 DOI. Of these, 2 actively used Schema 3 in the past year to register or update DOIs. The Repositories actively using Schema 3 will be impacted by this change.
To assist you in understanding this usage, I have attached a spreadsheet of Repositories in your consortium to this email. This is broken down as follows:
• Count of DOIs (Total)
The counts of DOIs missing resourceTypeGeneral and using contributorType "Funder" are included because these DOIs are not compatible with Schema 4. For more information, please see the FAQ covering differences between Schema 3 and Schema 4.
Please work with your Consortium Organizations as soon as possible to ensure that each has sufficient time to update their systems and workflows to use DataCite Metadata Schema 4. We're available to answer any questions you have about the process.
Best regards,
DataCite report (Jan 2024) on Schema 3 usage within your consortium:
Query to find Schema 3 records:
Query to find Schema 3 records that are missing resourceTypeGeneral:
Query to find Schema 3 records that use the contributorType "Funder":
Records by schema versions (https://doi.datacite.org/providers/cdlco/dois):
v2.1 records: https://doi.datacite.org/providers/cdlco/dois?schema-version=2.1
Version 3 and version 2.2 records were retrieved and saved in the Google Drive folder:
Code for validating and formatting:
Notes:
proc-datacite.py => _create_or_update() => impl.datacite.uploadMetadata() => impl.datacite.formRecord(): Form an XML record for upload to DataCite, employing metadata mapping if necessary
The
Retrieved DataCite 3 records by campus:
Sample command:
Record files are saved in the Google Drive folder EZID/Identifiers/DataCite/DataCite_3_records
Note: DataCite API offers two pagination options:
Example to retrieve the first 1,000 records:
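As a rough illustration of such a request, the sketch below only builds the URL for a first page of 1,000 records. It assumes the public DataCite REST API endpoint `https://api.datacite.org/dois` and the query parameters `provider-id`, `schema-version`, `page[size]`, `page[number]` and `page[cursor]`; the helper name `first_page_url` is hypothetical and is not taken from the retrieval script used for this ticket.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public DataCite REST API.
DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def first_page_url(provider_id, schema_version, page_size=1000, use_cursor=True):
    """Build the URL for the first page of DOIs registered with a given schema version.

    use_cursor=True requests cursor-based pagination (page[cursor]=1);
    use_cursor=False requests page-number pagination (page[number]=1).
    """
    params = {
        "provider-id": provider_id,
        "schema-version": schema_version,
        "page[size]": page_size,
    }
    if use_cursor:
        params["page[cursor]"] = 1
    else:
        params["page[number]"] = 1
    # urlencode percent-encodes the square brackets in the parameter names.
    return f"{DATACITE_DOIS_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(first_page_url("cdlco", "3"))
```

The resulting URL can be passed to any HTTP client; the sketch deliberately makes no network call itself.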
Results file contains total records and page counts, plus the URL for retrieving the next page:
Note: the search criterion "schema-version=3" needs to be added manually to the next-page URL.
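The manual step of adding the search criterion could be automated along these lines. This is a minimal sketch using only the Python standard library; the function name `with_schema_version` and the sample URL are hypothetical and do not come from the actual results files.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_schema_version(next_url, schema_version="3"):
    """Return next_url with the schema-version filter set exactly once."""
    parts = urlsplit(next_url)
    # Drop any existing schema-version parameter, then append the wanted one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "schema-version"
    ]
    query.append(("schema-version", schema_version))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical next-page URL that lacks the schema-version criterion.
    sample = "https://api.datacite.org/dois?page%5Bcursor%5D=ABC&page%5Bsize%5D=1000&provider-id=cdlco"
    print(with_schema_version(sample))
```

Because an existing `schema-version` parameter is replaced rather than duplicated, the function is safe to apply to every next-page URL in a loop.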
Counts of DOIs by campus and by categories (Retrieved on June 17, 2024):
Noting the change in v3 record counts from January 2024:
Retrieved v2.2 records using
Moving to backlog. The plan is to send a message in the fall to users with records in the old schema versions, give them a chance to upgrade, and convert the records ourselves if they have not done so after DataCite deprecates the old schema versions in 2025.
I refactored the retrieve records script to identify the < 4.x schema records by shoulder so we can contact the corresponding users. Updated files are here.
Unique DOI prefixes: 56 (e.g. 'doi:10.13022/M3'). However, only 22 of these prefixes are in the EZID shoulder table. Find the unmatched ones.
@jsjiang After some review, some of these appear to be "super shoulder" (i.e., DOI prefix only) values in EZID, e.g. 10.5063. The revised script derives each shoulder by parsing the DOI into its prefix plus the first two characters of the suffix, but where a super shoulder exists, this derivation may be erroneous and yield a shoulder that was never created in EZID. In these cases, I think we can just pull the user accounts and emails associated with the super shoulder/prefix instead.
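A minimal sketch of the derivation discussed here, with the super-shoulder fallback, might look as follows. The function name `derive_shoulder`, the `doi:` normalization, the upper-casing of the suffix and the set of known shoulders are all assumptions for illustration; the actual script and the EZID shoulder table may represent these values differently.

```python
def derive_shoulder(doi, known_shoulders):
    """Map a DOI to a shoulder: prefix plus the first two suffix characters.

    If that candidate is not among the known shoulders, fall back to the
    bare prefix (the "super shoulder").
    """
    normalized = doi.strip()
    if normalized.lower().startswith("doi:"):
        normalized = normalized[4:]
    prefix, _, suffix = normalized.partition("/")
    candidate = f"doi:{prefix}/{suffix[:2].upper()}"
    if candidate in known_shoulders:
        return candidate
    # No matching shoulder: report the prefix-only value instead.
    return f"doi:{prefix}"


if __name__ == "__main__":
    # Hypothetical excerpt of a shoulder table.
    known = {"doi:10.13022/M3", "doi:10.5063"}
    print(derive_shoulder("doi:10.13022/M3ABC123", known))
    print(derive_shoulder("10.5063/f1xyz", known))
```

Returning the prefix for unmatched DOIs keeps them grouped under one key, so the associated accounts and emails can be looked up per super shoulder.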
Not matched prefixes: DOI:10.15144/LT - no
datacite_v2_v3_prefix_user_email.txt
Note:
Query:
As of January 2025, DataCite will require that DOIs be registered and updated with schema version 4.0 or newer. See the initial announcement: https://datacite.org/blog/deprecating-schema-3/
To keep up with DataCite policies and practices, EZID's DataCite configuration needs to be updated so that DataCite DOIs can no longer be registered or updated with schema versions older than 4.0. This affects DOIs created via the API, UI, and XML deposits. Users will need to be informed about the change in advance and provided with guidance about upgrading.
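One way such a restriction could be checked for XML deposits is to inspect the namespace of the root `resource` element. The sketch below assumes DataCite XML records declare a namespace of the form `http://datacite.org/schema/kernel-N` (for example `kernel-3`, `kernel-2.2`, `kernel-4`); the function names are hypothetical, and this is not how EZID's existing validation code is structured.

```python
import re
import xml.etree.ElementTree as ET

# Matches the root tag of a DataCite record, e.g. "{http://datacite.org/schema/kernel-4}resource".
_KERNEL_TAG = re.compile(r"^\{http://datacite\.org/schema/kernel-(\d+)(?:\.\d+)?\}resource$")


def datacite_major_version(xml_text):
    """Return the major schema version of a DataCite XML record, or None if unrecognized."""
    root = ET.fromstring(xml_text)
    match = _KERNEL_TAG.match(root.tag)
    return int(match.group(1)) if match else None


def is_accepted(xml_text, minimum_major=4):
    """True only for records whose schema major version is at least minimum_major."""
    major = datacite_major_version(xml_text)
    return major is not None and major >= minimum_major


if __name__ == "__main__":
    record = (
        '<resource xmlns="http://datacite.org/schema/kernel-3">'
        '<identifier identifierType="DOI">10.1234/EXAMPLE</identifier>'
        "</resource>"
    )
    print(datacite_major_version(record), is_accepted(record))
```

Records with an unrecognized root element are treated as not accepted, which errs on the side of rejecting input rather than forwarding it to DataCite.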
Steps
Planned Workflow after Jan 2025
Creating new ID:
Updating: