Replace DataCite EZID API Calls W/ DataCite REST API #2268

Closed
mbarnett opened this issue Apr 1, 2021 · 9 comments

@mbarnett
Contributor

mbarnett commented Apr 1, 2021

The EZID Compatibility API is sunsetting at the end of this year, per https://blog.datacite.org/sunsetting-of-the-ez-api/

We'll need to switch over to their normal API. A cursory glance at GitHub didn't turn up any useful gems (the only thing I saw was one project that hadn't seen an update in over 6 years), so we may need to wrap this ourselves.

@pgwillia
Member

pgwillia commented Sep 7, 2021

https://rubygems.org/gems/datacite seems to do most of what we need at this point.

client = Datacite::Client.new(username: "foo",
                              password: "bar",
                              host: "api.test.datacite.org")

def create
  return unless @item.unminted? && !@item.private?
  ezid_identifer = Ezid::Identifier.mint(Ezid::Client.config.default_shoulder, ezid_metadata)
  if ezid_identifer.present?
    @item.tap do |uo|
      uo.doi = ezid_identifer.id
      uo.save!
    end
    @item.synced!
    ezid_identifer
  end
rescue StandardError => e
  # Skip the next handle_doi_states after_save callback and roll back
  # the state to its previous value. By skipping the callback we can prevent
  # it temporarily from queueing another job. As this could make it end up
  # right back here again resulting in an infinite loop. This all works around a bug in ActiveFedora
  # preventing us from skipping this automatically
  @item.skip_handle_doi_states = true
  @item.unpublish!
  raise e
end

Looks like this will take two steps: (1) create the DOI and (2) update its attributes.

result = client.autogenerate_doi(prefix: '10.0001')
attributes = {
  "event": "publish",
  "doi": "10.5438/0012",
  "creators": [{
    "name": "DataCite Metadata Working Group"
  }],
  "titles": [{
    "title": "DataCite Metadata Schema Documentation for the Publication and Citation of Research Data v4.0"
  }],
  "publisher": "DataCite e.V.",
  "publicationYear": 2016,
  "types": {
    "resourceTypeGeneral": "Text"
  }
}
result = client.update(id: '10.0001/bc123df4567', attributes: attributes)

def update
  return unless @item.awaiting_update?
  ezid_identifer = Ezid::Identifier.modify(@item.doi, ezid_metadata)
  return if ezid_identifer.blank?
  if @item.private?
    @item.unpublish!
  else
    @item.synced!
  end
  ezid_identifer
rescue StandardError => e
  # Skip the next handle_doi_states after_save callback and roll back
  # the state to its previous value. By skipping the callback we can prevent
  # it temporarily from queueing another job. As this could make it end up
  # right back here again resulting in an infinite loop. This all works around a bug in ActiveFedora
  # preventing us from skipping this automatically
  @item.skip_handle_doi_states = true
  if @item.private?
    @item.synced!
  else
    @item.unpublish!
  end
  raise e
end

attributes = {
  "event": "publish",
  "doi": "10.5438/0012",
  "creators": [{
    "name": "DataCite Metadata Working Group"
  }],
  "titles": [{
    "title": "DataCite Metadata Schema Documentation for the Publication and Citation of Research Data v4.0"
  }],
  "publisher": "DataCite e.V.",
  "publicationYear": 2016,
  "types": {
    "resourceTypeGeneral": "Text"
  }
}
result = client.update(id: '10.0001/bc123df4567', attributes: attributes)

def self.remove(doi)
  Ezid::Identifier.modify(doi, status: "#{Ezid::Status::UNAVAILABLE} | withdrawn", export: 'no')
end

I don't see an option to delete/remove in the gem. The docs (https://support.datacite.org/reference/dois-2#delete_dois-id) give this example for a delete request:

require 'uri'
require 'net/http'
require 'openssl'

url = URI("https://api.test.datacite.org/dois/id")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Delete.new(url)

response = http.request(request)
puts response.read_body

That isn't what we were doing, though. I don't know if there is an equivalent to status: "#{Ezid::Status::UNAVAILABLE} | withdrawn", export: 'no'

@pgwillia
Member

pgwillia commented Sep 7, 2021

Maybe this is equivalent to withdrawn?

request.body = "{\"data\":{\"attributes\":{\"event\":\"hide\"}}}"
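If the hide event does turn out to be the right fit, a minimal sketch of the request might look like the following. It assumes the change goes through a PUT to /dois/{id} rather than the DELETE shown earlier; the build_hide_request helper, the DOI, and the credentials are placeholders for illustration, and the request is only built, never sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a PUT request carrying the "hide" event.
# Assumption: updates go to https://<host>/dois/<doi> with basic auth.
# The helper name, DOI, and credentials are placeholders.
def build_hide_request(doi, username:, password:, host: 'api.test.datacite.org')
  uri = URI("https://#{host}/dois/#{doi}")
  request = Net::HTTP::Put.new(uri)
  request.basic_auth(username, password)
  request['Content-Type'] = 'application/vnd.api+json'
  # Same body as the escaped JSON string above, generated from a hash.
  request.body = JSON.generate(data: { attributes: { event: 'hide' } })
  [uri, request]
end

uri, request = build_hide_request('10.0001/bc123df4567', username: 'foo', password: 'bar')
puts request.method
puts uri
puts request.body
```

Generating the body with JSON.generate avoids the hand-escaped quotes and keeps the payload easy to extend.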

@johnhuck

johnhuck commented Sep 7, 2021

It looks like the EZ API used its own terms for the 3 DOI states. [1] shows a table with the equivalent native DataCite terms. The state determines whether you are allowed to delete. EZ API had Reserved, Public, and Unavailable [2]. Deletion is an action that can only be applied to a Reserved DOI.

If you need to withdraw a DOI after it has been made public, you change the status to Unavailable, provide a reason (that's what "withdrawn" is in your example) and provide an updated URL that points to a tombstone page (which you are responsible for creating).

The official DataCite states are Draft (=Reserved), Findable (=Public), and Registered (=Unavailable). I think everything works the same and it's just the names that are different.
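As a rough illustration of that mapping, here is a small lookup sketch. The constant and method names are made up for this example, and the lowercase status strings are an assumption about how the EZID values are spelled in code.

```ruby
# EZID status names mapped to the DataCite state names listed above.
# Assumption: statuses arrive as strings such as "unavailable | withdrawn".
EZID_TO_DATACITE_STATE = {
  'reserved' => 'draft',
  'public' => 'findable',
  'unavailable' => 'registered'
}.freeze

# Drop any reason after the pipe and normalize case before the lookup.
def normalize_status(ezid_status)
  ezid_status.split('|').first.strip.downcase
end

def datacite_state(ezid_status)
  EZID_TO_DATACITE_STATE.fetch(normalize_status(ezid_status))
end

# Deletion only applies to a Reserved (draft) DOI.
def deletable?(ezid_status)
  datacite_state(ezid_status) == 'draft'
end

puts datacite_state('unavailable | withdrawn')
puts deletable?('reserved')
```

Using fetch means an unknown status raises a KeyError instead of silently returning nil.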

I can't tell what the difference is between the REST API and the MDS (Metadata Store) API. They both seem to do the same thing. However, there is more descriptive information in the MDS API documentation [3], which may help you sort out how the REST API is supposed to work.

[1] DOI States
[2] EZ API docs
[3] MDS API docs

@johnhuck

johnhuck commented Sep 7, 2021

@pgwillia I wasn't sure how to answer your question, because I didn't know the context of the function in the original gem. Feel free to throw more questions at me.

@pgwillia
Member

pgwillia commented Sep 28, 2021

The EZID metadata was fairly flat:

{
      datacite_creator: @item.authors.join('; '),
      datacite_publisher: PUBLISHER,
      datacite_publicationyear: @item.sort_year.presence || '(:unav)',
      datacite_resourcetype: DATACITE_METADATA_SCHEME[@item.item_type_with_status_code],
      datacite_resourcetypegeneral: DATACITE_METADATA_SCHEME[@item.item_type_with_status_code].split('/').first,
      datacite_title: @item.title,
      target: Rails.application.routes.url_helpers.item_url(id: @item.id),
      # Can only set status if been minted previously, else its public
      status: @item.private? && @item.doi.present? ? UNAVAILABLE_MESSAGE : Ezid::Status::PUBLIC,
      export: @item.private? ? 'no' : 'yes'
    }

I think the equivalent for the DataCite API would be:

data: {
      type: "dois",
      attributes: {
        prefix: prefix,
        identifiers: {
          identifier: Rails.application.routes.url_helpers.item_url(id: @item.id)
        },
        creators: {
          name: @item.authors.join('; ')
        },
        titles: {
          title: @item.title
        },
        types: {
          resourceType: DATACITE_METADATA_SCHEME[@item.item_type_with_status_code],
          resourceTypeGeneral: DATACITE_METADATA_SCHEME[@item.item_type_with_status_code].split('/').first
        },
        publisher: PUBLISHER,
        publicationYear: @item.sort_year.presence || '(:unav)',
      }
    }

@pgwillia
Member

pgwillia commented Sep 28, 2021

Or rather we want these: https://support.datacite.org/docs/api-create-dois#create-a-findable-doi

{
  creators: [{
    name: @item.authors.join('; ')
  }],
  titles: [{
    title: @item.title
  }],
  publisher: PUBLISHER,
  publicationYear: @item.sort_year.presence || '(:unav)',
  types: {
    resourceTypeGeneral: DATACITE_METADATA_SCHEME[@item.item_type_with_status_code].split('/').first
  },
  url: Rails.application.routes.url_helpers.item_url(id: @item.id),
  schemaVersion: "http://datacite.org/schema/kernel-4"
}

[edit] I found out that schemaVersion is a constant that represents the standard that the metadata conforms to.
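For illustration, here is a sketch that puts those attributes into the data/type/attributes envelope from the earlier draft, outside of Rails. The Item struct, the findable_doi_payload helper, the PUBLISHER value, and the example URL are all stand-ins invented for this example; whether the request body needs exactly this envelope is an assumption based on the draft above.

```ruby
require 'json'

# Hypothetical stand-in for the real item model; only the fields used here.
Item = Struct.new(:id, :authors, :title, :sort_year, :resource_type, keyword_init: true)

PUBLISHER = 'Example Publisher' # placeholder value

# Combine the attributes listed above with the "data"/"type" envelope,
# the prefix from the earlier draft, and the "publish" event.
def findable_doi_payload(item, prefix:, url:)
  {
    data: {
      type: 'dois',
      attributes: {
        event: 'publish',
        prefix: prefix,
        creators: [{ name: item.authors.join('; ') }],
        titles: [{ title: item.title }],
        publisher: PUBLISHER,
        publicationYear: item.sort_year || '(:unav)',
        types: { resourceTypeGeneral: item.resource_type.split('/').first },
        url: url,
        schemaVersion: 'http://datacite.org/schema/kernel-4'
      }
    }
  }
end

item = Item.new(id: 'abc123', authors: ['Doe, Jane', 'Roe, Richard'],
                title: 'An Example Thesis', sort_year: 2021,
                resource_type: 'Text/Thesis')
payload = findable_doi_payload(item, prefix: '10.0001',
                               url: "https://example.org/items/#{item.id}")
puts JSON.pretty_generate(payload)
```

Keeping the payload in one pure function makes it easy to unit test without touching the network.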

@pgwillia
Member

DOIs are updated when certain fields change:

DOI_FIELDS = ['title', 'creator', 'dissertant', 'item_type', 'publication_status'].freeze
def doi_fields_changed?
  changed.any? do |changed_field|
    DOI_FIELDS.include? changed_field
  end
end

causing the altered! event:
elsif (not_available? && transitioned_from_private?) ||
      (available? && (doi_fields_changed? || transitioned_to_private?))
  # If private, we only care if visibility has been made public
  # If public, we care if visibility changed to private or doi fields have been changed
  altered!(id)
end

which marks the object awaiting_update:
event :altered, after: :queue_update_job do
  transitions from: [:available, :not_available], to: :awaiting_update
end

and queues an update job:
def queue_update_job
  DOIUpdateJob.perform_later(id)
end

The job won't run if awaiting_update isn't marked:
return unless @item.awaiting_update?
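The decision in that chain can be sketched as a standalone function, which may help when checking the new DataCite code path against the same rules. The needs_doi_update? helper and its keyword arguments are hypothetical; the real code reads these values from the model and its state machine.

```ruby
DOI_FIELDS = %w[title creator dissertant item_type publication_status].freeze

# True when any changed attribute is one of the DOI-relevant fields.
def doi_fields_changed?(changed)
  changed.any? { |field| DOI_FIELDS.include?(field) }
end

# Mirrors the elsif condition above: returns true when altered! would fire.
# became_public / became_private stand in for transitioned_from_private?
# and transitioned_to_private?.
def needs_doi_update?(state:, changed: [], became_public: false, became_private: false)
  case state
  when :not_available then became_public
  when :available then doi_fields_changed?(changed) || became_private
  else false
  end
end

puts needs_doi_update?(state: :available, changed: ['title'])
puts needs_doi_update?(state: :not_available, changed: ['title'])
```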

@pgwillia
Member

pgwillia commented Nov 2, 2021

  • Need to check whether anything has to happen to existing DOIs.
    • Can existing (EZID) DOIs be found and updated?

@pgwillia
Member

pgwillia commented Dec 9, 2021

A number of DOIs have been generated with the DataCite API, including https://era.library.ualberta.ca/items/525bfc1f-4333-4a10-aa2a-40930be459f7
