The "ListSets" command fails during the creation of a harvesting client for Zenodo #8289

Closed
tjouneau opened this issue Dec 8, 2021 · 4 comments · Fixed by #9460
Labels
Feature: Harvesting NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... pm.epic.nih_harvesting pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)

tjouneau commented Dec 8, 2021

What steps does it take to reproduce the issue?
As a superuser, go to the harvesting clients section of the dashboard and try to create a new client. The base URL is https://zenodo.org/oai2d OR https://www.zenodo.org/oai2d (only the first one is given in the official documentation at https://developers.zenodo.org).

  • When does this issue occur?
    After the first step of the setup wizard is completed (and the base URL has been specified).

  • Which page(s) does it occur on?
    See above

  • What happens?
    The set list remains empty.
    The server.log records the following lines:

[2021-12-08T10:09:11.805+0100] [Payara 5.2020] [INFO] [] [edu.harvard.iq.dataverse.HarvestingClientsPage] [tid: _ThreadID=90 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1638954551805] [levelValue: 800] [[
  metadataformats: success]]

[2021-12-08T10:09:11.806+0100] [Payara 5.2020] [INFO] [] [edu.harvard.iq.dataverse.HarvestingClientsPage] [tid: _ThreadID=90 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1638954551806] [levelValue: 800] [[
  10 metadata formats total.]]

[2021-12-08T10:09:16.767+0100] [Payara 5.2020] [WARNING] [] [edu.harvard.iq.dataverse.HarvestingClientsPage] [tid: _ThreadID=90 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1638954556767] [levelValue: 900] [[
  Failed to execute ListSets; com.lyncode.xoai.serviceprovider.exceptions.HttpException: Error querying service. Returned HTTP Status Code: 500]]

Important note: a curl command entered on the same server (curl -X GET https://zenodo.org/oai2d/?verb=ListSets) OR the same URL opened directly in a browser retrieves a partial list of sets (the querying error and the 500 response are not reproduced in these cases).
I think it would help to know exactly which request Dataverse sends. I don't know of any way to check this on my side.

  • To whom does it occur (all users, curators, superusers)?
    You have to be superuser to access the feature.

  • What did you expect to happen?
    See the set list populated at least partially.

Which version of Dataverse are you using?
5.2

Any related open or closed issues to this bug report?
#8267 for being able to get around this limitation by filling the "set" field through the API.
#8290 for not being able to do so (makes Dataverse crash).
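
On the question above of which request is actually sent: the following is a minimal sketch, not Dataverse's own code, for issuing a single OAI-PMH request from the same server and recording the exact URL together with the HTTP status. It only gives an independent baseline to compare against the 500 in server.log; it does not reveal what Dataverse itself sends. The helper names `build_oai_url` and `probe` are made up for illustration.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_oai_url(base_url, verb, **params):
    """Return the full OAI-PMH request URL for a verb and optional arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


def probe(base_url, verb="ListSets", timeout=30, **params):
    """Send one OAI-PMH GET request; return (url, HTTP status, body prefix)."""
    url = build_oai_url(base_url, verb, **params)
    try:
        with urlopen(Request(url), timeout=timeout) as resp:
            return url, resp.status, resp.read(300)
    except HTTPError as err:
        # A server-side failure such as HTTP 500 surfaces here
        # instead of being raised to the caller.
        return url, err.code, err.read(300)
```

Calling `probe("https://zenodo.org/oai2d")` performs a real network request, so the result depends on the state of the remote server at that moment.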

@valentinapasquale

Hi,

We have also encountered this issue in Dataverse 5.6.

In our experience, the ListSets command was working (returning a partial list of sets) until November 16th, 2021, and stopped working the day after (following some maintenance on the Zenodo side).
We have not opened a ticket with Zenodo yet, because we cannot see on the Dataverse side which request is sent to Zenodo, and because a curl command entered on the same server (curl -X GET https://zenodo.org/oai2d/?verb=ListSets) works perfectly, as also reported by @tjouneau.

@mreekie mreekie added pm.epic.nih_harvesting NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons labels May 9, 2022
@mreekie mreekie added NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... and removed NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... labels Oct 25, 2022
@mreekie mreekie moved this to NIH (Stefano) in IQSS Dataverse Project Nov 2, 2022

landreev commented Jan 9, 2023

There's a good chance this has already been fixed; either during the XOAI update (like many other older harvesting issues), or even earlier. Also, we have recently fixed up the harvesting clients API that could be used to create a client when it cannot be done via the UI for whatever reason.
I'll give it a 33, just in case. But it may end up being shorter/simpler.

@mreekie mreekie added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Jan 9, 2023
@mreekie mreekie moved this from NIH bklog items (Stefano) to 1️⃣ ▶ORDERED BACKLOG (Stefano) in IQSS Dataverse Project Jan 10, 2023

mreekie commented Jan 10, 2023

Priority Review with Stefano:

  • Moved from NIH Deliverables Backlog to Ordered Backlog

@mreekie mreekie moved this from 1️⃣ ▶ORDERED BACKLOG (Stefano) to 3️⃣▶ 💨👟SPRINT READY BACKLOG in IQSS Dataverse Project Jan 18, 2023
@mreekie mreekie moved this from 3️⃣▶ 💨👟SPRINT READY BACKLOG to 4️⃣▶⏱In This Sprint in IQSS Dataverse Project Jan 26, 2023
@mreekie mreekie moved this from 4️⃣▶⏱In This Sprint to 3️⃣▶ 💨👟SPRINT READY BACKLOG in IQSS Dataverse Project Jan 27, 2023
@mreekie mreekie added pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards labels Mar 20, 2023
@landreev landreev self-assigned this Mar 20, 2023
@landreev

To confirm what I said back in January, this appears to be another harvesting issue that we already fixed as part of the major overhaul of the underlying OAI library used by Dataverse (xoai). Note the error message cited in the original user report:

Failed to execute ListSets; com.lyncode.xoai.serviceprovider.exceptions.HttpException: Error querying service. Returned HTTP Status Code: 500]]

- the package mentioned in it, com.lyncode.xoai, has since been replaced by the much improved and updated version of xoai that is now hosted by gdcc as io.gdcc.xoai.

The scenario described, creating a client to harvest from zenodo.org, now works and shows a very long list of sets to choose from in the pull-down menu. I'm assuming it was failing in the past for exactly that reason: the server was listing so many sets that the response required several resumption tokens, and something broke in the process.
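
For readers unfamiliar with resumption tokens, here is a minimal sketch of how a harvester pages through a long ListSets response under OAI-PMH 2.0. It is an illustration only, not the xoai or Dataverse implementation. The `fetch` argument is a hypothetical caller-supplied function that takes a dict of query arguments and returns the response body as a string, and `max_pages` is an arbitrary safety limit.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_sets(xml_text):
    """Return (set_specs, resumption_token) from one ListSets response page."""
    root = ET.fromstring(xml_text)
    specs = [el.text for el in root.findall(".//oai:set/oai:setSpec", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element marks the last page.
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return specs, token


def list_all_sets(fetch, max_pages=1000):
    """Collect every setSpec, following resumption tokens until none is left."""
    args = {"verb": "ListSets"}
    all_specs = []
    for _ in range(max_pages):
        specs, token = parse_list_sets(fetch(args))
        all_specs.extend(specs)
        if not token:
            return all_specs
        # resumptionToken is an exclusive argument: only verb accompanies it.
        args = {"verb": "ListSets", "resumptionToken": token}
    raise RuntimeError("too many pages; possible resumption-token loop")
```

A failure on any single page (for example an HTTP 500 on the second request) aborts the whole listing, which is consistent with the set list staying empty in the GUI.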

Although the specific client described in the issue can now be created successfully, we should assume that something may still go wrong during the interactive steps involved in creating a client via the GUI. By design, that process queries the server in real time to ensure that it is responding and to get the lists of sets and metadata formats that it supports. If any of these requests fails, the client cannot be created. Since this issue was opened, we have added (or rather, fixed) a working API for creating clients. One important thing about that API is that, unlike the GUI, the application does not try to validate the entered URL of the server or make any real-time OAI calls. This is by design: it gives an admin a way to create a client in the rare case where the ListSets or ListMetadataFormats exchanges fail with an otherwise valid OAI server, preventing the client from being created via the GUI. (This obviously requires that the admin really knows what they are doing, as they are responsible for supplying valid parameters to the API.)
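
As a rough sketch of that API route, the snippet below builds (without sending) a request to create a client. The endpoint path `/api/harvest/clients/{nickname}`, the `X-Dataverse-key` header, and the JSON field names are written as I understand them from the Dataverse API guide and should be checked against the guide for your version. The server URL, token, collection alias, and set name used in the usage note are placeholders, and `build_create_client_request` is a made-up helper name.

```python
import json
from urllib.request import Request


def build_create_client_request(server_url, api_token, nickname, client):
    """Build (but do not send) the POST request that creates a harvesting client.

    `client` is a dict of client settings; nothing in it is validated here,
    mirroring the fact that the API makes no real-time OAI calls.
    """
    url = f"{server_url.rstrip('/')}/api/harvest/clients/{nickname}"
    return Request(
        url,
        data=json.dumps(client).encode("utf-8"),
        headers={
            "X-Dataverse-key": api_token,  # superuser API token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed field names; all values are placeholders for illustration.
EXAMPLE_CLIENT = {
    "dataverseAlias": "harvested-zenodo",
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "metadataFormat": "oai_dc",
    "set": "user-example-community",
}
```

To actually create the client, the request object would be passed to `urllib.request.urlopen`, for example `urlopen(build_create_client_request("https://dataverse.example.edu", token, "zenodo", EXAMPLE_CLIENT))`. Because nothing is validated, a typo in the set name or metadata format will only show up when a harvest runs.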

I realized that this is not explicitly spelled out in the guide. I will add that and make a PR closing this issue. Aside from that, I don't think there's anything we need to do here. 🎉
