KeyError: 'key' in ZenodoRepository.download_url() after Zenodo migration #371

Closed
khaeru opened this issue Oct 16, 2023 · 8 comments · Fixed by #375
Labels
bug Report a problem that needs to be fixed

Comments

@khaeru

khaeru commented Oct 16, 2023

Zenodo recently migrated to InvenioRDM, as described here (cf. #350). Since then, the service has had sporadic downtime, as noted here and in a header message on https://zenodo.com ("Oct 15 08:30 UTC: We are continuing to work on resolving identified issues").

It appears that Zenodo's API responses have changed in a way that causes errors in pooch. Excerpting from the output below, I see:

{
  'files': [
    {
      'id': '96ec5297-801c-4fe8-b797-2804e88784c6',
      'filename': 'MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx',
      'filesize': 135453950,
      'checksum': '222193405c25c3c29cc21cbae5e035f4',
      'links': {'self': 'https://zenodo.org/api/records/5793870/files/96ec5297-801c-4fe8-b797-2804e88784c6'}
    }
  ]
}

The single entry in the "files" list has no 'key' field; the file name now appears under 'filename' instead.
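The failure can be reproduced offline. The sketch below applies the same dict comprehension that appears in the traceback (from `ZenodoRepository.download_url()` in pooch 1.7.0) to the entry excerpted above; the variable names are illustrative only.

```python
# Post-migration "files" entry, copied from the API response excerpted above.
new_style_files = [
    {
        "id": "96ec5297-801c-4fe8-b797-2804e88784c6",
        "filename": "MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx",
        "filesize": 135453950,
        "checksum": "222193405c25c3c29cc21cbae5e035f4",
        "links": {
            "self": "https://zenodo.org/api/records/5793870/files/96ec5297-801c-4fe8-b797-2804e88784c6"
        },
    }
]

try:
    # Same expression as in the traceback: it assumes every entry has a "key".
    files = {item["key"]: item for item in new_style_files}
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'key'

# Keying on "filename" instead succeeds for this layout.
files = {item["filename"]: item for item in new_style_files}
print(list(files))
```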

Zenodo published "What's changed?" and "What's new?" pages describing the migration, but they don't mention any API changes, and its API documentation doesn't indicate a new version. So I am not sure whether:

  1. This is a well-advertised, expected, permanent change of Zenodo's API, to which pooch has not yet adapted, or perhaps more likely
  2. This is an unintended, erroneous change that may (sooner or later, if they are aware) be corrected by Zenodo. (More evidence for this case: the URL https://zenodo.org/api/records/5793870/files/96ec5297-801c-4fe8-b797-2804e88784c6 appearing in the record above gives a 404 error.)

Regardless of which is the case, a fix would be welcome! However, I recognize that in either case it is difficult to adapt to an API change that is either undocumented or accidental.

Full code that generated the error

import pooch

args = dict(
    base_url="doi:10.5281/zenodo.5793870",
    registry={
        "MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx": (
            "md5:222193405c25c3c29cc21cbae5e035f4"
        ),
    },
)

p = pooch.create(path=".", **args)

result = p.fetch(list(args["registry"].keys())[0])

print(result)

To help diagnose the problem, I also edited pooch.downloaders.ZenodoRepository.download_url(), inserting the line:

print(f"{self.api_response = }")
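The `=` specifier in that debug line (available since Python 3.8) prints the expression text followed by the `repr()` of its value, which is why the output below begins with `self.api_response = {...}`. A minimal illustration, using a small stand-in dict rather than the real response:

```python
# A stand-in for self.api_response; only the shape matters here.
api_response = {"id": 5793870, "files": []}

# The "=" specifier renders "<expression text> = <repr(value)>".
line = f"{api_response = }"
print(line)  # api_response = {'id': 5793870, 'files': []}
```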

Full error message

Downloading file 'MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx' from 'doi:10.5281/zenodo.5793870/MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx' to '/home/khaeru/vc/iiasa/models'.                                                    
self.api_response = {'created': '2023-05-23T21:20:08.602906+00:00', 'modified': '2023-06-15T09:49:10.161370+00:00', 'id': 5793870, 'conceptrecid': '5793869', 'doi': '10.5281/zenodo.5793870', 'conceptdoi': '10.5281/zenodo.5793869', 'doi_url': 'https://doi.org/10.5281/zenodo.5793870', 'metadata': {'title': 'MESSAGEix-GLOBIOM R11 no-policy baseline', 'doi': '10.5281/zenodo.5793870', 'publication_date': '2023-05-23', 'description': '<p>This dataset contains the parameterization of a no-policy baseline scenario of the global 11-regional <a href="https://docs.messageix.org/projects/global/en/">MESSAGEix-GLOBIOM</a> integrated assessment model. <a href="https://docs.messageix.org/projects/models/en/latest/pkg-data/node.html#region-aggregation-r11">Regions</a>, <a href="https://docs.messageix.org/projects/models/en/latest/pkg-data/year.html">time periods</a>, <a href="https://docs.messageix.org/projects/models/en/latest/pkg-data/codelists.html#commodities-commodity-yaml">commodities</a>, <a href="https://docs.messageix.org/projects/models/en/latest/pkg-data/codelists.html#commodities-commodity-yaml">technologies</a> and <a href="https://docs.messageix.org/projects/models/en/latest/pkg-data/relation.html">relations</a> included in this model are described in a separate <a href="https://docs.messageix.org/projects/models/">repository</a>. The dataset relies on the <a href="https://docs.messageix.org/en/stable/">MESSAGEix modeling framework</a> (<a href="https://doi.org/10.1016/j.envsoft.2018.11.012">Huppmann et al. 2019</a>) and can be imported into MESSAGEix via the <a href="https://docs.messageix.org/en/stable/api.html?highlight=read_xls#message_ix.Scenario.read_excel">read_excel()</a> functionality for which a <a href="https://github.com/iiasa/message_ix/blob/main/tutorial/westeros/westeros_baseline_using_xlsx_import_part1.ipynb">tutorial</a> is available. After the import the scenario can be solved and modified to create new scenarios. 
Note that the published scenario as included in the <a href="https://zenodo.org/record/5553976">ENGAGE global scenarios dataset</a> has been run with a release candidate of <a href="https://docs.messageix.org/en/stable/whatsnew.html#v3-4-0-2022-01-27">version 3.4.0</a> of MESSAGEix.</p>', 'access_right': 'open', 'creators': [{'name': 'Fricko, Oliver', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-6835-9883'}, {'name': 'Frank, Stefan', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0001-5702-8547'}, {'name': 'Gidden, Matthew', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0003-0687-414X'}, {'name': 'Huppmann, Daniel', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-7729-7389'}, {'name': 'Johnson, Nils A.', 'affiliation': 'Electric Power Research Institute (EPRI)'}, {'name': 'Kishimoto, Paul Natsuo', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-8578-753X'}, {'name': 'Kolp, Peter', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0003-0122-2839'}, {'name': 'Lovat, Francesco', 'affiliation': 'Danish Energy Agency', 'orcid': '0000-0002-4331-980X'}, {'name': 'McCollum, David L.', 'affiliation': 'Oak Ridge National Labortory (ORNL) and International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0003-1293-0179'}, {'name': 'Min, Jihoon', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-0020-1174'}, {'name': 'Rao, Shilpa', 'affiliation': 'Norwegian Institute of Public Health', 'orcid': '0000-0003-4012-9063'}, {'name': 'Riahi, Keywan', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0001-7193-3498'}, {'name': 'Rogner, Holger', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-1045-9830'}, {'name': 'van Ruijven, Bas', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0003-1232-5892'}, {'name': 'Vinca, Adriano', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-3051-178X'}, {'name': 'Zakeri, Behnam', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0001-9647-2878'}, {'name': 'Augustynczik, Andrey Lessa Derci', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)'}, {'name': 'Deppermann, Andre', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-7943-4842'}, {'name': 'Ermolieva, Tatiana', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)'}, {'name': 'Gusti, Mykola', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-2576-9217'}, {'name': 'Lauri, Pekka', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0002-5472-2039'}, {'name': 'Heyes, Chris', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0001-5254-493X'}, {'name': 'Schoepp, Wolfgang', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0001-5990-423X'}, {'name': 'Klimont, Zbigniew', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0003-2630-198X'}, {'name': 'Havlik, Petr', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0001-5551-5085'}, {'name': 'Krey, Volker', 'affiliation': 'International Institute for Applied Systems Analysis (IIASA)', 'orcid': '0000-0003-0307-3515'}], 'keywords': ['integrated assessment model', 'scenario', 'no-policy baseline'], 'related_identifiers': [{'identifier': '10.1016/j.envsoft.2018.11.012', 'relation': 'cites', 'resource_type': 'publication-article', 'scheme': 'doi'}, {'identifier': '10.22022/iacc/03-2021.17115', 'relation': 'cites', 'resource_type': 'publication-report', 'scheme': 'doi'}, {'identifier': '10.1038/s41558-021-01215-2', 'relation': 'isSupplementTo', 'resource_type': 'publication-article', 'scheme': 'doi'}, {'identifier': '10.1038/s41558-021-01218-z', 'relation': 'isSupplementTo', 'resource_type': 'publication-article', 'scheme': 'doi'}, {'identifier': '10.1088/1748-9326/ac09ae', 'relation': 'isSupplementTo', 'resource_type': 'publication-article', 'scheme': 'doi'}, {'identifier': '10.1038/s41893-021-00772-w', 'relation': 'isSupplementTo', 'resource_type': 'publication-article', 'scheme': 'doi'}, {'identifier': '10.5281/zenodo.5553976', 'relation': 'isSupplementTo', 'resource_type': 'dataset', 'scheme': 'doi'}], 'version': '1.1', 'language': 'eng', 'grants': [{'id': '10.13039/501100000780::821471'}], 'license': 'cc-by-sa-4.0', 'imprint_publisher': 'Zenodo', 'communities': [{'identifier': 'engage-climate'}, {'identifier': 'iiasa'}, {'identifier': 'iiasa-ece'}, {'identifier': 'message-ix'}], 'upload_type': 'dataset', 'prereserve_doi': {'doi': '10.5281/zenodo.5793870', 'recid': 5793870}}, 'title': 'MESSAGEix-GLOBIOM R11 no-policy baseline', 'links': {'self': 'https://zenodo.org/api/records/5793870', 'self_html': 'https://zenodo.org/records/5793870', 'self_doi': 'https://zenodo.org/doi/10.5281/zenodo.5793870', 'doi': 'https://doi.org/10.5281/zenodo.5793870', 'parent': 'https://zenodo.org/api/records/5793869', 'parent_html': 'https://zenodo.org/records/5793869', 'parent_doi': 'https://zenodo.org/doi/10.5281/zenodo.5793869', 'self_iiif_manifest': 'https://zenodo.org/api/iiif/record:5793870/manifest', 'self_iiif_sequence': 'https://zenodo.org/api/iiif/record:5793870/sequence/default', 'files': 'https://zenodo.org/api/records/5793870/files', 'media_files': 'https://zenodo.org/api/records/5793870/media-files', 'archive': 'https://zenodo.org/api/records/5793870/files-archive', 'archive_media': 'https://zenodo.org/api/records/5793870/media-files-archive', 'latest': 'https://zenodo.org/api/records/5793870/versions/latest', 'latest_html': 'https://zenodo.org/records/5793870/latest', 'draft': 'https://zenodo.org/api/records/5793870/draft', 'versions': 'https://zenodo.org/api/records/5793870/versions', 'access_links': 'https://zenodo.org/api/records/5793870/access/links', 'access_users': 'https://zenodo.org/api/records/5793870/access/users', 'access_request': 'https://zenodo.org/api/records/5793870/access/request', 'access': 'https://zenodo.org/api/records/5793870/access', 'reserve_doi': 'https://zenodo.org/api/records/5793870/draft/pids/doi', 'communities': 'https://zenodo.org/api/records/5793870/communities', 'communities-suggestions': 'https://zenodo.org/api/records/5793870/communities-suggestions', 'requests': 'https://zenodo.org/api/records/5793870/requests'}, 'record_id': 5793870, 'owner': 233639, 'files': [{'id': '96ec5297-801c-4fe8-b797-2804e88784c6', 'filename': 'MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx', 'filesize': 135453950, 'checksum': '222193405c25c3c29cc21cbae5e035f4', 'links': {'self': 'https://zenodo.org/api/records/5793870/files/96ec5297-801c-4fe8-b797-2804e88784c6'}}], 'state': 'done', 'submitted': True}
Traceback (most recent call last):                         
  File "/home/khaeru/vc/iiasa/models/bug.py", line 14, in <module>                                                                                                                                                                             
    result = p.fetch(list(args["registry"].keys())[0])                                                                 
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                 
  File "/home/khaeru/.venv/3.11/lib/python3.11/site-packages/pooch/core.py", line 588, in fetch                                                                                                                                                
    stream_download(                                                                                                   
  File "/home/khaeru/.venv/3.11/lib/python3.11/site-packages/pooch/core.py", line 803, in stream_download                                                                                                                                                                         
    downloader(url, tmp, pooch)                                                                                                                                                                                                                
  File "/home/khaeru/.venv/3.11/lib/python3.11/site-packages/pooch/downloaders.py", line 605, in __call__                                                                                                                                                                         
    download_url = data_repository.download_url(file_name)                                                                               
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                               
  File "/home/khaeru/.venv/3.11/lib/python3.11/site-packages/pooch/downloaders.py", line 805, in download_url                                                                                                                                                                     
    files = {item["key"]: item for item in self.api_response["files"]}                                                                   
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                   
  File "/home/khaeru/.venv/3.11/lib/python3.11/site-packages/pooch/downloaders.py", line 805, in <dictcomp>                                                                                                                                                                       
    files = {item["key"]: item for item in self.api_response["files"]}                                                                   
             ~~~~^^^^^^^                                            
KeyError: 'key'                                                     

System information

  • Operating system: Ubuntu Linux 23.04
  • Python installation: 3.11.4-1~23.04 from Ubuntu archives
  • Version of Python: 3.11.4
  • Version of this package: 1.7.0
@khaeru khaeru added the bug Report a problem that needs to be fixed label Oct 16, 2023
@paddyroddy

I have the same problem and couldn't work it out. Thanks for identifying the cause!

@santisoler
Member

santisoler commented Oct 17, 2023

Thanks @khaeru for opening this issue. I can reproduce the error with the script you shared, and I've also checked that our tests are failing because of the change in Zenodo API.

I'll try to work on a quick bugfix and make a new release. At the moment I see that https://developers.zenodo.org is down, so I won't be able to get further information about the new API and will have to rely on the JSON structure and try to cover most cases.

I'll probably ping you and @paddyroddy so you can test the bugfix against your use cases.

Thanks again for the detailed report! It's super helpful!

PS: I'm leaving the URL of the API response for the repository in the example here, so there's a quick way to access it when memory isn't helping: https://zenodo.org/api/records/5793870
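That API URL is built from the record ID at the end of the DOI. A small sketch deriving it, assuming (as holds for both DOIs in this thread) that Zenodo DOIs have the form `10.5281/zenodo.<record id>`; `zenodo_api_url` is a hypothetical helper, not part of pooch.

```python
import re


def zenodo_api_url(doi: str) -> str:
    """Return the records API URL for a Zenodo DOI (hypothetical helper).

    Assumes the DOI looks like "10.5281/zenodo.<record id>", optionally
    prefixed with "doi:" and suffixed with "/" as in pooch base URLs.
    """
    doi = doi.removeprefix("doi:").rstrip("/")
    match = re.fullmatch(r"10\.5281/zenodo\.(\d+)", doi)
    if match is None:
        raise ValueError(f"Not a Zenodo DOI: {doi!r}")
    return f"https://zenodo.org/api/records/{match.group(1)}"


print(zenodo_api_url("doi:10.5281/zenodo.5793870"))
# https://zenodo.org/api/records/5793870
```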

@paddyroddy

Thanks, @santisoler. For reference, mine is just from here: https://github.com/astro-informatics/sleplet/blob/12efdcd8d1b65900de7cea736c20d60c224aa9f4/src/sleplet/_data/setup_pooch.py#L11-L17

import pooch

_ZENODO_DATA_DOI = "10.5281/zenodo.7767698"
_POOCH = pooch.create(
    path=pooch.os_cache("sleplet"),
    base_url=f"doi:{_ZENODO_DATA_DOI}/",
    registry=None,
)
_POOCH.load_registry_from_doi()

@santisoler
Member

Pooch v1.8.0 has been released, including the bugfix we merged in #375 to solve this issue.

The new release is already available in PyPI: https://pypi.org/project/pooch/

Availability through conda-forge might take a few hours.

Thanks again @khaeru for opening this issue, and to all of you who reported back.

@paddyroddy

Thank you for the quick fix!

@santisoler
Member

Just to keep everyone in the loop: I received another reply from Zenodo. They restored the old behaviour of the API, so our downloader is using the "legacy" version of the API. For example, for record 7632643 (which holds pooch-test-data-v1.zip), we now get the following list of files:

"files": [
    {
      "id": "878b8528-7706-436e-9536-b2a1a838ce14",
      "key": "santisoler/pooch-test-data-v1.zip",
      "size": 893,
      "checksum": "md5:6cdda261f5646a4089966fd0bf505233",
      "links": {"self": "https://zenodo.org/api/records/7632643/files/santisoler/pooch-test-data-v1.zip/content"}
    }
],

Note that:

  • The filename is under the "key" field again.
  • The checksum is prefixed with the algorithm name (md5:).
  • The file can be downloaded from the link in "self".
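A downloader that tolerates both layouts seen in this thread could normalize each entry before use. This is a sketch under stated assumptions, not the code merged in #375: `normalize_entry` is a hypothetical name, and the bare checksum in the post-migration layout is assumed to be MD5 (it matches the `md5:` hash in the original registry).

```python
def normalize_entry(item: dict) -> tuple[str, str, str]:
    """Return (file name, "alg:hash" checksum, "self" link) for one entry.

    Hypothetical helper covering both layouts seen in this thread:
    legacy ("key", checksum like "md5:...") and post-migration
    ("filename", bare hex checksum, assumed here to be MD5).
    """
    name = item["key"] if "key" in item else item["filename"]
    checksum = item["checksum"]
    if ":" not in checksum:
        checksum = f"md5:{checksum}"  # assumption: bare hashes are MD5
    # Legacy "self" links serve the file; the post-migration one returned 404.
    return name, checksum, item["links"]["self"]


legacy = {
    "key": "santisoler/pooch-test-data-v1.zip",
    "checksum": "md5:6cdda261f5646a4089966fd0bf505233",
    "links": {"self": "https://zenodo.org/api/records/7632643/files/santisoler/pooch-test-data-v1.zip/content"},
}
migrated = {
    "filename": "MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx",
    "checksum": "222193405c25c3c29cc21cbae5e035f4",
    "links": {"self": "https://zenodo.org/api/records/5793870/files/96ec5297-801c-4fe8-b797-2804e88784c6"},
}

# Build a pooch-style registry: file name -> "alg:hash".
registry = {name: checksum for name, checksum, _ in map(normalize_entry, [legacy, migrated])}
print(registry)
```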

Since the support for this API was kept, our downloader is still working just fine!

@paddyroddy

Unrelated, but I've noticed that the API seems to be much more rate-limited than it used to be, to the point that my CI is breaking. Have you noticed the same?

@santisoler
Member

I've rerun Pooch's tests both locally and through GitHub Actions and I haven't noticed that.

If this persists, I'd recommend getting in touch with Zenodo; they are super responsive.
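If rate limiting does turn out to be the cause of CI failures, one generic mitigation is to retry with exponential backoff. The sketch below is a hypothetical stdlib-only helper (`with_backoff` is not a pooch function); the commented usage line assumes a `Pooch` instance `p` like the one in the original report. `pooch.create` also accepts a `retry_if_failed` argument, which may be enough on its own.

```python
import time


def with_backoff(func, *, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on any exception.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    between attempts and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Usage sketch (not executed here), assuming `p` was built with pooch.create:
# result = with_backoff(lambda: p.fetch("MESSAGEix-GLOBIOM_1.1_R11_no-policy_baseline.xlsx"))
```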

3 participants