Update Europeana to collect image dimensions #1484

Closed
1 task
stacimc opened this issue Aug 2, 2022 · 4 comments · Fixed by #2782
Assignees
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

@stacimc
Collaborator

stacimc commented Aug 2, 2022

Problem

We'd like to make sure that all of our image provider scripts are collecting data about image dimensions.

Description

The Europeana script does not currently do so, but the information is available in the API via the single record endpoint. We should update the provider script to collect this info.

There is precedent for making an additional API request per record, notably in Flickr but also in some others like Metropolitan and NYPL. That being said, we should make sure that adding it here doesn't have a dramatic effect on performance.
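As a rough illustration of the per-record lookup, here is a minimal sketch of pulling dimensions out of a single-record response. It is not the provider script's actual code: the function name `extract_dimensions` is hypothetical, and the response shape it assumes (`object` → `aggregations` → `webResources`, with `about`, `ebucoreWidth`, and `ebucoreHeight` keys) should be checked against the Europeana API documentation before relying on it.

```python
from typing import Optional


def extract_dimensions(
    record: dict, image_url: str
) -> tuple[Optional[int], Optional[int]]:
    """Return (width, height) for image_url, or (None, None) if unavailable.

    `record` is the parsed JSON of a single-record response. The key names
    below are assumptions about that response, not confirmed by this issue.
    """
    aggregations = record.get("object", {}).get("aggregations", [])
    for aggregation in aggregations:
        for resource in aggregation.get("webResources", []):
            # Only use the web resource that describes the image we ingest.
            if resource.get("about") != image_url:
                continue
            width = resource.get("ebucoreWidth")
            height = resource.get("ebucoreHeight")
            if width is not None and height is not None:
                return int(width), int(height)
    return None, None


# Hand-written sample data for illustration only, not a real API response.
sample_record = {
    "object": {
        "aggregations": [
            {
                "webResources": [
                    {"about": "https://example.org/thumb.jpg"},
                    {
                        "about": "https://example.org/image.jpg",
                        "ebucoreWidth": 800,
                        "ebucoreHeight": 600,
                    },
                ]
            }
        ]
    }
}

width, height = extract_dimensions(sample_record, "https://example.org/image.jpg")
```

Keeping the parsing separate from the HTTP request would make it easy to unit test, and lets a failed or empty lookup fall back to `None` values rather than dropping the record.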

Additional context/dependent PRs

Note that Europeana is currently turned off in production, because turning it on will initiate a backfill. We should wait for WordPress/openverse-catalog#644 to make sure that we can omit the DAG from Slack notifications, to prevent flooding the channel. <- This has been merged.

We also have an issue to update Europeana to use the new API endpoint, and update any necessary fields: #1727. It may be helpful or even necessary to do this work first. <- WordPress/openverse-catalog#974 has been merged.

Implementation

  • 🙋 I would be interested in implementing this feature.
@stacimc stacimc added ✨ goal: improvement Improvement to an existing user-facing feature 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository labels Aug 2, 2022
@obulat
Contributor

obulat commented Aug 23, 2022

Adding here that the Europeana script also needs to collect file type and file size information.
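For what it's worth, a minimal sketch of how file type and size might be derived from one web resource entry. The helper name `extract_file_info` is hypothetical, and the `ebucoreHasMimeType` and `ebucoreFileByteSize` keys are assumptions about the record response that would need verifying against the Europeana API documentation.

```python
from typing import Optional


def extract_file_info(resource: dict) -> tuple[Optional[str], Optional[int]]:
    """Return (filetype, filesize_in_bytes) for one web resource entry.

    The key names are assumed, not confirmed by this issue. Missing or
    malformed values yield None instead of raising.
    """
    filetype = None
    mime_type = resource.get("ebucoreHasMimeType")
    if isinstance(mime_type, str) and "/" in mime_type:
        # "image/jpeg" -> "jpeg"
        filetype = mime_type.split("/")[-1].lower() or None

    filesize = None
    raw_size = resource.get("ebucoreFileByteSize")
    try:
        filesize = int(raw_size) if raw_size is not None else None
    except (TypeError, ValueError):
        filesize = None

    return filetype, filesize


# Hand-written sample data for illustration only.
sample_resource = {
    "about": "https://example.org/image.jpg",
    "ebucoreHasMimeType": "image/jpeg",
    "ebucoreFileByteSize": "204800",
}
filetype, filesize = extract_file_info(sample_resource)
```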

@krysal
Member

krysal commented Oct 7, 2022

Given that the Europeana DAG hasn't been activated yet, this is probably more dependent on updating the API endpoint, so I'll move it to the Provider DAG stability milestone.

@AetherUnbound AetherUnbound added 🟩 priority: low Low priority and doesn't need to be rushed and removed 🛠 goal: fix Bug fix labels Oct 19, 2022
@stacimc
Collaborator Author

stacimc commented Oct 19, 2022

Since this ticket is specifically just for populating image dimensions, I think this should be in the Data normalization milestone. @krysal I see you moved it out of that milestone -- would you object to moving it back? We have other changes to Europeana (#109) in Stabilization.

@krysal
Member

krysal commented Oct 20, 2022

It's okay! I moved it out of the Data Normalization milestone because it is not an active DAG, and I thought it was going to be fixed after the milestone. So if it gets fixed before then, it makes sense to (re)include it 👍

@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 24, 2023
@rwidom rwidom self-assigned this Mar 1, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Openverse Backlog Sep 9, 2023
5 participants