Add filesize to all images in the catalog #1558

Closed
obulat opened this issue May 20, 2022 · 1 comment
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon

Comments

@obulat
Contributor

obulat commented May 20, 2022

Problem

Currently, 561,894,897 images do not have the filesize value set.

Description

We should audit the provider scripts to find the ones that do not collect the filesize, and update them.
After that, we should run a backfill to get the filesize information from providers, probably updating the width, height, and filetype at the same time.

@obulat obulat added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository data normalization labels May 20, 2022
@obulat obulat mentioned this issue May 20, 2022
29 tasks
@obulat
Contributor Author

obulat commented Jun 16, 2022

Adding a note here that filesize is not available in some provider API responses. It doesn't seem necessary to provide a filesize value for all items, because frontend performance does not rely on it (unlike the image dimensions, for example). Getting the filesize through additional HEAD requests during catalog ingestion would slow down the process. API consumers can also get the filesize with a HEAD request if they need it.
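
For reference, here is a minimal sketch of how a consumer might look up the filesize with a HEAD request. It is not taken from the Openverse codebase: the function names `filesize_from_headers` and `fetch_filesize` are hypothetical, and it assumes the server hosting the image reports a `Content-Length` header for HEAD requests, which not every server does.

```python
from typing import Optional
from urllib.request import Request, urlopen


def filesize_from_headers(headers) -> Optional[int]:
    """Return Content-Length as an int, or None if it is missing or invalid.

    `headers` can be any mapping-like object with a `.get()` method,
    such as the `headers` attribute of a urllib response.
    """
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    # A negative length is meaningless, so treat it as unknown.
    return size if size >= 0 else None


def fetch_filesize(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request (no body is downloaded) and read the reported size.

    Hypothetical helper: assumes the server answers HEAD requests and
    includes Content-Length in the response.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return filesize_from_headers(response.headers)
```

A `None` result only means the server did not report a usable size, so a consumer would need a fallback for those items.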

It is not possible to backfill the filesize from data we already have. Unlike the filetype, which we can extract from the media URL that already exists in the database, filesize information needs to be ingested from the providers. So, I am going to close this issue, as solving #1545 should be enough.
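
To illustrate the difference, here is a rough sketch of extracting the filetype from a media URL that is already stored. It is an assumption about how such an extraction could look, not the actual catalog code: `filetype_from_url` is a hypothetical name, and folding `jpeg` into `jpg` is an illustrative normalization choice.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def filetype_from_url(url: str) -> Optional[str]:
    """Guess the filetype from the extension in the URL path.

    Returns None when the path has no extension. The query string and
    fragment are ignored because only the parsed path is inspected.
    """
    path = urlparse(url).path
    filetype = PurePosixPath(path).suffix.lstrip(".").lower()
    if not filetype:
        return None
    # Assumed normalization: treat "jpeg" and "jpg" as the same type.
    return "jpg" if filetype == "jpeg" else filetype
```

Nothing comparable works for the filesize, since the URL carries no information about how large the file is.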

@obulat obulat closed this as not planned Jun 16, 2022
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023
@obulat obulat moved this from 📋 Backlog to ✅ Done in Openverse Backlog Apr 24, 2023
@dhruvkb dhruvkb added this to the Data normalization milestone Dec 2, 2023