Add filetype to all images in the catalog DB #1560
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Current Situation
There are currently 563 004 660 images without a file type in the database.
We need to add the filetype information to all images.
Suggested Improvement
There are several things that need to be done here:
Benefit
Data consistency.
Additional context
We are currently also extracting extensions during the Elasticsearch indexing:
https://github.com/WordPress/openverse-api/blob/2e85caf7aede8aaf9d77cd5cb050f50b860ee58e/ingestion_server/ingestion_server/elasticsearch_models.py#L135-L144
This should be done entirely in the catalog.
Here, we should also make sure that we don't create duplicate filetype values for types such as jpg/jpeg.
Implementation
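As a rough illustration of the extraction and de-duplication described above, here is a minimal sketch that derives a filetype from an image URL and collapses aliases such as jpeg into jpg. The names `extract_filetype`, `FILETYPE_ALIASES`, and `KNOWN_FILETYPES` are hypothetical and not taken from the catalog or the ingestion server; the alias and allow-list contents are assumptions chosen only for the example.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical alias table: maps variant spellings to one canonical filetype,
# so that "jpeg" and "jpg" do not end up as two distinct values.
FILETYPE_ALIASES = {"jpeg": "jpg", "tiff": "tif"}

# Hypothetical allow-list of filetypes considered valid for images.
KNOWN_FILETYPES = {"jpg", "png", "gif", "svg", "tif", "webp"}


def extract_filetype(url: str) -> Optional[str]:
    """Return a normalized filetype for an image URL, or None if unknown."""
    # Use only the path component so query strings and fragments are ignored.
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix  # e.g. ".JPEG"
    if not suffix:
        return None
    ext = suffix[1:].lower()
    # Collapse aliases to the canonical spelling.
    ext = FILETYPE_ALIASES.get(ext, ext)
    return ext if ext in KNOWN_FILETYPES else None
```

For example, `extract_filetype("https://example.com/a/photo.JPEG?size=large")` returns `"jpg"`, while a URL with no recognizable image extension returns `None`, leaving the field empty rather than storing a guess.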