This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Extract MediaStorage entity as parent to ImageStore #83
Fixes WordPress/openverse#1739
This PR extracts the data and methods from the `ImageStore` class that are common to all media types, and creates an abstract base class called `MediaStore`. `MediaStore` has the methods for validating the tags, metadata, license information, and the resulting TSV rows, and for writing the buffer to disk. It also has an abstract method, `add_item`, which will be implemented by all child classes (`ImageStore` currently, but `AudioStore` will be added shortly). This method will handle the validation of single item metadata.

This is the third iteration of adding audio :) This time, in parallel with the API, this PR only handles the `MediaStore` extraction, so it can be merged before we have a final decision on what Audio metadata we want to save.

All the data fields we currently collect for images from providers can be found in the `IMAGE_TSV_COLUMNS` list. Here, they are listed in the order they are written to TSV, and the fields that are common for all media are in bold:
There are several ways this PR can be tested:

`docker exec cc_catalog_airflow_webserver_1 /usr/local/airflow/.local/bin/pytest`

or

`.tsv` file inside your `/tmp` folder. Hopefully, all fields collected should be written in the `.tsv` file.

or

`src/cc_catalog_airflow/dags/common` folder, and then select `Run 'pytest in common'` to run the tests only for the common module that was changed in the PR.
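
For reviewers who want a quick picture of the split between `MediaStore` and `ImageStore` described in this PR, here is a minimal sketch. It is not the code in this PR: the constructor arguments, the `_append`, `_to_tsv_row`, and `commit` helpers, the five column names, and the `\N` null marker are all assumptions made for illustration. Only the class names and the abstract `add_item` method come from the description.

```python
import os
import tempfile
from abc import ABCMeta, abstractmethod


class MediaStore(metaclass=ABCMeta):
    """Illustrative base class: shared buffering and TSV writing."""

    def __init__(self, provider, columns, output_dir=None, buffer_length=100):
        self.provider = provider
        self.columns = columns  # hypothetical: ordered column names
        self.output_path = os.path.join(
            output_dir or tempfile.gettempdir(), f"{provider}_sketch.tsv"
        )
        self.buffer_length = buffer_length
        self._buffer = []
        self.total = 0

    @abstractmethod
    def add_item(self, **kwargs):
        """Validate a single item's metadata; each child class implements this."""

    def _to_tsv_row(self, item):
        # Assumption: missing values are written as \N, and tabs or
        # newlines inside a value are replaced by spaces.
        values = []
        for column in self.columns:
            value = item.get(column)
            if value is None:
                values.append("\\N")
            else:
                values.append(str(value).replace("\t", " ").replace("\n", " "))
        return "\t".join(values) + "\n"

    def _append(self, item):
        # Buffer the row and flush to disk once the buffer is full.
        self._buffer.append(self._to_tsv_row(item))
        self.total += 1
        if len(self._buffer) >= self.buffer_length:
            self.commit()
        return self.total

    def commit(self):
        # Write any buffered rows to the TSV file and clear the buffer.
        if self._buffer:
            with open(self.output_path, "a", encoding="utf-8") as tsv_file:
                tsv_file.writelines(self._buffer)
            self._buffer = []
        return self.total


class ImageStore(MediaStore):
    """Illustrative child class with image-specific validation."""

    # Hypothetical subset of columns, not the real IMAGE_TSV_COLUMNS.
    COLUMNS = ["foreign_landing_url", "image_url", "license_", "width", "height"]

    def __init__(self, provider, output_dir=None, buffer_length=100):
        super().__init__(provider, self.COLUMNS, output_dir, buffer_length)

    def add_item(self, foreign_landing_url=None, image_url=None,
                 license_=None, width=None, height=None):
        # Assumption: items missing a required field are silently dropped.
        if not (foreign_landing_url and image_url and license_):
            return self.total
        return self._append({
            "foreign_landing_url": foreign_landing_url,
            "image_url": image_url,
            "license_": license_,
            "width": width,
            "height": height,
        })
```

In this sketch an `AudioStore` would only need its own column list and its own `add_item`, while buffering and writing stay in the base class. Because `add_item` is abstract, `MediaStore` itself cannot be instantiated.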