Move boilerplate required field None checks into ProviderDataIngester #1378
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟩 priority: low
Low priority and doesn't need to be rushed
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Current Situation
All ProviderDataIngester child classes must currently implement logic in get_record_data to return early when one of the required fields (foreign_landing_url, image/audio_url, foreign_identifier, license_info) is None. This logic is repeated in each of our provider scripts, causing code duplication. There's also a risk that a new contributor might forget to implement this logic in a new provider script.
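As an illustration of the kind of repetition described above, here is a minimal sketch. The class name and the response keys (url, image_url, id, license) are made up for this example and are not taken from any real provider script; a real script would subclass ProviderDataIngester.

```python
from typing import Optional


class ExampleProviderIngester:
    """Hypothetical provider script; names and response keys are assumptions."""

    def get_record_data(self, data: dict) -> Optional[dict]:
        # Every required field is followed by the same "return early" check.
        foreign_landing_url = data.get("url")
        if foreign_landing_url is None:
            return None

        image_url = data.get("image_url")
        if image_url is None:
            return None

        foreign_identifier = data.get("id")
        if foreign_identifier is None:
            return None

        license_info = data.get("license")
        if license_info is None:
            return None

        return {
            "foreign_landing_url": foreign_landing_url,
            "image_url": image_url,
            "foreign_identifier": foreign_identifier,
            "license_info": license_info,
        }
```

Each new provider script ends up repeating the same four checks with only the lookup expressions changed.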
Suggested Improvement
We should try to move this logic into the base class. One idea is, instead of having a single abstract get_record_data method, to have an abstract method for each required field as well as a get_optional_fields method. Then get_record_data could contain the shared logic. This is just a starting point; for example, it does not support returning lists yet.
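One way the idea above might look is sketched below. The class here is a hypothetical stand-in, not the real ProviderDataIngester. The getter names, the media_url key (standing in for image/audio_url), and the demo subclass are all assumptions made for illustration. Like the proposal itself, it does not handle get_record_data returning a list.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class ProviderDataIngesterSketch(ABC):
    """Hypothetical base class; method names are assumptions."""

    @abstractmethod
    def get_foreign_landing_url(self, data: dict) -> Optional[str]: ...

    @abstractmethod
    def get_media_url(self, data: dict) -> Optional[str]: ...

    @abstractmethod
    def get_foreign_identifier(self, data: dict) -> Optional[str]: ...

    @abstractmethod
    def get_license_info(self, data: dict) -> Optional[Any]: ...

    @abstractmethod
    def get_optional_fields(self, data: dict) -> dict: ...

    def get_record_data(self, data: dict) -> Optional[dict]:
        # Shared logic: call each required getter and stop at the first None.
        required_getters = {
            "foreign_landing_url": self.get_foreign_landing_url,
            "media_url": self.get_media_url,
            "foreign_identifier": self.get_foreign_identifier,
            "license_info": self.get_license_info,
        }
        record = {}
        for field_name, getter in required_getters.items():
            value = getter(data)
            if value is None:
                return None
            record[field_name] = value
        # Optional fields are merged in only once all required ones exist.
        record.update(self.get_optional_fields(data))
        return record


class DemoIngester(ProviderDataIngesterSketch):
    """Made-up provider showing that no None checks are needed here."""

    def get_foreign_landing_url(self, data):
        return data.get("url")

    def get_media_url(self, data):
        return data.get("image_url")

    def get_foreign_identifier(self, data):
        return data.get("id")

    def get_license_info(self, data):
        return data.get("license")

    def get_optional_fields(self, data):
        return {"title": data.get("title")}
```

With this shape, a provider that forgets to implement one of the required getters cannot be instantiated at all, since the abstract methods make the omission an error at construction time.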
See the implementation of checks in Europeana for another idea, which uses decorators: WordPress/openverse-catalog#821 (comment). This approach is a bit cleaner, especially if the list of required fields expands, and it also adds extra logging.
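For readers unfamiliar with the pattern, a decorator-based check could look roughly like the sketch below. This is not the code from the linked comment or from the Europeana script. The decorator, the exception, the class, and the response keys are all hypothetical names chosen for this example.

```python
import functools
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class MissingRequiredField(Exception):
    """Hypothetical exception raised when a required getter returns None."""


def required_field(getter):
    """Hypothetical decorator: raise if the wrapped getter returns None."""

    @functools.wraps(getter)
    def wrapper(*args, **kwargs):
        value = getter(*args, **kwargs)
        if value is None:
            raise MissingRequiredField(getter.__name__)
        return value

    return wrapper


class DecoratedIngesterSketch:
    """Made-up ingester; only two required fields shown for brevity."""

    @required_field
    def get_foreign_identifier(self, data: dict) -> Optional[str]:
        return data.get("id")

    @required_field
    def get_foreign_landing_url(self, data: dict) -> Optional[str]:
        return data.get("url")

    def get_record_data(self, data: dict) -> Optional[dict]:
        try:
            return {
                "foreign_identifier": self.get_foreign_identifier(data),
                "foreign_landing_url": self.get_foreign_landing_url(data),
            }
        except MissingRequiredField as err:
            # The exception carries the getter name, so the log says
            # which required field was missing.
            logger.warning("Skipping record: %s returned None", err)
            return None
```

Adding a new required field then means decorating one more getter, and the log message identifies which field caused a record to be skipped.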
Some things we need to keep in mind:
get_record_data. See SMK for an example of a complex use case.
Benefit
Additional context
This was discussed on WordPress/openverse-catalog#821 in this comment thread.
Implementation