Add configuration to skip specific ingestion errors #1447
Labels
💻 aspect: code
Concerns the software code in the repository
✨ goal: improvement
Improvement to an existing user-facing feature
🟩 priority: low
Low priority and doesn't need to be rushed
🧱 stack: catalog
Related to the catalog and Airflow DAGs
🔧 tech: airflow
Involves Apache Airflow
🐍 tech: python
Involves Python
Problem
WordPress/openverse-catalog#650 added dagrun conf options to make a provider script skip ingestion errors. This is currently all-or-nothing; when `skip_ingestion_errors` is true, all errors (except for AirflowExceptions) are caught, allowing ingestion to continue, and then re-raised at the end of ingestion.
In reality, it's likely that we'll want to skip specific errors, but we may still want ingestion to be halted if an unexpected error is thrown.
Description
We can update the conf option to be similar to the `silenced_slack_alerts` configuration (#654), which allows matching an error predicate. Instead of a boolean value, `skip_ingestion_errors` could be a list of strings to match on.
Like `silenced_slack_alerts`, we should make sure this allows us to match specific error message text and error types (i.e. adding 'KeyError' to the list should allow us to skip all KeyErrors). It may be nice to make sure there's still a way to easily skip all errors if that's truly what we want.
Implementation
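One possible shape for the matching logic, as a rough sketch rather than a proposal for the final code. The function name `should_skip_error` and the choice to keep accepting a plain boolean (so `true` still skips everything) are assumptions made for illustration; the existing special handling of AirflowExceptions is not modelled here.

```python
from typing import List, Union

# Hypothetical type for the conf value: either the current boolean,
# or the proposed list of strings to match on.
SkipConf = Union[bool, List[str]]


def should_skip_error(error: Exception, skip_conf: SkipConf) -> bool:
    """Return True if `error` matches the skip configuration.

    A boolean keeps the all-or-nothing behaviour. A list is treated as
    patterns, each of which matches when it equals the exception's class
    name or appears anywhere in the error message.
    """
    if isinstance(skip_conf, bool):
        return skip_conf

    error_type = type(error).__name__
    message = str(error)
    return any(
        # Ignore empty strings, which would otherwise match every message.
        pattern and (pattern == error_type or pattern in message)
        for pattern in skip_conf
    )
```

With this sketch, a conf of `["KeyError", "date format"]` would skip every KeyError plus any error whose message contains "date format", while an unexpected error of another kind would still halt ingestion.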