Allow failed ingestion tasks to retry where they left off #1653
Labels
- 💻 aspect: code — concerns the software code in the repository
- ✨ goal: improvement — improvement to an existing user-facing feature
- 🟨 priority: medium — not blocking but should be addressed soon
Problem
See WordPress/openverse-catalog#357
If an ingestion task fails, a retry can be attempted. However, the retry has no context from the previous run, so it begins reprocessing the data from the API at the very beginning rather than picking up where the previous attempt left off.
Description
We should investigate the feasibility of allowing a task retry to pick up from the last set of query parameters.
We'll need to be careful about how we implement this. If we attempt to retry within the same DAG, we'll create a situation where different task attempts actually have different outcomes. This goes somewhat against the Airflow paradigm: Airflow displays the most recent log first, and that log may not contain all of the run's information (if earlier attempts captured some data before failing). Additionally, the downstream loader steps will also require multiple attempts, each of which may result in a separate successful data load.
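One way to make "pick up from the last set of query parameters" concrete is to checkpoint the current parameters before each API request, so that a later attempt can load them and resume. The following is a minimal sketch, not the Openverse catalog's actual API: `CHECKPOINT_PATH`, `fetch_batch`, and `next_params` are all hypothetical names used for illustration.

```python
import json
from pathlib import Path

# Hypothetical checkpoint location; a real implementation might use an
# Airflow Variable, XCom, or a per-DAG path instead of a fixed file.
CHECKPOINT_PATH = Path("/tmp/ingestion_checkpoint.json")


def load_checkpoint(default_params):
    """Return query params saved by a previous failed run, if any."""
    if CHECKPOINT_PATH.exists():
        return json.loads(CHECKPOINT_PATH.read_text())
    return default_params


def save_checkpoint(params):
    """Persist the current query params so a retry can resume here."""
    CHECKPOINT_PATH.write_text(json.dumps(params))


def ingest(default_params, fetch_batch, next_params):
    """Iterate API pages, checkpointing before each request.

    `fetch_batch(params)` performs one API request (and may raise);
    `next_params(params, batch)` returns the next params, or None when
    ingestion is complete.
    """
    params = load_checkpoint(default_params)
    while params is not None:
        save_checkpoint(params)           # survives a failure below
        batch = fetch_batch(params)       # may raise; checkpoint remains
        params = next_params(params, batch)
    CHECKPOINT_PATH.unlink(missing_ok=True)  # clean up on success
```

Because the checkpoint is written before each request, a failure anywhere in the loop leaves behind exactly the parameters that should be retried, and a clean completion removes the checkpoint so the next scheduled run starts fresh.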
I think one alternative may be to trigger a new DAG run manually after a failure, which would pick up the query parameters and continue from where the previous run left off. It would also encapsulate the work in two (or more) distinct DAG runs, so we don't have to expect an Airflow user to look through multiple task retry logs to get the full picture.
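A manually triggered DAG run could receive the resume point through the run's `conf`. Here is a hedged sketch of how a task might choose its starting parameters; `initial_query_params` is a hypothetical conf key, not an existing Openverse or Airflow convention.

```python
def get_starting_params(dag_run_conf, default_params):
    """Choose starting query params for this DAG run.

    A manually triggered run can pass the last params logged by the
    failed run, e.g.:

        airflow dags trigger my_provider_workflow \
            --conf '{"initial_query_params": {"page": 57}}'

    Scheduled runs pass no conf and start from the defaults.
    """
    if dag_run_conf and "initial_query_params" in dag_run_conf:
        return dag_run_conf["initial_query_params"]
    return default_params
```

This keeps each DAG run self-contained: a scheduled run always starts from the beginning, while an operator can explicitly resume a failed run with the parameters from its logs.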