The add_license_url DAG keeps timing out
#4348
Labels
💻 aspect: code
Concerns the software code in the repository
🛠 goal: fix
Bug fix
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
🔧 tech: airflow
Involves Apache Airflow
Description
This DAG keeps timing out for unknown reasons when the number of items to modify is relatively high (>500k). By contrast, the batched_update DAG was verified to handle updates of this kind for loads of millions of rows. It was tested by backfilling the license (by-nc-sa, 2.0) and it updated 11,090,909 records successfully.
However, repeated executions have shown licenses reappearing in the group of rows missing the field, so there could be ingestion flows that are not filling in this data, or some other problem (#4318). I'd like to update the add_license_url DAG to use batched_update and automate this process until we make sure all rows are complete.
Additional context
Related to #3885 and #4318.
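For illustration only, here is a minimal sketch of the batching idea, using SQLite so it runs standalone. It does not reproduce the actual batched_update DAG or the catalog schema: the image table, its license, license_version, and license_url columns, and the backfill_license_url helper are all assumptions made for this example. Each batch runs in its own short transaction, so no single statement has to touch every matching row at once.

```python
import sqlite3


def backfill_license_url(conn, license_, version, url, batch_size=1000):
    """Fill license_url on matching rows that lack it, one batch per transaction.

    Hypothetical helper: table and column names are assumed for this sketch.
    Returns the total number of rows updated.
    """
    if url is None:
        # A NULL url would leave rows unchanged and the loop would never end.
        raise ValueError("url must not be None")
    total = 0
    while True:
        with conn:  # commits the batch on success, rolls back on error
            cur = conn.execute(
                "UPDATE image SET license_url = ? WHERE id IN ("
                "  SELECT id FROM image"
                "  WHERE license = ? AND license_version = ?"
                "    AND license_url IS NULL"
                "  ORDER BY id LIMIT ?)",
                (url, license_, version, batch_size),
            )
        if cur.rowcount == 0:
            # No rows are missing the field any more.
            return total
        total += cur.rowcount
```

Because the helper only selects rows where the field is still missing, it can be re-run safely on a schedule: a later run picks up only rows that were ingested without the field in the meantime.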