Change approach to finding bad rows to LEFT OUTER JOIN. (#23528)
This replaces the sub-selects (two for the count, or one for the CREATE TABLE) with LEFT OUTER JOINs.

For a _large_ database (27m TaskInstances, 2m DagRuns) this takes the time from 10 minutes per table (we have 3) down to around 3 minutes per table. (All times on Postgres.)

Before:

```sql
CREATE TABLE _airflow_moved__2_3__dangling__rendered_task_instance_fields AS
SELECT rendered_task_instance_fields.dag_id AS dag_id,
       rendered_task_instance_fields.task_id AS task_id,
       rendered_task_instance_fields.execution_date AS execution_date,
       rendered_task_instance_fields.rendered_fields AS rendered_fields,
       rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml
FROM rendered_task_instance_fields
WHERE NOT (
    EXISTS (
        SELECT 1
        FROM task_instance
        JOIN dag_run ON dag_run.dag_id = task_instance.dag_id
            AND dag_run.run_id = task_instance.run_id
        WHERE rendered_task_instance_fields.dag_id = task_instance.dag_id
            AND rendered_task_instance_fields.task_id = task_instance.task_id
            AND rendered_task_instance_fields.execution_date = dag_run.execution_date
    )
)
```

After:

```sql
CREATE TABLE _airflow_moved__2_3__dangling__rendered_task_instance_fields AS
SELECT rendered_task_instance_fields.dag_id AS dag_id,
       rendered_task_instance_fields.task_id AS task_id,
       rendered_task_instance_fields.execution_date AS execution_date,
       rendered_task_instance_fields.rendered_fields AS rendered_fields,
       rendered_task_instance_fields.k8s_pod_yaml AS k8s_pod_yaml
FROM rendered_task_instance_fields
LEFT OUTER JOIN dag_run ON rendered_task_instance_fields.dag_id = dag_run.dag_id
    AND rendered_task_instance_fields.execution_date = dag_run.execution_date
LEFT OUTER JOIN task_instance ON dag_run.dag_id = task_instance.dag_id
    AND dag_run.run_id = task_instance.run_id
    AND rendered_task_instance_fields.task_id = task_instance.task_id
WHERE task_instance.dag_id IS NULL
    OR dag_run.dag_id IS NULL
;
```
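To make the rewrite easier to reason about, here is a minimal sketch that runs both query shapes against a tiny in-memory SQLite database and checks that they flag the same dangling rows. The trimmed column set, the sample rows, and the `dangling` helper are assumptions made for illustration, not part of the commit. SQLite stands in for Postgres here, so the sketch says nothing about the timings quoted above.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns the join conditions use.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, execution_date TEXT);
    CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, run_id TEXT);
    CREATE TABLE rendered_task_instance_fields (
        dag_id TEXT, task_id TEXT, execution_date TEXT
    );

    INSERT INTO dag_run VALUES ('d1', 'r1', '2022-01-01');
    INSERT INTO dag_run VALUES ('d1', 'r2', '2022-01-02');
    INSERT INTO task_instance VALUES ('d1', 't1', 'r1');

    -- Matching dag_run and task_instance: a good row.
    INSERT INTO rendered_task_instance_fields VALUES ('d1', 't1', '2022-01-01');
    -- dag_run exists, but no task_instance in that run: dangling.
    INSERT INTO rendered_task_instance_fields VALUES ('d1', 't1', '2022-01-02');
    -- No dag_run at all: dangling.
    INSERT INTO rendered_task_instance_fields VALUES ('d1', 't1', '2022-01-03');
    """
)

# "Before" shape: correlated NOT EXISTS sub-select.
NOT_EXISTS_QUERY = """
SELECT r.dag_id, r.task_id, r.execution_date
FROM rendered_task_instance_fields AS r
WHERE NOT EXISTS (
    SELECT 1
    FROM task_instance AS ti
    JOIN dag_run AS dr ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id
    WHERE r.dag_id = ti.dag_id
        AND r.task_id = ti.task_id
        AND r.execution_date = dr.execution_date
)
"""

# "After" shape: LEFT OUTER JOINs, keeping rows where either side is missing.
LEFT_JOIN_QUERY = """
SELECT r.dag_id, r.task_id, r.execution_date
FROM rendered_task_instance_fields AS r
LEFT OUTER JOIN dag_run AS dr
    ON r.dag_id = dr.dag_id AND r.execution_date = dr.execution_date
LEFT OUTER JOIN task_instance AS ti
    ON dr.dag_id = ti.dag_id AND dr.run_id = ti.run_id AND r.task_id = ti.task_id
WHERE ti.dag_id IS NULL OR dr.dag_id IS NULL
"""


def dangling(query):
    """Return the dangling rows found by `query`, sorted for comparison."""
    return sorted(conn.execute(query).fetchall())


before = dangling(NOT_EXISTS_QUERY)
after = dangling(LEFT_JOIN_QUERY)
print(before == after, after)
```

On this sample data the two forms agree. In general the equivalence leans on `dag_run` having at most one row per `(dag_id, execution_date)`; without that, the join form could return a dangling row more than once.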
Showing 1 changed file with 40 additions and 33 deletions.