When I run the airflow db clean task_instance command, it can take up to 9 hours to complete. The database has around 3,215,220 rows in the task_instance table and 51,602 rows in the dag_run table. The overall size of the database is around 1 TB.
I believe the issue is caused by the cascade constraints on other tables, combined with the lack of indexes on the foreign keys that reference task_instance.
Running a delete on a small number of rows shows that most of the time is spent in the cascade triggers for the xcom and task_fail tables:
explain (analyze,buffers,timing) delete from task_instance t1 where t1.run_id = 'manual__2022-05-11T01:09:05.856703+00:00'; rollback;
Trigger for constraint task_reschedule_ti_fkey: time=3.208 calls=23
Trigger for constraint task_map_task_instance_fkey: time=1.848 calls=23
Trigger for constraint xcom_task_instance_fkey: time=4457.779 calls=23
Trigger for constraint rtif_ti_fkey: time=3.135 calls=23
Trigger for constraint task_fail_ti_fkey: time=1164.183 calls=23
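To make the breakdown easier to read, here is a small sketch that parses the trigger lines above and ranks the constraints by time. The regular expression and the helper name trigger_times are my own for illustration; they are not part of Airflow or PostgreSQL.

```python
import re

# Trigger lines copied from the EXPLAIN ANALYZE output above.
EXPLAIN_OUTPUT = """\
Trigger for constraint task_reschedule_ti_fkey: time=3.208 calls=23
Trigger for constraint task_map_task_instance_fkey: time=1.848 calls=23
Trigger for constraint xcom_task_instance_fkey: time=4457.779 calls=23
Trigger for constraint rtif_ti_fkey: time=3.135 calls=23
Trigger for constraint task_fail_ti_fkey: time=1164.183 calls=23
"""

# Hypothetical pattern matching the trigger summary lines shown above.
PATTERN = re.compile(r"Trigger for constraint (\w+): time=([\d.]+) calls=(\d+)")


def trigger_times(text):
    """Return {constraint_name: time_in_ms} for every trigger line in text."""
    return {name: float(ms) for name, ms, _calls in PATTERN.findall(text)}


times = trigger_times(EXPLAIN_OUTPUT)
total = sum(times.values())

# Print constraints from slowest to fastest with their share of trigger time.
for name, ms in sorted(times.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {ms:10.3f} ms  {ms / total:6.1%}")
```

On these numbers, the xcom and task_fail constraints together account for more than 99% of the time spent in cascade triggers for a delete of 23 rows.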
I temporarily fixed it by adding these indexes.
create index idx_task_reschedule_dr_fkey on task_reschedule (dag_id, run_id);
create index idx_xcom_ti_fkey on xcom (dag_id, task_id, run_id, map_index);
create index idx_task_fail_ti_fkey on task_fail (dag_id, task_id, run_id, map_index);
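As a rough illustration of why these indexes help, the sketch below uses SQLite in memory (not PostgreSQL, and with a heavily simplified xcom table that I made up for the demo) to show the query plan for the kind of per-row lookup a cascade delete has to do on the child table. Without an index on the referencing columns the lookup is a full scan; with the index it becomes an index search. The exact plan text differs between databases and versions, so treat this as a sketch of the mechanism only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the xcom table: only the referencing columns matter here.
conn.execute(
    "create table xcom ("
    " dag_id text, task_id text, run_id text, map_index integer, value text)"
)

# The lookup a cascade delete performs for each deleted task_instance row.
LOOKUP = (
    "select 1 from xcom"
    " where dag_id = ? and task_id = ? and run_id = ? and map_index = ?"
)


def plan(connection):
    """Return SQLite's query plan for the lookup as a single string."""
    rows = connection.execute(
        "explain query plan " + LOOKUP, ("d", "t", "r", -1)
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[3] for row in rows)


before = plan(conn)
conn.execute(
    "create index idx_xcom_ti_fkey on xcom (dag_id, task_id, run_id, map_index)"
)
after = plan(conn)

print("before:", before)  # full table scan
print("after: ", after)   # search using idx_xcom_ti_fkey
```

In my case each deleted task_instance row triggered such a lookup on every referencing table, which is why the missing indexes on the large xcom and task_fail tables dominated the runtime.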
What you think should happen instead
A cleanup should not take 9 hours to complete. Before upgrading to 2.3.x, it took no more than 5 minutes.
Apache Airflow version
2.3.1
How to reproduce
No response
Operating System
N/A
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else
No response