Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airflow db clean task_instance takes a long time #24484

Closed
2 tasks done
wolfier opened this issue Jun 15, 2022 · 1 comment · Fixed by #24488
Closed
2 tasks done

airflow db clean task_instance takes a long time #24484

wolfier opened this issue Jun 15, 2022 · 1 comment · Fixed by #24488
Labels
area:core kind:bug This is a clearly a bug

Comments

@wolfier
Copy link
Contributor

wolfier commented Jun 15, 2022

Apache Airflow version

2.3.1

What happened

When I ran the airflow db clean task_instance command, it can take up to 9 hours to complete. The database around 3215220 rows in the task_instance table and 51602 rows in the dag_run table. The overall size of the database is around 1 TB.

I believe the issue is because of the cascade constraints on others tables as well as the lack of indexes on task_instance foreign keys.

Running delete on a small number of rows gives this shows most of the time is spent in xcom and task_fail tables

explain (analyze,buffers,timing) delete from task_instance t1 where t1.run_id = 'manual__2022-05-11T01:09:05.856703+00:00'; rollback;
Trigger for constraint task_reschedule_ti_fkey: time=3.208 calls=23
Trigger for constraint task_map_task_instance_fkey: time=1.848 calls=23
Trigger for constraint xcom_task_instance_fkey: time=4457.779 calls=23
Trigger for constraint rtif_ti_fkey: time=3.135 calls=23
Trigger for constraint task_fail_ti_fkey: time=1164.183 calls=23

I temporarily fixed it by adding these indexes.

create index idx_task_reschedule_dr_fkey on task_reschedule (dag_id, run_id);
create index idx_xcom_ti_fkey on xcom (dag_id, task_id, run_id, map_index);
create index idx_task_fail_ti_fkey on task_fail (dag_id, task_id, run_id, map_index);

What you think should happen instead

It should not take 9 hours to complete a clean up process. Before upgrading to 2.3.x, it was taking no more than 5 minutes.

How to reproduce

No response

Operating System

N/A

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@wolfier wolfier added area:core kind:bug This is a clearly a bug labels Jun 15, 2022
@dstandish
Copy link
Contributor

@wolfier do you have a pgdump of this database so we can do testing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants