Airflow hangs on "Found x duplicates in table task_fail. Will attempt to move them" #23472
Comments
Hi @TPapajCin, we discovered recently that one of the pre-upgrade checks didn't perform well, and there is a patch for it. Could you try with that patch applied in your environment and report back? I'd also be curious to know how many rows you have in dag_run and task_instance. Thanks!
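(For anyone else wanting to check those counts, here is a minimal sketch. It assumes a PostgreSQL metadata database and uses a placeholder AIRFLOW_DB_URI connection string; neither is from this thread, so adjust for your own environment.)

```bash
# Row counts for the tables involved in the slow pre-migration check.
# AIRFLOW_DB_URI is a placeholder, e.g. postgresql://user:pass@host:5432/airflow.
for t in dag_run task_instance task_fail; do
  printf '%s: ' "$t"
  psql "$AIRFLOW_DB_URI" -tAc "SELECT COUNT(*) FROM ${t};"
done
```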
I can confirm that the fix worked! The "Creating tables" log appeared 10 seconds after the "Found x duplicates" log. It successfully ran the migrations, and my Airflow instances now work correctly on 2.3.0. Thank you 👍 About the environment: I do not have access to the database, but from the UI I can tell that I have around 200k DAG runs and around 400-450k task runs in total. There are ~35 DAG files.
The same thing impacted me, likely due to a large number of records and a custom entrypoint: 6,856,476 rows in task_instance. Ultimately, I was able to get things up and running. Here are the steps I took; I would not recommend this, but I am sharing them in case someone gets stuck like I did.
Happy to hear it. Thanks for testing it out, @TPapajCin! Just for some extra clarity, the inefficient query was part of the pre-migration checks, which would have automatically moved the bad rows @ldacey ran into into a separate table. You actually skipped the inefficient query completely with how you upgraded, but you got to the same place at the end of the day. In case you don't know, you can also easily get a patch file directly from GitHub: just add .patch to the end of the PR URL. The fix is already slated for 2.3.1, which should be coming pretty soon, so I'm going to close this issue.
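(As a concrete illustration of that trick, a minimal sketch; it assumes the fix being discussed is PR #23458, the one quoted later in this thread, and that you run this from a checkout of the Airflow source you deploy.)

```bash
# Appending .patch to a PR URL returns a plain patch file;
# pipe it into git apply to patch the local source tree.
curl -L https://github.com/apache/airflow/pull/23458.patch | git apply -v --index
```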
Got it, thanks. I did not know about that .patch method either. I think my issue was that the migration itself was just slow.
I see, that's good to know, thanks. We have some more optimizations in the pipeline that will hopefully speed it up even more.
@ldacey What DB are you on? Do I remember correctly that you are an MSSQL shop?
Sorry for the late reply; I am using PostgreSQL. I assume I just became impatient and canceled the db upgrade command prematurely. I am not sure why my task_fail table had 85 rows that did not exist in the task_instance table, though.
#23458 is a life saver! A 20-hour db migration becomes 20 seconds! 😂
curl -L https://github.com/apache/airflow/pull/23458.patch | git apply -v --index
Apache Airflow version
2.3.0 (latest released)
What happened
After upgrading from 2.2.2 to 2.3.0, when I try to run
airflow db upgrade
all I get is the following log, and then Airflow hangs: Found 16 duplicates in table task_fail. Will attempt to move them.
What you think should happen instead
Airflow should not hang at this log and should properly move these duplicates.
How to reproduce
Operating System
Debian 10
Versions of Apache Airflow Providers
Deployment
Other Docker-based deployment
Deployment details
Google GKE
2 instances of Airflow processes running (HA configuration)
Anything else
The problem occurs every time I try to upgrade the database.
Are you willing to submit PR?
Code of Conduct