RdsDeleteDbInstanceOperator sometimes does not complete when run with deferrable=True
#35563
Comments
I think it might be fixed in 8.11.0. It would be nice if you could check. Reference PR:
Oh, nice! I pulled that change into my Docker image and I've managed two successful runs of my actual DAG so far that delete six RDS instances each. I guess I'll be waiting eagerly for 8.11.0 to be released.
After further testing of this fix, I discovered that it does not seem to make the problem any better when using
@cliebBS Could you update the examples and description to reflect that this operator fails only in deferrable mode?
deferrable=True
@Taragolis The title of the ticket, the description, and the reproduction case have all been updated to reflect the issue now being specific to
I set up your repro DAG on latest main, did about 30 RDS DB creations and deferred deletions, and could not repro the issue. A lot of the code has changed recently, so I suspect the issue may already be resolved. Let me know if you can reproduce the issue on latest main.
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
This issue has been closed because it has not received response from the issue author.
Apache Airflow version
2.7.3
What happened
When using the RdsDeleteDbInstanceOperator to delete an RDS instance in my DAG while using deferrable=True, sometimes it misses the fact that the instance was deleted and leaves logs like:
In the above example, at some point between 19:33:14 and 19:33:44, the actual instance was deleted, but the operator doesn't realize that it was deleted and instead continues to poll for the status of the RDS instance until it reaches the waiter_max_attempts, at which point it fails. Retries of the operator exit immediately with the log message:
I /think/ what's happening is that the deletion is completing in the time between when the waiter times out and when it reschedules to run again. Since the RDS delete operation doesn't leave any sign of the DB in the AWS API (unlike a terminated EMR cluster, for example), running the waiter again leaves the operator in a weird state where it thinks the resource exists, but is unable to get a status for it.
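To make the suspected race concrete, here is a minimal sketch of a polling loop that treats "instance not found" as success no matter which poll first observes it. This is not the provider's actual waiter or trigger code. The names InstanceNotFound, wait_until_deleted, and fake_describe are hypothetical and exist only for illustration; with boto3, a missing instance would surface as a botocore ClientError rather than this stand-in exception.

```python
import time
from typing import Callable, Iterable


class InstanceNotFound(Exception):
    """Hypothetical stand-in for the 'instance not found' error from the AWS API."""


def wait_until_deleted(
    describe_status: Callable[[], str],
    max_attempts: int,
    delay_seconds: float = 0.0,
) -> int:
    """Poll until the instance is gone and return the number of polls used.

    describe_status is a hypothetical callable that returns the current status
    string, or raises InstanceNotFound once the instance no longer exists.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            describe_status()
        except InstanceNotFound:
            # A missing instance is the success condition for a delete,
            # even if it vanished between two polls (or before the first one).
            return attempt
        # Any status at all means the instance is still present; keep waiting.
        time.sleep(delay_seconds)
    raise TimeoutError(f"instance still present after {max_attempts} attempts")


def fake_describe(statuses: Iterable[str]) -> Callable[[], str]:
    """Build a describe_status callable that replays statuses, then reports not found."""
    remaining = list(statuses)

    def describe() -> str:
        if remaining:
            return remaining.pop(0)
        raise InstanceNotFound()

    return describe


# The instance reports "deleting" twice and is gone on the third poll.
polls_used = wait_until_deleted(fake_describe(["deleting", "deleting"]), max_attempts=5)

# The instance was already gone before the first poll, as on a retry of the task.
polls_on_retry = wait_until_deleted(fake_describe([]), max_attempts=1)
```

The second call mirrors the retry case described above: if the first poll after a reschedule already finds nothing, the loop should still finish in the success state rather than keep polling until it runs out of attempts.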
What you think should happen instead
The RdsDeleteDbInstanceOperator should always complete in the success state when the RDS instance is deleted.
How to reproduce
You can increase the number of RDS instances this will spin up at once to increase the odds that you'll trigger this problem. It also seems like this is much easier to reproduce when running on EKS using the official Helm chart vs running locally using the official docker-compose.yaml, though I have no idea why that should be making a difference.
Operating System
Official Docker image
Versions of Apache Airflow Providers
Deployment
Docker-Compose
Deployment details
MacOS Sonoma 14.0
Anything else
No response
Are you willing to submit PR?
Code of Conduct