Application should be refreshed in-between sync retries #12904

Open
jannfis opened this issue Mar 16, 2023 · 2 comments
Labels
component:core (Syncing, diffing, cluster state cache), enhancement (New feature or request), type:usability (Enhancement of an existing feature)

Comments

@jannfis
Member

jannfis commented Mar 16, 2023

Summary

When a sync fails and retry is enabled, the Application should be refreshed between sync retries instead of reusing the same sync context for each retry.

Motivation

When auto-sync is enabled, a sync that runs with retries can take a long time to complete if it hits an unrecoverable error (for example, an erroneous manifest), even if that error has already been fixed at the source. Even if Argo CD receives a refresh while the broken sync is still in its retry loop, it won't consider any new changes in the repository, so auto-sync ultimately fails and stays failed until the next commit or a manual refresh of the application.

Similarly, if self-heal is enabled, the following situation can occur:

  • An Application manages the Namespace itself and a couple of resources in it
  • Somebody deletes the Namespace on the cluster
  • Kubernetes deletes the resources within that Namespace
  • Argo CD receives an event that a managed resource was deleted and starts a self-heal sync to restore the deleted resource
  • Meanwhile, Kubernetes finishes deleting the Namespace itself
  • The sync triggered by self-heal fails, because the target Namespace no longer exists
  • Argo CD enters the sync-retry loop with the same, previous sync context, without considering that the deleted Namespace should be self-healed too
  • After 5 retries, the sync fails and leaves the cluster in a broken state that needs to be recovered manually, even though auto-sync and self-heal are enabled
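
The sequence above can be modeled with a small simulation. This is only a sketch under stated assumptions: `Cluster`, `SyncContext`, `apply`, and `sync_with_stale_context` are hypothetical names invented for illustration and are not Argo CD's actual types or functions. The retry count of 5 is taken from the scenario, and backoff is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for live cluster state."""
    namespaces: set = field(default_factory=set)
    resources: set = field(default_factory=set)  # (namespace, name) pairs


@dataclass(frozen=True)
class SyncContext:
    """Hypothetical snapshot of what one sync intends to create."""
    create_namespaces: frozenset
    create_resources: frozenset  # (namespace, name) pairs


def apply(cluster: Cluster, ctx: SyncContext) -> bool:
    """Apply the context; fail if any target namespace is missing."""
    cluster.namespaces |= ctx.create_namespaces
    if any(ns not in cluster.namespaces for ns, _ in ctx.create_resources):
        return False
    cluster.resources |= ctx.create_resources
    return True


def sync_with_stale_context(cluster: Cluster, ctx: SyncContext, retries: int = 5):
    """One initial attempt plus `retries` retries, all with the same context."""
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        if apply(cluster, ctx):
            return True, attempts
    return False, attempts


if __name__ == "__main__":
    # The context was computed when only the resource was known to be missing,
    # so it does not include the Namespace that has since been deleted.
    stale = SyncContext(frozenset(), frozenset({("demo", "config")}))
    print(sync_with_stale_context(Cluster(), stale))  # (False, 6)
```

Because the context never includes the Namespace, every attempt fails for the same reason, no matter how many retries are configured.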

Proposal

With sync retries enabled, Argo CD should refresh the Application and update its sync context after a sync error, before proceeding to the next retry. It should:

  • Pick up any changes to targetRevision made in the source between the time the sync started and the retry, and
  • Pick up any changes surfaced by self-heal, so they are included in the next retry
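
As a rough illustration of the proposed behavior, the retry loop could recompute the sync context before every attempt rather than capturing it once. The sketch below is not Argo CD's implementation: `sync_with_refresh`, `compute_context`, `apply`, and `demo` are hypothetical names, the context is reduced to a plain value, and backoff between attempts is omitted.

```python
from typing import Callable, Tuple, TypeVar

Ctx = TypeVar("Ctx")


def sync_with_refresh(
    compute_context: Callable[[], Ctx],
    apply: Callable[[Ctx], bool],
    retries: int = 5,
) -> Tuple[bool, int]:
    """Run one initial attempt plus up to `retries` retries.

    The sync context is recomputed before every attempt instead of
    being captured once and reused.
    """
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        ctx = compute_context()  # refresh: re-read source and live state
        if apply(ctx):
            return True, attempts
    return False, attempts


def demo() -> Tuple[bool, int]:
    # Hypothetical source: the first refresh still sees the broken
    # revision, every later refresh sees the fix.
    seen = iter(["broken"])

    def compute_context() -> str:
        return next(seen, "fixed")

    def apply(revision: str) -> bool:
        return revision == "fixed"

    return sync_with_refresh(compute_context, apply)


if __name__ == "__main__":
    print(demo())  # (True, 2)
```

In this model a fix that lands in the source during the retry loop is picked up on the very next attempt. If nothing changes, the loop still gives up after the configured number of retries.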

jannfis added the enhancement (New feature or request) label Mar 16, 2023
@oscrx
Contributor

oscrx commented Mar 16, 2023

I think this is related to the issue I reported earlier: #10303

@jannfis
Member Author

jannfis commented Mar 17, 2023

@oscrx Yep. It seems to be very related. Thanks for linking.

jannfis added the type:usability (Enhancement of an existing feature) and component:core (Syncing, diffing, cluster state cache) labels Mar 17, 2023