Application should be refreshed in-between sync retries #12904

jannfis · 2023-03-16T20:43:35Z

Summary

When a sync fails for some reason, and retry is enabled, the Application should be refreshed in between the sync retries instead of re-using the same sync context for each retry.

Motivation

When auto-sync is enabled, a sync with retries enabled may take a long time to complete if it hits an unrecoverable error (for example, an erroneous manifest), even if that error has already been fixed at the source. Even if Argo CD receives a refresh while the broken sync is still in its retry loop, the sync won't consider any new changes in the repository, so auto-sync ultimately fails and stays failed until the next commit or a manual refresh of the application.

Similarly, if self-heal is enabled, the following situation can occur:

The Application manages the Namespace itself and a couple of resources in it
Somebody deletes the Namespace on the cluster
Kubernetes deletes the resources within that Namespace
Argo CD receives an event that a managed resource was deleted and starts a self-heal sync to restore the deleted resource
Meanwhile, Kubernetes also finishes deleting the Namespace
The sync triggered by self-heal fails, because the target Namespace doesn't exist
Argo CD enters the sync-retry loop with the same, previous sync context, without considering that the deleted Namespace should be self-healed too
After 5 retries, the sync fails and leaves the cluster in a broken state that needs to be recovered manually, despite auto-sync and self-heal being enabled

Proposal

With sync retries enabled, Argo CD should refresh the Application and update its sync context after a sync error, before proceeding to the next retry. It should:

Pick up any changes to targetRevision made in the source between the time the sync started and the retry
Pick up any changes surfaced by self-heal, so that they are included in the next retry
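
To make the difference concrete, here is a minimal toy model of the Namespace scenario from the Motivation section. It is only a sketch and not Argo CD's controller code: `SyncContext`, `refresh`, `apply`, `simulate` and the resource names are all hypothetical, and the retry limit is assumed to mean "one initial attempt plus `limit` retries".

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# Hypothetical desired state: the Application manages a Namespace and one resource in it.
DESIRED = frozenset({"Namespace/demo", "ConfigMap/demo/settings"})


@dataclass(frozen=True)
class SyncContext:
    """Toy stand-in for a sync context: the resources a sync attempt will apply."""
    to_apply: FrozenSet[str]


def refresh(live: Set[str]) -> SyncContext:
    """Compare desired and live state; everything missing goes into the context."""
    return SyncContext(to_apply=DESIRED - frozenset(live))


def apply(ctx: SyncContext, live: Set[str]) -> bool:
    """Apply the context; fail if a namespaced resource has no Namespace to land in."""
    available = live | ctx.to_apply
    for res in ctx.to_apply:
        parts = res.split("/")  # "Kind/namespace/name" for namespaced resources
        if len(parts) == 3 and f"Namespace/{parts[1]}" not in available:
            return False
    live |= ctx.to_apply
    return True


def simulate(refresh_between_retries: bool, limit: int = 5) -> Tuple[bool, int]:
    """Return (sync succeeded, number of attempts made)."""
    live = {"Namespace/demo"}       # ConfigMap already deleted, Namespace still terminating
    ctx = refresh(live)             # self-heal only sees the missing ConfigMap
    live.discard("Namespace/demo")  # Kubernetes finishes deleting the Namespace
    attempts = 0
    for attempt in range(limit + 1):
        if attempt > 0 and refresh_between_retries:
            ctx = refresh(live)     # proposed behaviour: rebuild the context before retrying
        attempts += 1
        if apply(ctx, live):
            return True, attempts
    return False, attempts
```

Under these assumptions, `simulate(False)` reuses the stale context and exhausts every attempt, while `simulate(True)` picks up the missing Namespace on the first retry and succeeds.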

oscrx · 2023-03-16T21:41:56Z

I think this is related to #10303, the issue I reported earlier.

jannfis · 2023-03-17T00:07:52Z

@oscrx Yep. It seems to be very related. Thanks for linking.

jannfis added the enhancement New feature or request label Mar 16, 2023

jannfis added type:usability Enhancement of an existing feature component:core Syncing, diffing, cluster state cache labels Mar 17, 2023

oscrx mentioned this issue Apr 28, 2023

Argocd does not honour sync waves in partial syncs #10303

Closed

jannfis mentioned this issue Nov 29, 2023

Auto Sync terminate #16489

Open
