Auto Sync terminate #16489

Open
alexmt opened this issue Nov 29, 2023 · 6 comments
Labels
component:core (Syncing, diffing, cluster state cache), enhancement (New feature or request)

Comments

@alexmt
Collaborator

alexmt commented Nov 29, 2023

Summary

The ability to override a "stuck" auto-sync operation. A sync can get stuck for various reasons: a sync job cannot complete due to "image pull backoff", a deployment cannot reach a healthy state due to a failing readiness probe, etc. Ideally, it should be enough to fix the root cause and let Argo CD deploy the new changes. Currently, however, Argo CD does not give up on the first sync when new changes are detected.

Motivation

Preview environments. An Argo CD application generated by an ApplicationSet for a pull request might fail because the code in the PR has an issue. The engineer should be able to simply fix the bug, push a new change to the PR, and see the updated, synced application.

Proposal

Introduce a syncPolicy.terminate setting that allows configuring automatic operation termination based on the state of configured "problematic" resources.

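The example below terminates the automatic sync when the `upgrade-database` sync hook Job stays Progressing for longer than 1 minute, or when the sync as a whole exceeds the 10-minute global timeout: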

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    path: guestbook

  syncPolicy:
    terminate:
      timeout: 10m # global timeout (see https://github.com/argoproj/argo-cd/issues/6055)
      resource:
        - group: batch
          kind: Job
          name: upgrade-database-*
          status: Progressing
          timeout: 1m # sync should be terminated if the `upgrade-database` sync hook is stuck for longer than 1 minute
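
To make the proposed semantics concrete, here is a minimal Python sketch of how such rules might be evaluated. It is only an illustration under stated assumptions: `syncPolicy.terminate` does not exist in Argo CD, the names `ResourceRule`, `ResourceState`, and `should_terminate` are hypothetical, the `*` in `name` is assumed to be a shell-style glob, and timeouts are assumed to be already parsed into seconds.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class ResourceRule:
    """One entry of the proposed (hypothetical) terminate.resource list."""
    group: str
    kind: str
    name: str         # assumed to be a glob pattern, e.g. "upgrade-database-*"
    status: str       # health status that counts as "problematic"
    timeout_s: float  # how long the status may persist, in seconds


@dataclass
class ResourceState:
    """Observed state of a live resource (hypothetical shape)."""
    group: str
    kind: str
    name: str
    status: str
    seconds_in_status: float


def should_terminate(
    operation_age_s: float,
    global_timeout_s: Optional[float],
    rules: List[ResourceRule],
    resources: List[ResourceState],
) -> Optional[str]:
    """Return a reason string if the sync should be terminated, else None."""
    # The global timeout applies to the whole operation.
    if global_timeout_s is not None and operation_age_s > global_timeout_s:
        return "global timeout exceeded"
    # Otherwise, look for a resource that matched a rule for too long.
    for res in resources:
        for rule in rules:
            if (
                res.group == rule.group
                and res.kind == rule.kind
                and fnmatchcase(res.name, rule.name)
                and res.status == rule.status
                and res.seconds_in_status > rule.timeout_s
            ):
                return (
                    f"{res.kind}/{res.name} has been {rule.status} "
                    f"for longer than {rule.timeout_s:g}s"
                )
    return None
```

With the rule from the example above, a Job named `upgrade-database-abc` that has been Progressing for 90 seconds would yield a termination reason, while the same Job at 30 seconds would not.
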
alexmt added the enhancement (New feature or request) and component:core (Syncing, diffing, cluster state cache) labels Nov 29, 2023
alexmt changed the title from "Auto Sync override" to "Auto Sync terminate" Nov 29, 2023
@jannfis
Member

jannfis commented Nov 29, 2023

I think the problem is rooted deeper. Retry in auto sync never picks up new changes from the source, and keeps iterating over the same target revision until the retry policy is exhausted.

A progressing timeout alone will not cater for the above use case. While termination of long-progressing resources is also required (and a great idea, btw), I think that Argo CD should check the source (and the parameters in the Application) for changes and potentially restart the sync upon such changes.
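
The behavior suggested here could be sketched roughly as follows. This is not how Argo CD's retry loop is implemented; `next_sync_action` and `resolve_revision` are hypothetical names, and the sketch only shows the decision of checking the source before each retry instead of iterating over the same target revision.

```python
from typing import Callable, Tuple


def next_sync_action(
    syncing_revision: str,
    resolve_revision: Callable[[], str],
    retries_left: int,
) -> Tuple[str, str]:
    """Decide what to do after a failed or stuck sync attempt.

    resolve_revision is a hypothetical callback that returns the revision
    the source currently points at (for example, the head of a PR branch).
    """
    latest = resolve_revision()
    if latest != syncing_revision:
        # The source changed: abandon the old attempt and sync the new revision.
        return ("restart", latest)
    if retries_left > 0:
        # Nothing changed: keep retrying the same revision.
        return ("retry", syncing_revision)
    # Retry policy exhausted and no new changes to pick up.
    return ("give_up", syncing_revision)
```
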

@jannfis
Member

jannfis commented Nov 29, 2023

Refer: #12904

@blakepettersson
Member

blakepettersson commented Dec 2, 2023

Potentially (partially) addressed with #15603?

@phyzical
Contributor

phyzical commented Sep 27, 2024

@blakepettersson I don't think so, as that is about retries, whereas this addresses the issue where you find syncs stuck for 20+ hours, or rather syncs that still have not retried even once.

@phyzical
Contributor

@jannfis I guess the motivation, based on the issue trail that led me here, is for example:

  • a new image is pushed
  • a rollout gets stuck in a degraded condition due to an analysis failing
  • another sync occurs, and it gets stuck waiting indefinitely for the rollout to finish (maybe it's an issue with rollouts?)
  • a new image ref is pushed to git
  • the app pulls the new image ref and starts a sync
  • as the previous sync never cancels, it stays stuck indefinitely with 0 retries, waiting for the old rollout to become healthy before actually starting the new sync
  • I manually cancel the 20+ hour sync and a new sync occurs with the new image <--- this is where I would hope this new mechanism comes into play, so it auto-clicks cancel for me
  • all goes green

@sfynx
Contributor

sfynx commented Oct 15, 2024

Yeah, this is something I'd need as well. We wish to control the lifecycle of ApplicationSet-managed apps purely through Git, but right now we need a CI process in between that issues a terminate operation on any app that is currently syncing, to prevent the eternal sync issue when things get stuck.
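
For readers looking for a stopgap, a CI step of this kind might look roughly like the Python sketch below. It is not the commenter's actual script. It assumes the `argocd` CLI is installed and logged in, that `argocd app get <app> -o json` returns the Application with `status.operationState.phase` and `status.operationState.startedAt` (an RFC 3339 UTC timestamp), and it adds an age threshold that the comment does not mention; `is_stuck` and `terminate_if_stuck` are hypothetical helper names.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def is_stuck(app: dict, now: datetime, max_age: timedelta) -> bool:
    """True if the app has an operation that has been running longer than max_age."""
    op = (app.get("status") or {}).get("operationState") or {}
    if op.get("phase") != "Running":
        return False
    started_at = op.get("startedAt")
    if not started_at:
        return False
    # Assumes a timestamp such as "2024-10-14T12:00:00Z".
    started = datetime.strptime(started_at, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return now - started > max_age


def terminate_if_stuck(app_name: str, max_age: timedelta) -> bool:
    """Terminate the running operation of app_name if it looks stuck."""
    raw = subprocess.run(
        ["argocd", "app", "get", app_name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    if is_stuck(json.loads(raw), datetime.now(timezone.utc), max_age):
        subprocess.run(["argocd", "app", "terminate-op", app_name], check=True)
        return True
    return False
```

A pipeline could call `terminate_if_stuck("guestbook", timedelta(minutes=10))` after pushing a change, so that a new commit is not blocked behind an operation that will never finish.
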
