Application controller should be more resilient to network latency or K8s API server hiccups #7692

Closed
jannfis opened this issue Nov 12, 2021 · 0 comments · Fixed by #16154 · May be fixed by aborilov/argo-cd#3
Labels
component:core (Syncing, diffing, cluster state cache)
enhancement (New feature or request)
type:scalability (Issues related to scalability and performance related issues)
type:supportability (Enhancements that help operators to run Argo CD)

jannfis (Member) commented Nov 12, 2021

Summary

On network links with very high latency or low bandwidth, a remote Kubernetes API sometimes drops the connection before a request (e.g. fetching remote resources or the list of available APIs) can complete. This currently leads to comparison errors in the application, or aborts other operations (e.g. retrieving manifests using argocd app manifests). Effectively, a single hiccup during an operation can prevent a sync from happening, because these errors are treated as fatal.

This can be observed whenever the network link between the Argo CD control plane and the managed cluster is flaky.

Motivation

Improve stability on high-latency networks, and don't fail operations on the first hiccup.

Proposal

When a request to a remote Kubernetes API endpoint fails with a transient (non-permanent) error, we should retry it instead of failing the whole operation.

jannfis added the enhancement, component:core, type:scalability and type:supportability labels Nov 12, 2021
alexmt pushed a commit that referenced this issue Nov 2, 2023
* add retry logic for k8s client

Signed-off-by: Pavel Aborilov <[email protected]>

* add docs for retry logic and envs to manifests

Signed-off-by: Pavel Aborilov <[email protected]>

---------

Signed-off-by: Pavel Aborilov <[email protected]>
Signed-off-by: Pavel <[email protected]>
jmilic1 pushed a commit to jmilic1/argo-cd that referenced this issue Nov 13, 2023
Signed-off-by: jmilic1 <[email protected]>
aborilov added a commit to aborilov/argo-cd that referenced this issue Nov 21, 2023
vladfr pushed a commit to vladfr/argo-cd that referenced this issue Dec 13, 2023
tesla59 pushed a commit to tesla59/argo-cd that referenced this issue Dec 16, 2023
alexmt pushed a commit to alexmt/argo-cd that referenced this issue Jan 19, 2024
lyda pushed a commit to lyda/argo-cd that referenced this issue Mar 28, 2024
Signed-off-by: Kevin Lyda <[email protected]>
aborilov added a commit to aborilov/argo-cd that referenced this issue Apr 29, 2024
Hariharasuthan99 pushed a commit to AmadeusITGroup/argo-cd that referenced this issue Jun 16, 2024