
Load between controllers (argocd-application-controller) is not evenly distributed #6125

Open
vikas027 opened this issue Apr 29, 2021 · 54 comments
Labels: component:application-controller, component:core (syncing, diffing, cluster state cache), enhancement (new feature or request), type:scalability (scalability and performance)


vikas027 commented Apr 29, 2021

Describe the bug

I have an Argo CD High Availability setup where I have also scaled the number of replicas of argocd-application-controller as shown in the documentation.

To Reproduce

  • Follow the steps to deploy ArgoCD in HA mode
  • Edit the argocd-application-controller as below
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: argocd-application-controller
        env:
        - name: ARGOCD_CONTROLLER_REPLICAS
          value: "3"

Expected behavior

I was expecting the load to be distributed across all three controllers, but only one took up all the load; the other two are sitting idle.

Screenshots

All pods running in HA mode

❯ k get po                        
NAME                                      READY   STATUS    RESTARTS   AGE
argocd-application-controller-0           1/1     Running   0          160m
argocd-application-controller-1           1/1     Running   0          160m
argocd-application-controller-2           1/1     Running   0          161m
argocd-dex-server-7b6f9b7f-qh4kv          1/1     Running   0          3h6m
argocd-redis-ha-haproxy-d6dbf6695-4q5cj   1/1     Running   0          3h4m
argocd-redis-ha-haproxy-d6dbf6695-4sh7k   1/1     Running   0          3h5m
argocd-redis-ha-haproxy-d6dbf6695-hjn2d   1/1     Running   0          3h4m
argocd-redis-ha-server-0                  2/2     Running   0          176m
argocd-redis-ha-server-1                  2/2     Running   0          177m
argocd-redis-ha-server-2                  2/2     Running   0          179m
argocd-repo-server-5f4d4775d4-4mw4j       1/1     Running   0          173m
argocd-repo-server-5f4d4775d4-vhgxk       1/1     Running   0          174m
argocd-server-86896bd76f-gz48t            1/1     Running   0          173m
argocd-server-86896bd76f-k5r9h            1/1     Running   0          174m

Screenshot of the pods resources

Version

❯ argocd version                                                                                                                        
argocd: v2.0.1+33eaf11.dirty
  BuildDate: 2021-04-17T04:23:35Z
  GitCommit: 33eaf11e3abd8c761c726e815cbb4b6af7dcb030
  GitTreeState: dirty
  GoVersion: go1.16.3
  Compiler: gc
  Platform: darwin/amd64
vikas027 added the bug label Apr 29, 2021

fewkso commented Jun 8, 2021

I think the load is distributed/sharded by cluster, so it might be more of an improvement than an actual bug
https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L1517

alexmt self-assigned this Jun 8, 2021
@PatTheSilent

Having an option to shard by something other than cluster would be much appreciated. Because I really dislike having single points of failure, I have an Argo CD stack in each of my clusters, and due to this limitation I can only scale the Application Controller vertically, which is far from ideal.


cwwarren commented Aug 12, 2021

Use case: we deploy a number of QA/Staging/Preview environments for one product in a single cluster to save costs. In addition to there being many more environments (Argo CD Applications), these environments have much higher churn, being updated anywhere from a few times a day to many times per hour at peak.

Ideally applications would be evenly distributed across shards to balance load so we aren't stuck over-allocating resources to underutilized shards.

Update 8/12 - ArgoCD 2.1.0-rc3 reduced resource use significantly, but the issue of highly imbalanced shards remains.

@maxbrunet (Contributor)

I have written a script that parses the output of the argocd-util cluster stats command and suggests a shard number for each cluster based on its resource count:

https://gist.github.com/maxbrunet/373374690b5064203e5e714de97d37fa

The script currently works offline, but we could imagine integrating such logic into the application controller or building a dedicated service around it. The resulting service would likely read the cluster stats from Redis directly.

The algorithm is not perfect, as I note in the caveats (see the README.md); we could also base it on a different metric and/or work with a static number of controllers.
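
For anyone applying a suggested layout like this by hand: cluster secrets accept a shard field that statically pins a cluster to a specific controller replica. A minimal sketch, with illustrative names and credentials abbreviated:

apiVersion: v1
kind: Secret
metadata:
  name: cluster-prod-example            # illustrative name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: prod-example                    # illustrative cluster name
  server: https://10.0.0.1              # illustrative API server URL
  shard: "1"                            # pin this cluster to controller replica 1
  config: |
    { "bearerToken": "<token>", "tlsClientConfig": { "caData": "<base64 CA>" } }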

@victorboissiere (Contributor)

We have the same use case as well, where we usually have 3,000 to 6,000 Argo CD applications in our staging environment. The fact that the sharding is on a per-cluster basis instead of per-app does not help much, because we deploy everything into the same cluster (in staging).

@kvendingoldo

We have the same issue with load: only one shard is working, and under high load it fails, while all the others do not pick up tasks.


aiceball commented Jun 8, 2022

Same scenario here: we have a very heavy application deployed on a single cluster, and adding another application controller replica does not distribute the load evenly.


dtelaroli commented Jun 8, 2022

I have the same issue.
With 3 replicas, one replica restarts every time it consumes the entire instance's memory (16 GB), while the other two use at most 2 GB each.
I will upgrade the node group to 32 GB and pray that it is enough.

@musabmasood

Same here. We have 8 application-controllers and 8 clusters managed by this Argo CD instance. As you can see, the usage is really not distributed.

[screenshot: per-controller resource usage]

@mchudinov

Argo CD 2.6.3 still has the same issue.
A single app controller instance uses all the CPU while the other instances are not loaded at all.

@mchudinov

Argo CD v2.7.1 has the same trouble. No matter how many application controllers are in use, only one gets all the CPU/memory load.
[screenshot: Elastic dashboard, 2023-05-11]
We are using Kubernetes 1.24.10 on Azure AKS. The picture is the same on all 3 of our clusters.


AswinT22 commented Aug 1, 2023

How are you guys working around this?

@crenshaw-dev (Member)

A new feature in 2.8 attempts to mitigate the issue by letting folks decide their own cluster placement algorithms: #13018

There's also ongoing work on a dynamic, runtime rebalancing mechanism: #13221

Fact is, if you have 3 clusters over 3 shards, and one cluster is super active while the others are inactive, the imbalance is inevitable. There's no current effort to split the load of a single cluster across multiple shards. But if you're managing 30 clusters, and 3 of them are "hot," the new balancing algorithms might help place those 3 clusters on different shards so that overall load is more evenly placed.


pre commented Aug 1, 2023

There's no current effort to split the load of a single cluster across multiple shards.

This is clearly stated, but it's sad news.

Having one "hot" cluster very often overloads the single application-controller replica handling it, while the other replicas are idle. Scaling up resources for that single replica also beefs up the other replicas, since each replica has the same resource request.

It would be great to be able to balance the load of a single cluster, other than by dedicating a completely separate Argo CD installation to each "hot" cluster.

@crenshaw-dev (Member)

There are efforts and features which may be able to "cool down" a cluster by cutting out unnecessary work. ignoreResourceUpdates is one such feature: https://argo-cd.readthedocs.io/en/release-2.8/operator-manual/reconcile/

But splitting a single cluster across multiple shards will require deep knowledge of core Argo CD code and a proposal describing how that work could be effectively split. I expect that efforts to minimize unnecessary work on hot clusters will take priority, at least in the short- to medium-term.
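
For reference, a minimal sketch of what the ignoreResourceUpdates configuration looks like in argocd-cm, based on the reconcile docs linked above (check your version's docs for the exact keys; the feature was disabled by default when introduced):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Enable ignoring of resource updates that don't affect reconciliation results.
  resource.ignoreResourceUpdatesEnabled: "true"
  # Don't re-queue apps when only a resource's status changes.
  resource.customizations.ignoreResourceUpdates.all: |
    jsonPointers:
      - /status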

HariSekhon (Contributor) commented Aug 18, 2023

This could be done via hash-mod sharding on the cluster+app name string; this is similar to what NoSQL systems have been doing for well over a decade to figure out whether they should manage a shard of data, without needing negotiation with other nodes.

It's pretty easy in terms of code too.

Each application controller can generate the same hash map in a second (it doesn't require an expensive cryptographic algorithm), knows its own zero-indexed ID, and manages only the apps whose hash of cluster + app name has a matching modulus result.

The hash map should be cached and only needs to be recalculated if the number of replicas changes, so checking once a minute that the modulus value in use matches the replica count requested in the StatefulSet is trivially cheap.

Note that you may want to switch the application controllers from Deployments to StatefulSets so that you get the zero-indexed ID for free; it's just the suffix of the pod name.
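
For illustration only, a minimal Go sketch of the hash-mod idea described above; this is not Argo CD's actual sharding code, and the function names and the use of HOSTNAME for the replica index are assumptions:

package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
	"strings"
)

// shardFor returns the zero-indexed replica that should own an app, based on a
// cheap, non-cryptographic hash of "cluster/appName" modulo the replica count.
func shardFor(cluster, appName string, replicas int) int {
	h := fnv.New32a()
	h.Write([]byte(cluster + "/" + appName))
	return int(h.Sum32() % uint32(replicas))
}

// ownsApp reports whether this replica should process the given app. The replica
// index is taken from the StatefulSet pod name suffix, e.g.
// "argocd-application-controller-2" -> 2 (HOSTNAME is set to the pod name).
func ownsApp(cluster, appName string, replicas int) bool {
	parts := strings.Split(os.Getenv("HOSTNAME"), "-")
	idx, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return false
	}
	return shardFor(cluster, appName, replicas) == idx
}

func main() {
	// Example: decide ownership of a hypothetical app on a hypothetical cluster.
	fmt.Println(ownsApp("https://10.0.0.1", "guestbook", 3))
}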

crenshaw-dev (Member) commented Aug 18, 2023

Yes, assigning applications to different controllers is easy.

Actually reducing controller workload in the process is hard.

Much of the work on each controller has to do with maintaining a "cluster cache" for each managed cluster. Maintaining that cache has nothing (directly) to do with applications.

By splitting applications targeting a single cluster across multiple controllers, you duplicate the work of maintaining that cluster cache across the controllers.

So the problem isn't as easy as "spread the applications." It's "spread the applications in a way that significantly reduces each controller's workload."

@HariSekhon (Contributor)

@crenshaw-dev ah that explains a lot, thanks.

What is the cluster cache, a cached dump of all live objects on the cluster to compare application manifests to?

Since most applications are in their own specific namespace, sharding by application could allow the cluster cache to be optimized to contain, in most cases, only the namespaces used by those apps, thereby potentially reducing the cluster cache each application controller has to maintain and reconcile against?

crenshaw-dev (Member) commented Aug 18, 2023

What is the cluster cache, a cached dump of all live objects on the cluster to compare application manifests to?

Yep, exactly!

sharding by application could allow for the cluster cache to be optimized to only contain the namespaces for objects in those apps

I do like that idea... dynamically establishing and dropping watches as the set of managed namespaces changes would require some overhead (both in terms of code and processing), but it would be possible. I think you still hit the problem of there being significant overlap and of "hot" namespaces, i.e. when one namespace accounts for 95% of the controller load.

I think the time spent building this highly-dynamic system is probably better spent just designing and implementing an agent model, like Red Hat's or Akuity's. That lets you scale on a per-cluster basis by scaling the controller which is already dedicated to that cluster.

HariSekhon (Contributor) commented Aug 18, 2023

If people are putting everything into the default namespace then they probably don't have much sophistication or scale.

The kube-system namespace is the only one I can think of off the top of my head that is likely to have several applications in it and be processed by several argocd application controllers.

For bespoke apps, people may put many applications into the same namespace out of laziness, but as they grow it's a simple doc to tell them not to do that for performance reasons.

Yes having an agent-based approach might be easier to offload from the application controllers to each cluster.

This would be similar to sharding at the cluster level at first.

Perhaps then a second level of sharding over multiple agent pods within the cluster by sharding on app names within the agent replicas?

crenshaw-dev (Member) commented Aug 18, 2023

For bespoke apps, people may put many applications into the same namespace out of laziness

Not always... it depends on how heterogeneous the apps are. For example, I might have a simple "hello world" API in the hello namespace. Small resource count, low churn; my informers are happy. But then I might have a very large, sophisticated app in namespace prod-wild-thingy that involves hundreds of resources with high churn rates. And some large apps don't lend themselves to being spread across multiple namespaces.

Not saying this is common, just that it's not entirely uncommon or entirely the fault of the app designer.

Perhaps then a second level of sharding over multiple agent pods within the cluster by sharding on app names within the agent replicas?

Yep! You'd get agents tuned on a per-cluster basis, and then you should shard within that cluster using any of the existing sharding techniques or new ones. But it significantly limits the problem space, now that you're dealing with just one cluster.

@HariSekhon (Contributor)

I think if you have a large number of apps then a random hash-mod sharding distribution within each cluster agent should on average level out a mix of large and small apps between different agent pods.

Statistically, the more apps there are, the more the spread should even out thanks to the natural bell-curve distribution, and since this scaling problem is caused by having more apps, this should be fine in practice. I guess we'll see when it's implemented!

lukaspj (Contributor) commented Dec 1, 2023

If I understand this correctly, it effectively means ArgoCD runs in a hot-standby model when you are running it in-cluster. We are battling this a bit because it makes it very hard for us to manage the resources for ArgoCD replicas:
[screenshot: per-replica resource usage]

An in-cluster sharding algorithm would be very good to have.


bygui86 commented Dec 1, 2023

@lukaspj in my case the workload is quite distributed, but definitely not ideal yet...

NAME                                                CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                     39m          890Mi
argocd-application-controller-1                     264m         627Mi
argocd-application-controller-2                     98m          1075Mi
argocd-application-controller-3                     3m           51Mi

@peschmae (Contributor)

I wonder if we could in theory load balance this by creating many cluster configs to the same local cluster.

Just did a quick test, and it seems that Argo CD uses the cluster URL as a unique attribute; at least when I created a new cluster entry using the in-cluster URL, it just merged them.

To make something like this work, it would need at least a few more Services pointing towards the API server, or maybe a cluster-external URL/IP to connect to.
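
A rough, untested sketch of that idea: register the same local cluster a second time under a different URL (for example via an extra Service or the cluster's external API endpoint), so its Applications can be assigned to a different shard. All names here are illustrative:

apiVersion: v1
kind: Secret
metadata:
  name: in-cluster-part-2              # illustrative name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: in-cluster-part-2
  # Must differ from https://kubernetes.default.svc (e.g. an extra Service or
  # the external API endpoint), otherwise Argo CD merges the entries.
  server: https://kubernetes-alias.argocd.svc
  config: |
    { "bearerToken": "<service account token>", "tlsClientConfig": { "caData": "<base64 CA>" } }

Applications would then need to target the second server URL explicitly to land on the other shard.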


ashishkamat2791 commented Mar 20, 2024

From release 2.8.0 onwards we can use the round-robin sharding algorithm; check https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/#argocd-application-controller
We also came across this situation, and after switching the sharding algorithm the issue was resolved for us.

How to configure the Argo CD Application Controller to use the round-robin sharding algorithm:

kubectl patch configmap argocd-cmd-params-cm -n <argocd-namespace> --type merge -p '{"data":{"controller.sharding.algorithm":"round-robin"}}'

After updating the ConfigMap, restart the Argo CD Application Controller StatefulSet using the following command:

kubectl rollout restart -n <argocd-namespace> statefulset argocd-application-controller

Now, to verify that the Argo CD Application Controller is using a round-robin sharding algorithm, run the following command:

kubectl exec -it argocd-application-controller-0 -- env | grep ARGOCD_CONTROLLER_SHARDING_ALGORITHM

Expected output

ARGOCD_CONTROLLER_SHARDING_ALGORITHM=round-robin
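
For those managing Argo CD declaratively, the same setting from the patch above can be kept in the argocd-cmd-params-cm manifest (a sketch mirroring that patch):

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  controller.sharding.algorithm: round-robin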

@chris-ng-scmp (Contributor)

From release 2.8.0 onwards we can use the round-robin sharding algorithm ... after switching the sharding algorithm the issue was resolved for us.

I think the round-robin sharding algorithm is not really helping, as it still shards apps by cluster.
What we want is to shard apps by app.

alexmt added the component:core and type:bug labels Jul 9, 2024

bygui86 commented Jul 9, 2024

I totally agree with @chris-ng-scmp

What is really needed here is an algorithm that distributes individual Applications evenly between argocd-application-controller replicas.
Distributing the load based on cluster/project assumes that all clusters/projects have the same number of apps, which is not always the case.


tebaly commented Jul 28, 2024

By splitting applications targeting a single cluster across multiple controllers, you duplicate the work of maintaining that cluster cache across the controllers.

I don't understand what cache you're talking about. Your cache is located on the Redis server. Isn't it a distributed cache, actually? If I have one cluster and one distributed cache, what problems are there with distributing rendering tasks across different nodes?

Consider the option of asynchronous rendering, with queues saved in the same Redis. Before starting rendering, select a leader with the lowest load; it does everything itself except for those tasks that can or need to be distributed across replicas asynchronously.

@dhruvang1 (Contributor)

@tebaly Each Application controller uses an in-memory cache of all the resources from the cluster it manages, so a simple split of a cluster across multiple shards is not possible. Redis is only used for caching git/helm repositories, and it cannot replace the in-memory cache.


bygui86 commented Jul 29, 2024

@dhruvang1 why is it not possible to replace the in-memory cache with Redis?


tebaly commented Jul 29, 2024

Each Application controller uses an in-memory cache of all the resources from the cluster it manages, so a simple split of a cluster across multiple shards is not possible. ...

Plan B: identify resource-intensive tasks and make some kind of asynchronous worker on each replica, so that only the worker performs its stateless task, and let the cache remain as is.

@lukasz-leszczuk-airspace-intelligence

Each Application controller uses an in-memory cache of all the resources from the cluster it manages, so a simple split of a cluster across multiple shards is not possible. ...

What does that mean for running multiple controllers that handle a single server?
Is it just increased memory usage (loading a bunch of applications/resources that are supposed to be handled by another controller replica), but besides that, everything is working fine, or is there a risk of breaking deployments (for example, controller A deleting resources created by controller B)?

@rouke-broersma (Contributor)

What does that mean for running multiple controllers that handle a single server? ...

Your extra controllers will literally do nothing, because they will not have applications assigned.


bygui86 commented Sep 6, 2024

@tebaly and @rouke-broersma so is there already any proposal to solve this? And in the meantime, what do you suggest as a suitable workaround?
Maybe duplicating a cluster? For example cluster-A-part-1 and cluster-A-part-2, so as to have 2 app-controllers instead of one, distributing applications equally between the two?

In our use case, we have 4 clusters, around 400 applications (and increasing), and 4 app-controller replicas with quite some resources assigned to them (I can't be more precise because I don't remember at the moment). The controllers go OOM when a number of applications (for example, around 20) are syncing. We have already scaled the controllers vertically twice, but we can't keep adding memory.

@rouke-broersma (Contributor)

@tebaly and @rouke-broersma so is there already any proposal to solve this? And in the meantime, what do you suggest as a suitable workaround? ...

I don't have a good answer for you as we are in the same boat unfortunately.


pre commented Sep 6, 2024

This is not a good answer, but we solved the issue by scaling Argo CD horizontally.

That is, we have multiple completely independent Argo CD deployments. The downside is extra overhead, but the benefit is that all the components other than the application-controller get "scaled" as well.


bygui86 commented Sep 6, 2024

@pre so based on our use case, I should deploy 4 separate complete Argo CD installations, one for each cluster... But how does this solve the issue?

I mean, if the crux of the issue is the in-memory state held entirely within the app-controller, what's the difference (apart from the obvious) between 4 app-controller replicas and 4 entire Argo CDs?


pre commented Sep 6, 2024

It'd be the same as the "part 1" and "part 2" approach mentioned above. The point is to divide the load of a single cluster between different Argo CDs.


bygui86 commented Sep 6, 2024

@pre got it, thanks! Unfortunately such an approach introduces multiple UIs :( which is really difficult for other teams to manage.

Actually I'm really curious how big companies with thousands of Argo CD Apps manage them... Just by vertically scaling the app-controllers? Or by deploying multiple Argo CDs?


pre commented Sep 7, 2024

For us multiple Argo CDs weren't a big deal, since we had already decided to have independent Argo CD setups per environment.

We have a wiki cheat-sheet page with links to the correct places :epic_face: Since Argo CD authenticates with OpenID and authorizations are defined in a single place, it's just a matter of finding the correct link and pressing the SSO button.


bygui86 commented Sep 7, 2024

@pre thanks for sharing your experience! :)

I will do some tests and try to mitigate our issue somehow, as from what I see in this thread this feature won't be implemented soon, or at all :((

@chris-ng-scmp (Contributor)

Setting up multiple Argo CD installations for the same cluster is not really a solution for such a good CNCF project...


bygui86 commented Nov 8, 2024

@chris-ng-scmp I agree, but there is no alternative!
The project is amazing, but distributing the load across application-controllers per cluster instead of per application makes everything really complex and absolutely not flexible :(

@andrii-korotkov-verkada (Contributor)

See #14346 for a similar discussion. I'll bring this up in the contributors meeting.

andrii-korotkov-verkada added the enhancement and component:application-controller labels and removed the bug and type:bug labels Nov 15, 2024
@andrii-korotkov-verkada (Contributor)

Also, changing this to an enhancement proposal, as it's expected behavior for now due to sharding by cluster.

@moore-nathan

Writing as a +1. Would very much like to see this. It has become a bottleneck: if a large number of Applications get created (think PRs during peak hours), the Application Controller becomes a blocker for teams.
cc: @todaywasawesome


RomyKess commented Dec 10, 2024

Would love to add to this. In our case we have around a hundred clusters, some of which are turned off and some of which are up, and their state changes quite a bit. Round-robin isn't helping, as it divides all clusters equally without differentiating by their status (on or off). So a controller with 20 clusters can have 16 that are up, while another controller with 20 clusters can have 5 that are up. The first one might experience 60 OOM kills in one day while the other consumes half of its memory limit.

Assigning clusters to a shard manually is not ideal, especially because the clusters turn on and off often, so a sharding algorithm that takes resource usage into account would be a great and much-needed solution.
