feat: add ignoreResourceUpdates to reduce controller CPU usage (#13534) #13912
Signed-off-by: Alexandre Gaudreault <[email protected]>
Codecov Report

@@            Coverage Diff             @@
##           master   #13912      +/-   ##
==========================================
- Coverage   49.61%   49.61%   -0.01%
==========================================
  Files         256      257       +1
  Lines       43829    44146     +317
==========================================
+ Hits        21744    21901     +157
- Misses      19948    20091     +143
- Partials     2137     2154      +17
This PR looks fantastic. It looks like it could fix the high CPU usage issues in the application controller.
Hi @agaudreault-jive
@jaideepr97 If your question is about why there are two different settings, and why ignoreDifferences could not also be reused to skip the reconcile: in our case, ignoreDifferences has more configuration than is necessary for the reconcile optimization. It is also hard, if not impossible, to know what everyone has configured. Having two separate configurations rules out conflicts between them. However,
Agreed, this PR is amazing. We are seeing 100% CPU constantly, with the application-controller monitoring our 2k pods. This seems to be due to metric changes on the HPAs.
@donkeyx are you in a position to run this branch internally and monitor the effects? I'd be happy to help cherry-pick these changes into whatever version you're running now.
@crenshaw-dev we are currently running
@donkeyx For HPAs, we had the same issue, and I used the following config to make sure it works with v1 too. You might still see a couple of reconciles due to ReplicaSet/Pod/Deployment updates, but no more from the HPA.
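Per the PR description, ignoreResourceUpdates removes configured fields before a resource is hashed, so an update that only touches those fields leaves the hash unchanged and need not trigger a refresh. The Go sketch below illustrates that idea under loudly stated assumptions: the slash-separated path syntax, the single ignored path "/status" (standing in for the HPA config, which is not shown in the comment), and the helper names removePath and hashIgnoring are all hypothetical and are not Argo CD's actual code or configuration format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// deepCopy returns an independent copy of a JSON-like object so that
// removing ignored fields never mutates the caller's data.
func deepCopy(obj map[string]interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(obj)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	err = json.Unmarshal(raw, &out)
	return out, err
}

// removePath deletes the field addressed by a hypothetical slash-separated
// path such as "/status" or "/metadata/annotations". Missing paths are ignored.
func removePath(obj map[string]interface{}, path string) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	cur := obj
	for i, p := range parts {
		if i == len(parts)-1 {
			delete(cur, p)
			return
		}
		next, ok := cur[p].(map[string]interface{})
		if !ok {
			return
		}
		cur = next
	}
}

// hashIgnoring hashes a copy of obj after dropping every ignored path.
func hashIgnoring(obj map[string]interface{}, ignored []string) (string, error) {
	filtered, err := deepCopy(obj)
	if err != nil {
		return "", err
	}
	for _, p := range ignored {
		removePath(filtered, p)
	}
	// encoding/json sorts map keys, so equal objects marshal to equal bytes.
	raw, err := json.Marshal(filtered)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	ignored := []string{"/status"} // hypothetical stand-in for the HPA config
	before := map[string]interface{}{
		"kind":   "HorizontalPodAutoscaler",
		"spec":   map[string]interface{}{"maxReplicas": 10},
		"status": map[string]interface{}{"currentReplicas": 3},
	}
	after := map[string]interface{}{
		"kind":   "HorizontalPodAutoscaler",
		"spec":   map[string]interface{}{"maxReplicas": 10},
		"status": map[string]interface{}{"currentReplicas": 5},
	}
	h1, _ := hashIgnoring(before, ignored)
	h2, _ := hashIgnoring(after, ignored)
	fmt.Println("status-only change skipped:", h1 == h2)
}
```

Running it prints "status-only change skipped: true". Changes to the ReplicaSet, Pods, or Deployment touch fields that are not ignored, which is consistent with the comment that a few reconciles may still appear from those resources.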
One "glitch" I am seeing happens when a ReplicaSet is scaled down (by the HPA). When a pod is set to terminating, its health turns to "Progressing", and the Application health also changes to "Progressing". However, when the pods disappear from the UI, the Application status is not updated and stays "Progressing". I expect the
The app status is the following, but no resources in
Only one remaining substantial thought.
…cile Signed-off-by: Michael Crenshaw <[email protected]>
Co-authored-by: Michael Crenshaw <[email protected]> Signed-off-by: Alexandre Gaudreault <[email protected]>
Co-authored-by: Alexandre Gaudreault <[email protected]>
@agaudreault-jive Thanks to you and your company for this significant contribution to improving performance.
I added an additional section to my config:
Then I restarted
image: quay.io/argoproj/argocd:v2.8.0-rc1
Actually, after setting
On the other hand, when
@everythings-gonna-be-alright See #14304. Until this is merged and cherry-picked, you can use "debug" logs while
…oproj#13534) (argoproj#13912)

* feat: ignore watched resource update
* add doc and CLI
* update doc index
* add command
* codegen
* revert formatting
* do not skip on health change
* update doc
* update logging to use context
* fix typos. local build broken...
* change after review
* manifestHash to string
* more doc
* example for argoproj Application
* add unit test for ignored logs
* codegen
* Update docs/operator-manual/reconcile.md (Co-authored-by: Michael Crenshaw)
* move hash and set log to debug
* Update util/settings/settings.go (Co-authored-by: Michael Crenshaw)
* Update util/settings/settings.go (Co-authored-by: Michael Crenshaw)
* feature flag
* fix
* less aggressive managedFields ignore rule
* Update docs/operator-manual/reconcile.md (Co-authored-by: Alexandre Gaudreault)
* use local settings
* latest settings
* safety first
* since it's behind a feature flag, go aggressive on overrides

---------

Signed-off-by: Alexandre Gaudreault <[email protected]>
Signed-off-by: Michael Crenshaw <[email protected]>
Co-authored-by: Michael Crenshaw <[email protected]>
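Two commit subjects in the list, "do not skip on health change" and "feature flag", describe guards around skipping a refresh. The Go sketch below shows one way such a decision could be expressed. It assumes a hypothetical resourceUpdate type carrying filtered hashes and health strings, and a boolean standing in for the feature flag; none of these names come from Argo CD itself.

```go
package main

import "fmt"

// resourceUpdate is a hypothetical summary of one watched-resource event.
type resourceUpdate struct {
	OldHash, NewHash     string // hashes taken after removing ignored fields
	OldHealth, NewHealth string // health status, e.g. "Healthy" or "Progressing"
}

// shouldSkipRefresh reports whether an app refresh could be skipped.
// featureEnabled stands in for the feature flag: when it is off, nothing is skipped.
// A health change is never skipped, even if the filtered hash is unchanged.
func shouldSkipRefresh(featureEnabled bool, u resourceUpdate) bool {
	if !featureEnabled {
		return false
	}
	if u.OldHealth != u.NewHealth {
		return false
	}
	return u.OldHash == u.NewHash
}

func main() {
	statusOnly := resourceUpdate{"abc", "abc", "Healthy", "Healthy"}
	healthFlip := resourceUpdate{"abc", "abc", "Healthy", "Progressing"}
	fmt.Println(shouldSkipRefresh(true, statusOnly))  // true
	fmt.Println(shouldSkipRefresh(true, healthFlip))  // false
	fmt.Println(shouldSkipRefresh(false, statusOnly)) // false
}
```

Checking health before comparing hashes means a status change that flips health (for example, a pod entering "Progressing") still reaches the application, even when the ignored fields hide every other difference.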
Just to be sure, argocd is taking
@ebuildy Correct. There are a few examples in https://argo-cd.readthedocs.io/en/stable/operator-manual/reconcile
Closes #13534 #6108 #13614 #8471 #8100 #7406 #9014 #9819
Changes:
* ignoreResourceUpdates: global configuration to ignore fields before hashing resources.
* ignoreDifferencesOnResourceUpdates: config to automatically reuse ignoreDifferences for ignoreResourceUpdates.
* "Refreshing app %s for change in cluster of object %s of type %s/%s": changed from a debug log to an info log, to help gather statistics and configure ignoreResourceUpdates. A log query such as the following can be used:
  msg="Refreshing app*for change*" | rex field=msg "Refreshing app (?<application>\S+) for change in cluster of object (?<resource>\S+) of type (?<type>\S+)" | stats count by application resource type | sort -count

This was a result of adding the following config:
During business hours, after optimization.
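For readers without the log tool used in the query in the PR description, the same statistics can be approximated in Go. This is a minimal sketch, assuming the input is the already-extracted msg field of each log entry (not the raw log line, whose quoting would end up in the captured type). The regular expression mirrors the query's rex pattern with unnamed groups, and the sample object and type names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// refreshRe mirrors the rex pattern: group 1 = application, 2 = resource, 3 = type.
var refreshRe = regexp.MustCompile(`Refreshing app (\S+) for change in cluster of object (\S+) of type (\S+)`)

// refreshKey groups events like "stats count by application resource type".
type refreshKey struct {
	Application, Resource, Type string
}

// refreshCount is one row of the result table.
type refreshCount struct {
	Key   refreshKey
	Count int
}

// countRefreshes counts matching messages per key and sorts by descending count
// (like "sort -count"), breaking ties by key so the output is deterministic.
func countRefreshes(msgs []string) []refreshCount {
	counts := make(map[refreshKey]int)
	for _, msg := range msgs {
		m := refreshRe.FindStringSubmatch(msg)
		if m == nil {
			continue // not a refresh message
		}
		counts[refreshKey{m[1], m[2], m[3]}]++
	}
	rows := make([]refreshCount, 0, len(counts))
	for k, c := range counts {
		rows = append(rows, refreshCount{Key: k, Count: c})
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Count != rows[j].Count {
			return rows[i].Count > rows[j].Count
		}
		a, b := rows[i].Key, rows[j].Key
		if a.Application != b.Application {
			return a.Application < b.Application
		}
		if a.Resource != b.Resource {
			return a.Resource < b.Resource
		}
		return a.Type < b.Type
	})
	return rows
}

func main() {
	// Hypothetical msg values; real application, object, and type names will differ.
	msgs := []string{
		"Refreshing app guestbook for change in cluster of object web-hpa of type autoscaling/HorizontalPodAutoscaler",
		"Refreshing app guestbook for change in cluster of object web-hpa of type autoscaling/HorizontalPodAutoscaler",
		"Refreshing app guestbook for change in cluster of object web-7d9f of type apps/ReplicaSet",
		"Some unrelated message",
	}
	for _, r := range countRefreshes(msgs) {
		fmt.Printf("%d\t%s\t%s\t%s\n", r.Count, r.Key.Application, r.Key.Resource, r.Key.Type)
	}
}
```

With the sample input, the HPA row comes first with a count of 2, followed by the ReplicaSet row with a count of 1; the unrelated message is ignored. The resources at the top of such a table are the natural candidates for ignoreResourceUpdates rules.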