ROX-15980: Set resources requests and limits for fleetshard-sync #990
Can you give a brief explanation of why you chose the particular requests and limits? My intuition leads me to believe the cpu and memory requirements for fleet manager are quite low, but it'd be nice to have that backed by data.
Also, it looks like your memory requests and limits are not powers of two. I assume they should be, right?
See PR description?
A bunch of nitpicks inline, but LGTM overall.
Also, this is not yet the straw that broke the camel's back, but I think we're now in the range where the length of the --set parameters in the helm command line is getting ridiculous. Would you mind filing a ticket (and putting its ID in a TODO somewhere in terraform_cluster.sh) to restructure this into per-environment values files? Then we could use the same base file to ensure consistency, and override some values on a per-environment basis using per-environment values files as necessary.
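A minimal sketch of what that restructuring could look like, assuming a shared base file named values.yaml and optional override files named values-<environment>.yaml. The function name, file names, and directory layout are hypothetical and not taken from the repository; the only helm behaviour relied on is that --values may be given several times, with later files taking precedence.

```shell
#!/usr/bin/env bash
# Sketch only: helm_values_args, the file names, and the directory layout
# are assumptions made for illustration.

# Print the helm value-file arguments for an environment, one per line:
# the shared base file first, then the per-environment override file.
helm_values_args() {
    local env="$1"
    local dir="${2:-.}"
    printf '%s\n' "--values" "${dir}/values.yaml"
    # Add the override file only if one exists for this environment.
    if [[ -f "${dir}/values-${env}.yaml" ]]; then
        printf '%s\n' "--values" "${dir}/values-${env}.yaml"
    fi
}
```

The output could be collected with `mapfile -t args < <(helm_values_args "$ENVIRONMENT" ./values)` and appended to the existing helm invocation as `"${args[@]}"`, replacing most of the per-value --set flags.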
LGTM
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kovayur, ludydoo, porridge. The full list of commands accepted by this bot can be found here. The pull request process is described here.
New changes are detected. LGTM label has been removed.
Yes, I've checked the current fleetshard-sync memory and CPU usage in Prometheus. The actual current resource usage is very low: it does not really go north of 128Mi of memory or 0.1 CPU. However, I assumed that for a production workload we should give it a bit more oomph, just to be on the safe side and to avoid running into resource problems, since it's a pretty important part of the puzzle.
/retest
🤦‍♂️ not sure how I missed this. Thanks for restating your reasoning, @ludydoo
@@ -56,6 +64,10 @@ case $ENVIRONMENT in
OBSERVABILITY_OPERATOR_VERSION="v4.0.4"
OPERATOR_USE_UPSTREAM="true"
OPERATOR_VERSION="v4.0.0"
FLEETSHARD_SYNC_CPU_REQUEST="${FLEETSHARD_SYNC_CPU_REQUEST:-"1000m"}"
IMO setting requests equal to limits is not as desirable for CPU as it is for memory. The impact of CPU overcommit (slowness) is generally not as severe as that of memory overcommit (evictions), and setting CPU requests too high can more easily lead to scheduling issues even when the actual CPU usage on the cluster is relatively low.
I've updated the CPU request to be lower, but kept the memory requests == limits
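The resulting shape (CPU request below the CPU limit, memory request equal to the memory limit) could be sketched as follows. Only FLEETSHARD_SYNC_CPU_REQUEST appears in the diff above; the other variable names, every numeric value, the function name, and the helm keys under fleetshardSync.resources are assumptions made for illustration, not the values chosen in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: all values, the helm keys, and every variable except
# FLEETSHARD_SYNC_CPU_REQUEST are illustrative assumptions.

# Each default can be overridden from the caller's environment.
FLEETSHARD_SYNC_CPU_REQUEST="${FLEETSHARD_SYNC_CPU_REQUEST:-200m}"
FLEETSHARD_SYNC_CPU_LIMIT="${FLEETSHARD_SYNC_CPU_LIMIT:-1000m}"
FLEETSHARD_SYNC_MEMORY_LIMIT="${FLEETSHARD_SYNC_MEMORY_LIMIT:-1024Mi}"
# The memory request defaults to the memory limit, so requests == limits.
FLEETSHARD_SYNC_MEMORY_REQUEST="${FLEETSHARD_SYNC_MEMORY_REQUEST:-$FLEETSHARD_SYNC_MEMORY_LIMIT}"

# Print the helm --set arguments for the resource settings, one per line.
fleetshard_sync_resource_args() {
    printf '%s\n' \
        "--set" "fleetshardSync.resources.requests.cpu=${FLEETSHARD_SYNC_CPU_REQUEST}" \
        "--set" "fleetshardSync.resources.limits.cpu=${FLEETSHARD_SYNC_CPU_LIMIT}" \
        "--set" "fleetshardSync.resources.requests.memory=${FLEETSHARD_SYNC_MEMORY_REQUEST}" \
        "--set" "fleetshardSync.resources.limits.memory=${FLEETSHARD_SYNC_MEMORY_LIMIT}"
}
```

Keeping the CPU request low leaves the scheduler more room to place the pod, while the higher CPU limit still allows bursts; the matching memory request and limit avoid memory overcommit for this component.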
Description
As part of the app-sre onboarding requirements, resource requests and limits must be set for all components of the ACSCS service. This PR adds resource requests and limits for the fleetshard-sync component.
It uses sensible values derived from the available metrics.
Checklist (Definition of Done)
Unit and integration tests added
Added test description under Test manual
ROX-12345: ...
Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
Add secret to app-interface Vault or Secrets Manager if necessary