ROX-15980 set resource requests and limits to the egress-proxy #991

Merged: ludydoo merged 9 commits into main from ROX-15980-egress-proxy-resources-requests-and-limits on May 3, 2023

Collaborator

@ludydoo ludydoo commented Apr 27, 2023

Sets the resource requests and limits for the egress-proxy.

The values are derived from metrics observed in Prometheus/Grafana. These defaults are generous relative to the observed usage, but I was not comfortable setting anything lower for a production deployment.

It also seems that the values.yaml file was being ignored for the tenant-resources. This PR changes that: the values in values.yaml are now used as defaults, and overrides are applied on top of them.
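
To illustrate the defaults-plus-overrides behaviour, here is a minimal sketch. The `resources` key under `egressProxy` and the override shown are assumptions made for illustration and are not taken from the chart. The default numbers are the ones that appear in the review thread of this PR.

```yaml
# Defaults, as they might appear in values.yaml (key layout is assumed).
egressProxy:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 256Mi
---
# Hypothetical override: only the keys that differ from the defaults are listed.
egressProxy:
  resources:
    limits:
      memory: 512Mi
---
# Effective values, assuming overrides are deep-merged over the defaults.
egressProxy:
  replicas: 1
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 200m
      memory: 512Mi
```

With a deep merge, an override only has to name the keys it changes, and everything else falls back to values.yaml.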

The current cluster configuration deploys 1 replica of the egress-proxy, so I've changed the value to reflect that as well.

@ludydoo ludydoo temporarily deployed to development April 27, 2023 10:59 — with GitHub Actions Inactive
@ludydoo ludydoo requested review from porridge and kylape April 27, 2023 10:59
@ludydoo ludydoo temporarily deployed to development April 27, 2023 11:02 — with GitHub Actions Inactive
@ludydoo ludydoo requested a review from kurlov April 27, 2023 12:49
@ludydoo ludydoo temporarily deployed to development April 27, 2023 12:50 — with GitHub Actions Inactive
@ludydoo ludydoo temporarily deployed to development April 27, 2023 12:53 — with GitHub Actions Inactive
@ludydoo ludydoo temporarily deployed to development April 27, 2023 12:54 — with GitHub Actions Inactive
limits:
  cpu: 100m
  memory: 128Mi
requests:
Contributor

I think you got the request and limit values backwards.

Collaborator Author

🤦
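
For reference, a sketch of the intended shape: requests carry the lower amounts the scheduler reserves, and limits carry the higher ceiling. Kubernetes rejects a container whose request exceeds its limit. The numbers are the ones visible in the later revision of this PR. The enclosing `resources` key is an assumption about where the block sits.

```yaml
resources:
  requests:          # amount the scheduler reserves for the container
    cpu: 100m
    memory: 128Mi
  limits:            # ceiling enforced at runtime
    cpu: 200m
    memory: 256Mi
```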

@ludydoo ludydoo temporarily deployed to development April 27, 2023 13:36 — with GitHub Actions Inactive
@ludydoo ludydoo requested a review from kylape April 27, 2023 13:43
Collaborator Author

ludydoo commented Apr 27, 2023

/retest

@ludydoo ludydoo temporarily deployed to development April 27, 2023 16:11 — with GitHub Actions Inactive
 egressProxy:
   image: ubuntu/squid:5.2-22.04_beta
-  replicas: 2
+  replicas: 1
Contributor

This doesn't need to be fixed in this PR, but it does make me wonder whether we should actually run two replicas in the cloud service, for all the usual reasons.

Collaborator

Or even 3.

Collaborator Author

@kylape @porridge perhaps with a pod anti-affinity against other egress proxies (preferredDuringSchedulingIgnoredDuringExecution)?
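
A sketch of what that suggestion could look like. Kubernetes expresses "keep away from other egress proxies" as a pod anti-affinity with a per-node topology key. The `app: egress-proxy` label is a hypothetical selector, not taken from the chart, and the weight is an arbitrary illustrative value.

```yaml
# Pod spec fragment (label and weight are assumptions).
affinity:
  podAntiAffinity:
    # "preferred" is a soft rule: the scheduler tries to spread replicas
    # across nodes but still schedules the pod if it cannot.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: egress-proxy
          topologyKey: kubernetes.io/hostname
```

The soft form is the safer choice on small clusters, since a required rule would leave replicas pending whenever there are fewer eligible nodes than replicas.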

Contributor

@kylape kylape left a comment

lgtm!

Comment on lines +6 to +10
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
Collaborator

How about we use requests == limits, at least until we have more replicas? We'll likely be setting resources for a bunch of pods in the next rollouts (see sibling PRs), so the risk of evictions will be higher than normal. With only one replica, I'm a bit worried this may cause service degradation for non-trivial central configurations.
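
A sketch of the requests == limits variant. When every container in a pod has CPU and memory requests equal to its limits, Kubernetes assigns the pod the Guaranteed QoS class, which makes it the least likely to be evicted under node pressure. The numbers below are illustrative only, since the thread does not say which values would be used.

```yaml
resources:
  requests:
    cpu: 200m        # equal to the limit
    memory: 256Mi    # equal to the limit
  limits:
    cpu: 200m
    memory: 256Mi
```

The trade-off is that the full amount is reserved on the node even when the proxy is idle.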

Collaborator Author

I can add the replicas in this PR perhaps?

Collaborator

I wouldn't mind.

@openshift-ci openshift-ci bot removed the lgtm label May 2, 2023
@ludydoo ludydoo temporarily deployed to development May 2, 2023 09:30 — with GitHub Actions Inactive
@ludydoo ludydoo requested review from porridge and kylape May 2, 2023 09:30
@openshift-ci openshift-ci bot added the lgtm label May 3, 2023
Contributor

openshift-ci bot commented May 3, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kurlov, kylape, ludydoo, porridge

@ludydoo ludydoo merged commit aad51d1 into main May 3, 2023
@ludydoo ludydoo deleted the ROX-15980-egress-proxy-resources-requests-and-limits branch May 3, 2023 13:26