Raise k8s-infra-prow-build quotas in anticipation of handling merge-blocking jobs #1132
Submitted requests for:
Well, the 1024 CPU request went through just fine. The 100 in-use IPs...
So, I'll hold this open and see what comes back in two days. Quota is 69 in-use IP addresses until then.
Part of kubernetes/test-infra#18550
/priority critical-urgent
I have repeatedly tried to file for 100 in-use IPs and been rejected every time. We bumped into IP quota yesterday when autoscaling to handle PR traffic. I'm escalating because, in the grand scheme of things, our PR load looks pretty low, and I anticipate we will bump into this more once we see real traffic (opening up for v1.20). There are some things we can do to work around or address this:
It would be really nice to be able to just raise our quota
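As a point of reference, current usage against the quotas discussed here can be checked from the command line. A minimal sketch, assuming gcloud access to the k8s-infra-prow-build project and jq installed:

```sh
# Show usage vs. limit for the CPU, SSD, and in-use address quotas
# in us-central1 for the build cluster's project.
gcloud compute regions describe us-central1 \
  --project=k8s-infra-prow-build \
  --format=json \
  | jq '.quotas[] | select(.metric == "CPUS"
                        or .metric == "SSD_TOTAL_GB"
                        or .metric == "IN_USE_ADDRESSES")'
```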
I will see what else I can learn internally, but as to the mitigations, I think some should be considered:
- 16 core is a good sweet spot; I think we should try it.
- We should try this (slowly) - we don't want to be wasteful.
- I don't see why we would not do this anyway, just for sanity in case of failure.
- How would this affect the quota?
- I think this is the real solution. I don't think we really need each node to have an IP anyway? (see the sketch after this list)
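For the "nodes don't each need an IP" option, the usual GKE pattern is a private cluster plus Cloud NAT: nodes get no external IPs, and only the NAT's small address pool counts against the in-use address quota. A rough sketch, not the actual plan; the cluster, router, and NAT names and the master CIDR below are made up for illustration, and the network is assumed to be `default`:

```sh
# Private nodes: no per-node external IPs, so nodes stop consuming
# the IN_USE_ADDRESSES quota.
gcloud container clusters create prow-build-private \
  --project=k8s-infra-prow-build \
  --region=us-central1 \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.32/28 \
  --machine-type=n1-highmem-8 \
  --num-nodes=6

# Outbound internet access for the private nodes goes through Cloud NAT,
# which draws from a small shared pool of external IPs.
gcloud compute routers create prow-nat-router \
  --project=k8s-infra-prow-build \
  --region=us-central1 \
  --network=default
gcloud compute routers nats create prow-nat \
  --project=k8s-infra-prow-build \
  --region=us-central1 \
  --router=prow-nat-router \
  --auto-allocate-nat-external-ips \
  --nat-all-subnet-ip-ranges
```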
Submitted request for 40960GB SSD in us-central1 (quota claims we were hitting our 20480GB limit), which was approved.
I'll see if I can set up a pool2 node pool on the existing cluster and shift things over during some quiet time.
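Roughly what adding a pool2 and shifting work over looks like; this is a sketch only, assuming the cluster is named `prow-build`, the existing pool is `pool1`, and using the 16-core machine type discussed above (the autoscaling bounds and disk settings are illustrative, not the real values):

```sh
# Add a second node pool with larger machines alongside the existing pool1.
gcloud container node-pools create pool2 \
  --project=k8s-infra-prow-build \
  --cluster=prow-build \
  --region=us-central1 \
  --machine-type=n1-highmem-16 \
  --enable-autoscaling --min-nodes=3 --max-nodes=15 \
  --disk-type=pd-ssd --disk-size=250

# During a quiet period, cordon and drain the pool1 nodes so pods
# reschedule onto pool2 and the autoscaler shrinks pool1.
for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=pool1 \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl cordon "$node"
  kubectl drain "$node" --ignore-daemonsets --delete-local-data
done
```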
Tried asking for 100 IPs in us-west1 and us-east1; both rejected.
I agree. I just anticipate it could bump into the most unknowns along the way, and my bandwidth is currently limited.
@spiffxp do we need to move some jobs out of that cluster while waiting?
Maybe we can't get 100 IPs in each, but could we spread the load between regions, so we get 50 in each?
This is basically the "set up more small build clusters" option, since each region would need its own regional build cluster anyway. It avoids setting up a new GCP project for each cluster, though. I'll look into it; we might be able to split up jobs in a way that makes sense.
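On the GCP side, the split would look something like the sketch below; the cluster name and sizes are placeholders. The prow side additionally needs the new cluster's credentials registered with prow and the moved jobs pointed at it in their job config, which isn't shown here:

```sh
# A second, smaller build cluster in another region, so quota demand is
# spread out instead of concentrated in us-central1.
gcloud container clusters create prow-build-east \
  --project=k8s-infra-prow-build \
  --region=us-east1 \
  --machine-type=n1-highmem-8 \
  --enable-autoscaling --min-nodes=2 --max-nodes=10 \
  --num-nodes=2

# Fetch credentials so the new cluster can be registered as an
# additional prow build cluster.
gcloud container clusters get-credentials prow-build-east \
  --project=k8s-infra-prow-build \
  --region=us-east1
```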
@ZhiFeng1993
I would like to hold off on moving things away from community-accessible infra for now. Flipping back to k8s-prow-builds is a pretty quick change if we decide we have to move quickly and/or are out of options.
I tried raising CPU and SSD quota in us-west1 to be able to create an equivalently sized build cluster in the same k8s-infra-prow-build project over there. Both requests were automatically rejected.
There is suspicion that moving to n1-highmem-16s has actually increased flakiness, specifically for these jobs, across release branches:
I have opened #1172 to start rolling back.
Opened #1173 to track the rollback.
I was able to raise CPU quota in us-east1 to 1024, but was rejected for SSD and IP quota requests. Next step would be to try raising quotas for a different GCP project, in case k8s-infra-prow-build has gotten flagged for some reason.
OK, quota changes came through (thank you @thockin). I'm feeling better about our immediate capacity requirements being met in us-central1:
So now we'll at least be able to bump into our autoscaling limits.
/remove-priority critical-urgent
I have broken this out into its own issue: #1178
Quotas for us-central1 are now at:
Based on how things have been behaving today with v1.20 merges, I'm comfortable calling this done. We can open further issues as our needs evolve.
/close
@spiffxp: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The node pool is currently set up as 3 * (6 to 30) n1-highmem-8s. We don't have enough quota to hit max node pool size.
In terms of resources, we need at least:
- 3 * 30 * 8 = 720 CPUs
- 3 * 30 * 250 = 22500 Gi SSD capacity
- 3 * 30 = 90 in-use IP addresses
If we want to match the size of the k8s-prow-builds cluster, which has 160 nodes, we should ask for more.
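The same arithmetic as a quick shell check, using the per-node figures from the list above (8 CPUs, 250 SSD, and one external IP per n1-highmem-8 node, across 3 zones of up to 30 nodes each):

```sh
# 3 zones * up to 30 nodes per zone, n1-highmem-8 machines.
echo "CPUs:       $((3 * 30 * 8))"    # 720
echo "SSD (Gi):   $((3 * 30 * 250))"  # 22500
echo "In-use IPs: $((3 * 30))"        # 90
```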
/wg k8s-infra
/area prow