Raise k8s-infra-prow-build cluster nodepool max size #1231
Given that the build cluster we are supposed to be replacing is 160 nodes (plus whatever capacity RBE was offering), and we still have some critical kubernetes/kubernetes jobs to move over, I think we should raise the nodepool max size from 90 (3x30) to 150 (3x50).
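For reference, a minimal sketch of how such a bump could be applied with gcloud, assuming the pool is updated in place (the cluster and pool names below, and whether this change actually goes through gcloud or through this repo's config tooling, are assumptions):

```sh
# Raise the autoscaler ceiling on the build cluster's node pool.
# On a regional cluster, --max-nodes is per zone, so 50 per zone
# across 3 zones gives the 150-node total discussed above.
# Cluster and pool names here are hypothetical.
gcloud container clusters update prow-build \
  --project=k8s-infra-prow-build \
  --region=us-central1 \
  --node-pool=pool-default \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=50
```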
/assign

Part of kubernetes/test-infra#18550; we need more capacity to feel confident we've got room for the rest of the jobs being migrated over.
/area prow
Other quotas may need to be bumped to accommodate this.

Per https://console.cloud.google.com/iam-admin/quotas?project=k8s-infra-prow-build, quotas for us-central1 are now at: (screenshot of current quota values)

The IPs could stand to be raised. The others we may want to raise if we want to try more CPU, or more SSD for increased IOPS.
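As a sketch, the same numbers the console page shows can also be pulled from the CLI; the metric names below are the standard Compute Engine regional quota metrics:

```sh
# Show usage vs. limit for the quotas most relevant to this bump:
# CPUs, in-use external IPs, and SSD persistent disk capacity.
gcloud compute regions describe us-central1 \
  --project=k8s-infra-prow-build --format=json |
  jq -r '.quotas[]
    | select(.metric == "CPUS"
          or .metric == "IN_USE_ADDRESSES"
          or .metric == "SSD_TOTAL_GB")
    | "\(.metric): \(.usage) / \(.limit)"'
```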
Quotas for us-central1 are now: (screenshot of updated quota values)
/close
@spiffxp: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Well, it took about two weeks from the last "let's wait and see" comment (ref: #1132 (comment))
This is a regional cluster spread across 3 zones, with a nodepool that will autoscale up to 30 nodes per zone. So, 90 nodes total. We've hit that limit a few times now.
(Metrics explorer screenshot: VM instance uptime (count) over the last month, showing the node count repeatedly hitting the 90-node ceiling)

Let's zoom in on some of those peaks:

(zoomed screenshots of the same metric around those peaks)
I'm not sure if there is an alert or a log line I could search for to show exactly how often this occurs, but it's happening. I think this results in more jobs hitting "error" state, since their pods can't find anywhere to schedule.
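One after-the-fact way to look (a sketch, assuming kubectl access to the build cluster) is to search for the events the scheduler and cluster autoscaler emit when pods can't be placed; note that events age out quickly, which is part of why a proper alert would be nicer:

```sh
# Pods the scheduler could not place anywhere...
kubectl get events --all-namespaces \
  --field-selector reason=FailedScheduling

# ...and scale-ups the cluster autoscaler declined (e.g. at max size).
kubectl get events --all-namespaces \
  --field-selector reason=NotTriggerScaleUp
```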
The plank graph for the past two weeks shows a few increases in jobs hitting error state (noon Thursday is pretty prominent), but nothing catastrophic. Again, though, this is based on CRs, not discrete events in time, so it's unclear to me how many ProwJob CRs are being added/removed at any given time.
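For a point-in-time view to complement the graph, counting ProwJob CRs by state directly is one option (a sketch, run against whichever cluster holds the ProwJob CRs):

```sh
# Tally ProwJob custom resources by their current state
# (triggered, pending, success, failure, error, aborted).
kubectl get prowjobs --all-namespaces -o json |
  jq -r '.items[].status.state' | sort | uniq -c
```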