Migrate k8s-infra-prow-build to a nodepool with more IOPS #1187
/area prow
Opened #1186 to start migrating to the first option (14% more cost for ~100% more IOPS).
New nodepool is up, old nodepool cordoned
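For reference, cordoning the old pool just marks each of its nodes unschedulable. A minimal sketch of the equivalent using the Kubernetes Python client (the node pool label value here is an assumption; `kubectl cordon` on each node does the same thing):

```python
from kubernetes import client, config

# Load kubeconfig for the build cluster (assumes credentials are already set up)
config.load_kube_config()
v1 = client.CoreV1Api()

# GKE labels each node with its node pool; the pool name here is illustrative
old_pool_selector = "cloud.google.com/gke-nodepool=pool3"

for node in v1.list_node(label_selector=old_pool_selector).items:
    # Setting spec.unschedulable is what `kubectl cordon <node>` does under the hood
    v1.patch_node(node.metadata.name, {"spec": {"unschedulable": True}})
    print(f"cordoned {node.metadata.name}")
```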
And I forgot to disable autoscaling for pool3 until just now
Deleted boskos
Waiting on the following to finish up
Removed old nodepool with #1188
Holding this open to see what impact, if any, this has on the graphs shown in the description.
No real change in the graphs, other than a reflection of PR traffic. This certainly didn't make things worse and isn't urgently more expensive, so I'm not inclined to roll back at the moment.

Supposedly a PER_GB throttle reason means the fix is to increase the persistent disk size (ref: https://cloud.google.com/compute/docs/disks/review-disk-metrics#throttling_metrics). One option would be to increase disk size to the next "tier" and see what happens. But I think I'd like to do a little more reading and focused testing to understand what's going on, and what options we have.
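For anyone who wants to poke at the same data outside the console, here's a minimal sketch using the Cloud Monitoring Python client to pull the throttled-write-bytes metric from that doc. The metric type is taken from the doc above; my recollection is that a `throttle_reason` label (e.g. PER_GB vs PER_VM) distinguishes why a disk was throttled, but treat that as an assumption rather than a verified schema:

```python
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/k8s-infra-prow-build"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

# Throttled write bytes over the last hour, per instance/device
results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "compute.googleapis.com/instance/disk/throttled_write_bytes_count"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # metric.labels should include the throttle reason (assumption, see above)
    print(series.resource.labels["instance_id"], dict(series.metric.labels))
```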
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten

It's possible (under pre-GA terms) to create node pools with Local SSD as of 1.18. We'd need to upgrade the cluster to that version first.
I'm really interested in seeing this happen, but I can't guarantee I'll have bandwidth this cycle, so I'm leaving this out of the milestone. Gated on migrating the cluster to 1.18.
/priority backlog
Provisioning a nodepool with local SSDs. I'll try cutting over some canary presubmits to see what the behavior is.
kubernetes/test-infra#23783 will cut over:
Which are all manually triggered.
After an initial round of canary jobs against a single PR, I have kicked off the canary jobs against a handful of arbitrary kubernetes/kubernetes PRs to trigger autoscaling and evaluate node disk usage under some level of concurrency / load.

https://console.cloud.google.com/monitoring/dashboards/builder/f0163540-a8b7-4618-8308-66652d3d4794?project=k8s-infra-prow-build&dashboardBuilderState=%257B%2522editModeEnabled%2522:false%257D&timeDomain=1h is the dashboard I'm using to watch the pot boil. The old pool is on the left (pool4), the new pool is on the right (pool5).

By default, Google Cloud Monitoring doesn't appear to let me manually set the Y-axis scales, so I added an arbitrary threshold to each graph to give them the same scale. You can see we're experiencing way less throttling with the new pool.
Need to cost out and estimate quota before rolling this out more generally. The numbers look good enough that I'm interested in doing so.

However... I legitimately can't tell that there's any immediately obvious speedup from doing this. I'll let the other jobs finish and take a look at PR history for a quick check tomorrow.

The only other thing I can think this might allow is lowering some CPU/memory resource limits to pack jobs more densely, if they're in fact not going to be as noisy to each other. That will probably require more attention than I have time for right now.
https://cloud.google.com/compute/disks-image-pricing#localssdpricing - Local SSDs are $30/mo, so x2 = $60/mo
https://cloud.google.com/compute/vm-instance-pricing - n1-highmem-8 instances are ~$241/mo
pool4 instances are n1-highmem-8 + 500GB pd-ssd = 241 + 85 = $326/mo

That's about 7% savings... we could bring that to 16% if we used only 1 local SSD.

Looking at our total spend just for k8s-infra-prow-build over the last year, it was ~$258K. 7% savings would be ~$18K, 16% ~$40K. Not nothing, but not incredibly significant compared against our total budget.

Quota:
Conclusions:
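To restate the cost arithmetic above in one place (all prices are the rough monthly figures quoted in that comment, not authoritative pricing):

```python
# Rough per-node monthly costs in $/mo, from the comment above
N1_HIGHMEM_8 = 241       # machine type, per the VM pricing page figure quoted above
PD_SSD_500GB = 85        # 500 GB pd-ssd at ~$0.17/GB/mo
LOCAL_SSD = 30           # per local SSD (each is 375 GB)

pool4 = N1_HIGHMEM_8 + PD_SSD_500GB             # current config: 326
with_2_local_ssd = N1_HIGHMEM_8 + 2 * LOCAL_SSD
with_1_local_ssd = N1_HIGHMEM_8 + 1 * LOCAL_SSD

annual_spend = 258_000   # ~last year's k8s-infra-prow-build spend

for label, cost in [("2x local SSD", with_2_local_ssd), ("1x local SSD", with_1_local_ssd)]:
    savings = 1 - cost / pool4
    print(f"{label}: ${cost}/node/mo, ~{savings:.0%} savings, ~${annual_spend * savings:,.0f}/yr")
# 2x local SSD: $301/node/mo, ~8% savings, ~$19,785/yr
# 1x local SSD: $271/node/mo, ~17% savings, ~$43,528/yr
```

(The comment above rounds these down slightly; same ballpark either way.)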
Looks like I'm going to have to force recreation of the node pool to drop the taint
Out of curiosity, why 2 local SSDs?
Why not fewer:
Why not more:
OK, migration to the new nodepool with local SSDs for ephemeral storage is complete, see #2839 for details. Throttled I/O is way down post-migration: throttled bytes from old pool nodes are on the left, new pool nodes on the right.
I'll hold this open for a day to see if this had any negative impact, but I otherwise consider this issue closed.

It'll take a bit to determine whether this has had any impact on job / build time. Again, my guess based on a brief survey of the canary jobs from yesterday is negligible impact at best, but hopefully fewer noisy neighbors. We lack a great way to display this data at present, though I suspect the data will be available in some form in BigQuery.
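If we do end up checking in BigQuery, something like this is the shape of it. This is only a sketch: the kettle dataset name and the `started` / `elapsed` columns are what I remember of that schema and haven't been re-verified, and the cutover date is a placeholder:

```python
from google.cloud import bigquery

CUTOVER = "2021-10-01"   # placeholder; substitute the actual cutover date
WINDOW_DAYS = 30         # compare this many days on each side of the cutover

client = bigquery.Client(project="k8s-infra-prow-build")

# Average duration of one presubmit before vs. after the cutover (assumed schema)
query = f"""
SELECT
  IF(started < UNIX_SECONDS(TIMESTAMP('{CUTOVER}')), 'before', 'after') AS period,
  COUNT(*) AS runs,
  ROUND(AVG(elapsed) / 60, 1) AS avg_minutes
FROM `k8s-gubernator.build.all`
WHERE job = 'pull-kubernetes-e2e-gce'
  AND ABS(started - UNIX_SECONDS(TIMESTAMP('{CUTOVER}'))) < {WINDOW_DAYS} * 86400
GROUP BY period
"""

for row in client.query(query).result():
    print(row.period, row.runs, row.avg_minutes)
```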
/close
@spiffxp: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
This is a followup to #1168 and #1173, made possible by quota changes done via #1132.
My goal is to make these graphs go down
Our jobs are hitting I/O limits (both IOPS and throughput). That was made extra clear last weekend when we switched to larger nodes, thus causing more jobs to share the same amount of I/O.
We're seeing more jobs scheduled into the cluster now that v1.20 PRs are being merged. While our worst case node performance is about the same, we are seeing more throttling across the cluster in aggregate.
Kubernetes doesn't give us a way to provision I/O, so we're left optimizing per-node performance. Based on https://cloud.google.com/compute/docs/disks/performance I think we can get just under 2x the IOPS for a ~14% increase in cluster cost.
From there, going to the next tier would require a 90% increase in cost for only 66% more performance.
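To make the tradeoff above concrete, here's a minimal sketch of how pd-ssd performance and price scale with size. The 30 IOPS/GB and ~$0.17/GB/mo figures are my reading of the performance and pricing docs, and per-instance caps (which depend on machine type) aren't modeled, so treat the output as illustrative:

```python
# pd-ssd performance scales linearly with provisioned size, up to per-instance
# caps that depend on the machine type (not modeled here)
IOPS_PER_GB = 30           # read/write IOPS per GB for pd-ssd (per the performance doc)
PRICE_PER_GB_MONTH = 0.17  # $/GB/mo for pd-ssd (per the pricing doc)

def pd_ssd(size_gb: int) -> tuple[int, float]:
    """Return (uncapped IOPS, $/mo) for a pd-ssd of the given size."""
    return size_gb * IOPS_PER_GB, size_gb * PRICE_PER_GB_MONTH

for size_gb in (250, 500, 1000):
    iops, cost = pd_ssd(size_gb)
    print(f"{size_gb:>4} GB pd-ssd: ~{iops:>6,} IOPS, ~${cost:.0f}/mo")
#  250 GB pd-ssd: ~ 7,500 IOPS, ~$42/mo
#  500 GB pd-ssd: ~15,000 IOPS, ~$85/mo
# 1000 GB pd-ssd: ~30,000 IOPS, ~$170/mo
```

Doubling the disk roughly doubles its IOPS while only the disk portion of the per-node price grows, which is roughly where the "~14% more cluster cost" figure above comes from.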
The most ideal thing would be local SSD, but: could we replace emptyDir volumes with hostPath volumes? That sounds like a maintenance nightmare.