build: expose knob to disable GC assist #115585

Merged Dec 6, 2023 (1 commit)

nvanbenschoten
Member

Fixes #115584.

This commit updates our patched Go runtime to expose a knob to disable the runtime's GC assist mechanism. The knob is exposed under the GODEBUG environment variable, and can be accessed with GODEBUG=gcnoassist=1.

For now, this is just meant for experimentation purposes. It will assist us as we look to reduce the impact the GC has on tail latency.

Release note: None
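
For anyone who wants to confirm from inside a process that the knob was actually passed, here is a minimal sketch. It assumes only that `GODEBUG` is a comma-separated list of name=value pairs. The helper name `godebugSetting` is hypothetical, and this is not how the patched runtime itself reads the setting.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// godebugSetting returns the value of key in a GODEBUG-style string
// (comma-separated name=value pairs) and whether the key was present.
// In this sketch the last occurrence wins if the key is repeated.
// Hypothetical helper for illustration; not part of the Go runtime.
func godebugSetting(godebug, key string) (string, bool) {
	value, found := "", false
	for _, field := range strings.Split(godebug, ",") {
		name, v, ok := strings.Cut(strings.TrimSpace(field), "=")
		if ok && name == key {
			value, found = v, true
		}
	}
	return value, found
}

func main() {
	v, ok := godebugSetting(os.Getenv("GODEBUG"), "gcnoassist")
	fmt.Printf("gcnoassist set=%t value=%q\n", ok, v)
}
```

Running this with `GODEBUG=gcnoassist=1` in the environment should report the setting as present. It only inspects the environment variable and says nothing about whether the runtime in use honors the knob.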

@nvanbenschoten
Member Author

nvanbenschoten commented Dec 5, 2023

There's more to do here before we can merge this. Hoping to get agreement on proceeding with the approach first.

  • Adjust the Pebble tests to run with the new version.
  • Adjust version in the TeamCity agent image (setup script)
  • Update build/teamcity/internal/release/build-and-publish-patched-go/impl.sh with the new version and adjust SHA256 sums as necessary.
  • Adjust GO_VERSION and GO_FIPS_COMMIT for the FIPS Go toolchain (source).
  • Run the Internal / Cockroach / Build / Toolchains / Publish Patched Go for Mac build configuration in TeamCity with your latest version of the script above. Note the job depends on another job Build and Publish Patched Go. That job prints out the SHA256 of all tarballs, which you will need to copy-paste into WORKSPACE (see below). Publish Patched Go for Mac is an extra step that publishes the signed go binaries for macOS. That job also prints out the SHA256 of the Mac tarballs in particular.
  • Adjust --@io_bazel_rules_go//go/toolchain:sdk_version in .bazelrc.
  • Bump the version in WORKSPACE under go_download_sdk. You may need to bump rules_go. Also edit the filenames listed in sdks and update all the hashes to match what you built in the step above.
  • Bump the version in WORKSPACE under go_download_sdk for the FIPS version of Go (go_sdk_fips).
  • Run ./dev generate bazel to refresh distdir_files.bzl, then bazel fetch @distdir//:archives to ensure you've updated all hashes to the correct value.
  • Bump the go version in go.mod.
  • Bump the default installed version of Go in bootstrap-debian.sh (source).
  • Replace other mentions of the older version of go (grep for golang:<old_version> and go<old_version>).
  • Ask the Developer Infrastructure team to deploy new TeamCity agent images according to packer/README.md

Contributor

@erikgrinaker erikgrinaker left a comment

Looks reasonable at a high level, haven't looked into GC code in depth. Curious to see how it flies.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/gcNoAssist branch from 060870c to 03b397e Compare December 5, 2023 17:49
@nvanbenschoten
Member Author

Some basic experimentation with kv95/1kb:

roachprod create nathan-gc -n4 --gce-machine-type='n2-standard-16' --local-ssd=false
roachprod put    nathan-gc artifacts/cockroach

# run with default Go GC.
roachprod start  nathan-gc:1-3
roachprod run    nathan-gc:4 -- './cockroach workload init kv --splits=15 {pgurl:1}'
roachprod run    nathan-gc:4 -- './cockroach workload run kv --read-percent=95 --min-block-bytes=1024 --max-block-bytes=1024 --concurrency=512 --max-rate=30000 --duration=3m --ramp=30s {pgurl:1-3}'

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        5130630        28503.5      0.9      0.9      1.3      2.9     60.8  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0         269582         1497.7      2.5      2.2      4.2      7.1     65.0  write


# run without Go GC assist.
roachprod stop   nathan-gc
roachprod start  nathan-gc:1-3 --env=GODEBUG=gcnoassist=1
roachprod run    nathan-gc:4 -- './cockroach workload run kv --read-percent=95 --min-block-bytes=1024 --max-block-bytes=1024 --concurrency=512 --max-rate=30000 --duration=3m --ramp=30s {pgurl:1-3}'

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        5129567        28497.4      0.8      0.9      1.2      1.7     27.3  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0         270662         1503.7      2.2      2.2      2.9      4.1     23.1  write


# run without Go GC assist and with a reduced GC rate (GOGC=800).
roachprod stop   nathan-gc
roachprod start  nathan-gc:1-3 --env=GODEBUG=gcnoassist=1 --env=GOGC=800
roachprod run    nathan-gc:4 -- './cockroach workload run kv --read-percent=95 --min-block-bytes=1024 --max-block-bytes=1024 --concurrency=512 --max-rate=30000 --duration=3m --ramp=30s {pgurl:1-3}'

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        5130241        28501.3      0.9      0.9      1.2      1.6     60.8  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0         269844         1499.1      2.2      2.2      2.8      3.8     56.6  write


# run with Go GC assist and a reduced GC rate (GOGC=800).
roachprod stop   nathan-gc
roachprod start  nathan-gc:1-3 --env=GOGC=800
roachprod run    nathan-gc:4 -- './cockroach workload run kv --read-percent=95 --min-block-bytes=1024 --max-block-bytes=1024 --concurrency=512 --max-rate=30000 --duration=3m --ramp=30s {pgurl:1-3}'

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        5128790        28493.3      0.9      0.9      1.2      1.7     62.9  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0         271302         1507.2      2.4      2.2      3.8      5.2     56.6  write

@erikgrinaker
Contributor

Wow, nice tail latencies.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/gcNoAssist branch from 03b397e to 06bc890 Compare December 5, 2023 22:43
@nvanbenschoten
Member Author

TFTR! Merging since @rail is planning to bump the go version to go1.21.5 later today.

bors r+

@craig
Contributor

craig bot commented Dec 6, 2023

Build succeeded:

@craig craig bot merged commit d8f6bfd into cockroachdb:master Dec 6, 2023
9 checks passed
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/gcNoAssist branch December 11, 2023 19:22
Development

Successfully merging this pull request may close these issues.

runtime: expose knob to disable GC assist
4 participants