Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvflowcontrol: annotate/fix perf regressions #109833

Merged
merged 1 commit into from
Sep 6, 2023

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Aug 31, 2023

  • Replace the flow controller level mutex-backed kvflowcontrol.Stream => token bucket map with sync.Map. On kv0/enc=false/nodes=3/cpu=96 accessing this map contributed to a high amount of mutex contention. We observe that this bucket is effectively read-only - entries for keys are written once (on creation) and read frequently after. We don't currently GC these buckets, but even if we did, the same access pattern would hold. We'll note that using a sync.Map is slightly more expensive CPU-wise.
  • Replace various map accesses with individual variables. We were needly using maps to access one of two variables, keyed by work class, for example when maintaining metrics per work class, or tracking token adjustments. The map accesses appeared prominently in CPU profiles and was unnecessary overhead.
  • Avoid using log.ExpensiveLogEnabled in hot code paths; it shows up in CPU profiles.
  • Slightly reduce the surface area of kvflowhandle.Handle.mu when returning flow tokens.
  • We also annotate various other points in the code where peep-hole optimizations exist, as surfaced by kv0/enc=false/nodes=3/cpu=96.

Part of #104154.

Release note: None

@irfansharif irfansharif requested a review from a team as a code owner August 31, 2023 17:59
@irfansharif irfansharif marked this pull request as draft August 31, 2023 17:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif changed the title [wip] kvflowcontrol: annotate/fix perf regressions kvflowcontrol: annotate/fix perf regressions Sep 1, 2023
@irfansharif irfansharif force-pushed the 230831.kvflowcontrol-perf branch 2 times, most recently from fe6c9b3 to a59d5ed Compare September 1, 2023 17:40
@irfansharif irfansharif marked this pull request as ready for review September 1, 2023 17:40
@irfansharif irfansharif requested a review from a team September 1, 2023 17:40
- Replace the flow controller level mutex-backed kvflowcontrol.Stream =>
  token bucket map with sync.Map. On kv0/enc=false/nodes=3/cpu=96
  accessing this map contributed to a high amount of mutex contention.
  We observe that this bucket is effectively read-only - entries for
  keys are written once (on creation) and read frequently after. We
  don't currently GC these buckets, but even if we did, the same access
  pattern would hold. We'll note that using a sync.Map is slightly more
  expensive CPU-wise.
- Replace various map accesses with individual variables. We were needly
  using maps to access one of two variables, keyed by work class, for
  example when maintaining metrics per work class, or tracking token
  adjustments. The map accesses appeared prominently in CPU profiles and
  was unnecessary overhead.
- Avoid using log.ExpensiveLogEnabled in hot code paths; it shows up in
  CPU profiles.
- Slightly reduce the surface area of kvflowhandle.Handle.mu when
  returning flow tokens.
- We also annotate various other points in the code where peep-hole
  optimizations exist, as surfaced by kv0/enc=false/nodes=3/cpu=96.

Part of cockroachdb#104154.

Release note: None
@irfansharif irfansharif force-pushed the 230831.kvflowcontrol-perf branch from a59d5ed to e571ffe Compare September 1, 2023 21:48
Copy link
Collaborator

@aadityasondhi aadityasondhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@irfansharif
Copy link
Contributor Author

I'll bors but if there are any follow ups/need to revert, I'll do it then.

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 6, 2023

Build succeeded:

@craig craig bot merged commit 2b6e80e into cockroachdb:master Sep 6, 2023
@irfansharif irfansharif deleted the 230831.kvflowcontrol-perf branch September 6, 2023 20:22
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 72 at r1 (raw file):

		// or when completely inactive (no tokens deducted/returned over 30+
		// minutes), clear these out.
		buckets     sync.Map // kvflowcontrol.Stream => *bucket

could use a code comment that says writes require the mutex.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 113 at r1 (raw file):

			// NB: We're holding the controller mutex here, which guards against
			// new buckets being added, synchronization we don't get out of
			// sync.Map.Range() directly.

is this the only reason we need updates to the map to hold the mutex?


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 376 at r1 (raw file):

func (b *bucket) tokensLocked(wc admissionpb.WorkClass) kvflowcontrol.Tokens {
	if wc == regular {
		return b.mu.tokensPerWorkClass.regular

this would be simpler if tokensPerWorkClass was an array [NumWorkClasses]kvflowcontrol.Tokens.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 401 at r1 (raw file):

	defer b.mu.Unlock()
	// TODO(irfansharif,aaditya): On kv0/enc=false/nodes=3/cpu=96 this mutex is
	// responsible for ~1.8% of the mutex contention. Maybe address it as part

We could use some more context before deciding whether to try to optimize this.
For example https://cockroachlabs.slack.com/archives/C01SRKWGHG8/p1667521022234139?thread_ts=1667515074.752629&cid=C01SRKWGHG8


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 133 at r1 (raw file):

type metrics struct {
	ElasticFlowTokensDeducted    *metric.Counter

why this change of not using an array?
Is array indexing really a performance hit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants