Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: record batch requests with no gateway #85178

Merged
merged 1 commit into from
Aug 1, 2022

Conversation

kvoli
Copy link
Collaborator

@kvoli kvoli commented Jul 28, 2022

Previously, batch requests with no GatewayNodeID would not be
accounted for on the QPS of a replica. By extension, the store QPS would
also not aggregate this missing QPS over replicas it holds. This patch
introduces tracking for all requests, regardless of the GatewayNodeID.

This was done to as follow the workload lease transfers consider the
per-locality counts, therefore untagged localities were not useful. This
has since been updated to ignore filter out localities directly, so it
is not necessary to exclude them anymore.

leaseholderStats, which previously tracked the QPS, and writeStats
tracking the mvcc keys written, have also been removed. They are
duplicated in batchRequest and writeKeys respectively, within the
loadStats of a replica.

resolves #85157

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kvoli kvoli force-pushed the 220718.gateway-replstats branch 3 times, most recently from 04c3cef to eac47e9 Compare July 28, 2022 00:09
@kvoli kvoli changed the title kvserver: honor lease preferences in store balance kvserver: record batch requests with no gateway Jul 28, 2022
@kvoli kvoli force-pushed the 220718.gateway-replstats branch 5 times, most recently from fd17a2a to 04b43de Compare July 28, 2022 21:14
@kvoli kvoli marked this pull request as ready for review July 28, 2022 22:52
@kvoli kvoli requested a review from a team as a code owner July 28, 2022 22:52
@kvoli kvoli requested a review from lidorcarmel July 28, 2022 22:52
@kvoli kvoli self-assigned this Jul 28, 2022
Copy link
Contributor

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 18 of 18 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replica_application_state_machine.go line 1014 at r1 (raw file):

	// Record the write activity, passing a 0 nodeID because replica.writeStats
	// intentionally doesn't track the origin of the writes.
	b.r.loadStats.writeKeys.RecordCount(float64(b.mutations), 0)

is it guaranteed that loadStats cannot be nil here?

Code quote:

b.r.loadStats.writeKeys.RecordCount(float64(b.mutations), 0

pkg/kv/kvserver/replica_send.go line 1006 at r1 (raw file):

func (r *Replica) recordBatchRequestLoad(ctx context.Context, ba *roachpb.BatchRequest) {
	if r.loadStats == nil {
		log.Eventf(

when would this happen? only in tests maybe? if in prod, will it spam the log? if yes then either move this to be a level N log (vmodule) or log every N messages or something like that..

Code quote:

r.loadStats == nil {

pkg/kv/kvserver/replica_send.go line 1017 at r1 (raw file):

	// estimate of a BatchRequest. See getBatchRequestQPS() for the
	// calculation.
	scaledQPS := r.getBatchRequestQPS(ctx, ba)

nit: adjustedQPS maybe? when I see scale I think of other things, sorry :(

Code quote:

scaledQPS

pkg/kv/kvserver/replica_send.go line 1023 at r1 (raw file):

	// requests, for follow the workload lease transfers. We now record these
	// regardless of the gateway node id, as the only user of this information
	// does not consider untagged localities. See

I'm wondering whether this really has value here? definitely this should be in the PR/commit description, and it is, but do readers a year from now would need to know this info? feel free to keep it if you think it has any value..

Code quote:

	// NB: Previously, we would ignore batch requests which had no gateway node
	// id. This was in order to retrieve accurate per-locality counts of
	// requests, for follow the workload lease transfers. We now record these
	// regardless of the gateway node id, as the only user of this information
	// does not consider untagged localities. See
	// allocator.shouldTransferLeaseForAccessLocality() .

pkg/kv/kvserver/store_merge.go line 149 at r1 (raw file):

	if leftRepl.leaseholderStats != nil {
		leftRepl.leaseholderStats.ResetRequestCounts()

question: what does this change mean? that we used to reset qps stats here and now we don't how will it affect the behavior after merge? I'll try to look later.

Code quote:

	if leftRepl.leaseholderStats != nil {
		leftRepl.leaseholderStats.ResetRequestCounts()
	}
	if leftRepl.writeStats != nil {
		// Note: this could be drastically improved by adding a replicaStats method
		// that merges stats. Resetting stats is typically bad for the rebalancing
		// logic that depends on them.
		leftRepl.writeStats.ResetRequestCounts()
	}

@kvoli kvoli force-pushed the 220718.gateway-replstats branch from 04b43de to e422a4c Compare August 1, 2022 14:10
Copy link
Collaborator Author

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lidorcarmel)


pkg/kv/kvserver/replica_application_state_machine.go line 1014 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

is it guaranteed that loadStats cannot be nil here?

Yes, it cannot be nil. This is more of a cleanup - this guard was added initially out of caution but its redundant.

The only case where it could be nil is in some tests which manually create a store, then never pass in the store pool config - which is checked when creating an unloaded replica, in deciding to initialize the load stats.


pkg/kv/kvserver/replica_send.go line 1006 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

when would this happen? only in tests maybe? if in prod, will it spam the log? if yes then either move this to be a level N log (vmodule) or log every N messages or something like that..

Only in tests. However i updated to 3.


pkg/kv/kvserver/replica_send.go line 1017 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

nit: adjustedQPS maybe? when I see scale I think of other things, sorry :(

updated.


pkg/kv/kvserver/replica_send.go line 1023 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

I'm wondering whether this really has value here? definitely this should be in the PR/commit description, and it is, but do readers a year from now would need to know this info? feel free to keep it if you think it has any value..

Fair point - removed.


pkg/kv/kvserver/store_merge.go line 149 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

question: what does this change mean? that we used to reset qps stats here and now we don't how will it affect the behavior after merge? I'll try to look later.

The previous behavior was a relic of before having a "merge stats" function.

It would wipe the lhs - before merging in the rhs.

However this is illogical - the whole point of merging is to combine both the lhs and rhs.

This is updated to be correct. I don't think it will have much effect on behavior because they are being merged for low size/qps reason.

Previously, batch requests with no `GatewayNodeID` would not be
accounted for on the QPS of a replica. By extension, the store QPS would
also not aggregate this missing QPS over replicas it holds. This patch
introduces tracking for all requests, regardless of the `GatewayNodeID`.

This was done to as follow the workload lease transfers consider the
per-locality counts, therefore untagged localities were not useful. This
has since been updated to ignore filter out localities directly, so it
is not necessary to exclude them anymore.

`leaseholderStats`, which previously tracked the QPS, and `writeStats`
tracking the mvcc keys written, have also been removed. They are
duplicated in `batchRequest` and `writeKeys` respectively, within the
`loadStats` of a replica.

resolves cockroachdb#85157

Release note: None
@kvoli kvoli force-pushed the 220718.gateway-replstats branch from e422a4c to 0fea845 Compare August 1, 2022 15:18
Copy link
Contributor

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lidorcarmel)

@kvoli
Copy link
Collaborator Author

kvoli commented Aug 1, 2022

bors r+

@craig
Copy link
Contributor

craig bot commented Aug 1, 2022

Build succeeded:

@craig craig bot merged commit 314baa5 into cockroachdb:master Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: account for no node id in leaseholder stats
3 participants