Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvclient: optimize and clean up sorting computation #113942

Merged
merged 1 commit into from
Nov 13, 2023

Conversation

andrewbaptist
Copy link
Collaborator

Previously the locality distance and the latency function were computed multiple times for each node in the sort.Slice method. This change computes the values once when the ReplicaSlice is created and uses simple comparisions within the sorting loop.

Epic: none
Informs: #112351

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andrewbaptist andrewbaptist force-pushed the 2023-11-06-faster-sorting branch 2 times, most recently from a3a351a to 3559c6d Compare November 9, 2023 20:20
Copy link

blathers-crl bot commented Nov 9, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@andrewbaptist andrewbaptist force-pushed the 2023-11-06-faster-sorting branch from 3559c6d to ee63e8f Compare November 9, 2023 20:54
@andrewbaptist andrewbaptist marked this pull request as ready for review November 10, 2023 14:13
@andrewbaptist andrewbaptist requested a review from a team as a code owner November 10, 2023 14:13
@andrewbaptist andrewbaptist force-pushed the 2023-11-06-faster-sorting branch from ee63e8f to 2d9a90d Compare November 10, 2023 14:17
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


-- commits line 6 at r1:
nit: as written, we're computing these values in OptimizeReplicaOrder, not when the ReplicaSlice is created. We should use local variables instead of saving these fields on ReplicaInfo instead.


pkg/kv/kvclient/kvcoord/replica_slice.go line 198 at r1 (raw file):

		return
	}
	// populate the tier match length and locality before starting the loop.

nit: s/populate/Populate/g


pkg/kv/kvclient/kvcoord/replica_slice.go line 200 at r1 (raw file):

	// populate the tier match length and locality before starting the loop.
	for i := range rs {
		rs[i].tierMatchLength = locality.SharedPrefix(rs[i].Locality)

It's a bit odd that we're modifying ReplicaInfo.tierMatchLength and ReplicaInfo.latency here, on the receiver, and then reading them right below. Should these be local function variables instead?


pkg/kv/kvclient/kvcoord/replica_slice.go line 214 at r1 (raw file):

	sort.Slice(rs, func(i, j int) bool {
		// Replicas on the same node have the same latency.
		if rs[i].NodeID == rs[j].NodeID {

Did you mean to remove this because this isn't possible?

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @arulajmani)


pkg/kv/kvclient/kvcoord/replica_slice.go line 29 at r1 (raw file):

	roachpb.ReplicaDescriptor
	Locality        roachpb.Locality
	tierMatchLength int

Consider adding a // Fields precomputed when ordering replicas in a range. above this line.


pkg/kv/kvclient/kvcoord/replica_slice.go line 198 at r1 (raw file):

		return
	}
	// populate the tier match length and locality before starting the loop.

nit: "Populate"


pkg/kv/kvclient/kvcoord/replica_slice.go line 200 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

It's a bit odd that we're modifying ReplicaInfo.tierMatchLength and ReplicaInfo.latency here, on the receiver, and then reading them right below. Should these be local function variables instead?

I think the point is to avoid a new heap allocation by hanging this information on the existing slice.


pkg/kv/kvclient/kvcoord/replica_slice.go line 214 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Did you mean to remove this because this isn't possible?

I'm also not sure. Why remove this? If two replicas are on the same node then the following checks should probably all fall through and we should return false, but that's not actually the case if rs[i].NodeID == nodeID.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @nvanbenschoten)


-- commits line 6 at r1:

Previously, arulajmani (Arul Ajmani) wrote…

nit: as written, we're computing these values in OptimizeReplicaOrder, not when the ReplicaSlice is created. We should use local variables instead of saving these fields on ReplicaInfo instead.

As we're not going with the local variables approach to avoid heap allocations, let's update the commit message to better reflect where these fields are assigned.


pkg/kv/kvclient/kvcoord/replica_slice.go line 29 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider adding a // Fields precomputed when ordering replicas in a range. above this line.

Could you also add some words about who the latency and tierMatchLength is between?


pkg/kv/kvclient/kvcoord/replica_slice.go line 200 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think the point is to avoid a new heap allocation by hanging this information on the existing slice.

Ack.

@andrewbaptist andrewbaptist force-pushed the 2023-11-06-faster-sorting branch from 2d9a90d to 486ea13 Compare November 13, 2023 14:46
Copy link
Collaborator Author

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)


-- commits line 6 at r1:

Previously, arulajmani (Arul Ajmani) wrote…

As we're not going with the local variables approach to avoid heap allocations, let's update the commit message to better reflect where these fields are assigned.

I added this to both the commit and a header on the method. I have another PR which makes these structs more "private" that I'll put up later today as part of #114033. But it is easier to pull this one out.


pkg/kv/kvclient/kvcoord/replica_slice.go line 29 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Could you also add some words about who the latency and tierMatchLength is between?

Added text.


pkg/kv/kvclient/kvcoord/replica_slice.go line 198 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: "Populate"

Fixed


pkg/kv/kvclient/kvcoord/replica_slice.go line 200 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Ack.

Yep - this is not great right now, but I'm hoping to clean this up more in the next few commits.


pkg/kv/kvclient/kvcoord/replica_slice.go line 214 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm also not sure. Why remove this? If two replicas are on the same node then the following checks should probably all fall through and we should return false, but that's not actually the case if rs[i].NodeID == nodeID.

You're both correct that it was wrong before. I simplified the sorting a lot by introducing a -1 value for local node latency. This handles all the cases the same as before and I think is much simpler to reason about since there are less special cases. The key to note is that if NodeID is equal, then it will always drop to the last comparison and now always return false. We don't need to special case with checking if we are "on the local node" anymore.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andrewbaptist and @arulajmani)

Previously the locality distance, latency and healthfunctions were
computed multiple times for each node in the sort.Slice method.
This change computes the values once when the ReplicaSlice is
created and uses simple comparisions within the sorting loop.

Note that the added fields are only used in the Optimize loop.

Epic: none
Informs: cockroachdb#112351

Release note: None
@andrewbaptist andrewbaptist force-pushed the 2023-11-06-faster-sorting branch from 486ea13 to f3913e0 Compare November 13, 2023 18:58
@andrewbaptist
Copy link
Collaborator Author

bors r=nvanbenschoten

TFTR!

@craig
Copy link
Contributor

craig bot commented Nov 13, 2023

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Contributor

craig bot commented Nov 13, 2023

Build succeeded:

@craig craig bot merged commit f9e21c4 into cockroachdb:master Nov 13, 2023
3 checks passed
@andrewbaptist andrewbaptist deleted the 2023-11-06-faster-sorting branch December 15, 2023 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants