crosscluster/physical: split PCR procs dest side instead #135637

dt · 2024-11-18T22:02:45Z

This allows the dest cluster to choose to split less when it has fewer nodes.

Release note: none.
Epic: none.

blathers-crl · 2024-11-18T22:02:49Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

cockroach-teamcity · 2024-11-18T22:02:55Z

This change is

dt · 2024-11-18T22:32:50Z

I think this should be fine in mixed versions without any sort of cross-cluster flags:

If src is newer than dst, we just fall back to one proc per src node.
If dst is newer, it just doesn't re-split since it is already at target size.

msbutler · 2024-11-19T16:05:21Z

pkg/ccl/crosscluster/producer/stream_lifetime.go

@@ -320,14 +310,6 @@ func buildReplicationStreamSpec(
 		return nil, err
 	}

-	// If more partitions were requested, try to repartition the spans.


glad to see this logic moved to the consumer.

msbutler · 2024-11-19T16:08:35Z

pkg/ccl/crosscluster/physical/stream_ingestion_dist.go

+
+		// If we have fewer partitions than we have nodes, try to repartition the
+		// topology to have more partitions.
+		topology = repartitionTopology(topology, len(sqlInstanceIDs)*8)


any reason you wanted to remove this tunable cluster setting? Like, if we encounter another pcr oom, it may be nice to have it.

The OOM was in dest clusters smaller than their src clusters, where we'd create >8 procs per node, whereas now we are using the src cluster size directly here, so we no longer risk that specific behavior. The setting also wasn't settable in CC, where the sys tenant is inaccessible, so I think having the code tune itself, as we do now, rather than having a tuning knob, is what we need anyway.
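As a rough illustration of the self-tuning target discussed above, here is a minimal sketch. It assumes `sqlInstanceIDs` in the diff lists the destination's SQL instances; `targetPartCount` and `shouldRepartition` are hypothetical helpers invented for this example, not functions from the PR.

```go
package main

import "fmt"

// procsPerNode is the multiplier that appears in the diff
// (len(sqlInstanceIDs)*8).
const procsPerNode = 8

// targetPartCount is a hypothetical helper: it derives the partition
// target from the number of destination instances.
func targetPartCount(destInstances int) int {
	return destInstances * procsPerNode
}

// shouldRepartition is a hypothetical helper: it reports whether a
// topology with currentParts partitions is still below the target.
// A topology already at or above the target is left alone.
func shouldRepartition(currentParts, destInstances int) bool {
	return currentParts < targetPartCount(destInstances)
}

func main() {
	fmt.Println(targetPartCount(3))       // 24
	fmt.Println(shouldRepartition(5, 3))  // true
	fmt.Println(shouldRepartition(24, 3)) // false
}
```

Because the target scales with the destination's own instance count, a small destination asks for fewer partitions without needing a cluster setting.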

msbutler · 2024-11-19T16:22:55Z

pkg/ccl/crosscluster/physical/stream_ingestion_dist.go

+	out.Partitions = make([]streamclient.PartitionInfo, 0, targetPartCount)
+	// For each partition in the input, put some number of copies of it into the
+	// output each containing some fraction of its spans.
+	for _, p := range in.Partitions {


you removed the round robin behavior from the previous repartition algorithm. why? I'm referring to this code block from the old func:

	for x, sp := range partitions[part].Spans {
		repartitioned[x%parts].Spans = append(repartitioned[x%parts].Spans, sp)
	}

We like adjacent spans in the same proc -- we added a whole new planning mode to get it! -- and round-robin seems antithetical to that, so I think I was just misguided when I added it initially. If I'm wrong we can bring it back.
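To make the trade-off in this thread concrete, the sketch below contrasts the two ways of dividing one partition's spans. It is not the PR's code: spans are plain strings, and `splitContiguous` and `splitRoundRobin` are hypothetical names chosen for this example. Only the `x%n` indexing is taken from the old snippet quoted above.

```go
package main

import "fmt"

// splitContiguous divides spans into at most n chunks, keeping
// neighbouring spans in the same chunk.
func splitContiguous(spans []string, n int) [][]string {
	if n > len(spans) {
		n = len(spans)
	}
	if n <= 0 {
		return nil
	}
	out := make([][]string, 0, n)
	for i := 0; i < n; i++ {
		start := i * len(spans) / n
		end := (i + 1) * len(spans) / n
		out = append(out, spans[start:end])
	}
	return out
}

// splitRoundRobin deals spans out one at a time, using the same x%n
// indexing as the removed code, so neighbours land in different chunks.
func splitRoundRobin(spans []string, n int) [][]string {
	if n <= 0 {
		return nil
	}
	out := make([][]string, n)
	for x, sp := range spans {
		out[x%n] = append(out[x%n], sp)
	}
	return out
}

func main() {
	spans := []string{"a-b", "b-c", "c-d", "d-e", "e-f", "f-g"}
	fmt.Println(splitContiguous(spans, 2)) // [[a-b b-c c-d] [d-e e-f f-g]]
	fmt.Println(splitRoundRobin(spans, 2)) // [[a-b c-d e-f] [b-c d-e f-g]]
}
```

With contiguous chunks each proc covers an unbroken key range, which matches the stated preference for adjacent spans in the same proc. Round-robin interleaves the ranges across procs.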

dt · 2024-11-19T18:43:38Z

TFTR!

bors r+

craig · 2024-11-19T19:14:00Z

Build succeeded:

Release note: none. Epic: none. This regressed in cockroachdb#135637 which was assigning all consumer sub-partitions the whole partition due to sharing the original token.

dt requested review from jeffswenson and msbutler November 18, 2024 22:02

dt requested a review from a team as a code owner November 18, 2024 22:02

dt force-pushed the pcr-shard-dest-side branch from 780b4c9 to 1618391 Compare November 18, 2024 22:31

crosscluster/physical: split PCR procs dest side instead

4bac5aa

This allows the dest cluster to choose to split less when it has fewer nodes. Release note: none. Epic: none.

dt force-pushed the pcr-shard-dest-side branch from 1618391 to 4bac5aa Compare November 19, 2024 13:31

msbutler reviewed Nov 19, 2024

msbutler approved these changes Nov 19, 2024

craig bot merged commit 2b243f2 into cockroachdb:master Nov 19, 2024
22 of 23 checks passed

msbutler mentioned this pull request Nov 22, 2024

roachtest: c2c/initialscan/kv0 failed #135793

Closed

dt mentioned this pull request Nov 25, 2024

crosscluster/physical: update tokens when altering topology #136122

Merged

dt deleted the pcr-shard-dest-side branch November 26, 2024 16:20
