-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crosscluster/physical: split PCR procs dest side instead #135637
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
780b4c9
to
1618391
Compare
I think this should be fine in mixed verisons without any sort of cross-cluster flags:
|
This allows the dest clsuter to choose to split less when it has fewer nodes. Release note: none. Epic: none.
1618391
to
4bac5aa
Compare
@@ -320,14 +310,6 @@ func buildReplicationStreamSpec( | |||
return nil, err | |||
} | |||
|
|||
// If more partitions were requested, try to repartition the spans. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glad to see this logic moved to the consumer.
|
||
// If we have fewer partitions than we have nodes, try to repartition the | ||
// topology to have more partitions. | ||
topology = repartitionTopology(topology, len(sqlInstanceIDs)*8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason you wanted to remove this tunable cluster setting? Like, if we encounter another pcr oom, it may be nice to have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OOM was in smaller dest clusters than src clusters where we'd create >8 procs per node, whereas now we are using the src cluster size directly here, so we no longer risk that specific behavior. The setting also wasn't settable in CC where the sys tenant is inaccessible so I think having the code tune itself, as we do now, rather than having a tuning knob, is what we need anyway.
out.Partitions = make([]streamclient.PartitionInfo, 0, targetPartCount) | ||
// For each partition in the input, put some number of copies of it into the | ||
// output each containing some fraction of its spans. | ||
for _, p := range in.Partitions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed the round robin behavior from the previous repartition algorithm. why? I'm referring to this code block from the old func:
for x, sp := range partitions[part].Spans {
repartitioned[x%parts].Spans = append(repartitioned[x%parts].Spans, sp)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We like adjacent spans in the same proc -- we added a whole new planning mode to get it! -- and round-robin seems antithetical to that, so I think I was just misguided when I added it initially. If I'm wrong we can being it back.
TFTR! bors r+ |
Release note: none. Epic: none. This regressed in cockroachdb#135637 which was assigning all conusmer sub-partitions the whole partition due to sharing the original token.
Release note: none. Epic: none. This regressed in cockroachdb#135637 which was assigning all conusmer sub-partitions the whole partition due to sharing the original token.
Release note: none. Epic: none. This regressed in cockroachdb#135637 which was assigning all conusmer sub-partitions the whole partition due to sharing the original token.
135564: build/teamcity: add changes to enable openmetrics in nightly roachtests r=nameisbhaskar a=sambhav-jain-16 There was a regression in #135239. The change was reverted and this PR aims to fix the regressions Epic: https://cockroachlabs.atlassian.net/browse/CRDB-41852 Release note: None 135596: security: bugfix, ensure cert expiry metrics reflect reloaded certs r=angles-n-daemons a=angles-n-daemons security: bugfix, ensure cert expiry metrics reflect reloaded certs The PR #130110 added certificate TTL metrics alongside our existing expiration metrics. Prior to that change, the certificate metrics values were updated on each metrics load. Afterwards, new metrics objects were created for each load of certificates. This created a bug in that the new expiration values would not be found in any of the system exhaust (metrics scrape or tsdb) because the registered metrics objects were the ones created on startup. This new change instead allows the metrics to close the whole CertificateManager object, so that they only need to be created once, and therefore the initial registration of metrics reflects persistently valid values. Release note (bug fix): security.certificate.* metrics will now be updated if a node loads new certificates while running. Epic: none Fixes: #135093 136122: crosscluster/physical: update tokens when altering topology r=dt a=dt Release note: none. Epic: none. This regressed in #135637 which was assigning all conusmer sub-partitions the whole partition due to sharing the original token. 136182: kvserver: use leader leases in various flow control tests r=kvoli a=arulajmani See individual commits for details. Co-authored-by: Sambhav Jain <[email protected]> Co-authored-by: Brian Dillmann <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Arul Ajmani <[email protected]>
This allows the dest clsuter to choose to split less when it has fewer nodes.
Release note: none.
Epic: none.