-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: remove maybeMarkReplicaInitializedLockedReplLocked #94912
Comments
cc @cockroachdb/replication |
Only the second one even removes the uninited replica from cockroach/pkg/kv/kvserver/store_split.go Lines 312 to 323 in 352d2ef
There are probably reasons for that. We have already called cockroach/pkg/kv/kvserver/store_split.go Lines 253 to 257 in 352d2ef
Also, it seems sketchy how we're updating the span of the LHS replica and that of the RHS replica in separate cockroach/pkg/kv/kvserver/store_split.go Lines 312 to 325 in 352d2ef
|
cc @pavelkalinnikov |
There are mainly two aspects of initializing a
Also, the two should probably be done with some synchronization. The two code paths linked in the issue description reside in Seemingly, it is the "glue" between these two things that is currently ad-hoc / scattered all over the place. For example, the split trigger application path does (1) through a "good" function, but not so well on (2) - it's hidden in split trigger code. The snapshot application does the opposite: on (2) it uses the "maybe mark" func, but does (1) in |
94943: kvserver: don't load uninitialized replicas r=tbg a=pavelkalinnikov This change stops using `loadRaftMuLockedReplicaMuLocked` when creating an uninitialized replica, to make it clearer what happens in this case. It turns out that what it did for uninitialized desc is equivalent to setting an empty `ReplicaState` and asserting the match between in-memory and on-disk state. Informs #94912, #93898 Epic: CRDB-220 Release note: None Co-authored-by: Pavel Kalinnikov <[email protected]>
96031: sql: add mixed version test for system.role_members user ids upgrade r=rafiss a=andyyang890 This patch adds a mixed version logictest that ensures that GRANT ROLE continues to work properly in a cluster with both 22.2 and 23.1 nodes (i.e. nodes that have run the system.role_members user ids upgrade). Part of #92342 Release note: None 96127: kvserver: introduce cpu rebalancing r=nvanbenschoten a=kvoli This patch allows the store rebalancer to use CPU in place of QPS when balancing load on a cluster. This patch adds `cpu` as an option with the cluster setting: `kv.allocator.load_based_rebalancing.objective` When set to `cpu`, rather than `qps`. The store rebalancer will perform a mostly identical function, however target balancing the sum of all replica's cpu time on each store, rather than qps. The default remains as `qps` here. Similar to QPS, the rebalance threshold can be set to allow controlling the range above and below the mean store CPU is considered imbalanced, either overfull or underfull respectively: `kv.allocator.cpu_rebalance_threshold`: 0.1 In order to manage with mixed versions during upgrade and some architectures not supporting the cpu sampling method, a rebalance objective manager is introduced in `rebalance_objective.go`. The manager mediates access to the rebalance objective and overwrites it in cases where the objective set in the cluster setting cannot be supported. The results when using CPU in comparison to QPS can be found [here](https://docs.google.com/document/d/1QLhD20BTamjj3-dSG9F1gW7XMBy9miGPpJpmu2Dn3yo/edit#) (internal). <details> <summary>Results Summary</summary> ![image](https://user-images.githubusercontent.com/39606633/215580650-b12ff509-5cf5-4ffa-880d-8387e2ef0afa.png) ![image](https://user-images.githubusercontent.com/39606633/215580626-3d748ba1-e9a4-4abb-8acd-2c319203932e.png) ![image](https://user-images.githubusercontent.com/39606633/215580585-58e6000d-b6cf-430a-b4b7-d14a77eab3bd.png) </details> <details> <summary>Detailed Allocbench Results</summary> ``` kv/r=0/access=skew master median cost(gb):05.81 cpu(%):14.97 write(%):37.83 stddev cost(gb):01.87 cpu(%):03.98 write(%):07.01 cpu rebalancing median cost(gb):08.76 cpu(%):14.42 write(%):36.61 stddev cost(gb):02.66 cpu(%):01.85 write(%):04.80 kv/r=0/ops=skew master median cost(gb):06.23 cpu(%):26.05 write(%):57.33 stddev cost(gb):02.92 cpu(%):05.83 write(%):08.20 cpu rebalancing median cost(gb):04.28 cpu(%):11.45 write(%):31.28 stddev cost(gb):02.25 cpu(%):02.51 write(%):06.68 kv/r=50/ops=skew master median cost(gb):04.36 cpu(%):22.84 write(%):48.09 stddev cost(gb):01.12 cpu(%):02.71 write(%):05.51 cpu rebalancing median cost(gb):04.64 cpu(%):13.49 write(%):43.05 stddev cost(gb):01.07 cpu(%):01.26 write(%):08.58 kv/r=95/access=skew master median cost(gb):00.00 cpu(%):09.51 write(%):01.24 stddev cost(gb):00.00 cpu(%):01.74 write(%):00.27 cpu rebalancing median cost(gb):00.00 cpu(%):05.66 write(%):01.31 stddev cost(gb):00.00 cpu(%):01.56 write(%):00.26 kv/r=95/ops=skew master median cost(gb):0.00 cpu(%):47.29 write(%):00.93 stddev cost(gb):0.09 cpu(%):04.30 write(%):00.17 cpu rebalancing median cost(gb):0.00 cpu(%):08.16 write(%):01.30 stddev cost(gb):0.01 cpu(%):04.59 write(%):00.20 ``` </details> resolves: #95380 Release note (ops change) Add option to balance cpu time (cpu) instead of queries per second (qps) among stores in a cluster. This is done by setting `kv.allocator.load_based_rebalancing.objective='cpu'`. `kv.allocator.cpu_rebalance_threshold` is also added, similar to `kv.allocator.qps_rebalance_threshold` to control the target range for store cpu above and below the cluster mean. 96440: ui: add execution insights to statement and transaction fingerprint details r=ericharmeling a=ericharmeling This commit adds execution insights to the Statement Fingerprint and Transaction Fingerprint Details pages. Part of #83780. Loom: https://www.loom.com/share/98d2023b672e43fa8016829aa641a829 Note that the SQL queries against the `*_execution_insights` tables are updated to `SELECT DISTINCT ON (*_fingerprint_id, problems, causes)` (equivalent to `GROUP BY (*_fingerprint_id, problems, causes)`) from the latest results in the tables, rather than `row_number() OVER ( PARTITION BY stmt_fingerprint_id, problem, causes ORDER BY end_time DESC ) AS rank... WHERE rank = 1`. Both patterns return the same result, but one uses aggregation and the other uses a window function. I find the `DISTINCT ON/GROUP BY` pattern easier to understand, I'm not seeing much difference in the planning/execution time between the two over the same set of data, and I'm seeing `DISTINCT ON/GROUP BY` coming up as more performant in almost all the secondary sources I've encountered. Release note (ui change): Added execution insights to the Statement Fingerprint Details and Transaction Fingerprint Details Pages. 96828: collatedstring: support default, C, and POSIX in expressions r=otan a=rafiss fixes #50734 fixes #95667 informs #57255 --- ### collatedstring: create new package Move the small amount of code from tree/collatedstring.go --- ### collatedstring: support C and POSIX in expressions Release note (sql change): Expressions of the form `COLLATE "default"`, `COLLATE "C"`, and `COLLATE "POSIX"` are now supported. Since the default collation cannot be changed currently, these expressions are all equivalent. The expressions are evaluated by treating the input as a normal string, and ignoring the collation. This means that comparisons between strings and collated strings that use "default", "C", or "POSIX" are now supported. Creating a column with the "C" or "POSIX" collations is still not supported. 96870: kvserver: use replicasByKey addition func in snapshot path r=tbg a=pavelkalinnikov This commit makes one step towards better code sharing between `Replica` initialization paths: split trigger and snapshot application. It makes both to use the same method to check and insert the initialized `Replica` to `replicasByKey` map. Touches #94912 96874: roachtest: run scheduled backup only on clusters with enterprise license r=stevendanna a=msbutler Epic: none Release note: None 96883: go.mod: bump Pebble to 829675f94811 r=RaduBerinde a=RaduBerinde 829675f9 db: fix ObsoleteSize stat 2f086b74 db: refactor compaction splitting to reduce key comparisons Release note: None Epic: none Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Eric Harmeling <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: Pavel Kalinnikov <[email protected]> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Radu Berinde <[email protected]>
96943: kvserver: refactor Replica and Store updates on replica init r=tbg a=pavelkalinnikov This PR further increases code sharing between the two replica initialization paths: split trigger and snapshot application. It also improves readability of these paths. Touches #94912 Co-authored-by: Pavel Kalinnikov <[email protected]>
Related to #94911.
We seem to have three paths that initialize a Replica. The first one is
cockroach/pkg/kv/kvserver/replica_init.go
Lines 157 to 238 in 3d6fe9e
and as explain in its comment, this is what transitions most uninitialized replicas to initialized.
However, then there's also
cockroach/pkg/kv/kvserver/store_create_replica.go
Lines 367 to 430 in e1c24ed
which has as its sole caller the
applySnapshot
path.We should unify them by replacing the latter use with the former.
Jira issue: CRDB-23225
Epic: CRDB-220
The text was updated successfully, but these errors were encountered: