sql: add MERGING descriptor state #75663
Conversation
I split this out from the index-backfiller PR to make it a bit easier to review.
Force-pushed 34936b0 to 089cb37.
Force-pushed 089cb37 to e5b6ee1.
Force-pushed e5b6ee1 to 50e05a9.
Looks great, just left some suggestions.
Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rhu713 and @stevendanna)
-- commits, line 22 at r1:
In this timeline, when is a=2 written to the index?
-- commits, line 28 at r1:
Can you expand on this? When does the ForceMode get turned on and off in the example above?
pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):
query T kvtrace UPDATE ti SET b = 2 WHERE a = 4
Can you add similar tests for UPSERT and INSERT .. ON CONFLICT .. DO UPDATE ..? There are two important cases for the latter: one where there is no conflict and a duplicate b is inserted successfully, and another where there is a conflict and b is successfully updated to a duplicate value.
pkg/sql/opt/exec/execbuilder/testdata/update_nonmetamorphic, line 410 at r1 (raw file):
c INT, FAMILY (a, b, c), INDEX (b, c)
In these tests, I guess we don't care whether the index is unique or not?
Thanks for the review!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)
Previously, mgartner (Marcus Gartner) wrote…
In this timeline, when is a=2 written to the index?
a=2 would get written to new-index as part of the new "index merging." A delete-preserving temporary index is used to capture writes that happen between t0 and t6. This index is then merged into the newly added index at some time after t6 that is not currently in the timeline.
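As a rough illustration of that merge step (toy types and in-memory maps, not the actual CockroachDB code), the temporary index's captured entries, including its preserved deletes, are replayed into the new index with unconditional writes:

```go
package main

import "fmt"

// tempEntry is a toy stand-in for one write captured by the delete-preserving
// temporary index between t0 and t6.
type tempEntry struct {
	key     string
	value   string
	deleted bool // delete-preserving encoding keeps tombstones for deletes
}

// mergeTempIndex replays the captured entries into the new index. Writes are
// unconditional overwrites (Puts) and tombstones are applied as deletes.
func mergeTempIndex(temp []tempEntry, newIndex map[string]string) {
	for _, e := range temp {
		if e.deleted {
			delete(newIndex, e.key)
			continue
		}
		newIndex[e.key] = e.value
	}
}

func main() {
	// The backfill (as of t1) saw a=1, so the new index holds the stale entry.
	newIndex := map[string]string{"a=1": "row"}
	// The temporary index captured the later UPDATE of a=1 to a=2.
	temp := []tempEntry{
		{key: "a=1", deleted: true},
		{key: "a=2", value: "row"},
	}
	mergeTempIndex(temp, newIndex)
	fmt.Println(newIndex) // map[a=2:row]
}
```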
Previously, mgartner (Marcus Gartner) wrote…
Can you expand on this? When does the ForceMode get turned on and off in the example above?
ForceMode would get turned on at t0 and would get turned off at a time not currently in the timeline, after the "index merge" has taken place. I'll add this to the timeline.
pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Can you add similar tests for UPSERT and INSERT .. ON CONFLICT .. DO UPDATE ..? There are two important cases for the latter: one where there is no conflict and a duplicate b is inserted successfully, and another where there is a conflict and b is successfully updated to a duplicate value.
Just for clarity: when you say "and another where there is a conflict," do you mean a conflict on the primary column a?
pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):
Should we add some validation in …
pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file): Previously, stevendanna (Steven Danna) wrote…
Correct. That's the only conflict that should be possible because the unique index on b shouldn't be public (I guess it is public in this simulation, but it wouldn't normally be when ForceIndex is true).
pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file): Previously, mgartner (Marcus Gartner) wrote…
Besides the corruption this would create, I'm also thinking of features like the one added in #75548, which will break if a ForcePut=true index is ever an "active" index.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)
pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Besides the corruption this would create, I'm also thinking of features like the one added in #75548, which will break if a ForcePut=true index is ever an "active" index.
I definitely think we should add some validation. I was going to add validation to the schema changer before it brought any indexes online; but perhaps doing it in validateTableIndexes is better since then it will be checked everywhere Validate() is called on the descriptor.
It might make some testing a bit more difficult (i.e. the execbuilder tests that modify public indexes); let me see how bad the fallout is.
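As a rough sketch of the invariant in question (stand-in types only; the real check would live next to validateTableIndexes and operate on the catalog descriptor types), the validation rejects any descriptor in which a force-put index has become public:

```go
package main

import "fmt"

// index is a minimal stand-in for an index descriptor; the real type is
// descpb.IndexDescriptor and carries far more fields.
type index struct {
	name     string
	public   bool // the index is part of the table's active indexes
	forcePut bool // writes to the index must be blind Puts
}

// validateIndexes checks the invariant under discussion: an index that still
// requires force-put writes must never be public/active.
func validateIndexes(indexes []index) error {
	for _, idx := range indexes {
		if idx.public && idx.forcePut {
			return fmt.Errorf("index %q is public but still requires force-put writes", idx.name)
		}
	}
	return nil
}

func main() {
	err := validateIndexes([]index{{name: "new_idx", public: true, forcePut: true}})
	fmt.Println(err)
}
```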
An alternative to this field is a new index state in the state machine. We've taken to liking these in the new schema changer. Consider moving from DELETE_ONLY to MERGING, which is like DELETE_AND_WRITE_ONLY but you use the force put. You need to move to DELETE_AND_WRITE_ONLY before you can start validating (right?).
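For concreteness, a minimal sketch of the proposed ordering (illustrative only; the state names mirror descpb, everything else is a stand-in):

```go
package main

import "fmt"

// addIndexStates sketches the lifecycle of a newly added index under this
// proposal. Only the ordering matters; the real states live in
// pkg/sql/catalog/descpb.
var addIndexStates = []string{
	"BACKFILLING",           // bulk backfill; the index sees no writes or deletes
	"DELETE_ONLY",           // the index sees deletes only
	"MERGING",               // like DELETE_AND_WRITE_ONLY, but writes are forced Puts
	"DELETE_AND_WRITE_ONLY", // normal staging state; uniqueness validation happens here
	"PUBLIC",                // the index is live for reads
}

func main() {
	for i, s := range addIndexStates {
		fmt.Printf("%d: %s\n", i, s)
	}
}
```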
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)
pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
I definitely think we should add some validation. I was going to add validation to the schema changer before it brought any indexes online; but perhaps doing it in validateTableIndexes is better since then it will be checked everywhere Validate() is called on the descriptor.
It might make some testing a bit more difficult (i.e. the execbuilder tests that modify public indexes); let me see how bad the fallout is.
Big 👍 on doing descriptor validation in the descriptor validation framework.
Consider moving from DELETE_ONLY to MERGING, which is like DELETE_AND_WRITE_ONLY but you use the force put. You need to move to DELETE_AND_WRITE_ONLY before you can start validating (right?).
This has the nice benefit of making it impossible for the ForcePut index to ever become "PUBLIC".
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)
Force-pushed 50e05a9 to 3176055.
I've added the new MERGING state as recommended and moved the tests out of logic test and over to a small data-driven test that supports moving an index into the merging state.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @mgartner, and @rhu713)
pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):
Previously, ajwerner wrote…
Big 👍 on doing descriptor validation in the descriptor validation framework.
I'll add some validation for UseDeletePreservingIndex in a different PR. For now, we no longer need to validate this because it no longer exists.
pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Correct. That's the only conflict that should be possible because the unique index on b shouldn't be public (I guess it is public in this simulation, but it wouldn't normally be when ForceIndex is true).
Done.
pkg/sql/opt/exec/execbuilder/testdata/update_nonmetamorphic, line 410 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
In these tests, I guess we don't care whether the index is unique or not?
Can't hurt to make these unique as far as I can see. I made as many of the indexes under test unique as I could.
Force-pushed 3176055 to 0ea9520.
Can you add validation around here that this state and the BACKFILLING state are only used for indexes?
func validateMutation(m *descpb.DescriptorMutation) error {
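Roughly something like the following, sketched with toy stand-in types rather than the real descpb ones:

```go
package main

import "fmt"

type mutationState int

const (
	deleteOnly mutationState = iota
	deleteAndWriteOnly
	backfilling
	merging
)

// mutation is a toy stand-in for descpb.DescriptorMutation.
type mutation struct {
	state   mutationState
	isIndex bool // true for index mutations, false for column mutations
}

// validateMutation rejects BACKFILLING and MERGING on anything that is not an
// index mutation, since those states are only meaningful for indexes.
func validateMutation(m mutation) error {
	if (m.state == backfilling || m.state == merging) && !m.isIndex {
		return fmt.Errorf("state %d is only valid for index mutations", m.state)
	}
	return nil
}

func main() {
	fmt.Println(validateMutation(mutation{state: merging, isIndex: false}))
}
```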
Reviewed 2 of 12 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)
pkg/sql/catalog/table_elements.go, line 205 at r2 (raw file):
// them. // // New indexes will be checked for uniqueness at the end of the may miss
Seems like something is missing around "at the end of the may miss".
Force-pushed 0ea9520 to d10972c.
Reviewed 6 of 12 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rhu713, and @stevendanna)
Previously, stevendanna (Steven Danna) wrote…
a=2 would get written to new-index as part of the new "index merging." A delete-preserving temporary index is used to capture writes that happen between t0 and t6. This index is then merged into the newly added index at some time after t6 that is not currently in the timeline.
Thanks for clarifying the timelines.
-- commits, line 5 at r3:
To be clear, will non-unique indexes use the MERGING state?
-- commits, line 32 at r3:
nit: end sentence in period
This PR adds a MERGING index descriptor state. This state is similar to DELETE_AND_WRITE_ONLY, but all writes are handled via Put, even in cases where they would typically use CPut or InitPut.

This change is in support of the mvcc-compatible index backfiller. As part of the backfill process, newly added indexes will be added in a BACKFILLING state in which they see no writes or deletes. However, this would allow for the following sequence of events. For simplicity, the delete-preserving temporary index state transitions have been elided:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t6: "new index" to DELETE-AND-WRITE-ONLY
t7: `INSERT INTO` with a = 1
t8: a = 2 merged from "temp index" into "new index"

Unfortunately, at t7 the user would encounter an erroneous duplicate key violation when they attempt their insert, because the attempted CPut into "new index" would fail as there is an existing value (a=1). The correct value of a=2 will not be written into "new index" until a follow-up merge pass that occurs at t8.

The MERGING phase is intended to solve this problem by allowing for the following sequence:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t6: "new index" to MERGING
t7: `INSERT INTO` with a = 1
t8: a = 2 merged from "temp index" into "new index"
t9: "new index" to DELETE-AND-WRITE-ONLY

Note that while in the MERGING state, the new index may accumulate duplicate keys. This is OK because it will be checked for uniqueness before being brought online.

The tests added here are nearly identical to the tests added for DeletePreservingEncoding but have been written using a small data-driven test that supports putting indexes into a mutated state.

Release note: None
Force-pushed d10972c to 0ee2977.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @rhu713, and @stevendanna)
Previously, mgartner (Marcus Gartner) wrote…
To be clear, will non-unique indexes use the MERGING state?
Yes, the plan is to use this state for all newly added indexes.
TFTR! bors r=mgartner,ajwerner
Build failed (retrying...):
Build succeeded:
Previously, the index backfilling process depended upon non-MVCC compliant AddSSTable calls which potentially rewrote previously read historical values.

To support an MVCC-compliant AddSSTable that writes at the _current_ timestamp, this change implements a new backfilling process described in the following RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md

In summary, the new index backfilling process depends on backfilling the new index when it is in a BACKFILLING state (added in cockroachdb#72281). In this state it receives no writes or deletes. Writes that occur during the backfilling process are captured by a "temporary index." This temporary index uses the DeletePreservingEncoding to ensure it captures deletes as well as writes. After the bulk backfill using the MVCC-compliant AddSSTable, the index is moved into a MERGING state (added in cockroachdb#75663) in which it receives writes and deletes. Writes previously captured by the temporary index are then transactionally merged into the newly added index.

This feature is currently behind a new boolean cluster setting which defaults to true. Schema changes that contain both old and new-style backfills are rejected. Reverting the default to false will require updating various tests since many tests depend on the exact index IDs of newly added indexes.

Release note: None

Co-authored-by: Rui Hu <[email protected]>
73878: sql: introduce MVCC-compliant index backfiller r=ajwerner a=stevendanna

Previously, the index backfilling process depended upon non-MVCC compliant AddSSTable calls which potentially rewrote previously read historical values. To support an MVCC-compliant AddSSTable that writes at the _current_ timestamp, this change implements a new backfilling process described in the following RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md

In summary, the new index backfilling process depends on backfilling the new index when it is in a BACKFILLING state (added in #72281). In this state it receives no writes or deletes. Writes that occur during the backfilling process are captured by a "temporary index." This temporary index uses the DeletePreservingEncoding to ensure it captures deletes as well as writes. After the bulk backfill using the MVCC-compliant AddSSTable, the index is moved into a MERGING state (added in #75663) in which it receives writes and deletes. Writes previously captured by the temporary index are then transactionally merged into the newly added index.

This feature is currently behind a new boolean cluster setting which defaults to true. Schema changes that contain both old and new-style backfills are rejected. Reverting the default to false will require updating various tests since many tests depend on the exact index IDs of newly added indexes.

Release note: None

76252: kvserver: account for AddSSTableRequests in QPS r=kvoli a=kvoli

Previously, Queries-Per-Second (QPS) was calculated uniformly per `BatchRequest` as 1. This patch introduces variable QPS calculation for `AddSSTableRequest`, which uses an order of magnitude more resources than other request types. This patch introduces the `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to attribute QPS to `AddSSTableRequest` sizes. The calculation is done as QPS = 1 + size(AddSSTableRequest) / factor. When `kv.replica_stats.addsst_request_size_factor` is less than 1, or no `AddSSTableRequest` exists within a `BatchRequest`, then QPS = 1; the current behavior today.

resolves #73731

Release note (performance improvement): Introduced the `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to tune Queries-Per-Second (QPS) sensitivity to large imports. By default, this setting is disabled. When enabled, the size of any AddSSTableRequest will contribute to QPS in inverse relation to this setting's magnitude. By default this setting is configured to a conservative 50,000; every 50 kilobytes will be accounted for as an additional 1 QPS.

76617: bazel: make `CrdbTestBuild` `const` r=RaduBerinde a=rickystewart

This partially reverts #72838. That change was made to avoid thrashing the cache when swapping between test- and non-test configurations. Today we have `dev cache` which can retain build artifacts across the different configurations, so this doesn't really serve a purpose any more. Indeed, you can now swap between `bazel build pkg/cmd/cockroach-short` and `bazel build pkg/cmd/cockroach-short --config=test` very quickly, with Bazel downloading old versions of the built libraries if the built-in cache gets wiped. We still filter out the `go:build` lines in `crdb_test_{off,on}.go` so we don't have to set `gotags` for test and non-test, which still saves a lot of time and unnecessary recompilation. We have a check for this in CI so no one should be able to add a build constraint without us knowing.

Release note: None

76627: util/tracing: improve span recording interface r=andreimatei a=andreimatei

Before this patch, the Span's recording interface was left over from a time when there was only one recording mode: verbose. We now have two modes: verbose recording and structured recording. They can be enabled at span creation time through the WithRecording(<type>) option. This patch changes the Span's SetVerbose() method to expose the two options.

Release note: None

76667: roachtest/test: fix ORM testing due to removed cluster setting r=rafiss a=otan

`sql.catalog.unsafe_skip_system_config_trigger.enabled` got removed recently and was part of an alpha. Let's clean it up in ORMs too.

Resolves #76654 Resolves #76655 Resolves #76656 Resolves #76657 Resolves #76658 Resolves #76659 Resolves #76660 Resolves #76661 Resolves #76662 Resolves #76663 Resolves #76664 Resolves #76665 Resolves #76666

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
This PR adds a MERGING index descriptor state. This state is similar
to DELETE_AND_WRITE_ONLY, but all writes are handled via Put, even in
cases where they would typically use CPut or InitPut.
This change is in support of the mvcc-compatible index backfiller.
As part of the backfill process, newly added indexes will be added in
a BACKFILLING state in which they see no writes or deletes.
However, this would allow for the following sequence of events. For simplicity, the delete-preserving temporary index state transitions have been elided:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t6: "new index" to DELETE-AND-WRITE-ONLY
t7: `INSERT INTO` with a = 1
t8: a = 2 merged from "temp index" into "new index"
Unfortunately, at t7 the user would encounter an erroneous duplicate
key violation when they attempt their insert because the attempted
CPut into "new index" would fail as there is an existing
value (a=1). The correct value of a=2 will not be written into "new
index" until a follow-up merge pass that occurs at t8
The MERGING phase is intended to solve this problem by allowing for the following sequence:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t6: "new index" to MERGING
t7: `INSERT INTO` with a = 1
t8: a = 2 merged from "temp index" into "new index"
t9: "new index" to DELETE-AND-WRITE-ONLY
Note that while in the MERGING state, the new index may accumulate
duplicate keys. This is OK because it will be checked for uniqueness
before being brought online.
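As an illustration of the behavior described above, a minimal Go sketch (in-memory stand-ins, not the actual CockroachDB write path) of the Put-versus-CPut difference and why duplicates can accumulate while MERGING:

```go
package main

import "fmt"

// indexWriter is a toy in-memory stand-in for writes to a unique secondary
// index, keyed by the indexed value and storing the primary key.
type indexWriter struct {
	data    map[string]string
	merging bool // stand-in for the index being in the MERGING state
}

func (w *indexWriter) write(key, value string) error {
	if w.merging {
		// MERGING: blind Put. Duplicates are tolerated here and caught by the
		// uniqueness check before the index is brought online.
		w.data[key] = value
		return nil
	}
	// Simplified stand-in for a conditional write (CPut): fail if a different
	// entry already exists for this key.
	if existing, ok := w.data[key]; ok && existing != value {
		return fmt.Errorf("duplicate key %q", key)
	}
	w.data[key] = value
	return nil
}

func main() {
	w := &indexWriter{data: map[string]string{"b=1": "a=1"}, merging: true}
	fmt.Println(w.write("b=1", "a=7")) // <nil>: overwritten while merging
	w.merging = false
	fmt.Println(w.write("b=1", "a=9")) // duplicate key "b=1"
}
```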
The tests added here are nearly identical to the tests added for
DeletePreservingEncoding but have been written using a small
data-driven test that supports putting indexes into a mutated state.
Fixes #75658
Release note: None