sql: add MERGING descriptor state #75663

stevendanna · 2022-01-28T18:54:43Z

This PR adds a MERGING index descriptor state. This state is similar
to DELETE_AND_WRITE_ONLY, but all writes are handled via Put, even in
cases where they would typically used CPut or InitPut.

This change is in support of the mvcc-compatible index backfiller.

As part of the backfill process, newly added indexes will be added in
a BACKFILLING state in which they see no writes or deletes.

However, this would allow for the following sequence of events. For
simplicity, the delete-preserving temporary index state transitions
have been elided:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state
     and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t6: "new index" to DELETE-AND-WRITE-ONLY
t7: `INSERT INTO` into with a = 1
t8: a = 2 merged from "temp index" into "new index"

Unfortunately, at t7 the user would encounter an erroneous duplicate
key violation when they attempt their insert because the attempted
CPut into "new index" would fail as there is an existing
value (a=1). The correct value of a=2 will not be written into "new
index" until a follow-up merge pass that occurs at t8

The MERGING phase is intended to solve this problem by allowing for
the following sequence:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state
     and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t5: "new index" to MERGING
t7: `INSERT INTO` into with a = 1
t8: a = 2 merged from "temp index" into "new index"
t9: "new index" to DELETE-AND-WRITE-ONLY

Note that while in the MERGING state, the new index may accumulate
duplicate keys. This is OK because it will be checked for uniqueness
before being brought online.

The tests added here are nearly identical to the tests added for
DeletePreservingEncoding but have been written using a small
data-drive test that supports putting indexes into a mutated state.

Fixes #75658

Release note: None

cockroach-teamcity · 2022-01-28T18:54:50Z

This change is

stevendanna · 2022-01-28T18:56:00Z

I split this out from the index-backfiller PR to make it a bit easier to review.

mgartner

Looks great, just left some suggestions.

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rhu713 and @stevendanna)

-- commits, line 22 at r1:
In this timeline, when is a=2 written to the index?

-- commits, line 28 at r1:
Can you expand on this? When does the ForceMode get turned on and off in the example above?

pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):


query T kvtrace
UPDATE ti SET b = 2 WHERE a = 4

Can you add similar tests for UPSERT and INSERT .. ON CONFLICT .. DO UPDATE ..? There's two important cases for the latter: one where there is no conflict and a duplicate b is inserted successfully, and another where there is a conflict and b is successfully updated to a duplicate value.

pkg/sql/opt/exec/execbuilder/testdata/update_nonmetamorphic, line 410 at r1 (raw file):

    c INT,
    FAMILY (a, b, c),
    INDEX (b, c)

In these tests, I guess we don't care whether the index is unique or not?

stevendanna

Thanks for the review!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)

-- commits, line 22 at r1:

Previously, mgartner (Marcus Gartner) wrote…

In this timeline, when is a=2 written to the index?

a=2 would get written to new-index as part of the new "index merging." A delete-preserving temporary index is used to capture writes that happen between t0 and t6. This index is them merged into the newly added index at some time above t6 not currently in the timeline.

-- commits, line 28 at r1:

Previously, mgartner (Marcus Gartner) wrote…

Can you expand on this? When does the ForceMode get turned on and off in the example above?

ForceMode would get turned on at t0 and would get turned off at a time not currently in the timeline after the "index merge" has taken place. I'll add this to the timeline.

pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add similar tests for UPSERT and INSERT .. ON CONFLICT .. DO UPDATE ..? There's two important cases for the latter: one where there is no conflict and a duplicate b is inserted successfully, and another where there is a conflict and b is successfully updated to a duplicate value.

Just for clarity: "and another where there is a conflict" do you mean conflict on the primary column a?

mgartner · 2022-01-31T17:03:56Z

pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

  // This is used by the MVCC-compatible index backfiller to ensure
  // that unique indexes do not produce erroneous conflicts.
  optional bool force_put = 25 [(gogoproto.nullable) = false];

Should we add some validation in pkg/sql/catalog/tabledesc/index.go:validateTableIndexes to ensure this is only true when the index is part of an ongoing backfill? Should we do the same for use_delete_preserving_encoding?

mgartner · 2022-01-31T17:04:25Z

pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Just for clarity: "and another where there is a conflict" do you mean conflict on the primary column a?

Correct. That's the only conflict that should be possible because the unique index on b shouldn't be public (I guess it is public in this simulation, but it wouldn't normally be when ForceIndex is true).

mgartner · 2022-01-31T17:15:37Z

pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Should we add some validation in pkg/sql/catalog/tabledesc/index.go:validateTableIndexes to ensure this is only true when the index is part of an ongoing backfill? Should we do the same for use_delete_preserving_encoding?

Besides the corruption this would create, I'm also thinking of features like the one added in #75548, which will break if a ForcePut=true index is ever an "active" index.

stevendanna

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)

pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Besides the corruption this would create, I'm also thinking of features like the one added in #75548, which will break if a ForcePut=true index is ever an "active" index.

I definitely think we should add some validation. I was going to add validation to the schema changer before it brought any indexes online; but perhaps doing it in validateTableIndexes is better since then it will be checked everywhere Validate() is called on the descriptor.

It might make some testing a bit more difficult (i.e. the execbuilder tests that modify public indexes); let me see how bad the fallout is.

ajwerner

An alternative to this field is a new index state in the state machine. We've taken to liking these in the new schema changer. Consider moving from DELETE_ONLY to MERGING which is like DELETE_AND_WRITE_ONLY put you use the force put. You need to move to DELETE_AND_WRITE_ONLY before you can start validating (right?).

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)

pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I definitely think we should add some validation. I was going to add validation to the schema changer before it brought any indexes online; but perhaps doing it in validateTableIndexes is better since then it will be checked everywhere Validate() is called on the descriptor.

It might make some testing a bit more difficult (i.e. the execbuilder tests that modify public indexes); let me see how bad the fallout is.

Big 👍 on doing descriptor validation in the descriptor validation framework.

stevendanna

Consider moving from DELETE_ONLY to MERGING which is like DELETE_AND_WRITE_ONLY put you use the force put. You need to move to DELETE_AND_WRITE_ONLY before you can start validating (right?).

This has the nice benefit of it being impossible for the ForcePut index to ever become "PUBLIC".

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)

stevendanna

I've added the new MERGING state as recommended and moved the tests out of logic test and over to a small data-driven test that supports moving an index into the merging state.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @mgartner, and @rhu713)

pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, ajwerner wrote…

Big 👍 on doing descriptor validation in the descriptor validation framework.

I'll add some validation for UseDeletePreservingIndex in a different PR. For now, we no longer need to validate this because it no longer exists.

pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Correct. That's the only conflict that should be possible because the unique index on b shouldn't be public (I guess it is public in this simulation, but it wouldn't normally be when ForceIndex is true).

Done.

pkg/sql/opt/exec/execbuilder/testdata/update_nonmetamorphic, line 410 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

In these tests, I guess we don't care whether the index is unique or not?

Can't hurt to make these unique as far as I can see. I made as many of the indexes under test unique as I could.

ajwerner

Can you add validation around here that this state (and the BACKFILLING state) are only used for indexes?

cockroach/pkg/sql/catalog/tabledesc/validate.go

Line 245 in 218f887

func validateMutation(m *descpb.DescriptorMutation) error {

Otherwise,

Reviewed 2 of 12 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)

pkg/sql/catalog/table_elements.go, line 205 at r2 (raw file):

	// them.
	//
	// New indexes will be checked for uniqueness at the end of the may miss

missing something around at the end of the may miss

mgartner

Very nice!

Reviewed 6 of 12 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rhu713, and @stevendanna)

-- commits, line 22 at r1:

Previously, stevendanna (Steven Danna) wrote…

a=2 would get written to new-index as part of the new "index merging." A delete-preserving temporary index is used to capture writes that happen between t0 and t6. This index is them merged into the newly added index at some time above t6 not currently in the timeline.

Thanks for clarifying the timelines.

-- commits, line 5 at r3:
To be clear, will non-unique indexes use the MERGING state?

-- commits, line 32 at r3:
nit: end sentence in period

This PR adds a MERGING index descriptor state. This state is similar to DELETE_AND_WRITE_ONLY, but all writes are handled via Put, even in cases where they would typically used CPut or InitPut. This change is in support of the mvcc-compatible index backfiller. As part of the backfill process, newly added indexes will be added in a BACKFILLING state in which they see no writes or deletes. However, this would allow for the following sequence of events. For simplicity, the delete-preserving temporary index state transitions have been elided: t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state and a delete-preserving "temp index" t1: `INSERT INTO` with a = 1 (new index does not see this write) t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index") t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes) t4: Backfilling completes t5: "new index" to DELETE-ONLY t6: "new index" to DELETE-AND-WRITE-ONLY t7: `INSERT INTO` into with a = 1 t8: a = 2 merged from "temp index" into "new index" Unfortunately, at t7 the user would encounter an erroneous duplicate key violation when they attempt their insert because the attempted CPut into "new index" would fail as there is an existing value (a=1). The correct value of a=2 will not be written into "new index" until a follow-up merge pass that occurs at t8. The MERGING phase is intended to solve this problem by allowing for the following sequence: t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state and a delete-preserving "temp index" t1: `INSERT INTO` with a = 1 (new index does not see this write) t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index") t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes) t4: Backfilling completes t5: "new index" to DELETE-ONLY t5: "new index" to MERGING t7: `INSERT INTO` into with a = 1 t8: a = 2 merged from "temp index" into "new index" t9: "new index" to DELETE-AND-WRITE-ONLY Note that while in the MERGING state, the new index may accumulate duplicate keys. This is OK because it will be checked for uniqueness before being brought online. The tests added here are nearly identical to the tests added for DeletePreservingEncoding but have been written using a small data-drive test that supports putting indexes into a mutated state. Release note: None

stevendanna

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @rhu713, and @stevendanna)

-- commits, line 5 at r3:

Previously, mgartner (Marcus Gartner) wrote…

To be clear, will non-unique indexes use the MERGING state?

Yes, the plan is to use this state for all newly added indexes.

stevendanna · 2022-02-02T12:06:14Z

TFTR!

bors r=mgartner,ajwerner

craig · 2022-02-02T13:29:17Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2022-02-02T15:24:49Z

Build succeeded:

GitHub CI (Cockroach)

Previously, the index backfilling process depended upon non-MVCC compliant AddSSTable calls which potentially rewrote previously read historical values. To support an MVCC-compliant AddSSTable that writes at the _current_ timestamp, this change implements a new backfilling process described in the following RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md In summary, the new index backfilling process depends on backfilling the new index when it is in a BACKFILLING state (added in cockroachdb#72281). In this state it receives no writes or deletes. Writes that occur during the backfilling process are captured by a "temporary index." This temporary index uses the DeletePreservingEncoding to ensure it captures deletes as well as writes. After the of bulk backfill using the MVCC-compliant AddSSTable, the index is moved into a MERGING state (added in cockroachdb#75663) in which it receives writes and deletes. Writes previously captured by the temporary index are then transactionally merged into the newly added index. This feature is currently behind a new boolean cluster setting which default to true. Schema changes that contains both old and new-style backfills are rejected. Reverting the default to false will require updating various tests since many tests depend on the exact index IDs of newly added indexes. Release note: None Co-authored-by: Rui Hu <[email protected]>

73878: sql: introduce MVCC-compliant index backfiller r=ajwerner a=stevendanna Previously, the index backfilling process depended upon non-MVCC compliant AddSSTable calls which potentially rewrote previously read historical values. To support an MVCC-compliant AddSSTable that writes at the _current_ timestamp, this change implements a new backfilling process described in the following RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md In summary, the new index backfilling process depends on backfilling the new index when it is in a BACKFILLING state (added in #72281). In this state it receives no writes or deletes. Writes that occur during the backfilling process are captured by a "temporary index." This temporary index uses the DeletePreservingEncoding to ensure it captures deletes as well as writes. After the of bulk backfill using the MVCC-compliant AddSSTable, the index is moved into a MERGING state (added in #75663) in which it receives writes and deletes. Writes previously captured by the temporary index are then transactionally merged into the newly added index. This feature is currently behind a new boolean cluster setting which default to true. Schema changes that contains both old and new-style backfills are rejected. Reverting the default to false will require updating various tests since many tests depend on the exact index IDs of newly added indexes. Release note: None 76252: kvserver: account for AddSSTableRequests in QPS r=kvoli a=kvoli Previously, Queries-Per-Second (QPS) was calculated uniformly per `BatchRequest` as 1. This patch introduces variable QPS calculation for `AddSSTableRequest`, which use an order of magnitude more resources than other request types. This patch introduces the `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to attribute QPS to `AddSSTableRequest` sizes. The calculation is done as QPS = 1 + size(AddSSTableRequest) / factor. When `kv.replica_stats.addsst_request_size_factor` is less than 1, or no `AddSSTableRequest` exists within a `BatchRequest`, then QPS = 1; the current behavior today. resolves #73731 Release note (performance improvement): Introduced `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to tune Queries-Per-Second (QPS) sensitivity to large imports. By default, this setting is disabled. When enabled, the size of any AddSSTableRequest will contribute to QPS in inverse relation to this settings magnitude. By default this setting configured to a conservative 50,000; every 50 kilobytes will be accounted for as an additional 1 QPS. 76617: bazel: make `CrdbTestBuild` `const` r=RaduBerinde a=rickystewart This partially reverts #72838. That change was made to avoid thrashing the cache when swapping between test- and non-test configurations. Today we have `dev cache` which can retain build artifacts across the different configurations, so this doesn't really serve a purpose any more. Indeed, you can now swap between `bazel build pkg/cmd/cockroach-short` and `bazel build pkg/cmd/cockroach-short --config=test` very quickly with Bazel downloading old versions of the built libraries if the built-in cache gets wiped. We still filter out the `go:build` lines in `crdb_test_{off,on}.go` so we don't have to set `gotags` for test and non-test, which still saves a lot of time and unnecessary recompilation. We have a check for this in CI so no one should be able to add a build constraint without us knowing. Release note: None 76627: util/tracing: improve span recording interface r=andreimatei a=andreimatei Before this patch, the Span's recording interface was left over from a time when there were only one recording mode: verbose. We now have two modes: verbose recording and structured recording. They can be enabled at span creation time through the WithRecording(<type>) option. This patch changes the Span's SetVerbose() method to expose the two options. Release note: None 76667: roachtest/test: fix ORM testing due to removed cluster setting r=rafiss a=otan `sql.catalog.unsafe_skip_system_config_trigger.enabled` got removed recently and was part of an alpha. Let's clean it up in ORMs too. Resolves #76654 Resolves #76655 Resolves #76656 Resolves #76657 Resolves #76658 Resolves #76659 Resolves #76660 Resolves #76661 Resolves #76662 Resolves #76663 Resolves #76664 Resolves #76665 Resolves #76666 Release note: None Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Rui Hu <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Oliver Tan <[email protected]>

Previously, the index backfilling process depended upon non-MVCC compliant AddSSTable calls which potentially rewrote previously read historical values. To support an MVCC-compliant AddSSTable that writes at the _current_ timestamp, this change implements a new backfilling process described in the following RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md In summary, the new index backfilling process depends on backfilling the new index when it is in a BACKFILLING state (added in cockroachdb#72281). In this state it receives no writes or deletes. Writes that occur during the backfilling process are captured by a "temporary index." This temporary index uses the DeletePreservingEncoding to ensure it captures deletes as well as writes. After the of bulk backfill using the MVCC-compliant AddSSTable, the index is moved into a MERGING state (added in cockroachdb#75663) in which it receives writes and deletes. Writes previously captured by the temporary index are then transactionally merged into the newly added index. This feature is currently behind a new boolean cluster setting which default to true. Schema changes that contains both old and new-style backfills are rejected. Reverting the default to false will require updating various tests since many tests depend on the exact index IDs of newly added indexes. Release note: None Co-authored-by: Rui Hu <[email protected]>

stevendanna requested a review from a team January 28, 2022 18:54

stevendanna requested a review from a team as a code owner January 28, 2022 18:54

stevendanna force-pushed the ssd/force-put branch from 34936b0 to 089cb37 Compare January 28, 2022 19:03

stevendanna requested a review from rhu713 January 28, 2022 19:06

stevendanna force-pushed the ssd/force-put branch from 089cb37 to e5b6ee1 Compare January 29, 2022 01:00

stevendanna requested a review from a team as a code owner January 29, 2022 01:00

stevendanna force-pushed the ssd/force-put branch from e5b6ee1 to 50e05a9 Compare January 29, 2022 18:41

mgartner approved these changes Jan 31, 2022

View reviewed changes

stevendanna commented Jan 31, 2022

View reviewed changes

ajwerner reviewed Jan 31, 2022

View reviewed changes

stevendanna commented Jan 31, 2022

View reviewed changes

stevendanna force-pushed the ssd/force-put branch from 50e05a9 to 3176055 Compare February 1, 2022 02:00

stevendanna commented Feb 1, 2022

View reviewed changes

stevendanna force-pushed the ssd/force-put branch from 3176055 to 0ea9520 Compare February 1, 2022 02:57

stevendanna changed the title ~~sql: add ForcePut option to indexes~~ sql: add MERGING descriptor state Feb 1, 2022

stevendanna mentioned this pull request Feb 1, 2022

sql: introduce MVCC-compliant index backfiller #73878

Merged

ajwerner approved these changes Feb 1, 2022

View reviewed changes

stevendanna force-pushed the ssd/force-put branch from 0ea9520 to d10972c Compare February 1, 2022 14:45

mgartner approved these changes Feb 1, 2022

View reviewed changes

stevendanna force-pushed the ssd/force-put branch from d10972c to 0ee2977 Compare February 2, 2022 10:05

stevendanna commented Feb 2, 2022

View reviewed changes

craig bot merged commit 39ca6ac into cockroachdb:master Feb 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: add MERGING descriptor state #75663

sql: add MERGING descriptor state #75663

stevendanna commented Jan 28, 2022 •

edited by ajwerner

Loading

cockroach-teamcity commented Jan 28, 2022

stevendanna commented Jan 28, 2022

mgartner left a comment

stevendanna left a comment

mgartner commented Jan 31, 2022

mgartner commented Jan 31, 2022

mgartner commented Jan 31, 2022

stevendanna left a comment

ajwerner left a comment

stevendanna left a comment

stevendanna left a comment

ajwerner left a comment

mgartner left a comment

stevendanna left a comment

stevendanna commented Feb 2, 2022

craig bot commented Feb 2, 2022

craig bot commented Feb 2, 2022

sql: add MERGING descriptor state #75663

sql: add MERGING descriptor state #75663

Conversation

stevendanna commented Jan 28, 2022 • edited by ajwerner Loading

cockroach-teamcity commented Jan 28, 2022

stevendanna commented Jan 28, 2022

mgartner left a comment

Choose a reason for hiding this comment

stevendanna left a comment

Choose a reason for hiding this comment

mgartner commented Jan 31, 2022

mgartner commented Jan 31, 2022

mgartner commented Jan 31, 2022

stevendanna left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

stevendanna left a comment

Choose a reason for hiding this comment

stevendanna left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

stevendanna left a comment

Choose a reason for hiding this comment

stevendanna commented Feb 2, 2022

craig bot commented Feb 2, 2022

craig bot commented Feb 2, 2022

stevendanna commented Jan 28, 2022 •

edited by ajwerner

Loading