Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add MERGING descriptor state #75663

Merged
merged 1 commit into from
Feb 2, 2022

Conversation

stevendanna
Copy link
Collaborator

@stevendanna stevendanna commented Jan 28, 2022

This PR adds a MERGING index descriptor state. This state is similar
to DELETE_AND_WRITE_ONLY, but all writes are handled via Put, even in
cases where they would typically used CPut or InitPut.

This change is in support of the mvcc-compatible index backfiller.

As part of the backfill process, newly added indexes will be added in
a BACKFILLING state in which they see no writes or deletes.

However, this would allow for the following sequence of events. For
simplicity, the delete-preserving temporary index state transitions
have been elided:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state
     and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t6: "new index" to DELETE-AND-WRITE-ONLY
t7: `INSERT INTO` into with a = 1
t8: a = 2 merged from "temp index" into "new index"

Unfortunately, at t7 the user would encounter an erroneous duplicate
key violation when they attempt their insert because the attempted
CPut into "new index" would fail as there is an existing
value (a=1). The correct value of a=2 will not be written into "new
index" until a follow-up merge pass that occurs at t8

The MERGING phase is intended to solve this problem by allowing for
the following sequence:

t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state
     and a delete-preserving "temp index"
t1: `INSERT INTO` with a = 1 (new index does not see this write)
t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
t4: Backfilling completes
t5: "new index" to DELETE-ONLY
t5: "new index" to MERGING
t7: `INSERT INTO` into with a = 1
t8: a = 2 merged from "temp index" into "new index"
t9: "new index" to DELETE-AND-WRITE-ONLY

Note that while in the MERGING state, the new index may accumulate
duplicate keys. This is OK because it will be checked for uniqueness
before being brought online.

The tests added here are nearly identical to the tests added for
DeletePreservingEncoding but have been written using a small
data-drive test that supports putting indexes into a mutated state.

Fixes #75658

Release note: None

@stevendanna stevendanna requested a review from a team January 28, 2022 18:54
@stevendanna stevendanna requested a review from a team as a code owner January 28, 2022 18:54
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna
Copy link
Collaborator Author

I split this out from the index-backfiller PR to make it a bit easier to review.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just left some suggestions.

Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rhu713 and @stevendanna)


-- commits, line 22 at r1:
In this timeline, when is a=2 written to the index?


-- commits, line 28 at r1:
Can you expand on this? When does the ForceMode get turned on and off in the example above?


pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):


query T kvtrace
UPDATE ti SET b = 2 WHERE a = 4

Can you add similar tests for UPSERT and INSERT .. ON CONFLICT .. DO UPDATE ..? There's two important cases for the latter: one where there is no conflict and a duplicate b is inserted successfully, and another where there is a conflict and b is successfully updated to a duplicate value.


pkg/sql/opt/exec/execbuilder/testdata/update_nonmetamorphic, line 410 at r1 (raw file):

    c INT,
    FAMILY (a, b, c),
    INDEX (b, c)

In these tests, I guess we don't care whether the index is unique or not?

Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)


-- commits, line 22 at r1:

Previously, mgartner (Marcus Gartner) wrote…

In this timeline, when is a=2 written to the index?

a=2 would get written to new-index as part of the new "index merging." A delete-preserving temporary index is used to capture writes that happen between t0 and t6. This index is them merged into the newly added index at some time above t6 not currently in the timeline.


-- commits, line 28 at r1:

Previously, mgartner (Marcus Gartner) wrote…

Can you expand on this? When does the ForceMode get turned on and off in the example above?

ForceMode would get turned on at t0 and would get turned off at a time not currently in the timeline after the "index merge" has taken place. I'll add this to the timeline.


pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Can you add similar tests for UPSERT and INSERT .. ON CONFLICT .. DO UPDATE ..? There's two important cases for the latter: one where there is no conflict and a duplicate b is inserted successfully, and another where there is a conflict and b is successfully updated to a duplicate value.

Just for clarity: "and another where there is a conflict" do you mean conflict on the primary column a?

@mgartner
Copy link
Collaborator


pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

  // This is used by the MVCC-compatible index backfiller to ensure
  // that unique indexes do not produce erroneous conflicts.
  optional bool force_put = 25 [(gogoproto.nullable) = false];

Should we add some validation in pkg/sql/catalog/tabledesc/index.go:validateTableIndexes to ensure this is only true when the index is part of an ongoing backfill? Should we do the same for use_delete_preserving_encoding?

@mgartner
Copy link
Collaborator


pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Just for clarity: "and another where there is a conflict" do you mean conflict on the primary column a?

Correct. That's the only conflict that should be possible because the unique index on b shouldn't be public (I guess it is public in this simulation, but it wouldn't normally be when ForceIndex is true).

@mgartner
Copy link
Collaborator


pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Should we add some validation in pkg/sql/catalog/tabledesc/index.go:validateTableIndexes to ensure this is only true when the index is part of an ongoing backfill? Should we do the same for use_delete_preserving_encoding?

Besides the corruption this would create, I'm also thinking of features like the one added in #75548, which will break if a ForcePut=true index is ever an "active" index.

Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)


pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Besides the corruption this would create, I'm also thinking of features like the one added in #75548, which will break if a ForcePut=true index is ever an "active" index.

I definitely think we should add some validation. I was going to add validation to the schema changer before it brought any indexes online; but perhaps doing it in validateTableIndexes is better since then it will be checked everywhere Validate() is called on the descriptor.

It might make some testing a bit more difficult (i.e. the execbuilder tests that modify public indexes); let me see how bad the fallout is.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to this field is a new index state in the state machine. We've taken to liking these in the new schema changer. Consider moving from DELETE_ONLY to MERGING which is like DELETE_AND_WRITE_ONLY put you use the force put. You need to move to DELETE_AND_WRITE_ONLY before you can start validating (right?).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)


pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I definitely think we should add some validation. I was going to add validation to the schema changer before it brought any indexes online; but perhaps doing it in validateTableIndexes is better since then it will be checked everywhere Validate() is called on the descriptor.

It might make some testing a bit more difficult (i.e. the execbuilder tests that modify public indexes); let me see how bad the fallout is.

Big 👍 on doing descriptor validation in the descriptor validation framework.

Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving from DELETE_ONLY to MERGING which is like DELETE_AND_WRITE_ONLY put you use the force put. You need to move to DELETE_AND_WRITE_ONLY before you can start validating (right?).

This has the nice benefit of it being impossible for the ForcePut index to ever become "PUBLIC".

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)

Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the new MERGING state as recommended and moved the tests out of logic test and over to a small data-driven test that supports moving an index into the merging state.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @mgartner, and @rhu713)


pkg/sql/catalog/descpb/structured.proto, line 521 at r1 (raw file):

Previously, ajwerner wrote…

Big 👍 on doing descriptor validation in the descriptor validation framework.

I'll add some validation for UseDeletePreservingIndex in a different PR. For now, we no longer need to validate this because it no longer exists.


pkg/sql/opt/exec/execbuilder/testdata/force_put_index, line 56 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Correct. That's the only conflict that should be possible because the unique index on b shouldn't be public (I guess it is public in this simulation, but it wouldn't normally be when ForceIndex is true).

Done.


pkg/sql/opt/exec/execbuilder/testdata/update_nonmetamorphic, line 410 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

In these tests, I guess we don't care whether the index is unique or not?

Can't hurt to make these unique as far as I can see. I made as many of the indexes under test unique as I could.

@stevendanna stevendanna changed the title sql: add ForcePut option to indexes sql: add MERGING descriptor state Feb 1, 2022
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add validation around here that this state (and the BACKFILLING state) are only used for indexes?

func validateMutation(m *descpb.DescriptorMutation) error {

Otherwise, :lgtm:

Reviewed 2 of 12 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @rhu713, and @stevendanna)


pkg/sql/catalog/table_elements.go, line 205 at r2 (raw file):

	// them.
	//
	// New indexes will be checked for uniqueness at the end of the may miss

missing something around at the end of the may miss

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Very nice!

Reviewed 6 of 12 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @rhu713, and @stevendanna)


-- commits, line 22 at r1:

Previously, stevendanna (Steven Danna) wrote…

a=2 would get written to new-index as part of the new "index merging." A delete-preserving temporary index is used to capture writes that happen between t0 and t6. This index is them merged into the newly added index at some time above t6 not currently in the timeline.

Thanks for clarifying the timelines.


-- commits, line 5 at r3:
To be clear, will non-unique indexes use the MERGING state?


-- commits, line 32 at r3:
nit: end sentence in period

This PR adds a MERGING index descriptor state. This state is similar
to DELETE_AND_WRITE_ONLY, but all writes are handled via Put, even in
cases where they would typically used CPut or InitPut.

This change is in support of the mvcc-compatible index backfiller.

As part of the backfill process, newly added indexes will be added in
a BACKFILLING state in which they see no writes or deletes.

However, this would allow for the following sequence of events. For
simplicity, the delete-preserving temporary index state transitions
have been elided:

    t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state
         and a delete-preserving "temp index"
    t1: `INSERT INTO` with a = 1 (new index does not see this write)
    t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
    t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
    t4: Backfilling completes
    t5: "new index" to DELETE-ONLY
    t6: "new index" to DELETE-AND-WRITE-ONLY
    t7: `INSERT INTO` into with a = 1
    t8: a = 2 merged from "temp index" into "new index"

Unfortunately, at t7 the user would encounter an erroneous duplicate
key violation when they attempt their insert because the attempted
CPut into "new index" would fail as there is an existing
value (a=1). The correct value of a=2 will not be written into "new
index" until a follow-up merge pass that occurs at t8.

The MERGING phase is intended to solve this problem by allowing for
the following sequence:

    t0: `CREATE UNIQUE INDEX` creates "new index" in BACKFILLING state
         and a delete-preserving "temp index"
    t1: `INSERT INTO` with a = 1 (new index does not see this write)
    t2: Backfill starts at t1 (a=1 included in backfill, ends up in "new index")
    t3: UPDATE a = 2 where a = 1 (new index still in backfilling does not see these writes)
    t4: Backfilling completes
    t5: "new index" to DELETE-ONLY
    t5: "new index" to MERGING
    t7: `INSERT INTO` into with a = 1
    t8: a = 2 merged from "temp index" into "new index"
    t9: "new index" to DELETE-AND-WRITE-ONLY

Note that while in the MERGING state, the new index may accumulate
duplicate keys. This is OK because it will be checked for uniqueness
before being brought online.

The tests added here are nearly identical to the tests added for
DeletePreservingEncoding but have been written using a small
data-drive test that supports putting indexes into a mutated state.

Release note: None
Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner, @rhu713, and @stevendanna)


-- commits, line 5 at r3:

Previously, mgartner (Marcus Gartner) wrote…

To be clear, will non-unique indexes use the MERGING state?

Yes, the plan is to use this state for all newly added indexes.

@stevendanna
Copy link
Collaborator Author

TFTR!

bors r=mgartner,ajwerner

@craig
Copy link
Contributor

craig bot commented Feb 2, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 2, 2022

Build succeeded:

@craig craig bot merged commit 39ca6ac into cockroachdb:master Feb 2, 2022
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Feb 16, 2022
Previously, the index backfilling process depended upon non-MVCC
compliant AddSSTable calls which potentially rewrote previously read
historical values.

To support an MVCC-compliant AddSSTable that writes at the _current_
timestamp, this change implements a new backfilling process described
in the following RFC:

https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md

In summary, the new index backfilling process depends on backfilling
the new index when it is in a BACKFILLING state (added in cockroachdb#72281). In
this state it receives no writes or deletes. Writes that occur during
the backfilling process are captured by a "temporary index."  This
temporary index uses the DeletePreservingEncoding to ensure it
captures deletes as well as writes.

After the of bulk backfill using the MVCC-compliant AddSSTable, the
index is moved into a MERGING state
(added in cockroachdb#75663) in which it receives writes and deletes. Writes
previously captured by the temporary index are then transactionally
merged into the newly added index.

This feature is currently behind a new boolean cluster setting which
default to true. Schema changes that contains both old and new-style
backfills are rejected.

Reverting the default to false will require updating various tests
since many tests depend on the exact index IDs of newly added indexes.

Release note: None

Co-authored-by: Rui Hu <[email protected]>
craig bot pushed a commit that referenced this pull request Feb 16, 2022
73878: sql: introduce MVCC-compliant index backfiller r=ajwerner a=stevendanna

Previously, the index backfilling process depended upon non-MVCC
compliant AddSSTable calls which potentially rewrote previously read
historical values.

To support an MVCC-compliant AddSSTable that writes at the _current_
timestamp, this change implements a new backfilling process described
in the following RFC:

https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md

In summary, the new index backfilling process depends on backfilling
the new index when it is in a BACKFILLING state (added in #72281). In
this state it receives no writes or deletes. Writes that occur during
the backfilling process are captured by a "temporary index."  This
temporary index uses the DeletePreservingEncoding to ensure it
captures deletes as well as writes.

After the of bulk backfill using the MVCC-compliant AddSSTable, the
index is moved into a MERGING state
(added in #75663) in which it receives writes and deletes. Writes
previously captured by the temporary index are then transactionally
merged into the newly added index.

This feature is currently behind a new boolean cluster setting which
default to true. Schema changes that contains both old and new-style
backfills are rejected.

Reverting the default to false will require updating various tests
since many tests depend on the exact index IDs of newly added indexes.

Release note: None

76252: kvserver: account for AddSSTableRequests in QPS r=kvoli a=kvoli

Previously, Queries-Per-Second (QPS) was calculated uniformly per
`BatchRequest` as 1. This patch introduces variable QPS calculation
for `AddSSTableRequest`, which use an order of magnitude more
resources than other request types.

This patch introduces the
`kv.replica_stats.addsst_request_size_factor` cluster setting. This
setting is used to attribute QPS to `AddSSTableRequest` sizes. The
calculation is done as QPS = 1 + size(AddSSTableRequest) / factor.
When `kv.replica_stats.addsst_request_size_factor` is less than 1, or
no `AddSSTableRequest` exists within a `BatchRequest`, then QPS = 1;
the current behavior today.

resolves #73731

Release note (performance improvement):
Introduced `kv.replica_stats.addsst_request_size_factor` cluster
setting. This setting is used to tune Queries-Per-Second (QPS)
sensitivity to large imports. By default, this setting is disabled.
When enabled, the size of any AddSSTableRequest will contribute to
QPS in inverse relation to this settings magnitude. By default this
setting configured to a conservative 50,000; every 50 kilobytes will
be accounted for as an additional 1 QPS.

76617: bazel: make `CrdbTestBuild` `const` r=RaduBerinde a=rickystewart

This partially reverts #72838. That change was made to avoid thrashing
the cache when swapping between test- and non-test configurations.
Today we have `dev cache` which can retain build artifacts across the
different configurations, so this doesn't really serve a purpose any
more. Indeed, you can now swap between `bazel build
pkg/cmd/cockroach-short` and `bazel build pkg/cmd/cockroach-short
--config=test` very quickly with Bazel downloading old versions of the
built libraries if the built-in cache gets wiped.

We still filter out the `go:build` lines in `crdb_test_{off,on}.go` so
we don't have to set `gotags` for test and non-test, which still saves a
lot of time and unnecessary recompilation. We have a check for this in
CI so no one should be able to add a build constraint without us
knowing.

Release note: None

76627: util/tracing: improve span recording interface r=andreimatei a=andreimatei

Before this patch, the Span's recording interface was left over from a
time when there were only one recording mode: verbose. We now have two
modes: verbose recording and structured recording. They can be enabled
at span creation time through the WithRecording(<type>) option. This
patch changes the Span's SetVerbose() method to expose the two options.

Release note: None

76667: roachtest/test: fix ORM testing due to removed cluster setting r=rafiss a=otan

`sql.catalog.unsafe_skip_system_config_trigger.enabled` got removed
recently and was part of an alpha. Let's clean it up in ORMs too.

Resolves #76654
Resolves #76655
Resolves #76656
Resolves #76657
Resolves #76658
Resolves #76659
Resolves #76660
Resolves #76661
Resolves #76662
Resolves #76663
Resolves #76664
Resolves #76665
Resolves #76666

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
RajivTS pushed a commit to RajivTS/cockroach that referenced this pull request Mar 6, 2022
Previously, the index backfilling process depended upon non-MVCC
compliant AddSSTable calls which potentially rewrote previously read
historical values.

To support an MVCC-compliant AddSSTable that writes at the _current_
timestamp, this change implements a new backfilling process described
in the following RFC:

https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md

In summary, the new index backfilling process depends on backfilling
the new index when it is in a BACKFILLING state (added in cockroachdb#72281). In
this state it receives no writes or deletes. Writes that occur during
the backfilling process are captured by a "temporary index."  This
temporary index uses the DeletePreservingEncoding to ensure it
captures deletes as well as writes.

After the of bulk backfill using the MVCC-compliant AddSSTable, the
index is moved into a MERGING state
(added in cockroachdb#75663) in which it receives writes and deletes. Writes
previously captured by the temporary index are then transactionally
merged into the newly added index.

This feature is currently behind a new boolean cluster setting which
default to true. Schema changes that contains both old and new-style
backfills are rejected.

Reverting the default to false will require updating various tests
since many tests depend on the exact index IDs of newly added indexes.

Release note: None

Co-authored-by: Rui Hu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mvcc-compliant index backfiller: filtering erroneous deletes
4 participants