Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add implicit SELECT FOR SHARE locking to FK checks #105857

Merged
merged 4 commits into from
Jul 26, 2023

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Jun 29, 2023

explain: add transaction information to EXPLAIN ANALYZE

Add transaction isolation, priority, and quality-of-service to the
output of EXPLAIN ANALYZE.

Release note (sql change): EXPLAIN ANALYZE output now includes:

  • the isolation level of the statement's transaction
  • the priority of the statement's transaction
  • the quality of service level of the statement's transaction

opt: do not use LockDurabilityGuaranteed under serializable isolation

This is a follow-up from #103734.

We do not want to use guaranteed-durable (a.k.a. replicated) locking
under serializable isolation, because it prevents pipelining and other
optimizations, and is unnecessary for correctness. This commit amends
8cbc6d1 to only set durability for
SELECT FOR UPDATE locking under weaker isolation levels.

This means that query plans will be slightly different under different
isolation levels, and so we must add isolation level to the optimizer
memo staleness calculation.

Furthermore, this commit changes the error message added by
e633d5e to be about guaranteed-durable
locking rather than SELECT FOR UPDATE, because in a later commit this
specific error will also be triggered by foreign key checks under weaker
isolation levels.

Informs: #100144, #100156, #100193, #100194

Release note: None


opt: show locking durability in EXPLAIN (OPT) output

Because the "guaranteed-durable locking not yet implemented" error
condition is checked in execbuilder, it prevents not only execution but
also EXPLAIN of queries using guaranteed-durable locking. Thankfully
EXPLAIN (OPT) bypasses execbuilder, and hence still works, so use this
for now to verify that we are enabling durable locking for SELECT FOR UPDATE under read committed isolation.

(Note that we have not yet fixed the SELECT FOR UPDATE plans to use
more precise locking, that will come in a later PR.)

Informs: #100194

Release note: None


sql: add implicit SELECT FOR SHARE locking to FK parent checks

Add SELECT FOR SHARE locking to FK parent checks. Under serializable
isolation, this locking is only used when
enable_implicit_fk_locking_for_serializable is set. Under weaker
isolation levels (snapshot and read committed) this locking is always
used.

We only need to lock during the insertion-side FK checks, which verify
the existence of a parent row. Deletion-side FK checks verify the
non-existence of a child row, and these do not need to lock. Instead, to
prevent concurrent inserts or updates to the child that would violate
the FK constraint, we rely on the intent(s) created by the deletion
conflicting with the FK locking of those concurrent inserts or updates.

Fixes: #80683
Informs: #100156

Epic: CRDB-25322

Release note (sql change): Add a new session variable,
enable_implicit_fk_locking_for_serializable, which controls locking
during foreign key checks under serializable isolation. With this set to
true, foreign key checks of the referenced (parent) table, such as those
performed during an INSERT or UPDATE of the referencing (child) table,
will lock the referenced row using SELECT FOR SHARE locking. (This is
somewhat analogous to the existing enable_implicit_select_for_update
variable but applies to the foreign key checks of a mutation statement
instead of the initial row fetch.)

Under weaker isolation levels such as read committed, SELECT FOR SHARE
locking will always be used to ensure the database maintains the foreign
key constraint, regardless of the current setting of
enable_implicit_fk_locking_for_serializable.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@michae2 michae2 force-pushed the rc_constraints branch 2 times, most recently from 373ecf3 to d7df1e7 Compare July 10, 2023 23:58
@michae2 michae2 marked this pull request as ready for review July 11, 2023 00:05
@michae2 michae2 requested review from a team as code owners July 11, 2023 00:05
@michae2 michae2 force-pushed the rc_constraints branch 2 times, most recently from 4a32411 to 8bc66ff Compare July 19, 2023 22:04
@michae2 michae2 requested a review from rytaft July 19, 2023 22:04
@michae2
Copy link
Collaborator Author

michae2 commented Jul 19, 2023

We've gone from "duable" to "durable" so this is now RFAL. 🙂

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! :lgtm: apart from a few nits.

Reviewed 11 of 11 files at r1, 12 of 12 files at r2, 3 of 3 files at r3, 28 of 28 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @nvanbenschoten, and @rytaft)


pkg/sql/opt/optbuilder/mutation_builder_fk.go line 616 at r4 (raw file):

// buildOtherTableScan builds a Scan of the "other" table.
func (h *fkCheckHelper) buildOtherTableScan(parent bool) (outScope *scope, tabMeta *opt.TableMeta) {

[nit] Maybe explain what parent means.


pkg/sql/row/locking.go line 47 at r4 (raw file):

}

// getWaitPolicy returns the configured lock wait policy to use for key-value

[nit] getWaitPolicy -> GetWaitPolicy


pkg/sql/opt/exec/execbuilder/testdata/fk line 417 at r4 (raw file):


# Even with implicit locking enabled, we should not acquire any shared locks
# when checking a delete from the parent.

Could you explain why the shared locks are unnecessary? Is it because inserts/updates to the child table will already check for existence of the parent row?


pkg/sql/opt/exec/execbuilder/testdata/fk_read_committed line 24 at r4 (raw file):

           └── anti-join (lookup jars)
                ├── lookup columns are key
                ├── locking: for-share,duability-guaranteed

This says duability still :)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Just a couple of questions.

Reviewed 11 of 11 files at r1, 12 of 12 files at r2, 3 of 3 files at r3, 28 of 28 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @nvanbenschoten)


pkg/sql/conn_executor_exec.go line 2931 at r1 (raw file):

		// add qos?
		// add asoftime?
		// add readonly?

Did you mean to actually add these here? If not, this should probably be a TODO with your name / issue number


pkg/sql/logictest/testdata/logic_test/fk_read_committed line 35 at r4 (raw file):

UPDATE cookies SET j = 2 WHERE c = 1

# Foreign key checks of the child do not require locking.

Why not? What happens if you update or delete a jar that has no cookies in it, but a concurrent RC transaction tries to insert a cookie?

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews! Fixed the nits and added more tests and comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @nvanbenschoten, and @rytaft)


pkg/sql/conn_executor_exec.go line 2931 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Did you mean to actually add these here? If not, this should probably be a TODO with your name / issue number

Oh, whoops, I forgot about these. Opened #107318 and added TODOs.


pkg/sql/opt/optbuilder/mutation_builder_fk.go line 616 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Maybe explain what parent means.

Good call, done.


pkg/sql/row/locking.go line 47 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] getWaitPolicy -> GetWaitPolicy

Done.


pkg/sql/logictest/testdata/logic_test/fk_read_committed line 35 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Why not? What happens if you update or delete a jar that has no cookies in it, but a concurrent RC transaction tries to insert a cookie?

The concurrent RC transaction will perform its own FK check of the parent row, which will acquire a lock on the parent jar. This lock will conflict with the update or delete of the parent jar.

This is a bit tricky so I added some comments in various places about it.


pkg/sql/opt/exec/execbuilder/testdata/fk line 417 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Could you explain why the shared locks are unnecessary? Is it because inserts/updates to the child table will already check for existence of the parent row?

Yes, exactly. Inserts/updates to the child table will do their own FK checks which will acquire locks on the parent row. Those locks will conflict with deletions from the parent table, preventing violations of the FK constraint.

This is a bit tricky so I added more comments about it.


pkg/sql/opt/exec/execbuilder/testdata/fk_read_committed line 24 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

This says duability still :)

Gah, thank you. Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! :lgtm:

Reviewed 10 of 11 files at r1, 40 of 40 files at r5, 12 of 12 files at r6, 3 of 3 files at r7, 31 of 32 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @michae2, and @rytaft)


pkg/sql/insert_fast_path.go line 194 at r8 (raw file):

		lockWaitPolicy := row.GetWaitPolicy(descpb.ToScanLockingWaitPolicy(c.Locking.WaitPolicy))
		// All FK check lock wait policies should be BLOCK.
		if r.fkBatch.Header.WaitPolicy != lockWaitPolicy {

nit: why compare against r.fkBatch.Header.WaitPolicy (which we assume has not been set yet) instead of directly against lock.WaitPolicy_Block?


pkg/sql/vars.go line 2805 at r8 (raw file):

	// CockroachDB extension.
	`enable_implicit_fk_locking`: {

Should this setting mention "serializable" somehow, since setting it to false is ignored for weaker isolation levels?


pkg/sql/opt/exec/explain/output.go line 415 at r5 (raw file):

	txnIsoLevel isolation.Level, txnPriority roachpb.UserPriority, txnQoSLevel sessiondatapb.QoSLevel,
) {
	ob.AddTopLevelField("isolation level", txnIsoLevel.String())

If we want to match the other fields in the output, we could use StringLower() here.


pkg/sql/opt/memo/expr_format.go line 1563 at r7 (raw file):

	case tree.LockDurabilityBestEffort:
	case tree.LockDurabilityGuaranteed:
		durability = ",durability-guaranteed"

nit: should we add default: panic(errors.AssertionFailedf("unexpected durability")) to catch new durability levels?


pkg/sql/opt/optbuilder/mutation_builder_fk.go line 650 at r8 (raw file):

		locking = lockingSpec{
			&tree.LockingItem{
				// TODO(michae2): Change this to ForKeyShare when it is supported.

You could set it to row.ForKeyShare now if you'd like. That's handled the same as tree.ForShare in row.GetKeyLockingStrength.

But also, feel free to leave this as is if you think it would be confusing to users, given that the difference doesn't matter yet.


pkg/sql/opt/optbuilder/select.go line 694 at r6 (raw file):

	if locking.isSet() {
		private.Locking = locking.get()
		if b.evalCtx.TxnIsoLevel != isolation.Serializable {

Did we want to expose a cluster setting and/or session var to allow users to opt-in to guaranteed lock durability for SFU under Serializable isolation?


pkg/sql/sessiondatapb/local_only_session_data.proto line 410 at r8 (raw file):

  // ImplicitFKLocking is true if FOR SHARE locking may be used while checking
  // the referenced table during an insert or update to a table with a foreign
  // key.

Same point about qualifying this somehow with the fact that it only applies to serializable.

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @nvanbenschoten, and @rytaft)


pkg/sql/opt/optbuilder/select.go line 694 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did we want to expose a cluster setting and/or session var to allow users to opt-in to guaranteed lock durability for SFU under Serializable isolation?

Oh, good idea! I'm thinking a session var. @nvanbenschoten do you have any preferences about the name of the setting? How about "lock_durability" or something?

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @michae2, and @rytaft)


pkg/sql/opt/optbuilder/select.go line 694 at r6 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Oh, good idea! I'm thinking a session var. @nvanbenschoten do you have any preferences about the name of the setting? How about "lock_durability" or something?

Maybe enable_durable_select_for_update_locks? If possible, making the variable boolean-valued seems like the way to go, as it makes the knob easier to use.

I think we'll also want to mention something about "serializable", like we'll want to do for enable_implicit_fk_locking (see other comment). Maybe suffix both with _with_serializable, or something like that.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 40 of 40 files at r5, 12 of 12 files at r6, 3 of 3 files at r7, 32 of 32 files at r8, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @michae2)


pkg/sql/logictest/testdata/logic_test/fk_read_committed line 35 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

The concurrent RC transaction will perform its own FK check of the parent row, which will acquire a lock on the parent jar. This lock will conflict with the update or delete of the parent jar.

This is a bit tricky so I added some comments in various places about it.

Thanks for the explanation!

Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @DrewKimball, @mgartner, @nvanbenschoten, and @rytaft)


pkg/sql/insert_fast_path.go line 194 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: why compare against r.fkBatch.Header.WaitPolicy (which we assume has not been set yet) instead of directly against lock.WaitPolicy_Block?

We're comparing against the header because there's an impedance mismatch between the r.fkChecks[i].Locking.WaitPolicy which is per-fk-check and the r.fkBatch.Header.WaitPolicy which is per-batch (one for all the fk checks). So just verifying that they all match whatever the batch is set to.

I changed addFKChecks to now explicitly set r.fkBatch.Header.WaitPolicy to blocking (was relying on it being implicitly set to blocking during defer n.run.fkBatch.Reset() in runFKChecks).


pkg/sql/vars.go line 2805 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should this setting mention "serializable" somehow, since setting it to false is ignored for weaker isolation levels?

Good point, I added "_for_serializable" to be similar to Rafi's "max_retries_for_read_committed_transactions" in #107044.


pkg/sql/opt/exec/explain/output.go line 415 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we want to match the other fields in the output, we could use StringLower() here.

Oh! Thank you! Done.


pkg/sql/opt/memo/expr_format.go line 1563 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: should we add default: panic(errors.AssertionFailedf("unexpected durability")) to catch new durability levels?

Good catch, done.


pkg/sql/opt/optbuilder/mutation_builder_fk.go line 650 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

You could set it to row.ForKeyShare now if you'd like. That's handled the same as tree.ForShare in row.GetKeyLockingStrength.

But also, feel free to leave this as is if you think it would be confusing to users, given that the difference doesn't matter yet.

Since this will be shown in EXPLAIN output, it felt wrong to show FOR KEY SHARE if we're not yet doing it, so I kept it this way for now.


pkg/sql/opt/optbuilder/select.go line 694 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Maybe enable_durable_select_for_update_locks? If possible, making the variable boolean-valued seems like the way to go, as it makes the knob easier to use.

I think we'll also want to mention something about "serializable", like we'll want to do for enable_implicit_fk_locking (see other comment). Maybe suffix both with _with_serializable, or something like that.

I started doing this but it got a little long, will do it in the next PR.


pkg/sql/sessiondatapb/local_only_session_data.proto line 410 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same point about qualifying this somehow with the fact that it only applies to serializable.

Done.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great work! I appreciate the thoughtful organization of commits - it made reviewing pleasant.

Reviewed 10 of 11 files at r1, 40 of 40 files at r5, 12 of 12 files at r6, 3 of 3 files at r7, 32 of 32 files at r8, 52 of 52 files at r9, 12 of 12 files at r10, 3 of 3 files at r11, 32 of 32 files at r12, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 3 stale) (waiting on @DrewKimball and @michae2)


pkg/sql/opt/exec/execbuilder/testdata/fk_read_committed line 86 at r8 (raw file):

                └── filters (true)

# Foreign key checks of the child do not require locking.

Is this explained somewhere in more detail? Is this safe because any updates to the child rows would lock this parent row?

@michae2 michae2 force-pushed the rc_constraints branch 3 times, most recently from 2ef46c4 to 6930e94 Compare July 26, 2023 20:24
michae2 added 2 commits July 26, 2023 15:18
Add transaction isolation, priority, and quality-of-service to the
output of `EXPLAIN ANALYZE`.

Release note (sql change): `EXPLAIN ANALYZE` output now includes:
- the isolation level of the statement's transaction
- the priority of the statement's transaction
- the quality of service level of the statement's transaction
This is a follow-up from cockroachdb#103734.

We do not want to use guaranteed-durable (a.k.a. replicated) locking
under serializable isolation, because it prevents pipelining and other
optimizations, and is unnecessary for correctness. This commit ammends
8cbc6d1 to only set durability for
`SELECT FOR UPDATE` locking under weaker isolation levels.

This means that query plans will be slightly different under different
isolation levels, and so we must add isolation level to the optimizer
memo staleness calculation.

Furthermore, this commit changes the error message added by
e633d5e to be about guaranteed-durable
locking rather than `SELECT FOR UPDATE`, because in a later commit this
specific error will also be triggered by foreign key checks under weaker
isolation levels.

Informs: cockroachdb#100144, cockroachdb#100156, cockroachdb#100193, cockroachdb#100194

Release note: None
michae2 added 2 commits July 26, 2023 15:18
Because the "guaranteed-durable locking not yet implemented" error
condition is checked in execbuilder, it prevents not only execution but
also `EXPLAIN` of queries using guaranteed-durable locking. Thankfully
`EXPLAIN (OPT)` bypasses execbuilder, and hence still works, so use this
for now to verify that we are enabling durable locking for `SELECT FOR
UPDATE` under read committed isolation.

(Note that we have not yet fixed the `SELECT FOR UPDATE` plans to use
more precise locking, that will come in a later PR.)

Informs: cockroachdb#100194

Release note: None
Add SELECT FOR SHARE locking to FK parent checks. Under serializable
isolation, this locking is only used when
`enable_implicit_fk_locking_for_serializable` is set. Under weaker
isolation levels (snapshot and read committed) this locking is always
used.

We only need to lock during the insertion-side FK checks, which verify
the existence of a parent row. Deletion-side FK checks verify the
non-existence of a child row, and these do not need to lock. Instead, to
prevent concurrent inserts or updates to the child that would violate
the FK constraint, we rely on the intent(s) created by the deletion
conflicting with the FK locking of those concurrent inserts or updates.

Fixes: cockroachdb#80683
Informs: cockroachdb#100156

Epic: CRDB-25322

Release note (sql change): Add a new session variable,
`enable_implicit_fk_locking_for_serializable`, which controls locking
during foreign key checks under serializable isolation. With this set to
true, foreign key checks of the referenced (parent) table, such as those
performed during an INSERT or UPDATE of the referencing (child) table,
will lock the referenced row using SELECT FOR SHARE locking. (This is
somewhat analogous to the existing `enable_implicit_select_for_update`
variable but applies to the foreign key checks of a mutation statement
instead of the initial row fetch.)

Under weaker isolation levels such as read committed, SELECT FOR SHARE
locking will always be used to ensure the database maintains the foreign
key constraint, regardless of the current setting of
`enable_implicit_fk_locking_for_serializable`.
Copy link
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, all! This should be ready now...

bors r=DrewKimball,nvanbenschoten,rytaft,mgartner

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 4 stale) (waiting on @DrewKimball, @mgartner, @nvanbenschoten, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/fk_read_committed line 86 at r8 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is this explained somewhere in more detail? Is this safe because any updates to the child rows would lock this parent row?

Yes, that's correct. It is explained in a little more detail in comments in pkg/sql/opt/optbuilder/mutation_builder_fk.go as well as in the 4th commit message.

@craig
Copy link
Contributor

craig bot commented Jul 26, 2023

Build succeeded:

@craig craig bot merged commit 284b6c0 into cockroachdb:master Jul 26, 2023
@michae2 michae2 deleted the rc_constraints branch July 26, 2023 23:25
michae2 added a commit to michae2/cockroach that referenced this pull request Jul 27, 2023
Follow-up from cockroachdb#105857

This commit ammends 6a3e43d to add a
session variable to control whether guaranteed-durable locks are used
under serializable isolation.

Informs: cockroachdb#100194

Epic: CRDB-25322

Release note (sql change): Add a new session variable,
`enable_durable_locking_for_serializable`, which controls locking
durability under serializable isolation. With this set to true, SELECT
FOR UPDATE locks, SELECT FOR SHARED locks, and constraint check locks
(e.g. locks acquired during foreign key checks if
`enable_implicit_fk_locking_for_serializable` is set to true) will be
guaranteed-durable under serializable isolation, meaning they will
always be held to transaction commit. (These locks are always
guaranteed-durable under weaker isolation levels.)

By default, under serializable isolation these locks are best-effort
rather than guaranteed-durable, meaning in some cases (e.g. leaseholder
transfer, node loss, etc.) they could be released before transaction
commit. Serializable isolation does not rely on locking for correctness,
only using it to improve performance under contention, so this default
is a deliberate choice to avoid the performance overhead of lock
replication.
@michae2
Copy link
Collaborator Author

michae2 commented Jul 27, 2023

Maybe enable_durable_select_for_update_locks? If possible, making the variable boolean-valued seems like the way to go, as it makes the knob easier to use.

I think we'll also want to mention something about "serializable", like we'll want to do for enable_implicit_fk_locking (see other comment). Maybe suffix both with _with_serializable, or something like that.

I started doing this but it got a little long, will do it in the next PR.

Follow-up PR is: #107749

michae2 added a commit to michae2/cockroach that referenced this pull request Jul 27, 2023
Follow-up from cockroachdb#105857

This commit ammends 6a3e43d to add a
session variable to control whether guaranteed-durable locks are used
under serializable isolation.

Informs: cockroachdb#100194

Epic: CRDB-25322

Release note (sql change): Add a new session variable,
`enable_durable_locking_for_serializable`, which controls locking
durability under serializable isolation. With this set to true, SELECT
FOR UPDATE locks, SELECT FOR SHARED locks, and constraint check locks
(e.g. locks acquired during foreign key checks if
`enable_implicit_fk_locking_for_serializable` is set to true) will be
guaranteed-durable under serializable isolation, meaning they will
always be held to transaction commit. (These locks are always
guaranteed-durable under weaker isolation levels.)

By default, under serializable isolation these locks are best-effort
rather than guaranteed-durable, meaning in some cases (e.g. leaseholder
transfer, node loss, etc.) they could be released before transaction
commit. Serializable isolation does not rely on locking for correctness,
only using it to improve performance under contention, so this default
is a deliberate choice to avoid the performance overhead of lock
replication.
craig bot pushed a commit that referenced this pull request Jul 28, 2023
…107752 #107802 #107803

106508: util/must: add runtime assertion API r=erikgrinaker a=erikgrinaker

For details and usage examples, see the [package documentation](https://github.com/erikgrinaker/cockroach/blob/must/pkg/util/must/must.go).

---

This patch adds a convenient and canonical API for runtime assertions, inspired by the Testify package used for Go test assertions. It is intended to encourage liberal use of runtime assertions throughout the code base, by making it as easy as possible to write assertions that follow best practices. It does not attempt to reinvent the wheel, but instead builds on existing infrastructure.

Assertion failures are fatal in all non-release builds, including roachprod clusters and roachtests, to ensure they will be noticed. In release builds, they instead log the failure and report it to Sentry (if enabled), and return an assertion error to the caller for propagation. This avoids excessive disruption in production environments, where an assertion failure is often scoped to an individual RPC request, transaction, or range, and crashing the node can turn a minor problem into a full-blown outage. It is still possible to kill the node when appropriate via `log.Fatalf`, but this should be the exception rather than the norm.

It also supports expensive assertions that must be compiled out of normal dev/test/release builds for performance reasons. These are instead enabled in special test builds.

This is intended to be used instead of other existing assertion mechanisms, which have various shortcomings:

* `log.Fatalf`: kills the node even in release builds, which can cause severe disruption over often minor issues.

* `errors.AssertionFailedf`: only suitable when we have an error return path, does not fatal in non-release builds, and are not always notified in release builds.

* `logcrash.ReportOrPanic`: panics rather than fatals, which can leave the node limping along. Requires the caller to implement separate assertion handling in release builds, which is easy to forget. Also requires propagating cluster settings, which aren't always available.

* `buildutil.CrdbTestBuild`: only enabled in Go tests, not roachtests, roachprod clusters, or production clusters.

* `util.RaceEnabled`: only enabled in race builds. Expensive assertions should be possible to run without the additional overhead of the race detector.

For more details and examples, see the `must` package documentation.

Resolves #94986.
Epic: none
Release note: None

107094: streamingest: unskip TestTenantStreamingUnavailableStreamAddress r=lidorcarmel a=lidorcarmel

Changing a few things to get this test to pass under stress:
- use 50 ranges instead of 10, because there are already 50-ish system ranges,
  so if we write only 10 more ranges those might not get distributed on all
  servers.
- avoid reading from the source cluster after stopping a node, it's flaky,
  see #107499 for more info.

Epic: none
Fixes: #107023
Fixes: #106865

Release note: None

107717: server/profiler: remove `server.cpu_profile.enabled` setting r=xinhaoz a=xinhaoz

Cpu profiling can be enabled by setting the cluster setting `server.cpu_profile.cpu_usage_combined_threshold`. This makes `server.cpu_profile.enabled` redundant and makes it more difficult and confusing to enable cpu profiling. This commit removes the `server.cpu_profile.enabled` setting entirely. Note that both jdefault values for the cluster settings set profiling off.

Closes: #102024

Release note (sql change): The cluster setting
`server.cpu_profile.enabled` has been removed.
`server.cpu_profile.cpu_usage_combined_threshold` can enable and disable cpu profiling.

107720: cli: add probe_range in debug.zip r=j82w a=j82w

PR #79546 introduces `crdb_internal.probe_range`. This PR adds the `crdb_internal.probe_range` to the debug.zip. The LIMIT gives a very approximately ~1000ms*100 target on how long this can take, so that running debug.zip against an unavailable cluster won't take too long.

closes: #80360

Release note (cli change): The debug.zip now includes the `crdb_internal.probe_range` table with a limit of 100 rows to avoid the query from taking to long.

107727: server: deflake TestServerShutdownReleasesSession r=rafiss a=rafiss

The tenant was not being fully stopped, so the test could encounter flakes.

fixes #107592
Release note: None

107742: ui: show txn fingerprint details page with unspecified app r=xinhaoz a=xinhaoz

Previously, when the app was not specified in the url search params for the txn details fingerprint page, the page would fail to load. This commit allows the page to load when there is no app specified but a fingerprint id that matches the requested page in the payload. The first matching fingerprint id is loaded.

Additionally, the TransactionDetailsLink will not include the appNames search param unless the provided prop is non-nullish.

Fixes: #107731

Release note (bug fix): Txn fingerprint details page in the console UI should load with the fingerprint details even if no app is specified in the URL.




Demo:
https://www.loom.com/share/810308d3dcd74ca888c42287ebafaecf

107745: kvserver: fix test merge queue when grunning unsupported r=irfansharif a=kvoli

`TestMergeQueue/load-based-merging/switch...below-threshold` asserts that switching the split objective between CPU and QPS will not cause ranges to merge, even if their pre-switch load qualified them for merging.

This test was broken when `grunning` was unsupported, as the objective never actually switches to anything other than QPS.

Add a check for `grunning` support, and assert that a merge occurs if unsupported.

Fixes: #106937
Epic: none
Release note: None

107749: opt: add enable_durable_locking_for_serializable session variable r=DrewKimball,nvanbenschoten a=michae2

Follow-up from #105857

This commit ammends 6a3e43d to add a session variable to control whether guaranteed-durable locks are used under serializable isolation.

Informs: #100194

Epic: CRDB-25322

Release note (sql change): Add a new session variable, `enable_durable_locking_for_serializable`, which controls locking durability under serializable isolation. With this set to true, SELECT FOR UPDATE locks, SELECT FOR SHARED locks, and constraint check locks (e.g. locks acquired during foreign key checks if
`enable_implicit_fk_locking_for_serializable` is set to true) will be guaranteed-durable under serializable isolation, meaning they will always be held to transaction commit. (These locks are always guaranteed-durable under weaker isolation levels.)

By default, under serializable isolation these locks are best-effort rather than guaranteed-durable, meaning in some cases (e.g. leaseholder transfer, node loss, etc.) they could be released before transaction commit. Serializable isolation does not rely on locking for correctness, only using it to improve performance under contention, so this default is a deliberate choice to avoid the performance overhead of lock replication.

107752: changefeedccl: prevent deadlock in TestChangefeedKafkaMessageTooLarge r=miretskiy a=jayshrivastava

Previously, this test would deadlock due to kafka retrying messages too many times. These messages are stored in a buffer of size 1024 created by the CDC testing infra: https://github.com/cockroachdb/cockroach/blob/5c3f96d38cdc3a2d953ca3ffb1e39e97d7e5110e/pkg/ccl/changefeedccl/testfeed_test.go#L1819

The test asserts that 2000 messages pass through the buffer. When the test finishes, it stops reading from the buffer. The problem is that due to retries, there may be more messages sent to the buffer than that are read out of the buffer. Even after the 2000 messages are read and the test is shutting down, the sink may be blocked trying to put resolved messages (plus retries) in the buffer. If this happens, the changefeed resumer (same goroutine as the kafka sink) gets blocked and does not terminate when the job is cancelled at the end of the test.

This change caps the number of retries at 200 for this test, so there should be no more than 200 extra messages plus a few resolved messages during this test. This is far less than the buffer size of 1024.

See detailed explanation in #107591.

Fixes: #107591
Epic: none
Release note: None

107802: teamcity-trigger: don't start a job for an empty target r=healthy-pod a=rickystewart

This makes no sense, so skip these cases.

Closes: #107779
Closes: #107780
Closes: #107781

Epic: none
Release note: None

107803: githubpost: set `map` field if `null` r=healthy-pod a=rickystewart

Go is a really good language.

Informs: #107779

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Lidor Carmel <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Jayant Shrivastava <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/opt: add implicit SELECT FOR SHARE support for FK checks
6 participants