Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: make SQL statements operate on a read snapshot #42862

Merged
merged 1 commit into from
Jan 20, 2020

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 29, 2019

Fixes #33473.
Fixes #28842.
Informs #41569 and #42864.

Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it by exploiting the new KV Step() API which
decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
#42864 and #33475 for details.

For now, this patch excludes any mutation that involves a foreign key
from the new (fixed) semantics. When a mutation involves a FK, the
previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement could incorrectly operate multiple
times
on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz changed the title sql: make SQL statements operate on a read snapshot [DNM] sql: make SQL statements operate on a read snapshot Nov 29, 2019
@knz
Copy link
Contributor Author

knz commented Nov 29, 2019

The changes (and that of the parent commit) are sufficiently lightweight that we may consider back-porting this to 19.2 and perhaps 19.1.

@knz knz force-pushed the 20191129-sql-step branch 4 times, most recently from 51efb5d to 451ecd7 Compare November 29, 2019 17:05
@knz knz changed the title [DNM] sql: make SQL statements operate on a read snapshot sql: make SQL statements operate on a read snapshot Nov 29, 2019
@knz knz force-pushed the 20191129-sql-step branch 2 times, most recently from 088ac43 to da262f6 Compare November 29, 2019 17:59
@knz
Copy link
Contributor Author

knz commented Nov 29, 2019

cc @RaduBerinde FYI see the ugly workaround (= code to disable the fix) around FK existence checks and cascading actions.

@knz
Copy link
Contributor Author

knz commented Nov 29, 2019

Only remaining failing test TestUpsertFastPath is pending resolution of this comment: #42854 (comment)

@knz knz force-pushed the 20191129-sql-step branch 4 times, most recently from 22e7690 to 10662ad Compare December 2, 2019 22:03
@knz
Copy link
Contributor Author

knz commented Dec 2, 2019

Rebased on top of the latest #42854 implementation, which fixes TestUpsertFastPath

Also, updated (*DistSQLPlanner) PlanAndRunPostqueries() to add calls to Step as recommended by Radu. This enables the remaining logic tests to complete without error.

@knz
Copy link
Contributor Author

knz commented Dec 2, 2019

@RaduBerinde ^^

@knz knz force-pushed the 20191129-sql-step branch from 10662ad to f825da2 Compare December 2, 2019 23:36
@knz knz requested a review from a team as a code owner December 2, 2019 23:36
@knz knz force-pushed the 20191129-sql-step branch 4 times, most recently from 0988369 to 35c3103 Compare December 7, 2019 17:12
@knz knz force-pushed the 20191129-sql-step branch from 35c3103 to cc29ba8 Compare January 6, 2020 07:31
@knz knz removed the request for review from nvanbenschoten January 6, 2020 07:31
@knz
Copy link
Contributor Author

knz commented Jan 6, 2020

@andreimatei this is ready I think (assuming you're also ok with #42854).

@RaduBerinde please check I did do the post-query logic properly.

@knz knz force-pushed the 20191129-sql-step branch 2 times, most recently from d4d036c to 84f0349 Compare January 6, 2020 08:53
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/sql/alter_table.go, line 104 at r3 (raw file):

	// want to see their previous writes. Disable step-wise execution
	// for that phase.
	if err := params.p.Txn().DisableStepping(); err != nil {

there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out. Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this. Have you considered alternatives?


pkg/sql/conn_executor_exec.go, line 363 at r3 (raw file):

	// For regular statements (the ones that get to this point), we
	// don't return any event unless an an error happens.

/an an/an if you're taking the blame anyway


pkg/sql/conn_executor_exec.go, line 462 at r3 (raw file):

	// Although this is not required for further SQL statements (these
	// will define their own sequence point above), there may be
	// additional non-SQL KV activity beyond this point which may want

I'm curious - like what activity?


pkg/sql/conn_executor_test.go, line 18 at r3 (raw file):

	"database/sql/driver"
	"fmt"
	_ "net/http/pprof"

did you want to leave this in?


pkg/sql/conn_executor_test.go, line 379 at r3 (raw file):

	params.Insecure = true
	s, db, _ := serverutils.StartServer(t, params)
	defer s.Stopper().Stop(context.TODO())

nit: context.Background() has won for tests


pkg/sql/create_sequence.go, line 120 at r3 (raw file):

	}

	// The event logging wants to operate in step-wise execution. Mark a

please clarify "wants to operate in step-wise execution" a bit.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/sql/alter_table.go, line 104 at r3 (raw file):

there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out.

That's backward thinking. The SQL standard is pretty clear about the fact that all statements operate on a snapshot taken when the statement starts. It's our implementation of DDL that's backward and mis-architected to require see-your-own-writes in KV (it should operate on RAM data structures until a final KV put. But that's out of scope here.)

Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this.

Yes that seems reasonable. Especially because I just found a good reason why there is a problem in the current approach...

Working on it, will ping you when I have a solution.


pkg/sql/conn_executor_exec.go, line 363 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

/an an/an if you're taking the blame anyway

Done.


pkg/sql/conn_executor_exec.go, line 462 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm curious - like what activity?

Added example in comment.


pkg/sql/conn_executor_test.go, line 18 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

did you want to leave this in?

No sorry about that


pkg/sql/conn_executor_test.go, line 379 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: context.Background() has won for tests

Done.

@knz knz force-pushed the 20191129-sql-step branch from 84f0349 to 3d10880 Compare January 15, 2020 10:44
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL:

I found a cleaner approach than the spurious DisableStepping() calls. Instead, each planNode can implement an interface to declare that it wants to deviate from the standard SQL semantics. The conditional is handled cleanly in just one place.

Also there's now a NewTxnWithSteppingEnabled and the conn executor uses that.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/sql/alter_table.go, line 104 at r3 (raw file):

Previously, knz (kena) wrote…

there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out.

That's backward thinking. The SQL standard is pretty clear about the fact that all statements operate on a snapshot taken when the statement starts. It's our implementation of DDL that's backward and mis-architected to require see-your-own-writes in KV (it should operate on RAM data structures until a final KV put. But that's out of scope here.)

Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this.

Yes that seems reasonable. Especially because I just found a good reason why there is a problem in the current approach...

Working on it, will ping you when I have a solution.

Done.

@knz knz force-pushed the 20191129-sql-step branch from 3d10880 to fda15ca Compare January 15, 2020 11:39
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 36 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @knz)


pkg/internal/client/sender.go, line 255 at r4 (raw file):

	// does not need it.
	//
	// Calling ConfigureStepping(true) when the stepping mode is

s/true/SteppingEnabled


pkg/internal/client/txn.go, line 1209 at r4 (raw file):

// transaction.
//
// This function is guaranteed to not return an error if it previously

I find this error comment quite confusing, and also it doesn't seem to line up with a comment at the point in this commit where one of these errors is ignored.
We seem to only return errors from assertions on leaves. I'd much rather fatal and remove the error from the interface.


pkg/internal/client/txn.go, line 1210 at r4 (raw file):

//
// This function is guaranteed to not return an error if it previously
// succeeded once with some txn and mode, then provided its own return

s/with some txn and mode/with some mode


pkg/kv/txn_coord_sender.go, line 1089 at r4 (raw file):

}

// DisableStepping is part of the TxnSender interface.

stale comment


pkg/kv/txn_coord_sender.go, line 1098 at r4 (raw file):

	tc.mu.Lock()
	defer tc.mu.Unlock()
	prevEnabled := tc.interceptorAlloc.txnSeqNumAllocator.configureSteppingLocked(mode == client.SteppingEnabled)

long line


pkg/kv/txn_interceptor_seq_num_allocator.go, line 143 at r4 (raw file):

}

// configureSteppingLocked configures the stepping mode.

pls comment that when disabling stepping, the readSeq is bumped


pkg/kv/txn_interceptor_seq_num_allocator.go, line 144 at r4 (raw file):

// configureSteppingLocked configures the stepping mode.
// Used by the TxnCoordSender's ConfigureStepping() method.

nit: since you're taking the blame, I think the 2nd line of the comment doesn't add anything


pkg/kv/txn_interceptor_seq_num_allocator.go, line 145 at r4 (raw file):

// configureSteppingLocked configures the stepping mode.
// Used by the TxnCoordSender's ConfigureStepping() method.
func (s *txnSeqNumAllocator) configureSteppingLocked(enabled bool) (prevEnabled bool) {

nit: I'd carry over the enum from all the higher layers. Not for the s.steppingModeEnabled field, though; that one should stay a bool.


pkg/sql/create_table.go, line 1384 at r5 (raw file):

		}
	}

nit: spurious diff


pkg/sql/plan.go, line 135 at r5 (raw file):

}

// planNodeReadingOwnWrites can be implemented by planNodes which do

s/can be/is
Unless you're trying to highlight that there are alternatives to achieving what this achieves?


pkg/sql/plan.go, line 141 at r5 (raw file):

// descriptors, expecting to read their own writes.
//
// This constraint is obeyed by (*planner).startExec().

please clarify this saying that it's only startExec that runs with the stepping disabled, not the plan node more broadly.


pkg/sql/plan.go, line 143 at r5 (raw file):

// This constraint is obeyed by (*planner).startExec().
type planNodeReadingOwnWrites interface {
	// ReadingOwnWrites can be implemented as no-op by nodes wishing

nit: just say ReadingOwnWrites is a marker interface.


pkg/sql/row/kv_batch_fetcher.go, line 349 at r5 (raw file):

}

// TestingSetKVBatchSize changes the kvBatchFetcher batch size, and returns a function that restores it.

nit: if you don't want to curse at you every time I rediscover this function and rage-blame, put it where it was so that I curse at someone else.

@knz knz force-pushed the 20191129-sql-step branch from fda15ca to 2f26c54 Compare January 20, 2020 14:10
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/internal/client/sender.go, line 255 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/true/SteppingEnabled

Done.


pkg/internal/client/txn.go, line 1209 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I find this error comment quite confusing, and also it doesn't seem to line up with a comment at the point in this commit where one of these errors is ignored.
We seem to only return errors from assertions on leaves. I'd much rather fatal and remove the error from the interface.

Done.


pkg/internal/client/txn.go, line 1210 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/with some txn and mode/with some mode

Done.


pkg/kv/txn_coord_sender.go, line 1089 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

stale comment

Done.


pkg/kv/txn_coord_sender.go, line 1098 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

long line

Done.


pkg/kv/txn_interceptor_seq_num_allocator.go, line 143 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment that when disabling stepping, the readSeq is bumped

Done.


pkg/kv/txn_interceptor_seq_num_allocator.go, line 145 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I'd carry over the enum from all the higher layers. Not for the s.steppingModeEnabled field, though; that one should stay a bool.

Done.


pkg/sql/create_sequence.go, line 120 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please clarify "wants to operate in step-wise execution" a bit.

Done.


pkg/sql/plan.go, line 135 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/can be/is
Unless you're trying to highlight that there are alternatives to achieving what this achieves?

Done.


pkg/sql/plan.go, line 141 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please clarify this saying that it's only startExec that runs with the stepping disabled, not the plan node more broadly.

Done.


pkg/sql/plan.go, line 143 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: just say ReadingOwnWrites is a marker interface.

Done.


pkg/sql/row/kv_batch_fetcher.go, line 349 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: if you don't want to curse at you every time I rediscover this function and rage-blame, put it where it was so that I curse at someone else.

Done.

@knz knz force-pushed the 20191129-sql-step branch 4 times, most recently from 20d180d to 0322fe1 Compare January 20, 2020 17:34
Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it (partially) by exploiting the new KV `Step()` API
which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
[cockroachdb#42864](cockroachdb#42864) and
[cockroachdb#33475](cockroachdb#33475) for
details.

For now, this patch excludes any mutation that 1) involves a foreign
key and 2) does not uyse the new CBO-driven FK logic, from the
new (fixed) semantics. When a mutation involves a FK without CBO
involvement, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.
@knz
Copy link
Contributor Author

knz commented Jan 20, 2020

bors r=andreimatei

craig bot pushed a commit that referenced this pull request Jan 20, 2020
42862: sql: make SQL statements operate on a read snapshot r=andreimatei a=knz

Fixes #33473.
Fixes #28842.
Informs #41569 and #42864.

Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it by exploiting the new KV `Step()` API which
decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
#42864 and #33475 for details.

For now, this patch excludes any mutation that involves a foreign key
from the new (fixed) semantics. When a mutation involves a FK, the
previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement [could incorrectly operate multiple
times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.


Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 20, 2020

Build succeeded

@craig craig bot merged commit 0a658c1 into cockroachdb:master Jan 20, 2020
@knz knz deleted the 20191129-sql-step branch January 20, 2020 22:15
@RaduBerinde
Copy link
Member

Too little too late, but I got a chance to look through the diff and it looks great to me!

@knz
Copy link
Contributor Author

knz commented Jan 21, 2020

radu 💖 you answered all my questions

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 21, 2020
These comments were added and accidentally left by cockroachdb#42862.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 21, 2020
44174: kv: delete disabled debug logging about txn stepping r=nvanbenschoten a=nvanbenschoten

These comments were added and accidentally left by #42862.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants