sql,kv: add preliminary SQL savepoints support #45566
Conversation
Everything except r5, r6, r7 is elsewhere.
This is based on #43051, but I've changed a lot of stuff, so I think it's better to start with a new PR. Compared to that, this version has support for rolling back after an error, different behavior on rolling back after retriable errors, and removes the special treatment of the savepoint cockroach_restart when it comes to rollbacks.
I still need to see what more tests need to be written (but you added a bunch of great ones already, Rafa) and go over everything and spruce it up, but I think it's ready for a first pass. Not ready for nits yet, though.
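To make the headline feature concrete, here is a minimal client-side sketch of the error recovery this enables (illustrative only: the connection string, table name, and driver choice are assumptions, not part of this PR):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := tx.Exec("SAVEPOINT foo"); err != nil {
		log.Fatal(err)
	}
	// With this PR, a failed statement no longer dooms the transaction:
	// rolling back to the savepoint recovers from the error.
	if _, err := tx.Exec("INSERT INTO t VALUES (1)"); err != nil {
		if _, err := tx.Exec("ROLLBACK TO SAVEPOINT foo"); err != nil {
			log.Fatal(err)
		}
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```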
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
Force-pushed from 56e7ce6 to 7b4a5e3.
I've been adding various tests. Ready for review.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
I gave this a first pass. It's looking good and I'm very excited for this to be merged, although the PR still needs some polish. I also think it deserves a review from @knz, given that he was the other person most connected to this change.
s/descovering/discovering/ in the commit message.
Reviewed 4 of 4 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 37 of 37 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/kv/txn_coord_sender.go, line 179 at r4 (raw file):
epochBumpedLocked() // createSavepoint is used to populate a savepoint with all the state the
s/the needs/the transaction needs/
pkg/kv/txn_coord_sender.go, line 181 at r4 (raw file):
// createSavepoint is used to populate a savepoint with all the state the // needs to be restored on a rollback. createSavepoint(context.Context, *savepoint)
Do these need Locked suffixes?
pkg/kv/txn_coord_sender.go, line 184 at r4 (raw file):
// rollbackToSavepoint is used to restore the state previously saved by createSavepoint(). // implementations are allowed to modify the savepoint if they want to, optimizing it for future
s/implementations/Implementations/
pkg/kv/txn_coord_sender_savepoints.go, line 33 at r4 (raw file):
seqNum enginepb.TxnSeq // txnSpanRefresher fields
nit, period after both of these comments.
pkg/kv/txn_coord_sender_savepoints.go, line 42 at r4 (raw file):
ifBytes int64 // txnID is used to verify that a rollback is not used to paper
Move these fields up above seqNum to give the ordering here some notion of logical precedence.
pkg/kv/txn_coord_sender_savepoints.go, line 80 at r4 (raw file):
epoch: tc.mu.txn.Epoch, } for _, reqInt := range tc.interceptorStack {
👍 I like this approach.
pkg/kv/txn_coord_sender_savepoints.go, line 172 at r4 (raw file):
} if st.seqNum > 0 && st.txnID != tc.mu.txn.ID {
Why do we need these seqNum checks?
pkg/kv/txn_coord_sender_savepoints.go, line 180 at r4 (raw file):
} if st.seqNum < 0 || st.seqNum > tc.interceptorAlloc.txnSeqNumAllocator.writeSeq {
When is seqNum less than zero?
pkg/kv/txn_interceptor_committer.go, line 465 at r4 (raw file):
func (*txnCommitter) importLeafFinalState(*roachpb.LeafTxnFinalState) {} // epochBumpedLocked implements the txnReqInterceptor interface.
nit: not your change, but if we wanted to rename epochBumpedLocked to bumpEpochLocked to fit the naming scheme here more appropriately, I'd be very supportive. If not, I'll make the change after this PR lands.
pkg/kv/txn_interceptor_pipeliner.go, line 607 at r4 (raw file):
func (tp *txnPipeliner) createSavepoint(ctx context.Context, s *savepoint) { if tp.ifWrites.len() > 0 { s.ifWrites = tp.ifWrites.t.Clone()
Let's add a clone method to *inFlightWriteSet.
pkg/kv/txn_interceptor_pipeliner.go, line 615 at r4 (raw file):
// rollbackToSavepoint is part of the txnReqInterceptor interface. func (tp *txnPipeliner) rollbackToSavepoint(ctx context.Context, s *savepoint) { // Intersect the inflight writes from the savepoint to the ones from the
s/to/with/
pkg/kv/txn_interceptor_pipeliner.go, line 627 at r4 (raw file):
// higher seqnum for the same key. We don't keep track of what sequence // numbers we've verified and which we haven't, so we're going to assume // that the savepoint's write has not been verified.
I'm confused, we do keep track of what sequence numbers we've verified and which we haven't. There's a Sequence field in inFlightWrite.
pkg/kv/txn_interceptor_pipeliner.go, line 635 at r4 (raw file):
return true }) // TODO(andrei): Can I delete directly during the iteration above?
No, you can't. The btree library doesn't like that.
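(For readers outside the review: the usual workaround with github.com/google/btree is to collect matches during the iteration and delete them afterwards. A sketch with a hypothetical shouldDrop predicate, not the PR's actual code:)

```go
// The btree package forbids mutating the tree from inside Ascend, so
// gather the items to remove first and delete them once iteration ends.
func deleteMatching(tree *btree.BTree, shouldDrop func(btree.Item) bool) {
	var toDelete []btree.Item
	tree.Ascend(func(i btree.Item) bool {
		if shouldDrop(i) {
			toDelete = append(toDelete, i)
		}
		return true // keep iterating
	})
	for _, i := range toDelete {
		tree.Delete(i)
	}
}
```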
pkg/kv/txn_interceptor_pipeliner.go, line 654 at r4 (raw file):
// Restore the inflightWrites from the savepoint. if s.ifWrites == nil {
Should you be setting cloned here?
pkg/kv/txn_interceptor_pipeliner.go, line 655 at r4 (raw file):
// Restore the inflightWrites from the savepoint. if s.ifWrites == nil { tp.ifWrites.t = nil
Should we clear(true) here instead?
pkg/kv/txn_interceptor_pipeliner.go, line 825 at r4 (raw file):
s.t.Clear(reuse /* addNodesToFreelist */) s.bytes = 0 s.alloc.clear()
Is this safe when s.cloned == true?
pkg/kv/txn_interceptor_pipeliner_test.go, line 1231 at r4 (raw file):
} func TestTxnPipelinerSavepoints(t *testing.T) {
Give this a comment. We've been good about doing so in these tests so far.
Good test though.
pkg/kv/txn_interceptor_pipeliner_test.go, line 1244 at r4 (raw file):
tp.createSavepoint(ctx, s) // Some more write after the savepoint. One of them is on key "c" that is part
s/write/writes/
pkg/kv/txn_interceptor_pipeliner_test.go, line 1267 at r4 (raw file):
} // Now verify one of the writes. When we'll rollback to the savepoint below,
s/we'll/we/
pkg/kv/txn_interceptor_pipeliner_test.go, line 1292 at r4 (raw file):
require.NotNil(t, br) require.Equal(t, []roachpb.Span{{Key: roachpb.Key("a")}}, tp.footprint.asSlice()) require.Equal(t, 3, tp.ifWrites.len()) // We've verified one out of 4 writes.
Want to check the footprint?
pkg/kv/txn_interceptor_pipeliner_test.go, line 1307 at r4 (raw file):
{ var savepointWrites []inFlightWrite s.ifWrites.Ascend(func(i btree.Item) bool {
This is interesting. It begs the question of why we need to update the savepoint itself. Is it not enough to update the TxnCoordSender?
pkg/kv/txn_interceptor_seq_num_allocator.go, line 180 at r4 (raw file):
// createSavepoint is part of the txnReqInterceptor interface. func (s *txnSeqNumAllocator) createSavepoint(ctx context.Context, sp *savepoint) { sp.seqNum = s.writeSeq
It's worth adding a very small test for this. I know the implementation is trivial, but the role the txnSeqNumAllocator plays in populating savepoints is critical.
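(A test along these lines could be as small as the following sketch, reconstructed from the createSavepoint body quoted above; the test name and exact assertions are assumptions:)

```go
func TestTxnSeqNumAllocatorSavepoint(t *testing.T) {
	ctx := context.Background()
	var s txnSeqNumAllocator
	s.writeSeq = 3 // as if three writes were already sequenced

	sp := &savepoint{}
	s.createSavepoint(ctx, sp)

	// The savepoint must capture the write sequence number so that a
	// rollback can ignore everything written after it.
	require.Equal(t, enginepb.TxnSeq(3), sp.seqNum)
}
```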
pkg/kv/txn_interceptor_span_refresher_test.go, line 559 at r4 (raw file):
} func TestTxnSpanRefresherSavepoint(t *testing.T) {
Same point about adding a comment.
pkg/kv/testdata/savepoints, line 322 at r4 (raw file):
subtest end # !!!
!!!
Can this test be revived?
pkg/sql/conn_executor.go, line 2483 at r3 (raw file):
case *tree.ReleaseSavepoint: // TODO(knz): Sanitize this. if err := ex.validateSavepointName(t.Savepoint); err == nil {
I don't understand this. Why does validating the name indicate whether this is the cockroach_restart savepoint?
pkg/sql/conn_executor_exec.go, line 945 at r4 (raw file):
} // TODO(andrei/cuongdo): Figure out what statements to count here.
TODO(andrei): ...
I don't think you're gonna get any help with that one any time soon.
Or just delete the TODO. Is it useful anymore?
pkg/sql/conn_executor_exec.go, line 1006 at r4 (raw file):
} // execStmtInRollbackWaitState executes a statement in a txn that's in state
Is this used? I thought we just got rid of the restartWait state?
pkg/sql/conn_executor_savepoints.go, line 88 at r1 (raw file):
} // execSavepointInAbortedState runs a SAVEPOINT statement when a txn is aborted.
Why do we support running a SAVEPOINT while in the Aborted state? Postgres doesn't appear to:
nathan=# begin;
BEGIN
nathan=#
nathan=# savepoint a;
SAVEPOINT
nathan=#
nathan=# savepodint a;
ERROR: syntax error at or near "savepodint"
LINE 1: savepodint a;
^
nathan=# savepoint b;
ERROR: current transaction is aborted, commands ignored until end of transaction block
EDIT: this was answered later.
pkg/sql/conn_executor_savepoints.go, line 62 at r4 (raw file):
ev, payload := ex.makeErrEvent(err, s) return ev, payload, nil }
We actually create a kv.savepoint object for cockroach_restart now? I'm surprised. Isn't there a cost to this? Is this imposing strange behavior on the TxnCoordSender since we still want to restart when we roll back to these savepoints?
pkg/sql/conn_executor_savepoints.go, line 200 at r4 (raw file):
curID, curEpoch := ex.state.mu.txn.ID(), ex.state.mu.txn.Epoch() if !curID.Equal(entry.txnID) { return ex.makeErrEvent(roachpb.NewTransactionRetryWithProtoRefreshError(
Should we push these errors into client.RollbackToSavepoint?
pkg/sql/conn_executor_savepoints.go, line 264 at r4 (raw file):
// a savepoint if there was DDL performed "under it". // TODO(knz): support partial DDL cancellation in pending txns. numDDL int
Add more to this comment. This is the number of DDL statements that had been issued at the time that the savepoint was created, right?
pkg/sql/conn_executor_savepoints.go, line 276 at r4 (raw file):
// - if the epoch has changed, then we can only rollback if we performed a // refresh for the reads that are not rolled back. We currently don't do this. txnID uuid.UUID
I'm still confused about why we need this here and in the kv.savepoint object.
pkg/sql/conn_executor_savepoints.go, line 310 at r4 (raw file):
func (stack savepointStack) clone() savepointStack { cpy := make(savepointStack, len(stack))
This copy is unfortunate. Can we do something smarter with aliasing the slices and copying on write? I suspect that even without reference counting to eliminate redundant copies, it would be cheaper to simply copy on every call to push.
Do you mind comparing the relative frequency of these method calls so we can decide on the right solution here? I'm working under the assumption that setTxnRewindPos is going to be called far more often than execSavepointInOpenState.
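(To make the suggestion concrete, a copy-on-write variant might look roughly like the sketch below. It assumes a struct wrapper and a shared flag, neither of which is in the PR, which uses a plain slice type:)

```go
// savepointStack with copy-on-write: clone aliases the backing slice,
// and push copies only when the stack is shared with a clone.
type savepointStack struct {
	entries []savepoint
	shared  bool
}

func (s *savepointStack) clone() savepointStack {
	s.shared = true // both copies now alias entries
	return savepointStack{entries: s.entries, shared: true}
}

func (s *savepointStack) push(sp savepoint) {
	if s.shared {
		cpy := make([]savepoint, len(s.entries), len(s.entries)+1)
		copy(cpy, s.entries)
		s.entries, s.shared = cpy, false
	}
	s.entries = append(s.entries, sp)
}
```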
pkg/sql/conn_executor_savepoints.go, line 337 at r4 (raw file):
} // commitOnReleaseSavepointName is the name of the savepoint with special
Could you explain again what those semantics are?
Also, move this to the top of the file. It's buried down here but pretty important to understand in order to read the rest of the file.
pkg/sql/conn_executor_savepoints_test.go, line 193 at r4 (raw file):
func isOpenTxn(status string) bool { return status == "Open" || status == "NoTxn"
Can we use sql.OpenStateStr and sql.NoTxnStr here?
pkg/sql/conn_fsm.go, line 58 at r4 (raw file):
} // stateAborted is entered on retriable errors. A ROLLBACK TO SAVEPOINT can
And non-retryable errors, right?
pkg/sql/conn_fsm.go, line 92 at r4 (raw file):
func (stateOpen) State() {} func (stateAborted) State() {}
nit: stray line.
pkg/sql/conn_fsm.go, line 115 at r4 (raw file):
// makeEventTxnStartPayload creates an eventTxnStartPayload. // // Pass noRolledBackSavepoint for rolledBackSavepoint when the transaction is
What is this referring to?
pkg/sql/conn_fsm.go, line 144 at r4 (raw file):
// eventSavepointRollback is generated when we want to move from Aborted to Open // through a ROLLBACK TO SAVEPOINT <not cockroach_restart>. Note that it is not // generated when such a savepoint is rolled back to from the Open state. In
"to from"
pkg/sql/conn_fsm.go, line 190 at r4 (raw file):
// eventTxnRestart is generated by a rollback to a savepoint placed at the // beginning of the transaction (commonly SAVEPOINT cockroach_restart). type eventTxnRestart struct{}
What went into the decision to have a separate event instead of a single type eventSavepointRollback struct{Initial fsm.Bool}?
pkg/sql/conn_fsm.go, line 193 at r4 (raw file):
// eventTxnReleased is generated after a successful // RELEASE SAVEPOINT cockroach_restart. It moves the state to CommitWait.
Are these two events limited to cockroach_restart or just commonly linked to it like eventTxnRestart?
pkg/sql/conn_fsm.go, line 204 at r4 (raw file):
} func (eventTxnStart) Event() {}
nit: keep these in the same order as the type defs. Also ideally in the same order as the cases for a given state in the state transition map.
pkg/sql/logictest/testdata/logic_test/manual_retry, line 59 at r4 (raw file):
# BEGIN # # !!! rewrite this test somehow to assert the release behavior, not the initial status
!!!
pkg/sql/logictest/testdata/logic_test/manual_retry, line 141 at r4 (raw file):
BEGIN TRANSACTION; SAVEPOINT foo statement error pq: savepoint bar does not exist
Might as well quote the savepoint name so that we exactly mirror PG:
nathan=# BEGIN;
BEGIN
nathan=# ROLLBACK TO SAVEPOINT bar;
ERROR: savepoint "bar" does not exist
pkg/sql/logictest/testdata/logic_test/manual_retry, line 165 at r4 (raw file):
ROLLBACK TO SAVEPOINT "Foo Bar" query TB
Could you query with the column names here and below?
pkg/sql/opt/exec/execbuilder/builder.go, line 76 at r4 (raw file):
// IsDDL is set to true if the statement contains DDL. IsDDL bool
Someone else should sign off on whether this is a valid use of execbuilder.Builder.
cc. @RaduBerinde
pkg/sql/opt/exec/execbuilder/relational.go, line 144 at r4 (raw file):
var err error isDDL := opt.IsDDLOp(e)
nit: no need for the variable.
pkg/sql/testdata/savepoints, line 1 at r4 (raw file):
# This test exercises the savepoint state in the conn executor.
Nice tests!
pkg/sql/testdata/savepoints, line 303 at r4 (raw file):
subtest rollback_after_error # check that we can rollback after an error
nit: s/check/Check/ here and below.
pkg/sql/testdata/savepoints, line 330 at r4 (raw file):
2: SELECT crdb_internal.force_retry('100ms') -- pq: restart transaction: crdb_internal.force_retry(): TransactionRetryWithProtoRefreshError: forced by crdb_internal.force_retry() -- Open -> Aborted XXXX init(r) 3: ROLLBACK TO SAVEPOINT init -- 0 rows
It's too bad we aren't testing that this causes a txn retry and that the epoch is larger.
pkg/sql/testdata/savepoints, line 335 at r4 (raw file):
-- Open -> NoTxn #... (none) # Check that, after a retriable error, rolling back to anything an initial
"other than"?
pkg/sql/testdata/savepoints, line 526 at r4 (raw file):
# Test that the rewinding we do when performing an automatic retry restores the # savepoint stack properly. subtest rewing_on_automatic_restarts
s/rewing/rewind/
Generally I'll trust you on the implementation, but I'll sit down with this PR tomorrow to add more testing too. I believe the remaining issues should be rooted out via testing, including via the sqlsmith tests that Rohan already prepared for us. Also, Rafi recommends that we trigger the nightly run on the PR manually to pick up all the new feedback from ORM tests.
I have kicked off a full nightly run on the PR ahead of further changes, to pick up any news from ORMs: https://teamcity.cockroachdb.com/viewLog.html?buildId=1782977&
Force-pushed from 0f5c697 to a3ddc11.
s/descovering/discovering/ in the commit message.
done
I haven't finished responding to all the comments, but I've answered some. Take a look if you get a chance; the rest tomorrow.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)
pkg/kv/txn_coord_sender.go, line 181 at r4 (raw file):
the needs
done
pkg/kv/txn_coord_sender.go, line 184 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/implementations/Implementations/
done
pkg/kv/txn_coord_sender_savepoints.go, line 33 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit, period after both of these comments.
done
pkg/kv/txn_coord_sender_savepoints.go, line 42 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Move these fields up above seqNum to give the ordering here some notion of logical precedence.
done
pkg/kv/txn_coord_sender_savepoints.go, line 172 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why do we need these seqNum checks?
because "initial savepoint" allow rollbacks after retries. See now.
pkg/kv/txn_coord_sender_savepoints.go, line 180 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
When is seqNum less than zero?
never, I guess that's the point of the assertion. Rafa added it, but I'd leave it.
pkg/kv/txn_interceptor_committer.go, line 465 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: not your change, but if we wanted to rename epochBumpedLocked to bumpEpochLocked to fit the naming scheme here more appropriately, I'd be very supportive. If not, I'll make the change after this PR lands.
meh
pkg/kv/txn_interceptor_pipeliner.go, line 607 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's add a clone method to *inFlightWriteSet.
well but I'm only cloning the tree, not other crap...
pkg/kv/txn_interceptor_pipeliner.go, line 615 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/to/with/
done
pkg/kv/txn_interceptor_pipeliner.go, line 627 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm confused, we do keep track of what sequence numbers we've verified and which we haven't. There's a Sequence field in inFlightWrite.
indeed. See now.
pkg/kv/txn_interceptor_pipeliner.go, line 635 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No, you can't. The btree library doesn't like that.
ack
pkg/kv/txn_interceptor_pipeliner.go, line 654 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should you be setting cloned here?
It's already set from the moment we've created the savepoint. Or maybe I'm missing your point.
pkg/kv/txn_interceptor_pipeliner.go, line 655 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we clear(true) here instead?
Sometimes. See now.
Perhaps we could always do it if we relied on s.ifWrites == nil (the special nil value) to imply that no earlier savepoint has any writes, but it seems pretty fragile.
This is all pretty nasty; curious for your thoughts.
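(Roughly, the branch under discussion has this shape; the method names on the write set are assumptions pieced together from the snippets quoted in this thread, not the actual diff:)

```go
// Restore the in-flight writes captured by the savepoint. A nil
// s.ifWrites means nothing was in flight when it was created.
if s.ifWrites == nil {
	// Drop everything written since; reuse returns nodes to the freelist.
	tp.ifWrites.clear(true /* reuse */)
} else {
	// Re-clone so a second rollback to the same savepoint still works.
	tp.ifWrites.t = s.ifWrites.Clone()
}
```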
pkg/kv/txn_interceptor_pipeliner.go, line 825 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this safe when s.cloned == true?
It is not safe when you can still roll back to any savepoints that might have writes in them. But we don't; we only call this on epoch bump (after which you can only roll back to "initial" savepoints), on commit, and, since this review, on a special case of rollbackToSavepoint when we're rolling back to an initial savepoint.
Added a comment to the method about it. And accepting thoughts.
pkg/kv/txn_interceptor_pipeliner_test.go, line 1231 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment. We've been good about doing so in these tests so far.
Good test though.
done
pkg/kv/txn_interceptor_pipeliner_test.go, line 1244 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/write/writes/
done
pkg/kv/txn_interceptor_pipeliner_test.go, line 1267 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/we'll/we/
The future tense is meant to signal that the comment does not apply to the lines immediately below, but to something that's coming later.
pkg/kv/txn_interceptor_pipeliner_test.go, line 1292 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Want to check the footprint?
I did one line above, no?
pkg/kv/txn_interceptor_pipeliner_test.go, line 1307 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is interesting. It begs the question of why we need to update the savepoint itself. Is it not enough to update the TxnCoordSender?
Updating the savepoint too is an optimization for the case when we roll back to it again.
pkg/kv/testdata/savepoints, line 322 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
Can this test be revived?
done
pkg/sql/conn_executor_exec.go, line 945 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
TODO(andrei): ...
I don't think you're gonna get any help with that one any time soon.
Or just delete the TODO. Is it useful anymore?
:)
gone
pkg/sql/conn_executor_exec.go, line 1006 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this used? I thought we just got rid of the restartWait state?
not used. gone.
pkg/sql/conn_executor_savepoints.go, line 200 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we push these errors into client.RollbackToSavepoint?
done
pkg/sql/conn_executor_savepoints.go, line 276 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm still confused about why we need this here and in the kv.savepoint object.
they're gone, left just in kv
Force-pushed from 13b6c4c to 3c811c6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)
pkg/kv/txn_coord_sender.go, line 179 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/the needs/the transaction needs/
done
pkg/kv/txn_interceptor_span_refresher_test.go, line 559 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same point about adding a comment.
done
pkg/sql/conn_executor.go, line 2483 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't understand this. Why does validating the name indicate whether this is the cockroach_restart savepoint?
I don't know what was going on at the time of this commit, but in a later commit this is changed to isCommitOnReleaseSavepoint.
pkg/sql/conn_executor_savepoints.go, line 62 at r4 (raw file):
We actually create a kv.savepoint object for cockroach_restart now? I'm surprised. Isn't there a cost to this?
I've optimized the creation of kv.savepoints for initial savepoints - since the empty value suffices for those.
Is this imposing strange behavior on the TxnCoordSender since we still want to restart when we roll back to these savepoints?
We don't need to do anything special when rolling back to these savepoints - which is nice. If we're rolling back to them from the Open state, then they behave just like regular savepoints (we don't need to restart anything; we used to before this patch, but not any more). When rolling back to them from the Aborted state, the transaction will already have been restarted if there was a prior retriable error.
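(For readers following along, this is the client-side retry protocol those release semantics serve. A sketch only: real code should also check for the 40001 retriable-error code rather than retrying on any RELEASE failure:)

```go
func runWithRetry(db *sql.DB, work func(*sql.Tx) error) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once the commit below succeeds

	if _, err := tx.Exec("SAVEPOINT cockroach_restart"); err != nil {
		return err
	}
	for {
		if err := work(tx); err != nil {
			return err
		}
		// RELEASE attempts to commit the underlying KV txn, surfacing
		// deferred serializability errors here instead of at COMMIT.
		if _, err := tx.Exec("RELEASE SAVEPOINT cockroach_restart"); err == nil {
			break
		}
		// Rolling back to the initial savepoint is guaranteed to work;
		// the txn keeps its locks, so the retry is likely to succeed.
		if _, err := tx.Exec("ROLLBACK TO SAVEPOINT cockroach_restart"); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```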
pkg/sql/conn_executor_savepoints.go, line 310 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This copy is unfortunate. Can we do something smarter with aliasing the slices and copying on write? I suspect that even without reference counting to eliminate redundant copies, it would be cheaper to simply copy on every call to push.
Do you mind comparing the relative frequency of these method calls so we can decide on the right solution here? I'm working under the assumption that setTxnRewindPos is going to be called far more often than execSavepointInOpenState.
setTxnRewindPos should be called less often than execSavepointInOpenState; setTxnRewindPos is called on the order of once per transaction.
Still, I've tried for a hot minute to optimize the common handling of a stack of savepoints only consisting of the commitOnRelease one, but it's not entirely trivial because the name of that savepoint is not always cockroach_restart, depending on session variables. I've optimized the cloning of the nil slice, though, which should be useful for connections that never use savepoints.
I might try harder, but after this PR.
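(The nil-slice fast path presumably reduces to something like this, extrapolating from the clone() excerpt quoted earlier in the review:)

```go
func (stack savepointStack) clone() savepointStack {
	if stack == nil {
		// Connections that never issue SAVEPOINT allocate nothing here.
		return nil
	}
	cpy := make(savepointStack, len(stack))
	copy(cpy, stack)
	return cpy
}
```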
pkg/sql/conn_executor_savepoints.go, line 337 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you explain again what those semantics are?
Also, move this to the top of the file. It's buried down here but pretty important to understand in order to read the rest of the file.
done
pkg/sql/conn_executor_savepoints_test.go, line 193 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use sql.OpenStateStr and sql.NoTxnStr here?
done
pkg/sql/conn_fsm.go, line 58 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
And non-retryable errors, right?
done
pkg/sql/conn_fsm.go, line 92 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: stray line.
done
pkg/sql/conn_fsm.go, line 115 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What is this referring to?
stale. removed.
pkg/sql/conn_fsm.go, line 144 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"to from"
but it's correct :)
pkg/sql/conn_fsm.go, line 190 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What went into the decision to have a separate event instead of a single type eventSavepointRollback struct{Initial fsm.Bool}?
pretty arbitrary, but I think it looks better in the state machine diagram
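(For contrast, the single-event alternative would carry the distinction as a payload flag; a sketch of the suggestion above, not what the PR does:)

```go
// One rollback event instead of two distinct event types.
type eventSavepointRollback struct {
	// Initial is fsm.True when rolling back to an initial savepoint,
	// i.e. what the separate eventTxnRestart expresses today.
	Initial fsm.Bool
}
```

Keeping two event types means the restart edge shows up as its own arrow in the generated state machine diagram, which is the readability point being made here.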
pkg/sql/logictest/testdata/logic_test/manual_retry, line 59 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
done
pkg/sql/opt/exec/execbuilder/relational.go, line 144 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: no need for the variable.
done
pkg/sql/testdata/savepoints, line 1 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nice tests!
Mostly Raphael.
Force-pushed from 3c811c6 to e8b3ad4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)
pkg/kv/txn_interceptor_seq_num_allocator.go, line 180 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's worth adding a very small test for this. I know the implementation is trivial, but the role the txnSeqNumAllocator plays in populating savepoints is critical.
done
pkg/sql/conn_executor_savepoints.go, line 264 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add more to this comment. This is the number of DDL statements that had been issued at the time that the savepoint was created, right?
done
pkg/sql/conn_fsm.go, line 193 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are these two events limited to cockroach_restart or just commonly linked to it like eventTxnRestart?
the former. Emphasized more in the comment. Releasing a normal savepoint doesn't generate an event.
pkg/sql/conn_fsm.go, line 204 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: keep these in the same order as the type defs. Also ideally in the same order as the cases for a given state in the state transition map.
shuffled a bit
pkg/sql/logictest/testdata/logic_test/manual_retry, line 141 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Might as well quote the savepoint name so that we exactly mirror PG:
nathan=# BEGIN;
BEGIN
nathan=# ROLLBACK TO SAVEPOINT bar;
ERROR: savepoint "bar" does not exist
done
pkg/sql/logictest/testdata/logic_test/manual_retry, line 165 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you query with the column names here and below?
done
pkg/sql/testdata/savepoints, line 303 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: s/check/Check/ here and below.
meh
pkg/sql/testdata/savepoints, line 330 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's too bad we aren't testing that this causes a txn retry and that the epoch is larger.
I've added a check that we don't see writes from before the rollback.
pkg/sql/testdata/savepoints, line 335 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"other than"?
done
pkg/sql/testdata/savepoints, line 526 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/rewing/rewind/
done
Force-pushed from e8b3ad4 to a99e49e.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/builder.go, line 76 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Someone else should sign off on whether this is a valid use of execbuilder.Builder.
cc. @RaduBerinde
Looks fine to me. The whole setsystemconfig trigger thing could be passed to the execbuilder as a function since it's really higher-level logic IMO but no need to do that now.
Reviewed 2 of 3 files at r12, 30 of 41 files at r13, 3 of 3 files at r14.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/kv/txn_interceptor_pipeliner.go, line 683 at r14 (raw file):
} // If the tree has not been cloned before, we can attempt a fast path where we
Remove this comment.
pkg/kv/txn_interceptor_pipeliner.go, line 801 at r14 (raw file):
} // AsSlice returns the in-flight writes, ordered by key.
Since we're not exporting any of these methods, name this asSlice.
Force-pushed from 06fc1d2 to ebe8e55.
TFTR!
@rafiss, I've updated the pgjdbc blacklist, removing the 56 tests that are now passing.
I've also run the django test, and it passed a bunch of tests and wanted
var djangoBlacklist20_1 = blacklist{
"admin_views.tests.GroupAdminTest.test_group_permission_performance": "unknown",
}
but it also claimed that only 10 tests failed unexpectedly and making that change would remove more than 10. So I don't know what's going on and I haven't touched the blacklist.
I also ran the hibernate test, which claims that 20 tests failed unexpectedly. I don't immediately see a rhyme in the failed ones, so I'm gonna bury my head in the sand and move on.
sqlalchemy and typeorm passed without any fuss.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/txn_interceptor_pipeliner.go, line 683 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Remove this comment.
done
pkg/kv/txn_interceptor_pipeliner.go, line 801 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Since we're not exporting any of these methods, name this asSlice.
done
bors r+
Merge conflict
Before this patch, a transaction was releasing its locks immediately after a statement encountered an error, when entering the Aborted state. With the upcoming support for savepoints, which will allow error recovery, this is no longer desirable (or, at least, not always desirable). This patch removes the rollback that happened when transitioning to the Aborted state. Instead, we defer the rollback to the final ROLLBACK statement, which transitions us out of the Aborted state. Release note: None
We have some metrics around savepoints which are referenced both in some SQL category and in a KV transactions category. I'm removing the references from the KV section; I don't think they belong. And generally I understand that this catalog is not used for anything. Release note: None
Release note (sql change): CockroachDB now collects separate sets of metrics for usage of SAVEPOINT: one set for regular SQL savepoints and one set for uses dedicated to CockroachDB's client-side transaction retry protocol.
This patch adds support for SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>, ROLLBACK TO SAVEPOINT <foo>.

Before this patch, we only had support for the special savepoint cockroach_restart, which had to be placed at the beginning of the transaction and was specifically intended for dealing with transaction retries. This patch implements general support for savepoints, which provide an error recovery mechanism.

The connExecutor now maintains a stack of savepoints. Rolling back to a savepoint uses the recent KV api for ignoring a range of write sequence numbers.

At the SQL level, savepoints differ in two characteristics:
1) savepoints placed at the beginning of a transaction (i.e. before any KV operations) are marked as "initial". Rolling back to an initial savepoint is always possible. Rolling back to a non-initial savepoint is not possible after the transaction restarts (see below).
2) the savepoint named "cockroach_restart" retains special RELEASE semantics: releasing it (attempts to) commit the underlying KV txn. This continues to allow for discovering of deferred serializability errors (i.e. write timestamp pushes by the ts cache). As before, this RELEASE can fail with a retriable error, at which point the client can do ROLLBACK TO SAVEPOINT cockroach_restart (which is guaranteed to work because cockroach_restart needs to be an "initial" savepoint). The transaction continues to maintain all its locks after such an error. This is all in contrast to releasing any other savepoints, which cannot commit the txn and also never fails. See below for more discussion. The cockroach_restart savepoint is only special in its release behavior, not in its rollback behavior.

With the implementation of savepoints, the state machine driving a SQL connection's transactions becomes a lot simpler. There's no longer a distinction between an "Aborted" transaction and one that's in "RestartWait". Rolling back to a savepoint now works the same way across the two states, so RestartWait is gone.

This patch also improves the KV savepoints. They now capture and restore the state of the read spans and the in-flight writes.

Some things don't work (yet):
a) Rolling back to a savepoint created after a schema change will error out. This is because we don't yet snapshot the transaction's schema change state.
b) After a retriable error, you can only rollback to an initial savepoint. Attempting to rollback to a non-initial savepoint generates a retriable error again. If the transaction has been aborted, I think this is the right behavior; no recovery is possible since the transaction has lost its write intents. In the usual case where the transaction has not been aborted, I think we want something better, but it will take more work to get it. I think the behavior we want is the following: after a serializability failure, retrying just part of the transaction should be doable by attempting a ROLLBACK TO SAVEPOINT. This rollback should succeed if all the non-rolled-back reads can be refreshed to the desired timestamp. If they can be refreshed, then the client can simply retry the rolled back part of the transaction. If they can't, then the ROLLBACK should return a retriable error again, allowing the client to attempt a deeper rollback - and so on until the client rolls back to an initial savepoint (which succeeds by definition).

Implementing this would allow for the following nifty pattern:

    func fn_n() {
      for {
        SAVEPOINT savepoint_n
        try {
          fn_n+1()
        } catch retriable error {
          err := ROLLBACK TO SAVEPOINT outer
          if err != nil {
            throw err
          }
          continue
        }
        RELEASE SAVEPOINT savepoint_n
        break
      }
    }

The idea here is that the client is trying to re-do as little work as possible by successively rolling back to earlier and earlier savepoints. This pattern will technically work with the current patch already, except it will not actually help the client in any way since all the rollbacks will fail until we get to the very first savepoint.

There's an argument to be made for making RELEASE SAVEPOINT check for deferred serializability violations (and perhaps other deferred checks - like deferred constraint validation), although Postgres doesn't do any of these.

Anyway, I've left implementing this for a future patch because I want to do some KV work for supporting it nicely. Currently, the automatic restart behavior that KV transactions have is a pain in the ass since it works against what we're trying to do. For the time being, non-initial savepoints remember their txn ID and epoch, and attempting to rollback to them after these changes produces a retriable error automatically.

Fixes cockroachdb#45477
Touches cockroachdb#10735

Release note (sql change): SQL savepoints are now supported. SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>, ROLLBACK TO SAVEPOINT <foo> now work. `SHOW SAVEPOINT STATUS` can be used to inspect the current stack of active savepoints.

Co-authored-by: Raphael 'kena' Poss [email protected]
Co-authored-by: Andrei Matei [email protected]
Force-pushed from ebe8e55 to f1e2a00.
bors r+
Build succeeded
There was no reason for you to rush this merge before I had a look, was there? In fact the release note here is incomplete and there are still a few missing tests. I'll issue the complementary PR.
actually you even botched it... I'll file the additional issues
I merged it cause I'm away until Tuesday. I thought you told me you had looked...
all good - we can fix things next week. Enjoy your weekend
This change updates the syntax diagram definitions and generated BNF for several SAVEPOINT-related statements, specifically:
- Add the SHOW SAVEPOINT STATUS statement to the list of syntax diagrams generated by pkg/cmd/docgen
- Add the SHOW SAVEPOINT STATUS BNF file to the other generated BNF files
- Update ROLLBACK TO SAVEPOINT to note that the savepoint name does not have to be 'cockroach_restart'
It uses the changes in cockroachdb#45794, which enabled docgen for SHOW SAVEPOINT STATUS. It is part of the work surrounding cockroachdb#45566, which added preliminary SQL savepoints support.
Release justification: low-risk update to documentation diagrams
Release note: None
45962: sql: re-add GC job on schema element deletion r=pbardea a=pbardea

This commit creates GC jobs upon the deletion of an index, table or database. Similarly to the previous implementation, it considers the walltime at which the schema change completed to be the drop time of the schema element.

Release note (sql change): Previously, after deleting an index, table, or database the relevant schema change job would change its running status to waiting for GC TTL. The schema change and the GC process are now decoupled into 2 jobs.

Release justification: This is a follow up to the migration of turning schema changes into actual jobs. This commit re-adds the ability to properly GC indexes and tables.

46048: docgen: update savepoint-related definitions, bnfs r=rmloveland a=rmloveland

This change updates the syntax diagram definitions and generated BNF for several SAVEPOINT-related statements, specifically:
- Add the SHOW SAVEPOINT STATUS statement to the list of syntax diagrams generated by pkg/cmd/docgen
- Add the SHOW SAVEPOINT STATUS BNF file to the other generated BNF files
- Update ROLLBACK TO SAVEPOINT to note that the savepoint name does not have to be 'cockroach_restart'

It uses the changes in #45794, which enabled docgen for SHOW SAVEPOINT STATUS. It is part of the work surrounding #45566, which added preliminary SQL savepoints support.

Release justification: low-risk update to documentation diagrams
Release note: None

Co-authored-by: Paul Bardea [email protected]
Co-authored-by: Rich Loveland [email protected]
Informs (but does not resolve) #10735.
This patch adds support for SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>,
ROLLBACK TO SAVEPOINT <foo>.
Before this patch, we only had support for the special savepoint
cockroach_restart, which had to be placed at the beginning of the
transaction and was specifically intended for dealing with transaction
retries. This patch implements general support for savepoints, which
provide an error recovery mechanism.
The connExecutor now maintains a stack of savepoints. Rolling back to a
savepoint uses the recent KV api for ignoring a range of write sequence
numbers.
At the SQL level, savepoints differ in two characteristics:
1) savepoints placed at the beginning of a transaction (i.e. before any
KV operations) are marked as "initial". Rolling back to an initial
savepoint is always possible. Rolling back to a non-initial savepoint is
not possible after the transaction restarts (see below).
2) the savepoint named "cockroach_restart" retains special RELEASE
semantics: releasing it (attempts to) commit the underlying KV txn.
This continues to allow for discovering of deferred serializability
errors (i.e. write timestamp pushes by the ts cache). As before, this
RELEASE can fail with a retriable error, at which point the client can
do ROLLBACK TO SAVEPOINT cockroach_restart (which is guaranteed to work
because cockroach_restart needs to be an "initial" savepoint). The
transaction continues to maintain all its locks after such an error.
This is all in contrast to releasing any other savepoints, which cannot
commit the txn and also never fails. See below for more discussion.
The cockroach_restart savepoint is only special in its release behavior,
not in its rollback behavior.
With the implementation of savepoints, the state machine driving a SQL
connection's transactions becomes a lot simpler. There's no longer a
distinction between an "Aborted" transaction and one that's in
"RestartWait". Rolling back to a savepoint now works the same way across
the two states, so RestartWait is gone.
This patch also improves the KV savepoints. They now capture and restore
the state of the read spans and the in-flight writes.
Some things don't work (yet):
a) Rolling back to a savepoint created after a schema change will error
out. This is because we don't yet snapshot the transaction's schema
change state.
b) After a retriable error, you can only rollback to an initial
savepoint. Attempting to rollback to a non-initial savepoint generates a
retriable error again. If the transaction has been aborted, I think this
is the right behavior; no recovery is possible since the transaction has
lost its write intents. In the usual case where the transaction has not
been aborted, I think we want something better but it will take more
work to get it. I think the behavior we want is the following:
- after a serializability failure, retrying just part of the transaction
should be doable by attempting a ROLLBACK TO SAVEPOINT. This rollback
should succeed if all the non-rolled-back reads can be refreshed to the
desired timestamp. If they can be refreshed, then the client can simply
retry the rolled back part of the transaction. If they can't, then the
ROLLBACK should return a retriable error again, allowing the client to
attempt a deeper rollback - and so on until the client rolls back to an
initial savepoint (which succeeds by definition).
Implementing this would allow for the following nifty pattern:

    func fn_n() {
      for {
        SAVEPOINT savepoint_n
        try {
          fn_n+1()
        } catch retriable error {
          err := ROLLBACK TO SAVEPOINT outer
          if err != nil {
            throw err
          }
          continue
        }
        RELEASE SAVEPOINT savepoint_n
        break
      }
    }
The idea here is that the client is trying to re-do as little work as
possible by successively rolling back to earlier and earlier savepoints.
This pattern will technically work with the current patch already,
except it will not actually help the client in any way since all the
rollbacks will fail until we get to the very first savepoint.
There's an argument to be made for making RELEASE SAVEPOINT check for
deferred serializability violations (and perhaps other deferred checks -
like deferred constraint validation), although Postgres doesn't do any
of these.
Anyway, I've left implementing this for a future patch because I want to
do some KV work for supporting it nicely. Currently, the automatic
restart behavior that KV transactions have is a pain in the ass since it
works against what we're trying to do.
For the time-being, non-initial savepoints remember their txn ID and
epoch and attempting to rollback to them after these changes produces a
retriable error automatically.
Release note (sql change): SQL savepoints are now supported. SAVEPOINT
<foo>, RELEASE SAVEPOINT <foo>, ROLLBACK TO SAVEPOINT <foo> now work.
`SHOW SAVEPOINT STATUS` can be used to inspect the current stack of
active savepoints.
Co-authored-by: Raphael 'kena' Poss [email protected]
Co-authored-by: Andrei Matei [email protected]