storage: allow transactions to run at a lower sequence #33001

ridwanmsharif · 2018-12-10T22:59:56Z

Continuing off #32688, final part of #5861 (comment).

Adds support to have transactions run at a lower sequence
at a given key. It asserts the value it computes with the
value written for the sequence in the intent history instead
or returning a retry-able error.

Release note: None

cockroach-teamcity · 2018-12-10T23:00:06Z

This change is

nvanbenschoten

Reviewed 4 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica.go, line 3429 at r1 (raw file):
For other reviewers - Ridwan and I discussed this and decided it was fine to remove the assertion.

I added that assertion in when trying to smoke out any operations that would violate the invariants necessary for transactional pipelining to work properly. Now that everything’s in, I think I’d be ok with removing the assertion entirely
It already isn’t foolproof

pkg/storage/replica_test.go, line 2894 at r1 (raw file):

}

// TestReplicaAbortSpanStoredTxnRetryError verifies that if a cached

Does this test need a name and a new description?

pkg/storage/engine/mvcc.go, line 1238 at r1 (raw file):

				(txn.Sequence < meta.Txn.Sequence ||
					(txn.Sequence == meta.Txn.Sequence &&
						txn.DeprecatedBatchIndex <= meta.Txn.DeprecatedBatchIndex)) {

If you feel up to it, we can remove all references to this DeprecatedBatchIndex now.

Then we can make this condition if txn.Epoch == meta.Txn.Epoch && txn.Sequence <= meta.Txn.Sequence.

pkg/storage/engine/mvcc.go, line 1241 at r1 (raw file):

				// Find the previous value at this key so the current value can be computed.
				// If a previous intent value doesn't exist, get the last versioned value
				// and apply the valueFn where necessary.

Feel free to expand this comment past what we're doing. Explain why we're doing this too! It's a little non-intuitive if you stumble into it.

pkg/storage/engine/mvcc.go, line 1243 at r1 (raw file):

				// and apply the valueFn where necessary.
				prevSeq, prevValueWritten := meta.GetPrevIntentSeq(txn.Sequence)
				if prevValueWritten {

Add comments to these cases that explain what they logically mean. Comments like:

// There was an intent on this key at a lower sequence number that this write.

and

// This was the first write in the transaction to this key.

As a more macro level nit, I this would be easier to narrate in comments if you did writtenValue, found := meta.GetIntentValue(txn.Sequence) first. The flow would then be:

find the result of this write, erroring if it isn't found
find the value that preceded it
verify that a re-evaluation would end up with the same result
conclude that nothing needs to be done because we're idempotent!

This reveals that looking at the previous value is only necessary if valueFn != nil.

pkg/storage/engine/mvcc.go, line 1263 at r1 (raw file):

						var exVal *roachpb.Value
						var err error
						if exVal, _, _, err = mvccGetInternal(

Explain why this is inconsistent. You mentioned on slack that you thought that was incorrect. I'm curious why. It seems right to me.

pkg/storage/engine/mvcc.go, line 1264 at r1 (raw file):

						var err error
						if exVal, _, _, err = mvccGetInternal(
							ctx, iter, metaKey, timestamp, false /* consistent */, safeValue, txn, getBuf); err != nil {

Can we pass unsafeValue here?

pkg/storage/engine/mvcc.go, line 1274 at r1 (raw file):

				}
				// Assert that the value computed is the same as the value written previously.
				writtenValue, found := meta.GetIntentValue(txn.Sequence)

Does this handle the case where we replay the last intent written, before it makes it into the intent history?

pkg/storage/engine/mvcc_test.go, line 3303 at r1 (raw file):

		sequence   int32
		batchIndex int32
		expRetry   bool

Looks like we never retry anymore, which is exactly what we want!

There are two new variants of tests that we need, for the "different value after recomputing" error and for the "missing an intent with lower sequence" error.

pkg/storage/engine/enginepb/mvcc.go, line 140 at r1 (raw file):

// GetPrevIntentSeq goes through the intent history and finds the previous
// intents sequence number given the current sequence

"intent's"

Also, period at the end.

pkg/storage/engine/enginepb/mvcc.go, line 142 at r1 (raw file):

// intents sequence number given the current sequence
func (meta *MVCCMetadata) GetPrevIntentSeq(seq int32) (prevSeq int32, found bool) {
	for i := len(meta.IntentHistory) - 1; i > 0; i-- {

Think it's worth performing a binary search here?

pkg/storage/engine/enginepb/mvcc.go, line 145 at r1 (raw file):

		intent := meta.IntentHistory[i]
		if seq == intent.Sequence {
			prevSeq = meta.IntentHistory[i-1].Sequence

nit: why not just return here?

pkg/storage/engine/enginepb/mvcc.go, line 158 at r1 (raw file):

	for _, intent := range meta.IntentHistory {
		if seq == intent.Sequence {
			found = true

Same nit.

ridwanmsharif

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica.go, line 3429 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

For other reviewers - Ridwan and I discussed this and decided it was fine to remove the assertion.

I added that assertion in when trying to smoke out any operations that would violate the invariants necessary for transactional pipelining to work properly. Now that everything’s in, I think I’d be ok with removing the assertion entirely
It already isn’t foolproof

Ack.

pkg/storage/replica_test.go, line 2894 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this test need a name and a new description?

Done.

pkg/storage/engine/mvcc.go, line 1238 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If you feel up to it, we can remove all references to this DeprecatedBatchIndex now.

Then we can make this condition if txn.Epoch == meta.Txn.Epoch && txn.Sequence <= meta.Txn.Sequence.

Done. See 2nd commit.

pkg/storage/engine/mvcc.go, line 1241 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Feel free to expand this comment past what we're doing. Explain why we're doing this too! It's a little non-intuitive if you stumble into it.

Done.

pkg/storage/engine/mvcc.go, line 1243 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add comments to these cases that explain what they logically mean. Comments like:
// There was an intent on this key at a lower sequence number that this write.

and

// This was the first write in the transaction to this key.
As a more macro level nit, I this would be easier to narrate in comments if you did writtenValue, found := meta.GetIntentValue(txn.Sequence) first. The flow would then be:

find the result of this write, erroring if it isn't found

find the value that preceded it

verify that a re-evaluation would end up with the same result

conclude that nothing needs to be done because we're idempotent!

This reveals that looking at the previous value is only necessary if valueFn != nil.

Done.

pkg/storage/engine/mvcc.go, line 1263 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Explain why this is inconsistent. You mentioned on slack that you thought that was incorrect. I'm curious why. It seems right to me.

Elaborated here. using nil for the txn param tho. Could you confirm if this is correct though? Seems alright to me.

pkg/storage/engine/mvcc.go, line 1264 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we pass unsafeValue here?

Done.

pkg/storage/engine/mvcc.go, line 1274 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this handle the case where we replay the last intent written, before it makes it into the intent history?

Done.

pkg/storage/engine/mvcc_test.go, line 3303 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Looks like we never retry anymore, which is exactly what we want!

There are two new variants of tests that we need, for the "different value after recomputing" error and for the "missing an intent with lower sequence" error.

Done. See the test I added.

pkg/storage/engine/enginepb/mvcc.go, line 140 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"intent's"

Also, period at the end.

Done.

pkg/storage/engine/enginepb/mvcc.go, line 142 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Think it's worth performing a binary search here?

Well its always tiny, may not be worth it but lets leave no stone unturned. Done.

pkg/storage/engine/enginepb/mvcc.go, line 158 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same nit.

Done.

nvanbenschoten

Just nits now, the logic all looks good.

@tbg might be interested in reviewing this too.

Reviewed 4 of 5 files at r2, 6 of 8 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica.go, line 6380 at r3 (raw file):

				// the TxnMeta, which is stored as part of an intent.
				ba.Txn.Sequence = seqNum
			} else {

I think we can remove this entire else case and DisallowUnsequencedTransactionalWrites.

pkg/storage/engine/mvcc.go, line 1238 at r1 (raw file):

Previously, ridwanmsharif (Ridwan Sharif) wrote…

Done. See 2nd commit.

👍

pkg/storage/engine/mvcc.go, line 1241 at r1 (raw file):

Previously, ridwanmsharif (Ridwan Sharif) wrote…

Done.

Nice comment!

pkg/storage/engine/mvcc.go, line 1263 at r1 (raw file):

Previously, ridwanmsharif (Ridwan Sharif) wrote…

Elaborated here. using nil for the txn param tho. Could you confirm if this is correct though? Seems alright to me.

Yep, it's good.

pkg/storage/engine/mvcc.go, line 1239 at r2 (raw file):

					(txn.Sequence == meta.Txn.Sequence &&
						txn.DeprecatedBatchIndex <= meta.Txn.DeprecatedBatchIndex)) {
				// Since transactions should be idempotent, we must be particularly careful

Do you think all of this logic should be pulled into a separate function? It would allow you to test the error cases more closely.

pkg/storage/engine/mvcc.go, line 1252 at r2 (raw file):

				// Ensure all intents are found and that the values are always accurate for
				// transactional idempotency.
				writtenValue, found := meta.GetIntentValue(txn.Sequence)

Let's either not call this method at all when txn.Sequence == meta.Txn.Sequence or assert that found == false when txn.Sequence == meta.Txn.Sequence. FWIW, I don't really care about the cost of calling GetIntentValue, I'm just concerned about how this reads and how understandable it is.

pkg/storage/engine/mvcc.go, line 1283 at r2 (raw file):

						prevVal, _ := meta.GetIntentValue(prevSeq)
						// TODO(ridwanmsharif): Confirm the timestamp is correct.
						value, err = valueFn(&roachpb.Value{RawBytes: prevVal, Timestamp: txn.Timestamp})

This is interesting. Do we ever use the timestamp? If not, I'd leave it out entirely.

pkg/storage/engine/mvcc.go, line 1316 at r2 (raw file):

				// written.
				if !bytes.Equal(value, writtenValue) {
					return errors.Errorf("transaction %s has a different value %+v after recomputing from what was written: %+v",

Include the sequence number in this error.

pkg/storage/engine/mvcc.go, line 1235 at r3 (raw file):

				return errors.Errorf("put with epoch %d came after put with epoch %d in txn %s",
					txn.Epoch, meta.Txn.Epoch, txn.ID)
			} else if txn.Epoch == meta.Txn.Epoch &&

minor nit: can we remove the line wrap now?

pkg/storage/engine/mvcc_test.go, line 4435 at r2 (raw file):

	// Lay down an intent without increasing the sequence but with a different value.
	if err := MVCCPut(ctx, engine, nil, key, ts1, newValue, txn); err != nil {
		if !strings.Contains(err.Error(), "has a different value") {

Replace these type of assertions with testutils.IsError/testutils.IsPError.

pkg/storage/engine/enginepb/mvcc.go, line 148 at r2 (raw file):

	})
	if index > 0 && index < len(meta.IntentHistory) &&
		meta.IntentHistory[index].Sequence == seq {

Should we require that a sequence equal to seq exist? Seems a little overly restrictive.

ridwanmsharif

TFTR! Fixed all the nits!

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/replica.go, line 6380 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we can remove this entire else case and DisallowUnsequencedTransactionalWrites.

Done.

pkg/storage/engine/mvcc.go, line 1263 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yep, it's good.

Done.

pkg/storage/engine/mvcc.go, line 1239 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do you think all of this logic should be pulled into a separate function? It would allow you to test the error cases more closely.

Done.

pkg/storage/engine/mvcc.go, line 1252 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's either not call this method at all when txn.Sequence == meta.Txn.Sequence or assert that found == false when txn.Sequence == meta.Txn.Sequence. FWIW, I don't really care about the cost of calling GetIntentValue, I'm just concerned about how this reads and how understandable it is.

Done.

pkg/storage/engine/mvcc.go, line 1283 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is interesting. Do we ever use the timestamp? If not, I'd leave it out entirely.

Done. Just used for MVCCIncrement so we can get rid of it.

pkg/storage/engine/mvcc.go, line 1316 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Include the sequence number in this error.

Done.

pkg/storage/engine/mvcc_test.go, line 4435 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Replace these type of assertions with testutils.IsError/testutils.IsPError.

Done.

pkg/storage/engine/enginepb/mvcc.go, line 148 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we require that a sequence equal to seq exist? Seems a little overly restrictive.

Done.

nvanbenschoten

Reviewable status: complete! 1 of 0 LGTMs obtained

Adds support to have transactions run at a lower sequence at a given key. It asserts the value it computes with the value written for the sequence in the intent history instead or returning a retry-able error. Release note: None

Removes all references to the DeprecatedBatchIndex field. Release note: None

ridwanmsharif · 2018-12-12T18:39:27Z

bors r+

33001: storage: allow transactions to run at a lower sequence r=ridwanmsharif a=ridwanmsharif Continuing off #32688, final part of #5861 (comment). Adds support to have transactions run at a lower sequence at a given key. It asserts the value it computes with the value written for the sequence in the intent history instead or returning a retry-able error. Release note: None Co-authored-by: Ridwan Sharif <[email protected]>

craig · 2018-12-12T18:54:13Z

Build succeeded

GitHub CI (Cockroach)

…tches This change builds on the introduction of idempotency properties introduced into the MVCC layer in cockroachdb#33001. It exploits this property to remove a previous restriction that could prevent transactions from refreshing and result in transaction retries. The restriction was that batches which had experienced some success while writing could not refresh even if another part of the batch required a refresh. This was because it was unclear which parts of the batch had succeeded in performing writes and which parts of the batch had not. Without this knowledge, it was unsafe to re-issue the batch because that could result in duplicating writes (e.g. performing an increment twice). A lot has changed since then. We now have proper sequence numbers on requests, we no longer send `BeginTransaction` requests (which threw errors on replays), and we now have an MVCC layer that is idempotent for writes within a transaction. With this foundation in place, we can now safely re-issue any write within a transaction that we're unsure about. As long as writes remain well sequenced (writes to the same key aren't reordered), everything should behave as expected. The best way to see that the change works is through the test cases in `TestTxnCoordSenderRetries` that now succeed. Without cockroachdb#33001, those all fail. This change will lead to the removal of `MixedSuccessError`s in 19.2! Release note: None

…tches This change builds on the introduction of idempotency properties introduced into the MVCC layer in cockroachdb#33001. It exploits this property to remove a previous restriction that could prevent transactions from refreshing and result in transaction retries. The restriction was that batches which had experienced some success while writing could not refresh even if another part of the batch required a refresh. This was because it was unclear which parts of the batch had succeeded in performing writes and which parts of the batch had not. Without this knowledge, it was unsafe to re-issue the batch because that could result in duplicating writes (e.g. performing an increment twice). A lot has changed since then. We now have proper sequence numbers on requests, we no longer send `BeginTransaction` requests (which threw errors on replays), and we now have an MVCC layer that is idempotent for writes within a transaction. With this foundation in place, we can now safely re-issue any write within a transaction that we're unsure about. As long as writes remain well sequenced (writes to the same key aren't reordered), everything should behave as expected. The best way to see that the change works is through the test cases in `TestTxnCoordSenderRetries` that now succeed. Without cockroachdb#33001, those all fail. Because we can now refresh mixed-success batches, this commit allows us to remove the `MixedSuccessError` type entirely. No longer will it haunt our sender interface with its ambiguity and awkward error wrapping. No longer will it stand in the way of read refreshes with its ignorant stubbornness. The days of its tyranny are over. The war is won. Release note (performance improvement): Transactions are able to refresh their read timestamp even after the partial success of a batch.

35140: storage: employ transactional idempotency to refresh mixed-success batches r=nvanbenschoten a=nvanbenschoten This change builds on the introduction of idempotency properties introduced into the MVCC layer in #33001. It exploits this property to remove a previous restriction that could prevent transactions from refreshing and result in transaction retries. The restriction was that batches which had experienced some success while writing could not refresh even if another part of the batch required a refresh. This was because it was unclear which parts of the batch had succeeded in performing writes and which parts of the batch had not. Without this knowledge, it was unsafe to re-issue the batch because that could result in duplicating writes (e.g. performing an increment twice). A lot has changed since then. We now have proper sequence numbers on requests, we no longer send `BeginTransaction` requests (which threw errors on replays), and we now have an MVCC layer that is idempotent for writes within a transaction. With this foundation in place, we can now safely re-issue any write within a transaction that we're unsure about. As long as writes remain well-sequenced (writes to the same key aren't reordered), everything should behave as expected. The best way to see that the change works is through the test cases in `TestTxnCoordSenderRetries` that now succeed. Without #33001, those all fail. Co-authored-by: Nathan VanBenschoten <[email protected]>

ridwanmsharif requested a review from a team December 10, 2018 22:59

nvanbenschoten requested changes Dec 11, 2018

View reviewed changes

ridwanmsharif force-pushed the sequence-transaction-replays branch 4 times, most recently from d2c0019 to 79a357a Compare December 11, 2018 21:31

ridwanmsharif commented Dec 11, 2018

View reviewed changes

ridwanmsharif requested a review from nvanbenschoten December 11, 2018 22:08

nvanbenschoten requested changes Dec 12, 2018

View reviewed changes

ridwanmsharif force-pushed the sequence-transaction-replays branch from 6d78871 to 9a6c74d Compare December 12, 2018 03:35

ridwanmsharif commented Dec 12, 2018

View reviewed changes

ridwanmsharif requested review from tbg and nvanbenschoten December 12, 2018 03:57

nvanbenschoten approved these changes Dec 12, 2018

View reviewed changes

Ridwan Sharif added 2 commits December 12, 2018 12:34

storage: allow transactions to run at a lower sequence

a147f24

Adds support to have transactions run at a lower sequence at a given key. It asserts the value it computes with the value written for the sequence in the intent history instead or returning a retry-able error. Release note: None

storage: remove all references to DeprecatedBatchIndex

e2459a2

Removes all references to the DeprecatedBatchIndex field. Release note: None

ridwanmsharif force-pushed the sequence-transaction-replays branch from 9a6c74d to e2459a2 Compare December 12, 2018 18:10

craig bot merged commit e2459a2 into cockroachdb:master Dec 12, 2018

ridwanmsharif deleted the sequence-transaction-replays branch December 12, 2018 19:01

ridwanmsharif mentioned this pull request Dec 20, 2018

storage: query hangs when using INSERT FROM ... SELECT on same table #28842

Closed

nvanbenschoten mentioned this pull request Feb 22, 2019

storage: employ transactional idempotency to refresh mixed-success batches #35140

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: allow transactions to run at a lower sequence #33001

storage: allow transactions to run at a lower sequence #33001

ridwanmsharif commented Dec 10, 2018 •

edited

Loading

cockroach-teamcity commented Dec 10, 2018

nvanbenschoten left a comment

ridwanmsharif left a comment

nvanbenschoten left a comment

ridwanmsharif left a comment

nvanbenschoten left a comment

ridwanmsharif commented Dec 12, 2018

craig bot commented Dec 12, 2018

storage: allow transactions to run at a lower sequence #33001

storage: allow transactions to run at a lower sequence #33001

Conversation

ridwanmsharif commented Dec 10, 2018 • edited Loading

cockroach-teamcity commented Dec 10, 2018

nvanbenschoten left a comment

Choose a reason for hiding this comment

ridwanmsharif left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

ridwanmsharif left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

ridwanmsharif commented Dec 12, 2018

craig bot commented Dec 12, 2018

Build succeeded

ridwanmsharif commented Dec 10, 2018 •

edited

Loading