storage: Make Put handle WriteTooOldError more like CPut #38668

bdarnell · 2019-07-03T19:47:20Z

CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates #38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).

cockroach-teamcity · 2019-07-03T19:47:26Z

This change is

nvanbenschoten

Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)

pkg/roachpb/api.go, line 111 at r1 (raw file):

}

// IsReadAndWrite returns truee if the request both reads and writes

true

pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):

	// retry is inevitable, it's probably best to continue and lay down
	// as many intents as possible before that retry (this can avoid n^2
	// behavior in some scenarios with high contention on multiple keys,

This assumes that transactions are accessing these multiple keys in random orders. In the common case where transactions are highly contended but each touch keys in the same order (e.g. tpcc's newOrder txn), this won't cause n^2 behavior. That might be worth mentioning.

pkg/storage/replica_evaluate.go, line 221 at r1 (raw file):

	//
	// TODO(bdarnell): Plumb the SQL CanAutoRetry field through to
	// !ba.Header.DeferWriteTooOldError.

Are you planning on addressing this immediately?

Another option which is simpler but also loses some precision is to base this flag off of the transaction's sequence number. If the sequence number is zero before the current batch, we know that this is the first writing batch of the epoch.

pkg/storage/replica_evaluate.go, line 335 at r1 (raw file):

	}

	// If there was an EndTransaction in the batch that finalized the transaction,

How was this behaving correctly before?

pkg/storage/replica_test.go, line 4171 at r1 (raw file):

	}

	// Send a put for keyB; by default this fails with a WriteTooOldError

Missing a period at the end.

bdarnell

TestDropWhileBackfill is failing consistently in CI with this change but I haven't been able to reproduce it locally; I'm still investigating.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This assumes that transactions are accessing these multiple keys in random orders. In the common case where transactions are highly contended but each touch keys in the same order (e.g. tpcc's newOrder txn), this won't cause n^2 behavior. That might be worth mentioning.

If it's random order I wouldn't be too concerned about this since those transactions would also be very deadlock-prone. But I think the O(n^2) behavior can still manifest even if transactions are accessing their keys in order. Suppose txnA accesses keys that are multiples of two and txnB accesses keys that are multiples of 3. If the txns run to the end before the restart, they'll know that they collide on all the multiples of 6 one of the transactions will succeed on the second attempt. If we restart as soon as the conflict is detected, they could restart once for key 6, once for key 12, etc.

pkg/storage/replica_evaluate.go, line 221 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are you planning on addressing this immediately?

Another option which is simpler but also loses some precision is to base this flag off of the transaction's sequence number. If the sequence number is zero before the current batch, we know that this is the first writing batch of the epoch.

Not immediately; see #35891. My plan is to get this in now and consider it for backporting to 19.1.x, but the plumbing changes would probably not be backportable (and I think they're lower priority - I wouldn't push for their addition to the 19.2 roadmap at this late date).

pkg/storage/replica_evaluate.go, line 335 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How was this behaving correctly before?

I think it wasn't, but we weren't ever reaching a case where it matters. The test that fails is this one which sends a Put in the same batch as an EndTransaction. This didn't try to return the WriteTooOldError until this commit.

It's tricky to get into the situation where this matters (a write in the same batch as an EndTransaction, and NoRefreshSpans is true). 1PC write-only transactions have their EndTransactions elided. Multi-range transactions never have anything in the same batch as their EndTransaction. Single-range multi-step transactions will usually have refresh spans. SQL never uses Put without refresh spans, so you couldn't get into the erroneous case here in practice before.

nvanbenschoten

once the test failure is resolved.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell)

pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If it's random order I wouldn't be too concerned about this since those transactions would also be very deadlock-prone. But I think the O(n^2) behavior can still manifest even if transactions are accessing their keys in order. Suppose txnA accesses keys that are multiples of two and txnB accesses keys that are multiples of 3. If the txns run to the end before the restart, they'll know that they collide on all the multiples of 6 one of the transactions will succeed on the second attempt. If we restart as soon as the conflict is detected, they could restart once for key 6, once for key 12, etc.

By "same order" I meant that conflicting transactions would be almost identical to each other. I guess it's not worth discussing in the comment though.

bdarnell · 2019-07-08T21:15:10Z

OK, the new commit appears to fix the test flakiness. @knz could you please take a look at the error changes?

The error handling issues appear to be absent in the 19.1 branch, so if we backport this PR we'd only take the original commit.

knz

yes this partial revert of #35821 is adequate. Back when I implemented #35821 we did not have an adequate way to wrap errors properly. Now that we do, this can be used.

I still think that encountering a retry error in some of the places listed here was not envisioned in the design of the surrounding code (thinking about schema changes specifically, but now that I think of it, a retry during a backup is also suspicious - why is this not accessing KV at a fixed timestamp in the past?) and letting that retry error flow through without further investigation is ... worrying.

@andreimatei what do you think?

bdarnell · 2019-07-09T14:06:57Z

I don't have any evidence that the backup cases are actually problematic. The ones that cause problems I've seen are the ones related to schema job status, which perform UPDATE queries. These queries would previously never return WriteTooOldError; the error would be deferred to the end of the transaction (where it wouldn't get wrapped). This PR makes the UPDATE statement return the WTOE immediately. (but even before this PR, other retriable errors were possible from the UPDATE statement)

knz · 2019-07-09T14:12:20Z

thanks for explaining 👍

bdarnell · 2019-07-09T14:14:13Z

bors r=nvanbenschoten,knz

craig · 2019-07-09T14:42:56Z

Build failed (retrying...)

GitHub CI (Cockroach)

craig · 2019-07-09T15:10:38Z

Build failed (retrying...)

GitHub CI (Cockroach)

andreimatei

Any interest in addressing this TODO about not always refreshing on WriteTooOldErrors?

cockroach/pkg/roachpb/data.go

Line 1246 in d024107

// TODO(andrei): Chances of success for on write-too-old conditions might be

If we're making WTOE more eager, should we also make regular timestamp pushes generate serialization errors more eagerly? In fact, can we get rid of the write_too_old field on the txn? Since we got rid of snapshot isolation, this field doesn't add anything to the pushed timestamp, does it?

And FYI, we also have this code. It detects pushes while the txn can still auto-retry, but it doesn't do anything unless there's no chance for the refresh to succeed. Maybe instead of this patch we could improve that code - have it actually try a refresh (or, in the case of a write_too_old flag, just assume that the refresh will fail - which I guess would keep the write_too_old field somewhat useful). This way KV could continue to defer WTOE (and we would also defer them for CPut) and SQL would be in charge of deciding when deferring is no longer a good idea.

cockroach/pkg/sql/conn_executor_exec.go

Lines 425 to 445 in d024107

    
           if !os.ImplicitTxn.Get() && txn.IsSerializablePushAndRefreshNotPossible() { 
        
           	rc, canAutoRetry := ex.getRewindTxnCapability() 
        
           	if canAutoRetry { 
        
           		ev := eventRetriableErr{ 
        
           			IsCommit:     fsm.FromBool(isCommit(stmt.AST)), 
        
           			CanAutoRetry: fsm.FromBool(canAutoRetry), 
        
           		} 
        
           		txn.ManualRestart(ctx, ex.server.cfg.Clock.Now()) 
        
           		payload := eventRetriableErrPayload{ 
        
           			err: roachpb.NewTransactionRetryWithProtoRefreshError( 
        
           				"serializable transaction timestamp pushed (detected by connExecutor)", 
        
           				txn.ID(), 
        
           				// No updated transaction required; we've already manually updated our 
        
           				// client.Txn. 
        
           				roachpb.Transaction{}, 
        
           			), 
        
           			rewCap: rc, 
        
           		} 
        
           		return ev, payload, nil 
        
           	} 
        
           }

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell and @nvanbenschoten)

craig · 2019-07-09T15:46:21Z

Build failed (retrying...)

GitHub CI (Cockroach)

craig · 2019-07-09T16:28:50Z

Build failed (retrying...)

GitHub CI (Cockroach)

craig · 2019-07-09T17:12:34Z

Build failed

GitHub CI (Cockroach)

bdarnell · 2019-07-09T17:15:45Z

If we're making WTOE more eager, should we also make regular timestamp pushes generate serialization errors more eagerly?

Probably not. Regular timestamp pushes can usually be handled transparently with RefreshSpans.

In fact, can we get rid of the write_too_old field on the txn? Since we got rid of snapshot isolation, this field doesn't add anything to the pushed timestamp, does it?

It's hard to get rid of it because we can't write an intent and return an error at the same time, so we need the txn flag to pass this status out of the replica evaluation level. Also, the additional work contemplated in #38591, deferred WTOEs would come back.

And FYI, we also have this code. It detects pushes while the txn can still auto-retry, but it doesn't do anything unless there's no chance for the refresh to succeed. Maybe instead of this patch we could improve that code - have it actually try a refresh (or, in the case of a write_too_old flag, just assume that the refresh will fail - which I guess would keep the write_too_old field somewhat useful). This way KV could continue to defer WTOE (and we would also defer them for CPut) and SQL would be in charge of deciding when deferring is no longer a good idea.

Ah, that sounds like a better idea than plumbing this flag down the stack. I'll look into it.

The AssertionError wrappers hid the retryable nature of the errors, causing failures in cases that should have been retried. Most of the affected wrappers were changed from Wrapf to NewAssertionError in cockroachdb#35821. I looked at all uses of NewAssertionErrorWithWrappedErrf and changed all the ones where a client.Txn was in scope unless they were clearly not going to be txn-related (for example, proto unmarshaling errors). Release note (bug fix): Transaction retries in schema changes are again handled correctly.

CPut (used by INSERT) and Put (used by UPDATE) previously handled WriteTooOldErrors differently: CPut would return them immediately while Put would defer them to the end of the transaction. This commit makes Put work (mostly) like CPut. The major benefit of this change is that if the first statement of a transaction is an UPDATE, it will no longer be possible for it to return transaction-retry errors due to SQL-level automatic retries. (This was already true for INSERT. Note that an ON CONFLICT DO UPDATE clause acts like a standalone UPDATE in this respect) A potential downside is that UPDATE-heavy workloads experiencing high contention on many keys may have worse performance (up to O(n^2)). Updates cockroachdb#38591 Release note (sql change): The first statement of a transaction will no longer return a transaction-retry error if it is an UPDATE or DELETE (this was already true for INSERT).

bdarnell · 2019-07-09T17:57:25Z

bors r=nvanbenschoten,knz

Merge skew on a proto file leading to conflicts in the c++ generated code.

38668: storage: Make Put handle WriteTooOldError more like CPut r=nvanbenschoten,knz a=bdarnell CPut (used by INSERT) and Put (used by UPDATE) previously handled WriteTooOldErrors differently: CPut would return them immediately while Put would defer them to the end of the transaction. This commit makes Put work (mostly) like CPut. The major benefit of this change is that if the first statement of a transaction is an UPDATE, it will no longer be possible for it to return transaction-retry errors due to SQL-level automatic retries. (This was already true for INSERT. Note that an ON CONFLICT DO UPDATE clause acts like a standalone UPDATE in this respect) A potential downside is that UPDATE-heavy workloads experiencing high contention on many keys may have worse performance (up to O(n^2)). Updates #38591 Release note (sql change): The first statement of a transaction will no longer return a transaction-retry error if it is an UPDATE or DELETE (this was already true for INSERT). Co-authored-by: Ben Darnell <[email protected]>

craig · 2019-07-09T18:28:29Z

Build succeeded

GitHub CI (Cockroach)

…oo-old Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches cockroachdb#44653 Release note (SQL change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts.

…oo-old Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches cockroachdb#44653 Release note (sql change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts.

44654: storage: leave intents behind after blind-writes experiencing write-too-old r=andreimatei a=andreimatei Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before #38668. This patch retains the goal of #38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches #44653 Release note (SQL change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts. Co-authored-by: Andrei Matei <[email protected]>

bdarnell requested review from nvanbenschoten and a team July 3, 2019 19:47

bdarnell mentioned this pull request Jul 3, 2019

sql: Client-side retries with WriteTooOldError in UPDATE #38591

Closed

bdarnell force-pushed the defer-write-too-old branch from cb1d3be to 58dcc63 Compare July 3, 2019 20:05

nvanbenschoten reviewed Jul 8, 2019

View reviewed changes

bdarnell commented Jul 8, 2019

View reviewed changes

nvanbenschoten approved these changes Jul 8, 2019

View reviewed changes

bdarnell force-pushed the defer-write-too-old branch from 58dcc63 to cc485c8 Compare July 8, 2019 21:13

bdarnell requested review from a team July 8, 2019 21:13

knz approved these changes Jul 9, 2019

View reviewed changes

andreimatei reviewed Jul 9, 2019

View reviewed changes

bdarnell added 2 commits July 9, 2019 13:43

bdarnell force-pushed the defer-write-too-old branch from cc485c8 to 4ee7b8a Compare July 9, 2019 17:45

craig bot merged commit 4ee7b8a into cockroachdb:master Jul 9, 2019

bdarnell deleted the defer-write-too-old branch July 9, 2019 19:59

tbg mentioned this pull request Jul 23, 2019

storage: adaptive closed timestamp target duration #36478

Closed

bdarnell mentioned this pull request Jul 24, 2019

release-19.1: storage: Make Put handle WriteTooOldError more like CPut #39087

Merged

andreimatei mentioned this pull request Nov 27, 2019

1PC CPuts suffering from lost updates #42849

Closed

andreimatei mentioned this pull request Feb 3, 2020

storage: writes encountering write-too-old don't leave intents behind #44653

Closed

andreimatei mentioned this pull request Feb 3, 2020

storage: leave intents behind after blind-writes experiencing write-too-old #44654

Merged

nvanbenschoten mentioned this pull request May 3, 2023

kv: eliminate write-too-old deferral mechanism #102751

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: Make Put handle WriteTooOldError more like CPut #38668

storage: Make Put handle WriteTooOldError more like CPut #38668

bdarnell commented Jul 3, 2019

cockroach-teamcity commented Jul 3, 2019

nvanbenschoten left a comment

bdarnell left a comment

nvanbenschoten left a comment

bdarnell commented Jul 8, 2019

knz left a comment

bdarnell commented Jul 9, 2019

knz commented Jul 9, 2019

bdarnell commented Jul 9, 2019

craig bot commented Jul 9, 2019

craig bot commented Jul 9, 2019

andreimatei left a comment •

edited

Loading

craig bot commented Jul 9, 2019

craig bot commented Jul 9, 2019

craig bot commented Jul 9, 2019

bdarnell commented Jul 9, 2019

bdarnell commented Jul 9, 2019

craig bot commented Jul 9, 2019

	if !os.ImplicitTxn.Get() && txn.IsSerializablePushAndRefreshNotPossible() {
	rc, canAutoRetry := ex.getRewindTxnCapability()
	if canAutoRetry {
	ev := eventRetriableErr{
	IsCommit: fsm.FromBool(isCommit(stmt.AST)),
	CanAutoRetry: fsm.FromBool(canAutoRetry),
	}
	txn.ManualRestart(ctx, ex.server.cfg.Clock.Now())
	payload := eventRetriableErrPayload{
	err: roachpb.NewTransactionRetryWithProtoRefreshError(
	"serializable transaction timestamp pushed (detected by connExecutor)",
	txn.ID(),
	// No updated transaction required; we've already manually updated our
	// client.Txn.
	roachpb.Transaction{},
	),
	rewCap: rc,
	}
	return ev, payload, nil
	}
	}

storage: Make Put handle WriteTooOldError more like CPut #38668

storage: Make Put handle WriteTooOldError more like CPut #38668

Conversation

bdarnell commented Jul 3, 2019

cockroach-teamcity commented Jul 3, 2019

nvanbenschoten left a comment

Choose a reason for hiding this comment

bdarnell left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

bdarnell commented Jul 8, 2019

knz left a comment

Choose a reason for hiding this comment

bdarnell commented Jul 9, 2019

knz commented Jul 9, 2019

bdarnell commented Jul 9, 2019

craig bot commented Jul 9, 2019

Build failed (retrying...)

craig bot commented Jul 9, 2019

Build failed (retrying...)

andreimatei left a comment • edited Loading

Choose a reason for hiding this comment

craig bot commented Jul 9, 2019

Build failed (retrying...)

craig bot commented Jul 9, 2019

Build failed (retrying...)

craig bot commented Jul 9, 2019

Build failed

bdarnell commented Jul 9, 2019

bdarnell commented Jul 9, 2019

craig bot commented Jul 9, 2019

Build succeeded

andreimatei left a comment •

edited

Loading