Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Make Put handle WriteTooOldError more like CPut #38668

Merged
merged 2 commits into from
Jul 9, 2019

Conversation

bdarnell
Copy link
Contributor

@bdarnell bdarnell commented Jul 3, 2019

CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates #38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).

@bdarnell bdarnell requested review from nvanbenschoten and a team July 3, 2019 19:47
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/roachpb/api.go, line 111 at r1 (raw file):

}

// IsReadAndWrite returns truee if the request both reads and writes

true


pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):

	// retry is inevitable, it's probably best to continue and lay down
	// as many intents as possible before that retry (this can avoid n^2
	// behavior in some scenarios with high contention on multiple keys,

This assumes that transactions are accessing these multiple keys in random orders. In the common case where transactions are highly contended but each touch keys in the same order (e.g. tpcc's newOrder txn), this won't cause n^2 behavior. That might be worth mentioning.


pkg/storage/replica_evaluate.go, line 221 at r1 (raw file):

	//
	// TODO(bdarnell): Plumb the SQL CanAutoRetry field through to
	// !ba.Header.DeferWriteTooOldError.

Are you planning on addressing this immediately?

Another option which is simpler but also loses some precision is to base this flag off of the transaction's sequence number. If the sequence number is zero before the current batch, we know that this is the first writing batch of the epoch.


pkg/storage/replica_evaluate.go, line 335 at r1 (raw file):

	}

	// If there was an EndTransaction in the batch that finalized the transaction,

How was this behaving correctly before?


pkg/storage/replica_test.go, line 4171 at r1 (raw file):

	}

	// Send a put for keyB; by default this fails with a WriteTooOldError

Missing a period at the end.

Copy link
Contributor Author

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestDropWhileBackfill is failing consistently in CI with this change but I haven't been able to reproduce it locally; I'm still investigating.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This assumes that transactions are accessing these multiple keys in random orders. In the common case where transactions are highly contended but each touch keys in the same order (e.g. tpcc's newOrder txn), this won't cause n^2 behavior. That might be worth mentioning.

If it's random order I wouldn't be too concerned about this since those transactions would also be very deadlock-prone. But I think the O(n^2) behavior can still manifest even if transactions are accessing their keys in order. Suppose txnA accesses keys that are multiples of two and txnB accesses keys that are multiples of 3. If the txns run to the end before the restart, they'll know that they collide on all the multiples of 6 one of the transactions will succeed on the second attempt. If we restart as soon as the conflict is detected, they could restart once for key 6, once for key 12, etc.


pkg/storage/replica_evaluate.go, line 221 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are you planning on addressing this immediately?

Another option which is simpler but also loses some precision is to base this flag off of the transaction's sequence number. If the sequence number is zero before the current batch, we know that this is the first writing batch of the epoch.

Not immediately; see #35891. My plan is to get this in now and consider it for backporting to 19.1.x, but the plumbing changes would probably not be backportable (and I think they're lower priority - I wouldn't push for their addition to the 19.2 roadmap at this late date).


pkg/storage/replica_evaluate.go, line 335 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How was this behaving correctly before?

I think it wasn't, but we weren't ever reaching a case where it matters. The test that fails is this one which sends a Put in the same batch as an EndTransaction. This didn't try to return the WriteTooOldError until this commit.

It's tricky to get into the situation where this matters (a write in the same batch as an EndTransaction, and NoRefreshSpans is true). 1PC write-only transactions have their EndTransactions elided. Multi-range transactions never have anything in the same batch as their EndTransaction. Single-range multi-step transactions will usually have refresh spans. SQL never uses Put without refresh spans, so you couldn't get into the erroneous case here in practice before.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: once the test failure is resolved.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell)


pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If it's random order I wouldn't be too concerned about this since those transactions would also be very deadlock-prone. But I think the O(n^2) behavior can still manifest even if transactions are accessing their keys in order. Suppose txnA accesses keys that are multiples of two and txnB accesses keys that are multiples of 3. If the txns run to the end before the restart, they'll know that they collide on all the multiples of 6 one of the transactions will succeed on the second attempt. If we restart as soon as the conflict is detected, they could restart once for key 6, once for key 12, etc.

By "same order" I meant that conflicting transactions would be almost identical to each other. I guess it's not worth discussing in the comment though.

@bdarnell bdarnell force-pushed the defer-write-too-old branch from 58dcc63 to cc485c8 Compare July 8, 2019 21:13
@bdarnell bdarnell requested review from a team July 8, 2019 21:13
@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 8, 2019

OK, the new commit appears to fix the test flakiness. @knz could you please take a look at the error changes?

The error handling issues appear to be absent in the 19.1 branch, so if we backport this PR we'd only take the original commit.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this partial revert of #35821 is adequate. Back when I implemented #35821 we did not have an adequate way to wrap errors properly. Now that we do, this can be used.

I still think that encountering a retry error in some of the places listed here was not envisioned in the design of the surrounding code (thinking about schema changes specifically, but now that I think of it, a retry during a backup is also suspicious - why is this not accessing KV at a fixed timestamp in the past?) and letting that retry error flow through without further investigation is ... worrying.

@andreimatei what do you think?

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 9, 2019

I don't have any evidence that the backup cases are actually problematic. The ones that cause problems I've seen are the ones related to schema job status, which perform UPDATE queries. These queries would previously never return WriteTooOldError; the error would be deferred to the end of the transaction (where it wouldn't get wrapped). This PR makes the UPDATE statement return the WTOE immediately. (but even before this PR, other retriable errors were possible from the UPDATE statement)

@knz
Copy link
Contributor

knz commented Jul 9, 2019

thanks for explaining 👍

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 9, 2019

bors r=nvanbenschoten,knz

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any interest in addressing this TODO about not always refreshing on WriteTooOldErrors?

// TODO(andrei): Chances of success for on write-too-old conditions might be

If we're making WTOE more eager, should we also make regular timestamp pushes generate serialization errors more eagerly? In fact, can we get rid of the write_too_old field on the txn? Since we got rid of snapshot isolation, this field doesn't add anything to the pushed timestamp, does it?

And FYI, we also have this code. It detects pushes while the txn can still auto-retry, but it doesn't do anything unless there's no chance for the refresh to succeed. Maybe instead of this patch we could improve that code - have it actually try a refresh (or, in the case of a write_too_old flag, just assume that the refresh will fail - which I guess would keep the write_too_old field somewhat useful). This way KV could continue to defer WTOE (and we would also defer them for CPut) and SQL would be in charge of deciding when deferring is no longer a good idea.

if !os.ImplicitTxn.Get() && txn.IsSerializablePushAndRefreshNotPossible() {
rc, canAutoRetry := ex.getRewindTxnCapability()
if canAutoRetry {
ev := eventRetriableErr{
IsCommit: fsm.FromBool(isCommit(stmt.AST)),
CanAutoRetry: fsm.FromBool(canAutoRetry),
}
txn.ManualRestart(ctx, ex.server.cfg.Clock.Now())
payload := eventRetriableErrPayload{
err: roachpb.NewTransactionRetryWithProtoRefreshError(
"serializable transaction timestamp pushed (detected by connExecutor)",
txn.ID(),
// No updated transaction required; we've already manually updated our
// client.Txn.
roachpb.Transaction{},
),
rewCap: rc,
}
return ev, payload, nil
}
}

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell and @nvanbenschoten)

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 9, 2019

If we're making WTOE more eager, should we also make regular timestamp pushes generate serialization errors more eagerly?

Probably not. Regular timestamp pushes can usually be handled transparently with RefreshSpans.

In fact, can we get rid of the write_too_old field on the txn? Since we got rid of snapshot isolation, this field doesn't add anything to the pushed timestamp, does it?

It's hard to get rid of it because we can't write an intent and return an error at the same time, so we need the txn flag to pass this status out of the replica evaluation level. Also, the additional work contemplated in #38591, deferred WTOEs would come back.

And FYI, we also have this code. It detects pushes while the txn can still auto-retry, but it doesn't do anything unless there's no chance for the refresh to succeed. Maybe instead of this patch we could improve that code - have it actually try a refresh (or, in the case of a write_too_old flag, just assume that the refresh will fail - which I guess would keep the write_too_old field somewhat useful). This way KV could continue to defer WTOE (and we would also defer them for CPut) and SQL would be in charge of deciding when deferring is no longer a good idea.

Ah, that sounds like a better idea than plumbing this flag down the stack. I'll look into it.

bdarnell added 2 commits July 9, 2019 13:43
The AssertionError wrappers hid the retryable nature of the errors,
causing failures in cases that should have been retried.

Most of the affected wrappers were changed from Wrapf to
NewAssertionError in cockroachdb#35821. I looked at all uses of
NewAssertionErrorWithWrappedErrf and changed all the ones where a
client.Txn was in scope unless they were clearly not going to be
txn-related (for example, proto unmarshaling errors).

Release note (bug fix): Transaction retries in schema changes are
again handled correctly.
CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates cockroachdb#38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).
@bdarnell bdarnell force-pushed the defer-write-too-old branch from cc485c8 to 4ee7b8a Compare July 9, 2019 17:45
@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 9, 2019

bors r=nvanbenschoten,knz

Merge skew on a proto file leading to conflicts in the c++ generated code.

craig bot pushed a commit that referenced this pull request Jul 9, 2019
38668: storage: Make Put handle WriteTooOldError more like CPut r=nvanbenschoten,knz a=bdarnell

CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates #38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).

Co-authored-by: Ben Darnell <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build succeeded

@craig craig bot merged commit 4ee7b8a into cockroachdb:master Jul 9, 2019
@bdarnell bdarnell deleted the defer-write-too-old branch July 9, 2019 19:59
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Feb 6, 2020
…oo-old

Before this patch, any write running into a write-too-old condition
resulted in a WriteTooOldError being returned by the server. Returning
an error implies that no intents are left behind. This is unfortunate;
we'd like to leave intents (or, in the future, other types of locks)
behind so keep away other transactions. We've observed this resulting in
the starvation of a class of transactions in a user's workload.

This patch makes it so that blind writes (i.e. Puts - used by UPDATE,
not CPuts) don't return WriteTooOldErrors any more. Instead, they return
the a txn proto with the WriteTooOld flag set. This is the behavior they
had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the
client now eagerly refreshes the transactions when it sees a WriteTooOld
flag, and if the refresh succeeds, it returns a WriteTooOldError to the
higher layers (SQL), allowing for automatic retries where applicable.

Unfortunately, CPuts (used by INSERT) continue to return
WriteTooOldErrors without leaving locks behind. Dealing with them
requires more tenderness because they imply a read, and the timestamp of
a read cannot be bumped as easily as that of a write.

Touches cockroachdb#44653

Release note (SQL change): UPDATEs returning a serialization failure error (code
40001) now leave behind a lock, helping the transaction succeed if it
retries. This prevents starvation of transactions whose UPDATEs are
prone to conflicts.
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Feb 10, 2020
…oo-old

Before this patch, any write running into a write-too-old condition
resulted in a WriteTooOldError being returned by the server. Returning
an error implies that no intents are left behind. This is unfortunate;
we'd like to leave intents (or, in the future, other types of locks)
behind so keep away other transactions. We've observed this resulting in
the starvation of a class of transactions in a user's workload.

This patch makes it so that blind writes (i.e. Puts - used by UPDATE,
not CPuts) don't return WriteTooOldErrors any more. Instead, they return
the a txn proto with the WriteTooOld flag set. This is the behavior they
had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the
client now eagerly refreshes the transactions when it sees a WriteTooOld
flag, and if the refresh succeeds, it returns a WriteTooOldError to the
higher layers (SQL), allowing for automatic retries where applicable.

Unfortunately, CPuts (used by INSERT) continue to return
WriteTooOldErrors without leaving locks behind. Dealing with them
requires more tenderness because they imply a read, and the timestamp of
a read cannot be bumped as easily as that of a write.

Touches cockroachdb#44653

Release note (sql change): UPDATEs returning a serialization failure error (code
40001) now leave behind a lock, helping the transaction succeed if it
retries. This prevents starvation of transactions whose UPDATEs are
prone to conflicts.
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Feb 11, 2020
…oo-old

Before this patch, any write running into a write-too-old condition
resulted in a WriteTooOldError being returned by the server. Returning
an error implies that no intents are left behind. This is unfortunate;
we'd like to leave intents (or, in the future, other types of locks)
behind so keep away other transactions. We've observed this resulting in
the starvation of a class of transactions in a user's workload.

This patch makes it so that blind writes (i.e. Puts - used by UPDATE,
not CPuts) don't return WriteTooOldErrors any more. Instead, they return
the a txn proto with the WriteTooOld flag set. This is the behavior they
had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the
client now eagerly refreshes the transactions when it sees a WriteTooOld
flag, and if the refresh succeeds, it returns a WriteTooOldError to the
higher layers (SQL), allowing for automatic retries where applicable.

Unfortunately, CPuts (used by INSERT) continue to return
WriteTooOldErrors without leaving locks behind. Dealing with them
requires more tenderness because they imply a read, and the timestamp of
a read cannot be bumped as easily as that of a write.

Touches cockroachdb#44653

Release note (sql change): UPDATEs returning a serialization failure error (code
40001) now leave behind a lock, helping the transaction succeed if it
retries. This prevents starvation of transactions whose UPDATEs are
prone to conflicts.
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Feb 13, 2020
…oo-old

Before this patch, any write running into a write-too-old condition
resulted in a WriteTooOldError being returned by the server. Returning
an error implies that no intents are left behind. This is unfortunate;
we'd like to leave intents (or, in the future, other types of locks)
behind so keep away other transactions. We've observed this resulting in
the starvation of a class of transactions in a user's workload.

This patch makes it so that blind writes (i.e. Puts - used by UPDATE,
not CPuts) don't return WriteTooOldErrors any more. Instead, they return
the a txn proto with the WriteTooOld flag set. This is the behavior they
had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the
client now eagerly refreshes the transactions when it sees a WriteTooOld
flag, and if the refresh succeeds, it returns a WriteTooOldError to the
higher layers (SQL), allowing for automatic retries where applicable.

Unfortunately, CPuts (used by INSERT) continue to return
WriteTooOldErrors without leaving locks behind. Dealing with them
requires more tenderness because they imply a read, and the timestamp of
a read cannot be bumped as easily as that of a write.

Touches cockroachdb#44653

Release note (sql change): UPDATEs returning a serialization failure error (code
40001) now leave behind a lock, helping the transaction succeed if it
retries. This prevents starvation of transactions whose UPDATEs are
prone to conflicts.
craig bot pushed a commit that referenced this pull request Feb 13, 2020
44654: storage: leave intents behind after blind-writes experiencing write-too-old r=andreimatei a=andreimatei

Before this patch, any write running into a write-too-old condition
resulted in a WriteTooOldError being returned by the server. Returning
an error implies that no intents are left behind. This is unfortunate;
we'd like to leave intents (or, in the future, other types of locks)
behind so keep away other transactions. We've observed this resulting in
the starvation of a class of transactions in a user's workload.

This patch makes it so that blind writes (i.e. Puts - used by UPDATE,
not CPuts) don't return WriteTooOldErrors any more. Instead, they return
the a txn proto with the WriteTooOld flag set. This is the behavior they
had before #38668. This patch retains the goal of #38668, however: the
client now eagerly refreshes the transactions when it sees a WriteTooOld
flag, and if the refresh succeeds, it returns a WriteTooOldError to the
higher layers (SQL), allowing for automatic retries where applicable.

Unfortunately, CPuts (used by INSERT) continue to return
WriteTooOldErrors without leaving locks behind. Dealing with them
requires more tenderness because they imply a read, and the timestamp of
a read cannot be bumped as easily as that of a write.

Touches #44653

Release note (SQL change): UPDATEs returning a serialization failure error (code
40001) now leave behind a lock, helping the transaction succeed if it
retries. This prevents starvation of transactions whose UPDATEs are
prone to conflicts.

Co-authored-by: Andrei Matei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants