kv: don't bump transaction on closed timestamp when re-writing existing intent #63796

Closed
nvanbenschoten opened this issue Apr 16, 2021 · 4 comments
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity T-kv KV Team X-stale

Comments

nvanbenschoten (Member) commented Apr 16, 2021

As observed in #36478 (comment) and later discussed in #41720, not bumping a transaction's timestamp on the closed timestamp when it re-writes an existing intent could help prevent starvation in cases where a transaction:

- scans a large number of rows in a GLOBAL table
- writes a row in a GLOBAL table, gets its timestamp bumped on the closed timestamp
- attempts to refresh and fails due to conflicting writes that invalidated its read
- retries and rewrites the same rows

If re-writing an existing intent did not bump the transaction on the closed timestamp, which is safe, we could avoid this transaction retrying indefinitely.
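
A rough sketch of what that check could look like, in Go (a minimal illustration; `Timestamp`, `Intent`, and `writeTimestamp` are hypothetical stand-ins, not the actual MVCC evaluation code):

```go
// Hypothetical sketch of the proposed rule; these types are illustrative
// stand-ins, not CockroachDB's actual MVCC evaluation code.
package sketch

// Timestamp stands in for an HLC timestamp.
type Timestamp int64

// Intent records which transaction (and epoch) wrote a provisional value at a key.
type Intent struct {
	TxnID uint64
	Epoch int32
}

// writeTimestamp picks the timestamp at which a transaction writes a key.
// Today, every write below the closed timestamp gets bumped above it. Under
// the proposal, re-writing a key that already carries an intent from a prior
// epoch of the same transaction skips the bump: the intent was already
// present, so no reader below the closed timestamp could have observed a
// committed value at that key.
func writeTimestamp(txnID uint64, epoch int32, writeTS, closedTS Timestamp, existing *Intent) Timestamp {
	rewrite := existing != nil && existing.TxnID == txnID && existing.Epoch < epoch
	if writeTS <= closedTS && !rewrite {
		return closedTS + 1 // new intent: must land above the closed timestamp
	}
	return writeTS // re-written intent: keep the original write timestamp
}
```

The interesting case is the final branch: in a retry's second epoch, writes to previously written keys would keep their timestamps, so the refresh-and-retry loop described above could terminate.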

This is coming up in kvnemesis + global_reads: #63747.

Jira issue: CRDB-6778

nvanbenschoten added the C-enhancement and A-kv-transactions labels Apr 16, 2021
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 16, 2021
This commit adds an exponential backoff to the transaction retry loop
when it detects that a transaction has been aborted. This was observed
to prevent thrashing under heavy read-write contention on `global_read`
ranges, which are added to kvnemesis in cockroachdb#63747. These ranges have an
added propensity to cause thrashing because every write to these ranges
gets bumped to a higher timestamp, so it is currently imperative that a
transaction be able to refresh its reads after writing to a global_read
range. If other transactions continue to invalidate a read-write
transaction's reads, it may never complete and will repeatedly abort
conflicting txns after detecting deadlocks. This commit prevents this
from stalling kvnemesis indefinitely.

I see two ways that we can improve this situation in the future.
1. The first option is that we could introduce some form of pessimistic
   read-locking for long running read-write transactions, so that they can
   eventually prevent other transactions from invalidating their reads as
   they proceed to write to a global_reads range and get their write
   timestamp bumped. This ensures that when the long-running transaction
   returns to refresh (if it even needs to, depending on the durability of
   the read locks) its reads, the refresh will have a high likelihood of
   succeeding. This is discussed in cockroachdb#52768.
2. The second option is to allow a transaction to re-write its existing
   intents in new epochs without being bumped by the closed timestamp. If a
   transaction only got bumped by the closed timestamp when writing new
   intents, then after a transaction was forced to retry, it would have a
   high likelihood of succeeding on its second epoch as long as it didn't
   write to a new set of keys. This is discussed in cockroachdb#63796.
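
For illustration, the retry loop with backoff might look roughly like this (a minimal sketch of the behavior described above, not the actual kvnemesis code; `errTxnAborted` and `runWithBackoff` are hypothetical names):

```go
// Minimal sketch of exponential backoff on transaction aborts; not the
// actual kvnemesis code, and errTxnAborted/runWithBackoff are hypothetical.
package sketch

import (
	"errors"
	"time"
)

// errTxnAborted stands in for KV's transaction-aborted retry error.
var errTxnAborted = errors.New("txn aborted")

// runWithBackoff retries op, sleeping exponentially longer after each
// abort so that contending transactions stop thrashing on one another.
func runWithBackoff(op func() error) error {
	backoff := time.Millisecond
	const maxBackoff = 250 * time.Millisecond
	for {
		err := op()
		if !errors.Is(err, errTxnAborted) {
			return err // success, or a non-retryable error
		}
		time.Sleep(backoff)
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```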
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 23, 2021
craig bot pushed a commit that referenced this issue Apr 23, 2021
63799: kvnemesis: add backoff to retry loop on txn aborts r=nvanbenschoten a=nvanbenschoten
jlinder added the T-kv (KV Team) label Jun 16, 2021
nvanbenschoten (Member, Author) commented:

cc. @andreimatei in case you want to add anything here from our recent discussion about this.

andreimatei (Contributor) commented:

What we discussed with @nvanbenschoten and @ajwerner was that a request that only writes to keys on which the transaction holds replicated locks shouldn't need to consult the timestamp cache or the closed timestamp.

To take advantage of this observation, we need a mechanism for efficiently figuring out whether a request is only touching locked keys. The lock table is that mechanism, except that it currently doesn't contain locks corresponding to intents that no conflicting reader has run into; writing an intent does not populate the lock table, as an optimization to keep memory usage down. One proposal was that writing intents could populate the lock table starting at epoch two. This way, most transactions are unaffected, but a transaction that failed to refresh once (and was forced to restart) would switch to a mode where it can write to the keys that it wrote in previous epochs without interacting with the closed timestamp.
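
As a rough illustration of this proposal (a toy sketch with hypothetical names, not CockroachDB's actual lock table API):

```go
// Toy sketch of populating the lock table with a txn's own intents only
// after its first restart; all names here are hypothetical.
package sketch

type txnID uint64

type lockTable struct {
	locks map[string]txnID // key -> transaction holding the lock
}

func newLockTable() *lockTable {
	return &lockTable{locks: make(map[string]txnID)}
}

// onIntentWrite is called after a transaction lays down an intent.
// First-epoch writes stay out of the lock table to keep memory usage down;
// once a transaction has restarted, its intents are tracked so that later
// requests can cheaply tell whether they touch only already-locked keys.
func (lt *lockTable) onIntentWrite(id txnID, epoch int32, key string) {
	if epoch == 0 { // epoch 0 = the transaction's first attempt
		return // the common case: most transactions finish in their first epoch
	}
	lt.locks[key] = id
}

// writesOnlyLockedKeys reports whether every key is already locked by the
// transaction, in which case the request could skip consulting the
// timestamp cache and the closed timestamp.
func (lt *lockTable) writesOnlyLockedKeys(id txnID, keys []string) bool {
	for _, key := range keys {
		if holder, ok := lt.locks[key]; !ok || holder != id {
			return false
		}
	}
	return true
}
```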

ajwerner (Contributor) commented:

> One proposal was that writing intents could populate the lock table starting at epoch two.

nit: IIRC epochs start at 0, so maybe at epoch 1 we start to populate the lock table.

github-actions commented:

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
