txn: persist fair lock type in lock information and handle stale fair lock resolve #14692
/test
components/txn_types/src/lock.rs
Outdated
@@ -20,13 +20,14 @@ pub enum LockType {
Put,
Delete,
Lock,
Pessimistic,
Pessimistic(bool),
It's better to comment the field.
Ok((TxnStatus::TtlExpire, released))
};
if lock.is_pessimistic_lock() {
let (status, released) = check_txn_status_from_pessimistic_primary_lock(
We can only guarantee it is the primary if `verify_is_primary` is true. What should we do if it's false?
Good question. In my opinion, for newer versions the kv client should always set it to true, as indicated in https://github.com/tikv/tikv/blob/master/src/storage/txn/commands/check_txn_status.rs#L57. For older versions, resolving the primary-key switching issue is not possible.
Another potential solution is to never place a rollback record on the pessimistic lock primary key. This would prevent breaking the transaction status. However, if the prewrite request on the primary key is lost during the prewrite phase, so that the primary remains a pessimistic lock while the secondary locks are resolved as prewrite ones, it could neither be rolled back nor committed, since we cannot determine its transaction status.
/cc @MyonKeminta
Actually, my original idea was to check the write CF for any record of this transaction (`get_txn_commit_record`) if `check_txn_status` finds that the primary has a pessimistic lock and that it is marked as locked with conflict.
I just noticed, while thinking about this, that there are some more problems. When we call `check_txn_status` on a primary (maybe due to encountering an optimistic lock), the primary may be locked by a stale pessimistic lock request whose primary points to another key. It seems this case can be covered by checking the locked-with-conflict flag and checking the txn status from the write CF before verifying the primary at line 115.
However, I'm afraid there can still be far more complicated corner cases starting from this.
Yes, it's much more complicated than before because we have to make sure that the current primary key and the lock on the key are valid. The transaction status should be determined only if this information is valid.
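The ordering discussed above (consult the write CF for locks flagged as locked with conflict, then verify the primary) could be sketched roughly as follows. This is a toy model, not TiKV's code: `LockValidity`, `PessimisticLock`, `validate_primary_lock` and the `has_write_record` closure are all hypothetical names standing in for the real lock struct and the write CF lookup.

```rust
/// Outcome of inspecting the lock found on the key being checked.
#[derive(Debug, PartialEq)]
enum LockValidity {
    /// A commit or rollback record already exists, so the lock is stale.
    StaleStatusDecided,
    /// The lock names a different primary, so it cannot decide the status.
    StalePrimaryMismatch,
    /// The lock may be used to determine the transaction status.
    Valid,
}

/// Hypothetical, simplified view of a pessimistic lock.
struct PessimisticLock {
    primary: Vec<u8>,
    locked_with_conflict: bool,
}

/// `has_write_record` stands in for a write CF lookup by `start_ts`. It is
/// only invoked for locks flagged as locked with conflict.
fn validate_primary_lock(
    key: &[u8],
    lock: &PessimisticLock,
    has_write_record: impl FnOnce() -> bool,
) -> LockValidity {
    // Check the write CF first, before verifying the primary.
    if lock.locked_with_conflict && has_write_record() {
        return LockValidity::StaleStatusDecided;
    }
    // Only then compare the lock's primary with the key being checked.
    if lock.primary.as_slice() != key {
        return LockValidity::StalePrimaryMismatch;
    }
    LockValidity::Valid
}
```

Only the `Valid` outcome would let the caller continue with TTL checks on the lock. Whether the two stale outcomes need different handling is left open here, as it is in the discussion.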
TxnStatus::RolledBack => (TxnStatus::RolledBack, true),
TxnStatus::Committed { commit_ts } => (TxnStatus::committed(commit_ts), true),
- TxnStatus::RolledBack => (TxnStatus::RolledBack, true),
- TxnStatus::Committed { commit_ts } => (TxnStatus::committed(commit_ts), true),
+ t @ TxnStatus::RolledBack | t @ TxnStatus::Committed{..} => (t, true),
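For readers unfamiliar with the `@` binding used in the suggestion, here is a minimal standalone sketch. The `TxnStatus` enum below is a simplified stand-in with only three variants, and `with_decided_flag` is a hypothetical helper; the real type in TiKV has more variants and fields.

```rust
/// Simplified stand-in for the real `TxnStatus`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TxnStatus {
    RolledBack,
    Committed { commit_ts: u64 },
    Uncommitted,
}

/// Returns the status together with a flag saying whether it is final.
fn with_decided_flag(status: TxnStatus) -> (TxnStatus, bool) {
    match status {
        // `t @ pattern` binds the whole matched value to `t`, so both
        // decided variants can share one arm and be passed through as-is.
        t @ TxnStatus::RolledBack | t @ TxnStatus::Committed { .. } => (t, true),
        other => (other, false),
    }
}
```

Each alternative of the or-pattern has to bind the same name with the same type, which is why `t @` appears on both sides of `|`.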
}
},
Err(err) => match err {
// Continue to process if there's no correspond persistent information.
- // Continue to process if there's no correspond persistent information.
+ // Continue to process if there's no corresponding persistent information.
8b34647 to 21244cb
This implementation LGTM, but I'm considering can we reject a pessimistic acquiring request when the recent write record's start_ts is equal to the txn's start_ts, which means the transaction is already committed. If so, the format might not need to be changed. I'm not yet clear enough about the approach of rejecting the pessimistic lock and need to think about it more.
@you06
Another thought is can we accept the conflict pessimistic lock from resumed requests only? So that we can reject the stale pessimistic lock because it's not in TiKV's waiting queue.
@you06 I don't quite get it. Do you mean we check whether it is a force lock by checking its existence in the waiting queue? Could you explain it in more detail?
@cfzjywxk If a request acquired a lock successfully without being awakened (from the fair lock queue), it's
..., but I'm considering can we reject a pessimistic acquiring request when the recent write record's start_ts is equal to the txn's start_ts, ...
Apart from the extra performance cost, also note that if the transaction is committed, it might not always be the most recent write record since it's theoretically possible (though the probability is very low) that another transaction writes this key before the stale request arrives.
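To make the point about the most recent write record concrete, here is a toy illustration. `WriteRecord`, `newest_matches` and `find_commit_record` are hypothetical names, and the slice ordered newest first only imitates what an MVCC scan of one key might return.

```rust
/// Toy write record for a single key.
#[derive(Debug, Clone, Copy, PartialEq)]
struct WriteRecord {
    start_ts: u64,
    commit_ts: u64,
}

/// Cheap check: look only at the newest record (`records` is newest first).
fn newest_matches(records: &[WriteRecord], start_ts: u64) -> bool {
    records.first().map(|w| w.start_ts) == Some(start_ts)
}

/// Full check: search every record for the transaction's `start_ts`.
fn find_commit_record(records: &[WriteRecord], start_ts: u64) -> Option<WriteRecord> {
    records.iter().copied().find(|w| w.start_ts == start_ts)
}
```

With records for transactions 20 (newest) and 10 on the same key, the cheap check misses a stale request from transaction 10, while the full search still finds its commit record.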
Another thought is can we accept the conflict pessimistic lock from resumed requests only? So that we can reject the stale pessimistic lock because it's not in TiKV's waiting queue.
Allowing locking with conflict regardless of whether the request is resumed is an important idea in my opinion. When a new pessimistic lock request arrives, the previous transaction might be either ongoing or finished. My original expectation is to reduce the extra latency from statement-retrying in both these cases. But since in our high-concurrency single-row conflict scenarios the row is continuously locked, disallowing locking with conflict for non-resumed requests might not cause any problem.
However, a more important problem is that when a stale pessimistic lock request arrives, it's possible that the key is locked by another transaction, which will put the stale request into waiting status. Then, when it's resumed, it will still acquire the lock that we don't want.
components/txn_types/src/lock.rs
Outdated
}

const FLAG_PUT: u8 = b'P';
const FLAG_DELETE: u8 = b'D';
const FLAG_LOCK: u8 = b'L';
const FLAG_PESSIMISTIC: u8 = b'S';
const FLAG_PESSIMISTIC_WTIH_CONFLICT: u8 = b'F';
This looks like it needs careful handling of compatibility. It shouldn't be written unless all components in the cluster have been upgraded to a version that supports this new lock type. I think more simple approach to do that is to add a flag to indicate whether the lock is acquired with conflict. The flag can also be carried over to the optimistic lock after prewriting, so it's possible to know whether it was locked with conflict after prewriting if we need that in the future. Then the only compatibility issue we need to be concerned about is informing the TiFlash team so that they recognize this new flag.
I think more simple approach to do that is to add a flag to indicate whether the lock is acquired with conflict
Do you mean adding the flag somewhere else?
s/WTIH/WITH
I think more simple approach to do that is to add a flag to indicate whether the lock is acquired with conflict
Do you mean adding the flag somewhere else?
Yes. Add an extra flag instead of a new lock type. It would be much easier to handle compatibility stuff, I think.
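As a rough illustration of the "extra flag instead of a new lock type" idea, the sketch below keeps the lock-type byte at `b'S'` and appends an optional marker. The byte layout, the `LOCKED_WITH_CONFLICT_PREFIX` value and the `ToyLock` type are assumptions made for this example only and are not taken from TiKV's actual lock format.

```rust
const FLAG_PESSIMISTIC: u8 = b'S';
/// Hypothetical prefix byte for the optional field; not taken from the PR.
const LOCKED_WITH_CONFLICT_PREFIX: u8 = b'F';

/// Toy lock holding only the two pieces of information discussed here.
#[derive(Debug, PartialEq)]
struct ToyLock {
    lock_type: u8,
    locked_with_conflict: bool,
}

fn encode(lock: &ToyLock) -> Vec<u8> {
    // The lock-type byte is unchanged; the flag travels as an optional field.
    let mut buf = vec![lock.lock_type];
    if lock.locked_with_conflict {
        buf.push(LOCKED_WITH_CONFLICT_PREFIX);
    }
    buf
}

fn decode(buf: &[u8]) -> Option<ToyLock> {
    let (&lock_type, optional_fields) = buf.split_first()?;
    let mut lock = ToyLock { lock_type, locked_with_conflict: false };
    for &prefix in optional_fields {
        match prefix {
            LOCKED_WITH_CONFLICT_PREFIX => lock.locked_with_conflict = true,
            // This toy decoder rejects anything it does not recognize.
            _ => return None,
        }
    }
    Some(lock)
}
```

Code that only inspects the first byte keeps seeing an ordinary pessimistic lock. How readers that do not know the new field should treat it is exactly the compatibility question raised above, and this sketch does not answer it.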
@MyonKeminta
It's updated, PTAL.
c09f3f0 to edab787
"current_ts" => current_ts,
"resolving_pessimistic_lock" => ?resolving_pessimistic_lock,
);
let released = txn.unlock_key(primary_key, true, TimeStamp::zero());
The last argument, `commit_ts`, is used as the `conflicting_commit_ts` information for pessimistic lock requests that are waiting for the lock (if any). But in this case it doesn't make sense to pass either zero or the commit ts of the record we found, as there may be a newer write record on the key with a larger commit ts. Perhaps we'd better consider adding an unknown state for it to make the semantics clearer. (Nothing to do with this PR though, just mentioning it.)
Yes, this interface and the `commit_ts` parameter need some refactoring.
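One possible shape for the "unknown state" mentioned above, purely as a sketch: `ConflictingCommitTs`, `ReleasedLock` and this `unlock_key` signature are hypothetical and simplified, not the existing TiKV interface.

```rust
/// What a released lock can tell waiting pessimistic lock requests.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConflictingCommitTs {
    /// The commit ts of the conflicting write is known.
    Known(u64),
    /// A newer write may exist, so no timestamp can be trusted.
    Unknown,
}

/// Hypothetical stand-in for the released-lock information.
#[derive(Debug, PartialEq)]
struct ReleasedLock {
    key: Vec<u8>,
    conflicting_commit_ts: ConflictingCommitTs,
}

fn unlock_key(key: &[u8], conflicting_commit_ts: ConflictingCommitTs) -> ReleasedLock {
    ReleasedLock { key: key.to_vec(), conflicting_commit_ts }
}

impl ReleasedLock {
    /// A waiter can use the timestamp only when it is known.
    fn usable_commit_ts(&self) -> Option<u64> {
        match self.conflicting_commit_ts {
            ConflictingCommitTs::Known(ts) => Some(ts),
            ConflictingCommitTs::Unknown => None,
        }
    }
}
```

Compared with passing `TimeStamp::zero()`, an explicit variant makes it harder to mistake "no information" for a real timestamp.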
return if resolving_pessimistic_lock {
let released = txn.unlock_key(primary_key, true, TimeStamp::zero());
MVCC_CHECK_TXN_STATUS_COUNTER_VEC.pessimistic_rollback.inc();
Ok((TxnStatus::PessimisticRollBack, released))
} else {
let released = rollback_lock(txn, reader, primary_key, lock, true, true)?;
MVCC_CHECK_TXN_STATUS_COUNTER_VEC.rollback.inc();
Ok((TxnStatus::TtlExpire, released))
};
It took me some time to realize this is actually correct...
(By the way I think it's better to comment at line 76 about what kinds of cases may reach this piece of code)
Comments are added above.
reader,
primary_key.clone(),
None,
MissingLockAction::ReturnError,
Can we try to assert that `check_txn_status_missing_lock` called here won't perform any write operation?
It seems it currently won't write anything (to the `txn`). I'm kind of afraid that if in the future we mistakenly write the same thing (the same physical key) more than once in a single `MvccTxn` object, the final result might be undefined.
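A minimal sketch of such an assertion, assuming the transaction buffers its writes in a vector. `ToyTxn`, its `modifies` field and `expect_no_writes` are hypothetical here and only model the idea.

```rust
/// Toy transaction buffer; `modifies` plays the role of buffered writes.
#[derive(Default)]
struct ToyTxn {
    modifies: Vec<(Vec<u8>, Vec<u8>)>,
}

/// Runs `step` and panics if it buffered any additional write.
fn expect_no_writes<T>(txn: &mut ToyTxn, step: impl FnOnce(&mut ToyTxn) -> T) -> T {
    let before = txn.modifies.len();
    let out = step(txn);
    // Comparing lengths only detects added writes, which is the case of
    // concern here (the same key being written more than once).
    assert_eq!(txn.modifies.len(), before, "step buffered a write unexpectedly");
    out
}
```

Wrapping the call that is expected to be read-only in `expect_no_writes` turns a silent double write into an immediate failure in tests.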
/// 'start_ts'.
///
/// 1. Validate whether the existing lock indeed corresponds to the
/// primary lock. The primary key may switch under certain circumstances. If
- /// primary lock. The primary key may switch under certain circumstances. If
+ /// primary lock. The primary key may switch under certain circumstances. If
///
/// 1. Validate whether the existing lock indeed corresponds to the
/// primary lock. The primary key may switch under certain circumstances. If
/// it's a stale lock, the transaction status should not be determined by it.
- /// it's a stale lock, the transaction status should not be determined by it.
+ /// it's a stale lock, the transaction status should not be determined by it.
/// 1. Validate whether the existing lock indeed corresponds to the
/// primary lock. The primary key may switch under certain circumstances. If
/// it's a stale lock, the transaction status should not be determined by it.
/// Refer to https://github.com/tikv/tikv/issues/14636 for additional information.
- /// Refer to https://github.com/tikv/tikv/issues/14636 for additional information.
+ /// Refer to https://github.com/pingcap/tidb/issues/42937 for additional information.
#14636 doesn't say anything in detail. I think linking to the corresponding issue in TiDB repo is better.
)?;
// Return if the primary lock is stale or the transaction status is decided.
if (status.is_decided() || status == TxnStatus::PessimisticRollBack) && released.is_some() {
return Ok((status, released));
Can we assert that `released` is empty if we are not returning here? We should be careful not to write the same thing to `MvccTxn` twice or put the same thing into `ReleasedLocks` twice.
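The requested assertion might look like the sketch below. `Status` is a cut-down stand-in for `TxnStatus`, `early_return` is a hypothetical helper, and `Option<u64>` merely takes the place of the optional released-lock value.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Committed,
    RolledBack,
    PessimisticRollBack,
    Undecided,
}

impl Status {
    fn is_decided(self) -> bool {
        matches!(self, Status::Committed | Status::RolledBack)
    }
}

/// Returns `Some` when the caller should return early with this result.
fn early_return(status: Status, released: Option<u64>) -> Option<(Status, Option<u64>)> {
    if (status.is_decided() || status == Status::PessimisticRollBack) && released.is_some() {
        return Some((status, released));
    }
    // Falling through means later steps may release the lock themselves,
    // so nothing may have been released yet.
    assert!(released.is_none(), "lock was already released before fall-through");
    None
}
```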
// 1. Test resolve the stale pessimistic primary lock. Note the force lock
// could succeed only if there's no corresponding rollback record.
When there is a rollback record, it seems it can still succeed if there is a newer record written by another transaction. This case is better to be covered too.
This check https://github.com/tikv/tikv/blob/master/src/storage/txn/actions/acquire_pessimistic_lock.rs#L282-L297 seems to prevent lock acquiring when there's a corresponding rollback record.
be67276 to 8d69553
Rest LGTM
// rollback record in the write CF, if so the current primary
// pessimistic lock is stale. Otherwise the primary pessimistic lock is
// regarded as valid, and the transaction status is determined by it.
let (txn_status, is_status_decided) = check_txn_status_from_storage(reader, &primary_key)?;
I think both locks and writes come from the storage. Perhaps we can call it `check_determined_txn_status` and return `Option<TxnStatus>`, which is None if the record isn't found in the write CF. This is similar to `check_secondary_locks`.
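The proposed return type could be sketched like this. The name `check_determined_txn_status` comes from the comment above, but the parameter list and the simplified `TxnStatus` and `WriteRecord` types are assumptions for illustration only.

```rust
/// Simplified stand-in covering only the decided outcomes.
#[derive(Debug, PartialEq)]
enum TxnStatus {
    RolledBack,
    Committed { commit_ts: u64 },
}

/// Simplified write CF record found for the transaction's `start_ts`.
enum WriteRecord {
    Commit { commit_ts: u64 },
    Rollback,
}

/// `None` means no record was found, so the status is not determined yet.
fn check_determined_txn_status(record: Option<&WriteRecord>) -> Option<TxnStatus> {
    record.map(|r| match r {
        WriteRecord::Commit { commit_ts } => TxnStatus::Committed { commit_ts: *commit_ts },
        WriteRecord::Rollback => TxnStatus::RolledBack,
    })
}
```

Returning `Option<TxnStatus>` removes the need for a separate `is_status_decided` flag, since the absence of a record is expressed by `None`.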
Both are refactored, PTAL.
@@ -54,6 +54,78 @@ enum SecondaryLockStatus {
RolledBack,
}

// The returned `bool` indicates whether the rollback record should be written,
// it should be false if and only if the txn commit record is not found.
it should be false if ...
Did you mean it should be true ?
if lock.is_pessimistic_lock() {
let released_lock = txn.unlock_key(key.clone(), true, TimeStamp::zero());
let overlapped_write = reader.get_txn_commit_record(key)?.unwrap_none(region_id);
`get_txn_commit_record` might be called twice for a force-locked pessimistic lock. I think in this case it should be possible to reuse the record info already fetched at line 101. It doesn't seem to matter much though.
Refactored, PTAL.
8d69553 to 569e864
LGTM
if lock.is_pessimistic_lock() {
let released_lock = txn.unlock_key(key.clone(), true, TimeStamp::zero());
let overlapped_write_res = if lock.is_pessimistic_lock_with_conflict() {
overlapped_write
Consider adding a comment explaining that `overlapped_write` is already loaded if the lock_with_conflict-related code above was executed, so that we can use it directly.
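The reuse described in this comment can be modelled with a reader that counts its lookups. `CountingReader` and `overlapped_write_for_rollback` are hypothetical, and `Option<u64>` stands in for the record returned by `get_txn_commit_record`.

```rust
/// Toy reader that counts how often the write CF is consulted.
struct CountingReader {
    record: Option<u64>,
    lookups: u32,
}

impl CountingReader {
    fn get_txn_commit_record(&mut self) -> Option<u64> {
        self.lookups += 1;
        self.record
    }
}

fn overlapped_write_for_rollback(
    reader: &mut CountingReader,
    locked_with_conflict: bool,
) -> Option<u64> {
    // Step 1: locks acquired with conflict are checked against the write CF
    // up front, so the record is already loaded for them.
    let preloaded = if locked_with_conflict {
        Some(reader.get_txn_commit_record())
    } else {
        None
    };
    // Step 2: the rollback path needs the same record. Reuse the preloaded
    // one if there is one, otherwise load it now.
    match preloaded {
        Some(record) => record,
        None => reader.get_txn_commit_record(),
    }
}
```

In both branches the reader is consulted exactly once, which is the property the comment asks to have documented next to `overlapped_write`.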
…le locks Signed-off-by: cfzjywxk <[email protected]>
569e864 to 54a4ecb
/merge
@cfzjywxk: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 54a4ecb
How will this affect TiFlash? Is it safe to ignore?
What is changed and how it works?
Issue Number: Ref #13298 pingcap/tidb#43540
What's Changed:
`lock with conflict` information in the pessimistic lock type.
the actual transaction status and unlock the stale lock.