
release-20.1: kv/concurrency: permit lock timestamp regression across durabilities #47139

Merged

Conversation

nvanbenschoten

Backport 1/1 commits from #47101.

/cc @cockroachdb/release


Fixes #46526.
Fixes #46779.
Follow up to #46391.

This change adjusts the lockTable to allow lock timestamp regressions
when necessary. Specifically, it allows a lock's timestamp as reported
by getLockerInfo to regress if it is acquired at a lower timestamp and a
different durability than it was previously held with. This is necessary
to support because of the hard constraint we must uphold here: the
lockHolderInfo for a replicated lock cannot diverge from the replicated
state machine in such a way that its timestamp in the lockTable exceeds
that in the replicated keyspace. If this invariant
were to be violated, we'd risk infinite lock-discovery loops for
requests that conflict with the lock as is written in the replicated
state machine but not as is reflected in the lockTable.
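For illustration, here is a minimal Go sketch of the rule described above. This is not the actual lockTable code; `lockHolderInfo`, `reacquire`, and the `Timestamp`/`Durability` types are simplified stand-ins. The point it shows: a re-acquisition at a lower timestamp is tolerated only when the durability differs, so the lockTable's view can be pulled back down to match the replicated keyspace.

```go
// A minimal sketch, assuming hypothetical simplified types, of the
// re-acquisition rule: timestamp regression is rejected only when the
// durability is unchanged.
package main

import (
	"errors"
	"fmt"
)

// Durability distinguishes replicated from unreplicated lock acquisitions.
type Durability int

const (
	Unreplicated Durability = iota
	Replicated
)

// Timestamp is a simplified stand-in for an HLC timestamp.
type Timestamp struct{ WallTime int64 }

func (t Timestamp) Less(o Timestamp) bool { return t.WallTime < o.WallTime }

// lockHolderInfo is a simplified view of what getLockerInfo would report.
type lockHolderInfo struct {
	ts  Timestamp
	dur Durability
}

// reacquire updates the holder info for a new acquisition by the same
// transaction. A lower timestamp is tolerated when the durability differs,
// so the lockTable never reports a timestamp above what is present in the
// replicated state machine.
func (h *lockHolderInfo) reacquire(ts Timestamp, dur Durability) error {
	if ts.Less(h.ts) && dur == h.dur {
		return errors.New("lock timestamp regression")
	}
	h.ts, h.dur = ts, dur
	return nil
}

func main() {
	h := lockHolderInfo{ts: Timestamp{WallTime: 20}, dur: Unreplicated}
	// Re-acquisition at a lower timestamp but a different durability is allowed.
	fmt.Println(h.reacquire(Timestamp{WallTime: 10}, Replicated)) // <nil>
	// A regression at the same durability is still rejected.
	fmt.Println(h.reacquire(Timestamp{WallTime: 5}, Replicated)) // lock timestamp regression
}
```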

Lock timestamp regressions are safe from the perspective of other
transactions because the request which re-acquired the lock at the lower
timestamp must have been holding a write latch at or below the new
lock's timestamp. This means that no conflicting requests could be
evaluating concurrently. Instead, all will need to re-scan the lockTable
once they acquire latches and will notice the reduced timestamp at that
point, which may cause them to conflict with the lock even if they had
not conflicted before. In a sense, it is no different than the first
time a lock is added to the lockTable.
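A toy example, again with hypothetical names rather than real CockroachDB APIs, of why the lowered timestamp can introduce new conflicts: a reader waits on a lock held at or below its read timestamp, so a regression can move a lock from "above" a waiting reader to "at or below" it.

```go
// A toy illustration of the conflict rule assumed above: readers wait on
// locks held at or below their read timestamp.
package main

import "fmt"

func readerConflicts(readTS, lockTS int64) bool {
	// A reader at readTS must wait on a lock held at lockTS <= readTS,
	// since the lock may protect a write visible to that read.
	return lockTS <= readTS
}

func main() {
	readTS := int64(15)
	fmt.Println(readerConflicts(readTS, 20)) // false: lock sits above the read timestamp
	// After the lock's timestamp regresses to 10, the same reader,
	// re-scanning the lockTable once it holds latches, now conflicts.
	fmt.Println(readerConflicts(readTS, 10)) // true
}
```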

I considered special-casing this logic to look at the new lock's
durability and only allow the regression when the new lock was
replicated, instead forwarding the acquisition timestamp when the new
lock was unreplicated, but doing so seemed complex and
not clearly worth it. The rest of the lock-table supports these lock
timestamp regressions, so adding complexity to conditionally avoid the
case for certain state transitions, based on the lock durabilities,
didn't seem worthwhile. I'm happy to reconsider this decision.

Release note (bug fix): CDC no longer combines with long-running
transactions to trigger an assertion with the text "lock timestamp
regression".

Release justification: fixes a high-priority bug in existing
functionality. The bug could crash a server if the right sequence of
events occurred. This was typically rare, but was much more common when
CDC was in use.

@cockroach-teamcity

This change is Reviewable


@sumeerbhola left a comment


:lgtm:

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@nvanbenschoten nvanbenschoten merged commit 95887d4 into cockroachdb:release-20.1 Apr 7, 2020
@nvanbenschoten nvanbenschoten deleted the backport20.1-47101 branch April 8, 2020 05:37