-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: savepoints are broken with slow statements on master #94337
Comments
I ran a git bisect, and found that 7ff05f1 is the first commit that causes Jordan's test to fail:
|
1.5 seconds also seems close to the minimum - 1.25 doesn't work on my machine. |
Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
The issue here seems to be that we're incorrectly determining that the scan request for cockroach/pkg/storage/engine.go Lines 1777 to 1779 in e7713f1
However, this doesn't take into account The question still remains -- what's the deal with In addition to the sequence numbers, we also look at the timestamp of the read request and compare it with the timestamp at which the intent is written at when deciding whether we need the intent history or not. We only determine that intent history isn't required if the read timestamp is greater than the timestamp at which the intent was written, for which the transaction needs to get pushed. Adding the |
… required Read-only requests that drop their latches early, before evaluation, scan the lock table to resolve conflicts in `ScanConflictingIntentsForDroppingLatchesEarly`. A request needs to determine if it needs access to the intent history during the scan phase. Typically, this happens if the lock table scan uncovers an intent from the same transaction but at a higher sequence number. However, we may also find intents at lower sequence numbers but at higher timestamps (e.g, the intent was pushed by a conflicting transaction) -- these intents need to be read, and to correctly do that during the scan phase, we need access to the intent history. Note that this only applies to intents at strictly higher timestamps. Previously, we would determine we needed access to the intent history even if the intent history was at the same timestamp as the read request. In practice, this is quite common, which ends up meaning that we default back to the intent interleaving iterator more often than we need to. This patch fixes this behaviour. References cockroachdb#94337 -- see explanation for "what's the deal with `pg_sleep`. Epic: none Release note: None
@arulajmani do you have a timeline for when this issue can be worked on? just wondering since it's still affecting nightly tests here: #95569 |
Thanks for the ping! It's on my plate for this milestone, but realistically, I was hoping to get to it early next week. Sorry for the trouble, I didn't realize this was actively affecting nightly tests. |
Read-only requests that qualify for dropping latches early scan the lock table under latches to perform conflict resolution. If they don't conflict with any concurrent transactions, they drop their latches and scan the MVCC keyspace. However, if a request is trying to read a key that was written to in its transaction, we may need access to the key's intent history when performing the scan. If scanning the lock table uncovers an intent by our own transaction, we need access to the intent history if: 1. The request is reading at a lower sequence number than the found intent's sequence number. 2. OR the request is reading at a strictly lower timestamp than the intent's timestamp. 3. OR the found intent should be ignored because it was written as part of a savepoing which was subsequently rolled back. 4. OR the found intent and read request belong to different txn epochs. Prior to this patch, we weren't accounting for 3 and 4. Worse yet, our determination for 2 was a less than equal to -- which meant we would end up consulting the intent history in the common case. This meant that existing tests which would have exercised 3 and 4 weren't failing on us. Instead of correcting the determination, this patch makes it such that all read-your-own-write transactions consult intent history. This avoids the subtle complexity of getting all these cases right and keeping them in sync with the MVCC code that reads keys. It also seems like scanning the lock table twice (once during conflict resolution and once during the MVCC read) for read-your-own-write transactions in the common case did not cause noticable regressions. We can always optimize this later, if we find there's a need to. Epic: none Fixes: cockroachdb#94337 Release note: None
Read-only requests that qualify for dropping latches early scan the lock table under latches to perform conflict resolution. If they don't conflict with any concurrent transactions, they drop their latches and scan the MVCC keyspace. However, if a request is trying to read a key that was written to in its transaction, we may need access to the key's intent history when performing the scan. If scanning the lock table uncovers an intent by our own transaction, we need access to the intent history if: 1. The request is reading at a lower sequence number than the found intent's sequence number. 2. OR the request is reading at a strictly lower timestamp than the intent's timestamp. 3. OR the found intent should be ignored because it was written as part of a savepoing which was subsequently rolled back. 4. OR the found intent and read request belong to different txn epochs. Prior to this patch, we weren't accounting for 3 and 4. Worse yet, our determination for 2 was a less than equal to -- which meant we would end up consulting the intent history in the common case. This meant that existing tests which would have exercised 3 and 4 weren't failing on us. Instead of correcting the determination, this patch makes it such that all read-your-own-write transactions consult intent history. This avoids the subtle complexity of getting all these cases right and keeping them in sync with the MVCC code that reads keys. It also seems like scanning the lock table twice (once during conflict resolution and once during the MVCC read) for read-your-own-write transactions in the common case did not cause noticable regressions. We can always optimize this later, if we find there's a need to. Epic: none Fixes: cockroachdb#94337 Release note: None
While investigating #93379, @rafiss and I have discovered some very broken behavior with savepoints on
master
.The following logic test fails, but it should succeed (seeing nothing in the final output statement):
The
pg_sleep
statements are important to make this fail, suggesting timing issues.Jira issue: CRDB-22839
The text was updated successfully, but these errors were encountered: