storage: transitive prerequisites are not always transferred on command cancellation #16266
Last time we saw this error was #10501
I'm seeing this on
Cc @tamird
Have the meta ranges split yet? It's possible we're running into #2266 on
@bdarnell The above happened after 2000 splits. Pretty sure the meta ranges have not split yet. Yesterday I had sky up to 400k ranges (via size-based splits).
This happened again on
The raw printing of the meta key makes debugging a bit difficult. I'm going to try and reproduce again with #17540 in place.
Translating that key reveals it to be
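As a rough standalone illustration of what that translation involves (the real decoding lives in CockroachDB's keys package; the prefix bytes below are made up), stripping the meta2 prefix recovers the user key the entry addresses:

```go
package main

import (
	"bytes"
	"fmt"
)

// meta2Prefix is a stand-in for the real meta2 key prefix; the actual
// bytes are defined in CockroachDB's keys package.
var meta2Prefix = []byte("\x00\x00meta2")

// userKeyFromMeta2 recovers the descriptor end key that a raw meta2 key
// addresses, which is the "translation" referred to above.
func userKeyFromMeta2(raw []byte) ([]byte, bool) {
	if !bytes.HasPrefix(raw, meta2Prefix) {
		return nil, false
	}
	return bytes.TrimPrefix(raw, meta2Prefix), true
}

func main() {
	raw := append([]byte(nil), meta2Prefix...)
	raw = append(raw, "d"...)
	if k, ok := userKeyFromMeta2(raw); ok {
		fmt.Printf("meta2 entry addresses descriptor end key %q\n", k)
	}
}
```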
Note that there are only 2 keys above. If we look at the 2 most recent versions of the keys we see:
The key we're looking for (
Notice that the gap is gone. The end key of the first range descriptor is the start key of the second range descriptor. Still trying to piece together what is going on here. Seems like there was a split at
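To make that comparison mechanical, here is a minimal standalone sketch (not CockroachDB code; desc is a bare-bones stand-in for a range descriptor) that flags gaps or overlaps between consecutive descriptors returned by a meta2 scan:

```go
package main

import "fmt"

// desc is a bare-bones stand-in for a range descriptor.
type desc struct {
	startKey, endKey string
}

// checkContiguous reports gaps or overlaps between consecutive
// descriptors, which is the property the two meta2 versions above
// were being compared for.
func checkContiguous(descs []desc) {
	for i := 1; i < len(descs); i++ {
		prev, cur := descs[i-1], descs[i]
		switch {
		case prev.endKey < cur.startKey:
			fmt.Printf("gap between %v and %v\n", prev, cur)
		case prev.endKey > cur.startKey:
			fmt.Printf("overlap between %v and %v\n", prev, cur)
		}
	}
}

func main() {
	// Older versions: a gap (some keys are covered by neither descriptor).
	checkContiguous([]desc{{"a", "c"}, {"d", "f"}})
	// Most recent versions: contiguous, as observed above.
	checkContiguous([]desc{{"a", "d"}, {"d", "f"}})
}
```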
Here is the split:
The keys here exactly match the
If the theory is that we're simply losing the write (not unheard of in past anomalies), and you can "quickly" repro this issue, then you could add the following logging:
This reminds me of #9265 (comment). That led to #9399, which proposed introducing additional assertions to MVCCResolveWriteIntent, although in the end we decided not to do that. What are the bounds of r272822? Doesn't the fact that we have meta2 ranges on a range other than r1 mean that meta2 has split?
The bounds of r272822 are
Seems like an assertion in
The span for
Notice that the meta key we were looking up is exactly equal to the end key of the last range descriptor in this meta2 range. I'm going to try and track down the next meta2 range to see what it contains. |
And another instance:
The next meta range is
Notice that the meta key aligns with the start key of

Cc @nvanbenschoten, who @tschottdorf says has some understanding of this. Note that the debugging I did in this comment indicated a different issue, as the desired meta key was definitely contained within the meta range, not at the end.
PS: Seems easy to write a test which splits meta2 and verifies that we handle lookups correctly at the end of the first meta2 range.
@petermattis if I'm understanding your last crash correctly, it looks like the meta2 splits are misaligned with the userspace splits, which is why you suggest
I'd expect that we would make sure to keep the split boundaries between the different meta layers aligned, but a quick search doesn't lead me to believe that we do that. For instance, we wouldn't have this issue if
@nvanbenschoten I'm not sure your proposed solution works, since every userspace range start key is the end key of the previous range. In the short term, I'll send a patch to disable splitting of the meta2 range, as we clearly don't handle this correctly.
I changed my mind. Changing meta split keys as |
Force meta ranges to split at metaKey.Next().

This is necessary to allow DistSender to direct RangeLookup requests to the correct meta range. Consider the following scenario that can occur with the prior split key selection for meta ranges:

range 1 [a, e):
  b -> [a, b)
  c -> [b, c)
  d -> [c, d)
range 2 [e, g):
  e -> [d, e)
  f -> [e, f)
  g -> [f, g)

Now consider looking up the range containing key `d`. The DistSender routing logic would send the RangeLookup request to range 1 since `d` lies within the bounds of that range. But notice that the range descriptor containing `d` lies in range 2. Boom!

The root problem here is that the split key chosen for meta ranges is an actual key in the range. What we want to do is ensure that the bounds of range descriptors within a meta range are entirely contained within the bounds of the meta range. We can accomplish this by splitting on metaKey.Next(). This would transform the above example into:

range 1 [a, d\0):
  b -> [a, b)
  c -> [b, c)
  d -> [c, d\0)
range 2 [d\0, g\0):
  e -> [d, e)
  f -> [e, f)
  g -> [f, g)

Truncate range lookup scans to the range.

Fixes cockroachdb#2266
Fixes cockroachdb#16266
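The following self-contained model (not CockroachDB code) reproduces the two layouts from the commit message. One assumption encoded here: since the descriptor covering key k is stored at an end key greater than k, the lookup scan starts just after k, and the router targets the meta range containing that scan start.

```go
package main

import (
	"bytes"
	"fmt"
)

// next returns the immediate key successor by appending a zero byte,
// mirroring the metaKey.Next() idea in the commit message.
func next(k []byte) []byte {
	return append(append([]byte(nil), k...), 0)
}

// metaRange models a meta range: its own bounds plus the entry keys
// (descriptor end keys) stored inside it.
type metaRange struct {
	start, end []byte
	entries    [][]byte
}

// lookupDescriptor mimics the routing step: a RangeLookup for key k scans
// meta entries starting at next(k), and the router sends that scan to the
// single meta range whose bounds contain the scan start. The scan is
// truncated to that range, so if the entry lives elsewhere, it is missed.
func lookupDescriptor(ranges []metaRange, k []byte) ([]byte, bool) {
	scanStart := next(k)
	for _, r := range ranges {
		if bytes.Compare(scanStart, r.start) >= 0 && bytes.Compare(scanStart, r.end) < 0 {
			for _, e := range r.entries {
				if bytes.Compare(e, scanStart) >= 0 {
					return e, true // first entry at or after the scan start
				}
			}
			return nil, false // routed here, but the entry lives elsewhere
		}
	}
	return nil, false
}

func main() {
	k := func(s string) []byte { return []byte(s) }

	// Old scheme: split at the real entry key "e". The descriptor covering
	// "d" is stored under entry key "e" in range 2, but the scan for "d"
	// starts at "d\0", which routes to range 1. Boom!
	old := []metaRange{
		{k("a"), k("e"), [][]byte{k("b"), k("c"), k("d")}},
		{k("e"), k("g"), [][]byte{k("e"), k("f"), k("g")}},
	}
	_, ok := lookupDescriptor(old, k("d"))
	fmt.Println("split at e: found descriptor for d?", ok) // false

	// New scheme: split at next("d"), so every entry a scan can need is
	// inside the meta range that its scan start routes to.
	fixed := []metaRange{
		{k("a"), next(k("d")), [][]byte{k("b"), k("c"), k("d")}},
		{next(k("d")), next(k("g")), [][]byte{k("e"), k("f"), k("g")}},
	}
	_, ok = lookupDescriptor(fixed, k("d"))
	fmt.Println("split at d.Next(): found descriptor for d?", ok) // true
}
```

Under the old layout the scan start d\0 routes to the meta range [a, e), which holds no entry at or after d\0; under the new layout it routes to [d\0, g\0), which holds the entry at e.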
I'm not sure I follow. Why wouldn't/shouldn't the range
Aren't the range descriptors stored at the range's start key, not their end key? So in your example, we'd have:
They're stored twice: the range-local copy is stored under the start key, and the meta copy is stored under the end key. |
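A toy sketch of that dual addressing, with hypothetical key layouts (the real ones live in CockroachDB's keys package):

```go
package main

import "fmt"

// Hypothetical key layouts; the real ones are defined in CockroachDB's
// keys package. For a range [start, end), the descriptor is written twice.
func rangeLocalDescriptorKey(start string) string {
	return "/Local/Range/" + start + "/RangeDescriptor" // addressed by start key
}

func metaDescriptorKey(end string) string {
	return "/Meta2/" + end // addressed by end key
}

func main() {
	// For the range [c, d):
	fmt.Println(rangeLocalDescriptorKey("c")) // range-local copy
	fmt.Println(metaDescriptorKey("d"))       // meta copy
}
```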
Ah, you're right, the covering optimization is disabled in the local command queue. However, I don't actually think this issue requires us to use the covering optimization; I think it just requires that we treat a parent
Here's what I see poring through the logs:
The first PushTxn is assigned IDs 12 and 13 in the local command queue. There are already two operations ahead of it: IDs 7 and 8 are from a write to the (new RHS) range descriptor, and 9 and 10 look like a heartbeat for this transaction. (In "waiting for 3 requests", the heartbeat counts double because its reads and writes are counted separately.)
By the time EndTxn comes around, we see a second heartbeat, with IDs 14 and 15, then the EndTxn's writes get IDs 22-29 (the reads are missing here but show up later). All of the things that the push was waiting on have cleared the command queue; it should be able to proceed now. This is waiting on the push and the second heartbeat (2x); I think to get to 5 commands the first heartbeat had to finish in between the prereq computation and this log message.
And now we see the same goroutine entering the command queue for a PushTxn again (I'll call this second incarnation of the first push Push1.2). I believe this means that after the preceding log block, the PushTxn was allowed to proceed, but hit a retryable error somewhere before it was proposed to raft. I'm not sure what that error could be, but it appears to have done the right thing in the command queue: the first incarnation of the command has been removed (so the 14-15 heartbeat should be executing now). The new push has IDs 31-32. We also see IDs 17-22 showing up with the read-only portion of the EndTransaction. They weren't there before (maybe the covering optimization at work?). The 2 commands Push1.2 is waiting on are the read and write sides of the EndTransaction.
And now we get to Push2, with IDs 34-35. It's hard to tell from this which 2 commands it's waiting on. One is definitely Push1.2. But what's the other? The candidates are the second heartbeat or the EndTransaction (each of which has both read and write halves). Based on what happens next I think it has to be the heartbeat (probably the read half due to some of the timestamp-related logic in the command queue).
And now Push1.2 is canceled, and both EndTxn and Push2 are allowed to proceed (we can infer from this that Heartbeat2 also finished). So the full sequence of commands is
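The sequence above is what the issue title is about: when a queued command is canceled, every command that was waiting on it must inherit the canceled command's still-pending prerequisites, transitively, or a dependent can begin executing too early. A minimal standalone sketch of that transfer (not the actual CommandQueue code; the IDs map onto the log walkthrough above):

```go
package main

import "fmt"

// cmd is a minimal stand-in for a command-queue entry: it tracks the
// still-pending commands it must wait for before executing.
type cmd struct {
	id      int
	done    bool
	prereqs []*cmd
}

// cancel removes c from the queue. The crucial step is transferring c's
// still-pending prerequisites to every waiter; if we merely dropped c,
// a dependent like Push2 could start executing while a command c was
// waiting on (e.g. the EndTxn) is still running.
func cancel(c *cmd, waiters []*cmd) {
	for _, w := range waiters {
		var updated []*cmd
		for _, p := range w.prereqs {
			if p != c {
				updated = append(updated, p)
				continue
			}
			// Inherit c's transitive prerequisites in its place.
			for _, cp := range c.prereqs {
				if !cp.done {
					updated = append(updated, cp)
				}
			}
		}
		w.prereqs = updated
	}
}

func main() {
	endTxn := &cmd{id: 22}                          // still executing
	push12 := &cmd{id: 31, prereqs: []*cmd{endTxn}} // Push1.2 waits on EndTxn
	push2 := &cmd{id: 34, prereqs: []*cmd{push12}}  // Push2 waits on Push1.2

	cancel(push12, []*cmd{push2})
	for _, p := range push2.prereqs {
		fmt.Println("Push2 now waits on command", p.id) // 22 (the EndTxn)
	}
}
```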
It's worse than that: we're only setting

We need to better encapsulate the

This makes me nervous about our test coverage here. Should we fix this or consider reverting #9448 until post 1.1?
Exactly, so right now we aren't properly treating cancellation of |
That's not quite accurate. The key is that this will affect all |
Right, but this is true of all PushTxn and EndTxn requests, so all usage patterns except single KV ops can be affected by this.
Now that it looks like we fixed the consistency issue here, the rest of this issue can be moved to 1.2, right? |
Yeah. Or even better, retitle this to the consistency issue that took it over, close it, and make a new issue to lift the restriction in 1.2. |
Default the .meta zone config to 1h GC TTL.

The shorter GC TTL reflects the lack of need for ever performing historical queries on these ranges, coupled with the desire to keep the meta ranges smaller.

See cockroachdb#16266
See cockroachdb#14990
Default the .meta zone config to 1h GC TTL and default the .liveness zone config to 1m GC TTL.

The shorter GC TTLs reflect the lack of need for ever performing historical queries on these ranges, coupled with the desire to keep the meta and liveness ranges smaller.

See cockroachdb#16266
See cockroachdb#14990
Default the .meta zone config to 1h GC TTL and default the .liveness zone config to 10m GC TTL.

The shorter GC TTLs reflect the lack of need for ever performing historical queries on these ranges, coupled with the desire to keep the meta and liveness ranges smaller.

See cockroachdb#16266
See cockroachdb#14990

Release note (general change): Clusters are now initialized with default .meta and .liveness zones with lower GC TTL configurations.
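For reference, GC TTLs are expressed in seconds in the zone config, so 1h is 3600 and 10m is 600. A sketch of what the 1.x-era zone config YAML might look like (the gc.ttlseconds field name is assumed; verify against the docs for that release):

```yaml
# Hypothetical zone config YAML for the .meta zone (1h GC TTL);
# the .liveness zone would use ttlseconds: 600 (10m).
gc:
  ttlseconds: 3600
```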
https://sentry.io/cockroach-labs/cockroachdb/issues/284616795/