kvserver: declare lock spans for AddSSTable
#71676
Thanks! One question about the release note: I think this crash is only possible in the case of an
@dt I think part of why this was missed was because …

If that is the case, then this change LGTM.
True, I've looked it over and it seems to mostly affect `cockroach/pkg/sql/rowexec/bulk_row_writer.go` (lines 141 to 143 in 5608424).
Force-pushed from 6e630ae to 6a69c8e.
I suppose when `DisallowShadowing` is false, we don't strictly need the current …

I wonder if IMPORT INTO should instead have scanned the whole table after it went offline, to establish a timestamp at which it knows there are no intents and there cannot be any later ones (thanks to the timestamp cache).
Force-pushed from 4a3d22d to 335c1c6.
Ok, I've updated the PR to only declare lock spans when
That might make sense. The current approach also has the downside of returning a …

Let's take this into account for the MVCCification work on …
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go, line 44 at r1 (raw file):
```go
args := req.(*roachpb.AddSSTableRequest)
// DisallowShadowing may encounter intents while checking for key collisions,
// particularly in the case of IMPORT INTO.
```
This comment should speak more towards the intended behavior of `AddSSTable` with and without `DisallowShadowing`. Why is one able to ignore conflicting writes/intents while the other is not? And how do we want an `AddSSTable` that has `DisallowShadowing` set to true to interact with conflicting intents?
Also, the effect of this is that we will now begin waiting in the lock table for any intent that overlaps the `AddSSTableRequest`'s key span, not just those that collide with individual keys in the sstable. Is this ok, @dt? Could this lead to starvation, where there are always overlapping intents at higher and higher timestamps?
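To make the difference concrete, here is a minimal, self-contained Go sketch. It is not CockroachDB code: the `intent` type and the `waitSetSpan`/`waitSetKeys` helpers are hypothetical, and keys are modeled as plain strings compared lexicographically. It counts which existing intents a request would wait on if it waits for every intent in its declared key span, versus only intents on keys the SST actually writes.

```go
package main

import (
	"fmt"
	"sort"
)

// intent is a hypothetical stand-in for an unresolved write intent on a key.
type intent struct {
	key string
	txn string
}

// waitSetSpan returns the intents a request would wait on if it declares
// the whole half-open span [start, end) in the lock table.
func waitSetSpan(intents []intent, start, end string) []intent {
	var out []intent
	for _, in := range intents {
		if in.key >= start && in.key < end {
			out = append(out, in)
		}
	}
	return out
}

// waitSetKeys returns only the intents on keys that the SST actually writes.
// sstKeys must be sorted.
func waitSetKeys(intents []intent, sstKeys []string) []intent {
	var out []intent
	for _, in := range intents {
		i := sort.SearchStrings(sstKeys, in.key)
		if i < len(sstKeys) && sstKeys[i] == in.key {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	sstKeys := []string{"a", "c", "e"} // sorted keys written by the SST
	intents := []intent{{"b", "txn1"}, {"c", "txn2"}, {"d", "txn3"}, {"z", "txn4"}}
	// The declared span runs from the first SST key to just past the last one.
	start, end := sstKeys[0], sstKeys[len(sstKeys)-1]+"\x00"
	fmt.Println(len(waitSetSpan(intents, start, end))) // 3 (b, c, d)
	fmt.Println(len(waitSetKeys(intents, sstKeys)))    // 1 (c)
}
```

Under the span-based policy, intents on `b` and `d` also make the request wait even though the SST never writes those keys; this is the broader waiting the question above is about.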
This is leading me to question whether this is the right fix, or whether the current interaction between `AddSSTableRequest` and conflicting intents makes sense. If we addressed #71697, could we then treat intents the same way that we treat other values in `checkForKeyCollisionsGo`, throwing a hard error if they exist at lower timestamps and ignoring them if they exist at higher timestamps?
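The rule proposed above can be sketched as a tiny Go function. This is a hypothetical illustration, not CockroachDB code: `classify`, `action`, and the integer timestamps are made up, and the sketch assumes the intent's timestamp is already known to be final. Treating an intent at exactly the request timestamp as a conflict is also an assumption, since the proposal only mentions lower and higher timestamps.

```go
package main

import "fmt"

// action is what a hypothetical collision check would do with an intent it
// finds on a key that the SST also writes.
type action int

const (
	ignoreIntent  action = iota // intent is above the request timestamp
	errorOnIntent               // intent is at or below the request timestamp
)

// classify applies the proposed rule: treat an intent like a committed
// value, returning a hard error if it is below reqTS and ignoring it if it
// is above. Equal timestamps are treated as conflicts (an assumption).
func classify(intentTS, reqTS int64) action {
	if intentTS > reqTS {
		return ignoreIntent
	}
	return errorOnIntent
}

func main() {
	fmt.Println(classify(5, 10) == errorOnIntent) // true: intent below request timestamp
	fmt.Println(classify(15, 10) == ignoreIntent) // true: intent above request timestamp
}
```

As the reply below notes, an intent's timestamp can still change and the intent may be committed or aborted, so a rule like this only works once the intent has been resolved.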
It's very unclear to me what invariants are being guaranteed by the layer above KV, and which subset of those we want to check even if there is a guarantee, and which are too expensive to check (I am hoping the latter set is the empty set, since checking the separated lock table is cheap). I think I am basically echoing what @nvanbenschoten said in his previous comment with "This comment should speak more ...".
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @erikgrinaker)
pkg/kv/kvserver/batcheval/cmd_add_sstable.go, line 44 at r1 (raw file):
Will look into the current AddSSTable semantics and guarantees, and write up a comment summarizing them.
> This is leading me to question whether this is the right fix or whether the current interaction between AddSSTableRequest and conflicting intents makes sense. If we addressed #71697, could we then treat intents the same way that we treat other values in checkForKeyCollisionsGo, throwing a hard error if they exist at lower timestamps and ignoring them if they exist at higher timestamps?
That sounds reasonable to me, given the current behavior. We'd still have to resolve the intents at a higher level though, to get the updated write timestamp, right? Which would involve returning a WriteIntentError in any case?
For MVCC-compliant AddSSTable we'd have to treat the intents as any other 1PC write would. But we'd still support a non-MVCC AddSSTable for e.g. streaming replication.
Ok, this is a pretty hefty brain-dump, mostly for my own sake. I'm going to condense it down to a comment in this PR, unless anyone spots any glaring flaws. Will probably flesh this out as an RFC or tech note for MVCC-compliant …

Tl;dr I think the current solution (declaring lock spans when …

@nvanbenschoten's suggestion above doesn't seem right, because we want to error on any conflicting keys regardless of whether they are in the past or the future, and we need to resolve the intent to see whether it was committed or aborted. When …

General
| Operation | DisallowShadowing | Timestamp | Isolation |
|---|---|---|---|
| Imports | true [2,3] | Now | Offline table |
| CREATE TABLE AS SELECT | true | Read TS? | Table descriptor |
| Materialized views | true | Read TS | Table descriptor |
| Index backfills | false | Now | Index descriptor |
| Restore (backup) | true | Key TS | Table descriptor? |
| Streaming replication | false | Key TS | Offline tenant |
From this, we see that only imports are vulnerable to existing intents, since all other callers with `DisallowShadowing: true` rely on SQL descriptors for isolation, and no other SQL transaction can have interacted with a span owned by a descriptor that has not yet been created.
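The reasoning above can be encoded as a small Go sketch over the table's rows. The `preexisting` field is a hypothetical addition for illustration: it records whether other transactions may have written to the target span before the operation took it offline (true for an import's offline table, false for a span owned by a descriptor that did not exist yet). Only rows with `DisallowShadowing` set check for collisions, so only they can run into old intents.

```go
package main

import "fmt"

// op is one row of the table above, plus a hypothetical preexisting flag.
type op struct {
	name              string
	disallowShadowing bool
	preexisting       bool // could earlier transactions have written this span?
}

// vulnerable reports whether an operation may run into intents left by
// earlier transactions while checking for key collisions.
func vulnerable(o op) bool {
	return o.disallowShadowing && o.preexisting
}

var ops = []op{
	{"Imports", true, true},                 // offline table existed before
	{"CREATE TABLE AS SELECT", true, false}, // new table descriptor
	{"Materialized views", true, false},     // new table descriptor
	{"Index backfills", false, false},       // irrelevant: DisallowShadowing is false
	{"Restore (backup)", true, false},       // new table descriptor
	{"Streaming replication", false, false}, // irrelevant: DisallowShadowing is false
}

func main() {
	for _, o := range ops {
		if vulnerable(o) {
			fmt.Println(o.name) // prints only "Imports"
		}
	}
}
```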
Similarly, we see that declaring lock spans for `DisallowShadowing: true` should not conflict with active or future transactions, since the entities are either offline or their descriptors do not exist yet. Notably, index backfills allow shadowing, so they will not declare lock spans and thus will not prevent concurrent index writes.
Furthermore, with the new incremental index backfill approach under development, only backup restoration and streaming replication will have a potential need to use non-MVCC `AddSSTable`. All other operations should be able to use `DisallowShadowing: true` and write at the batch timestamp, while declaring lock spans without interfering with concurrent transactions.
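The difference between non-MVCC and MVCC-compliant `AddSSTable` described above can be sketched as a choice of write timestamp per key. In this hypothetical Go sketch (`kv` and `rewriteTimestamps` are illustrative names, not CockroachDB APIs), non-MVCC callers such as restore and streaming replication keep each key's original timestamp ("Key TS" in the table), while MVCC-compliant callers write every key at the batch timestamp.

```go
package main

import "fmt"

// kv is a hypothetical MVCC key-value pair carried in an SST.
type kv struct {
	key string
	ts  int64
	val string
}

// rewriteTimestamps returns the pairs that would be written. When
// mvccCompliant is false, keys keep their original timestamps; otherwise
// every key is written at batchTS. The input slice is not modified.
func rewriteTimestamps(kvs []kv, mvccCompliant bool, batchTS int64) []kv {
	out := make([]kv, len(kvs))
	copy(out, kvs)
	if mvccCompliant {
		for i := range out {
			out[i].ts = batchTS
		}
	}
	return out
}

func main() {
	sst := []kv{{"a", 3, "x"}, {"b", 7, "y"}}
	fmt.Println(rewriteTimestamps(sst, false, 10)) // [{a 3 x} {b 7 y}]
	fmt.Println(rewriteTimestamps(sst, true, 10))  // [{a 10 x} {b 10 y}]
}
```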
Force-pushed from 335c1c6 to 19edb7b.
Updated the PR with a comment summarizing
Force-pushed from c0c3006 to 81d755d.
Force-pushed from 81d755d to 39d28d7.
Thanks for the detailed analysis Erik! Everything you said above sounds correct, including your proposed next steps.
Agreed. I had read the logic in
This does seem like the right approach if we want to avoid unnecessary head-of-line blocking for conflicting concurrent transactions. In theory, it may also be important to avoid starvation whereby the
This probably isn't a conversation for this PR, but @sumeerbhola and I were talking yesterday and were surprised that this doesn't lead to user-perceived latency during index builds. If an AddSSTable request is acquiring a wide write latch, it's going to block any other writer that overlaps with its span (even those with no logical overlap) for its entire Raft pass, which could be quite expensive. Do we see this show up in the foreground performance of writes to a table during an index backfill? Is this a concern? EDIT: he just reminded me that we plan to do something about this in the future by performing the bulk of the backfill of a new index while the index is fully offline.
By this, do you mean that instead of throwing an error,
Reviewed all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker)
I see, I wasn't aware that range latches were treated differently here; I thought these wait-queues were always FIFO. That's worth keeping in mind. I still think a range latch is the right call here, considering they are mostly used in empty key spans. These SSTs can contain in excess of 1 million keys, and I would expect taking out that many point latches to have a significant cost. If starvation is shown to be problematic, we can expose a request parameter to control this behavior.
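The range-latch versus point-latch trade-off can be illustrated with a hypothetical Go sketch. `span`, `rangeLatch`, `pointLatches`, and `blocks` are illustrative names, not CockroachDB's latch manager API; keys are assumed sorted, and "next key" is modeled by appending a zero byte. A single range latch costs one latch regardless of key count but covers keys the SST never writes, whereas point latches cost one per key.

```go
package main

import "fmt"

// span is a half-open key span [start, end).
type span struct{ start, end string }

// rangeLatch returns one span covering all sorted keys, or false if empty.
func rangeLatch(sortedKeys []string) (span, bool) {
	if len(sortedKeys) == 0 {
		return span{}, false
	}
	return span{sortedKeys[0], sortedKeys[len(sortedKeys)-1] + "\x00"}, true
}

// pointLatches returns one single-key span per key.
func pointLatches(sortedKeys []string) []span {
	out := make([]span, 0, len(sortedKeys))
	for _, k := range sortedKeys {
		out = append(out, span{k, k + "\x00"})
	}
	return out
}

// blocks reports whether a writer to key would wait on latch s.
func blocks(s span, key string) bool {
	return key >= s.start && key < s.end
}

func main() {
	keys := []string{"k1", "k3", "k5"}
	r, _ := rangeLatch(keys)
	fmt.Println(len(pointLatches(keys)), blocks(r, "k2")) // 3 true
}
```

Here the range latch makes an unrelated write to `k2` wait even though the SST never writes `k2` (the head-of-line blocking discussed above), while point latches avoid that at the cost of one latch per key, which adds up for SSTs with over a million keys.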
Yes, this work is already underway; see the RFC. Your point still seems valid for the current index backfills, though. I'm not intimately familiar with how they are implemented, but perhaps @dt knows more. It doesn't appear too important, though, since we're overhauling this anyway.
This wasn't thought all the way through. We should introduce a parameter that makes …

Is there anything else outstanding for this PR then, @nvanbenschoten, or is it good to merge?
This looks good to merge.
Although, I do think we should ask the question of why we never hit this crash in a test. That indicates that we never exercise the code path of an
Force-pushed from 39d28d7 to 2572ada.
Yep, added a test for it now. I'm about to start work on an MVCC-compliant
`AddSSTable` did not declare lock spans, even though it can return `WriteIntentError` when encountering unresolved intents (e.g. when checking for key collisions with `DisallowShadowing` set). This would cause the concurrency manager to error with e.g.:

```
cannot handle WriteIntentError ‹conflicting intents on /Table/84/1/99/49714/0› for request without lockTableGuard; were lock spans declared for this request?
```

This patch makes `AddSSTable` take out lock spans via `DefaultDeclareIsolatedKeys` if `DisallowShadowing` is set. This will automatically handle any unresolved intents via the concurrency manager.

Release note (bug fix): `IMPORT INTO` no longer crashes when encountering unresolved write intents.
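The failure mode and the fix can be sketched in simplified Go. Everything here is hypothetical and only mirrors the shape of the mechanism in the commit message: `declareSpans` stands in for the request's key-declaration function (the actual patch uses `DefaultDeclareIsolatedKeys`), and `handleWriteIntentError` stands in for the concurrency manager's requirement that a request returning a `WriteIntentError` holds a lock-table guard, which this sketch assumes exists only when lock spans were declared. The error text is abbreviated from the message quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// request is a simplified stand-in for an AddSSTable request.
type request struct {
	disallowShadowing bool
}

// declared records which kinds of spans a request declared.
type declared struct {
	latchSpans bool
	lockSpans  bool
}

// declareSpans always declares latch spans and declares lock spans only when
// DisallowShadowing is set, mirroring the behavior after the patch.
func declareSpans(r request) declared {
	return declared{latchSpans: true, lockSpans: r.disallowShadowing}
}

// errNoLockTableGuard abbreviates the error quoted in the commit message.
var errNoLockTableGuard = errors.New(
	"cannot handle WriteIntentError for request without lockTableGuard; " +
		"were lock spans declared for this request?")

// handleWriteIntentError sketches the concurrency manager: it can only wait
// on and resolve conflicting intents for requests with a lock-table guard.
func handleWriteIntentError(d declared) error {
	if !d.lockSpans {
		return errNoLockTableGuard
	}
	return nil // would wait in the lock table, resolve the intents, and retry
}

func main() {
	before := declared{latchSpans: true} // pre-patch: no lock spans
	after := declareSpans(request{disallowShadowing: true})
	fmt.Println(handleWriteIntentError(before) != nil) // true
	fmt.Println(handleWriteIntentError(after) == nil)  // true
}
```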
Force-pushed from 2572ada to d82f194.
bors r=nvanbenschoten,dt,aayushshah15
Build succeeded: |