Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storageccl: retry SST chunks with new splits on err #26984

Merged
merged 2 commits into from
Jun 26, 2018

Conversation

dt
Copy link
Member

@dt dt commented Jun 26, 2018

Simpler alternative to #26930.

Closes #26930.

Previously an ImportRequest would fail to add SSTables that spanned the
boundries of the target range(s). This reattempts the AddSSTable call
with re-chunked SSTables that avoid spanning the bounds returned in
range mismatch error. It does this by iterating the SSTable to build and
add smaller sstables for either side of the split.

This error currently happens rarely in practice -- we usually explicitly
split ranges immediately before sending an Import with matching boundsto
them. Usually the empty, just-split range has no reason to split again,
so the Import usually succeeds.

However in some cases, like resuming a prior RESTORE, we may be
re-Importing into ranges that are not empty and could have split at
points other than those picked by the RESTORE statement.

Fixes #17819.

Subsumes #24299.
Closes #24299.

Release note: none.

@dt dt requested review from maddyblue, bobvawter, tbg and a team June 26, 2018 15:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bdarnell
Copy link
Contributor

:lgtm:


Review status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


Comments from Reviewable

@dt
Copy link
Member Author

dt commented Jun 26, 2018

I should mention the simplicity here does come at a (small?) price: First there's the extra memory of keeping an extra SST in memory while we produce the split ones, rather than going back to the original source and reading from it directly. Additionally, this approach could result in more SSTs, since the approach in #26930 would start a new, full-sized chunk at the split, rather than just the remainder of the rejected chunk, which could be very small, e.g. if the split is very near the end of a chunk.

The former I think is probably OK and as long we we aren't making new ranges based on the chunking here, I think the later is OK too -- a small chunk just means slightly more overhead but shouldn't change the result.

dt added 2 commits June 26, 2018 16:35
Usually we split span requests automatically in DistSender however
AddSSTable requests are special: their large SST file is an opaque
payload that contains keys that could span the post-split requests.

We currently assert during ingestion that the span of an SST's contents
matches the request's span, as the file is ingested whole, without
modification, meaning that currently a split-and-truncated request
fails.

However we if a request will fail, we can reject it as soon as we'd try
to split it, before sending these large, expensive files around. More
importantly, if communicate back to the caller the actual range bounds,
it can try again, generating re-chunked SSTs with bounds that should be
OK.

Note: we could try to make the recipient able to correctly ingest only
the subset of the unsplit SST by iterating over it and generating a new
file containing just those keys in the request span. However this is
significant complexity to add to a raft command evaluation, and changes
the performance characteristics of `AddSSTable` that make it appealing
in the first place.

Release note: none.
Previously an ImportRequest would fail to add SSTables that spanned the
boundries of the target range(s). This reattempts the AddSSTable call
with re-chunked SSTables that avoid spanning the bounds returned in
range mismatch error. It does this by iterating the SSTable to build and
add smaller sstables for either side of the split.

This error currently happens rarely in practice -- we usually explicitly
split ranges immediately before sending an Import with matching boundsto
them. Usually the empty, just-split range has no reason to split again,
so the Import usually succeeds.

However in some cases, like resuming a prior RESTORE, we may be
re-Importing into ranges that are *not* empty and could have split at
points other than those picked by the RESTORE statement.

Fixes cockroachdb#17819.

Release note: none.
@dt
Copy link
Member Author

dt commented Jun 26, 2018

bors r+

craig bot pushed a commit that referenced this pull request Jun 26, 2018
26811: kv: update the TCS's txn on requests way out r=andreimatei a=andreimatei

The TxnCoordSender maintains a copy of the transaction record, used for
things like heartbeating and creating new transactions after a
TransactionAbortedError. This copy is supposed to be kept in sync with
the client.Txn's copy. Before this patch, the syncing was done by
updating the TCS's txn when a response comes back from a request.
This patch moves to updating the TCS's txn on a request's way out, in
addition to continuing to update it when a request comes back.
Besides being the sane thing to do™, this assures that, if the heartbeat
loop triggers before the response to the BeginTransaction's batch comes
back, the transaction already has the key set. Without this patch, if
the heartbeat loop triggered before the BeginTxn response, it would
heartbeat key /Min, which is non-sensical (and creating load on range 0
for TPCC loadtests).

Release note: None

26881: importccl: restart IMPORT on worker node failure r=mjibson a=mjibson

Attempt to detect a context canceled error in IMPORT which is caused by
a node going away in the dist SQL run. Send a special error back to the
job registry indicating a restart should happen instead of a failure.

We are shipping this with a skipped test because it is flakey. We are
ok doing that because it is still better than what we had before in many
cases, just not all. We will work to improve the other things so that we
can correctly detect when IMPORT can be restarted due to a node outage,
which will allow us to unskip this test.

Fixes #25866
Fixes #25480

Release note (bug fix): IMPORT now detects node failure and will restart
instead of fail.

26968: settings: bump minimum supported version to v2.0 r=nvanbenschoten a=nvanbenschoten

We're currently shipping v2.1 alphas, so enforce a minimum binary
version of v2.0. This ensures that no one can upgrade directly from
v1.1 to v2.1. Instead, they need to make a pit sop in v2.0.

Release note: None

26984: storageccl: retry SST chunks with new splits on err r=dt a=dt

Simpler alternative to #26930.

Closes #26930.

Previously an ImportRequest would fail to add SSTables that spanned the
boundries of the target range(s). This reattempts the AddSSTable call
with re-chunked SSTables that avoid spanning the bounds returned in
range mismatch error. It does this by iterating the SSTable to build and
add smaller sstables for either side of the split.

This error currently happens rarely in practice -- we usually explicitly
split ranges immediately before sending an Import with matching boundsto
them. Usually the empty, just-split range has no reason to split again,
so the Import usually succeeds.

However in some cases, like resuming a prior RESTORE, we may be
re-Importing into ranges that are *not* empty and could have split at
points other than those picked by the RESTORE statement.

Fixes #17819.

Subsumes #24299.
Closes #24299.

Release note: none.

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Matt Jibson <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: David Taylor <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jun 26, 2018

Build succeeded

@craig craig bot merged commit d4d2d29 into cockroachdb:master Jun 26, 2018
@dt dt deleted the resplit branch June 26, 2018 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants