
libroach: fix excessive compactions performed by DBCompactRange #26355

Merged

Conversation

@petermattis (Collaborator)

Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, `DBCompactRange` was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key, resulting in
compacting from the beginning (or to the end) of the entire key space.

See #24029
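
For illustration, a minimal C++ sketch of the corrected key handling (a hypothetical wrapper, not the actual libroach code): `rocksdb::DB::CompactRange` takes `rocksdb::Slice` pointers where `nullptr` means "unbounded" on that side, so a non-empty key must be passed as a pointer to a slice rather than left null.

```cpp
// Hedged sketch, not the actual libroach code: shows the intended
// mapping from (possibly empty) start/end keys to the pointer
// arguments of rocksdb::DB::CompactRange, where nullptr means
// "unbounded" on that side.
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/slice.h"

rocksdb::Status CompactRangeSketch(rocksdb::DB* db, const std::string& start,
                                   const std::string& end) {
  rocksdb::Slice start_slice(start);
  rocksdb::Slice end_slice(end);
  // The bug: passing nullptr even when a key was specified, which
  // compacted from the beginning (or to the end) of the key space.
  // The fix: only pass nullptr when the key is actually empty.
  const rocksdb::Slice* begin = start.empty() ? nullptr : &start_slice;
  const rocksdb::Slice* finish = end.empty() ? nullptr : &end_slice;
  return db->CompactRange(rocksdb::CompactRangeOptions(), begin, finish);
}
```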

@petermattis requested a review from a team on June 4, 2018 13:57
@cockroach-teamcity (Member)

This change is Reviewable

@petermattis requested review from bdarnell and tbg on June 4, 2018 13:57
@petermattis (Collaborator, Author)

This needs to be back-ported to 2.0.x.

@tbg (Member)

tbg commented Jun 4, 2018

Ouch, good find. Definitely a simple unit test for the "partitioner" logic would've caught this. Can you be bothered to add it now?


Review status: all files reviewed at latest revision, all discussions resolved, some commit checks pending.


Comments from Reviewable

@petermattis (Collaborator, Author)

> Ouch, good find. Definitely a simple unit test for the "partitioner" logic would've caught this. Can you be bothered to add it now?

Now that we have C++ unit tests this should be easier to do. Let me take a look.


Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable
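
For readers following along, a minimal gtest-style sketch of the kind of unit test being discussed. The `PartitionCompaction` helper and its signature are hypothetical stand-ins for the partitioner logic, which was refactored for testability in the next revision:

```cpp
#include <gtest/gtest.h>

#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the partitioner logic: split [start, end)
// at the given boundaries, keeping the caller-supplied keys on the
// first and last pieces instead of widening them to the key space.
std::vector<std::pair<std::string, std::string>> PartitionCompaction(
    const std::string& start, const std::string& end,
    const std::vector<std::string>& boundaries) {
  std::vector<std::pair<std::string, std::string>> parts;
  std::string prev = start;
  for (const auto& b : boundaries) {
    parts.emplace_back(prev, b);
    prev = b;
  }
  parts.emplace_back(prev, end);
  return parts;
}

TEST(CompactRange, KeepsExplicitBounds) {
  auto parts = PartitionCompaction("b", "y", {"m"});
  ASSERT_EQ(2u, parts.size());
  // The first piece must start at "b" and the last must end at "y";
  // the bug this PR fixes would have produced unbounded ends here.
  EXPECT_EQ("b", parts.front().first);
  EXPECT_EQ("y", parts.back().second);
}
```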

@bdarnell (Contributor)

bdarnell commented Jun 4, 2018

:lgtm:


Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, `DBCompactRange` was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key, resulting in
compacting from the beginning (or to the end) of the entire key space.

See cockroachdb#24029
@petermattis force-pushed the pmattis/libroach-compact-range branch from e8f8cbe to e2f099f on June 4, 2018 17:54
@petermattis (Collaborator, Author)

Refactored so that the core logic can be tested. PTAL.


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@bdarnell (Contributor)

bdarnell commented Jun 4, 2018

LGTM


Review status: 0 of 3 files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@petermattis (Collaborator, Author)

bors r=bdarnell,tschottdorf

@craig (Contributor)

craig bot commented Jun 4, 2018

Build failed (retrying...)

@craig (Contributor)

craig bot commented Jun 4, 2018

Build failed (retrying...)

craig bot pushed a commit that referenced this pull request Jun 4, 2018
24589: distsqlrun: don't allocate between fused processors r=jordanlewis a=jordanlewis

distsqlrun: don't allocate between fused processors

Previously, `ProcOutputHelper.ProcessRow` (and, by extension, all
`RowSource.Next` implementations) always allocated a fresh
`EncDatumRow`. This was wasteful - not every processor needs to be able
to hold a reference to the output of `RowSource.Next`.

Now, `ProcessRow` never allocates a fresh `EncDatumRow`, and the
contract of `RowSource.Next` has been changed to say that it's not valid
to hang on to a row returned by `Next` past the next call to `Next`.
Processors that need to hold on to a row from their upstreams have been
modified to make an explicit copy to achieve this safely.

Finally, a new `copyingRowReceiver` is introduced that makes a copy of
every row that is `Push`'d to it. A `copyingRowReceiver` is inserted
before every router, since routers all expect that their inputs will be
immutable. This preserves the safety of sending outputs of
`RowSource.Next`, which aren't safe to hold on to, to routers, which
expect rows that *are* safe to hold on to.

Release note: None

Fixes #22462.
Fixes #24452.

26355: libroach: fix excessive compactions performed by DBCompactRange r=bdarnell,tschottdorf a=petermattis

Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, `DBCompactRange` was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key, resulting in
compacting from the beginning (or to the end) of the entire key space.

See #24029

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
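
An aside on the `RowSource.Next` contract described in the distsqlrun change above: it is a classic buffer-reuse pattern. A hedged C++ analog (the actual change is in CockroachDB's Go code; all names here are illustrative only):

```cpp
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

// Illustrative analog of the contract above: Next() returns a view
// into a buffer that is reused on every call, so the result is only
// valid until the next call. Consumers that retain a row must copy.
class RowSource {
 public:
  std::string_view Next() {
    buf_ = "row-" + std::to_string(n_++);  // reuse buf_, no fresh allocation
    return buf_;
  }

 private:
  std::string buf_;
  int n_ = 0;
};

int main() {
  RowSource src;
  std::vector<std::string> retained;
  for (int i = 0; i < 3; ++i) {
    // emplace_back makes an explicit copy, mirroring the processors
    // that were modified to copy rows they need to hold on to.
    retained.emplace_back(src.Next());
  }
  for (const auto& r : retained) std::cout << r << "\n";
}
```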
@craig (Contributor)

craig bot commented Jun 4, 2018

Build succeeded

@craig (bot) merged commit e2f099f into cockroachdb:master on Jun 4, 2018
@petermattis deleted the pmattis/libroach-compact-range branch on June 4, 2018 20:48
craig bot pushed a commit that referenced this pull request Jun 4, 2018
26403: release-2.0: libroach: fix excessive compactions performed by DBCompactRange r=bdarnell,benesch a=petermattis

Backport 1/1 commits from #26355.

/cc @cockroachdb/release

---

Fix excessive compactions from `DBCompactRange` due to mishandling of
the first and last ranges to compact. When a non-empty start or end key
is specified, `DBCompactRange` was previously calling
`rocksdb::DB::CompactRange` with a `null` start/end key, resulting in
compacting from the beginning (or to the end) of the entire key space.

See #24029


Co-authored-by: Peter Mattis <[email protected]>
benesch added a commit to benesch/cockroach that referenced this pull request Jun 6, 2018
Now that DBCompactRange no longer attempts to compact the entire
database (cockroachdb#26355), serializing ClearRange requests is enough to prevent
a large DROP TABLE from bricking a cluster. They're slow enough that the
compaction queue can keep up and purge range deletion tombstones before
enough pile up to wedge the cluster.

TODO(benesch): verify whether we actually need to turn down the compaction
queue threshold from 2m.

This is a partial fix for cockroachdb#24029.
benesch added a commit to benesch/cockroach that referenced this pull request Jun 6, 2018
Now that DBCompactRange no longer attempts to compact the entire
database (cockroachdb#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for cockroachdb#24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.
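
The batching idea described in the commit message above is language-agnostic; for concreteness, a hedged C++ sketch (the real implementation is Go code in CockroachDB, and `ClearInBatches`/`clearRange` are hypothetical names):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hedged sketch of the rate-limiting idea, not CockroachDB's code:
// issue ClearRange-style requests in small sequential batches so a
// background compaction queue can purge range deletion tombstones
// between batches instead of letting them pile up.
void ClearInBatches(const std::vector<std::string>& ranges,
                    const std::function<void(const std::string&)>& clearRange,
                    std::size_t batch_size = 50) {
  for (std::size_t i = 0; i < ranges.size(); i += batch_size) {
    const std::size_t end = std::min(i + batch_size, ranges.size());
    // Each batch runs to completion before the next starts, bounding
    // the number of outstanding tombstones at any one time.
    for (std::size_t j = i; j < end; ++j) {
      clearRange(ranges[j]);
    }
  }
}
```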
petermattis pushed a commit to petermattis/cockroach that referenced this pull request Jun 11, 2018
Now that DBCompactRange no longer attempts to compact the entire
database (cockroachdb#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for cockroachdb#24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.
benesch added a commit to benesch/cockroach that referenced this pull request Jun 12, 2018
Now that DBCompactRange no longer attempts to compact the entire
database (cockroachdb#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for cockroachdb#24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.
benesch added a commit to benesch/cockroach that referenced this pull request Jun 13, 2018
Now that DBCompactRange no longer attempts to compact the entire
database (cockroachdb#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for cockroachdb#24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.
craig bot pushed a commit that referenced this pull request Jun 13, 2018
26615: release-2.0: sql,compactor: rate limit clear range requests r=bdarnell,petermattis a=benesch

Backports #26449.

I'm running a clearrange roachtest with this diff applied tonight. If it passes we're good to go.

```diff
diff --git a/pkg/cmd/roachtest/clearrange.go b/pkg/cmd/roachtest/clearrange.go
index ea5bcdff8..2b244af6d 100644
--- a/pkg/cmd/roachtest/clearrange.go
+++ b/pkg/cmd/roachtest/clearrange.go
@@ -30,19 +30,9 @@ func registerClearRange(r *registry) {
 		// thoroughly brick the cluster.
 		Stable: false,
 		Run: func(ctx context.Context, t *test, c *cluster) {
-			t.Status(`downloading store dumps`)
-			// Created via:
-			// roachtest --cockroach cockroach-v2.0.1 store-gen --stores=10 bank \
-			//           --payload-bytes=10240 --ranges=0 --rows=65104166
-			fixtureURL := `gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1`
-			location := storeDirURL(fixtureURL, c.nodes, "2.0")
+			t.Status(`waiting for compactions to disappear`)
+			time.Sleep(90 * time.Minute)
 
-			// Download this store dump, which measures around 2TB (across all nodes).
-			if err := downloadStoreDumps(ctx, c, location, c.nodes); err != nil {
-				t.Fatal(err)
-			}
-
-			c.Put(ctx, cockroach, "./cockroach")
 			c.Start(ctx)
 
 			// Also restore a much smaller table. We'll use it to run queries against
@@ -81,7 +71,7 @@ func registerClearRange(r *registry) {
 				// above didn't brick the cluster.
 				//
 				// Don't lower this number, or the test may pass erroneously.
-				const minutes = 60
+				const minutes = 120
 				t.WorkerStatus("repeatedly running COUNT(*) on small table")
 				for i := 0; i < minutes; i++ {
 					after := time.After(time.Minute)

```

---

Now that DBCompactRange no longer attempts to compact the entire
database (#26355), sending ClearRange requests in sequential batches of
50 is enough to prevent a large DROP TABLE from bricking a cluster.
They're slow enough that the compaction queue can keep up and purge
range deletion tombstones before enough pile up to wedge the cluster.

This is a partial fix for #24029.

Release note (bug fix): The performance impact of dropping a large table
has been substantially reduced.

Co-authored-by: Nikhil Benesch <[email protected]>
craig bot pushed a commit that referenced this pull request Jun 13, 2018
26449: sql,compactor: rate limit clear range requests r=petermattis,bdarnell a=benesch

Now that DBCompactRange no longer attempts to compact the entire
database (#26355), serializing ClearRange requests is enough to prevent
a large DROP TABLE from bricking a cluster. They're slow enough that the
compaction queue can keep up and purge range deletion tombstones before
enough pile up to wedge the cluster.

This hopefully supersedes #26372. It is much more surgical in approach.

TODO(benesch): verify whether we actually need to turn down the compaction
queue threshold from 2m.

This is a partial fix for #24029.

Here's a screenshot of this PR in action:

<img width="986" alt="screen shot 2018-06-06 at 12 38 17 am" src="https://user-images.githubusercontent.com/882976/41017238-125718f4-6922-11e8-99f6-dc870618f446.png">

Note that this screenshot was generated from a slightly older version of this patch. I'm rerunning tonight with exactly this patch to verify that I didn't break anything.

Co-authored-by: Nikhil Benesch <[email protected]>