backup/restore: revisit various rate limitings #14798

Closed
danhhz opened this issue Apr 11, 2017 · 0 comments
danhhz commented Apr 11, 2017

To stabilize BACKUP and RESTORE for 1.0, we had to add a few rate limits. These feel a bit like papering over the underlying issues, so they should be revisited and better solutions found post-1.0.

Specifically:

  • Sending AdminSplit in presplitRanges for RESTORE
  • Sending Export and Import requests in BACKUP/RESTORE
  • Sending WriteBatch requests in Import
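
As a rough illustration of the kind of limiting being discussed (not the actual CockroachDB code), here is a minimal Go sketch that paces AdminSplit-style sends with a token-bucket limiter from golang.org/x/time/rate; the presplitRate constant and the doAdminSplit helper are assumptions made up for the example:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

// presplitRate is a stand-in for the kind of tunable constant this issue
// wants to revisit: at most 20 AdminSplit sends per second, no bursting.
const presplitRate = 20

func presplitRanges(ctx context.Context, splitKeys []string) error {
	limiter := rate.NewLimiter(rate.Limit(presplitRate), 1)
	for _, key := range splitKeys {
		// Wait blocks until the limiter permits another send (or ctx is done).
		if err := limiter.Wait(ctx); err != nil {
			return err
		}
		if err := doAdminSplit(ctx, key); err != nil {
			return err
		}
	}
	return nil
}

// doAdminSplit stands in for sending an AdminSplit request for one key.
func doAdminSplit(ctx context.Context, key string) error {
	fmt.Println("AdminSplit at", key)
	return nil
}

func main() {
	keys := []string{"/Table/51/1/100", "/Table/51/1/200", "/Table/51/1/300"}
	if err := presplitRanges(context.Background(), keys); err != nil {
		fmt.Println("error:", err)
	}
}
```

Whether the right knob is a requests-per-second rate (as above) or a bound on concurrent requests is exactly the sort of question this issue leaves open.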
@danhhz danhhz added this to the Later milestone Apr 11, 2017
@danhhz danhhz self-assigned this Apr 11, 2017
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 11, 2017
We're already limiting these on the server side, but the BACKUP/RESTORE
gateway would fill up its DistSender/gRPC/something and cause all sorts
of badness (node liveness timeouts leading to mass leaseholder
transfers, poor performance on SQL workloads, etc.), as well as log spam
about slow DistSender requests.

There is likely a better fix post-1.0; this is being tracked in cockroachdb#14798.

For cockroachdb#14792.
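
For context, a sketch of this kind of client-side throttle, bounding how many Export/Import/WriteBatch-style requests the gateway keeps in flight at once via a buffered-channel semaphore; maxInFlight and sendRequest are hypothetical names, not the real implementation:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

const maxInFlight = 5 // made-up bound on concurrent outstanding requests

func sendAll(ctx context.Context, requests []string) error {
	sem := make(chan struct{}, maxInFlight)
	errs := make(chan error, len(requests))
	var wg sync.WaitGroup

	for _, req := range requests {
		select {
		case sem <- struct{}{}: // blocks once maxInFlight sends are outstanding
		case <-ctx.Done():
			return ctx.Err()
		}
		wg.Add(1)
		go func(req string) {
			defer wg.Done()
			defer func() { <-sem }()
			errs <- sendRequest(ctx, req)
		}(req)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// sendRequest stands in for issuing an Export, Import, or WriteBatch request.
func sendRequest(ctx context.Context, req string) error {
	fmt.Println("sending", req)
	return nil
}

func main() {
	reqs := []string{"Export[a,b)", "Export[b,c)", "Import[a,b)", "WriteBatch[a,b)"}
	if err := sendAll(context.Background(), reqs); err != nil {
		fmt.Println("error:", err)
	}
}
```

Bounding in-flight work on the gateway keeps the client from overwhelming its own DistSender/gRPC layers even when the server side is already limited.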
danhhz added a commit to danhhz/cockroach that referenced this issue May 1, 2017
With syncing after every Raft commit enabled (the default recently
switched to true in 801f3a0), presplits on sufficiently large
restores were behaving badly.

I suspect this was overloading the disk, slowing down rocksdb commits
(similar to the underlying cause addressed by our recent Import rate
limiting changes). Unfortunately, this is not entirely confirmed by our
metrics. Disk iops and usage were slightly elevated during the bad
periods, but not really beyond what happens in a healthy cluster with
block_writer traffic.

However, reducing this constant from 100 to 20 clearly fixes the split
badness in the 2TB restore acceptance test (while making the splits take
8 minutes, up from ~3). This is unfortunate, as it's possible to go
faster in other circumstances, but there's already an issue to revisit
this rate limiting post-1.0 (cockroachdb#14798).
danhhz added a commit to danhhz/cockroach that referenced this issue Jul 21, 2017
This was largely motivated by how long it takes to presplit in very
large restores. Doing them all upfront required rate-limiting, but no
number was well-tuned for every cluster. Additionally, even at a high
presplit rate of 100, a 10 TB cluster would take 52 minutes before it
even started scattering.

Now, one goroutine iterates through every span being imported,
presplitting and scattering before moving on to the next one. Upon
split+scatter, the span is sent into a buffered channel read by the
Import goroutines, which prevents the split/scatter work from getting
too far ahead of the Imports. This both acts as a natural rate limiter
for the splits and bounds the number of empty ranges created if a
RESTORE fails or is cancelled.

Overall tpch-10 RESTORE time remains 12:30 on a 4-node cluster.

Since each range is now scattered individually, we no longer need the
jitter in the scatter implementation (plus it now slows down the
RESTORE), so it's removed.

Restore really needs a refactor, but I'm going to be making a couple
more changes leading up to 1.1, so I'll leave cleanup until after they
go in.

This removes most tunable constants in RESTORE; the remaining ones are
defined in terms of the number of nodes in the cluster and the number
of CPUs on a node. As a result, this:

Closes cockroachdb#14798.
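
A minimal sketch of the split/scatter-then-import pipeline described in this commit message, assuming hypothetical helpers (splitAndScatter, importSpan) and a made-up channel buffer size; it illustrates the shape of the approach, not the actual RESTORE code:

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"golang.org/x/sync/errgroup"
)

type span struct{ start, end string }

func restoreSpans(ctx context.Context, spans []span) error {
	g, gctx := errgroup.WithContext(ctx)

	// Small buffer: split/scatter can only get this far ahead of the imports,
	// which also bounds the empty ranges left behind if the job is cancelled.
	readySpans := make(chan span, 4)

	// Producer: presplit and scatter each span in order, then hand it off.
	g.Go(func() error {
		defer close(readySpans)
		for _, sp := range spans {
			if err := splitAndScatter(gctx, sp); err != nil {
				return err
			}
			select {
			case readySpans <- sp:
			case <-gctx.Done():
				return gctx.Err()
			}
		}
		return nil
	})

	// Consumers: one import worker per CPU (a made-up sizing choice here,
	// loosely echoing defining the tunables by node and CPU counts).
	for i := 0; i < runtime.NumCPU(); i++ {
		g.Go(func() error {
			for sp := range readySpans {
				if err := importSpan(gctx, sp); err != nil {
					return err
				}
			}
			return nil
		})
	}
	return g.Wait()
}

// splitAndScatter stands in for sending AdminSplit and Scatter requests.
func splitAndScatter(ctx context.Context, sp span) error {
	fmt.Println("split+scatter at", sp.start)
	return nil
}

// importSpan stands in for sending the Import request for one span.
func importSpan(ctx context.Context, sp span) error {
	fmt.Println("import", sp.start, "-", sp.end)
	return nil
}

func main() {
	spans := []span{{"a", "b"}, {"b", "c"}, {"c", "d"}}
	if err := restoreSpans(context.Background(), spans); err != nil {
		fmt.Println("error:", err)
	}
}
```

The channel capacity is what replaces the old fixed presplit rate: splits proceed only as fast as imports drain the queue, so the limiter tunes itself to the cluster.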