backup/restore: revisit various rate limitings #14798
In order to get BACKUP and RESTORE stabilized for 1.0, we had to add a few rate limits. It feels a bit like papering over the issues, so these should be revisited and better solutions found post-1.0. Specifically:
presplitRanges for RESTORE
danhhz added a commit to danhhz/cockroach that referenced this issue on Apr 11, 2017:
We're already limiting these on the server side, but the BACKUP/RESTORE gateway would fill up its DistSender/gRPC (or something along that path) and cause all sorts of badness (node liveness timeouts leading to mass leaseholder transfers, poor performance on SQL workloads, etc.), as well as log spam about slow DistSender requests. There is likely a better fix post-1.0; this is being tracked in cockroachdb#14798. For cockroachdb#14792.
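A minimal sketch of the kind of gateway-side limiting this commit describes: bounding the number of in-flight requests with a channel-based semaphore so the gateway's request path is never flooded, even when the server side is also rate limiting. The names (`sendExport`, `maxInFlight`) and the limit value are hypothetical illustrations, not CockroachDB's actual API.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// sendExport is a stand-in for issuing one Export/Import request to a range.
func sendExport(ctx context.Context, rangeID int) error {
	time.Sleep(10 * time.Millisecond) // simulate server-side work
	return nil
}

func main() {
	// Hypothetical cap on in-flight requests from the gateway node.
	const maxInFlight = 5
	sem := make(chan struct{}, maxInFlight)

	ctx := context.Background()
	var wg sync.WaitGroup
	for rangeID := 0; rangeID < 100; rangeID++ {
		sem <- struct{}{} // blocks once maxInFlight requests are outstanding
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := sendExport(ctx, id); err != nil {
				fmt.Println("export failed:", err)
			}
		}(rangeID)
	}
	wg.Wait()
}
```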
danhhz added a commit to danhhz/cockroach that referenced this issue on May 1, 2017:
With syncing after every Raft commit enabled (the default recently switched to true in 801f3a0), presplits on sufficiently large restores behaved badly. I suspect this was overloading the disk and slowing down RocksDB commits (similar to the underlying cause addressed by our recent Import rate limiting changes). Unfortunately, this is not entirely confirmed by our metrics: disk IOPS and usage were slightly elevated during the bad periods, but not really beyond what happens in a healthy cluster with block_writer traffic. However, reducing this constant from 100 to 20 clearly fixes the split badness in the 2TB restore acceptance test (while making the splits take 8 minutes, up from ~3). This is unfortunate, since it's possible to go faster in other circumstances, but there's already an issue to revisit this rate limiting post-1.0 (cockroachdb#14798).
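The constant in question caps splits per unit time. A sketch of that shape of limiter in Go, using the standard `golang.org/x/time/rate` token bucket; `presplitRange`, the keys, and the 20/sec value (taken from the commit message) are illustrative assumptions, not the real implementation.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

// presplitRange is a stand-in for issuing one AdminSplit at a boundary key.
func presplitRange(ctx context.Context, key string) error {
	fmt.Println("splitting at", key)
	return nil
}

func main() {
	ctx := context.Background()

	// Hypothetical cap of 20 splits/sec, mirroring the constant this commit
	// lowered from 100 to 20; a burst of 1 keeps splits evenly spaced.
	limiter := rate.NewLimiter(rate.Limit(20), 1)

	keys := []string{"/Table/51/1/100", "/Table/51/1/200", "/Table/51/1/300"}
	for _, key := range keys {
		if err := limiter.Wait(ctx); err != nil { // blocks until a token is available
			break
		}
		if err := presplitRange(ctx, key); err != nil {
			fmt.Println("split failed:", err)
		}
	}
}
```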
danhhz added a commit to danhhz/cockroach that referenced this issue on Jul 21, 2017:
This was largely motivated by how long it takes to presplit in very large restores. Doing all the splits upfront required rate limiting, but no single number was well-tuned for every cluster. Additionally, even at a high presplit rate of 100, a 10 TB cluster would take 52 minutes before it even started scattering.

Now, one goroutine iterates through every span being imported, presplitting and scattering before moving on to the next one. Upon split+scatter, the span is sent into a buffered channel read by the Import goroutines, which prevents it from getting too far ahead of the Imports. This acts both as a natural rate limiter for the splits and as a bound on the number of empty ranges created if a RESTORE fails or is cancelled. Overall tpch-10 RESTORE time remains at 12:30 on a 4-node cluster.

Since each range is now scattered individually, we no longer need the jitter in the scatter implementation (plus it now slows down the RESTORE), so it's removed. RESTORE really needs a refactor, but I'm going to be making a couple more changes leading up to 1.1, so I'll leave cleanup until after they go in.

This removes most tunable constants in RESTORE, and the remaining ones are defined in terms of the number of nodes in the cluster and the number of CPUs per node, so this closes cockroachdb#14798.
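A sketch of the pipeline this commit describes: one producer goroutine split+scatters spans in order and sends them into a buffered channel, and a pool of import workers drains it, so the producer blocks (natural backpressure) once it gets a few spans ahead. All names, the channel capacity, and the worker count are hypothetical stand-ins for the "nodes × CPUs" sizing the commit mentions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type span struct{ start, end string }

// splitAndScatter and importSpan are stand-ins for the real RESTORE steps.
func splitAndScatter(s span) { time.Sleep(5 * time.Millisecond) }
func importSpan(s span)      { time.Sleep(20 * time.Millisecond) }

func main() {
	spans := make([]span, 50)
	for i := range spans {
		spans[i] = span{start: fmt.Sprintf("k%03d", i), end: fmt.Sprintf("k%03d", i+1)}
	}

	// Hypothetical sizing: the channel capacity bounds how far split+scatter
	// can run ahead of the imports; importWorkers stands in for nodes * CPUs.
	const readyAhead = 4
	const importWorkers = 8
	ready := make(chan span, readyAhead)

	// One goroutine presplits and scatters each span in order; the send
	// blocks once readyAhead spans are waiting, which is the backpressure.
	go func() {
		defer close(ready)
		for _, s := range spans {
			splitAndScatter(s)
			ready <- s
		}
	}()

	var wg sync.WaitGroup
	for w := 0; w < importWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for s := range ready {
				importSpan(s)
			}
		}()
	}
	wg.Wait()
	fmt.Println("restore finished")
}
```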