
storage: investigate fsync latency spikes #106231

Open
jbowens opened this issue Jul 5, 2023 · 3 comments
Labels: A-storage (Relating to our storage engine (Pebble) on-disk storage) · C-investigation (Further steps needed to qualify. C-label will change.) · T-storage (Storage Team)

Comments

@jbowens (Collaborator) commented Jul 5, 2023

We've seen many instances of fsync latency spikes in cloud clusters (including in cockroachlabs/support#2395). These fsync latency spikes can last 10+ seconds, yet fall short of the 20 seconds necessary to trigger disk stall detection and terminate the node.
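For context, here's a minimal, standalone sketch of the shape of the disk-stall watchdog referred to above: an fsync raced against a timeout, with the process terminated if the timeout fires. This is not the actual Pebble/CockroachDB implementation (which lives in the VFS layer); the file path and the 20-second threshold are only illustrative.

```go
// Hedged sketch only: not Pebble's disk stall detector, just the shape of one.
package main

import (
	"log"
	"os"
	"time"
)

const maxSyncDuration = 20 * time.Second // mirrors the 20s threshold mentioned above

// syncWithWatchdog fsyncs f, terminating the process if the sync does not
// complete within maxSyncDuration so a wedged disk cannot hang the node forever.
func syncWithWatchdog(f *os.File) error {
	done := make(chan error, 1)
	go func() { done <- f.Sync() }()
	select {
	case err := <-done:
		return err
	case <-time.After(maxSyncDuration):
		log.Fatalf("disk stall detected: fsync exceeded %s", maxSyncDuration)
		return nil // unreachable
	}
}

func main() {
	// Illustrative path; any file on the volume being watched would do.
	f, err := os.OpenFile("/tmp/stall-probe", os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := syncWithWatchdog(f); err != nil {
		log.Fatal(err)
	}
}
```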

These fsync latency stalls can be extremely disruptive to the cluster. In cockroachlabs/support#2395, overall throughput tanked as eventually every worker in the bounded worker pool became stuck on some operation waiting for the slow disk. There are issues (eg, #88699) already tracking the work to reduce the impact of one node's slow disk on overall cluster throughput. But I think there's something additional to investigate with respect to cloud platforms and why these stalls occur.

  • Is our volume of in-progress IOPS highly variable, such that we momentarily exhaust the IOPS limit and get throttled? If so, a user-level IO scheduler (perf: user-level IO scheduler pebble#18) could help ensure we avoid starving the WAL writer by saturating IOPS.
  • Is it possible there's something within the process introducing latency between the point at which fsync latency is measured (eg, the Pebble LogWriter.flushLoop) and the fsync itself? This seems unlikely: although we have non-trivial logic within the VFS stack, the fsync codepaths are very minimal and contain no locking.

We should try to reproduce these stalls across cloud providers and investigate, for example by writing a roachtest that demonstrates the issues mentioned above.
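As a rough, standalone sketch of what such a probe might do (not an actual roachtest; the path, write size, and reporting threshold are all arbitrary assumptions), a loop that appends a small WAL-like record and fsyncs, logging any sync that exceeds a threshold:

```go
// Hedged sketch of an fsync-latency probe; not a roachtest.
package main

import (
	"log"
	"os"
	"time"
)

func main() {
	const spikeThreshold = 100 * time.Millisecond // arbitrary reporting threshold
	// Assumed mount point for the volume under test.
	f, err := os.OpenFile("/mnt/data1/fsync-probe.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	buf := make([]byte, 4096) // small, WAL-block-sized append
	for {
		if _, err := f.Write(buf); err != nil {
			log.Fatal(err)
		}
		start := time.Now()
		if err := f.Sync(); err != nil {
			log.Fatal(err)
		}
		if d := time.Since(start); d > spikeThreshold {
			log.Printf("fsync spike: %s", d)
		}
		time.Sleep(10 * time.Millisecond) // ~100 syncs/sec
	}
}
```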

Informs #107623.

Jira issue: CRDB-29450

@blathers-crl (bot) commented Jul 5, 2023

Hi @jbowens, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the T-storage Storage Team label Jul 5, 2023
@jbowens jbowens added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-storage Relating to our storage engine (Pebble) on-disk storage. labels Jul 7, 2023
@jbowens (Collaborator, Author) commented Jul 10, 2023

We discussed this during storage triage, and a few other avenues of exploration / remedies came up:

[@RaduBerinde]: The metrics that CockroachDB surfaces (eg, through the timeseries) have very low granularity: we collect Store metrics every 10 seconds. This makes it very difficult to observe momentary IOPS exhaustion. Surfacing higher-fidelity metrics here could help.
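As an illustration of the kind of higher-fidelity sampling that could expose momentary exhaustion (a hedged sketch, not a proposal for the metrics pipeline; the device name is an assumption and this is Linux-only), one can poll /proc/diskstats on a sub-second interval:

```go
// Hedged sketch: sample completed IOs per 100ms window for one block device.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// completedIOs returns the cumulative completed reads+writes for dev,
// parsed from /proc/diskstats (fields 4 and 8 of the matching line).
func completedIOs(dev string) (uint64, error) {
	f, err := os.Open("/proc/diskstats")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 7 && fields[2] == dev {
			reads, _ := strconv.ParseUint(fields[3], 10, 64)
			writes, _ := strconv.ParseUint(fields[7], 10, 64)
			return reads + writes, nil
		}
	}
	return 0, fmt.Errorf("device %s not found", dev)
}

func main() {
	const dev = "nvme1n1" // assumed device name
	prev, _ := completedIOs(dev)
	for range time.Tick(100 * time.Millisecond) {
		cur, err := completedIOs(dev)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: %d IOs in last 100ms (~%d IOPS)\n", dev, cur-prev, (cur-prev)*10)
		prev = cur
	}
}
```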

Should we find that we are momentarily exhausting IOPS, then short of implementing a user-level IO scheduler we could:

  • [@RaduBerinde]: Moving the WAL to a separate volume would help isolate the LogWriter from IOPS exhaustion due to flushes, compactions and reads (see the sketch after this list).
  • [@nicktrav]: Separating the raft log (kvserver: separate raft log #16624) and moving that engine to a volume separate from the applied state could also improve isolation for foreground writes.
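A minimal sketch of the WAL-isolation idea at the Pebble level, using Options.WALDir to place the write-ahead log on a different volume than the store directory (the paths are assumptions, and this is not the CockroachDB wiring, just the underlying Pebble option):

```go
// Hedged sketch: open a Pebble store with its WAL on a separate volume.
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	db, err := pebble.Open("/mnt/data1/store", &pebble.Options{
		// WAL writes (and their fsyncs) go to a separate volume, so IOPS
		// consumed by flushes, compactions, and reads on /mnt/data1 cannot
		// starve the LogWriter. Both paths are illustrative.
		WALDir: "/mnt/wal1/store-wal",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```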

@jbowens jbowens added C-investigation Further steps needed to qualify. C-label will change. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jul 28, 2023
@jbowens (Collaborator, Author) commented Aug 14, 2023
