Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer hash checking and rehashing off-thread #1190

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

mkeeter
Copy link
Contributor

@mkeeter mkeeter commented Mar 6, 2024

Right now, a significant amount of work in the Crucible Downstairs is simply hashing data:

Together, this take up about 17% of our flamegraph, and are both in the do_work task (dw_task):

Screenshot 2024-03-06 at 11 43 59 AM

This PR moves that hash checking off-thread into the rayon thread pool, which is common for this kind of blocking work:

    ┌──────────┐              ┌───────────┐
    │FramedRead│              │FramedWrite│
    └────┬─────┘              └─────▲─────┘
         │                          │
         │         ┌────────────────┴────────────────────┐
         │         │         framed_write_task           │
         │         └─▲─────▲──────────────────▲──────────┘
         │           │     │                  │
         │       ping│     │invalid           │
         │  ┌────────┘     │frame             │responses
         │  │              │errors            │
         │  │              │                  │
    ┌────▼──┴─┐ message   ┌┴──────┐  job     ┌┴────────┐
    │resp_loop├──────────►│pf_task├─────────►│ dw_task │
    └──┬───▲──┘ channel   └──┬────┘ channel  └▲────────┘
       │   │                 │                │
  defer│   │oneshot          │                │
      ┌▼───┴┐                │                │
      │rayon│             add│work         new│work
      └─────┘                │                │
                             │                │
    per-connection           │                │
   ========================= │ ============== │ ===============
    shared state          ┌──▼────────────────┴────────────┐
                          │           Downstairs           │
                          └────────────────────────────────┘

The strategy is very similar to #1066 and #1089 , and uses the same DeferredQueue data structure (moved to crucible_common):

  • When messages arrive, they are pushed onto the deferred queue if (1) they are writes or (2) there are other messages in the deferred queue. This ensures that messages always arrive in order, while not deferring unnecessarily
  • Deferred messages arrive (in order) back at resp_loop with all of their write metadata (e.g. hashes) precomputed. We then use those hashes in region_write_pre, instead of computing them in Region::region_write and ExtentInner::write

This is a roughly 10% speedup for large writes:

1M WRITE: bw=809MiB/s (848MB/s), 809MiB/s-809MiB/s (848MB/s-848MB/s), io=47.5GiB (51.0GB), run=60129-60129msec
4K WRITE: bw=22.8MiB/s (23.9MB/s), 22.8MiB/s-22.8MiB/s (23.9MB/s-23.9MB/s), io=1367MiB (1433MB), run=60010-60010msec
1M WRITE: bw=808MiB/s (847MB/s), 808MiB/s-808MiB/s (847MB/s-847MB/s), io=47.4GiB (50.9GB), run=60046-60046msec
4M WRITE: bw=816MiB/s (855MB/s), 816MiB/s-816MiB/s (855MB/s-855MB/s), io=47.9GiB (51.4GB), run=60098-60098msec

Previously, I was seeing numbers in the 700-ish range, e.g.

1M WRITE: bw=725MiB/s (760MB/s), 725MiB/s-725MiB/s (760MB/s-760MB/s), io=42.6GiB (45.7GB), run=60134-60134msec
4K WRITE: bw=20.1MiB/s (21.1MB/s), 20.1MiB/s-20.1MiB/s (21.1MB/s-21.1MB/s), io=1208MiB (1266MB), run=60027-60027msec
1M WRITE: bw=720MiB/s (755MB/s), 720MiB/s-720MiB/s (755MB/s-755MB/s), io=42.2GiB (45.3GB), run=60052-60052msec
4M WRITE: bw=716MiB/s (751MB/s), 716MiB/s-716MiB/s (751MB/s-751MB/s), io=42.0GiB (45.1GB), run=60099-60099msec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant