
limit read_many concurrency based on in-flight IO memory #491

Merged · 1 commit into DataDog:master from read_many_memory_limit on Jan 4, 2022

Conversation

@HippoBaro (Member) commented Jan 3, 2022

Add the option to limit the number of concurrent IO requests based on
memory usage. This is a useful knob in conjunction with IO request
coalescing: it makes sense to keep a high number of small IO requests
in flight, but less so if they are extremely large. For instance, if
all the IO requests are 4MiB large, it doesn't make much sense to
schedule more than a few at a time; scheduling too many at once could
starve other concurrent IO tasks for no throughput benefit. Conversely,
it makes sense to schedule 4KiB requests with a higher concurrency
level.

This new memory limit is added alongside the existing concurrency
limit. The final number of concurrent IO requests dispatched by a
`read_many` stream is determined by whichever limit is reached first.

For instance, given the following pseudocode:

```rust
file.read_many(iovecs, 0, Some(0))
  .with_concurrency(128)
  .with_memory_limit(1 << 20)
```

If `iovecs` contains 4KiB requests, the stream will schedule 128 IO
requests concurrently, because 128 * 4KiB <= 1MiB. Conversely, if
`iovecs` contains 256KiB requests, then the stream will schedule 4 IO
requests concurrently, because 4 * 256KiB <= 1MiB.
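To make the interaction between the two limits concrete, here is a minimal sketch of how the dispatch window falls out of whichever limit binds first. This is not glommio's actual internals; `effective_concurrency` and its parameters are illustrative names:

```rust
// Minimal sketch, not glommio's implementation: the number of requests
// allowed in flight is capped by whichever limit is reached first.
fn effective_concurrency(concurrency: usize, memory_limit: usize, request_size: usize) -> usize {
    // The memory limit bounds how many requests of this size can be in
    // flight at once; the concurrency limit bounds the raw request count.
    concurrency.min(memory_limit / request_size.max(1))
}

fn main() {
    // 4KiB requests: 128 * 4KiB = 512KiB <= 1MiB, so the concurrency
    // limit of 128 is the one that binds.
    assert_eq!(effective_concurrency(128, 1 << 20, 4 << 10), 128);
    // 256KiB requests: the 1MiB memory budget caps dispatch at 4.
    assert_eq!(effective_concurrency(128, 1 << 20, 256 << 10), 4);
}
```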

@glommer (Collaborator) commented Jan 4, 2022

@HippoBaro we need to be careful here. If your claim is that we could accept a higher concurrency of smaller requests, then that indicates that the limit must be memory-based.

This starts to look a bit like the seastar I/O scheduler, btw, which is fairly complex. I am obviously okay with intermediate steps but it would be nice to at least ground it in something, and memory seems like a better dispatch limit.

At the very least, we could have both a req_nr and memory limit, and stop when any of them is reached (which becomes like a poor man's seastar scheduler)

@HippoBaro (Member, Author)

> At the very least, we could have both a req_nr and memory limit and stop when any of them is reached (which becomes like a poor man's seastar scheduler)

This is what this does! It adds the memory limit on top of the concurrency limit; the final number of concurrent IO requests dispatched is determined by whichever limit is reached first.

@HippoBaro (Member, Author)

> which becomes like a poor man's seastar scheduler

Very much so. Right now, this mini scheduling logic works on a per-stream basis, meaning you can create hundreds of streams, drive them concurrently, and schedule way too many IO requests for your own good. Each stream won't know about the others; that's an issue. IIRC, seastar has a central IO scheduler that coordinates everything across cores.
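For illustration, a hedged sketch of the failure mode described above (the file handles and iovec lists are assumed to exist; the builder calls mirror the example in the description). Each stream enforces only its own budget, so nothing stops N streams from collectively keeping N times the per-stream limit in flight:

```rust
// Hedged sketch: two independent read_many streams, each with its own
// 1MiB in-flight budget. Nothing coordinates them, so together they may
// keep up to 2MiB of IO in flight. (file_a, file_b, iovecs_a, and
// iovecs_b are assumed setup, not part of the actual PR.)
let stream_a = file_a.read_many(iovecs_a, 0, Some(0))
    .with_concurrency(128)
    .with_memory_limit(1 << 20);
let stream_b = file_b.read_many(iovecs_b, 0, Some(0))
    .with_concurrency(128)
    .with_memory_limit(1 << 20);
```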

@glommer (Collaborator) commented Jan 4, 2022

I misunderstood your explanation. This is okay.

@HippoBaro (Member, Author)

I'll rephrase the commit message to clearly state that this new limit comes on top of the existing one:

Add the option to limit the number of concurrent IO requests based on
memory usage. This is a useful knob in conjunction with IO request
coalescing: it makes sense to keep a high number of small IO requests
in flight, but less so if they are extremely large. For instance, if
all the IO requests are 4MiB large, it doesn't make much sense to
schedule more than a few at a time; scheduling too many at once could
starve other concurrent IO tasks for no throughput benefit. Conversely,
it makes sense to schedule 4KiB requests with a higher concurrency
level.

This new memory limit is added alongside the existing concurrency
limit. The final number of concurrent IO requests dispatched by a
`read_many` stream is determined by whichever limit is reached first.

For instance, given the following pseudocode:

```rust
file.read_many(iovecs, 0, Some(0))
  .with_concurrency(128)
  .with_memory_limit(1 << 20)
```

If `iovecs` contains 4KiB requests, the stream will schedule 128 IO
requests concurrently, because 128 * 4KiB <= 1MiB. Conversely, if
`iovecs` contains 256KiB requests, then the stream will schedule 4 IO
requests concurrently, because 4 * 256KiB <= 1MiB.
@HippoBaro force-pushed the read_many_memory_limit branch from 6471358 to f7b19cc on January 4, 2022 01:37
@HippoBaro merged commit 50d8d85 into DataDog:master on Jan 4, 2022
@HippoBaro deleted the read_many_memory_limit branch on January 4, 2022 03:28