
limit read_many concurrency based on in-flight IO memory #491

Merged · 1 commit into DataDog:master from read_many_memory_limit on Jan 4, 2022

Conversation

@HippoBaro (Member) commented Jan 3, 2022

Add the option to limit the number of concurrent IO requests based on
memory usage. This is a useful knob in conjunction with IO request
coalescing: it makes sense to keep a high number of small IO requests
in flight, but less so if they are extremely large. For instance, if
all the IO requests are 4MiB large, it doesn't make much sense to
schedule more than a few at a time; scheduling too many at once could
starve other concurrent IO tasks for no throughput benefit. Conversely,
it makes sense to schedule 4KiB requests with a higher concurrency
level.

This new memory limit is added alongside the existing concurrency
limit. The final number of concurrent IO requests dispatched by a
`read_many` stream is determined by whichever limit is reached first.

For instance, given the following pseudocode:

```rust
file.read_many(iovecs, 0, Some(0))
  .with_concurrency(128)
  .with_memory_limit(1 << 20)
```

If `iovecs` contains 4KiB requests, the stream will schedule 128 IO
requests concurrently, because 128 * 4KiB <= 1MiB. Conversely, if
`iovecs` contains 256KiB requests, then the stream will schedule 4 IO
requests concurrently, because 4 * 256KiB <= 1MiB.
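To make the interaction between the two limits concrete, here is a minimal sketch of how the dispatch window falls out of whichever limit binds first. This is not glommio's actual internals; `effective_concurrency` and its parameters are illustrative names:

```rust
// Minimal sketch, not glommio's implementation: the number of requests
// allowed in flight is capped by whichever limit is reached first.
fn effective_concurrency(concurrency: usize, memory_limit: usize, request_size: usize) -> usize {
    // The memory limit bounds how many requests of this size can be in
    // flight at once; the concurrency limit bounds the raw request count.
    concurrency.min(memory_limit / request_size.max(1))
}

fn main() {
    // 4KiB requests: 128 * 4KiB = 512KiB <= 1MiB, so the concurrency
    // limit of 128 is the one that binds.
    assert_eq!(effective_concurrency(128, 1 << 20, 4 << 10), 128);
    // 256KiB requests: the 1MiB memory budget caps dispatch at 4.
    assert_eq!(effective_concurrency(128, 1 << 20, 256 << 10), 4);
}
```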

@glommer (Collaborator) commented Jan 4, 2022

@HippoBaro we need to be careful here. If your claim is that we could accept a higher concurrency of smaller requests, then that indicates that the limit must be memory-based.

This starts to look a bit like the seastar I/O scheduler, btw, which is fairly complex. I am obviously okay with intermediate steps but it would be nice to at least ground it in something, and memory seems like a better dispatch limit.

At the very least, we could have both a req_nr and memory limit, and stop when any of them is reached (which becomes like a poor man's seastar scheduler)

@HippoBaro (Member, Author)

> At the very least, we could have both a req_nr and memory limit and stop when any of them is reached (which becomes like a poor man's seastar scheduler)

This is what this does! It adds the memory limit on top of the concurrency limit; the final number of concurrent IO requests dispatched is determined by whichever limit is reached first.

@HippoBaro (Member, Author)

> which becomes like a poor man's seastar scheduler

Very much so. Right now, this mini scheduling logic works on a per-stream basis, meaning you can create hundreds of streams, drive them concurrently, and schedule way too many IO requests for your own good. Each stream won't know about the others; that's an issue. IIRC, seastar has a central IO scheduler that coordinates everything across cores.
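For illustration, a hedged sketch of the failure mode described above (the file handles and iovec lists are assumed to exist; the builder calls mirror the example in the description). Each stream enforces only its own budget, so nothing stops N streams from collectively keeping N times the per-stream limit in flight:

```rust
// Hedged sketch: two independent read_many streams, each with its own
// 1MiB in-flight budget. Nothing coordinates them, so together they may
// keep up to 2MiB of IO in flight. (file_a, file_b, iovecs_a, and
// iovecs_b are assumed setup, not part of the actual PR.)
let stream_a = file_a.read_many(iovecs_a, 0, Some(0))
    .with_concurrency(128)
    .with_memory_limit(1 << 20);
let stream_b = file_b.read_many(iovecs_b, 0, Some(0))
    .with_concurrency(128)
    .with_memory_limit(1 << 20);
```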

@glommer (Collaborator) commented Jan 4, 2022

I misunderstood your explanation. This is okay.

@HippoBaro (Member, Author)

I'll rephrase the commit message to clearly state that this new limit comes on top of the existing one:

Add the option to limit the number of concurrent IO requests based on
memory usage. This is a useful knob in conjunction with IO request
coalescing: it makes sense to keep a high number of small IO requests
in flight, but less so if they are extremely large. For instance, if
all the IO requests are 4MiB large, it doesn't make much sense to
schedule more than a few at a time; scheduling too many at once could
starve other concurrent IO tasks for no throughput benefit. Conversely,
it makes sense to schedule 4KiB requests with a higher concurrency
level.

This new memory limit is added alongside the existing concurrency
limit. The final number of concurrent IO requests dispatched by a
`read_many` stream is determined by whichever limit is reached first.

For instance, given the following pseudocode:

```rust
file.read_many(iovecs, 0, Some(0))
  .with_concurrency(128)
  .with_memory_limit(1 << 20)
```

If `iovecs` contains 4KiB requests, the stream will schedule 128 IO
requests concurrently, because 128 * 4KiB <= 1MiB. Conversely, if
`iovecs` contains 256KiB requests, then the stream will schedule 4 IO
requests concurrently, because 4 * 256KiB <= 1MiB.
@HippoBaro force-pushed the read_many_memory_limit branch from 6471358 to f7b19cc on January 4, 2022 01:37
@HippoBaro merged commit 50d8d85 into DataDog:master on Jan 4, 2022
@HippoBaro deleted the read_many_memory_limit branch on January 4, 2022 03:28