Allow seeking within a prefetch stream #556
Conversation
Force-pushed from 057e7db to b4bfd09
Nice solution! Just a few minor comments.
Nice, LGTM pending any feedback from @passaro.
LGTM
Force-pushed from 7c5821e to ae10a21
The old test was hiding a bug because it used a hard-coded part size of 8MB regardless of what the client used. awslabs#552 changed that, and now this test frequently runs out of memory because it degrades to doing 1-byte requests. I don't think it's worth reworking the logic, since it takes an unusual config to get there, so just fix the test. Signed-off-by: James Bornholt <[email protected]>
/// Add a new part to the front of the window, and drop any parts necessary to fit the new part
/// within the maximum size.
pub fn push(&mut self, part: Part) {
    if part.len() > self.max_size {
Why are we dropping all the parts when the new part's length is greater than the max size? Shouldn't we just abort this backward seek and keep the seek window for some other backward seek?
This isn't the seek case, this is the "store new data after it's been read" case. So we can't keep the old parts around as the new part is further into the object — keeping the old ones but not the new ones would put a gap in the object.
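To illustrate the point, here is a minimal sketch of a bounded backwards-seek window (not the actual mountpoint-s3 implementation; `Part`, `SeekWindow`, and the field names are simplified stand-ins):

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the real `Part` type: just a length in bytes.
struct Part {
    len: usize,
}

impl Part {
    fn len(&self) -> usize {
        self.len
    }
}

struct SeekWindow {
    parts: VecDeque<Part>,
    current_size: usize,
    max_size: usize,
}

impl SeekWindow {
    /// Push freshly read data onto the front of the window, evicting the oldest
    /// parts from the back until everything fits within `max_size`. If the new
    /// part alone is bigger than `max_size`, the whole window is cleared:
    /// keeping older parts without the new one would leave a gap, so the window
    /// would no longer be a contiguous run of the object.
    fn push(&mut self, part: Part) {
        if part.len() > self.max_size {
            self.parts.clear();
            self.current_size = 0;
            return;
        }
        while self.current_size + part.len() > self.max_size {
            if let Some(dropped) = self.parts.pop_back() {
                self.current_size -= dropped.len();
            } else {
                break;
            }
        }
        self.current_size += part.len();
        self.parts.push_front(part);
    }
}
```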
// is at most 256KiB backwards and then 512KiB forwards. For forwards seeks, we're also
// making a guess about where the optimal cut-off point is before it would be faster to
// just start a new request instead.
max_forward_seek_distance: 16 * 1024 * 1024,
How were these max values decided? I'm curious how we estimate the cut-off point.
Total guess.
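For rough intuition (illustrative numbers only, not from this PR): if a single prefetch request streams at somewhere around 1 GiB/s, skipping forward 16 MiB within it costs on the order of 15–20 ms, which is in the same ballpark as the first-byte latency of starting a fresh S3 GET. Seeks much larger than that would plausibly be better served by starting a new request.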
  current_task: Option<RequestTask<TaskError<Client>>>,
  // Currently we only ever spawn at most one future task (see [spawn_next_request])
- future_tasks: Arc<RwLock<VecDeque<RequestTask<TaskError<Client>>>>>,
+ future_tasks: VecDeque<RequestTask<TaskError<Client>>>,
Why can we remove the read-write lock on the request task queue now?
We actually never needed it — it was never being shared. Just a leftover of an old design, I think.
Description of change
This change addresses two problems:
To solve these problems, this change allows the prefetcher to tolerate a little bit of seeking in both directions.
These seek mechanisms are protected by two new configurations for maximum forwards and backwards seek distance. Forwards seek distance is a trade-off between waiting to stream many unneeded bytes from S3 versus the latency of starting a new request. Backwards seek distance is a memory usage trade-off. In both cases, I chose fairly arbitrary numbers, except that the buffers are big enough to tolerate Linux async readahead (which manifests as 256k backwards then 512k forwards).
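As a rough sketch of what these two knobs look like (the struct name and the backward-distance field are assumptions for illustration; only `max_forward_seek_distance: 16 * 1024 * 1024` is visible in the diff above):

```rust
/// Hypothetical configuration sketch for the prefetcher's seek tolerance.
struct PrefetcherConfig {
    /// How far ahead of the current offset a read may land before we give up on
    /// the in-flight request and start a new one (wasted streamed bytes vs. the
    /// latency of a new request).
    max_forward_seek_distance: u64, // 16 MiB in this change
    /// How much already-read data to retain behind the current offset so small
    /// backwards seeks (e.g. Linux readahead reordering) can be served from
    /// memory (a memory-usage trade-off).
    max_backward_seek_distance: u64,
}
```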
I tested the effectiveness of this change in two ways:
Relevant issues: #488
Does this change impact existing behavior?
No, this is a bug fix that improves performance.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).