
Allow seeking within a prefetch stream #556

Merged: 3 commits, Oct 18, 2023

Conversation

@jamesbornholt (Member) commented Oct 13, 2023

Description of change

This change addresses two problems:

  1. Linux asynchronous readahead confuses our prefetcher by sometimes making the stream appear to go backwards, even though the customer is actually just reading sequentially (Readahead reordering causes prefetcher resets #488). The problem is that with parallel FUSE threads, two asynchronous read operations can arrive at the prefetcher out of order.
  2. Request patterns that seek forwards through an object (say, reading a byte at every 1MB interval) currently don't benefit from prefetching at all, even though it would be dramatically faster and cheaper to stream the entire object.

To solve these problems, this change allows the prefetcher to tolerate a little bit of seeking in both directions.

  • For forwards seeking, when we see a seek of an acceptable distance, we fast-forward through the stream to the desired target offset, ignoring the skipped bytes (except for later use in backwards seeking, see below).
  • For backwards seeking, we keep around a little bit of previously read data (or data skipped by forwards seeking) and can reload it in the event that a seek goes backwards. We do this by creating a fake new request containing the rewound bytes, so that the existing read logic will pick them up.
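The two seek directions described above can be sketched as a single classification step. This is a hypothetical, simplified illustration of the idea, not the actual prefetcher in mountpoint-s3/src/prefetch.rs; the type and field names (`Prefetcher`, `SeekOutcome`, `backward_window`, etc.) are invented for clarity.

```rust
/// Simplified sketch of the seek-tolerance decision. Assumed types, not the real API.
struct Prefetcher {
    /// Offset the stream would reach if the reader continued sequentially.
    next_sequential_offset: u64,
    /// Maximum gap we will fast-forward through rather than starting a new request.
    max_forward_seek_distance: u64,
    /// Recently read (or fast-forwarded) bytes kept around for backwards seeks.
    backward_window: Vec<u8>,
}

enum SeekOutcome {
    /// Read continues exactly where the stream left off.
    Sequential,
    /// Small forward seek: stream through and discard `skip` bytes.
    FastForward { skip: u64 },
    /// Small backward seek: serve `rewind` bytes from the window via a fake request.
    RewindFromWindow { rewind: u64 },
    /// Seek is too far in either direction: tear down and start a new request.
    ResetStream,
}

impl Prefetcher {
    fn classify_read(&self, offset: u64) -> SeekOutcome {
        if offset == self.next_sequential_offset {
            SeekOutcome::Sequential
        } else if offset > self.next_sequential_offset {
            let skip = offset - self.next_sequential_offset;
            if skip <= self.max_forward_seek_distance {
                SeekOutcome::FastForward { skip }
            } else {
                SeekOutcome::ResetStream
            }
        } else {
            let rewind = self.next_sequential_offset - offset;
            if rewind <= self.backward_window.len() as u64 {
                SeekOutcome::RewindFromWindow { rewind }
            } else {
                SeekOutcome::ResetStream
            }
        }
    }
}
```

In this sketch the backwards window doubles as the bound on backwards seek distance: a rewind is only possible if the requested bytes are still held in the window.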

These seek mechanisms are guarded by two new configuration options for maximum forwards and backwards seek distance. Forwards seek distance trades off streaming unneeded bytes from S3 against the latency of starting a new request. Backwards seek distance is a memory-usage trade-off. In both cases I chose fairly arbitrary numbers, except that the buffers are big enough to tolerate Linux async readahead (which manifests as 256KiB backwards then 512KiB forwards).

I tested the effectiveness of this change in two ways:

  • To check we fixed Readahead reordering causes prefetcher resets (#488), I tried some sequential reads. They're no longer intermittently slow, and the metrics confirm the new seek logic is being triggered as we expect.
  • To check that forwards seeking works well, I wrote a benchmark that reads 1024 bytes at every 1MiB interval in an object. That benchmark went from ~25s without this change to ~0.7s with this change, for a 2GiB file.
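The forwards-seek benchmark described above (1024 bytes at every 1 MiB offset) amounts to a simple strided-read loop. This is a hypothetical reconstruction of that access pattern against a Mountpoint-mounted file, not the actual benchmark code; `strided_read` and its parameters are invented names.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Read 1024 bytes at every 1 MiB offset of `path`, returning total bytes read.
/// Each iteration is a ~1 MiB forward seek, well within max_forward_seek_distance,
/// so the prefetcher can fast-forward instead of restarting its S3 request.
fn strided_read(path: &str, file_size: u64) -> std::io::Result<u64> {
    const STRIDE: u64 = 1024 * 1024; // 1 MiB
    let mut file = File::open(path)?;
    let mut buf = [0u8; 1024];
    let mut total = 0u64;
    let mut offset = 0u64;
    while offset + buf.len() as u64 <= file_size {
        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(&mut buf)?;
        total += buf.len() as u64;
        offset += STRIDE;
    }
    Ok(total)
}
```

Without seek tolerance, every iteration of this loop forces the prefetcher to abandon its in-flight request and start a new one, which explains the ~25s vs ~0.7s gap reported above.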

Relevant issues: #488

Does this change impact existing behavior?

No, this is a bug fix that improves performance.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 13, 2023 03:42 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt added the performance PRs to run benchmarks on label Oct 13, 2023
@passaro (Contributor) left a comment:

Nice solution! Just a few minor comments.

Review threads on mountpoint-s3/src/prefetch.rs (all resolved).
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 13, 2023 15:18 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt marked this pull request as ready for review October 16, 2023 02:48
@dannycjones (Contributor) previously approved these changes Oct 16, 2023 and commented:

Nice, LGTM pending any feedback from @passaro.

@passaro (Contributor) previously approved these changes Oct 16, 2023 and commented:

LGTM

@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 16, 2023 16:58 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 16, 2023 22:05 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 16, 2023 22:05 — with GitHub Actions Inactive
The old test was hiding a bug because it used a hard-coded part size of
8MB regardless of what the client used. awslabs#552 changed that, and now this
test often runs out of memory because it degrades to doing 1-byte
requests. I don't think it's worth reworking the logic, since it takes a
weird config to get there, so just fix the test.

Signed-off-by: James Bornholt <[email protected]>
@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 17, 2023 03:08 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 17, 2023 03:08 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 17, 2023 14:19 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 17, 2023 14:20 — with GitHub Actions Inactive
/// Add a new part to the front of the window, and drop any parts necessary to fit the new part
/// within the maximum size.
pub fn push(&mut self, part: Part) {
if part.len() > self.max_size {
A contributor commented:

Why are we dropping all the parts when the new part's length is greater than the max size? Shouldn't we just abort this backwards seek and keep the seek window for some other backwards seek?

@jamesbornholt (Member, Author) replied:

This isn't the seek case, this is the "store new data after it's been read" case. So we can't keep the old parts around as the new part is further into the object — keeping the old ones but not the new ones would put a gap in the object.
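The behavior under discussion can be sketched in a simplified form. This is a hypothetical illustration matching the doc comment above ("drop any parts necessary to fit the new part within the maximum size"), not the actual `SeekWindow` implementation; the field names here are invented.

```rust
use std::collections::VecDeque;

/// Simplified sketch of a backwards-seek window. Front = most recently
/// stored bytes (closest to the current read offset).
struct SeekWindow {
    parts: VecDeque<Vec<u8>>,
    max_size: usize,
    current_size: usize,
}

impl SeekWindow {
    /// Add a new part to the front, evicting old parts from the back as needed.
    fn push(&mut self, part: Vec<u8>) {
        if part.len() > self.max_size {
            // The new part alone exceeds the window. Keeping the old parts but
            // not this one would leave a gap in the object, so drop everything.
            self.parts.clear();
            self.current_size = 0;
            return;
        }
        // Evict the oldest (furthest-back) parts until the new part fits.
        while self.current_size + part.len() > self.max_size {
            let evicted = self.parts.pop_back().expect("window size accounting");
            self.current_size -= evicted.len();
        }
        self.current_size += part.len();
        self.parts.push_front(part);
    }
}
```

The key invariant is that the window always holds a contiguous suffix of what has been read, which is what makes the "fake request" replay on a backwards seek possible.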

// is at most 256KiB backwards and then 512KiB forwards. For forwards seeks, we're also
// making a guess about where the optimal cut-off point is before it would be faster to
// just start a new request instead.
max_forward_seek_distance: 16 * 1024 * 1024,
A contributor commented:

How were these maximum values decided? I'm curious how we estimate the cut-off point.

@jamesbornholt (Member, Author) replied:

Total guess.

  current_task: Option<RequestTask<TaskError<Client>>>,
  // Currently we only ever spawn at most one future task (see [spawn_next_request])
- future_tasks: Arc<RwLock<VecDeque<RequestTask<TaskError<Client>>>>>,
+ future_tasks: VecDeque<RequestTask<TaskError<Client>>>,
A contributor commented:

Why can we remove the read-write lock on the request task queue now?

@jamesbornholt (Member, Author) replied:

We actually never needed it — it was never being shared. Just a leftover of an old design, I think.

@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 17, 2023 18:12 — with GitHub Actions Inactive
@passaro passaro added this pull request to the merge queue Oct 18, 2023
Merged via the queue into awslabs:main with commit f58dbc5 Oct 18, 2023
Labels: performance (PRs to run benchmarks on)
Closes: Readahead reordering causes prefetcher resets (#488)
4 participants