
Allow seeking within a prefetch stream #556

Merged: 3 commits, Oct 18, 2023

Conversation

@jamesbornholt (Member) commented Oct 13, 2023

Description of change

This change addresses two problems:

  1. Linux asynchronous readahead confuses our prefetcher by sometimes making the stream appear to go backwards, even though the customer is actually just reading sequentially (Readahead reordering causes prefetcher resets #488). The problem is that with parallel FUSE threads, two asynchronous read operations can arrive at the prefetcher out of order.
  2. Request patterns that seek forwards through an object (say, reading a byte at every 1MB interval) currently don't benefit from prefetching at all, even though it would be dramatically faster and cheaper to stream the entire object.

To solve these problems, this change allows the prefetcher to tolerate a little bit of seeking in both directions.

  • For forwards seeking, when we see a seek of an acceptable distance, we fast-forward through the stream to the desired target offset, ignoring the skipped bytes (except for later use in backwards seeking, see below).
  • For backwards seeking, we keep around a little bit of previously read data (or data skipped by forwards seeking) and can reload it in the event that a seek goes backwards. We do this by creating a fake new request containing the rewound bytes, so that the existing read logic will pick them up.
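The two seek directions described above can be sketched as a single classification step. This is a hypothetical, simplified illustration of the idea, not the actual prefetcher in mountpoint-s3/src/prefetch.rs; the type and field names (`Prefetcher`, `SeekOutcome`, `backward_window`, etc.) are invented for clarity.

```rust
/// Simplified sketch of the seek-tolerance decision. Assumed types, not the real API.
struct Prefetcher {
    /// Offset the stream would reach if the reader continued sequentially.
    next_sequential_offset: u64,
    /// Maximum gap we will fast-forward through rather than starting a new request.
    max_forward_seek_distance: u64,
    /// Recently read (or fast-forwarded) bytes kept around for backwards seeks.
    backward_window: Vec<u8>,
}

enum SeekOutcome {
    /// Read continues exactly where the stream left off.
    Sequential,
    /// Small forward seek: stream through and discard `skip` bytes.
    FastForward { skip: u64 },
    /// Small backward seek: serve `rewind` bytes from the window via a fake request.
    RewindFromWindow { rewind: u64 },
    /// Seek is too far in either direction: tear down and start a new request.
    ResetStream,
}

impl Prefetcher {
    fn classify_read(&self, offset: u64) -> SeekOutcome {
        if offset == self.next_sequential_offset {
            SeekOutcome::Sequential
        } else if offset > self.next_sequential_offset {
            let skip = offset - self.next_sequential_offset;
            if skip <= self.max_forward_seek_distance {
                SeekOutcome::FastForward { skip }
            } else {
                SeekOutcome::ResetStream
            }
        } else {
            let rewind = self.next_sequential_offset - offset;
            if rewind <= self.backward_window.len() as u64 {
                SeekOutcome::RewindFromWindow { rewind }
            } else {
                SeekOutcome::ResetStream
            }
        }
    }
}
```

In this sketch the backwards window doubles as the bound on backwards seek distance: a rewind is only possible if the requested bytes are still held in the window.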

These seek mechanisms are guarded by two new configuration options for maximum forwards and backwards seek distance. Forwards seek distance trades off streaming unneeded bytes from S3 against the latency of starting a new request. Backwards seek distance is a memory-usage trade-off. In both cases I chose fairly arbitrary numbers, except that the buffers are big enough to tolerate Linux async readahead (which manifests as 256KiB backwards then 512KiB forwards).

I tested the effectiveness of this change in two ways:

  • To check we fixed Readahead reordering causes prefetcher resets (#488), I tried some sequential reads. They're no longer intermittently slow, and the metrics confirm the new seek logic is being triggered as we expect.
  • To check that forwards seeking works well, I wrote a benchmark that reads 1024 bytes at every 1MiB interval in an object. That benchmark went from ~25s without this change to ~0.7s with this change, for a 2GiB file.
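The forwards-seek benchmark described above (1024 bytes at every 1 MiB offset) amounts to a simple strided-read loop. This is a hypothetical reconstruction of that access pattern against a Mountpoint-mounted file, not the actual benchmark code; `strided_read` and its parameters are invented names.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Read 1024 bytes at every 1 MiB offset of `path`, returning total bytes read.
/// Each iteration is a ~1 MiB forward seek, well within max_forward_seek_distance,
/// so the prefetcher can fast-forward instead of restarting its S3 request.
fn strided_read(path: &str, file_size: u64) -> std::io::Result<u64> {
    const STRIDE: u64 = 1024 * 1024; // 1 MiB
    let mut file = File::open(path)?;
    let mut buf = [0u8; 1024];
    let mut total = 0u64;
    let mut offset = 0u64;
    while offset + buf.len() as u64 <= file_size {
        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(&mut buf)?;
        total += buf.len() as u64;
        offset += STRIDE;
    }
    Ok(total)
}
```

Without seek tolerance, every iteration of this loop forces the prefetcher to abandon its in-flight request and start a new one, which explains the ~25s vs ~0.7s gap reported above.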

Relevant issues: #488

Does this change impact existing behavior?

No, this is a bug fix that improves performance.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 13, 2023 03:42 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt added the performance PRs to run benchmarks on label Oct 13, 2023
@passaro (Contributor) left a comment:

Nice solution! Just a few minor comments.

Review threads on mountpoint-s3/src/prefetch.rs (all resolved).
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 13, 2023 15:18 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt marked this pull request as ready for review October 16, 2023 02:48
@dannycjones (Contributor) previously approved these changes Oct 16, 2023 and commented:

Nice, LGTM pending any feedback from @passaro.

@passaro (Contributor) previously approved these changes Oct 16, 2023 and commented:

LGTM

@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 16, 2023 16:58 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 16, 2023 22:05 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 16, 2023 22:05 — with GitHub Actions Inactive
The old test was hiding a bug because it used a hard-coded part size of
8MB regardless of what the client used. awslabs#552 changed that, and now this
test often runs out of memory because it degrades to doing 1-byte
requests. I don't think it's worth reworking the logic, since it takes a
weird config to get there, so just fix the test.

Signed-off-by: James Bornholt <[email protected]>
@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 17, 2023 03:08 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 17, 2023 03:08 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR integration tests October 17, 2023 14:19 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 17, 2023 14:20 — with GitHub Actions Inactive
/// Add a new part to the front of the window, and drop any parts necessary to fit the new part
/// within the maximum size.
pub fn push(&mut self, part: Part) {
if part.len() > self.max_size {
A contributor commented:

Why are we dropping all the parts when the new part's length is greater than the max size? Shouldn't we just abort this backwards seek and keep the seek window for some other backwards seek?

@jamesbornholt (Member, Author) replied:

This isn't the seek case, this is the "store new data after it's been read" case. So we can't keep the old parts around as the new part is further into the object — keeping the old ones but not the new ones would put a gap in the object.
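The behavior under discussion can be sketched in a simplified form. This is a hypothetical illustration matching the doc comment above ("drop any parts necessary to fit the new part within the maximum size"), not the actual `SeekWindow` implementation; the field names here are invented.

```rust
use std::collections::VecDeque;

/// Simplified sketch of a backwards-seek window. Front = most recently
/// stored bytes (closest to the current read offset).
struct SeekWindow {
    parts: VecDeque<Vec<u8>>,
    max_size: usize,
    current_size: usize,
}

impl SeekWindow {
    /// Add a new part to the front, evicting old parts from the back as needed.
    fn push(&mut self, part: Vec<u8>) {
        if part.len() > self.max_size {
            // The new part alone exceeds the window. Keeping the old parts but
            // not this one would leave a gap in the object, so drop everything.
            self.parts.clear();
            self.current_size = 0;
            return;
        }
        // Evict the oldest (furthest-back) parts until the new part fits.
        while self.current_size + part.len() > self.max_size {
            let evicted = self.parts.pop_back().expect("window size accounting");
            self.current_size -= evicted.len();
        }
        self.current_size += part.len();
        self.parts.push_front(part);
    }
}
```

The key invariant is that the window always holds a contiguous suffix of what has been read, which is what makes the "fake request" replay on a backwards seek possible.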

// is at most 256KiB backwards and then 512KiB forwards. For forwards seeks, we're also
// making a guess about where the optimal cut-off point is before it would be faster to
// just start a new request instead.
max_forward_seek_distance: 16 * 1024 * 1024,
A contributor commented:

How were these maximum values decided? I'm curious how we estimate the cut-off point.

@jamesbornholt (Member, Author) replied:

Total guess.

  current_task: Option<RequestTask<TaskError<Client>>>,
  // Currently we only ever spawn at most one future task (see [spawn_next_request])
- future_tasks: Arc<RwLock<VecDeque<RequestTask<TaskError<Client>>>>>,
+ future_tasks: VecDeque<RequestTask<TaskError<Client>>>,
A contributor commented:

Why can we remove the read-write lock on the request task queue now?

@jamesbornholt (Member, Author) replied:

We actually never needed it — it was never being shared. Just a leftover of an old design, I think.

@jamesbornholt jamesbornholt temporarily deployed to PR benchmarks October 17, 2023 18:12 — with GitHub Actions Inactive
@passaro passaro added this pull request to the merge queue Oct 18, 2023
Merged via the queue into awslabs:main with commit f58dbc5 Oct 18, 2023
Labels: performance (PRs to run benchmarks on)
Closes: Readahead reordering causes prefetcher resets (#488)
4 participants