
Stream large blobs to remote cache directly from local cache file #19711

Merged
huonw merged 20 commits into main from huonw/19049-stream-files on Sep 12, 2023

Conversation

@huonw (Contributor) commented Aug 30, 2023

This (hopefully) optimises storing large blobs to a remote cache, by streaming them directly from the file stored on disk in the "FSDB".

This builds on the FSDB local store work (#18153), relying on large objects being stored as an immutable file on disk, in the cache managed by Pants.

This is an optimisation in several ways:

  • Cutting out an extra temporary file:
    • Previously Store::store_large_blob_remote would load the whole blob from the local store and then write it to a temporary file. This was appropriate for LMDB-backed blobs.
    • With the new FSDB, there's already a file on disk that can be used, so the temporary file and its creation/writing overhead can be eliminated.
  • Reducing sync IO in async tasks, due to mmap:
    • Previously ByteStore::store_buffered would take that temporary file and mmap it, to be able to slice into Bytes more efficiently... except this is secretly blocking/sync IO happening within async tasks (AIUI: when accessing an mmap'd byte that's only on disk, not yet in memory, the whole OS thread is blocked/descheduled while the OS pulls the relevant part of the file into memory, i.e. tokio can't run another task on that thread).
    • This new approach uses normal tokio async IO mechanisms to read the file, and thus hopefully has higher concurrency (see the sketch at the end of this description).
    • (This also eliminates the unmaintained memmap dependency.)

I haven't benchmarked this though.

My main motivation for this is firming up the provider API before adding new byte store providers, for #11149. This also resolves some TODOs and even eliminates some unsafe, yay!

The commits are individually reviewable.

Fixes #19049, fixes #14341 (memmap removed), closes #17234 (solves the same problem but with an approach that wasn't possible at the time).
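
For concreteness, here's a rough sketch of the shape of the mmap → tokio change. This is illustrative only: the chunk size, the callback, and the function name are made up, not the actual ByteStore/provider API.

```rust
use bytes::BytesMut;
use tokio::fs::File;
use tokio::io::AsyncReadExt;

/// Illustrative only: read an FSDB blob file in chunks via tokio's async
/// file IO, handing each chunk to some uploader callback. The chunk size,
/// callback, and function name are stand-ins, not the real Pants API.
async fn stream_blob_file(
    path: &std::path::Path,
    mut upload_chunk: impl FnMut(bytes::Bytes),
) -> std::io::Result<()> {
    const CHUNK_SIZE: usize = 64 * 1024;
    let mut file = File::open(path).await?;
    loop {
        let mut buf = BytesMut::with_capacity(CHUNK_SIZE);
        // Awaiting here yields the executor thread, unlike touching a cold
        // page of an mmap'd region, which blocks the whole OS thread.
        let n = file.read_buf(&mut buf).await?;
        if n == 0 {
            break; // EOF
        }
        upload_chunk(buf.freeze());
    }
    Ok(())
}
```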

Comment on lines +813 to +814
None => {
Self::store_lmdb_blob_remote(local, remote_store.store, entry_type, digest)
@huonw (Contributor, Author):

As the comment in store_lmdb_blob_remote now notes, this code makes the assumption that anything in LMDB is small enough to load into memory. AIUI, LARGE_FILE_SIZE_LIMIT is 512 KB, i.e. these blobs should be at most that size.

Do you think that's reasonable?
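
For reference, the invariant being relied on looks roughly like this (the constant value mirrors the 512 KB figure above; the enum and dispatch function are illustrative, not the real Store code):

```rust
// Illustrative only: mirrors the 512 KB figure mentioned above; not the
// real Store code.
const LARGE_FILE_SIZE_LIMIT: usize = 512 * 1024;

enum BlobLocation {
    Lmdb, // small blobs: safe to load fully into memory
    Fsdb, // large blobs: stored as an immutable file on disk
}

fn blob_location(len: usize) -> BlobLocation {
    if len > LARGE_FILE_SIZE_LIMIT {
        BlobLocation::Fsdb
    } else {
        BlobLocation::Lmdb
    }
}
```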

@stuhood (Member):


Yea, I do think that that is reasonable.


/// Store the bytes in `bytes` into the remote store, as an optimisation of `store` when the bytes
/// are already in memory
async fn store_bytes(&self, digest: Digest, bytes: Bytes) -> Result<(), String>;
@huonw (Contributor, Author):


I think there's a fair argument that there's no need for store_bytes and everything can go through store, using Cursor::new(bytes) to create an appropriate AsyncRead from Bytes.

My thinking is that passing Bytes in directly saves some memory copies for the batch case, where that object can be splatted into store_bytes_batch and its batch upload request directly, without copying or slicing or anything (whereas using source would require reading it into a separate Bytes).

Maybe that optimisation is irrelevant when this code does network IO anyway, and it'd be better to just have this trait be store, load and list_missing_digests.

Thoughts?
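
For what it's worth, the Cursor approach would be tiny: Cursor<Bytes> already implements tokio's AsyncRead (and AsyncSeek), something like the following sketch (not the actual trait signature):

```rust
use std::io::Cursor;

use bytes::Bytes;
use tokio::io::{AsyncRead, AsyncSeek};

// Cursor<Bytes> already implements tokio's AsyncRead and AsyncSeek, so an
// in-memory blob could be fed to a `store(source: ...)`-style method without
// a separate `store_bytes`. (Sketch only; not the actual trait signature.)
fn bytes_as_source(bytes: Bytes) -> impl AsyncRead + AsyncSeek + Send + Unpin + 'static {
    Cursor::new(bytes)
}
```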

@stuhood (Member) commented Sep 1, 2023:


AFAICT, you're right that it is one fewer copy currently to pass in Bytes. As mentioned in the comment on store_lmdb_blob_remote though, we used to write in a streaming fashion while blocking a spawn_blocking task (with block_on): that meant that we were copying directly from a MMAP into a protobuf to use with gRPC.

It's possible that we could re-introduce that optimization at some point, which would make the streaming batch store_bytes API superior again. But at the same time, these are fairly small blobs, so the benefits of streaming are definitely reduced.

}

#[tokio::test]
async fn store_source_read_error_immediately() {
@huonw (Contributor, Author) commented Aug 30, 2023:


Most of these tests are just copied/adapted from the store_bytes ones below, except for this one and store_source_read_error_later which test the new/interesting code path: what happens if the AsyncReads fail.
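
For context, the "source fails" cases boil down to feeding the provider something like this made-up reader (not the one actually used in the tests):

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

use tokio::io::{AsyncRead, ReadBuf};

// A made-up reader (not the one in the tests) that fails on its first read,
// to show the shape of the "source errors immediately" case.
struct FailingReader;

impl AsyncRead for FailingReader {
    fn poll_read(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
        _buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        Poll::Ready(Err(io::Error::new(
            io::ErrorKind::Other,
            "simulated read failure",
        )))
    }
}
```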

Comment on lines 274 to 275
// an arbitrary source (e.g. file) might be small enough to write via the batch API, but we
// ignore that possibility for now
@huonw (Contributor, Author) commented Aug 30, 2023:


Is it worth considering this optimisation? This would mean pulling the entire (sufficiently-small) source into a Bytes to be able to call self.store_bytes_batch, similar to the store_bytes fn above.

@stuhood (Member) commented Sep 1, 2023:


It shouldn't be, due to the limitations on when FSDB is used. And right now that is the only source of data (we always capture locally and then upload).

@huonw (Contributor, Author):


Hm, just noting that the batch API size limit is 4194304 bytes (4 MiB), which is greater than the FSDB limit of 512 KiB, i.e. the REAPI code would be happy enough to upload some moderately-sized FSDB-sourced files via the batch API (and, if we supported uploading multiple files in a batch, we could upload up to 8 FSDB-sourced files in one batch).

I will thus leave this comment here, but not take any action on it for now.

@huonw changed the title from "Stream large blobs to remote cache directly from files on disk" to "Stream large blobs to remote cache directly from local cache file" on Aug 30, 2023
@huonw (Contributor, Author) commented Aug 31, 2023:

(While working on this, I noticed #19732. I'll fix that in an independent PR.)

@stuhood (Member) left a comment:


Thanks!


src/rust/engine/fs/store/src/remote/reapi.rs (outdated review comment, resolved)
if let Some(ref read_err) = *error_occurred.lock() {
// check if reading `source` locally hit an error: if so, propagate that error (there will
// likely be a remote error too, because our write will be too short, but the local error is
// the interesting root cause)
@stuhood (Member):


👍
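
The pattern in the hunk above can be sketched roughly as follows (names invented; the real code differs in details such as the mutex type): wrap the upload source so any local read error is recorded in a shared slot, letting the caller report it as the root cause instead of the secondary short-write error from the remote side.

```rust
use std::io;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};

use tokio::io::{AsyncRead, ReadBuf};

// Illustrative sketch only: record any local read error in a shared slot so
// the caller can surface it in preference to the remote's short-write error.
struct ErrorRecordingReader<R> {
    inner: R,
    error_occurred: Arc<Mutex<Option<io::Error>>>,
}

impl<R: AsyncRead + Unpin> AsyncRead for ErrorRecordingReader<R> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        match Pin::new(&mut self.inner).poll_read(cx, buf) {
            Poll::Ready(Err(e)) => {
                // Stash a copy of the error for the caller, then propagate it.
                *self.error_occurred.lock().unwrap() =
                    Some(io::Error::new(e.kind(), e.to_string()));
                Poll::Ready(Err(e))
            }
            other => other,
        }
    }
}
```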


@huonw (Contributor, Author) commented Sep 10, 2023:

@stuhood this has changed fairly significantly with #19737: getting retries to work required creating a new StoreSource_ trait to be able to use AsyncSeek, and adjusting the StoreSource type alias to ensure this can be used while 'static.
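
Roughly, the shape is something like the following (a reconstruction for illustration; the actual trait and alias in the PR may differ):

```rust
use tokio::io::{AsyncRead, AsyncSeek};

// Reconstruction for illustration; the actual trait and alias in the PR may
// differ. Retries need to rewind the source before re-uploading, so it must
// be seekable as well as readable, and usable as a 'static boxed trait
// object so it can be moved into retried futures.
trait SeekableSource: AsyncRead + AsyncSeek + Send + Unpin {}
impl<T: AsyncRead + AsyncSeek + Send + Unpin> SeekableSource for T {}

type StoreSource = Box<dyn SeekableSource + 'static>;
```

Before each retry attempt the provider can then seek the source back to the start (e.g. via AsyncSeekExt with SeekFrom::Start(0)) and re-stream it.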

@huonw requested a review from stuhood on September 10, 2023 01:54
@stuhood (Member) left a comment:


Thanks!

src/rust/engine/fs/store/src/remote.rs (outdated review comment, resolved)
src/rust/engine/fs/store/src/remote/reapi.rs (outdated review comment, resolved)
@huonw merged commit b625f09 into main on Sep 12, 2023
@huonw deleted the huonw/19049-stream-files branch on September 12, 2023 01:50
Development

Successfully merging this pull request may close these issues.

  • Upload large files directly from disk, without loading into memory
  • [internal, rust] memmap is unmaintained
2 participants