Implement disk-based DataCache with no eviction (#593)
* Implement disk-based DataCache with no checksums or eviction

Signed-off-by: Daniel Carl Jones <[email protected]>

* Fix typos

Signed-off-by: Daniel Carl Jones <[email protected]>

* Replace Base64URL encoding with Base64URLUnpadded encoding for data cache

Signed-off-by: Daniel Carl Jones <[email protected]>

* Ensure cached indices are sorted in DiskDataCache

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add trace message when creating block in cache

Signed-off-by: Daniel Carl Jones <[email protected]>

* WIP: Add checksums to on-disk cache

Signed-off-by: Daniel Carl Jones <[email protected]>

* Remove cached_block_indices implementation on DiskDataCache

Signed-off-by: Daniel Carl Jones <[email protected]>

* Move version identifier to constant

Signed-off-by: Daniel Carl Jones <[email protected]>

* Replace SerializableCrc32c with u32

Signed-off-by: Daniel Carl Jones <[email protected]>

* Update DataBlock::new(..) to return Result

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add verification of block metadata to unpack after reading

Signed-off-by: Daniel Carl Jones <[email protected]>

* Replace Base64 encoding with SHA256 hash

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add TODO to split directories into sub-directories to avoid hitting any FS-specific max number of dir entries

Signed-off-by: Daniel Carl Jones <[email protected]>

* Remove intermediate buffers when (de)serializing DataBlock with bincode

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add cache version identifier to the start of blocks written to disk

Signed-off-by: Daniel Carl Jones <[email protected]>

* Fix comment on ETag::into_inner

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add rustdoc to DataBlock::new

Signed-off-by: Daniel Carl Jones <[email protected]>

* Fix typo in rustdoc for DataBlock::data

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add expected version to data block read error message

Signed-off-by: Daniel Carl Jones <[email protected]>

* Split DataBlock header fields into BlockHeader

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add checksum validation on on-disk cache DataBlock header contents

Signed-off-by: Daniel Carl Jones <[email protected]>

* Remove outdated TODO

Signed-off-by: Daniel Carl Jones <[email protected]>

* Add test for detecting when DataBlock requires version bump

Signed-off-by: Daniel Carl Jones <[email protected]>

* Refactor errors for DataBlock

Signed-off-by: Daniel Carl Jones <[email protected]>

* Rename DataBlock to DiskBlock

Signed-off-by: Daniel Carl Jones <[email protected]>

---------

Signed-off-by: Daniel Carl Jones <[email protected]>
dannycjones authored Nov 3, 2023
1 parent 404ba9c commit 9a4cfd8
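The commit message describes three safeguards for blocks written to disk: a cache version identifier at the start of each block, a checksum over the header contents, and verification of the block metadata after reading. The `disk_data_cache.rs` diff is not rendered on this page, so the following is only a minimal sketch of how such checks could fit together. The byte layout, the `"V1"` identifier, the `BlockError` variants and the `pack`/`unpack` helpers are all assumptions made for illustration (the actual change serializes with `bincode`), and a small bitwise CRC32C stands in for the `crc32c` crate to keep the example free of dependencies.

```rust
use std::convert::TryInto;

// Assumed version identifier; the real constant is not shown on this page.
const CACHE_VERSION: &[u8] = b"V1";

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum BlockError {
    Truncated,
    VersionMismatch,
    HeaderChecksumMismatch,
    WrongBlock,
    DataChecksumMismatch,
}

/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Assumed layout:
/// version | block_idx (u64 LE) | etag_len (u32 LE) | etag | data_crc | header_crc | data
fn pack(etag: &str, block_idx: u64, data: &[u8]) -> Vec<u8> {
    let mut header = Vec::new();
    header.extend_from_slice(&block_idx.to_le_bytes());
    header.extend_from_slice(&(etag.len() as u32).to_le_bytes());
    header.extend_from_slice(etag.as_bytes());
    header.extend_from_slice(&crc32c(data).to_le_bytes());

    let mut out = CACHE_VERSION.to_vec();
    out.extend_from_slice(&header);
    out.extend_from_slice(&crc32c(&header).to_le_bytes());
    out.extend_from_slice(data);
    out
}

/// Split `n` bytes off the front of `buf`, or report truncation.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], BlockError> {
    let slice: &'a [u8] = *buf;
    if slice.len() < n {
        return Err(BlockError::Truncated);
    }
    let (head, tail) = slice.split_at(n);
    *buf = tail;
    Ok(head)
}

fn read_u32(buf: &mut &[u8]) -> Result<u32, BlockError> {
    Ok(u32::from_le_bytes(take(buf, 4)?.try_into().unwrap()))
}

/// Check the version first, then the header checksum, then that the header
/// describes the block the caller asked for, and finally the data checksum.
fn unpack(mut buf: &[u8], etag: &str, block_idx: u64) -> Result<Vec<u8>, BlockError> {
    if take(&mut buf, CACHE_VERSION.len())? != CACHE_VERSION {
        return Err(BlockError::VersionMismatch);
    }
    let header_start = buf;
    let stored_idx = u64::from_le_bytes(take(&mut buf, 8)?.try_into().unwrap());
    let etag_len = read_u32(&mut buf)? as usize;
    let stored_etag = take(&mut buf, etag_len)?;
    let data_crc = read_u32(&mut buf)?;
    let header_len = header_start.len() - buf.len();
    let header_crc = read_u32(&mut buf)?;

    if crc32c(&header_start[..header_len]) != header_crc {
        return Err(BlockError::HeaderChecksumMismatch);
    }
    if stored_idx != block_idx || stored_etag != etag.as_bytes() {
        return Err(BlockError::WrongBlock);
    }
    if crc32c(buf) != data_crc {
        return Err(BlockError::DataChecksumMismatch);
    }
    Ok(buf.to_vec())
}

fn main() {
    let packed = pack("etag-1", 3, b"hello");
    println!("{:?}", unpack(&packed, "etag-1", 3));
    println!("{:?}", unpack(&packed, "etag-2", 3));
}
```

Putting the version identifier outside the checksummed header means a reader built against a different format can reject the file before trying to interpret any other field, which is one plausible reason for writing it first.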
Showing 6 changed files with 492 additions and 5 deletions.
14 changes: 10 additions & 4 deletions Cargo.lock


5 changes: 5 additions & 0 deletions mountpoint-s3-client/src/object_client.rs
@@ -31,6 +31,11 @@ impl ETag {
         &self.etag
     }

+    /// Unpack the [String] contained by the [ETag] wrapper
+    pub fn into_inner(self) -> String {
+        self.etag
+    }
+
     /// Creating default etag for tests
     #[doc(hidden)]
     pub fn for_tests() -> Self {
6 changes: 5 additions & 1 deletion mountpoint-s3/Cargo.toml
@@ -14,7 +14,7 @@ anyhow = { version = "1.0.64", features = ["backtrace"] }
 async-channel = "1.8.0"
 async-lock = "2.6.0"
 async-trait = "0.1.57"
-bytes = "1.2.1"
+bytes = { version = "1.2.1", features = ["serde"] }
 clap = { version = "4.1.9", features = ["derive"] }
 crc32c = "0.6.3"
 ctrlc = { version = "3.2.3", features = ["termination"] }
@@ -36,6 +36,10 @@ nix = "0.26.2"
 time = { version = "0.3.17", features = ["macros", "formatting"] }
 const_format = "0.2.30"
 serde_json = "1.0.95"
+serde = { version = "1.0.190", features = ["derive"] }
+bincode = "1.3.3"
+sha2 = "0.10.6"
+hex = "0.4.3"

 [target.'cfg(target_os = "linux")'.dependencies]
 procfs = { version = "0.15.1", default-features = false }
10 changes: 10 additions & 0 deletions mountpoint-s3/src/checksums.rs
@@ -151,6 +151,16 @@ impl ChecksummedBytes {
         }
         Ok(())
     }
+
+    /// Provide the underlying bytes and the associated checksum,
+    /// which may be recalculated if the checksum covers a larger slice than the current slice.
+    /// Validation may or may not be triggered, and **bytes or checksum may be corrupt** even if result returns [Ok].
+    ///
+    /// If you are only interested in the underlying bytes, **you should use `into_bytes()`**.
+    pub fn into_inner(self) -> Result<(Bytes, Crc32c), IntegrityError> {
+        self.shrink_to_fit()?;
+        Ok((self.curr_slice, self.checksum))
+    }
 }

 impl Default for ChecksummedBytes {
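The rustdoc on `ChecksummedBytes::into_inner` warns that validation "may or may not be triggered". The sketch below illustrates one way such behaviour can arise; it is not the real `ChecksummedBytes`. The `Checksummed` struct, its fields, its `slice` method and the unit-struct `IntegrityError` are hypothetical, and it assumes that validation only happens when the stored checksum covers more bytes than the current slice and so has to be recomputed.

```rust
use std::ops::Range;

/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct IntegrityError;

/// Hypothetical stand-in: `checksum` always covers all of `buffer`,
/// while `range` selects the slice the caller currently sees.
struct Checksummed {
    buffer: Vec<u8>,
    range: Range<usize>,
    checksum: u32,
}

impl Checksummed {
    fn new(buffer: Vec<u8>) -> Self {
        let checksum = crc32c(&buffer);
        let range = 0..buffer.len();
        Self { buffer, range, checksum }
    }

    /// Narrow the visible slice without touching the stored checksum.
    fn slice(mut self, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= self.range.len());
        let base = self.range.start;
        self.range = base + range.start..base + range.end;
        self
    }

    /// Return the visible bytes and a checksum that covers exactly those bytes.
    fn into_inner(self) -> Result<(Vec<u8>, u32), IntegrityError> {
        if self.range == (0..self.buffer.len()) {
            // The checksum already covers exactly the slice, so it is handed
            // over as-is: nothing is validated on this path.
            return Ok((self.buffer, self.checksum));
        }
        // The checksum covers more than the slice. Validate the whole buffer
        // before trusting it, then recompute the checksum over the slice.
        if crc32c(&self.buffer) != self.checksum {
            return Err(IntegrityError);
        }
        let bytes = self.buffer[self.range.clone()].to_vec();
        let checksum = crc32c(&bytes);
        Ok((bytes, checksum))
    }
}

fn main() {
    let block = Checksummed::new(b"hello world".to_vec()).slice(6..11);
    let (bytes, checksum) = block.into_inner().expect("buffer is intact");
    println!("{:?} {:#010x}", String::from_utf8_lossy(&bytes), checksum);
}
```

In this sketch a caller that needs the bytes to be trustworthy cannot rely on `into_inner` alone, because the full-range path returns without checking anything. That is consistent with the rustdoc's advice to use `into_bytes()` when only the bytes are of interest.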
1 change: 1 addition & 0 deletions mountpoint-s3/src/data_cache.rs
@@ -4,6 +4,7 @@
 //! reducing both the number of requests as well as the latency for the reads.
 //! Ultimately, this means reduced cost in terms of S3 billing as well as compute time.
+pub mod disk_data_cache;
 pub mod in_memory_data_cache;

 use std::ops::RangeBounds;