115-offloadable-bloom-filter #121

Merged 57 commits on Nov 10, 2021
Changes from 1 commit
Commits
224f401
Restructuring: separate index with it's disk state and index subtree …
Justarone Mar 28, 2021
07fad73
Merge master into 'b+-tree-on-disk-index'
Justarone Mar 28, 2021
2dec254
Fix minor issues
Justarone Mar 28, 2021
aaf13c7
B+ tree work in progress
Justarone Mar 31, 2021
5bc2daa
Now it works with trees with height higher than 1
Justarone Apr 4, 2021
f3c3767
Add one node case processing
Justarone Apr 4, 2021
06764c5
Fix bug with pick of first elem in node
Justarone Apr 4, 2021
8784e33
Remove panic in bptree module
Justarone Apr 4, 2021
c18683a
Write test, restructure and find bug
Justarone Apr 4, 2021
7d4de97
Fix bug and rewrite bad tests
Justarone Apr 4, 2021
ddc86fe
Fix namings and warning
Justarone Apr 4, 2021
3c6906b
Fix last recs bug
Justarone Apr 4, 2021
c846b0d
Fix review issues
Justarone Apr 5, 2021
ea0844a
Merge master
Justarone Apr 5, 2021
6fde0e4
Implement builder pattern for bptree serialize process
Justarone Apr 6, 2021
96f6626
Change serializer's dynamic check on static one
Justarone Apr 10, 2021
5cd2199
IndexStruct doesn't use IndexHeader now
Justarone Apr 14, 2021
2a517bb
Fix duplicated code
Justarone Apr 14, 2021
c637870
Make file index independent from filter
Justarone Apr 14, 2021
e156056
Start generalization (serializer is generalized)
Justarone Apr 18, 2021
f4fa7e2
Remove redundant bound
Justarone Apr 18, 2021
47443b4
Revert "Remove redundant bound"
Justarone Apr 28, 2021
3487f15
Revert "Start generalization (serializer is generalized)" (there
Justarone Apr 28, 2021
b907134
Fix review issues
Justarone Apr 28, 2021
4645abc
Merge branch 'master' into b+-tree-on-disk-index
Justarone Apr 28, 2021
9cc305c
Add benchmarks for indices
Justarone May 5, 2021
ae6304f
Change benchmarks params
Justarone May 6, 2021
7b8c7ce
Fix error when last node gets only 1 key
Justarone May 8, 2021
c2007dc
Rewrite elems distribution per layer logic
Justarone May 9, 2021
27158db
Add root node in serialized form in RAM
Justarone May 11, 2021
8c82bab
Add search in serialized node (seems like deserialization is expensive)
Justarone May 11, 2021
2bd77b0
Remove deserialization from leaf nodes (that's also expensive)
Justarone May 12, 2021
e47b9d9
Remove vector creation operation and change distribution strategy a b…
Justarone May 15, 2021
24e9211
Change keys distribution in leaf node
Justarone May 15, 2021
83dc3de
Ordered headers are used as leaves
Justarone May 19, 2021
3c4e5d5
Remove redundant read in file on the left side of leaf node and push …
Justarone May 19, 2021
bdecc68
Make get_any return the latest header instead of first one (to enable…
Justarone May 19, 2021
1c2b1cb
Reverse tree in file and move headers after tree (now during search b…
Justarone May 23, 2021
c4cc6d9
Revert "Make get_any return the latest header instead of first one (t…
Justarone May 25, 2021
80c3a0a
Merge branch 'b+-tree-headers-as-leaves' into b+-tree-on-disk-index
Justarone May 25, 2021
ea2c3df
Remove leaves stage because now it's redundant (headers are used as l…
Justarone May 25, 2021
18d61c2
Fix description of b+-tree indices
Justarone May 25, 2021
f0aff14
Bloom filter offload
vovac12 Aug 12, 2021
98f339d
Fix
vovac12 Aug 13, 2021
43fd3fa
Shorter default impl
vovac12 Aug 15, 2021
da1481c
Platform agnosting bloom filter buffer
vovac12 Aug 16, 2021
c12d590
Add BloomDataProvider trait
vovac12 Aug 17, 2021
1267830
Add method to get allocated memory
vovac12 Aug 17, 2021
02b852e
Merge commit 'ca21f33604bb852861d0aeb75f0203bba481761a' of github.com…
vovac12 Nov 1, 2021
c83ca1a
Merge branch 'master' of github.com:qoollo/pearl into 115-offloadable…
vovac12 Nov 1, 2021
0745db3
Fix errors and add unit test
vovac12 Nov 2, 2021
996ad02
Update CHANGELOG.md
vovac12 Nov 2, 2021
d864aef
Fix review issues
vovac12 Nov 3, 2021
0af2ff6
Fix review issues
vovac12 Nov 8, 2021
84f2c20
Fix review issues
vovac12 Nov 9, 2021
a032ed6
Merge branch 'master' into 115-offloadable-bloom-filter
piakushin Nov 9, 2021
9c2bca7
Merge branch '115-offloadable-bloom-filter' of github.com:qoollo/pear…
vovac12 Nov 9, 2021
13 changes: 13 additions & 0 deletions src/blob/core.rs
@@ -349,6 +349,19 @@ impl Blob {
}
}

pub(crate) fn check_filters_in_memory(&self, key: &[u8]) -> bool {
trace!("check filters (range and bloom)");
if let FilterResult::NotContains = self.index.check_filters_in_memory(key) {
false
} else {
true
}
}

pub(crate) fn is_filter_offloaded(&self) -> bool {
self.index.is_filter_offloaded()
}

pub(crate) fn index_memory(&self) -> usize {
self.index.memory_used()
}
19 changes: 17 additions & 2 deletions src/blob/index/bloom.rs
@@ -5,6 +5,7 @@ use bitvec::order::Lsb0;
#[derive(Debug, Clone)]
pub(crate) struct Bloom {
inner: Option<BitVec<Lsb0, u64>>,
offset_in_file: Option<u64>,
bits_count: usize,
hashers: Vec<AHasher>,
config: Config,
@@ -17,6 +18,7 @@ impl Default for Bloom {
bits_count: 0,
hashers: vec![],
config: Default::default(),
offset_in_file: None,
}
}
}
@@ -109,17 +111,28 @@ impl Bloom {
hashers: Self::hashers(config.hashers_count),
config,
bits_count,
offset_in_file: None,
}
}

pub fn clear(&mut self) {
self.inner = Some(bitvec![Lsb0, u64; 0; self.bits_count]);
self.offset_in_file = None;
}

pub fn is_offloaded(&self) -> bool {
self.inner.is_none()
}

pub fn offload_from_memory(&mut self) {
self.inner = None;
}

pub fn set_offset_in_file(&mut self, offset: u64) {
self.offset_in_file =
Some(offset + self.buffer_start_position().expect("Should not fail") as u64);
}

pub fn hashers(k: usize) -> Vec<AHasher> {
trace!("@TODO create configurable hashers???");
(0..k)
@@ -147,6 +160,7 @@ impl Bloom {
config: save.config,
inner: Some(inner),
bits_count: save.bits_count,
offset_in_file: None,
}
}

@@ -209,7 +223,9 @@ impl Bloom {
if self.bits_count == 0 {
return Ok(false);
}
let start_pos = self.buffer_start_position()?;
let start_pos = self
.offset_in_file
.ok_or_else(|| anyhow::anyhow!("Offset should be set for in-file operations"))?;
for index in hashers.iter_mut().map(|hasher| {
hasher.write(item.as_ref());
hasher.finish() % self.bits_count as u64
@@ -241,7 +257,6 @@ impl Bloom {
#[async_trait::async_trait]
pub(crate) trait BloomDataProvider {
async fn read_byte(&self, index: u64) -> Result<u8>;
async fn read_all(&self) -> Result<Vec<u8>>;
}
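
To illustrate how an offloaded filter can answer a query through `read_byte` alone, here is a minimal synchronous sketch. `ByteProvider`, `InMemoryFile`, and `all_bits_set` are hypothetical names that do not appear in the PR; the real trait is async and returns `anyhow::Result`. The bit layout (bit `i` lives in byte `i / 8`, least significant bit first) is an assumption about how the `BitVec<Lsb0, u64>` buffer is serialized on a little-endian target.

```rust
/// Hypothetical synchronous stand-in for the async `BloomDataProvider` trait.
trait ByteProvider {
    fn read_byte(&self, index: u64) -> Result<u8, String>;
}

/// Test double: the whole "file" is a byte vector.
struct InMemoryFile(Vec<u8>);

impl ByteProvider for InMemoryFile {
    fn read_byte(&self, index: u64) -> Result<u8, String> {
        self.0
            .get(index as usize)
            .copied()
            .ok_or_else(|| format!("index {} is out of bounds", index))
    }
}

/// Probes the given bit positions of a filter buffer that starts at
/// `offset_in_file`. Returns `Ok(false)` as soon as one probed bit is zero.
/// In the PR the positions come from `hasher.finish() % bits_count`;
/// here they are passed in directly to keep the sketch small.
fn all_bits_set<P: ByteProvider>(
    provider: &P,
    offset_in_file: Option<u64>,
    bit_indices: &[u64],
) -> Result<bool, String> {
    // Mirrors the diff: in-file checks fail if the offset was never set.
    let start = offset_in_file
        .ok_or_else(|| "Offset should be set for in-file operations".to_string())?;
    for &bit in bit_indices {
        // Assumed layout: one byte read per probed bit, LSB-first inside the byte.
        let byte = provider.read_byte(start + bit / 8)?;
        if byte & (1u8 << (bit % 8)) == 0 {
            return Ok(false);
        }
    }
    Ok(true)
}
```

Because the offset passed to `read_byte` is already absolute, the provider does not need to know where the filter begins, which appears to be the reason `bloom_offset` could be dropped from `IndexStruct` in this change.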

mod tests {
48 changes: 24 additions & 24 deletions src/blob/index/core.rs
@@ -46,7 +46,6 @@ pub(crate) struct IndexStruct<FileIndex: FileIndexTrait> {
mem: Option<MemoryAttrs>,
range_filter: RangeFilter,
bloom_filter: Bloom,
bloom_offset: usize,
params: IndexParams,
inner: State<FileIndex>,
name: FileName,
@@ -81,7 +80,6 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
params,
bloom_filter: filter,
range_filter: RangeFilter::new(),
bloom_offset: 0,
inner: State::InMemory(BTreeMap::new()),
mem,
name,
@@ -114,6 +112,18 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
}
}

pub(crate) fn check_filters_in_memory(&self, key: &[u8]) -> FilterResult {
if !self.range_filter.contains(key) {
FilterResult::NotContains
} else if self.params.bloom_is_on
&& self.bloom_filter.contains_in_memory(key) == Some(false)
Review comment (Member): You can squash this condition with the previous one.

{
FilterResult::NotContains
} else {
FilterResult::NeedAdditionalCheck
}
}
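
One way the review suggestion above could be applied is to merge both `NotContains` branches into a single condition. This is a sketch only: the `FilterResult` enum below is a stand-in with just the two variants used here, and the filter lookups are passed in as plain values (`in_range` for `range_filter.contains(key)`, `bloom_in_memory` for `bloom_filter.contains_in_memory(key)`) so that the block compiles on its own.

```rust
/// Stand-in for the crate's filter outcome type (only the variants used here).
#[derive(Debug, PartialEq, Eq)]
enum FilterResult {
    NotContains,
    NeedAdditionalCheck,
}

/// `bloom_in_memory == None` models an offloaded filter that cannot
/// answer from memory, so the key still needs an additional check.
fn check_filters_in_memory(
    in_range: bool,
    bloom_is_on: bool,
    bloom_in_memory: Option<bool>,
) -> FilterResult {
    // Both negative outcomes are squashed into one condition.
    if !in_range || (bloom_is_on && bloom_in_memory == Some(false)) {
        FilterResult::NotContains
    } else {
        FilterResult::NeedAdditionalCheck
    }
}
```

With real method calls in place of the boolean arguments, `||` and `&&` short-circuit in the same order as the original `if` / `else if` chain, so the bloom filter would still be consulted only when the range filter passes.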

pub async fn check_bloom_key(&self, key: &[u8]) -> Result<Option<bool>> {
if self.params.bloom_is_on {
if let Some(result) = self.bloom_filter.contains_in_memory(key) {
@@ -126,6 +136,10 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
}
}

pub fn is_filter_offloaded(&self) -> bool {
self.bloom_filter.is_offloaded()
}

pub fn bloom_memory_allocated(&self) -> usize {
self.bloom_filter.memory_allocated()
}
@@ -142,14 +156,13 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
let findex = FileIndex::from_file(name.clone(), ioring.clone()).await?;
findex.validate().with_context(|| "Header is corrupt")?;
let meta_buf = findex.read_meta().await?;
let (bloom_filter, range_filter, bloom_offset) = Self::deserialize_filters(&meta_buf)?;
let (bloom_filter, range_filter) = Self::deserialize_filters(&meta_buf)?;
let params = IndexParams::new(config.bloom_config.is_some(), config.recreate_index_file);
trace!("index restored successfuly");
let index = Self {
inner: State::OnDisk(findex),
mem: None,
name,
bloom_offset,
bloom_filter,
range_filter,
params,
@@ -169,6 +182,7 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
}
debug!("blob index simple in memory headers {}", headers.len());
let (meta_buf, bloom_offset) = self.serialize_filters()?;
self.bloom_filter.set_offset_in_file(bloom_offset as u64);
let findex = FileIndex::from_records(
&self.name.to_path(),
self.ioring.clone(),
@@ -180,7 +194,6 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
let size = findex.file_size() as usize;
self.inner = State::OnDisk(findex);
self.mem = None;
self.bloom_offset = bloom_offset;
Ok(size)
} else {
Ok(0)
@@ -199,24 +212,24 @@ impl<FileIndex: FileIndexTrait> IndexStruct<FileIndex> {
Ok((buf, bloom_offset))
}

fn deserialize_filters(buf: &[u8]) -> Result<(Bloom, RangeFilter, usize)> {
fn deserialize_filters(buf: &[u8]) -> Result<(Bloom, RangeFilter)> {
let (range_size_buf, rest_buf) = buf.split_at(size_of::<u64>());
let range_size = deserialize(&range_size_buf)?;
let (range_buf, bloom_buf) = rest_buf.split_at(range_size);
let bloom = Bloom::from_raw(bloom_buf)?;
let mut bloom = Bloom::from_raw(bloom_buf)?;
bloom.set_offset_in_file((range_size + size_of::<u64>()) as u64);
let range = RangeFilter::from_raw(range_buf)?;
Ok((bloom, range, range_size + size_of::<u64>()))
Ok((bloom, range))
}
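
The meta buffer layout that `deserialize_filters` relies on can be sketched as follows: an 8-byte size prefix, then the range filter bytes, then the bloom filter bytes, so the bloom section starts at `size_of::<u64>() + range_size`. The function name `split_meta` is hypothetical, and decoding the prefix as a little-endian `u64` is an assumption about the serializer's encoding. Unlike the `split_at` calls in the diff, this sketch returns `None` instead of panicking on a short buffer.

```rust
use std::mem::size_of;

/// Splits a meta buffer into (range bytes, bloom bytes, bloom offset).
/// Assumed layout: [u64 range_size][range filter][bloom filter].
fn split_meta(buf: &[u8]) -> Option<(&[u8], &[u8], usize)> {
    if buf.len() < size_of::<u64>() {
        return None;
    }
    let (size_buf, rest) = buf.split_at(size_of::<u64>());
    // Assumption: the size prefix is a fixed-width little-endian u64.
    let mut raw = [0u8; 8];
    raw.copy_from_slice(size_buf);
    let range_size = u64::from_le_bytes(raw) as usize;
    if rest.len() < range_size {
        return None;
    }
    let (range_buf, bloom_buf) = rest.split_at(range_size);
    // Same arithmetic as the value passed to `set_offset_in_file` in the diff.
    Some((range_buf, bloom_buf, size_of::<u64>() + range_size))
}
```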

async fn load_in_memory(&mut self, findex: FileIndex) -> Result<()> {
let (record_headers, records_count) = findex.get_records_headers().await?;
self.mem = Some(compute_mem_attrs(&record_headers, records_count));
self.inner = State::InMemory(record_headers);
let meta_buf = findex.read_meta().await?;
let (bloom_filter, range_filter, bloom_offset) = Self::deserialize_filters(&meta_buf)?;
let (bloom_filter, range_filter) = Self::deserialize_filters(&meta_buf)?;
self.bloom_filter = bloom_filter;
self.range_filter = range_filter;
self.bloom_offset = bloom_offset;
Ok(())
}

@@ -357,20 +370,7 @@ pub(crate) trait FileIndexTrait: Sized + Send + Sync {
impl<FileIndex: FileIndexTrait> BloomDataProvider for IndexStruct<FileIndex> {
async fn read_byte(&self, index: u64) -> Result<u8> {
match &self.inner {
State::OnDisk(findex) => findex.read_meta_at(self.bloom_offset as u64 + index).await,
_ => Err(anyhow::anyhow!("Can't read from in-memory index")),
}
}

async fn read_all(&self) -> Result<Vec<u8>> {
match &self.inner {
State::OnDisk(findex) => {
let meta = findex.read_meta().await?;
Ok(meta
.get(self.bloom_offset..)
.ok_or_else(|| anyhow::anyhow!("Incorrect bloom offset"))?
.to_vec())
}
State::OnDisk(findex) => findex.read_meta_at(index).await,
_ => Err(anyhow::anyhow!("Can't read from in-memory index")),
}
}
22 changes: 17 additions & 5 deletions src/storage/core.rs
@@ -795,12 +795,24 @@ impl<K: Key> Storage<K> {
.read()
.await
.iter()
.map(|blob| blob.check_filters(key.as_ref()))
.collect::<FuturesUnordered<_>>()
.any(|value| value)
.await;
.filter(|blob| !blob.is_filter_offloaded())
.any(|blob| blob.check_filters_in_memory(key.as_ref()));

Some(in_active || in_closed)
if !(in_active || in_closed) {
let in_closed_offloaded = inner
.blobs
.read()
.await
.iter()
.filter(|blob| blob.is_filter_offloaded())
.map(|blob| blob.check_filters(key.as_ref()))
.collect::<FuturesUnordered<_>>()
.any(|value| value)
.await;
Some(in_closed_offloaded)
} else {
Some(true)
}
}
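
The two-phase lookup in this hunk (cheap in-memory filters first, offloaded filters only if nothing matched) can be sketched synchronously as below. `BlobStub`, `key_may_exist`, and the `disk_reads` counter are hypothetical and exist only to make the ordering observable; the real code is async and runs the offloaded checks through `FuturesUnordered`.

```rust
/// Hypothetical stand-in for a closed blob.
struct BlobStub {
    offloaded: bool,
    /// Answer the blob's filters would give for the key under test.
    may_contain: bool,
}

impl BlobStub {
    fn is_filter_offloaded(&self) -> bool {
        self.offloaded
    }

    /// Cheap check, meaningful only while the filter is in memory.
    fn check_filters_in_memory(&self) -> bool {
        self.may_contain
    }

    /// Stand-in for the async check that has to read from disk.
    fn check_filters(&self, disk_reads: &mut usize) -> bool {
        *disk_reads += 1;
        self.may_contain
    }
}

fn key_may_exist(in_active: bool, closed: &[BlobStub], disk_reads: &mut usize) -> bool {
    // Phase 1: blobs whose filters are still in memory.
    let in_closed = closed
        .iter()
        .filter(|blob| !blob.is_filter_offloaded())
        .any(|blob| blob.check_filters_in_memory());
    if in_active || in_closed {
        return true;
    }
    // Phase 2: only now pay for the offloaded filters.
    closed
        .iter()
        .filter(|blob| blob.is_filter_offloaded())
        .any(|blob| blob.check_filters(disk_reads))
}
```

The point of the ordering is that a positive answer from any in-memory filter avoids touching the disk at all; offloaded filters are read only when every in-memory filter reports that the key is absent.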

/// Offload bloom filters for closed blobs