Skip to content

Commit

Permalink
Account Bloom/Ribbon filter construction memory in global memory limit (
Browse files Browse the repository at this point in the history
#9073)

Summary:
Note: This PR is the 4th part of a bigger PR stack (#9073) and will rebase/merge only after the first three PRs (#9070, #9071, #9130) merge.

**Context:**
Similar to #8428, this PR is to track memory usage during (new) Bloom Filter (i.e,FastLocalBloom) and Ribbon Filter (i.e, Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.

The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true` which by default is set to `false`. We [decided](#9073 (comment)) not to have separate option for separate memory user in table building therefore their memory accounting are all bundled under one general option.

**Summary:**
- Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
   - hash entries (i.e`hash_entries`.size(), we bucket-charge hash entries during insertion for performance),
   - banding (Ribbon Filter only, `bytes_coeff_rows` +`bytes_result_rows` + `bytes_backtrack`),
   - final filter (i.e, `mutable_buf`'s size).
      - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for final filter in `XXPH3BitsFilterBuilder` as well as  explicitly delete the filter bits builder when done with the final filter in block based table.
- Added option fo run `filter_bench` with this memory reservation feature

Pull Request resolved: #9073

Test Plan:
- Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of  `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
  - To address the concern for slow test: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suit running locally
- Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
- Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over **20** run as below:
   - FastLocalBloom
      - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
         - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
      - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
         - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
      - new feature of RibbonFilter with fallback  (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
         - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)

    - Ribbon128
       - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
           - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
       - new feature  (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `:
           - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
       - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
          - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
          - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console.

Reviewed By: pdillinger

Differential Revision: D31991348

Pulled By: hx235

fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
  • Loading branch information
hx235 authored and facebook-github-bot committed Nov 18, 2021
1 parent 4f678b5 commit 74544d5
Show file tree
Hide file tree
Showing 17 changed files with 696 additions and 10 deletions.
1 change: 1 addition & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
* Provided support for tracking per-sst user-defined timestamp information in MANIFEST.
* Added new option "adaptive_readahead" in ReadOptions. For iterators, RocksDB does auto-readahead on noticing sequential reads and by enabling this option, readahead_size of current file (if reads are sequential) will be carried forward to next file instead of starting from the scratch at each level (except L0 level files). If reads are not sequential it will fall back to 8KB. This option is applicable only for RocksDB internal prefetch buffer and isn't supported with underlying file system prefetching.
* Added the read count and read bytes related stats to Statistics for tiered storage hot, warm, and cold file reads.
* Added an option to dynamically charge an updating estimated memory usage of block-based table building to block cache if block cache available. It currently only includes charging memory usage of constructing (new) Bloom Filter and Ribbon Filter to block cache. To enable this feature, set `BlockBasedTableOptions::reserve_table_builder_memory = true`.

### Bug Fixes
* Prevent a `CompactRange()` with `CompactRangeOptions::change_level == true` from possibly causing corruption to the LSM state (overlapping files within a level) when run in parallel with another manual compaction. Note that setting `force_consistency_checks == true` (the default) would cause the DB to enter read-only mode in this scenario and return `Status::Corruption`, rather than committing any corruption.
Expand Down
2 changes: 2 additions & 0 deletions cache/cache_entry_roles.cc
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ std::array<const char*, kNumCacheEntryRoles> kCacheEntryRoleToCamelString{{
"OtherBlock",
"WriteBuffer",
"CompressionDictionaryBuildingBuffer",
"FilterConstruction",
"Misc",
}};

Expand All @@ -32,6 +33,7 @@ std::array<const char*, kNumCacheEntryRoles> kCacheEntryRoleToHyphenString{{
"other-block",
"write-buffer",
"compression-dictionary-building-buffer",
"filter-construction",
"misc",
}};

Expand Down
3 changes: 3 additions & 0 deletions cache/cache_entry_roles.h
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@ enum class CacheEntryRole {
// BlockBasedTableBuilder reservations to account for
// compression dictionary building buffer's memory usage
kCompressionDictionaryBuildingBuffer,
// Filter reservations to account for
// (new) bloom and ribbon filter construction's memory usage
kFilterConstruction,
// Default bucket, for miscellaneous cache entries. Do not use for
// entries that could potentially add up to large usage.
kMisc,
Expand Down
14 changes: 14 additions & 0 deletions cache/cache_reservation_manager.cc
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,11 @@ template Status
CacheReservationManager::MakeCacheReservation<CacheEntryRole::kMisc>(
std::size_t incremental_memory_used,
std::unique_ptr<CacheReservationHandle<CacheEntryRole::kMisc>>* handle);
template Status CacheReservationManager::MakeCacheReservation<
CacheEntryRole::kFilterConstruction>(
std::size_t incremental_memory_used,
std::unique_ptr<
CacheReservationHandle<CacheEntryRole::kFilterConstruction>>* handle);

template <CacheEntryRole R>
Status CacheReservationManager::IncreaseCacheReservation(
Expand Down Expand Up @@ -153,6 +158,14 @@ Slice CacheReservationManager::GetNextCacheKey() {
return Slice(cache_key_, static_cast<std::size_t>(end - cache_key_));
}

template <CacheEntryRole R>
Cache::DeleterFn CacheReservationManager::TEST_GetNoopDeleterForRole() {
return GetNoopDeleterForRole<R>();
}

template Cache::DeleterFn CacheReservationManager::TEST_GetNoopDeleterForRole<
CacheEntryRole::kFilterConstruction>();

template <CacheEntryRole R>
CacheReservationHandle<R>::CacheReservationHandle(
std::size_t incremental_memory_used,
Expand All @@ -175,4 +188,5 @@ CacheReservationHandle<R>::~CacheReservationHandle() {
// Explicitly instantiate templates for "CacheEntryRole" values we use.
// This makes it possible to keep the template definitions in the .cc file.
template class CacheReservationHandle<CacheEntryRole::kMisc>;
template class CacheReservationHandle<CacheEntryRole::kFilterConstruction>;
} // namespace ROCKSDB_NAMESPACE
6 changes: 6 additions & 0 deletions cache/cache_reservation_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,12 @@ class CacheReservationManager

static constexpr std::size_t GetDummyEntrySize() { return kSizeDummyEntry; }

// For testing only - it is to help ensure the NoopDeleterForRole<R>
// accessed from CacheReservationManager and the one accessed from the test
// are from the same translation units
template <CacheEntryRole R>
static Cache::DeleterFn TEST_GetNoopDeleterForRole();

private:
static constexpr std::size_t kSizeDummyEntry = 256 * 1024;
// The key will be longer than keys for blocks in SST files so they won't
Expand Down
Loading

0 comments on commit 74544d5

Please sign in to comment.