[2/4][ResourceMngmt] Add new API CacheReservationManager::GetTotalMemoryUsage() #9071
Conversation
@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
cache/cache_reservation_manager.h (Outdated)
@@ -90,6 +118,7 @@ class CacheReservationManager {
  std::shared_ptr<Cache> cache_;
  bool delayed_decrease_;
  std::atomic<std::size_t> cache_allocated_size_;
  std::atomic<std::size_t> memory_used_;
I found it confusing that the class is marked NOT thread safe and yet we are using atomics here. Are `GetTotalReservedCacheSize()` and `GetTotalMemoryUsed()` actually considered thread safe?
Good question! `GetTotalReservedCacheSize()` can be considered thread-safe, and I just updated the class comment to specify that.

For `GetTotalReservedCacheSize()`, the reason to use an atomic variable is that I decided to replace WriteBufferManager's atomic variable `dummy_size_` in its previous implementation with `CacheReservationManager::cache_allocated_size_`, since they calculate the same thing, as you can see from here and here. So for consistency, I used an atomic variable for `cache_allocated_size_`.

In terms of using `std::memory_order_relaxed` for such an atomic variable, there was a concern that `GetTotalReservedCacheSize()` might not pick up the latest update from other threads, but this discussion here convinced me.

However, lately I've been thinking that this can cause inaccuracy issues in `CacheReservationManager::UpdateCacheReservation()` in a multi-threaded setting, even with external synchronization, because the implementation depends on the latest cache size. That inaccuracy can lead to overcharging/undercharging the cache. So far the callers accounting memory in compression buffer building and bloom/ribbon filter construction are all single-threaded, so this issue is less of a problem there. For WriteBufferManager, however, it can be an issue.

(I was actually planning to include this in my tech debt week, since the more I look back on my previous work, the more tech debt I find :p Btw, cc @ajkr here since I was planning to reach out about this for tech debt week.)

For `memory_used_`: yes, it suffices to use a plain `size_t` with external synchronization, or an atomic with `memory_order_seq_cst` (or `memory_order_acquire`, but I need to look into that) in a multi-threaded setting, since we always want the latest memory usage for the same reason above. I updated my PR to use the former since this class was already emphasized as non-thread-safe.
> However, lately I've been thinking that this can cause inaccuracy issues in CacheReservationManager::UpdateCacheReservation() in a multi-threaded setting even with external synchronization, because the implementation depends on the latest cache size. And the inaccuracy can lead to overcharging/undercharging the cache. But so far the callers for accounting memory in compression buffer building and bloom/ribbon filter construction are all single-threaded so this issue is less of a problem. However, for WriteBufferManager, this can be an issue.

Without external synchronization, we only expose read-only access to `cache_allocated_size_` via `GetTotalReservedCacheSize()`, right? I am not sure how `UpdateCacheReservation()` could do the wrong thing, since any function that modifies `cache_allocated_size_` should run entirely before or entirely after `UpdateCacheReservation()` thanks to the external synchronization requirement.
> ...modifies `cache_allocated_size_` should run entirely before or entirely after `UpdateCacheReservation()` thanks to the external synchronization requirement.

That's correct :) The thing I worry about is the memory order `memory_order_relaxed` currently being used for `cache_allocated_size_`. If I understand correctly, `memory_order_relaxed` does not guarantee that the latest write from thread A can be seen by thread B. So if thread A does an update changing `cache_allocated_size_`, and thread B does an update after that based on `cache_allocated_size_`, thread B might not see thread A's update because of the memory order.

I am thinking of changing to `memory_order_seq_cst` or `memory_order_acquire`/`memory_order_release`, but I need to be careful about performance penalties (which I assume is the reason `memory_order_relaxed` was used) and deadlock (since we have external synchronization using a lock as well as the internal memory-order "lock").
> If I understand correctly, memory_order_relaxed does not guarantee the latest write from thread A can be seen by thread B

It should guarantee it in the relevant case:
- A updates `cache_allocated_size_`
- A releases the mutex
- B acquires the mutex
- B reads `cache_allocated_size_`

I think all memory operations prior to A's release of the mutex should be visible to B after its acquire of the mutex.
Oh interesting! I didn't know a mutex could affect memory order :) Let me think more about that.
Section "4.12 Memory Synchronization" is a relevant reference for the `pthread_mutex_*` functions: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12.
Interesting! I used to think C++'s atomic memory order works independently of mutexes, but the second answer here seems to support your argument.

The author says "In addition, memory operations that are released by a mutex are synchronized (visible) with another thread that acquires the same mutex," reading between the lines of the C++ standard.

I wish there were a C++ standard reference explicitly stating that :/
My feeling is `memory_order_relaxed` is an ambiguous name because it could mean (a) no special memory ordering, or (b) especially loose memory ordering. Since in my understanding it actually means (a), I think of it the same as updating a non-atomic like a plain `int` under a mutex (which of course has to be visible to the next thread that grabs the mutex), with the additional guarantee that out-of-mutex observers cannot see partial writes (because it's `atomic`).
Good point 👍
@hx235 has updated the pull request. You must reimport the pull request before landing.
cache/cache_reservation_manager.h (Outdated)
// If the new_mem_used passed in reflects a hypothetical usage
// instead of the actual memory usage, and this hypothetical usage is
// not going to happen, you might consider calling
// UpdateCacheReservation() to reversely update the actual memory usage,
// since the hypothetical usage passed in will affect the
// GetTotalMemoryUsed() result
What about adding a higher-level `SetTotalMemoryUsed()` API that internally sets `memory_used_` and calls `UpdateCacheReservation()`? And if there are no users of hypothetical usage (are there?), we can even make `UpdateCacheReservation()` private.
Or, if there are no users of hypothetical usage, we can also delete the comment to achieve the same effect of simplifying the API. The parameter is currently called `new_mem_used`, so it is what I'd expect `GetTotalMemoryUsed()` to return (and not what I'd expect to be a hypothetical number).
> What about adding a higher-level `SetTotalMemoryUsed()` API that internally sets `memory_used_` and calls `UpdateCacheReservation()`?

That's an interesting idea! The reason I didn't expose `SetTotalMemoryUsed()` is to save class users one step of calling `UpdateCacheReservation()`. My underlying assumption is that the class user is more likely to care about modifying the cache reservation than about modifying the memory being tracked.

> And if there are no users of hypothetical usage (are there?)

There wasn't before, but there now is one in filter construction here. This implementation is based on the discussion with @pdillinger that we can potentially skip the expensive operation `ResetAndFindSeedToSolve`, which also allocates the actual memory, by checking the cache status first. So the hypothetical memory usage might not become reality in some cases. It is exactly the case this comment is emphasizing: one where we have "reserve-before-use" and "use-not-happening-after" and we care about `GetTotalMemoryUsed()` being accurate.

> if there are no users of hypothetical usage, we can also delete the comment to achieve the same effect of simplifying the API.

Or maybe we remove the comment to discourage this kind of usage and consider the one in filter construction a "hack".
> Or maybe we remove the comment to discourage this kind of usage and consider the one in filter construction a "hack".

@ajkr is this an acceptable solution based on the discussion above?
Yes that works for me as it makes the API easy to understand. Sorry for the delay.
Summary:
Note: it might conflict with another CRM related PR #9071 and so will merge after that's merged.

Context: As `CacheReservationManager` is being used by more memory users, it is convenient to retrieve the dummy entry size from `CacheReservationManager` instead of hard-coding `256 * 1024` when writing tests. It also allows more flexibility to change our implementation of the dummy entry size. A follow-up PR is needed to replace the hard-coded dummy entry size values in `db_test2.cc`, `db_write_buffer_manager_test.cc`, `write_buffer_manager_test.cc`, `table_test.cc` and the ones introduced in #9072 (comment).

- Exposed the private static constexpr `kDummyEntrySize` through the public static `CacheReservationManager::GetDummyEntrySize()`

Pull Request resolved: #9072
Test Plan:
- Passing new tests
- Passing existing tests

Reviewed By: ajkr
Differential Revision: D32043684
Pulled By: hx235
fbshipit-source-id: ddefc6921c052adab6a2cda2394eb26da3076a50
LGTM!
Summary:
Note: This PR is the 3rd PR of a bigger PR stack (#9073) and depends on the second PR (#9071). **See changes from this PR only: https://github.com/facebook/rocksdb/pull/9130/commits/00447324d082136b0e777d3ab6a3df3a8452c633**

Context: pdillinger brought up a good [point](#9073 (comment)) about lacking RAII support for per-cache-reservation in `CacheReservationManager` when reviewing #9073. To summarize the discussion: the current API `CacheReservationManager::UpdateCacheReservation()` requires callers to explicitly calculate and pass in a correct `new_mem_used` to release a cache reservation (if they don't want to rely on the clean-up during `CacheReservationManager`'s destruction, such as when they want to release it earlier). This is convenient in some use cases such as `WriteBufferManager`, where [reservation](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L69-L91) and [release](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L109-L129) amounts do not necessarily correspond symmetrically and thus flexible `new_mem_used` input is needed. However, it is prone to caller calculation errors, and it leads to a mass of code for releasing cache in other use cases such as filter construction, where reservation and release amounts do correspond symmetrically and many code paths require a cache release, as [pointed out](#9073 (comment)) by pdillinger. Therefore we decided to provide a new API in `CacheReservationManager` to update a reservation with better RAII support, using a handle to manage the lifetime of that particular cache reservation.

- Added a new class `CacheReservationHandle`
- Added a new API `CacheReservationManager::MakeCacheReservation()` that outputs a `CacheReservationHandle` for managing the reservation
- Updated class comments to clarify the two different cache reservation methods

Tests:
- Passing new tests

Pull Request resolved: #9130
Reviewed By: pdillinger
Differential Revision: D32199446
Pulled By: hx235
fbshipit-source-id: 1cba7c636e5ecfb55b0c1e0c2d218cc9b5b30b4e
Summary:
Note: This PR is the 4th part of a bigger PR stack (#9073) and will rebase/merge only after the first three PRs (#9070, #9071, #9130) merge.

**Context:** Similar to #8428, this PR tracks memory usage during (new) Bloom Filter (i.e., FastLocalBloom) and Ribbon Filter (i.e., Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of the Ribbon Filter during construction by falling back to the Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`. The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true`, which by default is set to `false`. We [decided](#9073 (comment)) not to have a separate option per memory user in table building, so their memory accounting is all bundled under one general option.

**Summary:**
- Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
  - hash entries (i.e., `hash_entries`.size(); we bucket-charge hash entries during insertion for performance),
  - banding (Ribbon Filter only: `bytes_coeff_rows` + `bytes_result_rows` + `bytes_backtrack`),
  - final filter (i.e., `mutable_buf`'s size).
- Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account for the final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for the final filter in `XXPH3BitsFilterBuilder`, as well as explicitly delete the filter bits builder when done with the final filter in the block-based table.
- Added an option to run `filter_bench` with this memory reservation feature

Pull Request resolved: #9073

Test Plan:
- Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combinations of `BlockBasedTable::Rep::FilterType` (e.g., `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode` (e.g., `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
- To address the concern about slow tests: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suite running locally
- Added a new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
- Added a test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local results averaged over **20** runs:
  - FastLocalBloom
    - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
      - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
    - new feature (expected to be similar to the above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
      - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
    - new feature of RibbonFilter with fallback (expected to be similar to the above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`:
      - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)
  - Ribbon128
    - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
      - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
    - new feature (expected to be similar to the above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
      - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
    - new feature of RibbonFilter with fallback (expected to be much faster than the above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`:
      - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
- And the warning message `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to the console.

Reviewed By: pdillinger
Differential Revision: D31991348
Pulled By: hx235
fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
Note: This PR is the 2nd PR of a bigger PR stack (#9073).

Context:
`CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` accepts an accumulated total memory used (e.g., 10 MB used so far) instead of a usage change (e.g., increase by 5 MB, decrease by 5 MB). This has benefits, including consolidating the API for increases and decreases, as described in #8506. However, not every `CacheReservationManager` user keeps track of this accumulated total memory usage. For example, Bloom/Ribbon Filter construction (e.g., here in #9073) does not, while WriteBufferManager and compression dictionary buffering do.

Considering that future users might or might not keep track of this counter, and that implementing the counter within `CacheReservationManager` is easy thanks to the `std::size_t new_memory_used` passed in to `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)`, it is proposed to add a new API `CacheReservationManager::GetTotalMemoryUsage()`.

As noted in the API comments, since `CacheReservationManager` is NOT thread-safe, external synchronization is needed when calling `UpdateCacheReservation()` if you want `GetTotalMemoryUsed()` to indeed return the latest memory used.

Summary:
- Update `memory_used_` every time `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` is called, regardless of whether the call returns a non-okay status
- Add `CacheReservationManager::GetTotalMemoryUsage()` to return `memory_used_`

Test: