[2/4][ResourceMngmt] Add new API CacheReservationManager::GetTotalMemoryUsage() #9071
Conversation
@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
cache/cache_reservation_manager.h (Outdated)
@@ -90,6 +118,7 @@ class CacheReservationManager {
  std::shared_ptr<Cache> cache_;
  bool delayed_decrease_;
  std::atomic<std::size_t> cache_allocated_size_;
  std::atomic<std::size_t> memory_used_;
I found it confusing that the class is marked NOT thread safe and yet we are using atomics here. Are `GetTotalReservedCacheSize()` and `GetTotalMemoryUsed()` actually considered thread safe?
Good question! `GetTotalReservedCacheSize()` can be considered thread-safe, and I just updated the class comment to specify that.

For `GetTotalReservedCacheSize()`, the reason to use an atomic variable is that I decided to replace WriteBufferManager's atomic variable `dummy_size_` in its previous implementation with `CacheReservationManager::cache_allocated_size_`, since they calculate the same thing, as you can see from here and here. So for consistency, I used an atomic variable for `cache_allocated_size_`.

In terms of using `std::memory_order_relaxed` for such an atomic variable, there was a concern that `GetTotalReservedCacheSize()` might not pick up the latest update from other threads, but this discussion here convinced me.

However, lately I've been thinking that this can cause inaccuracy issues in `CacheReservationManager::UpdateCacheReservation()` in a multi-threaded setting, even with external synchronization, because the implementation depends on the latest cache size. That inaccuracy can lead to overcharging/undercharging the cache. So far the callers accounting memory in compression buffer building and bloom/ribbon filter construction are all single-threaded, so this issue is less of a problem there. For WriteBufferManager, however, it can be an issue.

(I was actually planning to include this in my tech debt week, since the more I look back on my previous work, the more tech debt I find :p Btw, cc @ajkr here since I was planning to reach out about this for tech debt week.)

For `memory_used_`: yes, it suffices to use a plain `size_t` with external synchronization, or an atomic with `memory_order_seq_cst` (or `memory_order_acquire`, but I need to look into that) in a multi-threaded setting, since we always want the latest memory usage for the same reason above. I updated my PR to use the former since this class was already emphasized as non-thread-safe.
> However, lately I've been thinking that this can cause inaccuracy issues in CacheReservationManager::UpdateCacheReservation() in a multi-threaded setting even with external synchronization, because the implementation depends on the latest cache size. And the inaccuracy can lead to overcharging/undercharging the cache. But so far the callers for accounting memory in compression buffer building and bloom/ribbon filter construction are all single-threaded so this issue is less of a problem. However, for WriteBufferManager, this can be an issue.

Without external synchronization, we only expose read-only access to `cache_allocated_size_` via `GetTotalReservedCacheSize()`, right? I am not sure how `UpdateCacheReservation()` could do the wrong thing, since any function that modifies `cache_allocated_size_` should run entirely before or entirely after `UpdateCacheReservation()` thanks to the external synchronization requirement.
> ...modifies `cache_allocated_size_` should run entirely before or entirely after `UpdateCacheReservation()` thanks to the external synchronization requirement.

That's correct :) The thing I worry about is the memory order `memory_order_relaxed` currently being used for `cache_allocated_size_`. If I understand correctly, `memory_order_relaxed` does not guarantee that the latest write from thread A can be seen by thread B. So if thread A does an update changing `cache_allocated_size_`, and thread B does an update after that based on `cache_allocated_size_`, thread B might not see thread A's update because of the memory order.

I am thinking of changing to `memory_order_seq_cst` or `memory_order_acquire`/`memory_order_release`, but I need to be careful about performance penalties (which I assume is the reason `memory_order_relaxed` was used) and deadlock (since we have external synchronization using a lock as well as the internal memory-order "lock").
> If I understand correctly, memory_order_relaxed does not guarantee the latest write from thread A can be seen by thread B

It should guarantee it in the relevant case:
- A updates `cache_allocated_size_`
- A releases the mutex
- B acquires the mutex
- B reads `cache_allocated_size_`

I think all memory operations prior to A's release of the mutex should be visible to B after its acquire of the mutex.
Oh interesting! I didn't know a mutex could affect memory order :) Let me think more about that.
Section "4.12 Memory Synchronization" is a relevant reference for the `pthread_mutex_*` functions: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12.
Interesting! I used to think C++'s atomic memory order works independently of mutexes, but the second answer here seems to support your argument.

The author says "In addition, memory operations that are released by a mutex are synchronized (visible) with another thread that acquires the same mutex," reading between the lines of the C++ standard.

I wish there were a C++ standard reference explicitly stating that :/
My feeling is `memory_order_relaxed` is an ambiguous name because it could mean (a) no special memory ordering, or (b) especially loose memory ordering. Since in my understanding it actually means (a), I think of it the same as updating a non-atomic like a plain `int` under a mutex (which of course has to be visible to the next thread that grabs the mutex), with the additional guarantee that out-of-mutex observers cannot see partial writes (because it's `atomic`).
Good point 👍
@hx235 has updated the pull request. You must reimport the pull request before landing.
cache/cache_reservation_manager.h (Outdated)
// If the new_mem_used passed in reflects a hypothetical usage
// instead of the actual memory usage, and this hypothetical usage is
// not going to happen, you might consider calling
// UpdateCacheReservation() to reversely update the actual memory usage,
// since the hypothetical usage passed in will affect the
// GetTotalMemoryUsed() result
What about adding a higher-level `SetTotalMemoryUsed()` API that internally sets `memory_used_` and calls `UpdateCacheReservation()`? And if there are no users of hypothetical usage (are there?), we can even make `UpdateCacheReservation()` private.
Or, if there are no users of hypothetical usage, we can also delete the comment to achieve the same effect of simplifying the API. The parameter is currently called `new_mem_used`, so it is what I'd expect `GetTotalMemoryUsed()` to return (and not what I'd expect to be a hypothetical number).
> What about adding a higher-level `SetTotalMemoryUsed()` API that internally sets `memory_used_` and calls `UpdateCacheReservation()`?

That's an interesting idea! The reason I didn't expose `SetTotalMemoryUsed()` is to save class users one step of calling `UpdateCacheReservation()`. My underlying assumption is that the class user is more likely to care about modifying the cache reservation than about modifying the memory being tracked.

> And if there are no users of hypothetical usage (are there?)

There wasn't before, but there now is one in filter construction here. This implementation is based on the discussion with @pdillinger that we can potentially skip the expensive operation `ResetAndFindSeedToSolve`, which also allocates the actual memory, by checking the cache status first. So the hypothetical memory usage might not become reality in some cases. It is exactly the case this comment is emphasizing: one where we have "reserve-before-use" and "use-not-happening-after" and we care about `GetTotalMemoryUsed()` being accurate.

> if there are no users of hypothetical usage, we can also delete the comment to achieve the same effect of simplifying the API.

Or maybe we remove the comment to discourage this kind of usage and consider the one in filter construction a "hack".
> Or maybe we remove the comment to discourage this kind of usage and consider the one in filter construction a "hack".

@ajkr is this an acceptable solution based on the discussion above?
Yes that works for me as it makes the API easy to understand. Sorry for the delay.
Summary:
Note: it might conflict with another CRM related PR #9071 and so will merge after that's merged.

Context: As `CacheReservationManager` is being used by more memory users, it is convenient to retrieve the dummy entry size from `CacheReservationManager` instead of hard-coding `256 * 1024` when writing tests. It also allows more flexibility to change our implementation of the dummy entry size. A follow-up PR is needed to replace the hard-coded dummy entry size values in `db_test2.cc`, `db_write_buffer_manager_test.cc`, `write_buffer_manager_test.cc`, `table_test.cc` and the ones introduced in #9072 (comment).

- Exposed the private static constexpr `kDummyEntrySize` through the public static `CacheReservationManager::GetDummyEntrySize()`

Pull Request resolved: #9072
Test Plan:
- Passing new tests
- Passing existing tests

Reviewed By: ajkr
Differential Revision: D32043684
Pulled By: hx235
fbshipit-source-id: ddefc6921c052adab6a2cda2394eb26da3076a50
LGTM!
Summary:
Note: This PR is the 3rd PR of a bigger PR stack (#9073) and depends on the second PR (#9071). **See changes from this PR only: https://github.com/facebook/rocksdb/pull/9130/commits/00447324d082136b0e777d3ab6a3df3a8452c633**

Context: pdillinger brought up a good [point](#9073 (comment)) about lacking RAII support for per-cache-reservation in `CacheReservationManager` when reviewing #9073. To summarize the discussion: the current API `CacheReservationManager::UpdateCacheReservation()` requires callers to explicitly calculate and pass in a correct `new_mem_used` to release a cache reservation (if they don't want to rely on the clean-up during `CacheReservationManager`'s destruction, such as when they want to release it earlier). This is convenient in some use cases such as `WriteBufferManager`, where [reservation](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L69-L91) and [release](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L109-L129) amounts do not necessarily correspond symmetrically and thus flexible `new_mem_used` input is needed. However, it is prone to caller calculation errors, and it leads to a mass of code for releasing cache in other use cases such as filter construction, where reservation and release amounts do correspond symmetrically and many code paths require a cache release, as [pointed out](#9073 (comment)) by pdillinger. Therefore we decided to provide a new API in `CacheReservationManager` to update a reservation with better RAII support, using a handle to manage the lifetime of that particular cache reservation.

- Added a new class `CacheReservationHandle`
- Added a new API `CacheReservationManager::MakeCacheReservation()` that outputs a `CacheReservationHandle` for managing the reservation
- Updated class comments to clarify the two different cache reservation methods

Tests:
- Passing new tests

Pull Request resolved: #9130
Reviewed By: pdillinger
Differential Revision: D32199446
Pulled By: hx235
fbshipit-source-id: 1cba7c636e5ecfb55b0c1e0c2d218cc9b5b30b4e
Summary:
Note: This PR is the 4th part of a bigger PR stack (#9073) and will rebase/merge only after the first three PRs (#9070, #9071, #9130) merge.

**Context:** Similar to #8428, this PR tracks memory usage during (new) Bloom Filter (i.e., FastLocalBloom) and Ribbon Filter (i.e., Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of the Ribbon Filter during construction by falling back to the Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`. The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true`, which by default is set to `false`. We [decided](#9073 (comment)) not to have a separate option per memory user in table building, so their memory accounting is all bundled under one general option.

**Summary:**
- Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
  - hash entries (i.e., `hash_entries`.size(); we bucket-charge hash entries during insertion for performance),
  - banding (Ribbon Filter only: `bytes_coeff_rows` + `bytes_result_rows` + `bytes_backtrack`),
  - final filter (i.e., `mutable_buf`'s size).
- Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account for the final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for the final filter in `XXPH3BitsFilterBuilder`, as well as explicitly delete the filter bits builder when done with the final filter in the block-based table.
- Added an option to run `filter_bench` with this memory reservation feature

Pull Request resolved: #9073

Test Plan:
- Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combinations of `BlockBasedTable::Rep::FilterType` (e.g., `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode` (e.g., `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
- To address the concern about slow tests: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suite running locally
- Added a new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
- Added a test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local results averaged over **20** runs:
  - FastLocalBloom
    - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
      - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
    - new feature (expected to be similar to the above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
      - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
    - new feature of RibbonFilter with fallback (expected to be similar to the above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`:
      - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)
  - Ribbon128
    - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
      - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
    - new feature (expected to be similar to the above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
      - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
    - new feature of RibbonFilter with fallback (expected to be much faster than the above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`:
      - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
- And the warning message `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to the console.

Reviewed By: pdillinger
Differential Revision: D31991348
Pulled By: hx235
fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
Note: This PR is the 2nd PR of a bigger PR stack (#9073).

Context:
`CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` accepts an accumulated total memory used (e.g., 10 MB used so far) instead of a usage change (e.g., increase by 5 MB, decrease by 5 MB). This has benefits, including consolidating the API for increases and decreases, as described in #8506. However, not every `CacheReservationManager` user keeps track of this accumulated total memory usage. For example, Bloom/Ribbon Filter construction (e.g., here in #9073) does not, while WriteBufferManager and compression dictionary buffering do.

Considering that future users might or might not keep track of this counter, and that implementing the counter within `CacheReservationManager` is easy thanks to the `std::size_t new_memory_used` passed in to `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)`, it is proposed to add a new API `CacheReservationManager::GetTotalMemoryUsage()`.

As noted in the API comments, since `CacheReservationManager` is NOT thread-safe, external synchronization is needed when calling `UpdateCacheReservation()` if you want `GetTotalMemoryUsed()` to indeed return the latest memory used.

Summary:
- Update `memory_used_` every time `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` is called, regardless of whether the call returns a non-okay status
- Add `CacheReservationManager::GetTotalMemoryUsage()` to return `memory_used_`

Test: