[enhancement](cloud) support TTL file cache evict through LRU #37312
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
clang-tidy review says "All clean, LGTM! 👍"
run buildall
```cpp
@@ -796,6 +818,70 @@ bool BlockFileCache::try_reserve_for_ttl(size_t size, std::lock_guard<std::mutex>& cache_lock)
    return true;
}

bool BlockFileCache::try_reserve_for_ttl(size_t size, std::lock_guard<std::mutex>& cache_lock) {
    if (try_reserve_for_ttl_without_lru(size, cache_lock)) {
```
```cpp
if (!try_reserve_for_ttl_without_lru(size, cache_lock) && config::enable_ttl_cache_evict_using_lru) {
}
```
I cannot see how this change improves code correctness or readability.
be/src/io/cache/block_file_cache.cpp
Outdated
```cpp
std::lock_guard block_lock(file_block->_mutex);
auto hash = cell->file_block->get_hash_value();
remove(file_block, cache_lock, block_lock);
if (_files.find(hash) == _files.end()) {
```
It doesn't need to do this.
> It doesn't need to do this.

Why?
TPC-H: Total hot run time: 40037 ms
TPC-DS: Total hot run time: 174600 ms
ClickBench: Total hot run time: 30.46 s
run buildall
clang-tidy review says "All clean, LGTM! 👍"
run buildall
clang-tidy review says "All clean, LGTM! 👍"
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 39860 ms
TPC-DS: Total hot run time: 173908 ms
ClickBench: Total hot run time: 30.9 s
run buildall
TPC-H: Total hot run time: 40549 ms
TPC-DS: Total hot run time: 173135 ms
ClickBench: Total hot run time: 30.79 s
run p0
be/src/io/cache/block_file_cache.h
Outdated
```cpp
// metrics
size_t _num_read_blocks = 0;
size_t _num_hit_blocks = 0;
size_t _num_removed_blocks = 0;
std::shared_ptr<bvar::Status<size_t>> _cur_cache_size_metrics;
std::shared_ptr<bvar::Status<size_t>> _cur_ttl_cache_size_metrics;
std::shared_ptr<bvar::Status<size_t>> _cur_ttl_cache_lru_queue_size_metrics;
```
Is this an "element count"? If not, add an "element count" bvar; if it is, keep the naming consistent.
PR approved by at least one committer and no changes requested.
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
be/src/io/cache/block_file_cache.cpp
Outdated
```cpp
to_evict.push_back(cell);
removed_size += cell_size;
}
```
This duplicates code in try_reserve_for_ttl_without_lru.
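One way to address the duplication is to factor the shared candidate-collection loop into a single helper used by both reserve paths. This is a hypothetical sketch, not the actual Doris code: `Cell` and `collect_evict_candidates` are simplified stand-ins for the real cache internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the cache cell tracked in the LRU queue.
struct Cell {
    size_t size;
};

// Walk `queue` in LRU order, collecting cells into `to_evict` until at
// least `need` bytes are covered; returns the total size collected.
// Both reserve paths could call this instead of repeating the loop.
size_t collect_evict_candidates(const std::vector<Cell*>& queue, size_t need,
                                std::vector<Cell*>& to_evict) {
    size_t removed_size = 0;
    for (Cell* cell : queue) {
        if (removed_size >= need) break;
        to_evict.push_back(cell);
        removed_size += cell->size;
    }
    return removed_size;
}
```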
be/src/io/cache/block_file_cache.cpp
Outdated
```cpp
if (_time_to_key_iter.first->second == hash) {
    _time_to_key_iter.first =
            _time_to_key.erase(_time_to_key_iter.first);
    break;
```
Too many indentation levels here; we should reduce the nesting by extracting this into a function call.
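For example, the erase-while-iterating logic over the time-to-key multimap could be pulled into a small helper, removing two levels of nesting at the call site. A minimal sketch, assuming a `std::multimap`-like structure; the type alias and function name are illustrative, not the actual Doris code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative stand-in for the expiration-time -> key index.
using TimeToKeyMap = std::multimap<long, std::string>;

// Erase the first entry with key `expiration` whose value equals `hash`.
// Returns true if an entry was erased. Extracting this keeps the caller
// flat instead of nesting a loop and two conditionals.
bool erase_time_to_key_entry(TimeToKeyMap& time_to_key, long expiration,
                             const std::string& hash) {
    auto [begin, end] = time_to_key.equal_range(expiration);
    for (auto it = begin; it != end; ++it) {
        if (it->second == hash) {
            time_to_key.erase(it);
            return true;
        }
    }
    return false;
}
```

The call site then reduces to a single line such as `erase_time_to_key_entry(_time_to_key, expiration, hash);`.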
Originally, the TTL file cache only evicted data when it expired. If TTL data completely fills the cache, new TTL data won't fit and falls back to SKIP_CACHE mode, which is a performance killer: it overruns the S3 downloader with tons of small IOs. This commit enables actively evicting TTL cache entries through LRU in addition to the original TTL expiration. Signed-off-by: freemandealer <[email protected]>
clang-tidy review says "All clean, LGTM! 👍"
LGTM
PR approved by at least one committer and no changes requested.
clang-tidy review says "All clean, LGTM! 👍"
run buildall
TPC-H: Total hot run time: 39936 ms
TPC-DS: Total hot run time: 175373 ms
ClickBench: Total hot run time: 30.72 s
Motivation and Basic Ideas
Originally, the TTL file cache only evicts data on expiration. If TTL data fills the cache, new TTL data won't fit in the cache and falls back to SKIP_CACHE mode, which is a performance killer that overruns the S3 downloader with tons of small IOs.
This commit enables evicting the TTL cache actively through LRU alongside the original TTL expiration.
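The core idea can be sketched with a toy LRU queue that evicts from the least-recently-used end until a new block fits. This is illustrative only; the real BlockFileCache manages cells, locks, and multiple queues, and the names below are invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Toy model of LRU-based reservation: entries are (id, size) pairs,
// with the front of the list being the least recently used.
struct LruQueue {
    std::list<std::pair<int, size_t>> entries; // front = LRU victim
    size_t used = 0;
    size_t capacity;

    explicit LruQueue(size_t cap) : capacity(cap) {}

    // Evict least-recently-used entries until `need` bytes fit.
    // Returns false if the request can never fit (this is the case
    // that would previously force SKIP_CACHE for TTL data).
    bool try_reserve(size_t need) {
        if (need > capacity) return false;
        while (capacity - used < need && !entries.empty()) {
            used -= entries.front().second;
            entries.pop_front();
        }
        return capacity - used >= need;
    }

    // Insert a new entry as most recently used.
    void insert(int id, size_t size) {
        entries.emplace_back(id, size);
        used += size;
    }
};
```

With eviction available, a full TTL cache makes room for new TTL data instead of bypassing the cache entirely.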
Performance tests
We can set up a scenario where the TTL cache is full by using a small file cache space (e.g. 5GB). We use the table customer_ttl from the regression test attached to this PR and load data several times (e.g. 20GB) to populate the TTL cache.
With this setting, we execute the query
select count(*) from customer_ttl where C_ADDRESS like '%ea%' and C_NAME like '%a%' and C_COMMENT like '%b%'
separately with enable_ttl_cache_evict_using_lru = true/false in the BE conf. The results show that performance improved by 43x during the tests:

| ttl_cache_evict_using_lru feature | enabled | disabled (baseline) |
| --------------------------------- | ------- | ------------------- |
| query execution time              | 8.5s    | 342s                |

Note that the result depends heavily on the ratio of the cache size to the amount of data the query touches, along with S3 server performance, etc., so it is given here only for reference.
Such an improvement is achieved by the active eviction of the TTL cache, which effectively reduces the possibility of SKIP_CACHE. During the tests:

| ttl_cache_evict_using_lru feature | enabled | disabled (baseline) |
| --------------------------------- | ------- | ------------------- |
| SKIP_CACHE occurrence             | 0       | 257.1K              |