RFC: better control the memory size used by each part of the cache #4692
Comments
If the trade-off you mentioned could be studied thoroughly, I think it is possible to manage the caching of index and filter blocks automatically. On the case you mentioned: I think we should be cautious about adding a new configuration option. The trend is to reduce the number of parameters for user friendliness. Simply offering an option for users to configure does not solve the problem, and may make the product harder to use. In reality, users may simply ignore many configurations and leave them at their defaults, which defeats the purpose of adding them. I have seen users report that their best practice is to disable the block cache entirely to avoid the latency disturbance caused by locking, which they found to be the bottleneck after detailed performance analysis. The current set of configurations satisfies their needs.
Thanks for your comment. @xtcyclist
According to the RocksDB Wiki, when the given block cache is small, the competition between data blocks and index/filter blocks for cache space becomes fiercer. In extreme cases (e.g. when the block cache is disabled), index/filter blocks cannot be cached at all.
Currently, I have no detailed observations on the lower bound of the block cache size, because it may differ across workloads.
Do they enable another cache after disabling the block cache, or do they use no cache at all and still achieve better performance? This seems to be an interesting phenomenon.
The RocksDB wiki is about RocksDB, or LSM-trees in general, which is fine. But for NebulaGraph, we had better identify a particular application scenario in the GRAPH context while we are preparing an idea for kernel-level development in NebulaGraph. This is why I am asking for a specific application scenario to motivate this issue. On one hand, when the block cache is not sufficiently large, caching indexes and filters may evict many data blocks, because they are accessed more frequently. In this case, it may make sense to adjust the priority of filters, indexes and data blocks to avoid thrashing while still caching some hot data. Refer to this blog from smalldatum for the priority configurations. There may be many other options to consider for this issue before we conclude that we should cut off the caching of filters and indexes. On the other hand, if the block cache is already very small, there may be very little benefit to gain from using it for query processing in the first place, considering the whole family of troubles that come with block caches. The locking overhead mentioned above is one of them; refer to this blog for more details. So, what do you think of simply turning off the block cache when there is little DRAM we can use for it?
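For reference, the priority knobs mentioned above live in RocksDB's `BlockBasedTableOptions`. A minimal sketch of turning them on via a RocksDB OPTIONS-file fragment might look like the following (the section name and keys follow RocksDB's conventions, but should be verified against the RocksDB version NebulaGraph actually ships with):

```ini
# Sketch only: cache index/filter blocks, give them high priority in the
# LRU cache, and pin L0 index/filter blocks so hot metadata is not thrashed.
[TableOptions/BlockBasedTable "default"]
  cache_index_and_filter_blocks=true
  cache_index_and_filter_blocks_with_high_priority=true
  pin_l0_filter_and_index_blocks_in_cache=true
```

The high-priority pool only has an effect if the block cache itself is created with a nonzero high-priority ratio, which is set when constructing the cache rather than in the table options.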
@xtcyclist Thank you for the relevant information. But this issue is not intended to determine which cache policy is best. I just mean that the config
According to this blog, I think this part is worth noting. When they want to modify
I totally agree with you. If you think there is no need to add this config, I will close this issue.
@Qiaolin-Yu In my opinion, the block cache is more suitable for workloads with high recency. If the workload is mostly scans, the overhead of the block cache becomes more obvious. Per my experiments with the LDBC benchmark, it works better with a smaller block cache, like 20 GB - 32 GB. But removing the block cache entirely definitely hurts. On the other hand, I understand that a lot of companies like to have full control over memory usage by disabling the page cache and adding more cache layers above RocksDB. In this case, having control over indexes and filters totally makes sense. I think we can have a discussion on whether to expose the parameter, or maybe, as a compromise, add this option to
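If the option were exposed, one possible shape is a pass-through in nebula-storaged.conf. The sketch below assumes a `--rocksdb_block_based_table_options` flag that forwards a JSON map into RocksDB's `BlockBasedTableOptions`; the flag name and value format are assumptions to verify against the NebulaGraph version in use:

```ini
# Hypothetical nebula-storaged.conf fragment; flag name and JSON format
# assume an existing BlockBasedTableOptions pass-through. Verify before use.
--rocksdb_block_based_table_options={"cache_index_and_filter_blocks":"true"}
```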
@wenhaocs I use knife and disable the page cache. For the workload
@wenhaocs @xtcyclist Could you please take a look at this RFC?
Background
Currently, the kvstore of NebulaGraph only enables `cache_index_and_filter_blocks` when the partitioned index filter is enabled. In other cases, the option `cache_index_and_filter_blocks` is never set and defaults to `false`.
According to this section of the RocksDB Wiki, `cache_index_and_filter_blocks` determines whether the index block and filter block will be cached in the block cache. In view of this background, there seems to be a tradeoff here.
Tradeoff
If index and filter blocks are cached in the block cache, the memory used by RocksDB can be controlled more precisely. But if the given block cache is too small, this may cause serious performance issues, because index and filter blocks are usually large.
Otherwise, the index and filter blocks are stored on the heap and are only limited by `max_open_files`. They may be very large, their size cannot be calculated accurately, and `block_cache_tracer` cannot trace them.
Expectation
To the best of my knowledge, we can add a separate option for `cache_index_and_filter_blocks` in the storage configuration (default `false`). It can be enabled when the given block cache is large enough and the user wants better control of the total memory size. I can help to complete this part of the work if you think it is correct and valuable.
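The tradeoff above can be illustrated with a toy simulation (this is not RocksDB code; the block sizes and the workload are made-up assumptions): a small byte-budgeted LRU cache serves a cyclic set of hot data blocks, and optionally also holds one large index/filter block, which squeezes data blocks out and collapses their hit rate.

```python
from collections import OrderedDict

class LRUCache:
    """Toy byte-budgeted LRU cache (illustration only, not RocksDB's)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # key -> size in "bytes"

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        return False

    def put(self, key, size):
        if key in self.entries:
            self.entries.move_to_end(key)
            return
        # evict least-recently-used entries until the new block fits
        while self.used + size > self.capacity and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used -= evicted
        if self.used + size <= self.capacity:
            self.entries[key] = size
            self.used += size

def data_hit_rate(cache_index_and_filter_blocks, capacity=100):
    # Made-up workload: 20 hot data blocks of size 4 (total 80) read in a
    # loop, plus one big index/filter block of size 60 touched every 5 reads
    # when cache_index_and_filter_blocks is on.
    cache = LRUCache(capacity)
    hits = total = 0
    for i in range(2000):
        key = ("data", i % 20)
        if cache.get(key):
            hits += 1
        else:
            cache.put(key, 4)
        total += 1
        if cache_index_and_filter_blocks and i % 5 == 0:
            if not cache.get(("meta", 0)):
                cache.put(("meta", 0), 60)
    return hits / total

# Without the metadata block, all 20 hot data blocks fit; with it, data
# blocks are repeatedly evicted and the data hit rate collapses.
print(data_hit_rate(False), data_hit_rate(True))
```

This is exactly the "fierce competition" case the comments describe: the same knob that makes memory accounting precise can thrash the data working set when the cache budget is too small.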