Confused about cluster settings "cache_capacity_reached" and "knn.circuit_breaker.unset.percentage" #203
Hi @dengxianjie, this is definitely unusual and might be a bug. I have a couple of questions:
Additionally, to reproduce the issue, could you provide the exact, detailed steps you followed to get the cache into a state where graph memory usage is above 100%?
Thanks for the reply.
Here is what I did when the issue occurred:
Step 2. I ran random searches against different indices one by one, and called `GET _opendistro/_knn/stats` after every search to observe the cache stats. Additionally, I had reached "knn.circuit_breaker.unset.percentage" before, and at that time bulk requests were limited. This suggests the circuit breaker does work in at least some cases.
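Step 2 can be sketched as a small observation loop. This is a minimal Python sketch, not part of the plugin: `observe_cache`, `search_fn`, and `stats_fn` are hypothetical names, where `search_fn` stands in for a real search request and `stats_fn` for a call to `GET _opendistro/_knn/stats`. It assumes the stats response nests per-node stats under a `nodes` object keyed by node ID; the field names are the ones quoted in this issue.

```python
from typing import Callable, List


def observe_cache(
    indices: List[str],
    search_fn: Callable[[str], None],
    stats_fn: Callable[[], dict],
) -> List[dict]:
    """Search each index in turn and snapshot the k-NN cache stats afterwards.

    ASSUMPTION: per-node stats live under stats["nodes"][node_id].
    """
    fields = ("graph_memory_usage_percentage", "cache_capacity_reached", "eviction_count")
    snapshots = []
    for index in indices:
        search_fn(index)  # loads (warms) this index's graphs into the cache
        stats = stats_fn()
        # Keep only the per-node cache fields discussed in this thread.
        per_node = {
            node_id: {key: node.get(key) for key in fields}
            for node_id, node in stats.get("nodes", {}).items()
        }
        snapshots.append({"index": index, "nodes": per_node})
    return snapshots
```

Injecting the search and stats calls as functions keeps the loop independent of any particular client library, so it can be exercised with fakes before pointing it at a real cluster.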
Thanks @dengxianjie, I was able to reproduce the issue. Looking into the root cause.
Hi @dengxianjie, I think there is a bug with cache expiration. In this line, we do not convert … As a workaround, this setting can be set to false. Can you confirm that the issue does not occur when you set it to false?
Hi @jmazanec15, great find. Logically, the bug you found would affect expiration. But more precisely, my issue is that eviction doesn't work. Is that correct? "graph_memory_usage_percentage" is unexpectedly above 100, so the cache should evict some graphs immediately rather than wait for expiration. At the same time, "cache_capacity_reached" and "circuit_breaker_triggered" should be true. What confuses me is that something seems to have gone wrong with the circuit-breaker limit.
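The expectation described here can be written as a check over a stats response. This is a minimal Python sketch: `find_inconsistent_nodes` is a hypothetical helper, not a plugin API. It assumes per-node stats are nested under a `nodes` object keyed by node ID and that `circuit_breaker_triggered` is reported at the top (cluster) level; only the field names themselves come from this thread.

```python
def find_inconsistent_nodes(stats: dict) -> list:
    """Return IDs of nodes whose k-NN cache stats contradict the expected behavior.

    ASSUMPTION: per-node stats live under stats["nodes"][node_id], and
    "circuit_breaker_triggered" is a cluster-level field.
    """
    breaker_on = stats.get("circuit_breaker_triggered", False)
    suspicious = []
    for node_id, node in stats.get("nodes", {}).items():
        # Usage above 100% of the limit should mean the cache is full
        # and the circuit breaker has tripped.
        if node.get("graph_memory_usage_percentage", 0.0) > 100.0:
            if not node.get("cache_capacity_reached", False) or not breaker_on:
                suspicious.append(node_id)
    return sorted(suspicious)
```

Feeding each stats snapshot through a check like this makes the reported symptom (usage over 100% while both flags stay false) easy to spot across many nodes.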
Hi @dengxianjie, yes. I believe the issue is that the cache never actually gets rebuilt with the new maximum weight set by … This line throws an exception because of an improper conversion, so this line never gets called. As a result, the old cache is still in use, and it has the default circuit breaker (CB) limit of 50%. Capacity-based evictions therefore will not occur until that 50% threshold is reached (at which point the circuit breaker will be set to true). The confusion comes from the stats API, which uses the Elasticsearch setting to calculate … We use the setting rather than the cache's actual maximum weight to calculate this percentage because the Guava cache does not expose a getter for maxWeight. The above PR should fix this issue in the short term. In the long term, however, we should not change the setting until the cache has been rebuilt. I will create an issue to track this.
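The mismatch described in this comment can be reproduced with a toy model. This is a minimal Python sketch, not the plugin's code: `WeightedLRUCache` is a hypothetical stand-in for a weight-bounded cache (in the spirit of Guava's `maximumWeight`), the weights are made-up units, and `usage_percentage` deliberately ignores the cache's own `max_weight`, mirroring the statement that the stats are computed from the setting.

```python
from collections import OrderedDict


class WeightedLRUCache:
    """Toy weight-bounded LRU cache; max_weight is fixed when the cache is built."""

    def __init__(self, max_weight: int):
        self.max_weight = max_weight
        self.entries = OrderedDict()  # key -> weight, least recently used first
        self.weight = 0
        self.eviction_count = 0

    def put(self, key: str, weight: int) -> None:
        if key in self.entries:
            self.weight -= self.entries.pop(key)
        self.entries[key] = weight
        self.weight += weight
        # Capacity-based eviction uses the cache's OWN max weight, not the setting.
        while self.weight > self.max_weight and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.weight -= evicted
            self.eviction_count += 1


def usage_percentage(cache: WeightedLRUCache, configured_limit: int) -> float:
    # Computed against the configured setting, as the stats API does per this thread.
    return 100.0 * cache.weight / configured_limit


# Hypothetical scenario: the cache was built with limit 100, the setting was
# later lowered to 60, but the rebuild failed, so the old cache is still in use.
cache = WeightedLRUCache(max_weight=100)
configured_limit = 60
for name in ("graph-a", "graph-b", "graph-c"):
    cache.put(name, 30)
```

Here the cache holds 90 units with no evictions, yet `usage_percentage(cache, 60)` reports 150%, which matches the symptom in this issue. A cache actually rebuilt with `max_weight=60` would instead evict `graph-a` on the third insert and report exactly 100%.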
👍 @jmazanec15 Completely understood. Thanks.
To understand the cache's behavior more deeply, I experimented with the following cache settings:
After indexing some vectors into different indices, the k-NN stats API returned the following:
What confuses me is that "graph_memory_usage_percentage" is over 100, but "eviction_count" stays at 0 after I warm up a new index.
By design, "cache_capacity_reached" should be true, but it was still false even though "knn.circuit_breaker.unset.percentage" had been fully reached. Is something wrong?
My machine has 250 GB of memory, and the JVM uses 31 GB.
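As a rough sizing check for this machine, the graph memory limit can be estimated with simple arithmetic. This minimal Python sketch rests on an assumption that should be verified against the plugin documentation: that the circuit breaker limit (the thread mentions a 50% default) is applied to the memory left over after the JVM heap. `knn_graph_limit_gb` is a hypothetical helper.

```python
def knn_graph_limit_gb(total_gb: float, heap_gb: float, limit_pct: float = 50.0) -> float:
    """Estimate the native memory available to k-NN graphs.

    ASSUMPTION: the limit percentage applies to (total memory - JVM heap).
    """
    return (total_gb - heap_gb) * limit_pct / 100.0


limit = knn_graph_limit_gb(250, 31)  # (250 - 31) * 50% under the stated assumption
```

Under that assumption, graphs on this machine would be capped at about 109.5 GB, which is the figure the stats percentage should be compared against once the cache is rebuilt with the configured limit.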