Add quantization state cache #1960
Conversation
Signed-off-by: Ryan Bogan <[email protected]>
The Quantization Cache class looks good, but we need to improve the unit tests for the following test cases. Let's have all these tests added.
Looks good to me.
The most recent commits should address @navneet1v's comments 1-5. I will add to stats in a follow-up PR. I also still need to add more unit tests for the new functionality.
public static final Integer KNN_DEFAULT_QUANTIZATION_STATE_CACHE_SIZE_LIMIT_PERCENTAGE = 5; // By default, set aside 5% of the JVM heap for the limit
public static final Integer KNN_MAX_QUANTIZATION_STATE_CACHE_SIZE_LIMIT_PERCENTAGE = 10; // Quantization state cache limit cannot exceed 10% of the JVM heap
public static final Integer KNN_DEFAULT_QUANTIZATION_STATE_CACHE_EXPIRY_TIME_MINUTES = 60;
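For context on how a percentage setting like this typically becomes an actual budget: the percentage is applied to the JVM's max heap and converted to an absolute KB limit. A minimal sketch under that assumption (class and method names here are hypothetical, not from this PR):

```java
final class CacheLimitCalculator {
    // Mirrors the defaults discussed above; assumed values, not the final ones
    static final int DEFAULT_PERCENT = 5;
    static final int MAX_PERCENT = 10;

    // Convert a heap percentage into an absolute KB budget for the cache
    static long percentOfHeapInKB(int percent, long maxHeapBytes) {
        if (percent < 1 || percent > MAX_PERCENT) {
            throw new IllegalArgumentException("percentage must be in [1, " + MAX_PERCENT + "]");
        }
        return maxHeapBytes * percent / 100 / 1024;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reports the JVM's configured max heap
        long heapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("cache budget: " + percentOfHeapInKB(DEFAULT_PERCENT, heapBytes) + " KB");
    }
}
```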
What is the reasoning behind these limits?
I added the initial ones based on a conversation with @Vikasht34. We'll probably need to think more about this with benchmarking, but for now I needed a value to put in.
Here is the reasoning behind this:
Number of segments: 2,500 (this is an extreme case)
Dimension: 16,000
Binary quantization: 4 bit
The thresholds will look like this:
thresholds[0] {0.1f, 0.2f, 0.3f ... up to 16k}
thresholds[1] {0.4f, 0.5f, 0.6f ... up to 16k}
thresholds[2] {0.7f, 0.8f, 0.9f ... up to 16k}
thresholds[3] {1.0f, 1.1f, 1.2f ... up to 16k}
For 2,500 segments, the total size of the thresholds array would be:
2,500 segments x 16,000 columns x 4 bytes/element = 1,600,000,000 bytes
1,600,000,000 bytes = 1.6 GB (gigabytes)
So the total size of the thresholds array for 2,500 segments would be approximately 1.6 GB.
If you have a 32 GB heap, this array would occupy:
(1.6 GB / 32 GB) x 100% ≈ 5%
This is an extreme case, assuming all vectors have dimension 16,000 and there are 2,500 segments.
Assuming a P90 case where the average dimension is 5,000 with 4-bit quantization: 0.5 GB, which is 1.5625% of a 32 GB heap. So we chose 5% as a safe limit, which can handle the extreme case as well.
With CacheBuilder.newBuilder().maximumWeight(2GB), the cache doesn't immediately allocate 2 GB of heap space. Instead, the cache's memory usage grows as entries are added, up to the specified weight limit (in this case, 2 GB).
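The arithmetic above can be sanity-checked in a few lines; note that 2,500 x 16,000 x 4 actually evaluates to 160,000,000 bytes (roughly 150 MB), as the next comment points out. A throwaway sketch:

```java
final class ThresholdSizeEstimate {
    // Bytes for per-segment threshold arrays: one float (4 bytes) per dimension per segment
    static long thresholdBytes(long segments, long dimensions) {
        return segments * dimensions * Float.BYTES;
    }

    // Fraction of a given heap that many bytes would occupy
    static double heapFraction(long bytes, long heapBytes) {
        return (double) bytes / heapBytes;
    }

    public static void main(String[] args) {
        long bytes = thresholdBytes(2_500, 16_000);
        System.out.println(bytes + " bytes"); // 160000000 bytes
        System.out.printf("%.2f%% of a 32 GB heap%n", 100 * heapFraction(bytes, 32L << 30));
    }
}
```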
2500 segments x 16,000 columns x 4 bytes/element = 1,600,000,000 bytes
This is actually 160,000,000 bytes, about 150 MB. At 5k dimensions it is 50 MB. That being said, I don't think we need these limits.
Also, if we do need the limits, can we not make them settings? Settings are hard to remove. It seems that in the future we may want to rely on IndexInput and mmap to handle caching. I'd prefer not to expose these as settings.
private static ClusterService clusterService;
private Cache<String, QuantizationState> cache;
private long maxCacheSizeInKB;
private Instant evictedDueToSizeAt;
Shouldn't this just be a counter of type long?
Not sure I understand. Why would it be a counter instead of just a long holding the max size (which is calculated from the percentage setting)?
public QuantizationState getQuantizationState(String fieldName) {
    return cache.getIfPresent(fieldName);
}
Shouldn't the logic be: if the entry is not present in the cache, load the value into the cache and then return it?
Any reason for not making this a read-through cache?
@Vikasht34 thoughts?
Well, here are my thoughts on this. I am really not in favor of coupling the cache with the quantization reader, since we don't know what the future holds for the cache in terms of unifying interfaces across multiple caches and implementations. I would keep this totally lightweight.
Nor do I want the NativeQuantizationStateReader to know about the cache.
QuantizationStateCache: manages in-memory caching of quantization states.
QuantizationStateReader: handles reading quantization states from persistent storage.
QuantizationStateWriter: manages writing quantization states to persistent storage.
From the query flow, I will have my interface:
public interface QuantizationStateLoader {
    QuantizationState getQuantizationState(String fieldName, String typeOfQuantization);
}
This interface will handle both the cache and NativeQuantizationStateReader: read from the cache; if not present, read from file, deserialize based on the type of quantization, and save it to the cache.
So the cache is clean: it neither has to know about the type of quantization state to deserialize, nor about how it was originally stored.
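A rough sketch of how such a loader could front both the cache and the reader (everything here besides the getQuantizationState name is hypothetical; a plain concurrent map stands in for the real cache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for the real QuantizationState; only the shape matters for this sketch
interface QuantizationState { }

final class CachingStateLoader {
    private final Map<String, QuantizationState> cache = new ConcurrentHashMap<>();
    private final Function<String, QuantizationState> reader; // e.g. reads and deserializes from storage

    CachingStateLoader(Function<String, QuantizationState> reader) {
        this.reader = reader;
    }

    // Read-through: check the cache first; on a miss, load from storage and populate the cache
    QuantizationState getQuantizationState(String fieldName) {
        return cache.computeIfAbsent(fieldName, reader);
    }

    public static void main(String[] args) {
        CachingStateLoader loader = new CachingStateLoader(field -> new QuantizationState() {});
        // Repeated lookups for the same field return the cached instance
        System.out.println(loader.getQuantizationState("my_field") == loader.getQuantizationState("my_field")); // true
    }
}
```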
So the cache is clean: it neither has to know about the type of quantization state to deserialize, nor about how it was originally stored.
If we are looking to make the cache clean, then the current implementation is not clean. It already has knowledge of the QuantizationState object and other QS details.
But whether we want to make the cache clean or not is a separate question. It still doesn't answer the question of why this cache is not a read-through cache. You can just provide a loader functional interface here that loads the entry into the cache if it is not present. Then no one needs to know about QS.
The problem with this kind of design is that we would have to write a wrapper on top of it which first checks whether the entry is present in the cache and then loads the data into the cache. That can simply be avoided here.
/**
 * Clears all entries from the cache.
 */
public void clear() {
Any reason for making this function public?
Eventually, it will be used when indices/the state reader are closed, which separates it from the rebuild() method.
There's one flaky test I'm still trying to fix. (Edit: it should be fixed now.)
private void buildCache() {
    this.cache = CacheBuilder.newBuilder().concurrencyLevel(1).maximumWeight(maxCacheSizeInKB).weigher((k, v) -> {
        try {
            return ((QuantizationState) v).toByteArray().length;
We need to serialize to get the weight? That seems too expensive.
As of right now, the only methods guaranteed to be in QuantizationState are getting the params and toByteArray. This was the only way I could find to estimate the size with those methods.
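One possible direction (names here are hypothetical; the actual follow-up may look different) is to let each state report its own size from its fields, so the weigher never has to serialize:

```java
interface QuantizationState {
    byte[] toByteArray();

    // Cheap size estimate derived from the state's fields, for use as a cache weight
    int ramBytesEstimate();
}

final class BinaryQuantizationState implements QuantizationState {
    private final float[][] thresholds; // [bitsPerCoordinate][dimension]

    BinaryQuantizationState(float[][] thresholds) {
        this.thresholds = thresholds;
    }

    @Override
    public byte[] toByteArray() {
        return new byte[0]; // real serialization elided in this sketch
    }

    @Override
    public int ramBytesEstimate() {
        // rows * cols * 4 bytes per float, plus a small fixed allowance for object headers
        int rows = thresholds.length;
        int cols = rows == 0 ? 0 : thresholds[0].length;
        return rows * cols * Float.BYTES + 64;
    }
}
```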
@Vikasht34 we should enhance QuantizationState to give this estimate.
We can take this in another review, but we need to track it.
Let me take a look at this. I will address it in an upcoming PR.
* Add quantization state cache
* Fix compile on tests and add changelog
* Add javadocs
* Change cache implementation to use Cache class
* Make add method synchronized
* Make get instance method synchronized
* Fix compile
* Remove lock
* Add more unit tests
* Add removal listener to update when cache is full
* Add removal listener, weigher, and fix tests
* Decouple from cluster service
* Add testing
* Change to SneakyThrows
* Fix flaky tests
* Fix flaky tests
* Move test file to same package as main class to fix flaky test
* Fix spotless

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 433cd70)
Description
Adds a cache for storing quantization states. This PR contains the cache and tests without integration. In a later PR, this cache will be integrated with a quantization state reader and the query flow.
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.