Add quantization state cache #1960
Conversation
Signed-off-by: Ryan Bogan <[email protected]>
The Quantization Cache class looks good, but we need to improve the unit tests for the following test cases. Let's have all these tests added.
Looks good to me.
The most recent commits should address @navneet1v's comments 1-5. I will add to stats in a follow-up PR. I also still need to add more unit tests for the new functionality.
public static final Integer KNN_DEFAULT_QUANTIZATION_STATE_CACHE_SIZE_LIMIT_PERCENTAGE = 5; // By default, set aside 5% of the JVM heap for the limit
public static final Integer KNN_MAX_QUANTIZATION_STATE_CACHE_SIZE_LIMIT_PERCENTAGE = 10; // Quantization state cache limit cannot exceed 10% of the JVM heap
public static final Integer KNN_DEFAULT_QUANTIZATION_STATE_CACHE_EXPIRY_TIME_MINUTES = 60;
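For context on how a percentage setting like this typically becomes an actual budget: the percentage is applied to the JVM's max heap and converted to an absolute KB limit. A minimal sketch under that assumption (class and method names here are hypothetical, not from this PR):

```java
final class CacheLimitCalculator {
    // Mirrors the defaults discussed above; assumed values, not the final ones
    static final int DEFAULT_PERCENT = 5;
    static final int MAX_PERCENT = 10;

    // Convert a heap percentage into an absolute KB budget for the cache
    static long percentOfHeapInKB(int percent, long maxHeapBytes) {
        if (percent < 1 || percent > MAX_PERCENT) {
            throw new IllegalArgumentException("percentage must be in [1, " + MAX_PERCENT + "]");
        }
        return maxHeapBytes * percent / 100 / 1024;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reports the JVM's configured max heap
        long heapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("cache budget: " + percentOfHeapInKB(DEFAULT_PERCENT, heapBytes) + " KB");
    }
}
```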
What is the reasoning behind these limits?
I added the initial ones based on a conversation with @Vikasht34. We'll probably need to think more about this with benchmarking, but for now I needed a value to put in.
Here is the reasoning behind this:
Number of segments: 2,500 (this is an extreme case)
Dimension: 16,000
Binary quantization: 4 bit
The thresholds will look like this:
thresholds[0] {0.1f, 0.2f, 0.3f ... up to 16k}
thresholds[1] {0.4f, 0.5f, 0.6f ... up to 16k}
thresholds[2] {0.7f, 0.8f, 0.9f ... up to 16k}
thresholds[3] {1.0f, 1.1f, 1.2f ... up to 16k}
For 2,500 segments, the total size of the thresholds array would be:
2,500 segments x 16,000 columns x 4 bytes/element = 1,600,000,000 bytes
1,600,000,000 bytes = 1.6 GB (gigabytes)
So the total size of the thresholds array for 2,500 segments would be approximately 1.6 GB.
If you have a 32 GB heap, this array would occupy:
(1.6 GB / 32 GB) x 100% ≈ 5%
This is an extreme case, assuming all vectors have dimension 16,000 and there are 2,500 segments.
Assuming a P90 case where the average dimension is 5,000 with 4-bit quantization: 0.5 GB, which is 1.5625% of a 32 GB heap. So we chose 5% as a safe limit, which can handle the extreme case as well.
With CacheBuilder.newBuilder().maximumWeight(2GB), the cache doesn't immediately allocate 2 GB of heap space. Instead, the cache's memory usage grows as entries are added, up to the specified weight limit (in this case, 2 GB).
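The arithmetic above can be sanity-checked in a few lines; note that 2,500 x 16,000 x 4 actually evaluates to 160,000,000 bytes (roughly 150 MB), as the next comment points out. A throwaway sketch:

```java
final class ThresholdSizeEstimate {
    // Bytes for per-segment threshold arrays: one float (4 bytes) per dimension per segment
    static long thresholdBytes(long segments, long dimensions) {
        return segments * dimensions * Float.BYTES;
    }

    // Fraction of a given heap that many bytes would occupy
    static double heapFraction(long bytes, long heapBytes) {
        return (double) bytes / heapBytes;
    }

    public static void main(String[] args) {
        long bytes = thresholdBytes(2_500, 16_000);
        System.out.println(bytes + " bytes"); // 160000000 bytes
        System.out.printf("%.2f%% of a 32 GB heap%n", 100 * heapFraction(bytes, 32L << 30));
    }
}
```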
2500 segments x 16,000 columns x 4 bytes/element = 1,600,000,000 bytes
This is actually 160,000,000 bytes, about 150 MB. At 5k dimensions it is 50 MB. That being said, I don't think we need these limits.
Also, if we do need the limits, can we not make them settings? Settings are hard to remove. It seems that in the future we may want to rely on IndexInput and mmap to handle caching. I'd prefer not to expose these as settings.
private static ClusterService clusterService;
private Cache<String, QuantizationState> cache;
private long maxCacheSizeInKB;
private Instant evictedDueToSizeAt;
Shouldn't this just be a counter of type long?
Not sure I understand. Why would it be a counter instead of just a long holding the max size (which is calculated from the percentage setting)?
public QuantizationState getQuantizationState(String fieldName) {
    return cache.getIfPresent(fieldName);
}
Shouldn't the logic be: if the entry is not present in the cache, load the value into the cache and then return it?
Any reason for not making this a read-through cache?
@Vikasht34 thoughts?
Well, here are my thoughts on this. I am really not in favor of coupling the cache with the quantization reader, since we don't know what the future holds for the cache in terms of unifying interfaces across multiple caches and implementations. I would keep this totally lightweight.
Nor do I want the NativeQuantizationStateReader to know about the cache.
QuantizationStateCache: manages in-memory caching of quantization states.
QuantizationStateReader: handles reading quantization states from persistent storage.
QuantizationStateWriter: manages writing quantization states to persistent storage.
From the query flow, I will have my interface:
public interface QuantizationStateLoader {
    QuantizationState getQuantizationState(String fieldName, String typeOfQuantization);
}
This interface will handle both the cache and NativeQuantizationStateReader: read from the cache; if not present, read from file, deserialize based on the type of quantization, and save it to the cache.
So the cache is clean: it neither has to know about the type of quantization state to deserialize, nor about how it was originally stored.
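A rough sketch of how such a loader could front both the cache and the reader (everything here besides the getQuantizationState name is hypothetical; a plain concurrent map stands in for the real cache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for the real QuantizationState; only the shape matters for this sketch
interface QuantizationState { }

final class CachingStateLoader {
    private final Map<String, QuantizationState> cache = new ConcurrentHashMap<>();
    private final Function<String, QuantizationState> reader; // e.g. reads and deserializes from storage

    CachingStateLoader(Function<String, QuantizationState> reader) {
        this.reader = reader;
    }

    // Read-through: check the cache first; on a miss, load from storage and populate the cache
    QuantizationState getQuantizationState(String fieldName) {
        return cache.computeIfAbsent(fieldName, reader);
    }

    public static void main(String[] args) {
        CachingStateLoader loader = new CachingStateLoader(field -> new QuantizationState() {});
        // Repeated lookups for the same field return the cached instance
        System.out.println(loader.getQuantizationState("my_field") == loader.getQuantizationState("my_field")); // true
    }
}
```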
So the cache is clean: it neither has to know about the type of quantization state to deserialize, nor about how it was originally stored.
If we are looking to make the cache clean, then the current implementation is not clean. It already has knowledge of the QuantizationState object and other QS details.
But whether we want to make the cache clean or not is a separate question. It still doesn't answer the question of why this cache is not a read-through cache. You can just provide a loader functional interface here that loads the entry into the cache if it is not present. Then no one needs to know about QS.
The problem with this kind of design is that we would have to write a wrapper on top of it which first checks whether the entry is present in the cache and then loads the data into the cache. That can simply be avoided here.
/**
 * Clears all entries from the cache.
 */
public void clear() {
Any reason for making this function public?
Eventually, it will be used when indices/the state reader are closed, which separates it from the rebuild() method.
There's one flaky test I'm still trying to fix. (Edit: it should be fixed now.)
private void buildCache() {
    this.cache = CacheBuilder.newBuilder().concurrencyLevel(1).maximumWeight(maxCacheSizeInKB).weigher((k, v) -> {
        try {
            return ((QuantizationState) v).toByteArray().length;
We need to serialize to get the weight? That seems too expensive.
As of right now, the only methods guaranteed to be in QuantizationState are getting the params and toByteArray. This was the only way I could find to estimate the size with those methods.
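One possible direction (names here are hypothetical; the actual follow-up may look different) is to let each state report its own size from its fields, so the weigher never has to serialize:

```java
interface QuantizationState {
    byte[] toByteArray();

    // Cheap size estimate derived from the state's fields, for use as a cache weight
    int ramBytesEstimate();
}

final class BinaryQuantizationState implements QuantizationState {
    private final float[][] thresholds; // [bitsPerCoordinate][dimension]

    BinaryQuantizationState(float[][] thresholds) {
        this.thresholds = thresholds;
    }

    @Override
    public byte[] toByteArray() {
        return new byte[0]; // real serialization elided in this sketch
    }

    @Override
    public int ramBytesEstimate() {
        // rows * cols * 4 bytes per float, plus a small fixed allowance for object headers
        int rows = thresholds.length;
        int cols = rows == 0 ? 0 : thresholds[0].length;
        return rows * cols * Float.BYTES + 64;
    }
}
```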
@Vikasht34 we should enhance QuantizationState to give this estimate.
We can take this in another review, but we need to track it.
Let me take a look at this. I will address it in an upcoming PR.
* Add quantization state cache
* Fix compile on tests and add changelog
* Add javadocs
* Change cache implementation to use Cache class
* Make add method synchronized
* Make get instance method synchronized
* Fix compile
* Remove lock
* Add more unit tests
* Add removal listener to update when cache is full
* Add removal listener, weigher, and fix tests
* Decouple from cluster service
* Add testing
* Change to SneakyThrows
* Fix flaky tests
* Fix flaky tests
* Move test file to same package as main class to fix flaky test
* Fix spotless

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 433cd70)
Description
Adds a cache for storing quantization states. This PR contains the cache and tests without integration. In a later PR, this cache will be integrated with a quantization state reader and the query flow.
Check List
Commits are signed per the DCO using --signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.