
Add quantization state cache #1960

Merged
merged 19 commits into from
Aug 20, 2024

Conversation

ryanbogan
Member

@ryanbogan ryanbogan commented Aug 14, 2024

Description

Adds a cache for storing quantization states. This PR contains the cache and tests without integration. In a later PR, this cache will be integrated with a quantization state reader and the query flow.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Vikasht34
Contributor

The quantization cache class looks good, but we need to improve the unit tests with the following test cases:

  1. testSingleThreadedAddAndRetrieve
  2. testMultiThreadedAddAndRetrieve
  3. testMultiThreadedEvict
  4. testConcurrentAddAndEvict
  5. testCacheClearFromMultipleThreads

Let's have all these tests added.
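As a concrete illustration of what one of these tests could look like (testMultiThreadedAddAndRetrieve), here is a minimal sketch. It uses a plain ConcurrentHashMap as a stand-in for the quantization state cache, since the PR's class is not shown in this thread; all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MultiThreadedAddAndRetrieve {
    // Have several threads add distinct entries concurrently, then return the cache
    // so the caller can verify every entry survived the concurrent writes.
    static Map<String, Integer> run(int threads, int entriesPerThread) {
        Map<String, Integer> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < entriesPerThread; i++) {
                    cache.put("field-" + id + "-" + i, i);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cache;
    }
}
```

The eviction and clear tests would follow the same pattern, with some threads adding entries while others evict or clear, asserting the cache never returns a stale or corrupted state.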

Signed-off-by: Ryan Bogan <[email protected]>
Contributor

@Vikasht34 Vikasht34 left a comment


Looks good to me.

@ryanbogan
Member Author

The most recent commits should address @navneet1v's comments 1-5. I will add the stats in a follow-up PR. I also still need to add more unit tests for the new functionality.

Comment on lines +111 to +115
public static final Integer KNN_DEFAULT_QUANTIZATION_STATE_CACHE_SIZE_LIMIT_PERCENTAGE = 5; // By default, set aside 5% of the JVM for
// the limit
public static final Integer KNN_MAX_QUANTIZATION_STATE_CACHE_SIZE_LIMIT_PERCENTAGE = 10; // Quantization state cache limit cannot exceed
// 10% of the JVM heap
public static final Integer KNN_DEFAULT_QUANTIZATION_STATE_CACHE_EXPIRY_TIME_MINUTES = 60;
Collaborator

What is the reasoning behind these limits?

Member Author

I added initial ones based on a conversation with @Vikasht34. We'll probably need to think more about this with benchmarking, but for now I needed a value to put in.

Contributor

@Vikasht34 Vikasht34 Aug 15, 2024


Here is the reasoning behind this:

Number of segments: 2,500 (this must be extreme)
Dimension: 16,000
Binary quantization: 4 bit
The thresholds will look like this:

thresholds[0] = {0.1f, 0.2f, 0.3f, ... up to 16k}
thresholds[1] = {0.4f, 0.5f, 0.6f, ... up to 16k}
thresholds[2] = {0.7f, 0.8f, 0.9f, ... up to 16k}
thresholds[3] = {1.0f, 1.1f, 1.2f, ... up to 16k}

For 2500 segments, the total size of the thresholds array would be:
2500 segments x 16,000 columns x 4 bytes/element = 1,600,000,000 bytes

1,600,000,000 bytes = 1.6 GB (gigabytes)

So, the total size of your thresholds array for 2500 segments would be approximately 1.6 GB.
If you have a 32 GB heap, this array would occupy:
(1.6 GB / 32 GB) x 100% ≈ 5%

This is an extreme case, assuming all vectors have dimension 16,000 across 2,500 segments.

Assuming a P90 case where the average dimension is 5,000 with 4 bit: 0.5 GB, which is 1.5625% of a 32 GB heap. So we chose 5% as a safe limit that can handle the extreme case as well.

With CacheBuilder.newBuilder().maximumWeight(2GB), the cache doesn't immediately allocate 2 GB of heap space. Instead, its memory usage grows as entries are added, up to the specified weight limit (in this case, 2 GB).
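For concreteness, the heap-percentage rule discussed above can be sketched as follows. This is an illustration with hypothetical names, not the PR's actual code; only the 5%/10% constants come from the snippet under review.

```java
// Illustrative sketch: derive the cache weight limit from a percentage of the JVM
// heap, clamped at the 10% ceiling, as described in the comment above.
class QuantizationCacheSizing {
    static final int DEFAULT_LIMIT_PERCENTAGE = 5;  // default: 5% of the heap
    static final int MAX_LIMIT_PERCENTAGE = 10;     // hard ceiling: 10% of the heap

    // Cache size limit in KB for a given heap size and configured percentage.
    static long maxCacheSizeInKB(long heapBytes, int percentage) {
        int pct = Math.min(percentage, MAX_LIMIT_PERCENTAGE);
        return heapBytes / 100 * pct / 1024;
    }
}
```

For a 32 GB heap at the 5% default this yields roughly 1.6 GB (about 1,677,721 KB), matching the budget the comment above arrives at; the resulting value would then be passed to CacheBuilder.maximumWeight.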

Member


2500 segments x 16,000 columns x 4 bytes/element = 1,600,000,000 bytes

This is actually 160,000,000 bytes, roughly 150 MB. For 5k dimensions it is 50 MB. That being said, I don't think we need these limits.

Member


Also, if we do need the limits, can we avoid making them settings? Settings are hard to remove. It seems that in the future we may want to rely on IndexInput and mmap to handle caching. I'd prefer not to expose these as settings.

private static ClusterService clusterService;
private Cache<String, QuantizationState> cache;
private long maxCacheSizeInKB;
private Instant evictedDueToSizeAt;
Collaborator

Shouldn't this just be a counter of type long?

Member Author

Not sure I understand. Why would it be a counter instead of just a long holding the max size (which is calculated from the percentage setting)?

Comment on lines +95 to +97
public QuantizationState getQuantizationState(String fieldName) {
    return cache.getIfPresent(fieldName);
}
Collaborator

Shouldn't the logic here be: if the entry is not present in the cache, load the value into the cache and then return it?

Any reason for not making this a read-through cache?

@Vikasht34 thoughts?

Contributor

@Vikasht34 Vikasht34 Aug 15, 2024

Well, here are my thoughts on this. I am really not in favor of coupling the cache with the quantization reader, since we don't know what the future holds for the cache in terms of unifying interfaces across multiple caches and implementations. I would keep this totally lightweight.

Nor do I want the NativeQuantizationStateReader to know about the cache.

QuantizationStateCache: manages in-memory caching of quantization states.
QuantizationStateReader: handles reading quantization states from persistent storage.
QuantizationStateWriter: manages writing quantization states to persistent storage.

From the query flow, I will have my interface:

public interface QuantizationStateLoader {
    QuantizationState getQuantizationState(String fieldName, String typeofQuantization);
}

This interface will handle both the cache and the NativeQuantizationStateReader: read from the cache; if the entry is not there, read it from the file, deserialize it based on the type of quantization, and save it to the cache.

So the cache stays clean: it doesn't have to know about the type of quantization state to deserialize, nor about how it was originally stored.
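The wiring described here can be sketched as below. Only the QuantizationStateLoader signature comes from the comment; everything else (FileBackedLoader, the Map standing in for the reader's persistent storage) is a hypothetical illustration of the cache-then-storage flow.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the real QuantizationState hierarchy.
interface QuantizationState { }

// The interface proposed in the comment above.
interface QuantizationStateLoader {
    QuantizationState getQuantizationState(String fieldName, String typeOfQuantization);
}

class FileBackedLoader implements QuantizationStateLoader {
    private final Map<String, QuantizationState> cache = new HashMap<>();   // QuantizationStateCache stand-in
    private final Map<String, QuantizationState> storage;                   // QuantizationStateReader stand-in

    FileBackedLoader(Map<String, QuantizationState> storage) {
        this.storage = storage;
    }

    @Override
    public QuantizationState getQuantizationState(String fieldName, String typeOfQuantization) {
        QuantizationState state = cache.get(fieldName);  // 1. try the cache
        if (state == null) {
            state = storage.get(fieldName);              // 2. read (and deserialize) from storage
            cache.put(fieldName, state);                 // 3. save back into the cache
        }
        return state;
    }
}
```

With this split, the cache class itself never touches serialization or storage; the loader owns the miss-handling policy.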

Collaborator

So the cache is clean: it doesn't have to know about the type of quantization state to deserialize, nor about how it was originally stored.

If we are looking to make the cache clean, then the current implementation is not clean; it already has knowledge of the QuantizationState object and other QS details.

But whether we want to make the cache clean or not is a separate question. It doesn't answer why this cache is not a read-through cache. You can just provide a loader functional interface here that loads the entry into the cache if it's not present. Then no one needs to know about QS.

Collaborator

The problem with this kind of design is that we would have to write a wrapper on top of it which first checks whether the entry is present in the cache and then loads data into the cache. That can simply be avoided here.
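The read-through alternative being argued for can be sketched like this. The class below is hypothetical, in the spirit of Guava's LoadingCache: the loader is supplied once at construction, so callers never check for presence and no wrapper around the cache is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A read-through cache: on a miss, the cache itself invokes the loader and
// stores the entry before returning it, so get() never misses from the
// caller's point of view.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }
}
```

Because the loader is a functional interface, the cache still knows nothing about QuantizationState internals; it only knows how to invoke the function it was given.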

/**
 * Clears all entries from the cache.
 */
public void clear() {
Collaborator

Any reason for making this function public?

Member Author

@ryanbogan ryanbogan Aug 15, 2024

Eventually, it will be used when indices/the state reader are closed, which separates it from the rebuild() method.

Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
@ryanbogan
Member Author

ryanbogan commented Aug 16, 2024

There's one flaky test I'm still trying to fix. (edit: should be fixed now)

private void buildCache() {
    this.cache = CacheBuilder.newBuilder().concurrencyLevel(1).maximumWeight(maxCacheSizeInKB).weigher((k, v) -> {
        try {
            return ((QuantizationState) v).toByteArray().length;
Member

We need to serialize to get the weight? That seems too expensive.

Member Author

As of right now, the only methods guaranteed to be in QuantizationState are for getting the params and toByteArray. This was the only way I could find to estimate the size with those methods.

Member

@Vikasht34 we should enhance QuantizationState to give this estimate.
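One way the enhancement could look, sketched here with hypothetical names: the state reports a cheap size estimate derived from its params, so the weigher never has to serialize.

```java
// Hypothetical sketch: the state exposes a size estimate computed by arithmetic
// on its params, avoiding a toByteArray() call in the cache weigher.
interface SizedQuantizationState {
    byte[] toByteArray();      // existing, expensive: full serialization
    long ramBytesEstimate();   // proposed, cheap: arithmetic on the params
}

class ThresholdState implements SizedQuantizationState {
    private final float[] thresholds;

    ThresholdState(float[] thresholds) {
        this.thresholds = thresholds;
    }

    @Override
    public byte[] toByteArray() {
        // Full serialization: every threshold written out as 4 bytes.
        java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(thresholds.length * Float.BYTES);
        for (float t : thresholds) buf.putFloat(t);
        return buf.array();
    }

    @Override
    public long ramBytesEstimate() {
        // Same size, no allocation or copying: 4 bytes per threshold.
        return (long) thresholds.length * Float.BYTES;
    }
}
```

The weigher would then call ramBytesEstimate() instead of toByteArray().length, turning a per-insertion serialization into a constant-time computation.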

Member

We can take another review pass on this, but we need to track it.

Contributor

Let me take a look at this. I will address it in an upcoming PR.

@ryanbogan ryanbogan merged commit 433cd70 into opensearch-project:main Aug 20, 2024
35 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 20, 2024
* Add quantization state cache

Signed-off-by: Ryan Bogan <[email protected]>

* Fix compile on tests and add changelog

Signed-off-by: Ryan Bogan <[email protected]>

* Add javadocs

Signed-off-by: Ryan Bogan <[email protected]>

* Change cache implementation to use Cache class

Signed-off-by: Ryan Bogan <[email protected]>

* Make add method synchronized

Signed-off-by: Ryan Bogan <[email protected]>

* Make get instance method synchronized

Signed-off-by: Ryan Bogan <[email protected]>

* Fix compile

Signed-off-by: Ryan Bogan <[email protected]>

* Remove lock

Signed-off-by: Ryan Bogan <[email protected]>

* Add more unit tests

Signed-off-by: Ryan Bogan <[email protected]>

* Add removal listener to update when cache is full

Signed-off-by: Ryan Bogan <[email protected]>

* Add removal listener, weigher, and fix tests

Signed-off-by: Ryan Bogan <[email protected]>

* Decouple from cluster service

Signed-off-by: Ryan Bogan <[email protected]>

* Add testing

Signed-off-by: Ryan Bogan <[email protected]>

* Change to SneakyThrows

Signed-off-by: Ryan Bogan <[email protected]>

* Fix flaky tests

Signed-off-by: Ryan Bogan <[email protected]>

* Fix flaky tests

Signed-off-by: Ryan Bogan <[email protected]>

* Move test file to same package as main class to fix flaky test

Signed-off-by: Ryan Bogan <[email protected]>

* Fix spotless

Signed-off-by: Ryan Bogan <[email protected]>

---------

Signed-off-by: Ryan Bogan <[email protected]>
(cherry picked from commit 433cd70)
@ryanbogan ryanbogan deleted the quantization_state_cache branch August 20, 2024 18:47
ryanbogan added a commit that referenced this pull request Aug 20, 2024
akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024