Add vertical scaling and SoftReference for snapshot repository data cache #16489
base: main
Conversation
Force-pushed from 9061419 to e746f68.
❌ Gradle check result for e746f68: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for c7563a5: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for caa5223: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Please take a look at how other percentage-based settings are handled and be consistent with those.
While scaling above some small constant is reasonable, we absolutely need a maximum value here.
    if (Strings.hasLength(cacheSizeProperty)) {
        long maxHeapSize = Runtime.getRuntime().maxMemory();
        double cacheSizeRatio = Double.parseDouble(cacheSizeProperty) / 100.0;
- Does this need exception handling? The catch block in lines 3075-3078 won't catch this or give an appropriate error message.
- This likely needs protection against values outside the range 0 to some maximum well below 100% (it looks like the "max" protects there, but, actually, why disallow disabling the cache with 0?)
- This seems to assume we're always given a percentage. Is this consistent with other percentage-based property parsing? IIRC we need the percentage character (as used here) before assuming this division by 100, and I'm assuming those likely have their own parsing method somewhere, likely associated with `Setting<ByteSizeValue>`. Why are we manually using a system property rather than a setting?
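For reference, heap-relative sizes elsewhere in the codebase are usually declared through the settings infrastructure rather than parsed by hand. Below is a minimal sketch of that pattern, assuming the `Setting.memorySizeSetting` helper and current package locations; the key and default mirror the names discussed in this PR, but this is not the PR's actual code:

```java
import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Settings;
import org.opensearch.core.common.unit.ByteSizeValue; // package location may differ across versions

public class CacheSizeSettingSketch {
    // memorySizeSetting understands both absolute values ("512kb") and
    // heap-relative percentages ("1%"), so no manual "/ 100.0" math is needed.
    public static final Setting<ByteSizeValue> SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE = Setting.memorySizeSetting(
        "snapshot.repository_data.cache.size", // key name taken from this PR
        "1%",                                  // default: 1% of the configured heap
        Setting.Property.NodeScope
    );

    public static void main(String[] args) {
        Settings settings = Settings.builder().put("snapshot.repository_data.cache.size", "2%").build();
        ByteSizeValue size = SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE.get(settings);
        System.out.println("resolved cache size in bytes: " + size.getBytes());
    }
}
```

As far as I know, `memorySizeSetting` alone does not enforce an upper bound such as 5%; a cap like that would still need an explicit validator.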
Thank you for your helpful feedback. Based on your insights, I am currently revising the code.
Once the discussion on the max limit is completed, I will incorporate that as well and push the updates again. Thank you for your patience!
As you suggested, I have also raised an issue on the documentation project. I appreciate all the guidance! 😄😆😁
FYI, you can open the issue for that now (link in the PR template) linking to this PR and assign it to yourself. That helps the doc team plan their work for the next release. Then once the PR is merged you can contribute the doc PR.
Force-pushed from 6e8e8d3 to 00fe7b1.
❕ Gradle check result for 00fe7b1: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

@@             Coverage Diff              @@
##               main    #16489      +/-   ##
============================================
+ Coverage      72.07%    72.10%   +0.03%
- Complexity     65048     65057       +9
============================================
  Files           5313      5313
  Lines         303442    303458      +16
  Branches       43910     43915       +5
============================================
+ Hits          218719    218822     +103
+ Misses         66786     66689      -97
- Partials       17937     17947      +10

☔ View full report in Codecov by Sentry.
❌ Gradle check result for 2c9bca4: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❕ Gradle check result for 0aedcb8: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@@ -253,6 +257,12 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
         */
        public static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";

+       public static final String SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE_SETTING_NAME = "snapshot.repository_data.cache.size";
I think there was quite a bit of work done to centralize cache management:

- we have `CacheService` to create caches (and basically provide an opportunity to make sure the cache configurations are sane)
- we have `NodeCacheStats` to provide the cache utilization statistics

It looks to me like this is the direction we should be taking, no? @peteralfonsi @sgup432 what do you think, folks?
Result caching is substantially more complex than what is happening here. I would really really like to avoid exposing anything in a stat API about this behavior. I would also like to avoid adding a cluster setting too. The original proposal was "that we have cache size which is x% of heap size" instead of just a fixed 500KB threshold. The use case here is to simply memoize the latest snapshot metadata because there's a good chance it will need to be accessed again. I'm wondering if changing the threshold to a fixed percentage of the heap (I know settling on the specific value can be difficult) plus the new soft reference behavior as a backstop against OOMs is good enough? No need for any setting or stat or anything like that. @ashking94 @reta what do you think?
The original proposal was "that we have cache size which is x% of heap size" instead of just a fixed 500KB threshold.
To me this is the problem: what is the relationship between heap size and snapshot metadata? A fixed size is at least a predictable measure, but 1% of a 32GB heap is ~300MB. I think just keeping the weak / soft reference to the snapshot metadata should be sufficient; if the JVM is low on heap, those will be reclaimed.
Hello, @reta!
The snapshot metadata grows with the number of indices and snapshots. As a best practice, the number of indices should be limited based on the available heap memory. I’ve provided more details in my comment, including an estimate of how much the snapshot metadata cache size may increase according to AWS-recommended best practices.
Thank you for your review!
The snapshot metadata grows with the number of indices and snapshots.
Yes, but to be clear, how often do we access all the snapshots? In my experience we are mostly concerned with the most recent one(s) (for restoring) or the oldest one(s) (which contain the data about indices that haven't changed much).
Both of these would be well served by an LRU / TTL model.
As a best practice, the number of indices should be limited based on the available heap memory.
This alone is a good argument for making the value a fixed percentage of heap, but I'm not sure that multiplying by the number of snapshots is necessary beyond a day or three's worth.
…data cache

- Allows cache size configuration in `opensearch.yml`, adjustable within a range of 0-5% of heap memory, with a default of 1%.
- Applies `SoftReference` to cached repository data for efficient memory management under heap pressure.

Signed-off-by: inpink <[email protected]>
❕ Gradle check result for c6260ee: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@@ -1132,7 +1170,8 @@ private RepositoryData safeRepositoryData(long repositoryStateId, Map<String, Bl
            cached = null;
        } else {
            genToLoad = latestKnownRepoGen.get();
-           cached = latestKnownRepositoryData.get();
+           SoftReference<Tuple<Long, BytesReference>> softRef = latestKnownRepositoryData.get();
@inpink could you please educate me why `SoftReference` is being chosen and not `WeakReference`?
My understanding is that a WeakReference will be collected even if the JVM isn't under any significant memory pressure, as long as the object isn't otherwise being referenced. The JVM will hang on to a SoftReference using a more complicated formula, but is guaranteed to reclaim the space before OOMing.

I admit there's some risk here with a SoftReference: specifically, if OpenSearch's circuit breakers prevent the JVM from ever hitting a memory pressure scenario where it will collect soft references, then these references might never be reclaimed.
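A tiny standalone illustration of that difference (purely illustrative; collection timing is GC-dependent, so the printed results are typical rather than guaranteed):

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceBehaviorDemo {
    public static void main(String[] args) {
        SoftReference<Object> soft = new SoftReference<>(new Object());
        WeakReference<Object> weak = new WeakReference<>(new Object());

        // Neither referent is strongly reachable any more.
        System.gc(); // only a hint; the JVM may or may not collect right away

        // Typically the weak referent is gone after the next GC cycle,
        // while the soft referent survives until the JVM is short on heap.
        System.out.println("weak referent: " + weak.get()); // usually null
        System.out.println("soft referent: " + soft.get()); // usually non-null
    }
}
```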
The JVM will hang on to a SoftReference using a more complicated formula, but is guaranteed to reclaim the space before OOMing.
Correct, this is my understanding. I think if we go with weak, the risks could be largely eliminated, right?
I think if we go with weak, the risks could be largely eliminated, right?
I think we'd have the opposite risk: a performance degradation because the weak reference would almost always be collected before it could be re-used.
Fair point, so what if we go with `SoftReference` + TTL as an alternative? Indeed, circuit breakers may trigger at any time (and this snapshot metadata reference hanging around could be the cause), but with a TTL we could actually check the last accessed time and drop the `SoftReference` altogether?
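A rough sketch of what that SoftReference-plus-TTL combination could look like (class and field names are hypothetical, and for brevity it keys off the store time rather than the last accessed time, though a last-accessed timestamp would slot in the same way):

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-entry cache combining a SoftReference with a TTL check. */
final class TtlSoftCache<V> {

    private static final class Entry<V> {
        final SoftReference<V> ref;
        final long storedAtMillis;

        Entry(V value, long storedAtMillis) {
            this.ref = new SoftReference<>(value);
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final AtomicReference<Entry<V>> slot = new AtomicReference<>();
    private final long ttlMillis;

    TtlSoftCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(V value) {
        slot.set(new Entry<>(value, System.currentTimeMillis()));
    }

    V get() {
        Entry<V> entry = slot.get();
        if (entry == null) {
            return null;
        }
        if (System.currentTimeMillis() - entry.storedAtMillis > ttlMillis) {
            slot.compareAndSet(entry, null); // expired: drop the SoftReference entirely
            return null;
        }
        return entry.ref.get(); // may still be null if the GC already reclaimed the value
    }
}
```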
I believe @andrross has provided an excellent explanation for why SoftReference was chosen. Thank you very much! We previously discussed SoftReference in this issue, so I’m sharing it here in case it might be helpful: link to issue discussion. Any additional thoughts you might have would be greatly appreciated.
@reta, if I understood correctly, your suggestion of a TTL was to help reduce the unpredictability associated with SoftReference—would that be accurate? This cache is accessed each time a snapshot is added, deleted, or restored. If snapshots are taken every hour over 14 days, this would mean metadata for 336 snapshots would be cached (following AWS OpenSearch’s snapshot storage approach as described here). In this case, may I ask what you would consider an appropriate TTL duration?
Description
Background
Currently, snapshot repository data is not cached if its compressed size exceeds 500KB.
This static limit causes repeated downloads in operations like clone, restore, and status checks, increasing latency.
The limit has not been adjusted for larger heap sizes, impacting repositories with numerous snapshots.
It doesn’t adjust with vertical or horizontal scaling. This restricts caching efficiency, even with enhanced system resources.
Changes
The new setting, `opensearch.snapshot.cache.size`, allows users to adjust the cache limit as a percentage of heap size. To maintain consistency in how cache sizes are set across OpenSearch, I referenced existing configurations like `indices.requests.cache.size` and `indices.fielddata.cache.size`.

`snapshot.repository_data.cache.size` (String): Defines the maximum size of the snapshot repository data cache. This value can be specified either as an absolute size (e.g., 2GB) or as a percentage of the node's heap memory (e.g., 3%). It is a static setting, meaning it must be defined in the `opensearch.yml` file. If not set, the default size is 1% of heap memory, with an upper limit of 5%.

Based on this discussion, I set the user-selectable cache size range between 0–5%, with a default of 1%, to scale proportionally with heap size. For implementing the maximum limit, I referred to the knn plugin and its `knn.model.cache.size.limit` setting.

(It would be helpful if the Setting class could directly support maximum memory size limits in the future.)
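To make the 0–5% bound concrete, here is a minimal, hypothetical sketch of the kind of upper-bound check described above; the class name, method, and the 5% constant are illustrative, not the PR's exact code:

```java
public final class CacheSizeLimit {
    private static final double MAX_HEAP_FRACTION = 0.05; // 5% upper bound, as discussed in this PR

    /** Rejects a requested cache size (in bytes) outside the 0..5%-of-heap range. */
    public static long validate(long requestedBytes) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        long limitBytes = (long) (maxHeapBytes * MAX_HEAP_FRACTION);
        if (requestedBytes < 0 || requestedBytes > limitBytes) {
            throw new IllegalArgumentException(
                "snapshot repository data cache size [" + requestedBytes + "] must be between 0 and " + limitBytes + " bytes (5% of heap)"
            );
        }
        return requestedBytes;
    }

    public static void main(String[] args) {
        // e.g. 1% of heap passes validation, 10% of heap would throw
        long onePercent = (long) (Runtime.getRuntime().maxMemory() * 0.01);
        System.out.println("accepted: " + validate(onePercent));
    }
}
```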
I applied a [SoftReference](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html) to `latestKnownRepositoryData`, which is already managed with an `AtomicReference` for thread safety. The approach of combining both references was inspired by Gradle's implementation.

The previous warning for cached data exceeding 5MB was removed. Now that users can configure the cache limit directly and a `SoftReference` is managing cache memory, issuing a warning above 5MB no longer seemed relevant.

I believed that a static setting, rather than a dynamic one, was more suitable here to avoid further amplifying the unpredictability of `SoftReference`.

(I plan to contribute documentation for this feature to the OpenSearch project. I have already opened an issue for documentation contribution.)
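As a rough standalone sketch of that combined-reference pattern (the generic value type and method names are mine, not the PR's): the outer `AtomicReference` gives atomic publication across threads, while the inner `SoftReference` lets the GC drop the payload under heap pressure, so reads have to tolerate two levels of null.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicReference;

final class SoftCachedValue<V> {
    private final AtomicReference<SoftReference<V>> slot = new AtomicReference<>();

    void set(V value) {
        slot.set(new SoftReference<>(value)); // atomically publish a new softly-held value
    }

    V get() {
        SoftReference<V> ref = slot.get();
        return ref == null ? null : ref.get(); // null if never set, or if the GC reclaimed it
    }
}
```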
Testing
I conducted tests in `MemorySizeSettingsTests` to verify that the `snapshot.repository_data.cache.size` option allocates memory as intended. I also confirmed that an exception is triggered if the specified percentage exceeds the maximum limit.

I assembled the code I modified and ran it in Docker to conduct E2E testing to ensure everything functions properly overall.
(If any additional data is needed for these E2E tests, I am happy to provide it at any time.)
I wrote test code to confirm that the `SoftReference` works as expected. However, since `latestKnownRepositoryData` is private, I temporarily changed its access level to protected in my local environment, and confirmed that the tests passed.

- `testCacheRepositoryData`: Verifies that caching works as expected in `repository.latestKnownRepositoryData`.
- `testSoftReferenceRepositoryDataCacheCleared`: Confirms that the cache is cleared when there is memory pressure and no objects are referencing `latestKnownRepositoryData`.
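For context, here is a minimal standalone sketch of the usual technique for driving such a memory-pressure test (class name and allocation sizes are illustrative, not the PR's test code): the JVM is required to clear softly reachable objects before throwing `OutOfMemoryError`, so provoking and catching that error should leave the soft reference empty. It is meant to be run with a small heap, e.g. `-Xmx64m`.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class SoftReferenceClearingDemo {
    public static void main(String[] args) {
        SoftReference<byte[]> cached = new SoftReference<>(new byte[1024 * 1024]); // ~1 MB payload

        // Apply heap pressure: keep allocating until the JVM throws OutOfMemoryError.
        // Soft references must be cleared before the error is thrown.
        List<byte[]> pressure = new ArrayList<>();
        try {
            while (true) {
                pressure.add(new byte[8 * 1024 * 1024]);
            }
        } catch (OutOfMemoryError expected) {
            pressure.clear(); // release the pressure so the program can continue
        }

        // After near-OOM pressure the softly-held payload should have been reclaimed.
        System.out.println("cached value after pressure: " + cached.get()); // expected: null
    }
}
```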
Closing Thoughts
I’m very pleased to have contributed to OpenSearch; it was an exciting feature to add! Participating in discussions with excellent maintainers to provide a robust caching feature for users has been an incredibly valuable experience for me. If there’s anything that needs adjustment, I’ll make the changes as quickly as possible.
Related Issues
Resolves #16298
Check List
- [ ] API changes companion pull request created, if applicable.
- [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.