Add vertical scaling and SoftReference for snapshot repository data cache #16489

Open · wants to merge 1 commit into main

Conversation

@inpink (Contributor) commented Oct 27, 2024

Description

Background

Currently, snapshot repository data is not cached if its compressed size exceeds 500KB.
This static limit causes repeated downloads in operations such as clone, restore, and status checks, increasing latency.
Because the limit does not grow with larger heaps, it penalizes repositories with many snapshots.
It also does not adjust with vertical or horizontal scaling, so caching efficiency stays capped even when system resources increase.


Changes

The new setting, snapshot.repository_data.cache.size, allows users to adjust the cache limit as a percentage of heap size. To keep cache sizing consistent across OpenSearch, I referenced existing configurations such as indices.requests.cache.size and indices.fielddata.cache.size.

  • snapshot.repository_data.cache.size (String): Defines the maximum size of the snapshot repository data cache. This value can be specified either as an absolute size (e.g., 2GB) or as a percentage of the node’s heap memory (e.g., 3%). It is a static setting, meaning it must be defined in the opensearch.yml file. If not set, the default size is 1% of heap memory, with an upper limit of 5%.

  • Based on this discussion, I set the user-selectable cache size range between 0–5%, with a default of 1% to scale proportionally with heap size.

  • For implementing the maximum limit, I referred to the knn plugin and its knn.model.cache.size.limit setting.
    (It would be helpful if the Setting class could directly support maximum memory size limits in the future.)

  • I applied a [SoftReference](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html) to latestKnownRepositoryData, which is already managed with an AtomicReference for thread safety. The approach of combining both references was inspired by Gradle’s implementation (see the sketch after this list).

  • The previous warning for cached data exceeding 5MB was removed. Now that users can configure the cache limit directly and a SoftReference is managing cache memory, issuing a warning above 5MB no longer seemed relevant.

  • I believe a static setting, rather than a dynamic one, is more suitable here to avoid further amplifying the unpredictability of SoftReference.
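
Below is a rough sketch of the AtomicReference + SoftReference combination mentioned above (illustration only, not the exact PR code; the cacheRepositoryData and getCachedRepositoryData helpers are hypothetical):

    // Cached repository data held behind both an AtomicReference (thread-safe publication)
    // and a SoftReference (reclaimable by the GC under heap pressure).
    // Relevant imports: java.lang.ref.SoftReference, java.util.concurrent.atomic.AtomicReference,
    // plus OpenSearch's Tuple and BytesReference types.
    private final AtomicReference<SoftReference<Tuple<Long, BytesReference>>> latestKnownRepositoryData =
        new AtomicReference<>(new SoftReference<>(null));

    private void cacheRepositoryData(long generation, BytesReference serialized) {
        latestKnownRepositoryData.set(new SoftReference<>(new Tuple<>(generation, serialized)));
    }

    private BytesReference getCachedRepositoryData(long expectedGeneration) {
        final SoftReference<Tuple<Long, BytesReference>> softRef = latestKnownRepositoryData.get();
        final Tuple<Long, BytesReference> cached = softRef == null ? null : softRef.get();
        // The GC may have cleared the SoftReference; callers fall back to re-downloading in that case.
        return (cached != null && cached.v1() == expectedGeneration) ? cached.v2() : null;
    }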

(I plan to contribute documentation for this feature to the OpenSearch project. I have already opened an issue for documentation contribution.)


Testing

Setting Tests

I added tests in MemorySizeSettingsTests to verify that the snapshot.repository_data.cache.size option resolves to the intended memory size. I also confirmed that an exception is thrown if the specified percentage exceeds the maximum limit.
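
For illustration, a check of the 1% default might look roughly like the following (a sketch only, assuming the setting is declared via Setting.memorySizeSetting; this is not the exact code in MemorySizeSettingsTests):

    public void testSnapshotRepositoryDataCacheSizeDefault() {
        // Assumed declaration: a memory-size setting whose default is 1% of the max heap.
        Setting<ByteSizeValue> cacheSizeSetting = Setting.memorySizeSetting(
            "snapshot.repository_data.cache.size", "1%", Setting.Property.NodeScope);

        long heapBytes = JvmInfo.jvmInfo().getMem().getHeapMax().getBytes();
        // With no value configured, the setting should resolve to ~1% of the heap.
        assertEquals((long) (0.01 * heapBytes), cacheSizeSetting.get(Settings.EMPTY).getBytes());
    }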

E2E Test with Docker

I assembled the modified code and ran it in Docker for end-to-end testing to confirm that everything functions properly overall.
(If any additional data is needed for these E2E tests, I am happy to provide it at any time.)

[Screenshots: observed cache size for each tested configuration: default (1%), 1.5%, 3%, 500KB, and an over-limit value (10%).]

Cache Testing

I wrote test code to confirm that the SoftReference works as expected. However, since latestKnownRepositoryData is private, I temporarily changed its access level to protected in my local environment, and confirmed that the tests passed.

  1. testCachingRepositoryData: Verifies that caching works as expected in repository.latestKnownRepositoryData.
  2. testSoftReferenceRepositoryDataCacheCleared: Confirms that the cache is cleared when there is memory pressure and no objects are referencing latestKnownRepositoryData.
    public void testCachingRepositoryData() throws Exception {
        final BlobStoreRepository repository = setupRepo();
        final long pendingGeneration = repository.metadata.pendingGeneration();

        // Check the initial state of repository data and cache status
        assertThat(OpenSearchBlobStoreRepositoryIntegTestCase.getRepositoryData(repository)
            .getSnapshotIds().size(), equalTo(0));
        final RepositoryData emptyData = RepositoryData.EMPTY;
        writeIndexGen(repository, emptyData, emptyData.getGenId());
        RepositoryData repoData = OpenSearchBlobStoreRepositoryIntegTestCase.getRepositoryData(
            repository);

        // Verify caching status
        SoftReference<Tuple<Long, BytesReference>> cachedRef = repository.latestKnownRepositoryData.get();
        assertNotNull(cachedRef);
        Tuple<Long, BytesReference> cachedData = cachedRef.get();
        assertNotNull(cachedData);

        // Verify data consistency
        assertEquals(repoData, emptyData);
        assertEquals(repoData.getIndices().size(), 0);
        assertEquals(repoData.getSnapshotIds().size(), 0);
        assertEquals(pendingGeneration + 1L, repoData.getGenId());
    }

    public void testSoftReferenceRepositoryDataCacheCleared() throws Exception {
        final BlobStoreRepository repository = setupRepo();

        // Check the initial state of repository data and cache status
        assertThat(OpenSearchBlobStoreRepositoryIntegTestCase.getRepositoryData(repository)
            .getSnapshotIds().size(), equalTo(0));
        final RepositoryData emptyData = RepositoryData.EMPTY;
        writeIndexGen(repository, emptyData, emptyData.getGenId());

        // Induce memory pressure to test SoftReference eviction
        try {
            final ArrayList<Object[]> allocations = new ArrayList<>();
            while (true) {
                int size = (int) Math.min(Runtime.getRuntime().freeMemory() / 2, Integer.MAX_VALUE);
                allocations.add(new Object[size]);
            }
        } catch (OutOfMemoryError e) {
            // Memory shortage situation occurred
            System.out.println("Memory pressure caused OutOfMemoryError as expected");
        }

        System.gc();

        // Check if cache has been evicted
        SoftReference<Tuple<Long, BytesReference>> cachedRef = repository.latestKnownRepositoryData.get();
        assertNotNull(cachedRef);
        Tuple<Long, BytesReference> cachedData = cachedRef.get();
        assertNull(cachedData);
    }

Closing Thoughts

I’m very pleased to have contributed to OpenSearch; it was an exciting feature to add! Participating in discussions with excellent maintainers to provide a robust caching feature for users has been an incredibly valuable experience for me. If there’s anything that needs adjustment, I’ll make the changes as quickly as possible.

Related Issues

Resolves #16298

Check List

  • Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

❌ Gradle check result for e746f68: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for c7563a5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for caa5223: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@dbwiddis (Member) left a comment


Please take a look at how other percentage-based settings are handled and be consistent with those.

While scaling above some small constant is reasonable, we absolutely need a maximum value here.


if (Strings.hasLength(cacheSizeProperty)) {
    long maxHeapSize = Runtime.getRuntime().maxMemory();
    double cacheSizeRatio = Double.parseDouble(cacheSizeProperty) / 100.0;
@dbwiddis (Member) commented Oct 27, 2024


  1. Does this need exception handling? The catch block in lines 3075-3078 won't catch this or give an appropriate error message.
  2. This likely needs protection against values outside the range 0 - some maximum well below 100% (looks like the "max" protects there, but, actually, why disallow disabling the cache with 0?)
  3. This seems to assume we're always giving a percentage. Is this consistent with other percentage-based property parsing? IIRC we need the percentage character (like this) to assume this division by 100, and I'm assuming those likely have their own parsing method somewhere... likely associated with Setting<ByteSizeValue>. Why are we manually using a system property rather than a setting?
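
For context on point 3, here is a hedged sketch of the setting-based alternative (the constant name and the 5% cap are illustrative, under the assumption that Setting.memorySizeSetting is used):

    // Sketch: a Setting<ByteSizeValue> parses both absolute values ("512mb") and heap
    // percentages ("3%"), so no manual Double.parseDouble(...) / 100.0 on a system property
    // is needed; an upper bound can then be enforced against the actual heap size.
    Setting<ByteSizeValue> cacheSizeSetting = Setting.memorySizeSetting(
        "snapshot.repository_data.cache.size", "1%", Setting.Property.NodeScope);

    long requested = cacheSizeSetting.get(settings).getBytes();
    long maxAllowed = (long) (Runtime.getRuntime().maxMemory() * 0.05); // e.g. a 5% cap
    if (requested > maxAllowed) {
        throw new IllegalArgumentException(
            "[snapshot.repository_data.cache.size] must not exceed 5% of the heap"
        );
    }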

Contributor Author


Thank you for your helpful feedback. Based on your insights, I am currently revising the code.
Once the discussion on the max limit is completed, I will incorporate that as well and push the updates again. Thank you for your patience!

As you suggested, I have also raised an issue on the documentation project. I appreciate all the guidance! 😄😆😁

@dbwiddis (Member) commented

I plan to contribute documentation for this feature to the OpenSearch documentation project.

FYI, you can open the issue for that now (link in the PR template) linking to this PR and assign it to yourself. That helps the doc team plan their work for the next release. Then once the PR is merged you can contribute the doc PR.

Contributor

❕ Gradle check result for 00fe7b1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Oct 30, 2024

Codecov Report

Attention: Patch coverage is 68.00000% with 8 lines in your changes missing coverage. Please review.

Project coverage is 72.10%. Comparing base (f57b889) to head (c6260ee).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...ch/repositories/blobstore/BlobStoreRepository.java 68.00% 3 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16489      +/-   ##
============================================
+ Coverage     72.07%   72.10%   +0.03%     
- Complexity    65048    65057       +9     
============================================
  Files          5313     5313              
  Lines        303442   303458      +16     
  Branches      43910    43915       +5     
============================================
+ Hits         218719   218822     +103     
+ Misses        66786    66689      -97     
- Partials      17937    17947      +10     


Contributor

❌ Gradle check result for 2c9bca4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@inpink changed the title from "Add dynamic cache limit for snapshot repository data based on heap size" to "Add vertical scaling and SoftReference for snapshot repository data cache" on Oct 31, 2024.
Contributor

❕ Gradle check result for 0aedcb8: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.gateway.RecoveryFromGatewayIT.testBatchModeEnabledWithDisabledTimeoutAndClusterGreen

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@@ -253,6 +257,12 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
*/
public static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";

public static final String SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE_SETTING_NAME = "snapshot.repository_data.cache.size";
@reta (Collaborator) commented Oct 31, 2024


I think there was quite a bit of work done to centralize cache management:

  • we have CacheService to create caches (and basically provide an opportunity to make sure the caches configurations are sane)
  • we have NodeCacheStats to provide the cache utilization statistics

It looks to me this is the direction we should be taking, no? @peteralfonsi @sgup432 what do you think folks?

Member


Result caching is substantially more complex than what is happening here. I would really really like to avoid exposing anything in a stat API about this behavior. I would also like to avoid adding a cluster setting too. The original proposal was "that we have cache size which is x% of heap size" instead of just a fixed 500KB threshold. The use case here is to simply memoize the latest snapshot metadata because there's a good chance it will need to be accessed again. I'm wondering if changing the threshold to a fixed percentage of the heap (I know settling on the specific value can be difficult) plus the new soft reference behavior as a backstop against OOMs is good enough? No need for any setting or stat or anything like that. @ashking94 @reta what do you think?

Collaborator


The original proposal was "that we have cache size which is x% of heap size" instead of just a fixed 500KB threshold.

To me this is the problem: what is the relationship between heap size and snapshot metadata? A fixed size is at least a predictable measure, but 1% of a 32Gb heap is ~300Mb. I think just keeping the weak / soft reference to the snapshot metadata should be sufficient; if the JVM is low on heap, those will be reclaimed.

Contributor Author


Hello, @reta!
The snapshot metadata grows with the number of indices and snapshots. As a best practice, the number of indices should be limited based on the available heap memory. I’ve provided more details in my comment, including an estimate of how much the snapshot metadata cache size may increase according to AWS-recommended best practices.
Thank you for your review!

@dbwiddis (Member) commented Nov 1, 2024


The snapshot metadata grows with the number of indices and snapshots.

Yes, but to be clear, how often do we access all the snapshots? In my experience we are mostly concerned with the most recent one(s) (for restoring) or the oldest one(s) (which contain the data about indices that haven't changed much).

Both of these would be well served by a LRU / TTL model.

As a best practice, the number of indices should be limited based on the available heap memory.

This alone is a good argument for making the value a fixed percentage of heap, but I'm not sure that multiplying by the number of snapshots is necessary beyond a day or three's worth.

…data cache

- Allows cache size configuration in `opensearch.yml`, adjustable within a range of 0-5% of heap memory, with a default of 1%.
- Applies `SoftReference` to cached repository data for efficient memory management under heap pressure.

Signed-off-by: inpink <[email protected]>
Contributor

❕ Gradle check result for c6260ee: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@@ -1132,7 +1170,8 @@ private RepositoryData safeRepositoryData(long repositoryStateId, Map<String, Bl
cached = null;
} else {
genToLoad = latestKnownRepoGen.get();
cached = latestKnownRepositoryData.get();
SoftReference<Tuple<Long, BytesReference>> softRef = latestKnownRepositoryData.get();
Collaborator


@inpink could you please educate me why the SoftReference is being chosen and not WeakReference?

Member


My understanding is that a WeakReference will be collected even if the JVM isn't under any significant memory pressure, as long as the object isn't otherwise strongly referenced. The JVM will hang on to a SoftReference using a more complicated formula, but is guaranteed to reclaim the space before OOMing.

I admit there's some risk here with a SoftReference, specifically if OpenSearch's circuit breakers prevent the JVM from ever hitting a memory pressure scenario where it will collect soft references, then these references might never be reclaimed.

Collaborator


The JVM will hang on to a SoftReference using a more complicated formula, but is guaranteed to reclaim the space before OOMing.

Correct, this is my understanding. I think if we go with weak, the risks could be largely eliminated, right?

Member


I think if we go with weak, the risks could be largely eliminated, right?

I think we'd have the opposite risk: a performance degradation because the weak reference would almost always be collected before it could be re-used.

Collaborator


Fair point, so what if we go with SoftReference + TTL as an alternative? Indeed, circuit breakers may trigger any time (and this snapshot metadata reference hanging around could be the cause), but with TTL we could actually check the last accessed time and drop the SoftReference altogether?
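
Purely to illustrate that idea (hypothetical code, not from this PR; all names are made up), a SoftReference guarded by a last-access TTL could look roughly like this:

    // Hypothetical sketch: the holder tracks the last access time alongside a SoftReference,
    // so the entry can be dropped explicitly once it has been idle longer than the TTL,
    // even if the GC never clears the soft reference itself.
    final class TtlSoftCache<T> {
        private static final class Entry<V> {
            final SoftReference<V> ref;
            final long lastAccessNanos;

            Entry(SoftReference<V> ref, long lastAccessNanos) {
                this.ref = ref;
                this.lastAccessNanos = lastAccessNanos;
            }
        }

        private final long ttlNanos;
        private final AtomicReference<Entry<T>> holder = new AtomicReference<>();

        TtlSoftCache(long ttlNanos) {
            this.ttlNanos = ttlNanos;
        }

        void put(T value) {
            holder.set(new Entry<>(new SoftReference<>(value), System.nanoTime()));
        }

        T get() {
            final Entry<T> entry = holder.get();
            if (entry == null) {
                return null;
            }
            if (System.nanoTime() - entry.lastAccessNanos > ttlNanos) {
                holder.compareAndSet(entry, null); // idle past the TTL: drop the reference entirely
                return null;
            }
            final T value = entry.ref.get(); // may be null if the GC reclaimed it
            if (value != null) {
                holder.compareAndSet(entry, new Entry<>(entry.ref, System.nanoTime()));
            }
            return value;
        }
    }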

Contributor Author


I believe @andrross has provided an excellent explanation for why SoftReference was chosen. Thank you very much! We previously discussed SoftReference in this issue, so I’m sharing it here in case it might be helpful: link to issue discussion. Any additional thoughts you might have would be greatly appreciated.

@reta, if I understood correctly, your suggestion of a TTL was to help reduce the unpredictability associated with SoftReference—would that be accurate? This cache is accessed each time a snapshot is added, deleted, or restored. If snapshots are taken every hour over 14 days, this would mean metadata for 336 snapshots would be cached (following AWS OpenSearch’s snapshot storage approach as described here). In this case, may I ask what you would consider an appropriate TTL duration?

Labels
enhancement, good first issue, skip-changelog, Storage:Snapshots
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support vertical scaling for snapshot repository data cache limit