Add vertical scaling and SoftReference for snapshot repository data cache #16489
base: main
Conversation
Force-pushed from 9061419 to e746f68.
❌ Gradle check result for e746f68: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for c7563a5: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for caa5223: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Please take a look at how other percentage-based settings are handled and be consistent with those.
While scaling above some small constant is reasonable, we absolutely need a maximum value here.
    if (Strings.hasLength(cacheSizeProperty)) {
        long maxHeapSize = Runtime.getRuntime().maxMemory();
        double cacheSizeRatio = Double.parseDouble(cacheSizeProperty) / 100.0;
- Does this need exception handling? The catch block in lines 3075-3078 won't catch this or give an appropriate error message.
- This likely needs protection against values outside the range 0 to some maximum well below 100% (it looks like the "max" protects there, but, actually, why disallow disabling the cache with 0?)
- This seems to assume we're always given a percentage. Is this consistent with other percentage-based property parsing? IIRC we need the percentage character (as used here) before assuming this division by 100, and I'm assuming those likely have their own parsing method somewhere, likely associated with `Setting<ByteSizeValue>`. Why are we manually using a system property rather than a setting?
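For reference, heap-relative sizes elsewhere in the codebase are usually declared through the settings infrastructure rather than parsed by hand. Below is a minimal sketch of that pattern, assuming the `Setting.memorySizeSetting` helper and current package locations; the key and default mirror the names discussed in this PR, but this is not the PR's actual code:

```java
import org.opensearch.common.settings.Setting;
import org.opensearch.common.settings.Settings;
import org.opensearch.core.common.unit.ByteSizeValue; // package location may differ across versions

public class CacheSizeSettingSketch {
    // memorySizeSetting understands both absolute values ("512kb") and
    // heap-relative percentages ("1%"), so no manual "/ 100.0" math is needed.
    public static final Setting<ByteSizeValue> SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE = Setting.memorySizeSetting(
        "snapshot.repository_data.cache.size", // key name taken from this PR
        "1%",                                  // default: 1% of the configured heap
        Setting.Property.NodeScope
    );

    public static void main(String[] args) {
        Settings settings = Settings.builder().put("snapshot.repository_data.cache.size", "2%").build();
        ByteSizeValue size = SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE.get(settings);
        System.out.println("resolved cache size in bytes: " + size.getBytes());
    }
}
```

As far as I know, `memorySizeSetting` alone does not enforce an upper bound such as 5%; a cap like that would still need an explicit validator.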
Thank you for your helpful feedback. Based on your insights, I am currently revising the code.
Once the discussion on the max limit is completed, I will incorporate that as well and push the updates again. Thank you for your patience!
As you suggested, I have also raised an issue on the documentation project. I appreciate all the guidance! 😄😆😁
FYI, you can open the issue for that now (link in the PR template) linking to this PR and assign it to yourself. That helps the doc team plan their work for the next release. Then once the PR is merged you can contribute the doc PR.
Force-pushed from 6e8e8d3 to 00fe7b1.
❕ Gradle check result for 00fe7b1: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

@@             Coverage Diff              @@
##               main    #16489      +/-   ##
============================================
+ Coverage      72.07%    72.10%   +0.03%
- Complexity     65048     65057       +9
============================================
  Files           5313      5313
  Lines         303442    303458      +16
  Branches       43910     43915       +5
============================================
+ Hits          218719    218822     +103
+ Misses         66786     66689      -97
- Partials       17937     17947      +10

☔ View full report in Codecov by Sentry.
❌ Gradle check result for 2c9bca4: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❕ Gradle check result for 0aedcb8: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@@ -253,6 +257,12 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
         */
        public static final String VIRTUAL_DATA_BLOB_PREFIX = "v__";

+       public static final String SNAPSHOT_REPOSITORY_DATA_CACHE_SIZE_SETTING_NAME = "snapshot.repository_data.cache.size";
I think there was quite a bit of work done to centralize cache management:

- we have `CacheService` to create caches (and basically provide an opportunity to make sure the cache configurations are sane)
- we have `NodeCacheStats` to provide the cache utilization statistics

It looks to me like this is the direction we should be taking, no? @peteralfonsi @sgup432 what do you think, folks?
Result caching is substantially more complex than what is happening here. I would really really like to avoid exposing anything in a stat API about this behavior. I would also like to avoid adding a cluster setting too. The original proposal was "that we have cache size which is x% of heap size" instead of just a fixed 500KB threshold. The use case here is to simply memoize the latest snapshot metadata because there's a good chance it will need to be accessed again. I'm wondering if changing the threshold to a fixed percentage of the heap (I know settling on the specific value can be difficult) plus the new soft reference behavior as a backstop against OOMs is good enough? No need for any setting or stat or anything like that. @ashking94 @reta what do you think?
The original proposal was "that we have cache size which is x% of heap size" instead of just a fixed 500KB threshold.
To me this is the problem: what is the relationship between heap size and snapshot metadata? A fixed size is at least a predictable measure, but 1% of a 32GB heap is ~300MB. I think just keeping the weak / soft reference to the snapshot metadata should be sufficient; if the JVM is low on heap, those will be reclaimed.
Hello, @reta!
The snapshot metadata grows with the number of indices and snapshots. As a best practice, the number of indices should be limited based on the available heap memory. I’ve provided more details in my comment, including an estimate of how much the snapshot metadata cache size may increase according to AWS-recommended best practices.
Thank you for your review!
The snapshot metadata grows with the number of indices and snapshots.
Yes, but to be clear, how often do we access all the snapshots? In my experience we are mostly concerned with the most recent one(s) (for restoring) or the oldest one(s) (which contain the data about indices that haven't changed much).
Both of these would be well served by an LRU / TTL model.
As a best practice, the number of indices should be limited based on the available heap memory.
This alone is a good argument for making the value a fixed percentage of heap, but I'm not sure that multiplying by the number of snapshots is necessary beyond a day or three's worth.
…data cache

- Allows cache size configuration in `opensearch.yml`, adjustable within a range of 0-5% of heap memory, with a default of 1%.
- Applies `SoftReference` to cached repository data for efficient memory management under heap pressure.

Signed-off-by: inpink <[email protected]>
❕ Gradle check result for c6260ee: UNSTABLE. Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
@@ -1132,7 +1170,8 @@ private RepositoryData safeRepositoryData(long repositoryStateId, Map<String, Bl
            cached = null;
        } else {
            genToLoad = latestKnownRepoGen.get();
-           cached = latestKnownRepositoryData.get();
+           SoftReference<Tuple<Long, BytesReference>> softRef = latestKnownRepositoryData.get();
@inpink could you please educate me why `SoftReference` is being chosen and not `WeakReference`?
My understanding is that a WeakReference will be collected even if the JVM isn't under any significant memory pressure, as long as the object isn't otherwise being referenced. The JVM will hang on to a SoftReference using a more complicated formula, but is guaranteed to reclaim the space before OOMing.

I admit there's some risk here with a SoftReference: specifically, if OpenSearch's circuit breakers prevent the JVM from ever hitting a memory pressure scenario where it will collect soft references, then these references might never be reclaimed.
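A tiny standalone illustration of that difference (purely illustrative; collection timing is GC-dependent, so the printed results are typical rather than guaranteed):

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceBehaviorDemo {
    public static void main(String[] args) {
        SoftReference<Object> soft = new SoftReference<>(new Object());
        WeakReference<Object> weak = new WeakReference<>(new Object());

        // Neither referent is strongly reachable any more.
        System.gc(); // only a hint; the JVM may or may not collect right away

        // Typically the weak referent is gone after the next GC cycle,
        // while the soft referent survives until the JVM is short on heap.
        System.out.println("weak referent: " + weak.get()); // usually null
        System.out.println("soft referent: " + soft.get()); // usually non-null
    }
}
```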
The JVM will hang on to a SoftReference using a more complicated formula, but is guaranteed to reclaim the space before OOMing.
Correct, this is my understanding. I think if we go with weak, the risks could be largely eliminated, right?
I think if we go with weak, the risks could be largely eliminated, right?
I think we'd have the opposite risk: a performance degradation because the weak reference would almost always be collected before it could be re-used.
Fair point, so what if we go with `SoftReference` + TTL as an alternative? Indeed, circuit breakers may trigger at any time (and this snapshot metadata reference hanging around could be the cause), but with a TTL we could actually check the last accessed time and drop the `SoftReference` altogether?
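A rough sketch of what that SoftReference-plus-TTL combination could look like (class and field names are hypothetical, and for brevity it keys off the store time rather than the last accessed time, though a last-accessed timestamp would slot in the same way):

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-entry cache combining a SoftReference with a TTL check. */
final class TtlSoftCache<V> {

    private static final class Entry<V> {
        final SoftReference<V> ref;
        final long storedAtMillis;

        Entry(V value, long storedAtMillis) {
            this.ref = new SoftReference<>(value);
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final AtomicReference<Entry<V>> slot = new AtomicReference<>();
    private final long ttlMillis;

    TtlSoftCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(V value) {
        slot.set(new Entry<>(value, System.currentTimeMillis()));
    }

    V get() {
        Entry<V> entry = slot.get();
        if (entry == null) {
            return null;
        }
        if (System.currentTimeMillis() - entry.storedAtMillis > ttlMillis) {
            slot.compareAndSet(entry, null); // expired: drop the SoftReference entirely
            return null;
        }
        return entry.ref.get(); // may still be null if the GC already reclaimed the value
    }
}
```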
I believe @andrross has provided an excellent explanation for why SoftReference was chosen. Thank you very much! We previously discussed SoftReference in this issue, so I’m sharing it here in case it might be helpful: link to issue discussion. Any additional thoughts you might have would be greatly appreciated.
@reta, if I understood correctly, your suggestion of a TTL was to help reduce the unpredictability associated with SoftReference—would that be accurate? This cache is accessed each time a snapshot is added, deleted, or restored. If snapshots are taken every hour over 14 days, this would mean metadata for 336 snapshots would be cached (following AWS OpenSearch’s snapshot storage approach as described here). In this case, may I ask what you would consider an appropriate TTL duration?
Description
Background
Currently, snapshot repository data is not cached if its compressed size exceeds 500KB.
This static limit causes repeated downloads in operations like clone, restore, and status checks, increasing latency.
The limit has not been adjusted for larger heap sizes, impacting repositories with numerous snapshots.
It doesn’t adjust with vertical or horizontal scaling. This restricts caching efficiency, even with enhanced system resources.
Changes
The new setting, `opensearch.snapshot.cache.size`, allows users to adjust the cache limit as a percentage of heap size. To maintain consistency in how cache sizes are set across OpenSearch, I referenced existing configurations like `indices.requests.cache.size` and `indices.fielddata.cache.size`.

`snapshot.repository_data.cache.size` (String): Defines the maximum size of the snapshot repository data cache. This value can be specified either as an absolute size (e.g., 2GB) or as a percentage of the node's heap memory (e.g., 3%). It is a static setting, meaning it must be defined in the `opensearch.yml` file. If not set, the default size is 1% of heap memory, with an upper limit of 5%.

Based on this discussion, I set the user-selectable cache size range between 0–5%, with a default of 1%, to scale proportionally with heap size. For implementing the maximum limit, I referred to the knn plugin and its `knn.model.cache.size.limit` setting.

(It would be helpful if the Setting class could directly support maximum memory size limits in the future.)
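To make the 0–5% bound concrete, here is a minimal, hypothetical sketch of the kind of upper-bound check described above; the class name, method, and the 5% constant are illustrative, not the PR's exact code:

```java
public final class CacheSizeLimit {
    private static final double MAX_HEAP_FRACTION = 0.05; // 5% upper bound, as discussed in this PR

    /** Rejects a requested cache size (in bytes) outside the 0..5%-of-heap range. */
    public static long validate(long requestedBytes) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        long limitBytes = (long) (maxHeapBytes * MAX_HEAP_FRACTION);
        if (requestedBytes < 0 || requestedBytes > limitBytes) {
            throw new IllegalArgumentException(
                "snapshot repository data cache size [" + requestedBytes + "] must be between 0 and " + limitBytes + " bytes (5% of heap)"
            );
        }
        return requestedBytes;
    }

    public static void main(String[] args) {
        // e.g. 1% of heap passes validation, 10% of heap would throw
        long onePercent = (long) (Runtime.getRuntime().maxMemory() * 0.01);
        System.out.println("accepted: " + validate(onePercent));
    }
}
```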
I applied a [SoftReference](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/ref/SoftReference.html) to `latestKnownRepositoryData`, which is already managed with an `AtomicReference` for thread safety. The approach of combining both references was inspired by Gradle's implementation.

The previous warning for cached data exceeding 5MB was removed. Now that users can configure the cache limit directly and a `SoftReference` is managing cache memory, issuing a warning above 5MB no longer seemed relevant.

I believed that a static setting, rather than a dynamic one, was more suitable here to avoid further amplifying the unpredictability of `SoftReference`.

(I plan to contribute documentation for this feature to the OpenSearch project. I have already opened an issue for documentation contribution.)
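As a rough standalone sketch of that combined-reference pattern (the generic value type and method names are mine, not the PR's): the outer `AtomicReference` gives atomic publication across threads, while the inner `SoftReference` lets the GC drop the payload under heap pressure, so reads have to tolerate two levels of null.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicReference;

final class SoftCachedValue<V> {
    private final AtomicReference<SoftReference<V>> slot = new AtomicReference<>();

    void set(V value) {
        slot.set(new SoftReference<>(value)); // atomically publish a new softly-held value
    }

    V get() {
        SoftReference<V> ref = slot.get();
        return ref == null ? null : ref.get(); // null if never set, or if the GC reclaimed it
    }
}
```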
Testing
I conducted tests in `MemorySizeSettingsTests` to verify that the `snapshot.repository_data.cache.size` option allocates memory as intended. I also confirmed that an exception is triggered if the specified percentage exceeds the maximum limit.

I assembled the code I modified and ran it in Docker to conduct E2E testing to ensure everything functions properly overall.
(If any additional data is needed for these E2E tests, I am happy to provide it at any time.)
I wrote test code to confirm that the `SoftReference` works as expected. However, since `latestKnownRepositoryData` is private, I temporarily changed its access level to protected in my local environment, and confirmed that the tests passed.

- `testCacheRepositoryData`: Verifies that caching works as expected in `repository.latestKnownRepositoryData`.
- `testSoftReferenceRepositoryDataCacheCleared`: Confirms that the cache is cleared when there is memory pressure and no objects are referencing `latestKnownRepositoryData`.
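For context, here is a minimal standalone sketch of the usual technique for driving such a memory-pressure test (class name and allocation sizes are illustrative, not the PR's test code): the JVM is required to clear softly reachable objects before throwing `OutOfMemoryError`, so provoking and catching that error should leave the soft reference empty. It is meant to be run with a small heap, e.g. `-Xmx64m`.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class SoftReferenceClearingDemo {
    public static void main(String[] args) {
        SoftReference<byte[]> cached = new SoftReference<>(new byte[1024 * 1024]); // ~1 MB payload

        // Apply heap pressure: keep allocating until the JVM throws OutOfMemoryError.
        // Soft references must be cleared before the error is thrown.
        List<byte[]> pressure = new ArrayList<>();
        try {
            while (true) {
                pressure.add(new byte[8 * 1024 * 1024]);
            }
        } catch (OutOfMemoryError expected) {
            pressure.clear(); // release the pressure so the program can continue
        }

        // After near-OOM pressure the softly-held payload should have been reclaimed.
        System.out.println("cached value after pressure: " + cached.get()); // expected: null
    }
}
```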
Closing Thoughts
I’m very pleased to have contributed to OpenSearch; it was an exciting feature to add! Participating in discussions with excellent maintainers to provide a robust caching feature for users has been an incredibly valuable experience for me. If there’s anything that needs adjustment, I’ll make the changes as quickly as possible.
Related Issues
Resolves #16298
Check List
- [ ] API changes companion pull request created, if applicable.
- [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.