[Feature Request] Provide ZGC as de facto JVM garbage collector #16084
Comments
[Search Triage] - I think we can start by adding ZGC to our benchmarks.
We will need to carry out some experiments with ZGC to figure out the appropriate settings to use initially. After that, we can start some nightly tests with a couple of workloads such as big5 and possibly nyc_taxis.
I can run some performance benchmarks and share the results.
@rishabh6788 @mch2 @dblock I am from an Amazon internal team using OpenSearch for vector search. I wanted to add our observations from moving an internal RPC service with high traffic, real-time latency constraints, and large heap objects from G1 GC to Generational ZGC. After the migration, our heap usage became almost equal to our heap usage after GC. We saw our container memory utilization go down from ~60% to ~5% with almost no change in CPU utilization. ZGC does its heap cleanup concurrently with the application in order to avoid long stop-the-world pauses, but the original (non-generational) ZGC ran every cycle over the entire heap. Generational ZGC then added young and old generations to the heap, as G1 has, which means most cycles only need to process the young generation. Also, in G1 GC, large objects are placed directly in the old generation, where they can stay until a marking cycle or a stop-the-world full GC reclaims them. Generational ZGC, however, allows large objects to be allocated in the young generation and collects them much more frequently, so essentially no unused objects stay lying around in heap memory. We would be very interested in seeing the impact of switching OpenSearch to Generational ZGC, especially for vector indexing, which we see uses a lot of memory and CPU. We were motivated to move to JDK 21 and Generational ZGC by this Netflix blog post: https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b
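For anyone who wants to reproduce the "heap used" versus "heap used after GC" comparison described in the comment above, here is a minimal sketch using the standard `java.lang.management` API. The class name `HeapAfterGc` and its method names are made up for illustration; this is not how the commenter's service or OpenSearch reports these numbers, only one way to read comparable values from inside a JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: compares current heap usage with usage after the last GC.
class HeapAfterGc {

    // Heap bytes in use right now, as reported by the JVM.
    static long usedNow() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Sum of heap bytes still in use after the most recent collection of each heap pool.
    // Pools that do not support collection usage return null and are skipped.
    static long usedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("heap used now:      %d bytes%n", usedNow());
        System.out.printf("heap used after GC: %d bytes%n", usedAfterLastGc());
    }
}
```

The closer the two numbers stay over time, the less garbage is sitting in the heap between collections, which is the effect the comment attributes to Generational ZGC.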
I ran a simple benchmark using opensearch-benchmark against two single-node clusters, one with the default G1 GC settings and another with Generational ZGC enabled, both on version 2.19 (in development). I had to modify the benchmark code, as it is hard-coded to fetch only G1 GC metrics. I used
[Results table not recovered; surviving column headers: Avg Heap:Max Heap, Avg CPU]
@kkhatua @reta @sandeshkr419 @getsaurabh02 @dblock The next thing I can try is the big5 workload.
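On the point about metrics being hard-coded to G1: one collector-agnostic way to gather GC counts and times is to iterate over whatever collectors the JVM exposes instead of looking them up by name. The sketch below does that with the standard `GarbageCollectorMXBean` API. It is a JVM-side illustration only; the class name `GcMetricsSnapshot` is hypothetical, and it does not reflect the actual changes made to the benchmark code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: snapshots GC metrics without assuming any collector name.
class GcMetricsSnapshot {

    // Returns collector name -> {collection count, accumulated collection time in ms}.
    // Either value may be -1 if the JVM does not define it for that collector.
    static Map<String, long[]> collect() {
        Map<String, long[]> snapshot = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            snapshot.put(gc.getName(),
                    new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return snapshot;
    }

    public static void main(String[] args) {
        collect().forEach((name, values) ->
                System.out.printf("%s: count=%d, timeMs=%d%n", name, values[0], values[1]));
    }
}
```

Because the collector names come from the running JVM, the same code reports G1 beans under G1 and ZGC beans under ZGC without modification.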
@rishabh6788 thanks a lot for picking this up; we are long overdue on this subject (G1, ZGC, Shenandoah GC recommendations). There is an issue open on the OBS side [1]; maybe we could parameterize the GC selection and run the full test suites to get a holistic picture.
Is your feature request related to a problem? Please describe
G1GC has been the de facto garbage collector for OpenSearch. There has been discussion around the tuned settings used for G1GC (#11651), but G1GC, despite its numerous iterations, is still fairly old, dating back to the JDK 7 era.
JDK 15 made the Z Garbage Collector (ZGC), first shipped as an experimental feature in JDK 11, production-ready for applications that require low latency, with STW pauses targeted to stay below 10 ms. In addition, it supports heap sizes from as little as 8 MB to as much as 16 TB without impacting pause times.
JDK21 actually improves this even further with Generational ZGC. This can potentially provide
Describe the solution you'd like
I think there is strong value in switching to Generational ZGC, backed by performance benchmarks, now that OpenSearch uses JDK 21 (Ref: https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/#java-compatibility).
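On JDK 21, Generational ZGC is opt-in and is requested with the two JVM flags `-XX:+UseZGC -XX:+ZGenerational`. When experimenting, it is easy to end up with a node that is still on G1 because the flags were not picked up, so a small check like the sketch below can help. The class name `GcFlagCheck` is hypothetical, and the check only inspects flags passed explicitly to the running JVM; it assumes JDK 21 flag names and does not detect defaults chosen by the JVM itself.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: checks whether Generational ZGC was explicitly requested (JDK 21 flags).
class GcFlagCheck {

    // True only if both the ZGC flag and the generational flag are present.
    static boolean isGenerationalZgc(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseZGC") && jvmArgs.contains("-XX:+ZGenerational");
    }

    public static void main(String[] args) {
        // Arguments passed to the running JVM, not including the arguments to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Generational ZGC requested: " + isGenerationalZgc(jvmArgs));
    }
}
```

Running it as `java -XX:+UseZGC -XX:+ZGenerational GcFlagCheck.java` on JDK 21 should report that Generational ZGC was requested, while running it without the flags should not.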
Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response