
[Feature Request] Provide ZGC as defacto JVM garbage collector #16084

Open
kkhatua opened this issue Sep 25, 2024 · 6 comments
Labels
Build · enhancement · Performance · Search:Performance

Comments

@kkhatua
Member

kkhatua commented Sep 25, 2024

Is your feature request related to a problem? Please describe

G1GC has been the de facto garbage collector for OpenSearch. There has been discussion around the tuned settings used for G1GC (#11651), but despite its numerous iterations, G1GC is still a fairly old design, dating back to its introduction around JDK 8.

ZGC (the Z Garbage Collector), introduced experimentally in JDK 11 and made production-ready in JDK 15, targets applications that require low latency by keeping stop-the-world pauses below roughly 10 ms (sub-millisecond since JDK 16). In addition, it supports heaps from as little as 8 MB to as large as 16 TB without pause times growing with heap size.

JDK 21 improves on this further with Generational ZGC (JEP 439). This can potentially provide:

  • Less likelihood of allocation stalls
  • Lower heap requirement
  • Lower CPU impact due to GC

Describe the solution you'd like

I think there is strong value in switching to Generational ZGC, validated by performance benchmarks, now that OpenSearch uses JDK 21 (Ref: https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/#java-compatibility).
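For context, enabling it is a small JVM-flags change. A minimal, hypothetical sketch of the relevant `config/jvm.options` lines on JDK 21 (the `-XX:+ZGenerational` flag is the JDK 21 spelling; generational mode becomes ZGC's default in JDK 23, and the exact default G1 lines to replace depend on the bundled distribution):

```
## Replace the bundled G1 GC flags with ZGC in generational mode (JDK 21+)
-XX:+UseZGC
-XX:+ZGenerational
```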

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@kkhatua kkhatua added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 25, 2024
@kkhatua kkhatua added Performance This is for any performance related enhancements or bugs Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. and removed Search:Performance labels Sep 25, 2024
@sandeshkr419
Contributor

[Search Triage] - I think we can start by adding ZGC to our benchmarks.
@rishabh6788 @gkamat Can either of you take a look once?

@gkamat

gkamat commented Oct 9, 2024

We will need to carry out some experiments with ZGC to figure out the appropriate settings to use initially. Subsequently, we can initiate some nightly tests with a couple of workloads like big5 and possibly nyc_taxis.

@rishabh6788
Contributor

I can run some performance benchmarks and share results.
Is there any documentation on how to enable ZGC on a cluster and what settings to apply?
@sandeshkr419

@adityamohan93

adityamohan93 commented Dec 7, 2024

@rishabh6788 @mch2 @dblock I am from an Amazon internal team using OpenSearch for vector search. I wanted to add our observations from moving an internal RPC service, with high traffic, real-time latency constraints, and large heap objects, from G1 GC to Generational ZGC. After the migration, our live heap usage became almost equal to the post-GC heap usage, and our container memory utilization went down from ~60% to ~5% with almost no change in CPU utilization.

ZGC performs heap cleanup concurrently with the application to avoid long stop-the-world pauses, but it originally processed the entire heap on each cycle. Generational ZGC adds G1's notion of young and old generations, so most cycles only process the young generation. Also, in G1 GC, large objects are allocated directly in the old generation and stay there until a full collection is triggered; Generational ZGC instead allows large objects to be allocated in the young generation and reclaims them much more frequently, so essentially no unused objects are left lying around in heap memory.

Would be very interested in seeing the impact of switching OpenSearch to generational ZGC especially for vector indexing which we see uses a lot of memory and CPU. We were motivated to move to JDK 21 and Generational ZGC from this Netflix blog post: https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b

@rishabh6788
Contributor

rishabh6788 commented Dec 10, 2024

I ran a simple benchmark using opensearch-benchmark against two single-node clusters, one with the default G1 GC settings and the other with Generational ZGC enabled, both running version 2.19 (in development). I had to modify the benchmark code, as it is hard-coded to fetch only G1 GC metrics. I used the nyc_taxis workload with 1-shard-0-replica settings.
Below are the results:

  1. The avg and max heap usage is higher with ZGC than with G1 GC.
  2. CPU utilization is almost the same on both clusters.
  3. Indexing and a few search queries perform better, while a few have regressed. See the comparison results below.
| Metric | Task | Baseline (G1 GC) | Contender (ZGC) | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 177.148 | 166.374 | -10.7747 | min |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing time across primary shard | | 0.00025 | 0 | -0.00025 | min |
| Max cumulative indexing time across primary shard | | 177.148 | 166.374 | -10.7745 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 86.9893 | 74.3284 | -12.6609 | min |
| Cumulative merge count of primary shards | | 61 | 60 | -1 | |
| Min cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge time across primary shard | | 86.9893 | 74.3284 | -12.6609 | min |
| Cumulative merge throttle time of primary shards | | 32.4893 | 22.9989 | -9.49043 | min |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge throttle time across primary shard | | 32.4893 | 22.9989 | -9.49043 | min |
| Cumulative refresh time of primary shards | | 10.1675 | 10.3381 | 0.17057 | min |
| Cumulative refresh count of primary shards | | 137 | 123 | -14 | |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative refresh time across primary shard | | 0.000616667 | 0 | -0.00062 | min |
| Max cumulative refresh time across primary shard | | 10.1669 | 10.3381 | 0.17118 | min |
| Cumulative flush time of primary shards | | 3.33785 | 4.63167 | 1.29382 | min |
| Cumulative flush count of primary shards | | 37 | 40 | 3 | |
| Min cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative flush time across primary shard | | 0.000166667 | 0 | -0.00017 | min |
| Max cumulative flush time across primary shard | | 3.33768 | 4.63167 | 1.29398 | min |
| Store size | | 23.6706 | 23.6025 | -0.06805 | GB |
| Translog size | | 1.53668e-07 | 1.53668e-07 | 0 | GB |
| Heap used for segments | | 0 | 0 | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0 | MB |
| Heap used for terms | | 0 | 0 | 0 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0 | MB |
| Segment count | | 31 | 35 | 4 | |
| Min Throughput | index | 56458.2 | 60229.2 | 3771 | docs/s |
| Mean Throughput | index | 59536 | 63671.5 | 4135.51 | docs/s |
| Median Throughput | index | 59405.6 | 62868.2 | 3462.66 | docs/s |
| Max Throughput | index | 64909 | 71426.4 | 6517.34 | docs/s |
| 50th percentile latency | index | 1238.24 | 1146.45 | -91.7937 | ms |
| 90th percentile latency | index | 1786.89 | 1652.7 | -134.196 | ms |
| 99th percentile latency | index | 4049.46 | 4617.23 | 567.773 | ms |
| 99.9th percentile latency | index | 11148.8 | 12673.4 | 1524.6 | ms |
| 99.99th percentile latency | index | 12729 | 15107.3 | 2378.25 | ms |
| 100th percentile latency | index | 13801.2 | 15481.8 | 1680.6 | ms |
| 50th percentile service time | index | 1238.24 | 1146.45 | -91.7937 | ms |
| 90th percentile service time | index | 1786.89 | 1652.7 | -134.196 | ms |
| 99th percentile service time | index | 4049.46 | 4617.23 | 567.773 | ms |
| 99.9th percentile service time | index | 11148.8 | 12673.4 | 1524.6 | ms |
| 99.99th percentile service time | index | 12729 | 15107.3 | 2378.25 | ms |
| 100th percentile service time | index | 13801.2 | 15481.8 | 1680.6 | ms |
| error rate | index | 0.00668047 | 0.00675539 | 7e-05 | % |
| Min Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| Mean Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| Median Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| Max Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 431281 | 307656 | -123626 | ms |
| 100th percentile service time | wait-until-merges-finish | 431281 | 307656 | -123626 | ms |
| error rate | wait-until-merges-finish | 0 | 0 | 0 | % |
| Min Throughput | default | 3.01951 | 3.01443 | -0.00508 | ops/s |
| Mean Throughput | default | 3.03179 | 3.02352 | -0.00827 | ops/s |
| Median Throughput | default | 3.02892 | 3.0214 | -0.00752 | ops/s |
| Max Throughput | default | 3.05619 | 3.04142 | -0.01477 | ops/s |
| 50th percentile latency | default | 5.30731 | 6.75564 | 1.44833 | ms |
| 90th percentile latency | default | 5.67577 | 7.30976 | 1.63399 | ms |
| 99th percentile latency | default | 5.91656 | 8.32807 | 2.41151 | ms |
| 100th percentile latency | default | 5.94291 | 8.59634 | 2.65343 | ms |
| 50th percentile service time | default | 4.1828 | 5.51889 | 1.33608 | ms |
| 90th percentile service time | default | 4.29625 | 6.0664 | 1.77015 | ms |
| 99th percentile service time | default | 4.50608 | 7.34017 | 2.83409 | ms |
| 100th percentile service time | default | 4.64957 | 7.38317 | 2.7336 | ms |
| error rate | default | 0 | 0 | 0 | % |
| Min Throughput | range | 0.702196 | 0.702031 | -0.00016 | ops/s |
| Mean Throughput | range | 0.703607 | 0.703333 | -0.00027 | ops/s |
| Median Throughput | range | 0.703283 | 0.703035 | -0.00025 | ops/s |
| Max Throughput | range | 0.706491 | 0.706003 | -0.00049 | ops/s |
| 50th percentile latency | range | 297.193 | 215.515 | -81.6783 | ms |
| 90th percentile latency | range | 298.749 | 219.18 | -79.5687 | ms |
| 99th percentile latency | range | 301.343 | 232.075 | -69.2674 | ms |
| 100th percentile latency | range | 301.713 | 236.73 | -64.9832 | ms |
| 50th percentile service time | range | 295.453 | 213.396 | -82.0571 | ms |
| 90th percentile service time | range | 296.405 | 217.406 | -78.9986 | ms |
| 99th percentile service time | range | 299.469 | 229.92 | -69.5489 | ms |
| 100th percentile service time | range | 299.941 | 234.429 | -65.5117 | ms |
| error rate | range | 0 | 0 | 0 | % |
| Min Throughput | distance_amount_agg | 0.053605 | 0.0598739 | 0.00627 | ops/s |
| Mean Throughput | distance_amount_agg | 0.0536178 | 0.0598927 | 0.00627 | ops/s |
| Median Throughput | distance_amount_agg | 0.0536185 | 0.0598951 | 0.00628 | ops/s |
| Max Throughput | distance_amount_agg | 0.0536331 | 0.0599072 | 0.00627 | ops/s |
| 50th percentile latency | distance_amount_agg | 1.37111e+06 | 1.22383e+06 | -147286 | ms |
| 90th percentile latency | distance_amount_agg | 1.73413e+06 | 1.54815e+06 | -185979 | ms |
| 100th percentile latency | distance_amount_agg | 1.81572e+06 | 1.62122e+06 | -194498 | ms |
| 50th percentile service time | distance_amount_agg | 18625.1 | 16709.6 | -1915.49 | ms |
| 90th percentile service time | distance_amount_agg | 18758.5 | 16737.2 | -2021.33 | ms |
| 100th percentile service time | distance_amount_agg | 18893.9 | 16887.7 | -2006.25 | ms |
| error rate | distance_amount_agg | 0 | 0 | 0 | % |
| Min Throughput | autohisto_agg | 1.50854 | 1.50806 | -0.00047 | ops/s |
| Mean Throughput | autohisto_agg | 1.5141 | 1.51333 | -0.00077 | ops/s |
| Median Throughput | autohisto_agg | 1.51283 | 1.51214 | -0.00069 | ops/s |
| Max Throughput | autohisto_agg | 1.52535 | 1.52401 | -0.00135 | ops/s |
| 50th percentile latency | autohisto_agg | 17.5238 | 20.0885 | 2.56468 | ms |
| 90th percentile latency | autohisto_agg | 17.9258 | 20.6454 | 2.71969 | ms |
| 99th percentile latency | autohisto_agg | 18.2304 | 22.6354 | 4.40501 | ms |
| 100th percentile latency | autohisto_agg | 18.3398 | 23.169 | 4.8292 | ms |
| 50th percentile service time | autohisto_agg | 16.0303 | 18.5075 | 2.47711 | ms |
| 90th percentile service time | autohisto_agg | 16.198 | 19.0506 | 2.85262 | ms |
| 99th percentile service time | autohisto_agg | 16.6732 | 21.0932 | 4.41992 | ms |
| 100th percentile service time | autohisto_agg | 16.6781 | 21.2906 | 4.61245 | ms |
| error rate | autohisto_agg | 0 | 0 | 0 | % |
| Min Throughput | date_histogram_agg | 1.50972 | 1.50959 | -0.00013 | ops/s |
| Mean Throughput | date_histogram_agg | 1.51607 | 1.51585 | -0.00022 | ops/s |
| Median Throughput | date_histogram_agg | 1.51462 | 1.51442 | -0.0002 | ops/s |
| Max Throughput | date_histogram_agg | 1.52894 | 1.52855 | -0.00039 | ops/s |
| 50th percentile latency | date_histogram_agg | 17.3491 | 19.4006 | 2.05152 | ms |
| 90th percentile latency | date_histogram_agg | 17.8035 | 19.8617 | 2.05826 | ms |
| 99th percentile latency | date_histogram_agg | 17.9989 | 21.5694 | 3.57057 | ms |
| 100th percentile latency | date_histogram_agg | 18.0061 | 22.0181 | 4.01199 | ms |
| 50th percentile service time | date_histogram_agg | 15.9286 | 17.886 | 1.95739 | ms |
| 90th percentile service time | date_histogram_agg | 16.0704 | 18.0982 | 2.0278 | ms |
| 99th percentile service time | date_histogram_agg | 16.2659 | 19.9985 | 3.73263 | ms |
| 100th percentile service time | date_histogram_agg | 16.3062 | 20.2866 | 3.98039 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
| Min Throughput | desc_sort_tip_amount | 0.502468 | 0.502531 | 6e-05 | ops/s |
| Mean Throughput | desc_sort_tip_amount | 0.504058 | 0.504162 | 0.0001 | ops/s |
| Median Throughput | desc_sort_tip_amount | 0.503692 | 0.503787 | 9e-05 | ops/s |
| Max Throughput | desc_sort_tip_amount | 0.507327 | 0.507517 | 0.00019 | ops/s |
| 50th percentile latency | desc_sort_tip_amount | 51.047 | 47.8502 | -3.19679 | ms |
| 90th percentile latency | desc_sort_tip_amount | 51.9189 | 48.7785 | -3.14041 | ms |
| 99th percentile latency | desc_sort_tip_amount | 53.8065 | 60.677 | 6.87047 | ms |
| 100th percentile latency | desc_sort_tip_amount | 55.1832 | 60.9529 | 5.76976 | ms |
| 50th percentile service time | desc_sort_tip_amount | 48.4245 | 45.0815 | -3.34298 | ms |
| 90th percentile service time | desc_sort_tip_amount | 48.9448 | 45.9854 | -2.95939 | ms |
| 99th percentile service time | desc_sort_tip_amount | 50.989 | 58.8204 | 7.83135 | ms |
| 100th percentile service time | desc_sort_tip_amount | 52.3877 | 59.5733 | 7.18559 | ms |
| error rate | desc_sort_tip_amount | 0 | 0 | 0 | % |
| Min Throughput | asc_sort_tip_amount | 0.50319 | 0.502946 | -0.00024 | ops/s |
| Mean Throughput | asc_sort_tip_amount | 0.505252 | 0.50485 | -0.0004 | ops/s |
| Median Throughput | asc_sort_tip_amount | 0.504775 | 0.504411 | -0.00036 | ops/s |
| Max Throughput | asc_sort_tip_amount | 0.509497 | 0.508767 | -0.00073 | ops/s |
| 50th percentile latency | asc_sort_tip_amount | 7.84473 | 8.39563 | 0.5509 | ms |
| 90th percentile latency | asc_sort_tip_amount | 8.35143 | 8.88081 | 0.52938 | ms |
| 99th percentile latency | asc_sort_tip_amount | 8.47489 | 22.1422 | 13.6673 | ms |
| 100th percentile latency | asc_sort_tip_amount | 8.47799 | 29.3989 | 20.9209 | ms |
| 50th percentile service time | asc_sort_tip_amount | 5.07141 | 5.63181 | 0.5604 | ms |
| 90th percentile service time | asc_sort_tip_amount | 5.13877 | 5.79324 | 0.65447 | ms |
| 99th percentile service time | asc_sort_tip_amount | 5.27634 | 19.4516 | 14.1752 | ms |
| 100th percentile service time | asc_sort_tip_amount | 5.28687 | 26.5044 | 21.2175 | ms |
| error rate | asc_sort_tip_amount | 0 | 0 | 0 | % |
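As a quick sanity check on the numbers above, here is a small, hypothetical Python sketch (not part of opensearch-benchmark) that turns a few of the posted baseline/contender pairs into percent changes:

```python
# Convert a few baseline (G1 GC) vs. contender (Generational ZGC) pairs
# from the comparison table above into percent changes
# (positive = contender higher than baseline).
results = {
    "Mean Throughput, index (docs/s)": (59536.0, 63671.5),
    "Cumulative merge time of primary shards (min)": (86.9893, 74.3284),
    "50th percentile latency, range (ms)": (297.193, 215.515),
    "99th percentile latency, index (ms)": (4049.46, 4617.23),
}

def pct_change(baseline, contender):
    """Relative change of contender vs. baseline, in percent."""
    return 100.0 * (contender - baseline) / baseline

for metric, (baseline, contender) in results.items():
    print(f"{metric}: {pct_change(baseline, contender):+.1f}%")
```

So indexing throughput and the range query improve noticeably, while tail indexing latency regresses, matching the summary above.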

Avg Heap: (chart "avg-heap" not captured)

Max Heap: (chart "max-heap" not captured)

Avg CPU: (chart "CPU" not captured)

@kkhatua @reta @sandeshkr419 @getsaurabh02 @dblock

Next, I can try the big5 workload.

@reta
Collaborator

reta commented Dec 10, 2024


@rishabh6788 thanks a lot for picking this up; we are long overdue on this subject (G1, ZGC, and Shenandoah GC recommendations). There is an issue open on the opensearch-benchmark side [1]; maybe we could parameterize the GC selection to run full suites of tests and get a holistic picture.

[1] opensearch-project/opensearch-benchmark#333
