
[Feature Request] Provide ZGC as defacto JVM garbage collector #16084

Open
kkhatua opened this issue Sep 25, 2024 · 6 comments
Labels
Build · enhancement · Performance · Search:Performance

Comments

@kkhatua
Member

kkhatua commented Sep 25, 2024

Is your feature request related to a problem? Please describe

G1GC has been the de facto garbage collector for OpenSearch. There has been discussion around the tuned settings used for G1GC (#11651), but despite its numerous iterations, G1GC is still a fairly old design, dating back to its introduction around JDK 8.

ZGC (the Z Garbage Collector), introduced experimentally in JDK 11 and made production-ready in JDK 15, targets applications that require low latency by keeping stop-the-world pauses below roughly 10 ms (sub-millisecond since JDK 16). In addition, it supports heaps from as little as 8 MB to as large as 16 TB without pause times growing with heap size.

JDK 21 improves on this further with Generational ZGC (JEP 439). This can potentially provide:

  • Less likelihood of allocation stalls
  • Lower heap requirement
  • Lower CPU impact due to GC

Describe the solution you'd like

I think there is strong value in switching to Generational ZGC, validated by performance benchmarks, now that OpenSearch uses JDK 21 (Ref: https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/#java-compatibility).
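For context, enabling it is a small JVM-flags change. A minimal, hypothetical sketch of the relevant `config/jvm.options` lines on JDK 21 (the `-XX:+ZGenerational` flag is the JDK 21 spelling; generational mode becomes ZGC's default in JDK 23, and the exact default G1 lines to replace depend on the bundled distribution):

```
## Replace the bundled G1 GC flags with ZGC in generational mode (JDK 21+)
-XX:+UseZGC
-XX:+ZGenerational
```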

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@kkhatua kkhatua added enhancement Enhancement or improvement to existing feature or request untriaged labels Sep 25, 2024
@kkhatua kkhatua added Performance This is for any performance related enhancements or bugs Build Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement. and removed Search:Performance labels Sep 25, 2024
@sandeshkr419
Contributor

[Search Triage] - I think we can start by adding ZGC to our benchmarks.
@rishabh6788 @gkamat Can either of you take a look once?

@gkamat

gkamat commented Oct 9, 2024

We will need to carry out some experiments with ZGC to figure out the appropriate settings to use initially. Subsequently, we can initiate some nightly tests with a couple of workloads like big5 and possibly nyc_taxis.

@rishabh6788
Contributor

I can run some performance benchmarks and share results.
Is there any documentation on how to enable ZGC on a cluster and what settings to apply?
@sandeshkr419

@adityamohan93

adityamohan93 commented Dec 7, 2024

@rishabh6788 @mch2 @dblock I am from an Amazon internal team using OpenSearch for vector search. I wanted to add our observations from moving an internal RPC service, with high traffic, real-time latency constraints, and large heap objects, from G1 GC to Generational ZGC. After the migration, our live heap usage became almost equal to the post-GC heap usage, and our container memory utilization went down from ~60% to ~5% with almost no change in CPU utilization.

ZGC performs heap cleanup concurrently with the application to avoid long stop-the-world pauses, but it originally processed the entire heap on each cycle. Generational ZGC adds G1's notion of young and old generations, so most cycles only process the young generation. Also, in G1 GC, large objects are allocated directly in the old generation and stay there until a full collection is triggered; Generational ZGC instead allows large objects to be allocated in the young generation and reclaims them much more frequently, so essentially no unused objects are left lying around in heap memory.

Would be very interested in seeing the impact of switching OpenSearch to generational ZGC especially for vector indexing which we see uses a lot of memory and CPU. We were motivated to move to JDK 21 and Generational ZGC from this Netflix blog post: https://netflixtechblog.com/bending-pause-times-to-your-will-with-generational-zgc-256629c9386b

@rishabh6788
Contributor

rishabh6788 commented Dec 10, 2024

I ran a simple benchmark using opensearch-benchmark against two single-node clusters, one with the default G1 GC settings and the other with Generational ZGC enabled, both running version 2.19 (in development). I had to modify the benchmark code, as it is hard-coded to fetch only G1 GC metrics. I used the nyc_taxis workload with 1-shard-0-replica settings.
Below are the results:

  1. The avg and max heap usage is higher with ZGC than with G1 GC.
  2. CPU utilization is almost the same on both clusters.
  3. Indexing and a few search queries perform better, while a few have regressed. See the comparison results below.
| Metric | Task | Baseline (G1 GC) | Contender (ZGC) | Diff | Unit |
|---|---|---|---|---|---|
| Cumulative indexing time of primary shards | | 177.148 | 166.374 | -10.7747 | min |
| Min cumulative indexing time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing time across primary shard | | 0.00025 | 0 | -0.00025 | min |
| Max cumulative indexing time across primary shard | | 177.148 | 166.374 | -10.7745 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shard | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 86.9893 | 74.3284 | -12.6609 | min |
| Cumulative merge count of primary shards | | 61 | 60 | -1 | |
| Min cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge time across primary shard | | 86.9893 | 74.3284 | -12.6609 | min |
| Cumulative merge throttle time of primary shards | | 32.4893 | 22.9989 | -9.49043 | min |
| Min cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative merge throttle time across primary shard | | 0 | 0 | 0 | min |
| Max cumulative merge throttle time across primary shard | | 32.4893 | 22.9989 | -9.49043 | min |
| Cumulative refresh time of primary shards | | 10.1675 | 10.3381 | 0.17057 | min |
| Cumulative refresh count of primary shards | | 137 | 123 | -14 | |
| Min cumulative refresh time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative refresh time across primary shard | | 0.000616667 | 0 | -0.00062 | min |
| Max cumulative refresh time across primary shard | | 10.1669 | 10.3381 | 0.17118 | min |
| Cumulative flush time of primary shards | | 3.33785 | 4.63167 | 1.29382 | min |
| Cumulative flush count of primary shards | | 37 | 40 | 3 | |
| Min cumulative flush time across primary shard | | 0 | 0 | 0 | min |
| Median cumulative flush time across primary shard | | 0.000166667 | 0 | -0.00017 | min |
| Max cumulative flush time across primary shard | | 3.33768 | 4.63167 | 1.29398 | min |
| Store size | | 23.6706 | 23.6025 | -0.06805 | GB |
| Translog size | | 1.53668e-07 | 1.53668e-07 | 0 | GB |
| Heap used for segments | | 0 | 0 | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0 | MB |
| Heap used for terms | | 0 | 0 | 0 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0 | MB |
| Segment count | | 31 | 35 | 4 | |
| Min Throughput | index | 56458.2 | 60229.2 | 3771 | docs/s |
| Mean Throughput | index | 59536 | 63671.5 | 4135.51 | docs/s |
| Median Throughput | index | 59405.6 | 62868.2 | 3462.66 | docs/s |
| Max Throughput | index | 64909 | 71426.4 | 6517.34 | docs/s |
| 50th percentile latency | index | 1238.24 | 1146.45 | -91.7937 | ms |
| 90th percentile latency | index | 1786.89 | 1652.7 | -134.196 | ms |
| 99th percentile latency | index | 4049.46 | 4617.23 | 567.773 | ms |
| 99.9th percentile latency | index | 11148.8 | 12673.4 | 1524.6 | ms |
| 99.99th percentile latency | index | 12729 | 15107.3 | 2378.25 | ms |
| 100th percentile latency | index | 13801.2 | 15481.8 | 1680.6 | ms |
| 50th percentile service time | index | 1238.24 | 1146.45 | -91.7937 | ms |
| 90th percentile service time | index | 1786.89 | 1652.7 | -134.196 | ms |
| 99th percentile service time | index | 4049.46 | 4617.23 | 567.773 | ms |
| 99.9th percentile service time | index | 11148.8 | 12673.4 | 1524.6 | ms |
| 99.99th percentile service time | index | 12729 | 15107.3 | 2378.25 | ms |
| 100th percentile service time | index | 13801.2 | 15481.8 | 1680.6 | ms |
| error rate | index | 0.00668047 | 0.00675539 | 7e-05 | % |
| Min Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| Mean Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| Median Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| Max Throughput | wait-until-merges-finish | 0.00231867 | 0.00325038 | 0.00093 | ops/s |
| 100th percentile latency | wait-until-merges-finish | 431281 | 307656 | -123626 | ms |
| 100th percentile service time | wait-until-merges-finish | 431281 | 307656 | -123626 | ms |
| error rate | wait-until-merges-finish | 0 | 0 | 0 | % |
| Min Throughput | default | 3.01951 | 3.01443 | -0.00508 | ops/s |
| Mean Throughput | default | 3.03179 | 3.02352 | -0.00827 | ops/s |
| Median Throughput | default | 3.02892 | 3.0214 | -0.00752 | ops/s |
| Max Throughput | default | 3.05619 | 3.04142 | -0.01477 | ops/s |
| 50th percentile latency | default | 5.30731 | 6.75564 | 1.44833 | ms |
| 90th percentile latency | default | 5.67577 | 7.30976 | 1.63399 | ms |
| 99th percentile latency | default | 5.91656 | 8.32807 | 2.41151 | ms |
| 100th percentile latency | default | 5.94291 | 8.59634 | 2.65343 | ms |
| 50th percentile service time | default | 4.1828 | 5.51889 | 1.33608 | ms |
| 90th percentile service time | default | 4.29625 | 6.0664 | 1.77015 | ms |
| 99th percentile service time | default | 4.50608 | 7.34017 | 2.83409 | ms |
| 100th percentile service time | default | 4.64957 | 7.38317 | 2.7336 | ms |
| error rate | default | 0 | 0 | 0 | % |
| Min Throughput | range | 0.702196 | 0.702031 | -0.00016 | ops/s |
| Mean Throughput | range | 0.703607 | 0.703333 | -0.00027 | ops/s |
| Median Throughput | range | 0.703283 | 0.703035 | -0.00025 | ops/s |
| Max Throughput | range | 0.706491 | 0.706003 | -0.00049 | ops/s |
| 50th percentile latency | range | 297.193 | 215.515 | -81.6783 | ms |
| 90th percentile latency | range | 298.749 | 219.18 | -79.5687 | ms |
| 99th percentile latency | range | 301.343 | 232.075 | -69.2674 | ms |
| 100th percentile latency | range | 301.713 | 236.73 | -64.9832 | ms |
| 50th percentile service time | range | 295.453 | 213.396 | -82.0571 | ms |
| 90th percentile service time | range | 296.405 | 217.406 | -78.9986 | ms |
| 99th percentile service time | range | 299.469 | 229.92 | -69.5489 | ms |
| 100th percentile service time | range | 299.941 | 234.429 | -65.5117 | ms |
| error rate | range | 0 | 0 | 0 | % |
| Min Throughput | distance_amount_agg | 0.053605 | 0.0598739 | 0.00627 | ops/s |
| Mean Throughput | distance_amount_agg | 0.0536178 | 0.0598927 | 0.00627 | ops/s |
| Median Throughput | distance_amount_agg | 0.0536185 | 0.0598951 | 0.00628 | ops/s |
| Max Throughput | distance_amount_agg | 0.0536331 | 0.0599072 | 0.00627 | ops/s |
| 50th percentile latency | distance_amount_agg | 1.37111e+06 | 1.22383e+06 | -147286 | ms |
| 90th percentile latency | distance_amount_agg | 1.73413e+06 | 1.54815e+06 | -185979 | ms |
| 100th percentile latency | distance_amount_agg | 1.81572e+06 | 1.62122e+06 | -194498 | ms |
| 50th percentile service time | distance_amount_agg | 18625.1 | 16709.6 | -1915.49 | ms |
| 90th percentile service time | distance_amount_agg | 18758.5 | 16737.2 | -2021.33 | ms |
| 100th percentile service time | distance_amount_agg | 18893.9 | 16887.7 | -2006.25 | ms |
| error rate | distance_amount_agg | 0 | 0 | 0 | % |
| Min Throughput | autohisto_agg | 1.50854 | 1.50806 | -0.00047 | ops/s |
| Mean Throughput | autohisto_agg | 1.5141 | 1.51333 | -0.00077 | ops/s |
| Median Throughput | autohisto_agg | 1.51283 | 1.51214 | -0.00069 | ops/s |
| Max Throughput | autohisto_agg | 1.52535 | 1.52401 | -0.00135 | ops/s |
| 50th percentile latency | autohisto_agg | 17.5238 | 20.0885 | 2.56468 | ms |
| 90th percentile latency | autohisto_agg | 17.9258 | 20.6454 | 2.71969 | ms |
| 99th percentile latency | autohisto_agg | 18.2304 | 22.6354 | 4.40501 | ms |
| 100th percentile latency | autohisto_agg | 18.3398 | 23.169 | 4.8292 | ms |
| 50th percentile service time | autohisto_agg | 16.0303 | 18.5075 | 2.47711 | ms |
| 90th percentile service time | autohisto_agg | 16.198 | 19.0506 | 2.85262 | ms |
| 99th percentile service time | autohisto_agg | 16.6732 | 21.0932 | 4.41992 | ms |
| 100th percentile service time | autohisto_agg | 16.6781 | 21.2906 | 4.61245 | ms |
| error rate | autohisto_agg | 0 | 0 | 0 | % |
| Min Throughput | date_histogram_agg | 1.50972 | 1.50959 | -0.00013 | ops/s |
| Mean Throughput | date_histogram_agg | 1.51607 | 1.51585 | -0.00022 | ops/s |
| Median Throughput | date_histogram_agg | 1.51462 | 1.51442 | -0.0002 | ops/s |
| Max Throughput | date_histogram_agg | 1.52894 | 1.52855 | -0.00039 | ops/s |
| 50th percentile latency | date_histogram_agg | 17.3491 | 19.4006 | 2.05152 | ms |
| 90th percentile latency | date_histogram_agg | 17.8035 | 19.8617 | 2.05826 | ms |
| 99th percentile latency | date_histogram_agg | 17.9989 | 21.5694 | 3.57057 | ms |
| 100th percentile latency | date_histogram_agg | 18.0061 | 22.0181 | 4.01199 | ms |
| 50th percentile service time | date_histogram_agg | 15.9286 | 17.886 | 1.95739 | ms |
| 90th percentile service time | date_histogram_agg | 16.0704 | 18.0982 | 2.0278 | ms |
| 99th percentile service time | date_histogram_agg | 16.2659 | 19.9985 | 3.73263 | ms |
| 100th percentile service time | date_histogram_agg | 16.3062 | 20.2866 | 3.98039 | ms |
| error rate | date_histogram_agg | 0 | 0 | 0 | % |
| Min Throughput | desc_sort_tip_amount | 0.502468 | 0.502531 | 6e-05 | ops/s |
| Mean Throughput | desc_sort_tip_amount | 0.504058 | 0.504162 | 0.0001 | ops/s |
| Median Throughput | desc_sort_tip_amount | 0.503692 | 0.503787 | 9e-05 | ops/s |
| Max Throughput | desc_sort_tip_amount | 0.507327 | 0.507517 | 0.00019 | ops/s |
| 50th percentile latency | desc_sort_tip_amount | 51.047 | 47.8502 | -3.19679 | ms |
| 90th percentile latency | desc_sort_tip_amount | 51.9189 | 48.7785 | -3.14041 | ms |
| 99th percentile latency | desc_sort_tip_amount | 53.8065 | 60.677 | 6.87047 | ms |
| 100th percentile latency | desc_sort_tip_amount | 55.1832 | 60.9529 | 5.76976 | ms |
| 50th percentile service time | desc_sort_tip_amount | 48.4245 | 45.0815 | -3.34298 | ms |
| 90th percentile service time | desc_sort_tip_amount | 48.9448 | 45.9854 | -2.95939 | ms |
| 99th percentile service time | desc_sort_tip_amount | 50.989 | 58.8204 | 7.83135 | ms |
| 100th percentile service time | desc_sort_tip_amount | 52.3877 | 59.5733 | 7.18559 | ms |
| error rate | desc_sort_tip_amount | 0 | 0 | 0 | % |
| Min Throughput | asc_sort_tip_amount | 0.50319 | 0.502946 | -0.00024 | ops/s |
| Mean Throughput | asc_sort_tip_amount | 0.505252 | 0.50485 | -0.0004 | ops/s |
| Median Throughput | asc_sort_tip_amount | 0.504775 | 0.504411 | -0.00036 | ops/s |
| Max Throughput | asc_sort_tip_amount | 0.509497 | 0.508767 | -0.00073 | ops/s |
| 50th percentile latency | asc_sort_tip_amount | 7.84473 | 8.39563 | 0.5509 | ms |
| 90th percentile latency | asc_sort_tip_amount | 8.35143 | 8.88081 | 0.52938 | ms |
| 99th percentile latency | asc_sort_tip_amount | 8.47489 | 22.1422 | 13.6673 | ms |
| 100th percentile latency | asc_sort_tip_amount | 8.47799 | 29.3989 | 20.9209 | ms |
| 50th percentile service time | asc_sort_tip_amount | 5.07141 | 5.63181 | 0.5604 | ms |
| 90th percentile service time | asc_sort_tip_amount | 5.13877 | 5.79324 | 0.65447 | ms |
| 99th percentile service time | asc_sort_tip_amount | 5.27634 | 19.4516 | 14.1752 | ms |
| 100th percentile service time | asc_sort_tip_amount | 5.28687 | 26.5044 | 21.2175 | ms |
| error rate | asc_sort_tip_amount | 0 | 0 | 0 | % |
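As a quick sanity check on the numbers above, here is a small, hypothetical Python sketch (not part of opensearch-benchmark) that turns a few of the posted baseline/contender pairs into percent changes:

```python
# Convert a few baseline (G1 GC) vs. contender (Generational ZGC) pairs
# from the comparison table above into percent changes
# (positive = contender higher than baseline).
results = {
    "Mean Throughput, index (docs/s)": (59536.0, 63671.5),
    "Cumulative merge time of primary shards (min)": (86.9893, 74.3284),
    "50th percentile latency, range (ms)": (297.193, 215.515),
    "99th percentile latency, index (ms)": (4049.46, 4617.23),
}

def pct_change(baseline, contender):
    """Relative change of contender vs. baseline, in percent."""
    return 100.0 * (contender - baseline) / baseline

for metric, (baseline, contender) in results.items():
    print(f"{metric}: {pct_change(baseline, contender):+.1f}%")
```

So indexing throughput and the range query improve noticeably, while tail indexing latency regresses, matching the summary above.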

Avg Heap: (chart "avg-heap" not captured)

Max Heap: (chart "max-heap" not captured)

Avg CPU: (chart "CPU" not captured)

@kkhatua @reta @sandeshkr419 @getsaurabh02 @dblock

Next, I can try the big5 workload.

@reta
Collaborator

reta commented Dec 10, 2024


@rishabh6788 thanks a lot for picking this up; we are long overdue on this subject (G1, ZGC, and Shenandoah GC recommendations). There is an issue open on the opensearch-benchmark side [1]; maybe we could parameterize the GC selection to run full suites of tests and get a holistic picture.

[1] opensearch-project/opensearch-benchmark#333
