-XX:+UseConcMarkSweepGC JAVA_OPTS slows down logstash by ~20% #2914

Open
colinsurprenant opened this issue Mar 27, 2015 · 36 comments

@colinsurprenant
Contributor

per #2859 we realized that using the -XX:+UseConcMarkSweepGC JAVA_OPTS slows down logstash by about 20%.

@colinsurprenant
Contributor Author

I will put it as a blocker for 1.5; we need to figure out whether using the CMS garbage collector is necessary and has benefits over using the default ParallelGC. Currently the default logstash JAVA_OPTS are:

LS_HEAP_SIZE="${LS_HEAP_SIZE:=500m}"

JAVA_OPTS="$JAVA_OPTS -Xmx${LS_HEAP_SIZE}"
JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true"

JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

@colinsurprenant colinsurprenant added this to the v1.5.0 milestone Mar 27, 2015
@colinsurprenant
Contributor Author

I tried to search the git log for the rationale for using these settings but did not find anything.

These settings are the same as in elasticsearch; maybe it was decided that these settings would also benefit logstash, in particular when using the embedded elasticsearch?

In any case, I do not believe we actually need to enable the CMS GC, since the heap usage signature of logstash is not like elasticsearch's. ES experiences memory usage spikes and wants to really minimize stop-the-world situations. Also, using embedded elasticsearch in logstash is not recommended for production and is only there for easier prototyping/testing, so we should not set default GC options for that use-case.

logstash will typically have a rather stable heap usage pattern. Given enough memory, I do not believe there is any benefit to using the CMS GC at the cost of a ~20% overall performance drop. Also, a stop-the-world situation in logstash is not as critical as it is in elasticsearch. In fact, I think logstash would benefit from having a larger GC young generation heap space, using either -XX:NewSize= or -XX:NewRatio=, because logstash typically has a rather constant old generation heap.

thoughts?

I suggest we remove these JVM options by default and also make JAVA_OPTS overridable on the command line, for example:

JAVA_OPTS="-Xmx500m -XX:NewSize=400m" bin/logstash ...
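
As an illustration of the override behaviour (a sketch only, assuming the logic lives in logstash.lib.sh; this is not the actual change):

# sketch: apply the default JVM options only when the user has not supplied
# their own JAVA_OPTS
if [ -z "$JAVA_OPTS" ] ; then
  JAVA_OPTS="-Xmx${LS_HEAP_SIZE} -Djava.awt.headless=true"
else
  echo "warning: using user-supplied JAVA_OPTS, skipping defaults" >&2
fi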

Also, as in elasticsearch, I suggest we set these options by default in logstash:

-Djava.net.preferIPv4Stack=true
-XX:+HeapDumpOnOutOfMemoryError
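
If we adopt them, a sketch of how they could be appended in logstash.lib.sh, in the same style as the existing defaults (the HeapDumpPath value is only an example):

JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# optionally pick a dump location with enough free disk space (example path)
# JAVA_OPTS="$JAVA_OPTS -XX:HeapDumpPath=/tmp/logstash-heapdump.hprof"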

@colinsurprenant
Contributor Author

Using the simple pipeline performance configurations defined in #2870, I can see a ~10% performance increase between using only these JAVA_OPTS:

JAVA_OPTS="-Xmx500m"

vs

JAVA_OPTS="-Xmx500m -XX:NewSize=400m"

For these configs, using a larger young generation heap performs better.

@untergeek
Member

This is really interesting. It's also very important to understand and document well because JVM settings could have a profound impact on those who are still trying to use the embedded Elasticsearch.

@purbon
Contributor

purbon commented Mar 30, 2015

@colinsurprenant this makes complete sense to me to investigate prior to 1.5 GA; as @untergeek said, we have to understand our specific GC needs better, ...

Not sure if this qualifies, but I see our GC needs as something that depends highly on our pipeline usage, right? I mean, what might the difference be between doing lots of groks or not, for example?

Definitely something important to dig into

@purbon purbon assigned purbon and unassigned colinsurprenant Apr 1, 2015
@purbon
Contributor

purbon commented Apr 1, 2015

We agree on having different options, plus the ability to override these options from the client side.

@jsvd
Member

jsvd commented Apr 2, 2015

Test:

bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }' | pv -War > /dev/null

JVM flags                  Time     Throughput
1.5.rc2 flags (CMS)        1:25.54  126kiB/s
No CMS flags               1:17.52  141kiB/s
No CMS flags + NewRatio=3  1:18.93  138kiB/s
No CMS flags + NewRatio=2  1:13.66  149kiB/s

So my proposal:

diff --git a/bin/logstash.lib.sh b/bin/logstash.lib.sh
index 1b1fc46..a6fe590 100755
--- a/bin/logstash.lib.sh
+++ b/bin/logstash.lib.sh
@@ -20,18 +20,15 @@ setup_java() {
   fi

   JAVA_OPTS="$JAVA_OPTS -Xmx${LS_HEAP_SIZE}"
-  JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
-  JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
+  JAVA_OPTS="$JAVA_OPTS -XX:NewRatio=2"
   JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true"

-  JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
-  JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
-
   if [ ! -z "$LS_USE_GC_LOGGING" ] ; then
     JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDetails"
     JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCTimeStamps"
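
(For clarity, the net effect of the patch above on the GC defaults would be roughly the following; this is reconstructed from the diff, not a verbatim quote of the resulting file:)

JAVA_OPTS="$JAVA_OPTS -Xmx${LS_HEAP_SIZE}"
JAVA_OPTS="$JAVA_OPTS -XX:NewRatio=2"
JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true"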

@ph
Contributor

ph commented Apr 2, 2015

Could we do a check with real-world data? Maybe using the HOW's complete.conf?

@purbon
Contributor

purbon commented Apr 2, 2015

@ph I'm doing this right now

@purbon
Contributor

purbon commented Apr 2, 2015

I'm also using different datasets that I have spare here.

@ph
Contributor

ph commented Apr 2, 2015

❤️

@purbon
Contributor

purbon commented Apr 7, 2015

Hi,
I did some micro-benchmarking with the generated messages and these are my numbers. Please let me know if you see something strange, because I don't see a lot of benefit (in this exact case) to changing the GC options.

Here is my report:

CONFIG="input { generator { count => 10000000 } } output { stdout { codec => dots } }"

Setup  AVG (s)  Diff    EPS
1      51.01            196046.9101
2      49.55    2.86%   201809.8306
3      50.62    0.77%   197569.1097
4      47.66    6.57%   209834.5245
5      51.80    -1.54%  193068.0835

CONFIG="input { generator { count => 100000000 } } output { stdout { codec => dots } }"

Setup  AVG (s)  Diff    EPS
1      460.71           217054.7751
2      452.90   1.69%   220796.9534
3      491.92   -6.77%  203285.5003
4      430.94   6.46%   232051.1886
5      496.06   -7.67%  201587.1358

Executions were made with these sets of ENV variables (the Setup column above refers to this list):

  1. Default options.
  2. ENV['LS_HEAP_SIZE']="500m"
  3. ENV['LS_HEAP_SIZE']="500m" ENV['LS_JAVA_OPTS']="-XX:NewRatio=2"
  4. ENV['LS_HEAP_SIZE']="500m" ENV['LS_JAVA_OPTS']="-XX:+UseParNewGC -XX:NewRatio=2"
  5. ENV['LS_HEAP_SIZE']="500m" ENV['LS_JAVA_OPTS']="-XX:+UseParallelGC -XX:NewRatio=2"
purbon@Ubuntu-1404-trusty-64-minimal:~/_gc_config_benchmark/reports$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
purbon@Ubuntu-1404-trusty-64-minimal:~/_gc_config_benchmark/reports$ ruby -v
jruby 1.7.16.1 (1.9.3p392) 2014-10-28 4e93f31 on OpenJDK 64-Bit Server VM 1.7.0_65-b32 +jit [linux-amd64]

and using the vendored JRuby.

This is done using master and running it as in

bin/logstash -e $CONFIG | pv ....

so as I said, using the vendored jruby.

Any thoughts on why I get these numbers?

@purbon
Contributor

purbon commented Apr 7, 2015

Now I'm doing some micro-benchmarking with more complex configurations and datasets, such as a set of apache logs and a set of XML documents. Will report numbers here soon.

@colinsurprenant
Contributor Author

Can you share the exact command lines you used to do the benchmarks, and also exactly which LS version you used? master?

@purbon
Contributor

purbon commented Apr 7, 2015

Yes, sure, updating my comment so it stays all together.

@purbon
Contributor

purbon commented Apr 7, 2015

Updating the numbers as requested by @colinsurprenant, so the reported number is the one given back by pv, not the timing as it is now.

@jsvd
Member

jsvd commented Apr 10, 2015

test: bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }'

Vanilla JVM flags on master:

joaoduarte@Joaos-MacBook-Pro-2 ~/projects/logstash (git)-[master] % bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }'
--1.9 -J-Xmx500m -J-XX:+UseParNewGC -J-XX:+UseConcMarkSweepGC -J-Djava.awt.headless=true -J-XX:CMSInitiatingOccupancyFraction=75 -J-XX:+UseCMSInitiatingOccupancyOnly
[ 142kiB/s] [ 142kiB/s]
bin/logstash -e   153.73s user 55.72s system 272% cpu 1:16.82 total

Disabling CMS:

joaoduarte@Joaos-MacBook-Pro-2 ~/projects/logstash (git)-[master] % bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }'
--1.9 -J-Xmx500m -J-Djava.awt.headless=true
[ 151kiB/s] [ 151kiB/s]
bin/logstash -e   144.49s user 53.03s system 270% cpu 1:12.99 total

Disabling CMS + Increasing New Gen size:

joaoduarte@Joaos-MacBook-Pro-2 ~/projects/logstash (git)-[master] % bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }'
--1.9 -J-Xmx500m -J-XX:NewRatio=3 -J-Djava.awt.headless=true
[ 155kiB/s] [ 155kiB/s]
bin/logstash -e   143.83s user 51.67s system 272% cpu 1:11.72 total

Vanilla flags again after the previous tests:

joaoduarte@Joaos-MacBook-Pro-2 ~/projects/logstash (git)-[master] % bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }'
--1.9 -J-Xmx500m -J-XX:+UseParNewGC -J-XX:+UseConcMarkSweepGC -J-Djava.awt.headless=true -J-XX:CMSInitiatingOccupancyFraction=75 -J-XX:+UseCMSInitiatingOccupancyOnly
[ 138kiB/s] [ 138kiB/s]
bin/logstash -e   154.14s user 54.31s system 262% cpu 1:19.29 total

So --1.9 -J-Xmx500m -J-XX:NewRatio=3 -J-Djava.awt.headless=true wins!

@colinsurprenant
Contributor Author

@jsvd note that a higher value of -XX:NewRatio results in a larger old generation size. The JVM default in server mode is 2, and the largest young generation possible when using -XX:NewRatio is obtained by setting -XX:NewRatio=1.
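
As rough arithmetic (my own illustration, ignoring survivor-space and permgen details): NewRatio is the old/young ratio, so young is roughly heap / (NewRatio + 1).

# with a 500m heap:
#   -XX:NewRatio=3  ->  young ~ 125m, old ~ 375m
#   -XX:NewRatio=2  ->  young ~ 166m, old ~ 333m
#   -XX:NewRatio=1  ->  young ~ 250m, old ~ 250m  (largest young gen reachable via NewRatio)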

Since the only significant difference we can measure comes from removing the CMS, I would go ahead and remove that from the defaults but keep the rest as is. In other words, I would remove these options:

JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

@colinsurprenant
Contributor Author

relates to #2942

@colinsurprenant
Contributor Author

We decided not to make this a blocker for RC3; it will be pushed back to the next release (either 1.5.1 or 1.6.0). OTOH we will merge #2942 so that we have a way to completely override JAVA_OPTS and thus remove any CMS options if we wish to.

@colinsurprenant colinsurprenant modified the milestones: v1.5.1, v1.5.0 Apr 16, 2015
@felipegs

If I use the elasticsearch output with the protocol option set to node, does this have an impact?
Because in that case it starts an elasticsearch node inside the same JVM process, but it only acts as a client node of the elasticsearch cluster.

@colinsurprenant
Contributor Author

@felipegs no, it should not. The CMS GC option will be useful when memory pressure increases, and that typically happens on ES data nodes.

@tbragin tbragin added the v1.5.1 label Jun 10, 2015
@tbragin tbragin removed this from the v1.5.1 milestone Jun 10, 2015
@suyograo suyograo added v1.6.0 and removed v1.5.1 labels Jun 17, 2015
@suyograo suyograo removed the v1.6.0 label Sep 1, 2015
@ppf2
Member

ppf2 commented Oct 13, 2015

+1 on adding the heap-dump-on-OOM flag like Elasticsearch, e.g.

# Causes the JVM to dump its heap on OutOfMemory.
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# The path to the heap dump location; note the directory must exist and have enough
# space for a full heap dump.
#JAVA_OPTS="$JAVA_OPTS -XX:HeapDumpPath=$ES_HOME/logs/heapdump.hprof"

@splitice

By setting -Xmx450m -XX:NewSize=375m (the same ratio as @colinsurprenant) we saw an increase of 15-20% on a single-core 512MB RAM logstash worker. Tested on 4 such nodes, all with the same results.

This is under real-world conditions and a pretty large & complex production configuration.

@jordansissel
Contributor

I tried to search the git log for the rationale for using these settings but did not find anything.

If I recall correctly, I made this change because it was already what Elasticsearch was using.

@colinsurprenant
Contributor Author

Bump. We should reassess this WRT the ng-pipeline and logstash-core-event-java, but I believe there is no benefit to using CMS - the Elasticsearch heap usage pattern is nowhere near similar to logstash's, and so far all tests seem to indicate a non-negligible performance increase.

@untergeek
Member

+1

@suyograo suyograo removed the v2.3.0 label Feb 28, 2016
@geoffroya

Hi all

I have just removed CMS, leaving the default GC behaviour plus -XX:NewSize, and logstash stopped garbage collecting like a fool.

I can't imagine an LS v5 release without GC settings that are (easily) settable, or at least documented.

For those interested in changing GC just through config, use /etc/default/logstash:

...
# Arguments to pass to java
LS_HEAP_SIZE="400m"
#LS_JAVA_OPTS="-Djava.io.tmpdir=$HOME"

JAVA_OPTS="-Djava.awt.headless=true"
JAVA_OPTS="$JAVA_OPTS -XX:NewSize=200m"
export JAVA_OPTS
# Need this because /opt/logstash/bin/logstash.lib.sh ruby_exec() expects this var => should be configurable IMHO
export HEAP_DUMP_PATH="-XX:HeapDumpPath=/tmp/heapdump.hprof"
...

@jordansissel
Contributor

Based on the evidence so far, I am +1 on changing the default GC parameters for Logstash 5.0.0 to whatever we find is best.

@marek-obuchowicz

+1... CMS is supposed to offer the shortest possible stop-the-world phases, which are important for real-time or event-based applications (i.e. getting a response to an ES request as soon as possible). Logstash is about throughput, and having longer individual GC pauses with a smaller total GC overhead is definitely welcome. Possibly even earlier than 5.0 - it's a small change in the init scripts, does not depend on new code, and could be published soon.
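
To illustrate how small that init-script change would be (a sketch only, using the -XX:+UseParallelGC throughput collector already tried in the benchmarks above; not a committed change):

JAVA_OPTS="$JAVA_OPTS -Xmx${LS_HEAP_SIZE}"
JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelGC"
JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true"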

@andrewvc
Contributor

I realize it's late, but just to add something here: we haven't supported embedded ES for quite some time now. We pulled it out somewhere in 2.x.

@untergeek
Member

The changes in #5341 should make it easy to both change this default, and also for users to change it to something they like.

@jsvd
Member

jsvd commented May 23, 2016

Have we run tests where a full GC is actually triggered during ingestion? It seems very reasonable that CMS is slower because it avoids full GCs, but I'd like to see how long a full GC takes on a busy logstash instance, with the added caveat that such stops mean packet loss for plugins like the UDP input.
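
One way to check this (a sketch, relying on the LS_USE_GC_LOGGING hook from logstash.lib.sh quoted earlier; where the GC lines end up depends on how that hook is wired, so treat the file name as an example):

LS_USE_GC_LOGGING=1 bin/logstash -e 'input { generator { count => 10000000 } } output { stdout { codec => dots } }' 2>&1 | tee run.log
# afterwards, look for full collections and their pause times
grep "Full GC" run.log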

@marek-obuchowicz
Copy link

Inputs without an ack mechanism (like udp) will surely be negatively affected.

@deepakm14

Any suggestions on how I can fix this issue?

[GC (Metadata GC Threshold) 8530310K->2065630K(31574016K), 0.3831399 secs]
[Full GC (Metadata GC Threshold) 2065630K->2053217K(31574016K), 3.5927870 secs]
[GC (Metadata GC Threshold) 8061486K->2076192K(31574016K), 0.0096316 secs]
[Full GC (Metadata GC Threshold) 2076192K->2055722K(31574016K), 0.9376524 secs]
[GC (Metadata GC Threshold) 8765230K->2100440K(31574016K), 0.0150190 secs]
[Full GC (Metadata GC Threshold) 2100440K->2077052K(31574016K), 4.1662779 secs]

@remtcsdev

It seems UseConcMarkSweepGC is due to be removed shortly, and it currently produces this warning on Logstash startup on Java 11.

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
