JAXP-intensive workload on OpenJ9 seems to be about 5% slower than HotSpot #6642
Comments
What is the setting for transparent huge pages on the machine in question? Testing with transparent huge pages set to always may help performance.
I'll try
@andrewcraik Re-ran test 9 with
FYI, memory in the container:
@kgibm that is still a 5% improvement if I am reading this correctly - e.g. comparing 12 to 1, which is the effect of the change?
Note that OpenJ9 plans to better support madvise once we have investigated #6156, which is currently targeted to OpenJ9 0.16.
@andrewcraik The last column is the relative difference. So OpenJ9 is 5% slower than HotSpot. I haven't been comparing tests to each other - I'm just comparing the J9 experimental change to HotSpot baseline. (Since the HotSpot baseline test is always the same, we could compare all the tests together, but I avoid that because I'd rather compare tests run at approximately the same time to each other to reduce any time-based variability). I treat each test independently: three runs for OpenJ9, three runs for HotSpot, take the average of each and then calculate the relative difference:
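In code form, that comparison amounts to something like the sketch below (the run times are made up and the spreadsheet's exact formula is not reproduced here; names are illustrative):

```java
import java.util.Arrays;

// Illustrative only: averages the three wall-clock times per JVM and reports the
// relative difference against the HotSpot baseline (positive = OpenJ9 slower).
public class RelativeDiff {
    static double mean(double[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] openj9Runs  = {52.1, 52.4, 51.9};   // hypothetical seconds
        double[] hotspotRuns = {49.6, 49.8, 49.5};   // hypothetical seconds
        double relDiff = (mean(openj9Runs) - mean(hotspotRuns)) / mean(hotspotRuns);
        System.out.printf("Relative difference: %.1f%%%n", relDiff * 100);
    }
}
```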
@kgibm right, what I am saying is that, taking the baseline configuration, the delta was 10% (e.g. configuration 1). Changing the transparent huge page setting to always narrowed that gap to 5% - e.g. OpenJ9 is still slower, but less so than in configuration 1. That performance improvement is expected to be delivered in OpenJ9 for 0.16. There is a 5% delta still to be investigated. I am interested to see that -Xverify:none had an impact - there was and is work being done to improve the performance of the verification logic; some of the gains from that will likely also be available in 0.16 - FYI @DanHeidinga since he runs the team of folks who are helping with that work.
FYI @mpirvu due to the AOT numbers above which seem 'interesting'
@andrewcraik Oh I see what you mean. This
BTW, test ideas 2-9 were just random ideas I grabbed from the air :) Just trying to be thorough before opening the issue.
@andrewcraik I ran test 1 with I also re-did the whole spreadsheet to make it clear which test each run is based on. I also added a t-test column (two-tailed, heteroscedastic) and most of them are
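For reference, a two-tailed heteroscedastic t-test is the Welch form; a minimal sketch of the statistic behind such a spreadsheet column is below (sample values are made up, and the two-tailed p-value would come from a t-distribution CDF, which spreadsheet functions compute directly):

```java
import java.util.Arrays;

// Sketch of the Welch (unequal-variance) t statistic behind a two-tailed,
// heteroscedastic t-test; the sample values are hypothetical.
public class WelchTTest {
    static double mean(double[] x) { return Arrays.stream(x).average().orElse(Double.NaN); }

    static double sampleVariance(double[] x) {
        double m = mean(x), ss = 0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    public static void main(String[] args) {
        double[] a = {52.1, 52.4, 51.9};  // e.g. OpenJ9 runs (made up)
        double[] b = {49.6, 49.8, 49.5};  // e.g. HotSpot runs (made up)
        double va = sampleVariance(a) / a.length, vb = sampleVariance(b) / b.length;
        double t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
        // Welch-Satterthwaite degrees of freedom; |t| and df then go into a
        // t-distribution CDF to obtain the two-tailed p-value.
        double df = Math.pow(va + vb, 2)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
        System.out.printf("t = %.3f, df = %.2f%n", t, df);
    }
}
```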
@kgibm thanks for the clarification - I am surprised about the overheads seemingly caused by -Xtrace:none -Xverify:none. Would it be possible to get a perf record of configs 1 and 9 during iteration 3 on your setup with -Xjit:perfTool specified? This will show how time is distributed between the various JVM components etc.
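For anyone following along, a typical way to collect such a profile looks roughly like the following; the -D JAXP properties from the issue's command line are omitted for brevity, the output file name is arbitrary, and -Xjit:perfTool is intended to let perf attribute samples to JIT-compiled methods:

```
# Illustrative only: heap options and workload size are taken from the issue's own
# command line; the perf flags and output file name here are assumptions.
perf record -g -o jaxp-config1.data \
    java -Xjit:perfTool -Xms1024m -Xmx1024m -jar jaxpperformance.jar 100000
perf report -i jaxp-config1.data
```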
FYI @vijaysun-omr due to general interest in perf deltas.
@andrewcraik I was worried somebody would ask for perf :) Unfortunately, Docker (at least on Mac) doesn't support perf. I tried running the container with
Note also that I'm traveling to another customer tomorrow for a week so I'll be mostly unavailable, but I've attached the standalone program in the original issue description if others want to try to reproduce locally.
Also, you already mentioned this, but just for others that might miss this, it would certainly be interesting to understand why
AOT is supposed to help only when many methods need to be compiled in a short period of time. This benchmark runs for 50+ seconds and I don't think that the JAXP benchmark has that many methods to compile.
I think understanding the -Xtrace:none and -Xverify:none issues will help inform where the rest of the delta is coming from.
@kgibm To continue the discussion of the OpenJ9 problem - I have access to a Linux perf setup so I'll see if I can get the test case running there. I'll see if I can reproduce the same kind of gaps as you see in your container and see if a profile shows anything useful. If you manage to get any profiling data or find out anything else do let me know, since there seem to be some strange things in play.
So I have the test running on my setup and I can see a delta between HotSpot and OpenJ9 even on bare metal (I removed the container support option etc so I'm just looking at the native machine performance to simplify things a bit). I can also see that -Xverify:none and -Xtrace:none are able to save about 2/3 of the gap I see. The profile does show hot and scorching compilations. The hottest method seems to be StringBuilder's ensureCapacityImpl so there is a lot of string building going on (makes sense based on the code). Going to dig a bit more at what is going on etc. Still some mysteries to solve.
Great to hear that it's reproducible, thanks Andrew!
Has it been 7 days already? We had a couple of days off here in Canada so I guess it has... Anyway I have still been looking at this to try and figure out what is going on. I've been doing some experiments with:
So some configurations and some notes - all runs are averages of 3 on a Skylake Linux perf box. I'm pinning to 4 cores.
The fastest configuration in my testing so far is a latest JVM with shareclasses enabled and
There remains about 5% to the HotSpot performance on my setup. I need to now build some test JITs to test the effect of enabling nextGenHCR during startup as well as the new JProfiling implementation. I did do a quick check of the end to end profile for the benchmark. There is a non-trivial amount of time in the bytecode interpreter loop (a bit under 1/3 of the profile). There is a reasonable amount of time consumed by JIT compilation and the bulk of the remainder is in JIT code. Top method looked to be StringBuilder.append (hence the constant arraylength optimizations above).
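For readers unfamiliar with the option, "shareclasses enabled" refers to OpenJ9's -Xshareclasses shared classes cache (which also lets AOT-compiled code be reused across runs); a generic invocation looks something like the sketch below - the cache name and size are arbitrary, not the configuration used above:

```
# Illustrative only: enables OpenJ9's shared classes cache (and with it reuse of
# AOT-compiled code across runs); the cache name and -Xscmx size are arbitrary.
java -Xshareclasses:name=jaxpbench -Xscmx80m -Xms1024m -Xmx1024m \
    -jar jaxpperformance.jar 100000
```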
@andrewcraik Can you also do a run with
So I have done a few more experiments. Enabling nextGenHCR during startup doesn't seem to make much of a difference to the performance. Using the new JProfiling implementation for high optimization compiles that @r30shah seems to cut the time to around 44.5 seconds which helps, but there is still more on the table. More investigation to come. @DanHeidinga Looks like there is a lot of class loading going on for part of the run - many many MANY instances of GregorSamsa being loaded - looks to be related to https://xml.apache.org/xalan-j/xsltc/xsltc_native_api.html. Something about stylesheet processing. Perhaps it is our class loading performance that is letting us down? I'll have to dig at that next week. HotSpot appears to load a similar number of dynamically generated classes.
FYI. The default class name XSLTC uses for the class it generates for an XSLT stylesheet is GregorSamsa. That was the case both with the XSLTC native API that Andrew mentions and also under the covers when using XSLTC as part of a JAXP implementation. If the customer's code is recompiling the XSLT stylesheet each time it's used, it will generate a new set of classes each time - those classes will be generated, loaded and the methods JIT compiled, as required, and old classes dropped on the floor, even if the XSLT stylesheet is not changing. At least, that's how things stood when I was part of the team working on Apache-Xalan (including XSLTC) and IBM's JAXP implementation.
Looking at the benchmark code, I see it is repeatedly calling
We used to recommend to customers that they call
Of course, if the real scenario that the benchmark is modelling involves a situation where XSLT stylesheets are really used just once -- perhaps because the stylesheets themselves are dynamically generated -- then calling
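To make the recommendation above concrete, a minimal sketch of the compile-once pattern using the standard JAXP API (class and variable names are illustrative, not taken from the benchmark):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Compile the stylesheet once (one set of generated GregorSamsa classes),
// then create a lightweight Transformer from the cached Templates per use.
public class CachedStylesheet {
    private final Templates templates;

    public CachedStylesheet(String xslt) throws Exception {
        this.templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    public String transform(String xml) throws Exception {
        StringWriter out = new StringWriter();
        // newTransformer() on a Templates object reuses the compiled stylesheet;
        // Transformer instances are not thread-safe, so create one per call.
        templates.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```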
So looking at the benchmark as attached previously the
@hzongaro thank you for the insights into the XML parser - neither @DanHeidinga nor I are experts on the inner details as you are.
So I can confirm that moving the lines:
out of the
@andrewcraik Do you have profiles of the application? I'd like to see both which Java calls are hot and which VM calls are hot to see if there's an obvious bottleneck in the class load path.
With the modified benchmark the total runtime has dropped below 12s - we are now dealing with quite a short-running benchmark if run in the same configuration, which puts the benchmark into a domain where startup performance is important. If I double the work to keep the execution time over 10s to minimize the start-up/ramp-up artifacts then I see the following:
Now if I drop the data set size back to the original 100000:
The difference in work makes class load / startup / rampup dominate. Adding -Xverify:none has no measurable impact at the original data set size. We will keep having a look at the class loading aspect, but @kgibm I'm wondering if the benchmark is representative of the actual workload? Is the original workload recreating templates / transformers all the time? Could a Java code optimization get them into a mode where the OpenJ9 performance is better (or is that the mode they are in - in which case the benchmark isn't simulating the bottleneck they are seeing)? @DanHeidinga I do have profiles - I'll pass them to you offline so you can study them to see what you find since they are a bit unwieldy to post here.
Part of the problem may just be that we have a lot of code running in the interpreter if each of the stylesheets is created and a lot of the code is only run once.
Interesting finds.
It's quite possible that this benchmark is not representative. I wrote this code from scratch based on HealthCenter profiling of the actual workload and then mimicking some of the XML document structure and XPaths used, but it's quite possible that the application/stack product that runs the actual workload does not call newTemplates/newTransformer; I will ask to run some trace on the actual workload (I'm presuming
@kgibm an
If the source is available, I'd start by grepping the code base for
Otherwise, I'd go with @andrewcraik's
Note, the method names are not complete. They need to be prepended with the correct Java package using
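As an illustration (not the exact trace specification proposed in this thread), an OpenJ9 method trace for the two calls of interest might look like the following, with the fully qualified XSLTC factory class taken from the -D options in the issue description and application.jar as a placeholder:

```
# Illustrative only: print method-trace points to the console for newTemplates and
# newTransformer on the XSLTC TransformerFactoryImpl; application.jar is a placeholder.
java "-Xtrace:print=mt,methods={com/sun/org/apache/xalan/internal/xsltc/trax/TransformerFactoryImpl.newTemplates*,com/sun/org/apache/xalan/internal/xsltc/trax/TransformerFactoryImpl.newTransformer*}" \
    -jar application.jar
```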
Thanks all, I'll ask for that tracing (first
For reference, I will propose:
Thanks for everyone's time on this, but we finally re-created the full customer environment internally, ran the above trace, and there is only a single call to
Java -version output
OpenJ9 -version:
HotSpot -version:
Summary of problem
A customer observes about a 5% performance difference between J9 and HotSpot (particularly for IBM Java, but the same is seen with OpenJ9). The workload is JAXP-heavy (especially XPath and XSLT) and we may have recreated the problem with the attached standalone Java microbenchmark, which shows AdoptOpenJDK+J9 is about 3.5% slower than AdoptOpenJDK+HotSpot even after trying various J9 tuning (the best seemingly being -Xtrace:none -Xverify:none). Verbose garbage collection shows the proportion of time in J9 is ~1.5% and in HotSpot is ~1.6%, so GC doesn't seem to be the issue.
Notes:
-Xms1024m -Xmx1024m.
Test results:
Longer-running tests (e.g. for JIT) based on test 9 were also run and didn't help and only became worse:
The sample program is jaxpperformance.jar in the attached zip (its source is in the JAXPPerformanceCode directory - ultimately just JAXPPerformance.java).
The simplest execution of the sample program is:
time java -Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl -Djavax.xml.xpath.XPathFactory=com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl -Djavax.xml.xpath.XPathFactory:http://java.sun.com/jaxp/xpath/dom=com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl -Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory -Djavax.xml.datatype.DatatypeFactory=com.sun.org.apache.xerces.internal.jaxp.datatype.DatatypeFactoryImpl -Xms1024m -Xmx1024m -jar jaxpperformance.jar 100000
(Along with -XX:+UseContainerSupport on J9 if running in Docker.)
Other notes:
tests.txt (in particular, including the timestamp when the test started if you want to correlate to nmon). Results are only captured for columns 3 and 4. The second and third iterations of each experiment were run after the first round of tests.
hcd is in the zip.
-Djavax.xml* options aren't strictly needed, but avoid any potential pointless factory lookups which we had seen during some benchmarks.
The Docker container sees 4 CPU core threads with these specs:
Diagnostic files
j9_hotspot_jaxp_diff.zip