Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JVMVRFY038 invokespecial on invalid target; class=die/verwandlung/GregorSamsa, #7684

Closed
planetf1 opened this issue Nov 4, 2019 · 155 comments
Closed

Comments

@planetf1
Copy link

planetf1 commented Nov 4, 2019

Java -version output

openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.5+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.17.0, JRE 11 Mac OS X amd64-64-Bit Compressed References 20191016_371 (JIT enabled, AOT enabled)
OpenJ9 - 77c1cf7
OMR - 20db4fb
JCL - 2a7af5674b based on jdk-11.0.5+10)

Summary of problem

When building ODPi egeria (https://github.com/odpi/egeria) master, the build fails on a maven step using the Apache RAT plugin 0.13 with

An API incompatibility was encountered while executing org.apache.rat:apache-rat-plugin:0.13:check: java.lang.VerifyError: JVMVRFY038 invokespecial on invalid target; class=die/verwandlung/GregorSamsa, method=template$dot$0(Lcom/sun/org/apache/xalan/internal/xsltc/DOM;Lcom/sun/org/apache/xml/internal/dtm/DTMAxisIterator;Lcom/sun/org/apache/xml/internal/serializer/SerializationHandler;I)V, pc=698
[ERROR] Exception Details:
[ERROR] Location:
[ERROR] die/verwandlung/GregorSamsa.template$dot$0(Lcom/sun/org/apache/xalan/internal/xsltc/DOM;Lcom/sun/org/apache/xml/internal/dtm/DTMAxisIterator;Lcom/sun/org/apache/xml/internal/serializer/SerializationHandler;I)V @698: JBinvokespecial
[ERROR] Reason:
[ERROR] The method invoked via invokespecial is invalid.

This was when attempting to reproduce #7539 (my environment doesn't yet have the fix), but I had also cleaned up my build environment by removing all of my ~/.m2 - as well as pulled current updates from master. I had previously only seen this failure with a nightly build

Diagnostic files

Full log from the exceptions is posted to https://gist.github.com/f640872cc6608793e8988f54c68363d7

@DanHeidinga
Copy link
Member

@planetf1 Do you have the .class file that causes the verifyerror? We need to see the bytecodes to see why the check in the verifier is failing.

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

The maven dependency I'm using is from https://repo1.maven.org/maven2/org/apache/rat/apache-rat-plugin/0.13/ ie with the jar at https://repo1.maven.org/maven2/org/apache/rat/apache-rat-plugin/0.13/apache-rat-plugin-0.13.jar

but I'm unclear which class specifically you need ?

@DanHeidinga
Copy link
Member

The VerifyError is flagging a problem in:

die/verwandlung/GregorSamsa.template$dot$0(Lcom/sun/org/apache/xalan/internal/xsltc/DOM;Lcom/sun/org/apache/xml/internal/dtm/DTMAxisIterator;Lcom/sun/org/apache/xml/internal/serializer/SerializationHandler;I)V @698: JBinvokespecial

So I need the die/verwandlung/GregorSamsa class so I can look at the bytecodes for the template$dot$0(Lcom/sun/org/apache/xalan/internal/xsltc/DOM;Lcom/sun/org/apache/xml/internal/dtm/DTMAxisIterator;Lcom/sun/org/apache/xml/internal/serializer/SerializationHandler;I)V method

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

I found a few hits like https://stackoverflow.com/questions/49758534/java-lang-verifyerror-class-gregorsamsa-method-illegal-target-of-jump - or more helpfully perhaps http://www.jguru.com/faq/view.jsp?EID=1340163 ie this is in the XLST processor which I assume is bundled with the jvm?

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

This is an AdoptOpenJDK build

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

also some explanations at #6642

@DanHeidinga
Copy link
Member

I found a few hits like https://stackoverflow.com/questions/49758534/java-lang-verifyerror-class-gregorsamsa-method-illegal-target-of-jump

That link shows an issue with branch/jump targets and should be fixed in JDK10. According to the java -version above, you're using JDK11 so it may be related to the XLST processor but shouldn't be the same issue.

We need the bytecode to figure out what's going on

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

Can you recommend how to get the byte code? Any ideas? This class is being created dynamically by the XSLT processing as far as I can see (an area I'm unfamiliar with)

@DanHeidinga
Copy link
Member

I don't know enough about the XSLT processing to know how to dump the generated files. @hzongaro Are you aware of a -D define or other option to have the XSLT transformer dump the generated classes to disk?

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

Another observation. The exact stage of the build the failure happens on varies. Sometimes it may get all the way through (supposition - since I originally only noticed this on a nightly). For example I ran now and it got 50% further. Still failing with the same report, managing RAT.

Our 'master' code is changing - a few updates constantly through the day, but it may also point to odd corruption perhaps?

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

Only seen on MacOS so far. trying same code/version on linux, but not seen yet.

@hzongaro
Copy link
Member

hzongaro commented Nov 4, 2019

Dan @DanHeidinga, I'm not aware of a -D option or other JVM option that can be specified to have the XSLT processor dump the generated classes. The only way I'm aware of doing it is programmatically.

There is a TransformerFactory.setAttribute method that can be used to specify settings on the XSLT processor. In particular, with XSLTC, there are attributes that can be used to indicate that generated classes should be written to a particular directory or a jar file - in particular generate-translet, destination-directory, jar-name.

The difficulty of course is that @planetf1 might not have access to the source that's processing the XSLT stylesheet in order to set those attributes on the TransformerFactory instance. If they do not, they might have to do something like override the default TransformerFactory implementation by specifying -Djavax.xml.transform.TransformerFactory=<TransformerFactoryImplemetationClass>. This implementation could delegate to com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl, but would also call setAttribute on the underlying implementation to request that the classes by written out, using some sort of sequence number on the destination directories and/or jar names.

@planetf1
Copy link
Author

planetf1 commented Nov 4, 2019

Thanks for the update, but I think that will be rather tricky. Technically all of this will be open source but it involves the implementation of the rat plugin - not familiar, time...

The unfortunate issue is I was hoping to ensure we can run with J9, but I have hit a variety of issues - being worked through I now, but for now I think I'll have to recommend sticking with hotspot.

@pshipton
Copy link
Member

pshipton commented Nov 4, 2019

Can we dump a core file when the VerifyError occurs and get the bytecode that way?

"-Xdump:system:events=systhrow,filter=java/lang/VerifyError,request=exclusive+prepwalk"

@planetf1
Copy link
Author

planetf1 commented Nov 5, 2019

Typically I cannot get a failure this morning. I've set MAVEN_OPTS with the above, so if/when I get a failure hopefully I'll find a core (also added a ulimit -c unlimited in my build invocation).

@planetf1
Copy link
Author

planetf1 commented Nov 5, 2019

After around 8 attempts I had another failure
with

MAVEN_OPTS=-Xdump:system:events=systhrow,filter=java/lang/VerifyError,request=exclusive+prepwalk
10:16:06,717 [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.13:check (rat-check) on project it-infrastructure-spring: Could not load the translet class 'die.verwandlung.GregorSamsa'.: JVMCFRE068 class name is invalid; class=, offset=0 -> [Help 1]

Also ulimit -a shows

-c: core file size (blocks)         unlimited

However I can't see any core file in the current directory from where the sw was launched, nor the relevant maven directory nor indeed anywhere down the tree

@DanHeidinga
Copy link
Member

Can you set the OPENJ9_JAVA_OPTIONS environment variable to "-Xdump:system:events=systhrow,filter=java/lang/VerifyError,request=exclusive+prepwalk"? I'm not sure how MAVEN_OPTS gets passed through

@planetf1
Copy link
Author

planetf1 commented Nov 5, 2019

I will try that too, though I've used it for other JVM options and it appears to pass through fine (or through JAVA_TOOL_OPTIONS). But will set OPENJ9_JAVA_OPTIONS

@planetf1
Copy link
Author

planetf1 commented Nov 6, 2019

Needed to do a build today and hit the issue again - those options are set but I still see no evidence of a core?

15:23:36,466 [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.13:check (rat-check) on project governance-daemon-connectors: Could not load the translet class 'die.verwandlung.GregorSamsa'.: JVMCFRE068 class name is invalid; class=, offset=0 -> [Help 1]
15:23:36,467 [ERROR]
15:23:36,467 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
15:23:36,467 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
15:23:36,467 [ERROR]
15:23:36,467 [ERROR] For more information about the errors and possible solutions, please read the following articles:
15:23:36,467 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
15:23:36,467 [ERROR]
15:23:36,467 [ERROR] After correcting the problems, you can resume the build with the command
15:23:36,468 [ERROR]   mvn <args> -rf :governance-daemon-connectors
➜  egeria git:(master) find . -name core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/bower/lib/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/sinon/lib/sinon/util/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/core-js/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/core-js/library/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/polymer-analyzer/lib/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/@babel/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/web-component-tester/node_modules/sinon/lib/sinon/util/core
./open-metadata-implementation/user-interfaces/access-services-user-interface/tools/node_modules/polymer-cli/node_modules/rx/ts/core
➜  egeria git:(master) env | grep J9
OPENJ9_JAVA_OPTIONS=-Xdump:system:events=systhrow,filter=java/lang/VerifyError,request=exclusive+prepwalk
➜  egeria git:(master) find . -name '*javacore*'
➜  egeria git:(master)

@planetf1
Copy link
Author

planetf1 commented Nov 6, 2019

➜  egeria git:(master) ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         unlimited
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       2784
-n: file descriptors                256

@planetf1
Copy link
Author

planetf1 commented Nov 6, 2019

New information - I just hit the same issue on linux (Ubuntu 19.10) - though I did not have the same environment set there to capture debug:

Again

15:30:35,183 [ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.13:check (rat-check) on project platform-services: Could not load the translet class 'die.verwandlung.GregorSamsa'.: JVMCFRE068 class name is invalid; class=, offset=0 -> [Help 1]
➜  egeria git:(master) java -version
openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.5+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.17.0, JRE 11 Linux amd64-64-Bit Compressed References 20191016_358 (JIT enabled, AOT enabled)
OpenJ9   - 77c1cf708
OMR      - 20db4fbc
JCL      - 2a7af5674b based on jdk-11.0.5+10)

@planetf1
Copy link
Author

planetf1 commented Nov 6, 2019

No sign of cores on linux either

@pshipton
Copy link
Member

pshipton commented Nov 6, 2019

Try just -Xdump:system:events=systhrow,request=exclusive+prepwalk without the filter. I don't think any other exceptions would be thrown from the VM normally, and the class name is invalid may be a ClassFormatError instead of a VerifyError.

@planetf1
Copy link
Author

planetf1 commented Nov 6, 2019

Tried that, and this time my maven build failed in 27s which is pretty impressive. Same error. No sign of a core. This was on ubuntu. Clearly the RAT plugin is the cause, yet I've never seen this with hotspot, nor with earlier J9 builds which I've been using intermittently since macOS came out from adoptopenjdk. The version of the rat plugin is current, and has been around for about a year.

@DanHeidinga
Copy link
Member

JVMCFRE068 class name is invalid;

This is a ClassFormatError. Can you try with "-Xdump:system:events=systhrow,filter=java/lang/ClassFormatError,request=exclusive+prepwalk" in the OPENJ9_JAVA_OPTIONS?

We may need to hack a VM that dumps the classfile on a verifyerror/classformaterror....

@pshipton
Copy link
Member

pshipton commented Nov 6, 2019

I expect that if -Xdump:system:events=systhrow,request=exclusive+prepwalk didn't work, then the version with the filter won't work either. The one without the filter should dump a core for any systhrow.

Another possibility is the core is going to an unexpected place. If the -Xdump command triggers, there are messages printed to stderr like the following.

JVMDUMP039I Processing dump event "throw", detail "java/lang/ClassNotFoundException" at 2019/11/06 16:46:23 - please wait.
JVMDUMP032I JVM requested System dump using 'peter/openj9/jdk8u212-b04/core.20191106.164623.20183.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.20200.

JVMDUMP010I System dump written to peter/openj9/jdk8u212-b04/core.20191106.164623.20183.0001.dmp
JVMDUMP013I Processed dump event "throw", detail "java/lang/ClassNotFoundException".

@planetf1
Copy link
Author

planetf1 commented Nov 6, 2019

I don't see any messages like this, but of course I am running a maven build composed of 200+ modules, and within that the rat plugin is being invoked >200 times. The failure seems to happen randomly. I have no idea if anything about that environment is masking properties sent to the jvm. So there's a fair gap between the user perceived problem and the nitty gritty in the jvm. I can try to repeatedly call rat directly, but unsure I'll be able to repro.

I could try to go back to 11.0.4 and ensure the issue doesn't happen there, again. Would that help given any thoughts on changes going to 11.0.5 (or the packaging from adoptopenjdk)

@planetf1 planetf1 closed this as completed Nov 6, 2019
@fjeremic
Copy link
Contributor

I've been speaking with @cathyzhyi offline about this. We have limited the issue down to method handle inlining in the inliner (optimization id=14). Yi is the subject matter expert in this area and will continue the investigation where I left off. Hopefully we can get to the bottom of this one soon as we are inching closer.

@DanHeidinga
Copy link
Member

@cathyzhyi if you comment in this issue, I can assign it to you

@cathyzhyi
Copy link
Contributor

cathyzhyi commented Mar 25, 2020

@cathyzhyi if you comment in this issue, I can assign it to you

@DanHeidinga please assign to me thanks!

@pshipton
Copy link
Member

We're past Milestone 2 for the 0.20.0 release, moving this forward to the next release.
If a low risk fix does become available in the next couple of weeks, please raise it for consideration.

@loriadi
Copy link

loriadi commented Mar 30, 2020

Any new news, Cathy?

@cathyzhyi
Copy link
Contributor

cathyzhyi commented Mar 31, 2020

So the problem is very likely to be related to incorrect inlining and around 500 hundred methods are inlined. I've managed to find the inlining branch that's responsible for the messed up constant pool entry which still involves 100 methods, unfortunately. Will keep digging through the coredumps to narrow down to a smaller subset of problematic inlined methods.

@loriadi
Copy link

loriadi commented Apr 1, 2020

Thanks for the update!

@cathyzhyi
Copy link
Contributor

cathyzhyi commented Apr 2, 2020

The problem happens because runMacro can expand the following code in java/lang/invoke/FilterArgumentsHandle.invokeExact_thunkArchetype_X into multiple invokeExact calls https://github.com/eclipse/openj9/blob/06fa4ce85ed4632d1723a1ad346b8e00e9269b60/jcl/src/java.base/share/classes/java/lang/invoke/FilterArgumentsHandle.java#L86-L89. However, our inliner currently can't handle the case where multiple callsites sharing the same bytecode info.

In this case, the 3 arguments filter argument handle wants to filter are the class_name method_name and signature string objects used to create the interfaceMethodRef CP entry. The prex argue info is created for the class_name callsites but mixed up with the callNode from signature because the bytecode info are the same, resulting in signature corrupting temps slots of class_name which explains the string appearing in wrong constant pool entry.

Will try with a tactical fix first. To fix inliner properly to be able to handle different calls with the same bytecode info needs more consideration.

@loriadi
Copy link

loriadi commented Apr 2, 2020

So excited that you figured this out, Cathy!!

@pshipton
Copy link
Member

pshipton commented Apr 2, 2020

I'm going to tentatively move this back to the 0.20.0 milestone, hoping that a fix will be available and can go into the release.

@vij-singh
Copy link

@cathyzhyi Any update on the tactical fix? Many thanks!

@cathyzhyi
Copy link
Contributor

cathyzhyi commented Apr 7, 2020

@cathyzhyi Any update on the tactical fix? Many thanks!

@vij-singh After more investigation, I find that this problem should only happen when using optiondisableMethodHandleInlineWithoutPeeking to disable without peeking. The without Peeking version was delivered early Jan so by default builds after Jan shouldn't have this issue. However, given disableMethodHandleInlineWithoutPeeking is supposed to be a workaround when the without peeking doesn't work we should still fix it. Fixes are still under testing: #9158 and eclipse-omr/omr#5035

@pshipton
Copy link
Member

pshipton commented Apr 8, 2020

It's too late to put any fixes into the 0.20.0 release, moving forward.

@andrewcraik
Copy link
Contributor

fixes should be merged

@pshipton
Copy link
Member

@andrewcraik @cathyzhyi if the fixes are merged, can this be closed? The OMR change has promoted.

@andrewcraik
Copy link
Contributor

Yes this should be fixed - closing - please reopen if further issues are seen once the update has been picked up.

@vij-singh
Copy link

Great news - thanks folks!
@loriadi Lori - just making sure you saw this

@loriadi
Copy link

loriadi commented Apr 15, 2020

Thank you everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests