jdk11: jtreg java/util/concurrent/ArrayBlockingQueue/WhiteBox.java crashes on MacOS #5988
Comments
@andrew-m-leonard I see a couple of different crashes in the Jenkins output. Are you able to capture the javacores / system cores / etc from the crashes to aid the investigations? |
13:26:32 [2019-06-03 08:26:32,170] Agent[1]: stderr: 12:26:32.170 0x8913300 omrport.359 * ** ASSERTION FAILED ** at ../../omr/port/common/omrmemtag.c:145: ((memoryCorruptionDetected)) |
@DanHeidinga I'm not aware of a way of getting hyc-runtimes dumps? @AdamBrousseau does the workspace get cleared? |
This might be the same issue reported at #5399, particularly #5399 (comment). |
Dumps are in the archived openjdk_test_output.tar.gz file uploaded to artifactory. |
Similar issue on Assertion failure:
|
In the cores from the And another with a crash in
|
From the javacore:
Registers
so rdx is the bad value. Likely the Need to get this into ddr and see if the avl tree is corrupt |
While investigating a crash in eclipse-openj9/openj9#5988, I had to look at the AVL code and there were cases where it wasn't clear all the variables had been initialized before they were read. This patch makes initialization explicit and ensures variables are initialized at their declaration site. This patch is code cleanup to make it easier to reason about the AVL code. Signed-off-by: Dan Heidinga <[email protected]>
I've opened a PR at OMR to make it easier to reason about the AVL code (eclipse-omr/omr#3944) and while it makes it easier to read the code, it won't fix this issue. |
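For readers unfamiliar with the cleanup described in that commit message, here is a minimal sketch of "initialize at the declaration site", using a plain binary-tree left rotation. This is not the OMR AVL code (which is not shown in this thread); `Node` and `rotate_left` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; not the OMR AVL node layout. */
typedef struct Node {
    struct Node *left;
    struct Node *right;
    int key;
} Node;

/* Left-rotate the subtree rooted at 'root' and return the new subtree root.
 * Every local is given a value where it is declared, so no code path can
 * read it before it has been set. */
Node *rotate_left(Node *root) {
    Node *pivot = (root != NULL) ? root->right : NULL;
    if (pivot == NULL) {
        return root; /* nothing to rotate */
    }
    root->right = pivot->left;
    pivot->left = root;
    return pivot;
}
```

Declaring `pivot` without a value and assigning it later inside a branch would behave the same on the happy path, but it makes it harder to prove by inspection that the variable is never read uninitialized.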
It doesn't look like we'll have a solution ready for this in the 0.15 release. |
@theresa-m Can you update this with how the investigation is going? |
The investigation is ongoing. Still don't have a fix in mind. |
Hey @dmitripivkine, I ran into this while going through one of the possible traces for this issue. This slot is being treated as a J9Object in the code but clearly contains UTF8 string data. Is that normal behavior?
|
If you check caller of
I guess there is corruption in the hash table. Please let me know if you need help; I can take a look at the system core. |
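A quick way to spot a slot that holds string data rather than an object pointer is to check whether all of its bytes are printable ASCII. The sketch below is only a heuristic under that assumption; `printable_bytes` and `looks_like_text` are hypothetical helpers, not part of OpenJ9 or DDR.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many of the 8 bytes in a 64-bit slot are printable ASCII
 * (0x20 through 0x7e). The count does not depend on byte order. */
int printable_bytes(uint64_t slot) {
    int count = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(slot >> (8 * i));
        if (b >= 0x20 && b <= 0x7e) {
            count++;
        }
    }
    return count;
}

/* Heuristic: a slot made up entirely of printable bytes is much more
 * likely to be a fragment of text than a valid heap address. */
int looks_like_text(uint64_t slot) {
    return printable_bytes(slot) == 8;
}
```

A user-space pointer on a 64-bit platform normally has zero bytes at the top, so it fails this check, while eight characters of a class or method name pass it.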
@theresa-m kindly generated a few system cores using the latest JVM, so DDR for Mac now works. |
@theresa-m had mentioned that backing out this PR prevents the failures: #5301, but neither of us has found the smoking gun for why it triggers the problem.
The problematic segment, which was written past its bounds, is memory allocated for the ROM class for |
Here is an example of debugging one of the cores (ask me if any details are missing):
Since this is a crash, we should see a crash record:
From here we can see not only the registers but also the crashing point in the code.
(One of the reasons I selected this core for debugging is that all Java thread stacks are walkable at a safe point. Another reason is that I am a GC team member, so I am more familiar with GC structures.) The GC is executing the Clearable phase (removing references to dead objects from JVM structures). The crash occurred during an attempt to iterate through the String Table, so it looks like something is wrong there.
As you can see, both attempts failed, and we can reasonably guess the reason is the same as for the crash: something in the String Table infrastructure is broken.
We are on the right track; you can find the address of the String Table in R12 in the crash info.
We need to inspect the correctness of each of these hash tables, but from our shortcut we know that the problematic hash table is
A hash table can be converted to an AVL tree, but not in this case (
The easiest way to inspect a J9Pool is just to walk it:
Oops... looking at the pool puddle list:
Trying to get the first puddle:
This does not look correct. Looking at raw memory:
This is not a valid pool puddle. And obviously the original crash occurred here: we can see the value we crashed at There are two possibilities at this point:
There is strong evidence that we have the second case. OpenJ9 uses header and footer tags for allocated memory. The header has eye-catcher
This tells us that the footer eye-catcher is expected to be at
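To make the header/footer idea concrete, here is a minimal sketch of a tagged allocator and the check that locates the footer from the size stored in the header. Everything in it is an assumption for illustration: the `MemTag` layout, the eye-catcher values, and the function names are hypothetical and are not taken from `omrmemtag.c`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative eye-catcher values; not necessarily those OpenJ9/OMR uses. */
#define HDR_EYECATCHER 0xB1234567u
#define FTR_EYECATCHER 0xB7654321u

/* Hypothetical 16-byte tag placed before and after the user data. */
typedef struct MemTag {
    uint32_t eyeCatcher; /* marks the tag as a header or a footer */
    uint32_t pad;
    uint64_t allocSize;  /* size the caller asked for, in bytes */
} MemTag;

static size_t round_up8(size_t n) { return (n + 7u) & ~(size_t)7u; }

/* Allocate 'size' bytes surrounded by a header tag and a footer tag. */
void *tagged_alloc(size_t size) {
    size_t padded = round_up8(size);
    uint8_t *raw = malloc(sizeof(MemTag) + padded + sizeof(MemTag));
    if (raw == NULL) {
        return NULL;
    }
    MemTag hdr = { HDR_EYECATCHER, 0, size };
    MemTag ftr = { FTR_EYECATCHER, 0, size };
    memcpy(raw, &hdr, sizeof hdr);
    memcpy(raw + sizeof(MemTag) + padded, &ftr, sizeof ftr);
    return raw + sizeof(MemTag);
}

/* Where the footer is expected, derived from the size in the header. */
const uint8_t *footer_address(const void *user) {
    const uint8_t *p = user;
    MemTag hdr;
    memcpy(&hdr, p - sizeof(MemTag), sizeof hdr);
    return p + round_up8((size_t)hdr.allocSize);
}

/* Returns 0 if both tags are intact, 1 if the header eye-catcher is
 * damaged, 2 if the footer eye-catcher is damaged. */
int check_tags(const void *user) {
    const uint8_t *p = user;
    MemTag tag;
    memcpy(&tag, p - sizeof(MemTag), sizeof tag);
    if (tag.eyeCatcher != HDR_EYECATCHER) {
        return 1;
    }
    memcpy(&tag, footer_address(user), sizeof tag);
    if (tag.eyeCatcher != FTR_EYECATCHER) {
        return 2;
    }
    return 0;
}

void tagged_free(void *user) {
    assert(user != NULL);
    free((uint8_t *)user - sizeof(MemTag));
}
```

In this model, a write that runs past the end of the user area destroys the footer eye-catcher first, so an intact header paired with a damaged footer points at an overrun by the owner of that block, which matches the reasoning in this comment.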
The overwriting data looks "contiguous", so it is a reasonable guess that a chunk of memory allocated at a lower address was written past its bounds. Let's try to figure out what it is. We need to find the header tag for this allocation. There it is:
Is there anything around that keeps a reference to this memory?
The
and the problematic pointer is a pointer to a ROM class. So the question is what is special about |
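The two searches used in this walkthrough (scanning toward lower addresses for the nearest header eye-catcher, then looking for a slot that still points at the block) can be sketched over a plain byte buffer. This is a simplified model, not how DDR implements its searches; the eye-catcher value, the 8-byte alignment, and both function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative value; not necessarily the one OpenJ9/OMR uses. */
#define HDR_EYECATCHER 0xB1234567u

/* Scan from offset 'from' toward offset 0 in 8-byte steps and return the
 * offset of the nearest 32-bit header eye-catcher, or -1 if none is found.
 * Assumes tags are 8-byte aligned within 'mem'. */
long find_header_below(const uint8_t *mem, size_t len, size_t from) {
    if (len < sizeof(uint32_t)) {
        return -1;
    }
    if (from > len - sizeof(uint32_t)) {
        from = len - sizeof(uint32_t);
    }
    size_t off = from & ~(size_t)7u;
    for (;;) {
        uint32_t word;
        memcpy(&word, mem + off, sizeof word);
        if (word == HDR_EYECATCHER) {
            return (long)off;
        }
        if (off == 0) {
            return -1;
        }
        off -= 8;
    }
}

/* Return the offset of the first 8-byte aligned slot at or after 'start'
 * that holds 'value', or -1 if no slot does. */
long find_reference(const uint8_t *mem, size_t len, size_t start,
                    uint64_t value) {
    size_t off = (start + 7u) & ~(size_t)7u;
    for (; off + sizeof value <= len; off += 8) {
        uint64_t slot;
        memcpy(&slot, mem + off, sizeof slot);
        if (slot == value) {
            return (long)off;
        }
    }
    return -1;
}
```

The first search identifies which allocation sits directly below the damaged area; the second finds who owns it, which is how the trail leads from raw memory back to a named VM structure.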
TODO: determine if this issue has previously shipped or not |
Given this was discovered Jun 3, the problem is in the 0.15 release which occurred in July. Moving to the next milestone. |
This should now be fixed, the test can be unexcluded. |
I've opened this PR to reinclude the test: adoptium/aqa-tests#1350 |
Added a note in PR 1350: should it also be reincluded for jdk13, as it's excluded in that version too? Will merge the PR once that is updated or clarified. |
Failure link
https://hyc-runtimes-jenkins.swg-devops.com/view/Test_grinder/job/Grinder_Advanced/10/console
If the link is not public, instead include
12:52:16 openjdk version "11.0.3" 2019-04-16
12:52:16 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
12:52:16 Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.14.3, JRE 11 Mac OS X amd64-64-Bit Compressed References 20190531_242 (JIT enabled, AOT enabled)
12:52:16 OpenJ9 - b8ab016
12:52:16 OMR - b56045d2
12:52:16 JCL - dcdc97f9dc based on jdk-11.0.3+7)
Optional info
Failure output