JDK G1 bug crashes with references [in]to jdk.internal.vm.FillerArray, when upgrading to 8.13.0 or 8.13.1 #106987
Pinging @elastic/es-core-infra (Team:Core/Infra)
The following JVM options are set. The JVM is the bundled one; jvm.options has these:
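(An aside for anyone reproducing this: the flags a running node actually resolved can be dumped with jcmd; a minimal sketch, assuming jcmd from the same JDK is on the PATH and the process is attachable:)

```sh
jcmd | grep -i elasticsearch      # list JVM PIDs; find the Elasticsearch one
jcmd <pid> VM.flags               # flags in effect after jvm.options was applied
jcmd <pid> VM.command_line        # full command line, including -XX options
```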
Sometimes this brings down the cluster, sometimes the cluster appears to recover. It probably depends on exactly where this exception happens. Here are some snippets of stacktraces that we see:
In one particular case, I see hundreds of these, all appearing around the same time:
Linking the JDK issue: https://bugs.openjdk.org/browse/JDK-8329528
We are seeing the same behavior of crashes and restarts after upgrading from 8.9.0.
Are there any known workarounds that may reduce the impact?
Since this seems very likely to be a JDK issue: as a workaround, where possible (i.e. self-hosted clusters), would it make sense to use a local JDK 21 installation in place of the bundled JDK 22?
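(For a self-hosted archive install, the override is the ES_JAVA_HOME environment variable; a minimal sketch, with illustrative paths:)

```sh
# Run the node with a local JDK 21 instead of the bundled JDK 22.
export ES_JAVA_HOME=/opt/jdk-21.0.2   # illustrative path to a JDK 21 install
/usr/share/elasticsearch/bin/elasticsearch
```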
@ldematte Thank you for getting back. That is actually what I did: I built a custom docker image with JDK 21.0.3 (beta), which is the release that has a potential fix, https://bugs.openjdk.org/browse/JDK-8319548. Still observing the outcome.
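(Roughly what such a custom image can look like; tags and paths here are illustrative, not the exact build described above:)

```dockerfile
# Layer a JDK 21 into the official image and point Elasticsearch at it.
FROM docker.elastic.co/elasticsearch/elasticsearch:8.13.1

# Borrow a JDK 21 from the Temurin image rather than downloading an archive.
COPY --from=eclipse-temurin:21-jdk /opt/java/openjdk /opt/jdk21

# ES_JAVA_HOME overrides the bundled JDK.
ENV ES_JAVA_HOME=/opt/jdk21
```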
Ah, this is very interesting. To confirm: the issue is still happening even with JDK 21.0.3, correct? To be precise, since there are multiple JDK vendors, can you please post the output of
Additionally, can you please post the stacktraces, even if they look the same (sometimes there are small differences, and also differences in the failure sites).
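(For anyone reporting back: a quick way to capture the vendor and exact version of the JDK actually in use; the bundled-JDK path below is the usual default in the official docker image, not confirmed for this setup:)

```sh
# Bundled JDK inside the official docker image:
docker exec <container> /usr/share/elasticsearch/jdk/bin/java -version

# Or, for a locally installed JDK:
"$ES_JAVA_HOME/bin/java" -version
```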
@ChrisHegarty We are currently observing whether this fix prevents the crash from happening. No crashes so far, but that is due to minimal load on the cluster; peak load is about to start, and midday is generally when things go nasty. So I will keep you posted on whether it works or breaks. I used an Adoptium nightly build.
If new traces happen, I will post them here as well.
We have seen a similar stack trace happening through the
Other stack traces that may have led to data corruption (Lucene segment files corrupted):
It seems possible that the elasticsearch benchmarks are running into the same underlying JDK 22 bug. [EDIT: removed inaccessible link]
@ChrisHegarty: the above link to an elasticsearch-benchmarks repo does not work (404) and I can't find the correct repo myself. Could you fix it?
Unfortunately (and my mistake), the aforementioned link is not public, sorry. There is little new information there anyway. What I found there is an interesting hs_err_pidxxxxx.log, which I subsequently attached to the OpenJDK Jira issue. Additionally, the fact that the crash was observed shortly after the upgrade to JDK 22 helps confirm that it is indeed specific to JDK 22.
@ChrisHegarty I can confirm that moving to JDK 21.0.3 solves the issue and actually gives much better and more stable performance. JDK 22 is just nasty.
Thanks for confirming that a downgrade of the JDK (from 22 to 21.x) does not encounter the issue. I want to note that 21.0.3 is currently in Early Access (not yet GA'ed). For Elasticsearch, we're planning to downgrade (back) to JDK 21.0.2.
Yes, this is indeed a nasty bug. Its likely impact is much wider than Elastic.
Hi, team! Did the downgrade back to JDK 21.0.2 happen in 8.13.2? |
@jesslm Yes, the docker images of Elasticsearch v8.13.2 were downgraded to JDK 21.0.2.
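(A quick way to double-check the JDK bundled in any image tag; the in-image path is the usual default:)

```sh
# Print the JDK version shipped in a given Elasticsearch image tag.
docker run --rm --entrypoint /usr/share/elasticsearch/jdk/bin/java \
  docker.elastic.co/elasticsearch/elasticsearch:8.13.2 -version
```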
What about for Elasticsearch Service? |
* Update bundled JDK to Java 22 (again) (#108654): This commit re-bumps the bundled JDK to Java 22 now that we have a tested workaround for the G1GC bug (https://bugs.openjdk.org/browse/JDK-8329528). Relates #108571, #106987
* copy main openjdk toolchain resolver
* use 2 lines for workaround
* fix test
* update adoptium test
Story #12345: Ultimate COTS upgrade II
* Upgrade MongoDB 7.0.7 -> 7.0.8
* Upgrade ElasticSearch 7.17.19 -> 7.17.20
* Resolve issue: elastic/elasticsearch#106987
* Upgrade Prometheus & Exporters
See merge request vitam/vitam!10009
Looks like this issue was reintroduced in later 7.17.x by #108654. I have a cluster on 7.17.22 that randomly crashed with:
#108654 made its way into 7.17.22, so if your cluster is on 7.17.20, it cannot possibly be it.
@ldematte My apologies, it's a typo; it is indeed 7.17.22.
And I do use the bundled java version, which is:
This is very strange :/ |
@panthony can you verify in the ES logs that you can see
I cannot. This issue should not be present when either
@ldematte If the log is supposed to be present somewhere in "/var/log/elasticsearch/", it's nowhere to be found.
Edit: I do not see this change on the VM where ElasticSearch is deployed: the file
I'll try to see why, thanks for your help.
Edit 2: FYI, the original file from ES was replaced by another version that had slight tweaks in it; when ES was upgraded for security fixes, there was no diff made around this file to check whether there were any important changes. 🤦🏻
Edit 3: For the sake of completeness, the actual fix is: 6f20cba#diff-93b9226e55b0c23873222857eac0940b5d8ae09d28d3bbf1a55e6d8a73133ba7
When set on a single line, it crashes with
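(For context on the "use 2 lines for workaround" detail: jvm.options is parsed one option per line, so a two-flag workaround must be split across two lines. A sketch of the rule using well-known diagnostic flags, not the actual workaround flags:)

```
## Correct: one JVM option per line.
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintFlagsFinal

## Wrong: both flags on one line are handed to the JVM as a single,
## unrecognized argument, and the node fails to start.
# -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal
```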
Thanks @panthony for the update! Btw, the log location changes based on configuration, distribution, etc.
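(A sketch for locating that startup information; both the log path and the exact message wording are environment-dependent assumptions:)

```sh
# Confirm which JVM and flags a node actually started with.
grep -m1 "JVM home" /var/log/elasticsearch/*.log
grep -m1 "JVM arguments" /var/log/elasticsearch/*.log
```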
…er JDK versions as described here: elastic/elasticsearch#106987
After upgrading Elasticsearch from 8.12.2 to 8.13.0, we see random node failures with the following message:
It happens intermittently on all nodes, and the service stops after this.
Looking into the logs, the exception seems to happen for different tasks (the first one was a refresh and this one is a write operation)