
JDK G1 bug crashes with references [in]to jdk.internal.vm.FillerArray, when upgrading to 8.13.0 or 8.13.1 #106987

Closed
ChrisHegarty opened this issue Apr 2, 2024 · 28 comments

@ChrisHegarty
Contributor

After upgrading Elasticsearch from 8.12.2 to 8.13.0, we see random node failures with the following message:

[2024-03-31T00:01:29,450][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [xxx] fatal error in thread [elasticsearch[xxx][write][T#7]], exiting
java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.concurrent.locks.Lock
at org.elasticsearch.common.util.concurrent.ReleasableLock.acquire(ReleasableLock.java:43) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.index.translog.Translog.add(Translog.java:578) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:1223) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1072) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:997) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:915) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:378) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:235) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:305) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:151) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:216) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elasticsearch-8.13.0.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1570) ~[?:?]

It happens intermittently on all nodes, and the service stops after this.

Looking into the logs, the exception seems to happen for different tasks (the first one was a refresh and this one is a write operation):

[2024-04-01T15:21:46,691][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [xxx] fatal error in thread [elasticsearch[xxx][refresh][T#2]], exiting
java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.Collection
	at org.apache.lucene.index.ReadersAndUpdates.getNumDVUpdates(ReadersAndUpdates.java:168) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.ReaderPool.anyDocValuesChanges(ReaderPool.java:384) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:5776) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:455) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:133) ~[lucene-core-9.10.0.jar:?]
	at org.elasticsearch.index.engine.Engine.refreshNeeded(Engine.java:1093) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.lambda$scheduledRefresh$47(IndexShard.java:3919) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:356) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3915) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:998) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1134) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:137) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917) ~[elasticsearch-8.13.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1570) ~[?:?]
ChrisHegarty added the :Core/Infra/Core, jvm, bug, and Team:Core/Infra labels Apr 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@ChrisHegarty
Contributor Author

The following JVM options are set. The JVM is the bundled one, and jvm.options has these:
-Xmx28g
-Xms28g
-XX:+UseG1GC
--add-modules=jdk.incubator.vector
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=logs/hs_err_pid%p.log
-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,level,pid,tags:filecount=32,filesize=64m
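
For anyone comparing setups: a quick way to dump the flags a running node actually picked up is jcmd (a sketch; replace <pid> with the Elasticsearch process id):

jcmd <pid> VM.flags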

@ChrisHegarty
Contributor Author

ChrisHegarty commented Apr 2, 2024

Sometimes this brings down the cluster, sometimes the cluster appears to recover. It probably depends on exactly where this exception happens.

Here are some snippets of stacktraces that we see:

java.lang.ClassCastException: class Ljdk.internal.vm.FillerArray; cannot be cast to class
  java.nio.ByteBuffer (Ljdk.internal.vm.FillerArray; and java.nio.ByteBuffer are in module java.base of loader 'bootstrap') 
  at [email protected]/io.netty.buffer.PoolChunk.allocate(PoolChunk.java:354)  
  at [email protected]/io.netty.buffer.PoolChunkList.allocate(PoolChunkList.java:108) 
  at [email protected]/io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204)
...
java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the
  requested interface java.util.concurrent.locks.Lock
  at org.elasticsearch.common.util.concurrent.ReleasableLock.acquire(ReleasableLock.java:43) ~[elasticsearch-8.13.0.jar:?]
  at org.elasticsearch.index.translog.Translog.add(Translog.java:578) ~[elasticsearch-8.13.0.jar:?]
  at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:1223) ~[elasticsearch-8.13.0.jar:?]
  at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1072) ~[elasticsearch-8.13.0.jar:?]
  ...

In one particular case, I see hundreds of these, all appearing around the same time:

java.lang.ClassCastException: class Ljdk.internal.vm.FillerArray; cannot be cast to class 
  org.elasticsearch.index.engine.LiveVersionMap$VersionLookup (Ljdk.internal.vm.FillerArray; is in module java.base of loader 'bootstrap'; org.elasticsearch.index.engine.LiveVersionMap$VersionLookup is in module [email protected] of loader 'app')
  at [email protected]/co.elastic.elasticsearch.stateless.engine.StatelessLiveVersionMapArchive.getRamBytesUsed(StatelessLiveVersionMapArchive.java:156)
  at [email protected]/org.elasticsearch.index.engine.LiveVersionMap.ramBytesUsedForRefresh(LiveVersionMap.java:483)
  at [email protected]/org.elasticsearch.index.engine.InternalEngine.getIndexBufferRAMBytesUsed(InternalEngine.java:2573)
  at [email protected]/org.elasticsearch.index.shard.IndexShard.getIndexBufferRAMBytesUsed(IndexShard.java:2355)
...
java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the
  requested interface java.util.Map
  at [email protected]/org.apache.lucene.index.ReadersAndUpdates.getNumDVUpdates(ReadersAndUpdates.java:168)
  at [email protected]/org.apache.lucene.index.ReaderPool.anyDocValuesChanges(ReaderPool.java:384)
  at [email protected]/org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:5776)
  at [email protected]/org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:455)
  at [email protected]/org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:133)
  at [email protected]/org.elasticsearch.index.engine.Engine.refreshNeeded(Engine.java:1093)
  ...

@ChrisHegarty
Copy link
Contributor Author

Linking the JDK issue: https://bugs.openjdk.org/browse/JDK-8329528

ChrisHegarty changed the title from "ElasticsearchUncaughtExceptionHandler exception after upgrading to 8.13.0" to "java.lang.ClassCastException: class Ljdk.internal.vm.FillerArray; cannot be cast to class, after upgrading to 8.13.0" Apr 2, 2024
@aydasraf

aydasraf commented Apr 3, 2024

We are seeing the same behavior of crashes and restarts; we upgraded from 8.9.0.

[2024-04-03T08:08:49,474][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [prod-elasticsearch-hot-tier-2] fatal error in thread [elasticsearch[prod-elasticsearch-hot-tier-2][refresh][T#7]], exiting
java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.Map
	at org.apache.lucene.index.ReadersAndUpdates.getNumDVUpdates(ReadersAndUpdates.java:168) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.ReaderPool.anyDocValuesChanges(ReaderPool.java:384) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.IndexWriter.nrtIsCurrent(IndexWriter.java:5776) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:455) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.FilterDirectoryReader.isCurrent(FilterDirectoryReader.java:133) ~[lucene-core-9.10.0.jar:?]
	at org.elasticsearch.index.engine.Engine.refreshNeeded(Engine.java:1093) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.lambda$scheduledRefresh$47(IndexShard.java:3919) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:356) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3915) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:998) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1134) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:137) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917) ~[elasticsearch-8.13.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1570) ~[?:?]

Any known workarounds that may reduce the impact?

@ldematte
Contributor

ldematte commented Apr 4, 2024

Since this seems very likely to be a JDK issue: as a workaround, where possible (i.e. self-hosted clusters), would it make sense to use a local JDK 21 installation in place of the bundled JDK 22?
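
For example, for an archive install, something along these lines should do it (a sketch; ES_JAVA_HOME is the documented override, and the JDK path below is just an example):

# Point Elasticsearch at a locally installed JDK 21 instead of the bundled JDK 22
export ES_JAVA_HOME=/usr/lib/jvm/jdk-21.0.2
bin/elasticsearch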

@aydasraf

aydasraf commented Apr 4, 2024

@ldematte Thank you for getting back to me; this is in fact what I did: I built a custom Docker image with JDK 21.0.3 (beta), which is the release that has a potential fix (https://bugs.openjdk.org/browse/JDK-8319548). Still observing the outcome.

@ChrisHegarty
Contributor Author

@ldematte Thank you for getting back to me; this is in fact what I did: I built a custom Docker image with JDK 21.0.3 (beta), which is the release that has a potential fix (https://bugs.openjdk.org/browse/JDK-8319548). Still observing the outcome.

Ah this is very interesting.

To confirm: the issue is still happening even with JDK 21.0.3, correct? To be precise, since there are multiple JDK vendors, can you please post the output of java -version for this JDK?
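
For example, against a bundled JDK it would be something like this (the path is illustrative and varies by install type):

/usr/share/elasticsearch/jdk/bin/java -version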

Additionally, can you please post the stack traces, even if they look the same (sometimes there are small differences, and also differences in the failure sites).

@aydasraf

aydasraf commented Apr 4, 2024

@ChrisHegarty, We are currently observing whether this fix prevents the crash from happening. No crashes so far, but that is due to minimal load on the cluster; peak load is about to start, and midday is generally where things go nasty. So I will keep you posted on whether it works or breaks.

I used an Adoptium nightly build:

openjdk version "21.0.3-beta" 2024-04-16
OpenJDK Runtime Environment Temurin-21.0.3+7-202403202002 (build 21.0.3-beta+7-ea)
OpenJDK 64-Bit Server VM Temurin-21.0.3+7-202403202002 (build 21.0.3-beta+7-ea, mixed mode, sharing)

If new traces appear, I will post them here as well.

@romain-chanu

romain-chanu commented Apr 5, 2024

We have seen a similar stack trace happening through the pruneDeletedTombstones method:

java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.Map
	at org.elasticsearch.index.engine.LiveVersionMap.pruneTombstones(LiveVersionMap.java:437) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.pruneDeletedTombstones(InternalEngine.java:2378) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.maybePruneDeletes(InternalEngine.java:1924) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.index.shard.IndexShard.lambda$scheduledRefresh$47(IndexShard.java:3940) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:356) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3915) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:998) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1134) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:137) ~[elasticsearch-8.13.1.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917) ~[elasticsearch-8.13.1.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1570) ~[?:?]

@romain-chanu

romain-chanu commented Apr 5, 2024

Other stack traces that may have led to data corruption (corrupted Lucene segment files):

java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.Map
	at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.terms(PerFieldPostingsFormat.java:353) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.CodecReader.terms(CodecReader.java:132) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.FilterLeafReader.terms(FilterLeafReader.java:415) ~[lucene-core-9.10.0.jar:?]
	at org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.<init>(PerThreadIDVersionAndSeqNoLookup.java:68) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.lucene.uid.PerThreadIDVersionAndSeqNoLookup.<init>(PerThreadIDVersionAndSeqNoLookup.java:111) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.getLookupState(VersionsAndSeqNoResolver.java:66) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.lucene.uid.VersionsAndSeqNoResolver.timeSeriesLoadDocIdAndVersion(VersionsAndSeqNoResolver.java:140) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.resolveDocVersion(InternalEngine.java:1021) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.planIndexingAsPrimary(InternalEngine.java:1333) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.indexingStrategyForOperation(InternalEngine.java:1310) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:1172) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:1072) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:997) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:915) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:378) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:235) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:305) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:151) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:79) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:216) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:984) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.13.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1570) ~[?:?]
java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.Collection
	at org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:554) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.ReaderPool.writeAllDocValuesUpdates(ReaderPool.java:251) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3982) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:598) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.10.0.jar:?]
	at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:48) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:27) ~[elasticsearch-8.13.0.jar:?]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.10.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:461) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:441) ~[elasticsearch-8.13.0.jar:?]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.10.0.jar:?]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.10.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:2047) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.lambda$maybeRefresh$8(InternalEngine.java:2020) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:270) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:2020) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.lambda$scheduledRefresh$47(IndexShard.java:3935) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:356) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3915) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:998) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1134) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:137) ~[elasticsearch-8.13.0.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917) ~[elasticsearch-8.13.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1570) ~[?:?]

@ChrisHegarty
Contributor Author

ChrisHegarty commented Apr 5, 2024

It seems possible that the Elasticsearch benchmarks are running into the same underlying JDK 22 bug. [EDIT: removed inaccessible link]

@tschatzl

tschatzl commented Apr 5, 2024

@ChrisHegarty: the above link for an elasticsearch-benchmarks repo does not work (404) and I can't find the correct repo myself. Could you fix it?

@ChrisHegarty
Contributor Author

ChrisHegarty commented Apr 5, 2024

the above link does not work and I can't find the correct repo myself. Could you fix it?

Unfortunately (my mistake), the aforementioned link is not public, sorry. There is little new information there anyway. What I did find there is an interesting hs_err_pidxxxxx.log, which I subsequently attached to the OpenJDK Jira issue.

Additionally, the fact that the crash was observed shortly after the upgrade helps us confirm that it is indeed specific to JDK 22.

@aydasraf

aydasraf commented Apr 5, 2024

@ChrisHegarty I can confirm that moving to JDK 21.0.3 solves the issue, and actually gives much better and more stable performance. JDK 22 is just nasty.

@ChrisHegarty
Contributor Author

ChrisHegarty commented Apr 5, 2024

I can confirm that moving to JDK 21.0.3 solves the issue

Thanks for confirming that a downgrade of the JDK (from 22 to 21.x) avoids the issue. I want to note that 21.0.3 is currently in Early Access (not yet GA'ed). For Elasticsearch, we're planning on downgrading (back) to JDK 21.0.2.

and actually gives much better and more stable performance. JDK 22 is just nasty

Yes, this is indeed a nasty bug. Its likely impact is much wider than Elastic.

ChrisHegarty changed the title from "java.lang.ClassCastException: class Ljdk.internal.vm.FillerArray; cannot be cast to class, after upgrading to 8.13.0" to "Elasticsearch 8.13 encounters a JDK G1 bug and crashes with references [in]to jdk.internal.vm.FillerArray" Apr 5, 2024
ChrisHegarty changed the title from "Elasticsearch 8.13 encounters a JDK G1 bug and crashes with references [in]to jdk.internal.vm.FillerArray" to "JDK G1 bug crashes with references [in]to jdk.internal.vm.FillerArray, when upgrading to 8.13.0 or 8.13.1" Apr 5, 2024
@jesslm

jesslm commented Apr 10, 2024

Hi, team! Did the downgrade back to JDK 21.0.2 happen in 8.13.2?

@aydasraf

aydasraf commented Apr 10, 2024

Hi, team! Did the downgrade back to JDK 21.0.2 happen in 8.13.2?

@jesslm, yes, the Docker images of Elasticsearch 8.13.2 were downgraded to JDK 21.0.2.

@jesslm

jesslm commented Apr 10, 2024

What about for Elasticsearch Service?

rjernst added a commit to rjernst/elasticsearch that referenced this issue May 15, 2024
This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates elastic#108571
relates elastic#106987
rjernst added a commit that referenced this issue May 15, 2024
This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates #108571
relates #106987
rjernst added a commit to rjernst/elasticsearch that referenced this issue May 15, 2024
This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates elastic#108571
relates elastic#106987
rjernst added a commit to rjernst/elasticsearch that referenced this issue May 15, 2024
This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates elastic#108571
relates elastic#106987
elasticsearchmachine pushed a commit that referenced this issue May 15, 2024
This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates #108571
relates #106987
elasticsearchmachine pushed a commit that referenced this issue May 16, 2024
* Update bundled JDK to Java 22 (again) (#108654)

This commit re-bumps the bundled JDK to Java 22 now that we have
a tested workaround for the G1GC bug
(https://bugs.openjdk.org/browse/JDK-8329528).

relates #108571
relates #106987

* copy main openjdk toolchain resolver

* use 2 lines for workaround

* fix test

* update adoptium test
vitam-prg pushed a commit to ProgrammeVitam/vitam that referenced this issue May 24, 2024
Story #12345: Ultimate COTS upgrade II

* Upgrade MongoDB 7.0.7 -> 7.0.8
* Upgrade ElasticSearch 7.17.19 -> 7.17.20
  * Resolve issue: elastic/elasticsearch#106987
* Upgrade Prometheus & Exporters

See merge request vitam/vitam!10009
@panthony

panthony commented Jul 31, 2024

Looks like this issue was reintroduced in later 7.17.x by #108654

I have a cluster on 7.17.22 that randomly crashed with:

java.lang.IncompatibleClassChangeError: Class Ljdk.internal.vm.FillerArray; does not implement the requested interface java.util.Collection
	at java.util.Collections$UnmodifiableCollection.stream(Collections.java:1131) ~[?:?]
	at org.elasticsearch.index.seqno.ReplicationTracker.getRetentionLeases(ReplicationTracker.java:250) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.getRetentionLeases(IndexShard.java:2638) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.syncRetentionLeases(IndexShard.java:2756) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.IndexService.lambda$sync$19(IndexService.java:967) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.lambda$runUnderPrimaryPermit$26(IndexShard.java:3496) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$23(IndexShard.java:3450) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:219) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:253) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:199) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:3421) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:3409) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.shard.IndexShard.runUnderPrimaryPermit(IndexShard.java:3499) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.IndexService.sync(IndexService.java:967) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.IndexService.syncRetentionLeases(IndexService.java:951) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.IndexService.access$900(IndexService.java:102) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.index.IndexService$AsyncRetentionLeaseSyncTask.runInternal(IndexService.java:1141) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:133) ~[elasticsearch-7.17.22.jar:7.17.22]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718) ~[elasticsearch-7.17.22.jar:7.17.22]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1570) [?:?]

@ldematte
Contributor

#108654 made its way into 7.17.22, so if your cluster is on 7.17.20, it cannot possibly be it.
Also, 7.17.20 downgraded the JDK to version 21.0.2, the last known version that was not affected by the bug.
Are you running with the bundled JDK, or using your own Java version?
If the former, can you please double check the ES version?
If the latter, can you check your Java version, and ensure it is not one of those affected by https://bugs.openjdk.org/browse/JDK-8329528?

@panthony

panthony commented Jul 31, 2024

@ldematte My apologies, it's a typo; it's indeed 7.17.22:

{
  "name" : "xx",
  "cluster_name" : "xx",
  "cluster_uuid" : "P_JgtuvtRFSDKbuk-JdbaQ",
  "version" : {
    "number" : "7.17.22",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "38e9ca2e81304a821c50862dafab089ca863944b",
    "build_date" : "2024-06-06T07:35:17.876121680Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.3",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

And I do use the bundled Java version, which is:

/usr/share/elasticsearch/jdk/bin/java --version
openjdk 22.0.1 2024-04-16
OpenJDK Runtime Environment (build 22.0.1+8-16)
OpenJDK 64-Bit Server VM (build 22.0.1+8-16, mixed mode, sharing)

@ldematte
Contributor

ldematte commented Aug 1, 2024

This is very strange :/
I see the workaround was backported to 7.17 too (#108631) as well as the re-upgrade to JDK 22 (#108689)
@ChrisHegarty can you think of anything that can explain this?

@ldematte
Copy link
Contributor

ldematte commented Aug 1, 2024

@panthony can you verify in the ES logs that you can see -XX:+UnlockDiagnosticVMOptions -XX:G1NumCollectionsKeepPinned=10000000 in the Java options (they should appear very early in the logs after startup).
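
For example (a sketch; the log path varies by distribution and configuration):

grep -m1 "G1NumCollectionsKeepPinned" /var/log/elasticsearch/*.log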

@ChrisHegarty
Contributor Author

This is very strange :/ I see the workaround was backported to 7.17 too (#108631) as well as the re-upgrade to JDK 22 (#108689) @ChrisHegarty can you think of anything that can explain this?

I cannot. This issue should not be present when either:

  1. on a release < JDK 22.0.2 with the correct JVM flags set (as above), OR
  2. on a release >= JDK 22.0.2.

@panthony

panthony commented Aug 1, 2024

@ldematte If the log is supposed to be somewhere in "/var/log/elasticsearch/", it's nowhere to be found.

Edit:

I do not see this change on the VM where Elasticsearch is deployed:

https://github.com/elastic/elasticsearch/pull/108631/files#diff-93b9226e55b0c23873222857eac0940b5d8ae09d28d3bbf1a55e6d8a73133ba7

The file /etc/elasticsearch/jvm.options ends with:

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m

I'll try to see why, thanks for your help.

Edit 2:

FYI, the original file from ES had been replaced by another version with slight tweaks in it; when ES was upgraded for security fixes, no diff was made on this file to check for important changes. 🤦🏻

Edit 3:

For the sake of completeness, the actual fix is:

6f20cba#diff-93b9226e55b0c23873222857eac0940b5d8ae09d28d3bbf1a55e6d8a73133ba7

When the two options are set on a single line, it crashes with UnlockDiagnosticVMOptions -XX:G1NumCollectionsKeepPinned=10000000
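
For reference, a sketch of how the stanza should look in jvm.options, one option per line (assuming the 22: version prefix, so the flags are applied only when running on JDK 22):

22:-XX:+UnlockDiagnosticVMOptions
22:-XX:G1NumCollectionsKeepPinned=10000000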

@ldematte
Contributor

ldematte commented Aug 2, 2024

Thanks @panthony for the update!
It seems like you found the root cause indeed; changes to these configuration files are always somewhat risky, since they can break things like in this case, and we have no reasonable way to "merge" them.

Btw, log location changes based on configuration, distribution, etc.
Here you can find a summary of where to expect them by default: https://www.elastic.co/guide/en/elasticsearch/reference/current/logging.html
Or even better, you can call _nodes/settings?pretty=true and look at path.logs
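
For example (assuming a node reachable on localhost:9200):

curl -s 'http://localhost:9200/_nodes/settings?pretty=true&filter_path=nodes.*.settings.path.logs'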
