-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JFRTest#jfrPerfTest fails with Quarkus main
#236
Comments
First known bad commit seems to be 61edae46e0287ecaee7f5b3ef728ab4a6d465c02 in https://github.com/graalvm/mandrel/actions/runs/7026828114/job/19120880487 (Interestingly this started happening with the HELIDON_QUICKSTART failures #226) Last known working commit seems to be 670b43cac8587d27d7810af41c22ca3cf161df23 in https://github.com/graalvm/mandrel/actions/runs/7012992753/job/19078995046 |
The regression was introduced with quarkusio/quarkus#37347 and specifically with the runtime re-initialization of |
Inspecting the static fields of Java mode
Native mode build-time-initialized
Native mode run-time-initialized
Diffdiff --git a/native-build-time-init-mode.txt b/native-run-time-init-mode.txt
index 6444b24..d0fa1ef 100644
--- a/native-build-time-init-mode.txt
+++ b/native-run-time-init-mode.txt
@@ -1,4 +1,4 @@
-maxDirectMemory= 22 GB
+maxDirectMemory= 49 GB
logger = JdkLogger(io.netty.util.internal.PlatformDependent0)
ADDRESS_FIELD_OFFSET = 24
BYTE_ARRAY_BASE_OFFSET = 16
@@ -17,7 +17,7 @@ UNSAFE_UNAVAILABILITY_CAUSE = null
INTERNAL_UNSAFE = null
RUNNING_IN_NATIVE_IMAGE = true
IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
-UNSAFE = sun.misc.Unsafe@65e6a946
+UNSAFE = sun.misc.Unsafe@4417167b
HASH_CODE_ASCII_SEED = -1028477387
HASH_CODE_C1 = -862048943
HASH_CODE_C2 = 461845907 |
Although still not clear why, setting Note however, that setting |
Faster reproducer:
In another terminal run
Note that without Enabling DEBUG logging doesn't show anything interesting and setting it to ALL results in a build failure (see quarkusio/quarkus#37598). |
Hi @zakkak I can look into this |
@roberttoyonaga no need to jump in yet, just pinging you FYI. I will keep looking further and let you know if needed. Thanks! |
Inspecting the static fields of Native mode build-time-initializeddiff --git a/native-build-time-init-mode.txt b/native-build-time-init-mode-jfr.txt
index f2e49be..3cea102 100644
--- a/native-build-time-init-mode.txt
+++ b/native-build-time-init-mode-jfr.txt
@@ -6,18 +6,18 @@ PlatformDependent0.INT_ARRAY_BASE_OFFSET = 16
PlatformDependent0.INT_ARRAY_INDEX_SCALE = 4
PlatformDependent0.LONG_ARRAY_BASE_OFFSET = 16
PlatformDependent0.LONG_ARRAY_INDEX_SCALE = 8
-PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = null
+PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long)
PlatformDependent0.EXPLICIT_NO_UNSAFE_CAUSE = null
-PlatformDependent0.ALLOCATE_ARRAY_METHOD = null
+PlatformDependent0.ALLOCATE_ARRAY_METHOD = public java.lang.Object jdk.internal.misc.Unsafe.allocateUninitializedArray(java.lang.Class,int)
PlatformDependent0.ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
PlatformDependent0.JAVA_VERSION = 21
PlatformDependent0.IS_ANDROID = false
PlatformDependent0.STORE_FENCE_AVAILABLE = true
PlatformDependent0.UNSAFE_UNAVAILABILITY_CAUSE = null
-PlatformDependent0.INTERNAL_UNSAFE = null
+PlatformDependent0.INTERNAL_UNSAFE = jdk.internal.misc.Unsafe@438c5457
PlatformDependent0.RUNNING_IN_NATIVE_IMAGE = true
PlatformDependent0.IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
-PlatformDependent0.UNSAFE = sun.misc.Unsafe@970d61f
+PlatformDependent0.UNSAFE = sun.misc.Unsafe@62bf4323
PlatformDependent0.HASH_CODE_ASCII_SEED = -1028477387
PlatformDependent0.HASH_CODE_C1 = -862048943
PlatformDependent0.HASH_CODE_C2 = 461845907
@@ -31,7 +31,7 @@ PlatformDependent.MAYBE_SUPER_USER = false
PlatformDependent.CAN_ENABLE_TCP_NODELAY_BY_DEFAULT = true
PlatformDependent.UNSAFE_UNAVAILABILITY_CAUSE = null
PlatformDependent.DIRECT_BUFFER_PREFERRED = true
-PlatformDependent.MAX_DIRECT_MEMORY = 24660934656
+PlatformDependent.MAX_DIRECT_MEMORY = 24593825792
PlatformDependent.MPSC_CHUNK_SIZE = 1024
PlatformDependent.MIN_MAX_MPSC_CAPACITY = 2048
PlatformDependent.MAX_ALLOWED_MPSC_CAPACITY = 1073741824
@@ -40,22 +40,22 @@ PlatformDependent.TMPDIR = /tmp
PlatformDependent.BIT_MODE = 64
PlatformDependent.NORMALIZED_ARCH = x86_64
PlatformDependent.NORMALIZED_OS = linux
-PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@52d3654e
+PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@20d2ffcc
PlatformDependent.LINUX_OS_CLASSIFIERS = [fedora]
PlatformDependent.IS_WINDOWS = false
PlatformDependent.IS_OSX = false
PlatformDependent.IS_J9_JVM = false
PlatformDependent.IS_IVKVM_DOT_NET = false
PlatformDependent.ADDRESS_SIZE = 8
-PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = false
-PlatformDependent.DIRECT_MEMORY_COUNTER = null
-PlatformDependent.DIRECT_MEMORY_LIMIT = 24660934656
-PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@71b6d77f
-PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@164d4f40
-PlatformDependent.UNINITIALIZED_ARRAY_ALLOCATION_THRESHOLD = -1
-PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@47c35bd3
+PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = true
+PlatformDependent.DIRECT_MEMORY_COUNTER = 0
+PlatformDependent.DIRECT_MEMORY_LIMIT = 24593825792
+PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@15f1832f
+PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@3d0d87db
+PlatformDependent.UNINITIALIZED_ARRAY_ALLOCATION_THRESHOLD = 1024
+PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@59fc3392
PlatformDependent.LINUX_ID_PREFIX = ID=
PlatformDependent.LINUX_ID_LIKE_PREFIX = ID_LIKE=
PlatformDependent.BIG_ENDIAN_NATIVE_ORDER = false
-PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@e515370
+PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@26842a93
PlatformDependent.$assertionsDisabled = true Native mode run-time-initializeddiff --git a/native-run-time-init-mode.txt b/native-run-time-init-mode-jfr.txt
index 5a1aa1f..c31a84c 100644
--- a/native-run-time-init-mode.txt
+++ b/native-run-time-init-mode-jfr.txt
@@ -6,7 +6,7 @@ PlatformDependent0.INT_ARRAY_BASE_OFFSET = 16
PlatformDependent0.INT_ARRAY_INDEX_SCALE = 4
PlatformDependent0.LONG_ARRAY_BASE_OFFSET = 16
PlatformDependent0.LONG_ARRAY_INDEX_SCALE = 8
-PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = null
+PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long)
PlatformDependent0.EXPLICIT_NO_UNSAFE_CAUSE = null
PlatformDependent0.ALLOCATE_ARRAY_METHOD = null
PlatformDependent0.ALIGN_SLICE = public final java.nio.ByteBuffer java.nio.ByteBuffer.alignedSlice(int)
@@ -14,10 +14,10 @@ PlatformDependent0.JAVA_VERSION = 21
PlatformDependent0.IS_ANDROID = false
PlatformDependent0.STORE_FENCE_AVAILABLE = true
PlatformDependent0.UNSAFE_UNAVAILABILITY_CAUSE = null
-PlatformDependent0.INTERNAL_UNSAFE = null
+PlatformDependent0.INTERNAL_UNSAFE = jdk.internal.misc.Unsafe@147a7def
PlatformDependent0.RUNNING_IN_NATIVE_IMAGE = true
PlatformDependent0.IS_EXPLICIT_TRY_REFLECTION_SET_ACCESSIBLE = true
-PlatformDependent0.UNSAFE = sun.misc.Unsafe@744c41dd
+PlatformDependent0.UNSAFE = sun.misc.Unsafe@37286f61
PlatformDependent0.HASH_CODE_ASCII_SEED = -1028477387
PlatformDependent0.HASH_CODE_C1 = -862048943
PlatformDependent0.HASH_CODE_C2 = 461845907
@@ -40,22 +40,22 @@ PlatformDependent.TMPDIR = /tmp
PlatformDependent.BIT_MODE = 64
PlatformDependent.NORMALIZED_ARCH = x86_64
PlatformDependent.NORMALIZED_OS = linux
-PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@656445c5
+PlatformDependent.ALLOWED_LINUX_OS_CLASSIFIERS = [Ljava.lang.String;@48221138
PlatformDependent.LINUX_OS_CLASSIFIERS = [fedora]
PlatformDependent.IS_WINDOWS = false
PlatformDependent.IS_OSX = false
PlatformDependent.IS_J9_JVM = false
PlatformDependent.IS_IVKVM_DOT_NET = false
PlatformDependent.ADDRESS_SIZE = 8
-PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = false
-PlatformDependent.DIRECT_MEMORY_COUNTER = null
+PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = true
+PlatformDependent.DIRECT_MEMORY_COUNTER = 0
PlatformDependent.DIRECT_MEMORY_LIMIT = 53206843392
-PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@301cc3e2
-PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@8905ee9
+PlatformDependent.RANDOM_PROVIDER = io.netty.util.internal.PlatformDependent$2@6f6af829
+PlatformDependent.CLEANER = io.netty.util.internal.CleanerJava9@140b865c
PlatformDependent.UNINITIALIZED_ARRAY_ALLOCATION_THRESHOLD = -1
-PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@5b464f2a
+PlatformDependent.OS_RELEASE_FILES = [Ljava.lang.String;@70835f0e
PlatformDependent.LINUX_ID_PREFIX = ID=
PlatformDependent.LINUX_ID_LIKE_PREFIX = ID_LIKE=
PlatformDependent.BIG_ENDIAN_NATIVE_ORDER = false
-PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@2cd8284d
+PlatformDependent.NOOP = io.netty.util.internal.PlatformDependent$1@5d61e8ac
PlatformDependent.$assertionsDisabled = true FTR I am using https://github.com/zakkak/issue-reproducers/tree/2023-12-07-mandrel-it-tests-236 to get these. |
I take that back... When jfr monitoring is enabled I see
While when it is disabled I see:
which indicates that enabling jfr monitoring somehow results in |
@roberttoyonaga does this ^^ ring any bells? |
hmm not really |
@zakkak So just to confirm, you're seeing that build-time init of The only part of the diff you shared here that immediately strikes me as suspicious is that build time init with JFR has:
While runtime init has no diff: Maybe this is not an issue, but I've fixed bugs in the past related to allocation events causing issues at startup (when current_thread hasn't yet been assigned or other things are not set up yet). Maybe something along those lines is happening since it seems like quarkus is failing to respond to the first request. It's just a hunch though. I can try to look into it more after the serviceability meetup. |
Correct, but only when JFR is enabled.
Yes that's interesting, but it still used to work with that diff (we used to build-time-initialize these classes without observing CI failures, but now we run-time-initialize them and we get this CI failure). The comparison of run-time-init (current approach of Quarkus for these classes) with and without JFR in #236 (comment) is also pretty interesting: -PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = null
+PlatformDependent0.DIRECT_BUFFER_CONSTRUCTOR = private java.nio.DirectByteBuffer(long,long)
-PlatformDependent0.INTERNAL_UNSAFE = null
+PlatformDependent0.INTERNAL_UNSAFE = jdk.internal.misc.Unsafe@147a7def
-PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = false
+PlatformDependent.USE_DIRECT_BUFFER_NO_CLEANER = true
-PlatformDependent.DIRECT_MEMORY_COUNTER = null
+PlatformDependent.DIRECT_MEMORY_COUNTER = 0 My next step would be to check what is enabled in GraalVM / Mandrel when enabling jfr monitoring and why it affects netty's run-time initialization. |
Confirmed, it happens here https://github.com/oracle/graal/blob/4dd8a9d611b78668bb84116cf7ff8cc7e4146907/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrFeature.java#L156 I will try to do the same in Quarkus to see how it will behave. |
Since we moved `io.netty.util.internal.PlatformDependent`'s and `io.netty.util.internal.PlatformDependent0`' initialization at runtime we need to register `java.nio.DirectByteBuffer`'s constructors for reflective access to avoid miss-configurations due to not being able to access the constructor in https://github.com/netty/netty/blob/5db037beedca8aa5b6a486fa515cc6af013ced74/common/src/main/java/io/netty/util/internal/PlatformDependent0.java#L290-L292 More classes and methods need to be registered as discussed in quarkusio#37626 Closes Karm/mandrel-integration-tests#236
Since we moved `io.netty.util.internal.PlatformDependent`'s and `io.netty.util.internal.PlatformDependent0`' initialization at runtime we need to register `java.nio.DirectByteBuffer`'s constructors for reflective access to avoid miss-configurations due to not being able to access the constructor in https://github.com/netty/netty/blob/5db037beedca8aa5b6a486fa515cc6af013ced74/common/src/main/java/io/netty/util/internal/PlatformDependent0.java#L290-L292 More classes and methods need to be registered as discussed in quarkusio#37626 Closes Karm/mandrel-integration-tests#236
I found what's the route cause, see quarkusio/quarkus#37626, but I still have not a clear understanding of why (which code-flow it triggers) we see failures only with JFR enabled. |
Nice! And I guess setting the workaround of |
In quarkusio#37347 we moved `PlatformDependent` and `PlatformDependent0` classes to run-time-initialization as they are platform-dependent and thus need to be initialized on the target platform. It looks like there are more classes like these that need to be runtime initialized as they transitively rely on platform dependent values. Closes Karm/mandrel-integration-tests#236 Supersedes quarkusio#37628
FTR The stack trace in Quarkus leading to the failure is:
Which is the result of
|
Is it possible that If I remove https://github.com/oracle/graal/blob/4dd8a9d611b78668bb84116cf7ff8cc7e4146907/substratevm/src/com.oracle.svm.core/src/com/oracle/svm/core/jfr/JfrFeature.java#L156 , then everything seems to work fine (I get the expected "hello" response). |
Yes, although it's still not clear to me why. Thanks for confirming this with a modified GraalVM version. |
In quarkusio#37347 we moved `PlatformDependent` and `PlatformDependent0` classes to run-time-initialization as they are platform-dependent and thus need to be initialized on the target platform. It looks like there are more classes like these that need to be runtime initialized as they transitively rely on platform dependent values. Closes Karm/mandrel-integration-tests#236 Supersedes quarkusio#37628
Closed by quarkusio/quarkus#37633 I will create a new issue for the discrepancies when using |
In quarkusio#37347 we moved `PlatformDependent` and `PlatformDependent0` classes to run-time-initialization as they are platform-dependent and thus need to be initialized on the target platform. It looks like there are more classes like these that need to be runtime initialized as they transitively rely on platform dependent values. Closes Karm/mandrel-integration-tests#236 Supersedes quarkusio#37628 (cherry picked from commit 72d7b79)
How to reproduce
Failure:
Additional info
The regression is not caused by a GraalVM change, building with Mandrel 23.1.1.0 also doesn't work.
Setting
quarkus.version
to3.6.0
makes the test fail in a different stage (possibly a local issue?)The text was updated successfully, but these errors were encountered: