Replace read/write lock in JarResource to avoid virtual threads pinning #42139
private static JarFileReference asyncLoadAcquiredJarFile(JarResource jarResource) {
    CompletableFuture<JarFileReference> newJarRefFuture = new CompletableFuture<>();
    CompletableFuture<JarFileReference> existingJarRefFuture = null;
The introduction of this variable is the actual fix. The old algorithm wrongly reused the newJarFileRef one. This means that, in the unfortunate case where a thread was about to close the old jar but had not yet set the AtomicReference to null, another thread could fail the compareAndSet(null) check here, take the old and no longer valid CompletableFuture from the AtomicReference at the end of the loop, and in a subsequent iteration (after the close finally managed to set the AtomicReference to null) mistakenly set it back into the same AtomicReference.
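To make the interleaving above easier to follow, here is a minimal sketch of the pattern, not the actual Quarkus code. The names `JarRefSketch`, `Ref`, `slot`, `acquire`, `close` and `closeLater` are all hypothetical, and a plain `closed` flag stands in for whatever reference counting the real `JarFileReference` performs. The point it illustrates is only that the future read back from the `AtomicReference` is kept in its own variable and is never the one passed to `compareAndSet(null, ...)`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class JarRefSketch {

    /** Hypothetical stand-in for JarFileReference; a flag replaces real reference counting. */
    static final class Ref {
        final String jarName;
        volatile boolean closed;

        Ref(String jarName) {
            this.jarName = jarName;
        }
    }

    /** Plays the role of the AtomicReference held by the jar resource. */
    static final AtomicReference<CompletableFuture<Ref>> slot = new AtomicReference<>();

    static Ref acquire(String jarName) {
        CompletableFuture<Ref> newFuture = new CompletableFuture<>();
        // Separate variable for whatever is currently published. Storing that value
        // in newFuture instead would let a later compareAndSet(null, newFuture)
        // republish a stale future once the closing thread has nulled the slot.
        CompletableFuture<Ref> existingFuture;
        while (true) {
            if (slot.compareAndSet(null, newFuture)) {
                Ref fresh = new Ref(jarName); // stands in for opening the jar
                newFuture.complete(fresh);
                return fresh;
            }
            existingFuture = slot.get();
            if (existingFuture != null) {
                Ref existing = existingFuture.join();
                if (!existing.closed) {
                    return existing; // still valid, share it
                }
            }
            Thread.onSpinWait(); // stale or just-cleared slot: retry
        }
    }

    /** Invalidates the reference first, then clears the slot: the window the race needs. */
    static void close(Ref ref) {
        ref.closed = true;
        CompletableFuture<Ref> published = slot.get();
        if (published != null && published.isDone() && published.join() == ref) {
            slot.compareAndSet(published, null);
        }
    }

    /** Test helper: finishes closing the given reference on another thread after a delay. */
    static Thread closeLater(Ref ref, long delayMillis) {
        Thread closer = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            close(ref);
        });
        closer.start();
        return closer;
    }

    public static void main(String[] args) {
        Ref first = acquire("a.jar");
        first.closed = true;      // the closing thread is mid-close: slot not yet null
        closeLater(first, 20);    // ...and clears the slot a little later
        Ref second = acquire("a.jar");
        System.out.println("fresh reference installed: " + (second != first && !second.closed));
    }
}
```

In this sketch a thread that finds a closed reference simply retries until the closing thread has cleared the slot, and only then publishes its own, still uncompleted future.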
@famod can you check as well if it's fixed on your end, since you encountered that problem too? I will check it on our side, but it's always good to have a second confirmation.
Thanks for the fix @mariofusco! @jedla97 thanks for signing up to test the fix! Hopefully @famod can do so as well.
@mariofusco Big thanks for the fix, as I believe it was hard to find. I tested it (ran two reproducers multiple times) and the race condition didn't appear. So from my POV the original issue is fixed (hopefully we won't see any different 🙏 ).
It would be great to have a test that reproduces it too, alongside the changes that fix it.
Sorry, I am unable to test this ATM. On my work machine, I'm getting a puzzling dependency resolution issue while building; it's related to gradle-tooling.
I did a few more checks on this fix and also added a proper unit test. The unit test reproduces exactly the same situation that caused the
@geoand @franz1981 the pull request has already been rebased and squashed as usual, so it should be ready to be merged. Please let me know if this is enough or if you see any room for further improvements.
LGTM well done as usual Mario!
Thanks a lot!
These failures are caused by the fact that I compiled the minimal jars that I used for this test with Java 21, so they cannot be loaded with Java 17.
I will recompile those jars with Java 17 and replace them in this pull request immediately.
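For anyone hitting the same failure with precompiled test jars, the mismatch can be checked from the class file header. The sketch below is not part of this pull request, and `ClassFileVersion` and its methods are hypothetical names. It relies on the class file layout (4-byte magic `0xCAFEBABE`, then a 2-byte minor and a 2-byte major version) and on the known major versions 61 for Java 17 and 65 for Java 21.

```java
import java.nio.ByteBuffer;

final class ClassFileVersion {

    /** Returns the major version stored in bytes 6-7 of a class file. */
    static int majorVersion(byte[] classBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer header = ByteBuffer.wrap(classBytes);
        if (header.remaining() < 8 || header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        header.getShort(); // skip the minor version
        return header.getShort() & 0xFFFF;
    }

    /** Maps a major version to a Java release; holds for Java 5 (major 49) and later. */
    static int javaRelease(int major) {
        return major - 44;
    }

    /** True if a class with this major version can be loaded by the given Java release. */
    static boolean loadableOn(int classMajor, int runtimeRelease) {
        return javaRelease(classMajor) <= runtimeRelease;
    }

    public static void main(String[] args) {
        // Hand-built header of a class compiled for Java 21 (major version 65).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 65};
        int major = majorVersion(header);
        System.out.println("compiled for Java " + javaRelease(major)
                + ", loadable on Java 17: " + loadableOn(major, 17));
    }
}
```

Only the first 8 bytes of a `.class` entry are needed, so the check works the same on bytes read from an entry inside a jar.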
add test
format test
recompile test jars with jdk17
Done.
Status for workflow
Status | Name | Step | Failures | Logs | Raw logs | Build scan |
---|---|---|---|---|---|---|
✔️ | Native Tests - Messaging1 | | Failures | Logs | Raw logs | 🚧 |
Full information is available in the Build summary check run.
Failures
⚙️ Native Tests - Messaging1 #
📦 integration-tests/kafka-avro-apicurio2
✖ io.quarkus.it.kafka.KafkaAvroIT.testApicurioAvroConsumer
java.lang.RuntimeException:
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[error]: Build step io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor#startKafkaDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.io/vectorized/redpanda:v24.1.2, imagePullPolicy=DefaultPullPolicy(), imageNameSubstitutor=org.testcontainers.utility.ImageNameSubstitutor$LogWrappedImageNameSubstitutor@74c1204d)
at io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor.startKafkaDevService(DevServicesKafkaProcessor.java:105)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:854)
at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18...
✖ io.quarkus.it.kafka.KafkaAvroIT.testApicurioAvroProducer
java.lang.RuntimeException:
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[error]: Build step io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor#startKafkaDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.io/vectorized/redpanda:v24.1.2, imagePullPolicy=DefaultPullPolicy(), imageNameSubstitutor=org.testcontainers.utility.ImageNameSubstitutor$LogWrappedImageNameSubstitutor@66eb61f4)
at io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor.startKafkaDevService(DevServicesKafkaProcessor.java:105)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:854)
at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18...
✖ io.quarkus.it.kafka.KafkaAvroIT.testConfluentAvroConsumer
java.lang.RuntimeException:
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[error]: Build step io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor#startKafkaDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.io/vectorized/redpanda:v24.1.2, imagePullPolicy=DefaultPullPolicy(), imageNameSubstitutor=org.testcontainers.utility.ImageNameSubstitutor$LogWrappedImageNameSubstitutor@563e520)
at io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor.startKafkaDevService(DevServicesKafkaProcessor.java:105)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:854)
at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)...
✖ io.quarkus.it.kafka.KafkaAvroIT.testConfluentAvroProducer
java.lang.RuntimeException:
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[error]: Build step io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor#startKafkaDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.io/vectorized/redpanda:v24.1.2, imagePullPolicy=DefaultPullPolicy(), imageNameSubstitutor=org.testcontainers.utility.ImageNameSubstitutor$LogWrappedImageNameSubstitutor@171be7de)
at io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor.startKafkaDevService(DevServicesKafkaProcessor.java:105)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:854)
at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18...
✖ io.quarkus.it.kafka.KafkaAvroIT.testUrls
java.lang.RuntimeException:
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[error]: Build step io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor#startKafkaDevService threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.io/vectorized/redpanda:v24.1.2, imagePullPolicy=DefaultPullPolicy(), imageNameSubstitutor=org.testcontainers.utility.ImageNameSubstitutor$LogWrappedImageNameSubstitutor@1717cdcc)
at io.quarkus.kafka.client.deployment.DevServicesKafkaProcessor.startKafkaDevService(DevServicesKafkaProcessor.java:105)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:732)
at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:854)
at io.quarkus.builder.BuildContext.run(BuildContext.java:256)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18...
Flaky tests - Develocity
⚙️ JVM Tests - JDK 21
📦 integration-tests/opentelemetry
✖ io.quarkus.it.opentelemetry.OpenTelemetryInjectionsTest.testOTelInjections
Condition with Lambda expression in io.quarkus.it.opentelemetry.OpenTelemetryInjectionsTest was not fulfilled within 5 seconds.
-org.awaitility.core.ConditionTimeoutException
org.awaitility.core.ConditionTimeoutException: Condition with Lambda expression in io.quarkus.it.opentelemetry.OpenTelemetryInjectionsTest was not fulfilled within 5 seconds.
at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
at org.awaitility.core.CallableCondition.await(CallableCondition.java:78)
at org.awaitility.core.CallableCondition.await(CallableCondition.java:26)
at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:1006)
at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:975)
at io.quarkus.it.opentelemetry.OpenTelemetryInjectionsTest.reset(OpenTelemetryInjectionsTest.java:26)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
Is this ready to be merged? Since 3.13.0 is already out, there shouldn't be any reason to hold it, right?
Correct
Let's keep an eye out for weird failures as we did with the previous iteration.
Thank you for merging. Sure, please ping me as soon as you notice something wrong.
Will do! Thanks again for tackling this! If anything comes up, it will be by QE running their testsuite over the next few days. Otherwise if there is an issue, reports from users will start coming in when
This pull request fixes the race condition present in this commit, which had to be reverted because of it.
/cc @geoand @gsmet @franz1981 @jedla97 @michalvavrik
Fixes: #42067
The nature of the fix is described at #42139 (review)