Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket client fails to connect when using native transport #41082

Open
mkorman9 opened this issue Jun 9, 2024 · 12 comments
Open

Websocket client fails to connect when using native transport #41082

mkorman9 opened this issue Jun 9, 2024 · 12 comments

Comments

@mkorman9
Copy link

mkorman9 commented Jun 9, 2024

Describe the bug

quarkus-websockets client is unable to connect to the server endpoint when running tests with quarkus.vertx.prefer-native-transport set to true. java.io.InterruptedIOException is thrown by the connectToServer method and the suppressed exception is

java.util.concurrent.ExecutionException: java.lang.IllegalStateException: incompatible event loop type: io.netty.channel.epoll.EpollEventLoop
        at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
        at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2096)
        at io.undertow.websockets.ServerWebSocketContainer.connectToServerInternal(ServerWebSocketContainer.java:445)
        at io.undertow.websockets.ServerWebSocketContainer.connectToServerInternal(ServerWebSocketContainer.java:411)
        at io.undertow.websockets.ServerWebSocketContainer.connectToServer(ServerWebSocketContainer.java:255)

Issue is only visible when native transport is enabled, so netty-transport-native-epoll dependency with matching platform should be present on classpath.

Expected behavior

Websockets client should be able to successfully connect to the server endpoint when running in tests

Actual behavior

connectToServer fails with java.io.InterruptedIOException

How to Reproduce?

  1. Prepare the environment running x86_64 Linux with any supported Java version
  2. Clone the minimal example that reproduces the issue - https://github.com/mkorman9/quarkus-websockets-bug
  3. Run ./gradlew build
  4. Tests should fail with:
WebsocketTest > test() FAILED
    java.lang.RuntimeException at WebsocketTest.java:24
        Caused by: java.io.InterruptedIOException at WebsocketTest.java:21
  1. Changing quarkus.vertx.prefer-native-transport to false in application.properties fixes the issue

Github Action reproducing the issue jas been already prepared - https://github.com/mkorman9/quarkus-websockets-bug/actions/runs/9436762644/job/25991803842

Output of uname -a or ver

Linux fv-az659-258 6.5.0-1021-azure #22~22.04.1-Ubuntu SMP Tue Apr 30 16:08:18 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

openjdk version "21.0.3" 2024-04-16 LTS OpenJDK Runtime Environment Temurin-21.0.3+9 (build 21.0.3+9-LTS) OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (build 21.0.3+9-LTS, mixed mode, sharing)

Quarkus version or git rev

3.11.0

Build tool (ie. output of mvnw --version or gradlew --version)

------------------------------------------------------------ Gradle 8.6 ------------------------------------------------------------ Build time: 2024-02-02 16:47:16 UTC Revision: d55c486870a0dc6f6278f53d21381396d0741c6e Kotlin: 1.9.20 Groovy: 3.0.17 Ant: Apache Ant(TM) version 1.10.13 compiled on January 4 2023 JVM: 21.0.3 (Eclipse Adoptium 21.0.3+9-LTS) OS: Linux 6.5.0-1021-azure amd64

Additional information

No response

@mkorman9 mkorman9 added the kind/bug Something isn't working label Jun 9, 2024
Copy link

quarkus-bot bot commented Jun 9, 2024

/cc @geoand (kotlin), @zakkak (native-image)

@geoand
Copy link
Contributor

geoand commented Jun 9, 2024

Weird issue.

@cescoffier any idea what could be causing it?

Furthermore, any chance you can test this @mkorman9 with the new websockets-next?

@cescoffier
Copy link
Member

cescoffier commented Jun 9, 2024

Native transports are not supported in native.
See https://quarkus.io/guides/vertx-reference#native-transport.

@mkorman9
Copy link
Author

mkorman9 commented Jun 9, 2024

@geoand websockets-next seems to work fine -> https://github.com/mkorman9/quarkus-websockets-bug/tree/next
build -> https://github.com/mkorman9/quarkus-websockets-bug/actions/runs/9438594374/job/25995742353

so if it's a bug it's most likely somewhere in the "old" extension.

@cescoffier I haven't tried a native build, my example is for JVM only. I guess the bot added the native-image tag by mistake.

@geoand
Copy link
Contributor

geoand commented Jun 9, 2024

Thanks for checking

@cescoffier
Copy link
Member

So, when enabling native transport, it replaces the event loop implementation. Somewhere in the undertow extension, this possibility has not been anticipated. I need to check in details if we can remove the downcast.

Websocket-next being on top of Vertx (without the undertow layer) does not have the problem, as Vertx is perfectly aware of the event loop implementation replacement.

@cescoffier
Copy link
Member

I can't reproduce it on Mac (switched to KQueue instead of Epoll).
@geoand can you try the reproducer on Linux and find the exact place where the downcast is done?

@geoand
Copy link
Contributor

geoand commented Jun 10, 2024

Sure thing

@geoand
Copy link
Contributor

geoand commented Jun 10, 2024

The problem is that io.undertow.websockets.WebsocketConnectionBuilder is passed the EventLoopGroup configured from WebsocketCoreRecorder but it's hardcoded to use .channel(NioSocketChannel.class) (in line 186).

So the easy solution would be to change WebsocketConnectionBuilder to always use a NioEventLoop

@geoand
Copy link
Contributor

geoand commented Jun 10, 2024

I can take a look at a proper fix later on today

geoand added a commit to quarkusio/quarkus-http that referenced this issue Jun 10, 2024
This is needed because WebsocketConnectionBuilder might be
configured with a non NioEventLoopGroup

Relates to: quarkusio/quarkus#41082
@geoand
Copy link
Contributor

geoand commented Jun 10, 2024

quarkusio/quarkus-http#155 is a workaround. A proper solution would mean passing the Channel class all around so it would have to break some public class and I am not sure if that is acceptable.

@mkouba
Copy link
Contributor

mkouba commented Sep 27, 2024

The workaround proposed in quarkusio/quarkus-http#155 was merged. What are the next steps?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants