Revert "Disable MutinyTest#testSSE for now as it's unstable on CI" #11727

jaikiran · 2020-08-29T04:37:45Z

In #11693 we disabled the MutinyTest#testSSE due to CI issues. Now that CI has stabilized, re-enabling this to see if there are any genuine issues.

Fixes #11687

cescoffier · 2020-08-31T11:18:55Z

@jaikiran Thanks!

So, it's a bit more complicated :-)

In this test, the items are emitted on 2 threads which lead to a kind of race condition in how the items are written in RestEasy. It passes if we removed the emitOn (which was superfluous anyway).

So you want me to open a PR against your PR, or reopen a fresh one?

gsmet · 2020-08-31T11:21:43Z

@cescoffier just push the fix here if possible

cescoffier · 2020-08-31T11:54:47Z

@gsmet @jaikiran Done.

cescoffier · 2020-08-31T13:36:06Z

and who forgot to run the formatter before committing? I give you and hint... He is currently writing this comment.

jaikiran · 2020-08-31T13:37:04Z

I give you and hint... He is currently writing this comment.

Don't worry, I fixed it and pushed the change a few seconds back :)

gsmet · 2020-08-31T15:29:59Z

The JDK 14 build failed with an error in the very same test.

cescoffier · 2020-08-31T17:48:09Z

Hum... so it's not the reason... Something else is going on.

cescoffier · 2020-08-31T18:34:35Z

working on a faster reproducer (https://github.com/cescoffier/quarkus/runs/1052595409?check_suite_focus=true).... obviously it passes when you disable the other tests.

jaikiran · 2020-09-01T01:05:51Z

I haven't been able to get too much out of my investigations last weekend. But when I did check, what I noticed was when this test fails, the HTTP request that's made to /pets get issued 120 times (instead of once) within a span of that 1 minute timeout. The request does make it all the way to the endpoint (where I had added a print statement). This 120 request in 1 minute indicates a reconnect being issued (because the default reconnect timeout is 500ms in that SSE event source implementation of resteasy https://github.com/resteasy/Resteasy/blob/4.5.6.Final/resteasy-client/src/main/java/org/jboss/resteasy/plugins/providers/sse/client/SseEventSourceImpl.java#L60). I haven't been able to understand or figure out why the reconnect is being issued. The moment I add any kind of DEBUG or TRACE logging, I can no longer reproduce it, which is a sign that there might be some kind of race involved.

jaikiran · 2020-09-01T01:47:45Z

Given that resteasy seems to be involved, I checked out resteasy upgrades in Quarkus. We upgraded from 4.5.5.Final to 4.5.6.Final on July 22nd. Comparing the commit differences in these 2 releases of resteasy[1], I don't see anything significant. There's one commit which does touch the "HTTP layer", but I can't yet see how that might impact SSEs.

[1] resteasy/resteasy@4.5.5.Final...4.5.6.Final

cescoffier · 2020-09-01T06:01:38Z

I didn't count, but I confirm the behavior seen by @jaikiran 👍

2020-09-01T05:49:49.2528935Z Accepting...
2020-09-01T05:49:49.2530371Z Server Side - Emitting neo
2020-09-01T05:49:49.2530856Z Server Side - Emitting neo
2020-09-01T05:49:49.2531263Z Server Side - Emitting neo
2020-09-01T05:49:49.2531662Z Server Side - Emitting neo
2020-09-01T05:49:49.2532056Z Server Side - Emitting neo
2020-09-01T05:49:49.2532450Z Server Side - Emitting neo
2020-09-01T05:49:49.2533018Z Server Side - Emitting neo
2020-09-01T05:49:49.2533361Z Server Side - Emitting neo
...

So, the client is re-connecting to the stream which restarts from the beginning (it's a cold stream so it's the expected behavior).

The items are emitted, but the client does not get the item, probably due to the reconnection.

cescoffier · 2020-09-01T07:55:20Z

@FroMage by any chance do you know what could trigger a reconnection on the client-side of the SSE? The client does not receive any items and does not get a failure, it just reconnects.

FroMage · 2020-09-01T08:04:16Z

It only reconnects if it gets an IO exception on the connection, in theory

FroMage · 2020-09-01T08:05:09Z

But SSE mandates reconnections, so if you have a cold stream it's not clear that this is your issue. What precisely is sent by the server and what does the client receive and expect?

cescoffier · 2020-09-01T08:11:33Z

The client sends a few items, but generally, only send the first one and immediately get a reconnect, restarting the stream from the beginning.

From my understand:

the client connect to the SSE
the server endpoint is called, and get the stream
it emits the first items from the stream
something bad happens (not sure what) as the client does not get the item and reconnect.
go back to 1 until timeout

FroMage · 2020-09-01T08:19:05Z

OK, well in that case we have to know what is the "something bad happens".

jaikiran · 2020-09-01T08:24:51Z

Looking at this code https://github.com/resteasy/Resteasy/blob/4.5.6.Final/resteasy-client/src/main/java/org/jboss/resteasy/plugins/providers/sse/client/SseEventSourceImpl.java#L333, it may not necessarily be an IOException which triggers this reconnect. It looks like, if for whatever reason it fails to deserialize the SseEventInputImpl out of the response, it can go into a loop[1] where it tries to reconnect and then read the response over and over again.
This would have been so much easier if one of us could reproduce this locally. The whole GitHub actions based tests just takes way too long (even the minimal one) to even think of trying certain things.

[1] https://github.com/resteasy/Resteasy/blob/4.5.6.Final/resteasy-client/src/main/java/org/jboss/resteasy/plugins/providers/sse/client/SseEventSourceImpl.java#L376

FroMage · 2020-09-01T08:27:37Z

Wouldn't you get onUnrecoverableError called, though? Unless it deserialises to null but that's weird.

cescoffier · 2020-09-01T09:29:19Z

Trying to get a bit more details. I was able to use my own implementation of SseEventSourceImpl with more logs in it.

cescoffier · 2020-09-01T10:28:56Z

@jaikiran @FroMage So, here is the reason of the reconnect:

2020-09-01T10:24:58.5985523Z MySSE - reconnecting because input is null or closed: org.jboss.resteasy.plugins.providers.sse.SseEventInputImpl@1c4331b8

The input is clearly not null, so it's not opened.

jaikiran · 2020-09-01T10:57:26Z

The input is clearly not null, so it's not opened.

The only way I can see that happening is SseEventInputImpl#read() was called at least once on that SseEventInputImpl instance and for some reason failed to parse the input stream? Otherwise I don't see how the SseEventInputImpl#isClosed can ever be true. I can see that the SseEventInputImpl#read() unfortuantely doesn't log any messages on IOException or such so it becomes more difficult to try and narrow this down.

cescoffier · 2020-09-01T11:37:45Z

Trying to substitution the ServerSentEventImpl with my own implementation to log more.

jaikiran · 2020-09-01T11:41:42Z

(slightly OT - long term, I think it's time to see how we can ease the integration of Byteman https://byteman.jboss.org/ into all these different modes of Quarkus launched apps/tests)

kenfinnigan · 2020-09-01T15:38:00Z

Did anyone try running with the previous RESTEasy version, if only to eliminate the upgrade as a problem?

cescoffier · 2020-09-01T15:44:09Z

Continuing to make slow progress:

Something is wrong when this is happening:

InboundSseEvent event = eventInput.read();

It's closing the SSE because the read chunk is null:

chunk = readEvent(inputStream);

readEvent returns null.

So something is wrong in the content sent by the server.

FroMage · 2020-09-01T15:50:28Z

This goes via SseEventProvider which implements MessageBodyReader<SseEventInputImpl>, and calls

   @Override
   public SseEventInputImpl readFrom(Class<SseEventInputImpl> cls, Type type, Annotation[] annotations,
         MediaType mediaType, MultivaluedMap<String, String> httpHeaders, InputStream entityStream) throws IOException,
         WebApplicationException
   {
      MediaType streamType = mediaType;
      if (mediaType.getParameters() != null)
      {
         Map<String, String> map = mediaType.getParameters();
         String elementType = map.get(SseConstants.SSE_ELEMENT_MEDIA_TYPE);
         if (elementType != null)
         {
            mediaType = MediaType.valueOf(elementType);
         }
      }
      return new SseEventInputImpl(annotations, streamType, mediaType, httpHeaders, entityStream);
   }

Which doesn't do anything:

   public SseEventInputImpl(final Annotation[] annotations, final MediaType streamType, final MediaType elementType,
                            final MultivaluedMap<String, String> httpHeaders, final InputStream inputStream)
   {
      this.annotations = annotations;
      this.mediaType = elementType;
      this.httpHeaders = httpHeaders;
      this.inputStream = inputStream;
      this.textLike = MediaTypeHelper.isTextLike(streamType);
      this.escape = streamType != null && streamType.toString().startsWith("application/x-stream-general");
   }

But there are these methods, that I presume are called:

   public InboundSseEvent read() throws IOException
   {
      byte[] chunk = null;
      try
      {
         lastFieldWasData = false;
         chunk = readEvent(inputStream);
         if (chunk == null)
         {
            close();
            return null;
         }
      }
      catch (IOException e1)
      {
         try
         {
            close();

         }
         catch (IOException e)
         {
            //TODO: add a log message
         }
         throw e1;

      }

      final ByteArrayInputStream entityStream = new ByteArrayInputStream(chunk);
      final ByteArrayOutputStream temSave = new ByteArrayOutputStream();
      Charset charset = StandardCharsets.UTF_8;
      if (mediaType != null && mediaType.getParameters().get(MediaType.CHARSET_PARAMETER) != null)
      {
         charset = Charset.forName(mediaType.getParameters().get(MediaType.CHARSET_PARAMETER));
      }

      final InboundSseEventImpl.Builder eventBuilder = new InboundSseEventImpl.Builder(annotations, mediaType,
            httpHeaders);
      //TODO: Look at if this can be improved
      int b = -1;
      SseConstants.EVENT currentState = SseConstants.EVENT.START;
      while ((b = entityStream.read()) != -1)
      {
         try
         {
            if (currentState == SseConstants.EVENT.START)
            {
               if (b == '\r' || b == '\n')
               {
                  continue;
               }

               if (b == ':')
               {
                  currentState = SseConstants.EVENT.COMMENT;
                  continue;
               }
               else
               {
                  temSave.write(b);
                  currentState = SseConstants.EVENT.FIELD;
                  continue;
               }
            }
            if (currentState == SseConstants.EVENT.COMMENT)
            {

               b = readLine(entityStream, '\n', temSave);
               String commentLine = temSave.toString(charset.name());
               eventBuilder.commentLine(commentLine);
               temSave.reset();
               currentState = SseConstants.EVENT.START;
               continue;
            }
            if (currentState == SseConstants.EVENT.FIELD)
            {
               temSave.write(b);
               b = readLine(entityStream, ':', temSave);
               String fieldName = temSave.toString(StandardCharsets.UTF_8.name());
               temSave.reset();
               if (b == ':')
               {
                  //spec says there is space after colon
                  do
                  {
                     b = entityStream.read();
                  }
                  while (b == ' ');

                  if (b != '\n' && b != '\r' && b != -1)
                  {
                     temSave.write(b);
                     b = readLine(entityStream, '\n', temSave);
                  }
               }

               processField(eventBuilder, fieldName, mediaType, temSave.toByteArray());
               temSave.reset();
               currentState = SseConstants.EVENT.START;
               continue;
            }
         }
         catch (IOException e)
         {
            throw new IOException(Messages.MESSAGES.readEventException(), e);
         }
      }
      InboundSseEventImpl event = (InboundSseEventImpl) eventBuilder.build();
      if (this.providers != null)
      {
         event.setProvider(this.providers);
      }
      return event;
   }

Which I'm not sure how to make sense of, since it tries to call readEvent(inputStream) first which does all sorts of weird things, and can return null:

   public byte[] readEvent(final InputStream in) throws IOException
   {
      @SuppressWarnings("resource")
      EventByteArrayOutputStream buffer = new EventByteArrayOutputStream();
      int data;
      int pos = 0;
      boolean boundary = false;
      byte[] eolBuffer = new byte[5];
      while ((data = in.read()) != -1)
      {
         byte b = (byte) data;
         if (!textLike && b == '\\')
         {
            buffer.write(b);
            b = (byte) in.read();
         }
         else
         {
            if (b == '\r' || b == '\n')
            {
               eolBuffer[pos] = b;
               //if it meets \r\r , \n\n , \r\n\r\n or \n\r\n\r\n
               if ((pos > 0 && eolBuffer[pos] == eolBuffer[pos - 1])
                     || (pos >= 3 && new String(eolBuffer, 0, pos, StandardCharsets.UTF_8).contains(DELIMITER)))
               {
                  boundary = true;
               }
               //take it a boundary if there are 5 unexpected eols
               if (pos++ > 4)
               {
                  boundary = true;
               }
            }
            else
            {
               pos = 0;
            }
         }
         buffer.write(b);
         if (boundary && buffer.size() > pos)
         {
            return buffer.getEventPayLoad();
         }
         //if it's emtpy
         if (boundary && buffer.size() == pos)
         {
            pos = 0;
            boundary = false;
            buffer.reset();
            continue;
         }
      }
      return null;
   }

So yeah, that's where you'll have to look at the bytes being looked at.

FroMage · 2020-09-01T15:51:45Z

But this class doesn't appear to have changed at all recently.

cescoffier · 2020-09-01T15:53:27Z

@FroMage exactly, it's where I'm looking at the moment.
While we see the cause of reconnection on the client-side, the issue might on the server-side. It may send something corrupted. Instrumenting the server-side might be more tricky...

FroMage · 2020-09-02T08:18:15Z

I see a client in https://github.com/quarkusio/quarkus/blob/5ca9f4ac8f80a98ffc724da0f3e13d35f9833805/integration-tests/resteasy-mutiny/src/main/java/io/quarkus/it/resteasy/mutiny/MutinyResource.java

It's not in your call directly, but is there any chance it's running as part of the test for some reason? What if you disable these endpoints and the test that calls them?

I'd like to rule out the accidental use of a client context.

cescoffier · 2020-09-02T08:20:15Z

Let me try that right now.

FroMage · 2020-09-02T08:21:08Z

Also, print ResteasyProviderFactory.getInstance() in your original request and in each item you send over SSE along when you print the current thread. They should always be the same.

It looks like the only way to get a NOOPServerHelper is via the client context. Now, I don't understand how that can happen, but at least I want to verify this is what's happening.

cescoffier · 2020-09-02T08:50:39Z

@FroMage Run in progress: https://github.com/cescoffier/quarkus/actions/runs/235753180

cescoffier · 2020-09-02T09:17:10Z

@FroMage same failure:

/pets ResteasyProviderFactory = io.quarkus.restclient.runtime.RestClientRecorder$2@5063b38c
2020-09-02T09:09:23.2857412Z Sending double EOL
2020-09-02T09:09:23.2937794Z Server Side - Emitting neo, ResteasyProviderFactory = io.quarkus.restclient.runtime.RestClientRecorder$2@5063b38c
2020-09-02T09:09:23.2858429Z 	 Read 10
2020-09-02T09:09:23.2858831Z 	 Read 10
2020-09-02T09:09:23.2934889Z java.lang.UnsupportedOperationException
2020-09-02T09:09:23.2935975Z 	at org.jboss.resteasy.core.providerfactory.NOOPServerHelper.getMessageBodyWriters(NOOPServerHelper.java:118)
2020-09-02T09:09:23.2936931Z 	at org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl.getServerMessageBodyWriters(ResteasyProviderFactoryImpl.java:298)
2020-09-02T09:09:23.2938652Z 	at org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl.getMessageBodyWriter(ResteasyProviderFactoryImpl.java:1252)
2020-09-02T09:09:23.2939059Z send org.jboss.resteasy.plugins.providers.sse.OutboundSseEventImpl@607a0942
2020-09-02T09:09:23.2939850Z 	at org.jboss.resteasy.plugins.providers.sse.SseEventProvider.writeTo(SseEventProvider.java:111)
2020-09-02T09:09:23.2940020Z writeEvent org.jboss.resteasy.plugins.providers.sse.OutboundSseEventImpl@607a0942
2020-09-02T09:09:23.2940779Z 	at org.jboss.resteasy.plugins.providers.sse.SseEventProvider.writeTo(SseEventProvider.java:33)
2020-09-02T09:09:23.2940934Z Synchronous Exception while sending event
2020-09-02T09:09:23.2941259Z 	at org.jboss.resteasy.plugins.providers.sse.SseEventOutputImpl.writeEvent(SseEventOutputImpl.java:308)

FroMage · 2020-09-02T09:35:59Z

OK, we found the culprits then: f65ee2b added:

        ResteasyProviderFactory.setInstance(clientProviderFactory);

Which just breaks all server calls that expect this to be a server client factory.

FroMage · 2020-09-02T09:36:28Z

That's NASTY.

FroMage · 2020-09-02T09:37:46Z

BTW I think it's pretty nuts to have two ResteasyProviderFactory (client and server) but if we MUST have two, they should be resolved from the ResteasyContext rather than hanging around as statics.

Good luck fixing that, BTW, it won't be trivial.

jaikiran · 2020-09-02T09:47:48Z

OK, we found the culprits then: f65ee2b added:
        ResteasyProviderFactory.setInstance(clientProviderFactory);
Which just breaks all server calls that expect this to be a server client factory.

Hello @FroMage, does that also explain why this test passes sometimes but fails on other occasions? Or are we still missing any other links?

cescoffier · 2020-09-02T09:48:41Z

@radcortez @kenfinnigan How can we remove that call?

FroMage · 2020-09-02T09:51:35Z

How can it explain that it sometimes fails. I guess it could depend on when you end up executing that client recorder stuff. It's not clear to me, honestly. But it's still wrong.

gsmet · 2020-09-02T09:56:18Z

Looks like I was a visionary: #11448 (review) :)

gsmet · 2020-09-02T09:57:21Z

If it's a problem I have nothing against reverting this PR until we can get a proper fix.

FroMage · 2020-09-02T10:02:42Z

Looks like I was a visionary: #11448 (review) :)

Haha, well spotted, you should have called their bluff though ;)

jaikiran · 2020-09-02T10:07:17Z

@cescoffier, I now have something that might shed some light on why this is inconsistent. I just need your setup where this is happening consistently. I plan to push a certain change there. Is that OK? I will make sure I copy over your current branch locally before messing up that branch and pushign my commits.

cescoffier · 2020-09-02T10:17:40Z

@jaikiran go ahead. I'm not going to work on that for the rest of the day. Feel free to remove my hundreds of traces.

jaikiran · 2020-09-02T12:02:20Z

So, I managed to get Byteman integrated (in a hackish way for now) against this test to see what's going on. @FroMage had anyway already solved the mystery of the root cause, so it helped me come up with specific Byteman scripts[1] to figure why this fails only on certain occasions. Here's what it shows up.

When this test succeeds, the io.quarkus.restclient.runtime.RestClientRecorder is the one which first sets the org.jboss.resteasy.spi.ResteasyProviderFactory to an instance that is incorrect/unusable for our purposes. This however is then overwritten by io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder to a correct usable instance. Here's what the call path (in that order) looks like for a passing case:

2020-09-02T11:41:46.8131751Z Wed Sep 02 11:41:46 UTC 2020[Thread[main,5,main]]  provider factory is io.quarkus.restclient.runtime.RestClientRecorder$2@48c5e6fb
2020-09-02T11:41:46.8134811Z Stack trace for thread main
2020-09-02T11:41:46.8141141Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:46.8148741Z io.quarkus.restclient.runtime.RestClientRecorder.initializeResteasyProviderFactory(RestClientRecorder.java:72)
2020-09-02T11:41:46.8160829Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy_0(RestClientProcessor$registerProviders350329842.zig:506)
2020-09-02T11:41:46.8171859Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy(RestClientProcessor$registerProviders350329842.zig:40)
2020-09-02T11:41:46.8177626Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:202)

....

2020-09-02T11:41:46.8714811Z Rule.execute called for ResteasyProviderFactory#setInstance_6:1
2020-09-02T11:41:46.8714955Z ResteasyProviderFactory#setInstance execute
2020-09-02T11:41:46.8715123Z Wed Sep 02 11:41:46 UTC 2020[Thread[main,5,main]]  provider factory is org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl@232fcbb1
2020-09-02T11:41:46.8715284Z Stack trace for thread main
2020-09-02T11:41:46.8715431Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:46.8715594Z org.jboss.resteasy.core.ResteasyDeploymentImpl.initializeFactory(ResteasyDeploymentImpl.java:373)
2020-09-02T11:41:46.8715755Z org.jboss.resteasy.core.ResteasyDeploymentImpl.startInternal(ResteasyDeploymentImpl.java:129)
2020-09-02T11:41:46.8715912Z org.jboss.resteasy.core.ResteasyDeploymentImpl.start(ResteasyDeploymentImpl.java:118)
2020-09-02T11:41:46.8810983Z io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder.staticInit(ResteasyStandaloneRecorder.java:71)
2020-09-02T11:41:46.8815366Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy_0(ResteasyStandaloneBuildStep$staticInit-210558872.zig:811)
2020-09-02T11:41:46.8819202Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy(ResteasyStandaloneBuildStep$staticInit-210558872.zig:40)
2020-09-02T11:41:46.8821692Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:213)

On the other hand, when the test fails, the order of these recorders is reversed. So the io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder first sets the org.jboss.resteasy.spi.ResteasyProviderFactory instance to an usable one, only to be overwritten to an incorrect/unusable one by the io.quarkus.restclient.runtime.RestClientRecorder. Here's what the call path looks like (in that order) in that case:

2020-09-02T11:41:35.9744918Z ResteasyProviderFactory#setInstance execute
2020-09-02T11:41:35.9746213Z Wed Sep 02 11:41:35 UTC 2020[Thread[main,5,main]]  provider factory is org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl@4469a74d
2020-09-02T11:41:35.9811159Z Stack trace for thread main
2020-09-02T11:41:35.9812026Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:35.9813417Z org.jboss.resteasy.core.ResteasyDeploymentImpl.initializeFactory(ResteasyDeploymentImpl.java:373)
2020-09-02T11:41:35.9814759Z org.jboss.resteasy.core.ResteasyDeploymentImpl.startInternal(ResteasyDeploymentImpl.java:129)
2020-09-02T11:41:35.9815933Z org.jboss.resteasy.core.ResteasyDeploymentImpl.start(ResteasyDeploymentImpl.java:118)
2020-09-02T11:41:35.9817571Z io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder.staticInit(ResteasyStandaloneRecorder.java:71)
2020-09-02T11:41:35.9820429Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy_0(ResteasyStandaloneBuildStep$staticInit-210558872.zig:811)
2020-09-02T11:41:35.9821829Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy(ResteasyStandaloneBuildStep$staticInit-210558872.zig:40)
2020-09-02T11:41:35.9822303Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:202)

.....

2020-09-02T11:41:36.6974037Z ResteasyProviderFactory#setInstance execute
2020-09-02T11:41:36.6975074Z Wed Sep 02 11:41:36 UTC 2020[Thread[main,5,main]]  provider factory is io.quarkus.restclient.runtime.RestClientRecorder$2@5a9529a
2020-09-02T11:41:36.6976080Z Stack trace for thread main
2020-09-02T11:41:36.6977075Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:36.6978114Z io.quarkus.restclient.runtime.RestClientRecorder.initializeResteasyProviderFactory(RestClientRecorder.java:72)
2020-09-02T11:41:36.6979154Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy_0(RestClientProcessor$registerProviders350329842.zig:506)
2020-09-02T11:41:36.6980196Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy(RestClientProcessor$registerProviders350329842.zig:40)
2020-09-02T11:41:36.6981210Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:213)

So the transient nature of this failure is due to non-determinstic execution ordering of the recorders.

[1] https://github.com/jaikiran/quarkus/commits/mutiny-sse-enable-byteman

P.S: I'll see if I can make this Byteman integration a bit more seamless and easier to have to usable by some kind of simple switch for solving issues like these.

This reverts commit fff6251.

jaikiran · 2020-09-03T10:02:28Z

Replaced by #11828

asoldano · 2020-09-08T00:04:27Z

@jaikiran @FroMage @cescoffier @kenfinnigan , sorry for the late feedback here, I've been on PTO and it took time to process the backlog once back. The exception above from the NOOPServerHelper is indeed caused by a ResteasyProviderFactory instance meant for client side usage being actually used on server side. And f65ee2b is really problematic, you should really be careful with ResteasyProviderFactory.setInstance(). Actually, ResteasyProviderFactory.getInstance() is risky too as it allows incurring in problems like this and unfortunately it's often used in resteasy as it's convenient...
In our case here, I believe RESTEasy should be fixed like this asoldano/Resteasy@3b7b8c4 , even if the invocation to setInstance() is still wrong (I'm sure there are other parts of the code where getInstance() should be avoided).

Having said this, I don't personally like the design of ResteasyProviderFactory that much (I've seen other failures because of lack of proper isolation of instances), but improving it is a hard task. I tried something back in 2019, but didn't manage to have enough time to focus on it and finish it (resteasy/resteasy#2240).
Related to this, I was not very happy to see the static instance set in https://github.com/resteasy/Resteasy/blob/master/resteasy-client-microprofile-base/src/main/java/org/jboss/resteasy/microprofile/client/RestClientBuilderImpl.java#L72 since the beginning, hence I would encourage looking for alternatives before adding the same in plain jaxrs client (resteasy/resteasy#2504)

FroMage · 2020-09-08T09:00:09Z

@asoldano asoldano/Resteasy@3b7b8c4 looks indeed like the right thing to do.

kenfinnigan · 2020-09-08T12:42:06Z

@asoldano There is no other alternative to setting the ResteasyProviderFactory statically on the REST Client because instantiation is controlled by service loading.

If there's another way to achieve what is needed, explicitly defining the provider factory to control what providers are loaded, without using a static method/field, then please do so. But I'd appreciate not blocking this approach without an alternative

FroMage · 2020-09-08T13:16:03Z

IMO the two issues are orthogonal. @asoldano 's fix is the proper fix for RESTEasy, because it ensures the server's providers are used by its SSE system.

With that fix, the order in which you set the ResteasyProviderFactory for the client will not impact SSE handling anymore. So it's the right fix, and I bet more getInstance() calls should be replaced, as he said, to avoid similar bugs in non-SSE systems.

The question of being able to configure ResteasyProviderFactory for client and server statically is another matter, and indeed, probably that static method is not the right thing to do, but we can fix that without reverting either this Quarkus workaround (ordering issue), and Alessio's RESTEasy bug fix.

asoldano · 2020-09-10T12:37:39Z

To follow up on this, Ron should have merged Ken's PR.
Speaking of my fix, I've merged it and I'm also changing / extending it a bit with this resteasy/resteasy#2509 (in case you want to try it, @FroMage ).

jaikiran mentioned this pull request Aug 29, 2020

Disable MutinyTest#testSSE for now as it's unstable on CI #11693

Merged

cescoffier mentioned this pull request Sep 1, 2020

ReactiveMongodbPanacheResourceTest is regularly failing #11711

Closed

cescoffier mentioned this pull request Sep 2, 2020

MutinyTest#testSSE can be unstable #11687

Closed

gsmet modified the milestone: 1.8.0.Final Sep 2, 2020

jaikiran and others added 2 commits September 2, 2020 17:41

Revert "Disable MutinyTest#testSSE for now as it's unstable on CI"

bf23aa1

This reverts commit fff6251.

Remove the multi-threaded emission in the SSE test

7f85d51

jaikiran closed this Sep 3, 2020

gsmet added the triage/invalid This doesn't seem right label Sep 8, 2020

Revert "Disable MutinyTest#testSSE for now as it's unstable on CI" #11727

Revert "Disable MutinyTest#testSSE for now as it's unstable on CI" #11727

Conversation

jaikiran commented Aug 29, 2020 • edited by gsmet Loading

cescoffier commented Aug 31, 2020

gsmet commented Aug 31, 2020

cescoffier commented Aug 31, 2020

cescoffier commented Aug 31, 2020

jaikiran commented Aug 31, 2020

gsmet commented Aug 31, 2020

cescoffier commented Aug 31, 2020

cescoffier commented Aug 31, 2020

jaikiran commented Sep 1, 2020 • edited Loading

jaikiran commented Sep 1, 2020

cescoffier commented Sep 1, 2020

cescoffier commented Sep 1, 2020

FroMage commented Sep 1, 2020

FroMage commented Sep 1, 2020

cescoffier commented Sep 1, 2020

FroMage commented Sep 1, 2020

jaikiran commented Sep 1, 2020

FroMage commented Sep 1, 2020

cescoffier commented Sep 1, 2020

cescoffier commented Sep 1, 2020

jaikiran commented Sep 1, 2020

cescoffier commented Sep 1, 2020

jaikiran commented Sep 1, 2020

kenfinnigan commented Sep 1, 2020

cescoffier commented Sep 1, 2020

FroMage commented Sep 1, 2020

FroMage commented Sep 1, 2020

cescoffier commented Sep 1, 2020

FroMage commented Sep 2, 2020

cescoffier commented Sep 2, 2020

FroMage commented Sep 2, 2020

cescoffier commented Sep 2, 2020

cescoffier commented Sep 2, 2020

FroMage commented Sep 2, 2020

FroMage commented Sep 2, 2020

FroMage commented Sep 2, 2020

jaikiran commented Sep 2, 2020

cescoffier commented Sep 2, 2020

FroMage commented Sep 2, 2020

gsmet commented Sep 2, 2020

gsmet commented Sep 2, 2020

FroMage commented Sep 2, 2020

jaikiran commented Sep 2, 2020

cescoffier commented Sep 2, 2020

jaikiran commented Sep 2, 2020

jaikiran commented Sep 3, 2020

asoldano commented Sep 8, 2020

FroMage commented Sep 8, 2020

kenfinnigan commented Sep 8, 2020

FroMage commented Sep 8, 2020

asoldano commented Sep 10, 2020

jaikiran commented Aug 29, 2020 •

edited by gsmet

Loading

jaikiran commented Sep 1, 2020 •

edited

Loading