
Revert "Disable MutinyTest#testSSE for now as it's unstable on CI" #11727

Closed · wants to merge 2 commits

Conversation

@jaikiran (Member) commented Aug 29, 2020

In #11693 we disabled MutinyTest#testSSE due to CI instability. Now that CI has stabilized, this re-enables it so we can see whether there are any genuine issues.

Fixes #11687

@cescoffier (Member)

@jaikiran Thanks!

So, it's a bit more complicated :-)

In this test, the items are emitted on 2 threads, which leads to a kind of race condition in how the items are written in RESTEasy. It passes if we remove the emitOn (which was superfluous anyway); see the sketch below.

So, do you want me to open a PR against your PR, or open a fresh one?
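
For illustration, a minimal Mutiny sketch of the kind of change described above (the item values are made up and the endpoint glue is omitted):

    import io.smallrye.mutiny.Multi;
    import io.smallrye.mutiny.infrastructure.Infrastructure;

    // Hypothetical before/after sketch of removing the superfluous emitOn.
    public class EmitOnSketch {

        // Before: emitOn moves item delivery to another executor, so items can reach
        // RESTEasy's SSE writer from two different threads and race with each other.
        public Multi<String> withEmitOn() {
            return Multi.createFrom().items("neo", "morpheus")
                    .emitOn(Infrastructure.getDefaultExecutor());
        }

        // After: without emitOn, items are delivered on a single thread and the
        // writes to the SSE output are naturally serialized.
        public Multi<String> withoutEmitOn() {
            return Multi.createFrom().items("neo", "morpheus");
        }
    }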

@gsmet (Member) commented Aug 31, 2020

@cescoffier just push the fix here if possible

@cescoffier (Member)

@gsmet @jaikiran Done.

@cescoffier (Member)

And who forgot to run the formatter before committing? I'll give you a hint... he is currently writing this comment.

@jaikiran (Member, Author)

I'll give you a hint... he is currently writing this comment.

Don't worry, I fixed it and pushed the change a few seconds back :)

@gsmet (Member) commented Aug 31, 2020

The JDK 14 build failed with an error in the very same test.

@cescoffier (Member)

Hmm... so that's not the reason... something else is going on.

@cescoffier (Member)

Working on a faster reproducer (https://github.com/cescoffier/quarkus/runs/1052595409?check_suite_focus=true)... obviously it passes when you disable the other tests.

@jaikiran (Member, Author) commented Sep 1, 2020

I haven't been able to get much out of my investigations last weekend. But when I did check, what I noticed was that when this test fails, the HTTP request made to /pets gets issued 120 times (instead of once) within the span of that 1 minute timeout. The request does make it all the way to the endpoint (where I had added a print statement). These 120 requests in 1 minute indicate that a reconnect is being issued (because the default reconnect timeout is 500 ms in that SSE event source implementation of RESTEasy: https://github.com/resteasy/Resteasy/blob/4.5.6.Final/resteasy-client/src/main/java/org/jboss/resteasy/plugins/providers/sse/client/SseEventSourceImpl.java#L60). I haven't been able to understand or figure out why the reconnect is being issued. The moment I add any kind of DEBUG or TRACE logging, I can no longer reproduce it, which is a sign that there might be some kind of race involved.
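
For context, a hedged sketch (the target URL and timings here are made up) of how that client-side reconnect delay is exposed by the standard JAX-RS SSE API that SseEventSourceImpl implements; the 500 ms mentioned above is what applies when nothing is configured:

    import java.util.concurrent.TimeUnit;

    import javax.ws.rs.client.Client;
    import javax.ws.rs.client.ClientBuilder;
    import javax.ws.rs.client.WebTarget;
    import javax.ws.rs.sse.SseEventSource;

    // Sketch: bump the reconnect delay so reconnects become easy to spot in logs.
    public class SseClientSketch {
        public static void main(String[] args) throws InterruptedException {
            Client client = ClientBuilder.newClient();
            WebTarget target = client.target("http://localhost:8081/pets"); // hypothetical URL
            try (SseEventSource source = SseEventSource.target(target)
                    .reconnectingEvery(5, TimeUnit.SECONDS)
                    .build()) {
                source.register(event -> System.out.println("received: " + event.readData()));
                source.open();
                Thread.sleep(10_000); // keep the demo alive long enough to observe events
            }
            client.close();
        }
    }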

@jaikiran (Member, Author) commented Sep 1, 2020

Given that RESTEasy seems to be involved, I checked the RESTEasy upgrades in Quarkus. We upgraded from 4.5.5.Final to 4.5.6.Final on July 22nd. Comparing the commit differences between these two releases of RESTEasy [1], I don't see anything significant. There's one commit which does touch the "HTTP layer", but I can't yet see how that might impact SSE.

[1] resteasy/resteasy@4.5.5.Final...4.5.6.Final

@cescoffier (Member)

I didn't count, but I confirm the behavior seen by @jaikiran 👍

2020-09-01T05:49:49.2528935Z Accepting...
2020-09-01T05:49:49.2530371Z Server Side - Emitting neo
2020-09-01T05:49:49.2530856Z Server Side - Emitting neo
2020-09-01T05:49:49.2531263Z Server Side - Emitting neo
2020-09-01T05:49:49.2531662Z Server Side - Emitting neo
2020-09-01T05:49:49.2532056Z Server Side - Emitting neo
2020-09-01T05:49:49.2532450Z Server Side - Emitting neo
2020-09-01T05:49:49.2533018Z Server Side - Emitting neo
2020-09-01T05:49:49.2533361Z Server Side - Emitting neo
...

So, the client is re-connecting to the stream, which then restarts from the beginning (it's a cold stream, so that's the expected behavior).

The items are emitted, but the client does not get them, probably because of the reconnection (a small cold-stream sketch follows below).
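
For reference, a minimal Mutiny sketch (illustrative only, item values made up) of the cold-stream behavior mentioned above: every new subscription, and therefore every client reconnect, restarts the stream from the first item:

    import io.smallrye.mutiny.Multi;

    // A cold stream: each subscriber gets the items from the beginning,
    // which is why every reconnect sees "neo" again.
    public class ColdStreamDemo {
        public static void main(String[] args) {
            Multi<String> pets = Multi.createFrom().items("neo", "morpheus", "trinity");

            pets.subscribe().with(item -> System.out.println("first subscriber:  " + item));
            pets.subscribe().with(item -> System.out.println("second subscriber: " + item)); // starts over at "neo"
        }
    }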

@cescoffier (Member)

@FroMage by any chance do you know what could trigger a reconnection on the client-side of the SSE? The client does not receive any items and does not get a failure, it just reconnects.

@FroMage (Member) commented Sep 1, 2020

It only reconnects if it gets an IO exception on the connection, in theory

@FroMage (Member) commented Sep 1, 2020

But SSE mandates reconnections, so if you have a cold stream it's not clear that this is your issue. What precisely is sent by the server and what does the client receive and expect?

@cescoffier (Member)

The server sends a few items but, generally, it only gets to send the first one before the client immediately reconnects, restarting the stream from the beginning.

From my understanding:

  1. the client connects to the SSE endpoint
  2. the server endpoint is called and gets the stream
  3. it emits the first items from the stream
  4. something bad happens (not sure what): the client does not get the item and reconnects
  5. go back to 1 until the timeout

@FroMage (Member) commented Sep 1, 2020

OK, well in that case we have to find out what that "something bad happens" actually is.

@jaikiran (Member, Author) commented Sep 1, 2020

Looking at this code, https://github.com/resteasy/Resteasy/blob/4.5.6.Final/resteasy-client/src/main/java/org/jboss/resteasy/plugins/providers/sse/client/SseEventSourceImpl.java#L333, it may not necessarily be an IOException which triggers this reconnect. It looks like, if for whatever reason it fails to deserialize the SseEventInputImpl out of the response, it can go into a loop [1] where it tries to reconnect and then read the response over and over again.
This would have been so much easier if one of us could reproduce it locally. The whole GitHub Actions based test run just takes way too long (even the minimal one) to even think of trying certain things.

[1] https://github.com/resteasy/Resteasy/blob/4.5.6.Final/resteasy-client/src/main/java/org/jboss/resteasy/plugins/providers/sse/client/SseEventSourceImpl.java#L376

@FroMage (Member) commented Sep 1, 2020

Wouldn't you get onUnrecoverableError called, though? Unless it deserialises to null but that's weird.

@cescoffier (Member)

Trying to get a bit more detail. I was able to use my own implementation of SseEventSourceImpl with more logging in it.

@cescoffier (Member)

@jaikiran @FroMage So, here is the reason for the reconnect:

2020-09-01T10:24:58.5985523Z MySSE - reconnecting because input is null or closed: org.jboss.resteasy.plugins.providers.sse.SseEventInputImpl@1c4331b8

The input is clearly not null, so the problem must be that it's not open.

@jaikiran (Member, Author) commented Sep 1, 2020

The input is clearly not null, so the problem must be that it's not open.

The only way I can see that happening is if SseEventInputImpl#read() was called at least once on that SseEventInputImpl instance and for some reason failed to parse the input stream. Otherwise I don't see how SseEventInputImpl#isClosed can ever be true. I can see that SseEventInputImpl#read() unfortunately doesn't log any message on IOException or the like, which makes it harder to narrow this down.

@cescoffier (Member)

Trying to substitute the ServerSentEventImpl with my own implementation to log more.

@jaikiran (Member, Author) commented Sep 1, 2020

(Slightly OT: long term, I think it's time to see how we can ease the integration of Byteman https://byteman.jboss.org/ into all these different modes of Quarkus-launched apps/tests.)

@kenfinnigan (Member)

Did anyone try running with the previous RESTEasy version, if only to eliminate the upgrade as a problem?

@cescoffier (Member)

Continuing to make slow progress:

Something is wrong when this is happening:

InboundSseEvent event = eventInput.read();

It's closing the SSE because the read chunk is null:

chunk = readEvent(inputStream);

readEvent returns null.

So something is wrong in the content sent by the server.

@FroMage (Member) commented Sep 1, 2020

This goes via SseEventProvider which implements MessageBodyReader<SseEventInputImpl>, and calls

   @Override
   public SseEventInputImpl readFrom(Class<SseEventInputImpl> cls, Type type, Annotation[] annotations,
         MediaType mediaType, MultivaluedMap<String, String> httpHeaders, InputStream entityStream) throws IOException,
         WebApplicationException
   {
      MediaType streamType = mediaType;
      if (mediaType.getParameters() != null)
      {
         Map<String, String> map = mediaType.getParameters();
         String elementType = map.get(SseConstants.SSE_ELEMENT_MEDIA_TYPE);
         if (elementType != null)
         {
            mediaType = MediaType.valueOf(elementType);
         }
      }
      return new SseEventInputImpl(annotations, streamType, mediaType, httpHeaders, entityStream);
   }

Which doesn't do much beyond storing the arguments:

   public SseEventInputImpl(final Annotation[] annotations, final MediaType streamType, final MediaType elementType,
                            final MultivaluedMap<String, String> httpHeaders, final InputStream inputStream)
   {
      this.annotations = annotations;
      this.mediaType = elementType;
      this.httpHeaders = httpHeaders;
      this.inputStream = inputStream;
      this.textLike = MediaTypeHelper.isTextLike(streamType);
      this.escape = streamType != null && streamType.toString().startsWith("application/x-stream-general");
   }

But there are these methods, which I presume are called:

   public InboundSseEvent read() throws IOException
   {
      byte[] chunk = null;
      try
      {
         lastFieldWasData = false;
         chunk = readEvent(inputStream);
         if (chunk == null)
         {
            close();
            return null;
         }
      }
      catch (IOException e1)
      {
         try
         {
            close();

         }
         catch (IOException e)
         {
            //TODO: add a log message
         }
         throw e1;

      }

      final ByteArrayInputStream entityStream = new ByteArrayInputStream(chunk);
      final ByteArrayOutputStream temSave = new ByteArrayOutputStream();
      Charset charset = StandardCharsets.UTF_8;
      if (mediaType != null && mediaType.getParameters().get(MediaType.CHARSET_PARAMETER) != null)
      {
         charset = Charset.forName(mediaType.getParameters().get(MediaType.CHARSET_PARAMETER));
      }

      final InboundSseEventImpl.Builder eventBuilder = new InboundSseEventImpl.Builder(annotations, mediaType,
            httpHeaders);
      //TODO: Look at if this can be improved
      int b = -1;
      SseConstants.EVENT currentState = SseConstants.EVENT.START;
      while ((b = entityStream.read()) != -1)
      {
         try
         {
            if (currentState == SseConstants.EVENT.START)
            {
               if (b == '\r' || b == '\n')
               {
                  continue;
               }

               if (b == ':')
               {
                  currentState = SseConstants.EVENT.COMMENT;
                  continue;
               }
               else
               {
                  temSave.write(b);
                  currentState = SseConstants.EVENT.FIELD;
                  continue;
               }
            }
            if (currentState == SseConstants.EVENT.COMMENT)
            {

               b = readLine(entityStream, '\n', temSave);
               String commentLine = temSave.toString(charset.name());
               eventBuilder.commentLine(commentLine);
               temSave.reset();
               currentState = SseConstants.EVENT.START;
               continue;
            }
            if (currentState == SseConstants.EVENT.FIELD)
            {
               temSave.write(b);
               b = readLine(entityStream, ':', temSave);
               String fieldName = temSave.toString(StandardCharsets.UTF_8.name());
               temSave.reset();
               if (b == ':')
               {
                  //spec says there is space after colon
                  do
                  {
                     b = entityStream.read();
                  }
                  while (b == ' ');

                  if (b != '\n' && b != '\r' && b != -1)
                  {
                     temSave.write(b);
                     b = readLine(entityStream, '\n', temSave);
                  }
               }

               processField(eventBuilder, fieldName, mediaType, temSave.toByteArray());
               temSave.reset();
               currentState = SseConstants.EVENT.START;
               continue;
            }
         }
         catch (IOException e)
         {
            throw new IOException(Messages.MESSAGES.readEventException(), e);
         }
      }
      InboundSseEventImpl event = (InboundSseEventImpl) eventBuilder.build();
      if (this.providers != null)
      {
         event.setProvider(this.providers);
      }
      return event;
   }

Which I'm not sure how to make sense of, since it first calls readEvent(inputStream), which does all sorts of weird things and can return null:

   public byte[] readEvent(final InputStream in) throws IOException
   {
      @SuppressWarnings("resource")
      EventByteArrayOutputStream buffer = new EventByteArrayOutputStream();
      int data;
      int pos = 0;
      boolean boundary = false;
      byte[] eolBuffer = new byte[5];
      while ((data = in.read()) != -1)
      {
         byte b = (byte) data;
         if (!textLike && b == '\\')
         {
            buffer.write(b);
            b = (byte) in.read();
         }
         else
         {
            if (b == '\r' || b == '\n')
            {
               eolBuffer[pos] = b;
               //if it meets \r\r , \n\n , \r\n\r\n or \n\r\n\r\n
               if ((pos > 0 && eolBuffer[pos] == eolBuffer[pos - 1])
                     || (pos >= 3 && new String(eolBuffer, 0, pos, StandardCharsets.UTF_8).contains(DELIMITER)))
               {
                  boundary = true;
               }
               //take it a boundary if there are 5 unexpected eols
               if (pos++ > 4)
               {
                  boundary = true;
               }
            }
            else
            {
               pos = 0;
            }
         }
         buffer.write(b);
         if (boundary && buffer.size() > pos)
         {
            return buffer.getEventPayLoad();
         }
         //if it's emtpy
         if (boundary && buffer.size() == pos)
         {
            pos = 0;
            boundary = false;
            buffer.reset();
            continue;
         }
      }
      return null;
   }

So yeah, that's where you'll have to look at the bytes being read; the framing readEvent expects is sketched below.
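
For reference, a tiny illustrative sketch (payloads made up) of the framing readEvent looks for, based on the SSE format: an event is only complete once a blank line (double EOL) follows it, and if the stream ends before such a boundary is seen, readEvent returns null, which is exactly what triggers the close/reconnect described above:

    // Illustration only: SSE events are delimited by a blank line (double EOL).
    public class SseFramingSketch {
        public static void main(String[] args) {
            String complete   = "data: neo\n\n"; // boundary present -> readEvent can return one event payload
            String incomplete = "data: neo\n";   // no boundary yet -> readEvent hits EOF and returns null
            System.out.println(complete.contains("\n\n"));   // true
            System.out.println(incomplete.contains("\n\n")); // false
        }
    }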

@FroMage (Member) commented Sep 1, 2020

But this class doesn't appear to have changed at all recently.

@cescoffier (Member)

@FroMage exactly, that's where I'm looking at the moment.
While we see the cause of the reconnection on the client side, the issue might be on the server side: it may be sending something corrupted. Instrumenting the server side might be trickier...

@FroMage (Member) commented Sep 2, 2020

I see a client in https://github.com/quarkusio/quarkus/blob/5ca9f4ac8f80a98ffc724da0f3e13d35f9833805/integration-tests/resteasy-mutiny/src/main/java/io/quarkus/it/resteasy/mutiny/MutinyResource.java

It's not in your call directly, but is there any chance it's running as part of the test for some reason? What if you disable these endpoints and the test that calls them?

I'd like to rule out the accidental use of a client context.

@cescoffier (Member)

Let me try that right now.

@FroMage (Member) commented Sep 2, 2020

Also, print ResteasyProviderFactory.getInstance() in your original request and in each item you send over SSE, along with the current thread (a sketch follows below). They should always be the same.

It looks like the only way to get a NOOPServerHelper is via the client context. Now, I don't understand how that can happen, but at least I want to verify this is what's happening.
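
Something along these lines, as a hypothetical sketch of the suggested diagnostic (the resource path, item values and endpoint shape are made up; only the getInstance() and thread prints matter):

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    import org.jboss.resteasy.spi.ResteasyProviderFactory;

    import io.smallrye.mutiny.Multi;

    // Hypothetical diagnostic endpoint: log which ResteasyProviderFactory instance
    // and which thread the request and each SSE item are handled on, so they can be compared.
    @Path("/pets")
    public class DiagnosticPetResource {

        @GET
        @Produces(MediaType.SERVER_SENT_EVENTS)
        public Multi<String> pets() {
            System.out.println("request: factory=" + ResteasyProviderFactory.getInstance()
                    + " thread=" + Thread.currentThread().getName());
            return Multi.createFrom().items("neo", "morpheus")
                    .onItem().invoke(item -> System.out.println("item " + item
                            + ": factory=" + ResteasyProviderFactory.getInstance()
                            + " thread=" + Thread.currentThread().getName()));
        }
    }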

@cescoffier (Member)

@FroMage same failure:

/pets ResteasyProviderFactory = io.quarkus.restclient.runtime.RestClientRecorder$2@5063b38c
2020-09-02T09:09:23.2857412Z Sending double EOL
2020-09-02T09:09:23.2937794Z Server Side - Emitting neo, ResteasyProviderFactory = io.quarkus.restclient.runtime.RestClientRecorder$2@5063b38c
2020-09-02T09:09:23.2858429Z 	 Read 10
2020-09-02T09:09:23.2858831Z 	 Read 10
2020-09-02T09:09:23.2934889Z java.lang.UnsupportedOperationException
2020-09-02T09:09:23.2935975Z 	at org.jboss.resteasy.core.providerfactory.NOOPServerHelper.getMessageBodyWriters(NOOPServerHelper.java:118)
2020-09-02T09:09:23.2936931Z 	at org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl.getServerMessageBodyWriters(ResteasyProviderFactoryImpl.java:298)
2020-09-02T09:09:23.2938652Z 	at org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl.getMessageBodyWriter(ResteasyProviderFactoryImpl.java:1252)
2020-09-02T09:09:23.2939059Z send org.jboss.resteasy.plugins.providers.sse.OutboundSseEventImpl@607a0942
2020-09-02T09:09:23.2939850Z 	at org.jboss.resteasy.plugins.providers.sse.SseEventProvider.writeTo(SseEventProvider.java:111)
2020-09-02T09:09:23.2940020Z writeEvent org.jboss.resteasy.plugins.providers.sse.OutboundSseEventImpl@607a0942
2020-09-02T09:09:23.2940779Z 	at org.jboss.resteasy.plugins.providers.sse.SseEventProvider.writeTo(SseEventProvider.java:33)
2020-09-02T09:09:23.2940934Z Synchronous Exception while sending event
2020-09-02T09:09:23.2941259Z 	at org.jboss.resteasy.plugins.providers.sse.SseEventOutputImpl.writeEvent(SseEventOutputImpl.java:308)

@FroMage (Member) commented Sep 2, 2020

OK, we found the culprit then: f65ee2b added:

        ResteasyProviderFactory.setInstance(clientProviderFactory);

Which just breaks all server-side calls that expect this to be the server's provider factory.

@FroMage (Member) commented Sep 2, 2020

That's NASTY.

@FroMage (Member) commented Sep 2, 2020

BTW I think it's pretty nuts to have two ResteasyProviderFactory instances (client and server), but if we MUST have two, they should be resolved from the ResteasyContext rather than hanging around as statics (rough sketch below).

Good luck fixing that, BTW, it won't be trivial.
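
A rough, hypothetical sketch of what that suggestion could look like (this is not existing Quarkus or RESTEasy code; it assumes the server has pushed its factory into the per-request ResteasyContext):

    import org.jboss.resteasy.core.ResteasyContext;
    import org.jboss.resteasy.spi.ResteasyProviderFactory;

    // Sketch of resolving the factory contextually instead of from the global static.
    public final class ContextualProviderFactoryLookup {

        public static ResteasyProviderFactory current() {
            ResteasyProviderFactory fromContext = ResteasyContext.getContextData(ResteasyProviderFactory.class);
            // Fall back to the static instance only when no contextual one is available.
            return fromContext != null ? fromContext : ResteasyProviderFactory.getInstance();
        }
    }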

@jaikiran (Member, Author) commented Sep 2, 2020

OK, we found the culprit then: f65ee2b added:

        ResteasyProviderFactory.setInstance(clientProviderFactory);

Which just breaks all server-side calls that expect this to be the server's provider factory.

Hello @FroMage, does that also explain why this test passes sometimes but fails on other occasions? Or are we still missing any other links?

@cescoffier (Member)

@radcortez @kenfinnigan How can we remove that call?

@FroMage (Member) commented Sep 2, 2020

As for how it can explain that it only sometimes fails: I guess it could depend on when you end up executing that client recorder stuff. It's not clear to me, honestly. But it's still wrong.

@gsmet (Member) commented Sep 2, 2020

Looks like I was a visionary: #11448 (review) :)

@gsmet (Member) commented Sep 2, 2020

If it's a problem, I have nothing against reverting this PR until we can get a proper fix.

@FroMage (Member) commented Sep 2, 2020

Looks like I was a visionary: #11448 (review) :)

Haha, well spotted, you should have called their bluff though ;)

@jaikiran (Member, Author) commented Sep 2, 2020

@cescoffier, I now have something that might shed some light on why this is inconsistent. I just need your setup where this is happening consistently. I plan to push a certain change there. Is that OK? I will make sure I copy your current branch locally before messing with that branch and pushing my commits.

@gsmet gsmet modified the milestone: 1.8.0.Final Sep 2, 2020
@cescoffier (Member)

@jaikiran go ahead. I'm not going to work on that for the rest of the day. Feel free to remove my hundreds of traces.

@jaikiran (Member, Author) commented Sep 2, 2020

So, I managed to get Byteman integrated (in a hackish way for now) with this test to see what's going on. @FroMage had already solved the mystery of the root cause anyway, so it helped me come up with specific Byteman scripts [1] to figure out why this fails only on certain occasions. Here's what it shows.

When this test succeeds, io.quarkus.restclient.runtime.RestClientRecorder is the one which first sets the org.jboss.resteasy.spi.ResteasyProviderFactory to an instance that is incorrect/unusable for our purposes. This, however, is then overwritten by io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder with a correct, usable instance. Here's what the call path (in that order) looks like for a passing case:

2020-09-02T11:41:46.8131751Z Wed Sep 02 11:41:46 UTC 2020[Thread[main,5,main]]  provider factory is io.quarkus.restclient.runtime.RestClientRecorder$2@48c5e6fb
2020-09-02T11:41:46.8134811Z Stack trace for thread main
2020-09-02T11:41:46.8141141Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:46.8148741Z io.quarkus.restclient.runtime.RestClientRecorder.initializeResteasyProviderFactory(RestClientRecorder.java:72)
2020-09-02T11:41:46.8160829Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy_0(RestClientProcessor$registerProviders350329842.zig:506)
2020-09-02T11:41:46.8171859Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy(RestClientProcessor$registerProviders350329842.zig:40)
2020-09-02T11:41:46.8177626Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:202)

....

2020-09-02T11:41:46.8714811Z Rule.execute called for ResteasyProviderFactory#setInstance_6:1
2020-09-02T11:41:46.8714955Z ResteasyProviderFactory#setInstance execute
2020-09-02T11:41:46.8715123Z Wed Sep 02 11:41:46 UTC 2020[Thread[main,5,main]]  provider factory is org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl@232fcbb1
2020-09-02T11:41:46.8715284Z Stack trace for thread main
2020-09-02T11:41:46.8715431Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:46.8715594Z org.jboss.resteasy.core.ResteasyDeploymentImpl.initializeFactory(ResteasyDeploymentImpl.java:373)
2020-09-02T11:41:46.8715755Z org.jboss.resteasy.core.ResteasyDeploymentImpl.startInternal(ResteasyDeploymentImpl.java:129)
2020-09-02T11:41:46.8715912Z org.jboss.resteasy.core.ResteasyDeploymentImpl.start(ResteasyDeploymentImpl.java:118)
2020-09-02T11:41:46.8810983Z io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder.staticInit(ResteasyStandaloneRecorder.java:71)
2020-09-02T11:41:46.8815366Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy_0(ResteasyStandaloneBuildStep$staticInit-210558872.zig:811)
2020-09-02T11:41:46.8819202Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy(ResteasyStandaloneBuildStep$staticInit-210558872.zig:40)
2020-09-02T11:41:46.8821692Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:213)

On the other hand, when the test fails, the order of these recorders is reversed. So io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder first sets the org.jboss.resteasy.spi.ResteasyProviderFactory instance to a usable one, only for it to be overwritten with an incorrect/unusable one by io.quarkus.restclient.runtime.RestClientRecorder. Here's what the call path looks like (in that order) in that case:

2020-09-02T11:41:35.9744918Z ResteasyProviderFactory#setInstance execute
2020-09-02T11:41:35.9746213Z Wed Sep 02 11:41:35 UTC 2020[Thread[main,5,main]]  provider factory is org.jboss.resteasy.core.providerfactory.ResteasyProviderFactoryImpl@4469a74d
2020-09-02T11:41:35.9811159Z Stack trace for thread main
2020-09-02T11:41:35.9812026Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:35.9813417Z org.jboss.resteasy.core.ResteasyDeploymentImpl.initializeFactory(ResteasyDeploymentImpl.java:373)
2020-09-02T11:41:35.9814759Z org.jboss.resteasy.core.ResteasyDeploymentImpl.startInternal(ResteasyDeploymentImpl.java:129)
2020-09-02T11:41:35.9815933Z org.jboss.resteasy.core.ResteasyDeploymentImpl.start(ResteasyDeploymentImpl.java:118)
2020-09-02T11:41:35.9817571Z io.quarkus.resteasy.runtime.standalone.ResteasyStandaloneRecorder.staticInit(ResteasyStandaloneRecorder.java:71)
2020-09-02T11:41:35.9820429Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy_0(ResteasyStandaloneBuildStep$staticInit-210558872.zig:811)
2020-09-02T11:41:35.9821829Z io.quarkus.deployment.steps.ResteasyStandaloneBuildStep$staticInit-210558872.deploy(ResteasyStandaloneBuildStep$staticInit-210558872.zig:40)
2020-09-02T11:41:35.9822303Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:202)

.....

2020-09-02T11:41:36.6974037Z ResteasyProviderFactory#setInstance execute
2020-09-02T11:41:36.6975074Z Wed Sep 02 11:41:36 UTC 2020[Thread[main,5,main]]  provider factory is io.quarkus.restclient.runtime.RestClientRecorder$2@5a9529a
2020-09-02T11:41:36.6976080Z Stack trace for thread main
2020-09-02T11:41:36.6977075Z org.jboss.resteasy.spi.ResteasyProviderFactory.setInstance(ResteasyProviderFactory.java:97)
2020-09-02T11:41:36.6978114Z io.quarkus.restclient.runtime.RestClientRecorder.initializeResteasyProviderFactory(RestClientRecorder.java:72)
2020-09-02T11:41:36.6979154Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy_0(RestClientProcessor$registerProviders350329842.zig:506)
2020-09-02T11:41:36.6980196Z io.quarkus.deployment.steps.RestClientProcessor$registerProviders350329842.deploy(RestClientProcessor$registerProviders350329842.zig:40)
2020-09-02T11:41:36.6981210Z io.quarkus.runner.ApplicationImpl.<clinit>(ApplicationImpl.zig:213)


So the transient nature of this failure is due to the non-deterministic execution ordering of the recorders (illustrated in the sketch below).

[1] https://github.com/jaikiran/quarkus/commits/mutiny-sse-enable-byteman

P.S.: I'll see if I can make this Byteman integration a bit more seamless and easier to use, via some kind of simple switch, for solving issues like these.
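
As an aside, a minimal, self-contained illustration (plain Java, not Quarkus code; the "factory" objects are just strings) of why last-writer-wins on a static holder makes the outcome depend on recorder ordering, as described above:

    // Hypothetical illustration of the ordering problem; not actual Quarkus/RESTEasy code.
    final class ProviderFactoryHolder {
        private static volatile Object instance;
        static void setInstance(Object i) { instance = i; } // last writer wins
        static Object getInstance()       { return instance; }
    }

    public class OrderingDemo {
        public static void main(String[] args) {
            // Passing case: the client-only factory is set first, then overwritten by the server one.
            ProviderFactoryHolder.setInstance("client-only factory");
            ProviderFactoryHolder.setInstance("server factory");
            System.out.println(ProviderFactoryHolder.getInstance()); // "server factory" -> SSE writes work

            // Failing case: the recorders run in the opposite order, so the client-only
            // factory is what the server-side SSE writer ends up seeing.
            ProviderFactoryHolder.setInstance("server factory");
            ProviderFactoryHolder.setInstance("client-only factory");
            System.out.println(ProviderFactoryHolder.getInstance()); // "client-only factory" -> test fails
        }
    }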

@jaikiran (Member, Author) commented Sep 3, 2020

Replaced by #11828

@jaikiran jaikiran closed this Sep 3, 2020
@asoldano (Contributor) commented Sep 8, 2020

@jaikiran @FroMage @cescoffier @kenfinnigan, sorry for the late feedback here; I've been on PTO and it took time to process the backlog once back. The exception above from the NOOPServerHelper is indeed caused by a ResteasyProviderFactory instance meant for client-side usage actually being used on the server side. And f65ee2b really is problematic; you should be really careful with ResteasyProviderFactory.setInstance(). Actually, ResteasyProviderFactory.getInstance() is risky too, as it allows running into problems like this, and unfortunately it's often used in RESTEasy because it's convenient...
In our case here, I believe RESTEasy should be fixed like this: asoldano/Resteasy@3b7b8c4, even if the invocation of setInstance() is still wrong (I'm sure there are other parts of the code where getInstance() should be avoided).

Having said this, I don't personally like the design of ResteasyProviderFactory that much (I've seen other failures because of the lack of proper isolation between instances), but improving it is a hard task. I tried something back in 2019, but didn't manage to find enough time to focus on it and finish it (resteasy/resteasy#2240).
Related to this, I was not very happy to see the static instance set in https://github.com/resteasy/Resteasy/blob/master/resteasy-client-microprofile-base/src/main/java/org/jboss/resteasy/microprofile/client/RestClientBuilderImpl.java#L72 from the beginning, hence I would encourage looking for alternatives before adding the same in the plain JAX-RS client (resteasy/resteasy#2504).

@FroMage (Member) commented Sep 8, 2020

@asoldano asoldano/Resteasy@3b7b8c4 does indeed look like the right thing to do.

@kenfinnigan (Member)

@asoldano There is no alternative to setting the ResteasyProviderFactory statically for the REST client, because instantiation is controlled by service loading.

If there's another way to achieve what is needed (explicitly defining the provider factory to control which providers are loaded) without using a static method/field, then please do so. But I'd appreciate not blocking this approach without an alternative.

@FroMage (Member) commented Sep 8, 2020

IMO the two issues are orthogonal. @asoldano's fix is the proper fix for RESTEasy, because it ensures the server's providers are used by its SSE system.

With that fix, the order in which you set the ResteasyProviderFactory for the client will no longer impact SSE handling. So it's the right fix, and I bet more getInstance() calls should be replaced, as he said, to avoid similar bugs in non-SSE areas.

The question of being able to configure the ResteasyProviderFactory for client and server statically is another matter, and indeed, that static method is probably not the right thing to do, but we can fix that without reverting either this Quarkus workaround (for the ordering issue) or Alessio's RESTEasy bug fix.

@gsmet added the triage/invalid label Sep 8, 2020
@asoldano (Contributor)

To follow up on this, Ron should have merged Ken's PR.
Speaking of my fix, I've merged it, and I'm also changing/extending it a bit with resteasy/resteasy#2509 (in case you want to try it, @FroMage).
