Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventingService.afterExecute sometimes executed twice #1455

Closed
computerlove opened this issue Jun 27, 2022 · 22 comments · Fixed by quarkusio/quarkus#27173
Closed

EventingService.afterExecute sometimes executed twice #1455

computerlove opened this issue Jun 27, 2022 · 22 comments · Fixed by quarkusio/quarkus#27173
Assignees

Comments

@computerlove
Copy link
Contributor

computerlove commented Jun 27, 2022

We have an EventingService logging responsetimes:

private static final Map<String, Long> requestTimes = new ConcurrentHashMap<>();

@Override
public void beforeExecute(Context context) {
    requestTimes.put(context.getExecutionId(), currentTimeMillis());
}

@Override
public void afterExecute(Context context) {
    Long startTime = requestTimes.remove(context.getExecutionId());
    log.info("response time " + context.getExecutionId() + " " + calculate(startTime) );
}

After upgrading to Quarkus 2.10.0 afterExecute is executed twice for some Contexts.
Our application fires of ~5 graphql requests at the same time, and it seems random which one is triggering double execution of afterExecute

@phillip-kruger
Copy link
Member

interesting. This is a bug, let me have a look.

@phillip-kruger phillip-kruger self-assigned this Jun 27, 2022
@phillip-kruger
Copy link
Member

Do you perhaps have a reproducer ?

@computerlove
Copy link
Contributor Author

Haven't had time yet, will make one soon!

@phillip-kruger
Copy link
Member

I had a quick look, I can not see anything obvious, so a reproducer would help a lot. Thanks :)

@computerlove
Copy link
Contributor Author

Made a crude reproducer
The frequency of the incidents is higher when the client aborts the request before it has finished.

Run server from root with mvn quarkus:dev, then start client from its folder the same way.

@phillip-kruger
Copy link
Member

Just some feedback, your reproducer recreates the issue perfectly, however, I am not sure how to fix this. I am going to discuss this with @cescoffier tomorrow and hopefully get a fix in for this a.s.a.p.

@cescoffier
Copy link

@phillip-kruger do you know where this callback is invoked?

@phillip-kruger
Copy link
Member

Here:

Uni.createFrom().completionStage(() -> graphQL.executeAsync(executionInput))

I tried:

  • memoize()
  • Removing Uni and just use the underlying CompletionStage

In both cases this still happen

I also checked the onTermination and onCancelation but they are also called twice.

@cescoffier
Copy link

Hum, the Uni emits a single event (item, failure or cancellation (not handled in your case, BTW). So, are you sure that the writeAsync method is not called twice?

@phillip-kruger
Copy link
Member

Let me double check that. Hold on.

@phillip-kruger
Copy link
Member

No, writeAsync is called only one time.

@cescoffier
Copy link

so either the completion stage is totally broken and emits twice, or there is something really wrong.

Add a .log() just after the creation of the Uni with the completion stage.

@cescoffier
Copy link

@phillip-kruger I quick debug session showed me that writeAsync is called many (many) times, sometimes with the same SmallRye Context.

It leads to calling the callback multiple times, as each execution receives a result dispatched using the callback.

@phillip-kruger
Copy link
Member

Really ? My debugging showed writeAsync called ones (per executionId) but then the result is sometimes received multiple times. You can also see the beforeExecute event are always only called once.

@cescoffier
Copy link

I didn't check the execution Id (only the context reference). I will check once back.

@phillip-kruger
Copy link
Member

@cescoffier this still happens with 2.10.3.Final (so the context termination is not related) I still have not idea why this is happening. I'll debug some more once I have time.

@cescoffier
Copy link

What an interesting issue!

The problem is the executionId which is totally messed up because the code is called from different thread and the executed seems to be a thread local or something thread specific.

So, basically, it's even worse than duplicated afterExecute. It may produce a totally broken result for an execution id, as the execution id was not the right one.

I "fixed" the issue by just adding this:

private void writeAsync(GraphQL graphQL,
                          ExecutionInput executionInput,
                          SmallRyeContext smallRyeContext,
                          ExecutionResponseWriter writer) {
      String id = smallRyeContext.getExecutionId(); // Store the execution id
      Uni.createFrom().completionStage(() -> graphQL.executeAsync(executionInput))
              .subscribe().with(executionResult -> {
                  SmallRyeContextManager.restore(smallRyeContext);
                  // To see the issue:
                  System.out.println("what's your id? " + smallRyeContext.getExecutionId() + " =?= " + id);

                  // To fix the issue
                  smallRyeContext.setExecutionId(id);
                  // Notify after
                  eventEmitter.fireAfterExecute(smallRyeContext);

                  ExecutionResponse executionResponse = new ExecutionResponse(executionResult);
                  if (!payloadOption.equals(LogPayloadOption.off)) {
                      log.payloadOut(executionResponse.toString());
                  }
                  writer.write(executionResponse);

              }, failure -> {
                  if (failure != null) {
                      writer.fail(failure);
                  }
              });
  }

@phillip-kruger
Copy link
Member

Wow!! Thanks @cescoffier ! I also went down the wrong Id route but did not figure out what was wrong.

But this probably means everything on the context is not properly propagated ?? I'll have a look about that in the morning.

Thanks for your help...

@cescoffier
Copy link

I'm seeing that the computation is based on thread locals, so definitely broken as your processing involves multiple threads.

@phillip-kruger
Copy link
Member

Ok, I'll have a look tomorrow, thanks again, this helps a lot

@phillip-kruger
Copy link
Member

Good news. I fixed this in Quarkus (see above PR). @computerlove let me know once the PR is merged or on the next release if this works for you. It fixed your reproducer. Thanks. Closing here.

@computerlove
Copy link
Contributor Author

Yep, looks good!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants