Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRPC / Panache Reactive Combination causes issues after a couple requests #14820

Closed
0SkillAllLuck opened this issue Feb 4, 2021 · 36 comments · Fixed by #16830
Closed

GRPC / Panache Reactive Combination causes issues after a couple requests #14820

0SkillAllLuck opened this issue Feb 4, 2021 · 36 comments · Fixed by #16830
Assignees
Labels
area/grpc gRPC area/panache kind/bug Something isn't working
Milestone

Comments

@0SkillAllLuck
Copy link
Contributor

Describe the bug
When invoking Panache Entities via a GRPC Service in a reactive way, the first couple invocations will work but after that all that happens is a "java.lang.IllegalStateException: session is currently connecting to database"

Expected behavior
The Panache invocations work until the end of the universe.

Actual behavior
After a couple of invocations only "java.lang.IllegalStateException: session is currently connecting to database" is produced.
HealthChecks will also fail if present in the project.

To Reproduce

Reproducer: 0SkillAllLuck/code-with-quarkus

Environment (please complete the following information):

  • Output of java -version: openjdk version "11.0.9" 2020-10-20
  • Quarkus version or git rev: 1.11.1.Final
  • Build tool (ie. output of mvnw --version or gradlew --version): Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
@0SkillAllLuck 0SkillAllLuck added the kind/bug Something isn't working label Feb 4, 2021
@ghost ghost added area/grpc gRPC area/panache labels Feb 4, 2021
@ghost
Copy link

ghost commented Feb 4, 2021

@FroMage
Copy link
Member

FroMage commented Feb 4, 2021

@Sanne this seems related to session/thread discussions, no? Can you route this to someone who can take a look?

@Sanne
Copy link
Member

Sanne commented Feb 4, 2021

Sure. @FroMage , could you have a look? :)

Jokes aside, @0SkillAllLuck : currently you can only use Panache Reactive if either of these are true:

a. you're running in a reactive context
b. you wrap your code win a transaction, for example using Panache.withTransaction( ... ).

I apologize for the confusion, we need to work on detecting this and throwing a better error.

@0SkillAllLuck
Copy link
Contributor Author

@Sanne I just tried to wrap my call with Panache.withTransaction( ... ). Either I am stupid enough to not use it correctly or it still doesn't work. I updated the reproducer repo with the wrapping. It's still doing the same behaviour.

I also thought using the Mutiny GRPC Service should be reactive or aren't they?

@0SkillAllLuck
Copy link
Contributor Author

It looks like the issue is gone with version 1.12.0 of quarkus.

@Sanne
Copy link
Member

Sanne commented Mar 5, 2021

excellent, thanks @0SkillAllLuck . Yes this specific issue was suspiciously similar to some others we fixed, I probably should have marked them as duplicates.

@0SkillAllLuck
Copy link
Contributor Author

@Sanne I think it is still an issue. I had .runSubscriptionOn(Infrastructure.getDefaultExecutor()) in my gRPC services. When removing them the issue reappeared. Also it seems like if you include smallrye-health it will break after some time as well.

@0SkillAllLuck 0SkillAllLuck reopened this Mar 11, 2021
@0SkillAllLuck
Copy link
Contributor Author

Also I noticed that sometimes the error message now says: „io.vertx.pgclient.PgException: sorry, too many clients already“. Are the clients not closed properly?

@FroMage
Copy link
Member

FroMage commented Mar 11, 2021

That's worrying, can you provide a reproducer?

@0SkillAllLuck
Copy link
Contributor Author

While building the reproducer I noticed that it also happens when using hibernate-reactive without panache so I created another reproducer with only hibernate.

Postgres used: docker run --name postgres -e POSTGRES_PASSWORD=quarkus -e POSTGRES_USER=quarkus -e POSTGRES_DB=quarkus -d -p 5432:5432 postgres

Panache Reproducer
Hibernate Reproducer

For both reproducers I used grpcui and just did 4 requests to the grpc service. The 5h request will hang forever and cause the issue

@FroMage
Copy link
Member

FroMage commented Mar 12, 2021

I got "sorry, too many clients already" yesterday, and it was caused by me having too many IO threads talking to the DB. I could turn it down using quarkus.datasource.reactive.max-size=3 or bump up the max connections in the DB itself.

Perhaps that's just what this is about?

@0SkillAllLuck
Copy link
Contributor Author

When setting quarkus.datasource.reactive.max-size to a low number I no longer get the io.vertx.pgclient.PgException: sorry, too many clients already but I get the java.lang.IllegalStateException: session is currently connecting to database.

@0SkillAllLuck
Copy link
Contributor Author

Also when using hibernate-panache without the reactive version it works well when wrapping it like this:

Multi.createFrom().items(() -> …).runSubscriptionOn(Infrastructure.getDefaultWorkerPool())

@0SkillAllLuck
Copy link
Contributor Author

@FroMage, @Sanne I figured out a way around this issue that potentially helps to find the cause. When using hibernate-reactive the „normal“ way (without panache) and using all tricks like withTransaction and runSubscriptionOn it still doesn‘t work in gRPC. But when using a SessionFactory instead of Session in the Inject and then manually closing the session after the call it works fine. So I guess that in the context of gRPC the session isn‘t closed correctly.

@0SkillAllLuck
Copy link
Contributor Author

Also when using the non reactive version of panache and enabling metrics you can see that the sessions stay in the active state for ever

@0SkillAllLuck
Copy link
Contributor Author

For anyone experiencing the same issue I have found a workaround by appending all grpc calls with:

        .onTermination().invoke(() -> Panache.getSession().close())
        .runSubscriptionOn(Infrastructure.getDefaultExecutor())
        .emitOn(Infrastructure.getDefaultExecutor());

@cescoffier
Copy link
Member

@michalszynkiewicz can you have a look? It may be out of date.

@michalszynkiewicz michalszynkiewicz self-assigned this Apr 26, 2021
@michalszynkiewicz
Copy link
Member

@cescoffier I'm able to reproduce it on 2.0.0.Alpha1, will take a deeper look

@michalszynkiewicz
Copy link
Member

In my tests, in the hibernate version, it is enough to close the session, e.g.:

        return this.session.createNamedQuery( "Fruits.findAll", Fruit.class ).getResults()
                .onItem().transform(Fruit::toGrpc).onTermination().invoke(() -> JpaOperations.getSession().close());

@Sanne @cescoffier @FroMage what should close the session?

@0SkillAllLuck
Copy link
Contributor Author

@michalszynkiewicz I have tried a similiar aproach with Panache.getSession().close(), that seems to work fine under light load but caused issues under higher load. I will try to collect the concrete exceptions and errors that were caused. Is Panache.getSession() and JpaOperations.getSession() the same?

@michalszynkiewicz
Copy link
Member

michalszynkiewicz commented Apr 27, 2021

indeed, three terminals each making 4000 calls with grpcurl result in some

ERROR:
  Code: Unknown
  Message: java.lang.IllegalStateException - Session/EntityManager is closed

And it's still the case with what we have in the main branch

@0SkillAllLuck
Copy link
Contributor Author

It seems like the higher the load the more percentage of the request result in that error.

@Sanne
Copy link
Member

Sanne commented Apr 27, 2021

@michalszynkiewicz thanks for helping :) I suspect this might be a duplicate of hibernate/hibernate-reactive#707 , could you build Hibernate Reactive from main and try reproducing with that one?

@Sanne
Copy link
Member

Sanne commented Apr 27, 2021

@michalszynkiewicz sorry I was wrong, that fix was already integrated in Quarkus main: #16588 -

@0SkillAllLuck
Copy link
Contributor Author

Also I had the same issue without the reactive version of hibernate, so shouldn‘t be a hibernate-reactive issue

@michalszynkiewicz
Copy link
Member

I see the session is created with ReactiveSessionProducer#createMutinySession but disposeMutinySession is not called, maybe that's one of the issues, lack of some CDI clean up of request scope?

That wouldn't explain why sometimes a closed session is used whe we add manual session closing...

@Sanne
Copy link
Member

Sanne commented Apr 27, 2021

Keep in mind that the Session can't be closed yet after the reactive method is invoked, as it needs to process the IO results when the actual data to process is returned. It should be closed automatically at the end of such event.

@mkouba had a fix for something that might be related, but AFAIR he mentioned it wouldn't cover all use cases:

michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 27, 2021
@michalszynkiewicz
Copy link
Member

IIRC, the request context clean-up wasn't working properly for this use case.

@0SkillAllLuck , if you'd like to test if it works for you, you can use this branch: https://github.com/michalszynkiewicz/quarkus/tree/grpc-request-context-cleanup

michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 27, 2021
michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 28, 2021
michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 28, 2021
@0SkillAllLuck
Copy link
Contributor Author

@michalszynkiewicz Maybe that is an error on my site, but now it works flawless until I get the first error for example a null constraint. From that point on no request will work but instead that first error will be repeatet over and over again.

I used the following order:

  1. read all request: OK
  2. create request: OK
  3. read all request: OK
  4. create request: NOK, null constraint
  5. read all request: NOK, null constraint

Am I missing something obvious ?

@michalszynkiewicz
Copy link
Member

@0SkillAllLuck I haven't seen this. I see problems in dev mode, all doesn't work after dev mode reload, I'll update the PR when I get to the bottom of it.

michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 29, 2021
@michalszynkiewicz
Copy link
Member

michalszynkiewicz commented Apr 29, 2021

@0SkillAllLuck I tried to reproduce the behavior you're describing with my newest code and I couldn't. Could you check if the newest changes in the branch I linked?

Thanks a lot for testing the previous version :)

If it still doesn't work for you, I'd be grateful for a small reproducer (an update of one of the existing ones?)

@0SkillAllLuck
Copy link
Contributor Author

@michalszynkiewicz I updated to the latest changes in your branch. Now the error is gone but changes made inside Panache.withTransaction are not persisted. I need to manually call Panache.getSession().flush()

@michalszynkiewicz
Copy link
Member

@0SkillAllLuck I don't think you need Panache.withTransaction in grpc service... Looking at Quarkus quickstart for hibernate reactive the flush may be necessary: https://github.com/quarkusio/quarkus-quickstarts/blob/main/hibernate-reactive-quickstart/src/main/java/org/acme/hibernate/reactive/FruitMutinyResource.java#L60
Is that true @Sanne ?

michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 30, 2021
@0SkillAllLuck
Copy link
Contributor Author

@michalszynkiewicz I've removed all @Transactional and Panache.withTransaction parts now. Now everything seems to be working as intended, if longterm issue occur I'll update this issue.

michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 30, 2021
michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 30, 2021
michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 30, 2021
michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue Apr 30, 2021
@michalszynkiewicz
Copy link
Member

Great, thanks a lot for checking. I fixed a few other issues in my PR, I hope it will get into the next release of 2.0.0

michalszynkiewicz added a commit to michalszynkiewicz/quarkus that referenced this issue May 4, 2021
@quarkus-bot quarkus-bot bot added this to the 2.0 - main milestone May 4, 2021
@michalszynkiewicz
Copy link
Member

I'm wondering if the fixes for it are not worth backporting to 1.11. What do you guys think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/grpc gRPC area/panache kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants