Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hibernate Reactive: NamedQuery fails after hot reload #15164

Closed
markusdlugi opened this issue Feb 18, 2021 · 32 comments · Fixed by #15779
Closed

Hibernate Reactive: NamedQuery fails after hot reload #15164

markusdlugi opened this issue Feb 18, 2021 · 32 comments · Fixed by #15779
Labels
area/hibernate-reactive Hibernate Reactive area/persistence OBSOLETE, DO NOT USE env/windows Impacts Windows machines kind/bug Something isn't working
Milestone

Comments

@markusdlugi
Copy link
Contributor

Describe the bug
Same issue as already described by @cescoffier in #13355, but I was actually able to reproduce it reliably 😄

After the application is reloaded in dev mode due to a code change, named queries do not work anymore. You don't even need to change something in the queried entity itself, just any change that leads to an entire reload of the application will be sufficient.

Expected behavior
NamedQuery should work just as before hot reload.

Actual behavior
Executing a NamedQuery leads to the following exception:

2021-02-18 10:46:11,037 ERROR [org.jbo.res.rea.ser.cor.ExceptionMapping] (vert.x-eventloop-thread-11) Request failed : java.lang.IllegalArgumentException: Type specified for TypedQuery [com.example.Fruit] is incompatible with query return type [class com.example.Fruit]
	at org.hibernate.internal.AbstractSharedSessionContract.resultClassChecking(AbstractSharedSessionContract.java:863)
	at org.hibernate.reactive.session.impl.ReactiveSessionImpl.createReactiveQuery(ReactiveSessionImpl.java:461)
	at org.hibernate.reactive.session.impl.ReactiveSessionImpl.buildReactiveQueryFromName(ReactiveSessionImpl.java:443)
	at org.hibernate.reactive.session.impl.ReactiveSessionImpl.createReactiveNamedQuery(ReactiveSessionImpl.java:430)
	at org.hibernate.reactive.mutiny.impl.MutinySessionImpl.createNamedQuery(MutinySessionImpl.java:222)
	at io.quarkus.hibernate.reactive.runtime.ReactiveSessionProducer_ProducerMethod_createMutinySession_1321d110ee9e92bda147899150401e0a136779c7_ClientProxy.createNamedQuery(ReactiveSessionProducer_ProducerMethod_createMutinySession_1321d110ee9e92bda147899150401e0a136779c7_ClientProxy.zig:997)
	at com.example.FruitDao.get(FruitDao.java:20)
	at com.example.FruitDao_ClientProxy.get(FruitDao_ClientProxy.zig:217)
	at com.example.FruitResource.get(FruitResource.java:32)
	at com.example.FruitResource_ClientProxy.get(FruitResource_ClientProxy.zig:217)
	at com.example.FruitResource$quarkusrestinvoker$get_2420412d5b210b3632823e74ede45d605c6ff069.invoke(FruitResource$quarkusrestinvoker$get_2420412d5b210b3632823e74ede45d605c6ff069.zig:33)
	at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
	at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:7)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:117)
	at org.jboss.resteasy.reactive.server.handlers.RestInitialHandler.beginProcessing(RestInitialHandler.java:47)
	at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:17)
	at org.jboss.resteasy.reactive.server.vertx.ResteasyReactiveVertxHandler.handle(ResteasyReactiveVertxHandler.java:7)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1038)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:137)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:132)
	at io.quarkus.vertx.http.runtime.StaticResourcesRecorder.lambda$start$1(StaticResourcesRecorder.java:62)
	at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1038)
	at io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:101)
	at io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:132)
	at io.vertx.ext.web.handler.impl.StaticHandlerImpl.lambda$sendStatic$1(StaticHandlerImpl.java:206)
	at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:327)
	at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
	at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:829)

To Reproduce

Link to a small reproducer (preferably a Maven project if the issue is not Gradle-specific).

https://github.com/markusdlugi/hibernate-reactive-hot-swap

Steps to reproduce the behavior:

  1. Start PostgreSQL database using infrastructure/docker-compose.yml
  2. Start application in dev mode using mvn quarkus:dev
  3. Execute GET http://localhost:8080/fruits - should still work with HTTP 200
  4. Make a code change to trigger hot reload, e.g. by removing line 49 in com.example.FruitDao
  5. Execute GET http://localhost:8080/fruits again - application will reload in the background and fail with exception

Configuration

# Add your application.properties here, if applicable.
quarkus.datasource.db-kind=postgresql
quarkus.datasource.username=test
quarkus.datasource.password=test
quarkus.datasource.reactive.url=vertx-reactive:postgresql://localhost:5432/hibernate-reactive-hot-swap
quarkus.hibernate-orm.database.generation=create

Screenshots
(If applicable, add screenshots to help explain your problem.)

Environment (please complete the following information):

  • Output of uname -a or ver: Microsoft Windows [Version 10.0.17763.1757]
  • Output of java -version: OpenJDK 64-Bit Server VM Corretto-11.0.10.9.1 (build 11.0.10+9-LTS, mixed mode)
  • GraalVM version (if different from Java): N/A
  • Quarkus version or git rev: 1.11.3.Final
  • Build tool (ie. output of mvnw --version or gradlew --version): Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)

Additional context
For some reason, the issue only appears if both of the following conditions are met:

  • The application has a dependency to io.quarkus:quarkus-smallrye-health
  • The folder META-INF/resources exists (can be empty)

If either of these is removed, the issue just disappears. No idea why.

I don't think that these are the only conditions where this issue appears, most likely this can also happen under different circumstances, but those were the conditions where it failed in our application.

@markusdlugi markusdlugi added the kind/bug Something isn't working label Feb 18, 2021
@quarkus-bot quarkus-bot bot added area/hibernate-reactive Hibernate Reactive area/persistence OBSOLETE, DO NOT USE env/windows Impacts Windows machines labels Feb 18, 2021
@quarkus-bot
Copy link

quarkus-bot bot commented Feb 18, 2021

/cc @DavideD, @Sanne, @gavinking

@geoand
Copy link
Contributor

geoand commented Feb 18, 2021

Can you try this while having set quarkus.dev.instrumentation=false in application.properties or as system property when running mvn quarkus:dev please?

I tried your example and I can reproduce the issue when a structural change is made to a class (or some new class is added).
On the other hand, when Quarkus can use instrumentation to update changed classes (i.e. when no structural changes have been made), the problem does not manifest

@markusdlugi
Copy link
Contributor Author

Just tested setting that in both application.properties and as system property. Unfortunately it didn't seem to have any effect, issue still exists.

@geoand
Copy link
Contributor

geoand commented Feb 18, 2021

So this does indeed look like a Hibernate (Reactive) issue and not strictly a Quarkus issue

@markusdlugi
Copy link
Contributor Author

I think this might also be related to hibernate/hibernate-reactive#537, what do you think? @Sanne, @gavinking

@gsmet
Copy link
Member

gsmet commented Feb 19, 2021

Yeah I suspect one of the class is still loaded with the previous classloader. That would explain why you need as structural change to trigger it as the new instrumentation mode doesn’t change the class loader.

@markusdlugi
Copy link
Contributor Author

@gsmet, I checked in a heap dump and you are of course right - the problematic entity class is present twice, with each instance of the class being loaded with a different classloader. So seems like something in SmallRye Health leads to the old classloader leaking.

Now from my point of view, the question is: Would it make more sense to

  1. Fix the classloader leak in SR Health, which would fix this particular issue but wouldn't help us if a different component also leads to a classloader leak, or
  2. Should we adjust Hibernate to be able to handle entities even if they are loaded from different classloaders.

@gsmet
Copy link
Member

gsmet commented Mar 2, 2021

@DavideD any chance you could have a look to this one?

@DavideD
Copy link
Contributor

DavideD commented Mar 3, 2021 via email

@Sanne
Copy link
Member

Sanne commented Mar 3, 2021

Well while it's true that we might be able to workaround things in Hibernate, if SmallRye Health is leaking classloaders that needs to be fixed at higher priority.

@DavideD
Copy link
Contributor

DavideD commented Mar 3, 2021

When I try to run the reproducer, I have this error:

 java.util.NoSuchElementException: SRCFG00011: Could not expand value IFS+x in property BASH_FUNC__module_raw%%
        at io.smallrye.config.ExpressionConfigSourceInterceptor.lambda$getValue$0(ExpressionConfigSourceInterceptor.java:44)
        at io.smallrye.common.expression.ExpressionNode.emit(ExpressionNode.java:22)
        at io.smallrye.common.expression.CompositeNode.emit(CompositeNode.java:22)

@markusdlugi
Copy link
Contributor Author

@DavideD That's interesting, haven't seen that one myself. Quick search brought up #15269. Maybe updating the reproducer project to 1.12.0.Final as suggested there might help?

@DavideD
Copy link
Contributor

DavideD commented Mar 3, 2021

I've added the dependency to the a quickstart and I can recreate the exception

@DavideD
Copy link
Contributor

DavideD commented Mar 3, 2021

I gave a look at this issue and I'm not sure exactly what we can do about it from the Hibernate Reactive side.

The code we are using is similar to Hibernate ORM and the cause of the problem seems to be this variable in EntityType:

private transient Class returnedClass;

In ORM this variable is lazily initialised and during the hot reload, it gets set to null. When reading it the second time for the check, Hibernate ORM needs to re-initialise it and therefore, everything works.

For some reasons, this doesn't happen when using Hibernate Reactive. The variable still reference the old type and therefore keeps track of the old classloader.

I suppose we could read the returnedClass property each time, but it seems something is missing somewhere else.
I have no idea where to look for it.

@markusdlugi
Copy link
Contributor Author

@DavideD Thanks a lot for investigating!

I was under the assumption that SmallRye Health must be responsible for the leak because the exception isn't thrown when you remove the dependency to quarkus-smallrye-health. But from your debugging it sounds more like a Hibernate Reactive issue. Maybe you could verify if the behavior in Hibernate changes if you remove the dependency to SR Health? Or does the EntityType still contain a reference to the old class even when that dependency is not present?

@DavideD
Copy link
Contributor

DavideD commented Mar 4, 2021

The error only happens if I add the dependency to the classpath.

@DavideD
Copy link
Contributor

DavideD commented Mar 4, 2021

By the way, Hibernate Reactive is still using the same class EntityType available in ORM

@DavideD
Copy link
Contributor

DavideD commented Mar 4, 2021

The error only happens if I add the dependency to the classpath.

I think I was wrong about this... let me check again

@DavideD
Copy link
Contributor

DavideD commented Mar 4, 2021

Argh... it seems it happens even without the dependency.
Maybe someone else can double check this? It's possible to replicate the issue using a quickstart: https://github.com/quarkusio/quarkus-quickstarts/tree/master/hibernate-reactive-quickstart

Changing Fruit between two calls to the rest API.

@DavideD
Copy link
Contributor

DavideD commented Mar 4, 2021

Still...not sure what Hibernate Reactive should do differently

@markusdlugi
Copy link
Contributor Author

Thanks once again for checking.

I tried the reproducer project again and it seems like you are right, removing the dependency to quarkus-smallrye-health does not change the behavior. That's strange because I was fairly sure it did back when I created the issue, but seems like I was wrong. Sorry about that.

What still does affect the behavior is removing the META-INF/resources folder - once you remove that, the issue cannot be reproduced anymore, NamedQueries are working fine, no exceptions.

I also tried the quickstart you mentioned and can confirm that the issue is also reproducible there. Similarly, if you remove the META-INF/resources folder there, issue also disappears for the quickstart.

I had another look at some heap dumps, and actually noticed that the Fruit entity being present twice is true both in the failing case and in the working case (so without META-INF/resources). So the classloader leak does not seem to be the main root cause of the exception - Quarkus is leaking the classloaders in any case (mainly due to the contextClassLoaders of the Vert.x threads, as it seems). But in one case, Hibernate is able to handle that, while in the other case, for some reason, it isn't. I still haven't found a reason why Hibernate behaves differently in the two cases.

@markusdlugi
Copy link
Contributor Author

I did some more debugging.

I found out that Hibernate will always try to look up the entity class at EntityType.determineAssociatedEntityClass() because in fact the returnedClass attribute which @DavideD mentioned was always null after the hot reload, because a new instance of ManyToOneType (subclass of EntityType) is correctly created during hot reload.

When trying to look up the class, Hibernate will use Thread.currentThread().getContextClassLoader() (implemented in ReflectHelper.classForName()). It just so happens that for some reason, the current thread uses the same contextClassLoader in the good case as is used inside the HQLQueryPlan as return type of the NamedQuery. In the bad case (when META-INF/resources is present), the class loaders differ and everything goes to hell.

I still have no clue why it just magically works when META-INF/resources is not present and how that influences the threads and class loaders, but I don't think it matters because that doesn't seem to be something that we can rely on. I think we have to approach this issue from another angle.

During debugging, I noticed that the Vert.x eventloop threads are not recreated in dev mode when a hot reload (with new classloader) is performed. Therefore, the old threads are still used which still reference the old contextClassLoader. Maybe we should instead ensure that all old threads are stopped and new ones are created. They would then use the new class loader and we most likely wouldn't run into this issue in the first place because we start from a completely clean plate.

TL;DR: Instead of trying to understand what exactly differs in good case and bad case, change Quarkus dev mode to create new eventloop threads when a structural change (with new classloader) is performed.

@geoand, @gsmet, what do you think?

@geoand
Copy link
Contributor

geoand commented Mar 9, 2021

TL;DR: Instead of trying to understand what exactly differs in good case and bad case, change Quarkus dev mode to create new eventloop threads when a structural change (with new classloader) is performed.

@geoand, @gsmet, what do you think?

That will likely cause massive problems.
Although I haven't looked at any of the specifics of this case, maybe something like https://github.com/quarkusio/quarkus/blob/master/extensions/vertx-http/runtime/src/main/java/io/quarkus/vertx/http/runtime/VertxHttpRecorder.java#L345
or https://github.com/quarkusio/quarkus/blob/master/extensions/scheduler/runtime/src/main/java/io/quarkus/scheduler/runtime/devconsole/SchedulerDevConsoleRecorder.java#L25
would help?
What those pieces of code do is grab the initial TCCL and reset it when they actually execute the Vert.x handler

@markusdlugi
Copy link
Contributor Author

Okay, so if we can't recreate the threads in dev mode during hot reload, what about at least resetting the TCCL of all Vert.x threads on reload? I don't think the approach you mentioned of temporarily setting the TCCL will help us here, because the TCCL is wrong to begin with after hot reload, since it is still pointing to an old classloader that shouldn't be used anymore.

Can you maybe point me to the part of the code where the new classloader is created during reload? Then maybe I could try and test a possible solution.

@geoand
Copy link
Contributor

geoand commented Mar 16, 2021

Okay, so if we can't recreate the threads in dev mode during hot reload, what about at least resetting the TCCL of all Vert.x threads on reload? I don't think the approach you mentioned of temporarily setting the TCCL will help us here, because the TCCL is wrong to begin with after hot reload, since it is still pointing to an old classloader that shouldn't be used anymore.

It's not wrong the first time is it?

@markusdlugi
Copy link
Contributor Author

No, at first application start it's correct of course. But after the reload the TCCL still seems to point to the old classloaders, which leads to these issues with Hibernate (and potentially other issues?).

@geoand
Copy link
Contributor

geoand commented Mar 16, 2021

But that is exactly the issue that the code above works around, isn't it?

@markusdlugi
Copy link
Contributor Author

Yes, but I'd argue that in the case of this issue, you can't easily set the TCCL just for that use case and then reset it afterwards as it is done in your examples, because

  1. you would have to intercept the entire application code this way, since the Hibernate call doing the class lookup is inside the normal application context
  2. you would have to intercept all kinds of entrypoints to the application code (REST, Scheduler, Messaging, ...)
  3. you most likely won't be able to get a reference to the correct TCCL at that point in time.

Therefore I think it might be easier to actually change the TCCL of all the threads in dev mode when the application is reloaded.

I'm of course not an expert on the architecture of Quarkus, so that's just what I think would be the most sensible option from an outsider's perspective. I would be happy to be proven wrong 😄 Just trying to move this issue forward.

@geoand
Copy link
Contributor

geoand commented Mar 16, 2021

Therefore I think it might be easier to actually change the TCCL of all the threads in dev mode when the application is reloaded.

That's understandable, but the potential impact of such a change is pretty big. I'd like to avoid that.

@geoand
Copy link
Contributor

geoand commented Mar 16, 2021

Also, I wanted to clarify that we can't fix the issue in Quarkus itself.
The problem stems from the fact that Vert.x hangs on to original TCCL of the thread and resets it's quite often.
This leads the us having to implement ugly hacks like #15779

@markusdlugi
Copy link
Contributor Author

@geoand Wow, that's funny. You actually just fixed this issue with #15779 😄

I had a look at your changes in that PR and got suspicious because here the behavior was similar, because it only failed if some static resource was present in META-INF/resources, but worked otherwise. And since you explicitly set the TCCL in your PR in case there are static resources, I figured this might actually have an impact on this issue as well. And sure enough, after making those same changes locally and rerunning the reproducer, the NamedQuery now works reliably even after a hot reload! 👍

I guess that wasn't your intention (or maybe it was? 😉), but thanks a lot!

@geoand
Copy link
Contributor

geoand commented Mar 17, 2021

I'm glad it worked!

To be honest, I was kinda hoping it would fix this issue as well, but I had no idea :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hibernate-reactive Hibernate Reactive area/persistence OBSOLETE, DO NOT USE env/windows Impacts Windows machines kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants