Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Persistence units" sometimes hangs/is very slow to load in dev ui for trivial app #39930

Open
holly-cummins opened this issue Apr 7, 2024 · 20 comments
Labels
area/dev-ui area/hibernate-orm Hibernate ORM kind/bug Something isn't working

Comments

@holly-cummins
Copy link
Contributor

Describe the bug

I've noticed that sometimes when walking through a simple Quarkus demo (the same one in https://quarkus.io/guides/getting-started-dev-services), when I try and show how the persistence units can be read off the dev ui, I just get a Loading... page on http://localhost:8080/q/dev-ui/io.quarkus.quarkus-hibernate-orm/persistence-units.

Expected behavior

We expect something like this, of course.
image

Actual behavior

http://localhost:8080/q/dev-ui/io.quarkus.quarkus-hibernate-orm/persistence-units just shows Loading ...

This is an intermittent issue (sorry!). When I tried to reproduce just now, I couldn't get Loading, but I could get a blank page if I changed my persistent units and then reloaded the persistence units url:

image

How to Reproduce?

  1. Find a large audience. While I can reproduce locally, it reproduces most reliably in a live demo. This is a good example of "audience-driven testing." :)
  2. Follow the instructions at https://github.com/holly-cummins/simple-quarkus-demo

As mentioned above, this is an intermittent issue, so I don't know exactly why it happens. It's about 50% of the time for me, but of course when I was raising this issue and trying to get a screencap, it was about 0%.

Output of uname -a or ver

No response

Output of java -version

No response

Quarkus version or git rev

3.9.2, seen on 3.9.0 and probably earlier

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@holly-cummins holly-cummins added the kind/bug Something isn't working label Apr 7, 2024
@quarkus-bot quarkus-bot bot added area/dev-ui area/persistence OBSOLETE, DO NOT USE labels Apr 7, 2024
Copy link

quarkus-bot bot commented Apr 7, 2024

/cc @cescoffier (devui), @phillip-kruger (devui)

@phillip-kruger
Copy link
Member

phillip-kruger commented Apr 7, 2024

It could be that the websocket is not connected yet, check the bottom left corner. If your browser was open but the server was down, the timeout for reconnect happens, and then it can take a while to reconnect. I am not sure how to fix this. We want to do the auto-reconnect (that is nice for hot-reload) but maybe we should stop trying after a certain amount / time and just show a pop-up that will retry ? w.d.y.t ?

@holly-cummins
Copy link
Contributor Author

I think it probably is related to a restart in some way; the dev ui is working, but then I add the import.sql and that forces a restart.

I'd be cautious about displaying a pop-up, though. This behaviour is in the Should Not Happen category, and if we display a pop-up when it happens, it might just make the problem more obvious, without actually helping much. A pop-up would be extra-bad if it happens as often for others as it seems to happen to me; the dev ui would just be flashing up pop-ups all the time, and looking pretty dysfunctional. Whereas probably some of the time, users might not even notice, or might think it's just them.

I'd be more inclined to do what we can to pretend it didn't happen and do a sneaky recovery. :)

@phillip-kruger
Copy link
Member

I'll have a look and see if I can recreate it

@phillip-kruger
Copy link
Member

I did not manage to recreate the issue you described. If you get that again, please can you see if there is any errors in the browser's console log ?

I had a look at the code and there everything look fine. The loading is basically displayed until the persistence units are loaded from the json-rpc backend. (That should be fairly instant)

So what I thought the issues could be in my previous response is not correct. This seems to be an issue specifically to this extension. When I shut down the server and then navigate to the page I can see the loading, but as soon as the server is back up, the page display, even without refreshing.

So to fix this we need to be able to recreate it so that we can look in the console for errors . Please post if you have something or have a way to definitely recreate the issue. Thanks :)

@holly-cummins
Copy link
Contributor Author

Understood about no quick fix. Intermittent issues are the worst!

I often see the problem when I'm doing a live demo (see "audience-driven testing" in the reproducer steps) ... and in those cases, it's not ideal to pause and divert to gathering diagnostics. :)
But I also sometimes see it when I'm practicing a demo, so if I catch it I'll gather console.log. (I wish I'd thought of that when I saw it last time, doh!)

@ITrium-Salah
Copy link

ITrium-Salah commented Apr 30, 2024

``I was actually facing the same issue, depending of the project sometime i can't access the persistent unit info. Sometime the bug appear 100% for my programming session and it is very annoying (I depend so much on the ddl produced by this tool)
Being exposed to this bug 100% of the time on certain sessions/projects I was able to analyze the json-rpc exchanges

imagedevui

We saw clearly that the "io.quarkus.quarkus-hibernate-orm.getInfo" json-rpc call is never answered by the server.
We also see that the browser send all the grpc call in double for the hibernate-orm extension (the server responds to requests in this case) except for the getInfo.

Actually the problem come from the ReactiveImprovedExtractionContextImp.getQueryResultSet and the join call that never return.

The bug also show (but in a secondary manner) a DeadLock on the HibernateOrmDevInfo$PersistenceUnit, this class have 3 methode synchronized on the instance that called external supplier for the ddl generation.
it is a bad choice to export a synchronization on instance or class to functions or Suplier over which we do not have control

In this class:

public static class PersistenceUnit {

    private final String name;
    private final List<Entity> managedEntities;
    private final List<Query> namedQueries;
    private final List<Query> namedNativeQueries;
    private String createDDL;
    private String dropDDL;
    private String updateDDL;
    private final Supplier<String> createDDLSupplier;
    private final Supplier<String> dropDDLSupplier;
    private final Supplier<String> updateDDLSupplier;

    public PersistenceUnit(String name, List<Entity> managedEntities,
            List<Query> namedQueries,
            List<Query> namedNativeQueries, Supplier<String> createDDL, Supplier<String> dropDDL,
            Supplier<String> updateDDLSupplier) {
        this.name = name;
        this.managedEntities = managedEntities;
        this.namedQueries = namedQueries;
        this.namedNativeQueries = namedNativeQueries;
        this.createDDLSupplier = createDDL;
        this.dropDDLSupplier = dropDDL;
        this.updateDDLSupplier = updateDDLSupplier;
    }

    public String getName() {
        return name;
    }

    public List<Entity> getManagedEntities() {
        return managedEntities;
    }

    public List<Query> getNamedQueries() {
        return namedQueries;
    }

    public List<Query> getNamedNativeQueries() {
        return namedNativeQueries;
    }

    public List<Query> getAllNamedQueries() {
        ArrayList<Query> allQueries = new ArrayList<>();
        allQueries.addAll(namedQueries);
        allQueries.addAll(namedNativeQueries);
        return allQueries;
    }

    public synchronized String getCreateDDL() {
        if (createDDL == null) {
            createDDL = createDDLSupplier.get();
        }
        return createDDL;
    }

    public synchronized String getDropDDL() {
        if (dropDDL == null) {
            dropDDL = dropDDLSupplier.get();
        }
        return dropDDL;
    }

    public synchronized String getUpdateDDL() {
        if (updateDDL == null) {
            updateDDL = updateDDLSupplier.get();
        }
        return updateDDL;
    }

}

we can, at least, replace the 3 getters by:

    public String getCreateDDL() {
        synchronized(createDDLSupplier){
            if (createDDL == null) {
                createDDL = createDDLSupplier.get();
            }
            return createDDL;
      }
    }

    public String getDropDDL() {
        synchronized(dropDDLSupplier){
            if (dropDDL == null) {
                dropDDL = dropDDLSupplier.get();
            }
            return dropDDL;
       }
    }

    public String getUpdateDDL() {
        synchronized(updateDDLSupplier){
            if (updateDDL == null) {
                updateDDL = updateDDLSupplier.get();
            }
            return updateDDL;
       }
    }

With this change the getDDL are not blocking each other if one of the supplier is stuck or have a long process to complete.

It seem that the root problem of the bug is comming from the Pooled connections or the reactive way to access/release connection.
The bug appear when the getUpdateDDL is called (the more complex generation that will acquire the moste connecions on the pool)
To reproduce the bug:

  1. Setup: Hibernate ORM reactive, Pg-reactive-client, Quarkus-Rest

  2. Add a REST route to avoid the need of the quarkus webui

       @GET
       @Path("update")
       public String update()
       {
           for (HibernateOrmDevInfo.PersistenceUnit persistenceUnit : HibernateOrmDevController.get().getInfo().getPersistenceUnits()) {
               return persistenceUnit.getUpdateDDL();
           }
           return "";
       }
    
  3. Add a "stress" to your database connections
    a. By having the postgresql number of connections close to the default max connections
    b. by forcing the maximum number of connections of the PgPool : 'quarkus.datasource.reactive.max-size=3'

  4. Call the endpoint, the call will be stuck at ReactiveImprovedExtractionContextImp.getQueryResultSet join call

If this bug is related to the PgPool connections and not at all with the DevInfo (the query are sent outside a transaction and maybe the bug is confined in this logic only) maybe it can explain some beahvior that i see with my applications randomly.

I hope this "analyze" will help.

Workaround:

  1. Increase the number of connections on the DB (postgresql max-connections)
  2. a. Start your program without db connection
    b. call the persistent-unit info in the devui (this will work without db connection)
    c. restart your database, the dev-ui will use the cached HibernateDevInfo data

@phillip-kruger
Copy link
Member

This is probably something that the hibernate-orm maintainers should look at.

/cc @yrodiere @gsmet

@ITrium-Salah
Copy link

This is probably something that the hibernate-orm maintainers should look at.

/cc @yrodiere @gsmet

I can add the thread dump if it can help? (we see on the dump the deadlock DevInfo problem and the hibernate join problem)

@yrodiere
Copy link
Member

Thanks for your analysis @ITrium-Salah. I understand that you have potential improvements in mind; they make a lot of sense, so feel free to send a pull request!

Do I understand correctly that the root of the issue is still unclear?

I doubt the bug is in PgPool, since @holly-cummins has been experiencing it with blocking Hibernate ORM, which uses a different connection pool. At this point I'd focus investigations on the Dev UI or Hibernate ORM.

This is probably something that the hibernate-orm maintainers should look at.
/cc @yrodiere @gsmet

I can add the thread dump if it can help? (we see on the dump the deadlock DevInfo problem and the hibernate join problem)

Please do, I'll take any source of information in case I can't reproduce the problem.

@ITrium-Salah
Copy link

Thanks for your analysis @ITrium-Salah. I understand that you have potential improvements in mind; they make a lot of sense, so feel free to send a pull request!

Do I understand correctly that the root of the issue is still unclear?

I doubt the bug is in PgPool, since @holly-cummins has been experiencing it with blocking Hibernate ORM, which uses a different connection pool. At this point I'd focus investigations on the Dev UI or Hibernate ORM.

This is probably something that the hibernate-orm maintainers should look at.
/cc @yrodiere @gsmet

I can add the thread dump if it can help? (we see on the dump the deadlock DevInfo problem and the hibernate join problem)

Please do, I'll take any source of information in case I can't reproduce the problem.

Yes the root cause is still unknown for me, the only think that make me think about a problem with connections pool is the fact that i can make this bug systematic with quarkus.datasource.reactive.max-size=3 to force the connection pool to be bounded.

For the improvement of devinfo yes it seems to me a shame to treat these tasks sequentially even if they are cached afterwards. I also wonder if other parts of the codes use instance or class synchronization. In my experience, I was confronted with some very efficient competitive code which ended up being sequential because of a call to a function whose synchronization was too brutal (synchronized this or synchronized class).
I'm ok to create a pull request, do you have a guide, or a "how to", to help the new contributors?

Threads Dump with deadlock:

"executor-thread-7@17903" tid=0xda nid=NA waiting
  java.lang.Thread.State: WAITING
	 blocks executor-thread-11@18604
	 blocks executor-thread-16@18611
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
	  at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1864)
	  at java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
	  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
	  at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1898)
	  at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2117)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResultSet(ReactiveImprovedExtractionContextImpl.java:113)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResults(ReactiveImprovedExtractionContextImpl.java:79)
	  at org.hibernate.reactive.provider.service.AbstractReactiveInformationSchemaBasedExtractorImpl.processColumnsResultSet(AbstractReactiveInformationSchemaBasedExtractorImpl.java:291)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.addColumns(AbstractInformationExtractorImpl.java:893)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.extractTableInformation(AbstractInformationExtractorImpl.java:855)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.lambda'$5(AbstractInformationExtractorImpl.java:809)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl$$Lambda/0x00000288b5ec7678.process(Unknown Source:-1)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResults(ReactiveImprovedExtractionContextImpl.java:80)
	  at org.hibernate.reactive.provider.service.AbstractReactiveInformationSchemaBasedExtractorImpl.processTableResultSet(AbstractReactiveInformationSchemaBasedExtractorImpl.java:236)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.locateTableInNamespace(AbstractInformationExtractorImpl.java:804)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.getTable(AbstractInformationExtractorImpl.java:355)
	  at org.hibernate.tool.schema.extract.internal.DatabaseInformationImpl.getTableInformation(DatabaseInformationImpl.java:113)
	  at org.hibernate.tool.schema.extract.internal.DatabaseInformationImpl.locateTableInformation(DatabaseInformationImpl.java:151)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.lambda$getForeignKeys$9(AbstractInformationExtractorImpl.java:1313)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl$$Lambda/0x00000288b5ec7020.process(Unknown Source:-1)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResults(ReactiveImprovedExtractionContextImpl.java:80)
	  at org.hibernate.reactive.provider.service.PostgreSqlReactiveInformationExtractorImpl.processImportedKeysResultSet(PostgreSqlReactiveInformationExtractorImpl.java:139)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.getForeignKeys(AbstractInformationExtractorImpl.java:1292)
	  at org.hibernate.tool.schema.extract.internal.TableInformationImpl.foreignKeys(TableInformationImpl.java:97)
	  at org.hibernate.tool.schema.extract.internal.TableInformationImpl.getForeignKeys(TableInformationImpl.java:91)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.equivalentForeignKeyExistsInDatabase(AbstractSchemaMigrator.java:483)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.checkForExistingForeignKey(AbstractSchemaMigrator.java:475)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.applyForeignKeys(AbstractSchemaMigrator.java:442)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.performMigration(AbstractSchemaMigrator.java:268)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.doMigration(AbstractSchemaMigrator.java:117)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevController.generateDDL(HibernateOrmDevController.java:161)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevController$DDLSupplier.get(HibernateOrmDevController.java:104)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevController$DDLSupplier.get(HibernateOrmDevController.java:88)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit.getUpdateDDL(HibernateOrmDevInfo.java:107)
	  - locked <0x4910> (a io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit)
	  at java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder:-1)
	  at java.lang.invoke.LambdaForm$MH/0x00000288b500c000.invoke(LambdaForm$MH:-1)
	  at java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder:-1)
	  at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:153)
	  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	  at java.lang.reflect.Method.invoke(Method.java:580)
	  at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:688)
	  at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:770)

"executor-thread-11@18604" tid=0xde nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 waiting for executor-thread-7@17903 to release lock on <0x4910> (a io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit.getCreateDDL(HibernateOrmDevInfo.java:92)
	  at java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder:-1)
	  at java.lang.invoke.LambdaForm$MH/0x00000288b500c000.invoke(LambdaForm$MH:-1)
	  at java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder:-1)
	  at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:153)
	  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	  at java.lang.reflect.Method.invoke(Method.java:580)
	  at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:688)

"executor-thread-16@18611" tid=0xe3 nid=NA waiting for monitor entry
  java.lang.Thread.State: BLOCKED
	 waiting for executor-thread-7@17903 to release lock on <0x4910> (a io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit.getCreateDDL(HibernateOrmDevInfo.java:92)
	  at java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(DirectMethodHandle$Holder:-1)
	  at java.lang.invoke.LambdaForm$MH/0x00000288b500c000.invoke(LambdaForm$MH:-1)
	  at java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder:-1)
	  at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:153)
	  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	  at java.lang.reflect.Method.invoke(Method.java:580)
	  at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:688)

Thread dump without the deadlock:

"executor-thread-4@17797" tid=0xce nid=NA waiting
  java.lang.Thread.State: WAITING
	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
	  at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1864)
	  at java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
	  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
	  at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1898)
	  at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2117)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResultSet(ReactiveImprovedExtractionContextImpl.java:113)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResults(ReactiveImprovedExtractionContextImpl.java:79)
	  at org.hibernate.reactive.provider.service.AbstractReactiveInformationSchemaBasedExtractorImpl.processColumnsResultSet(AbstractReactiveInformationSchemaBasedExtractorImpl.java:291)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.addColumns(AbstractInformationExtractorImpl.java:893)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.extractTableInformation(AbstractInformationExtractorImpl.java:855)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.lambda$locateTableInNamespace$5(AbstractInformationExtractorImpl.java:809)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl$$Lambda/0x000001d28ee8ce78.process(Unknown Source:-1)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResults(ReactiveImprovedExtractionContextImpl.java:80)
	  at org.hibernate.reactive.provider.service.AbstractReactiveInformationSchemaBasedExtractorImpl.processTableResultSet(AbstractReactiveInformationSchemaBasedExtractorImpl.java:236)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.locateTableInNamespace(AbstractInformationExtractorImpl.java:804)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.getTable(AbstractInformationExtractorImpl.java:355)
	  at org.hibernate.tool.schema.extract.internal.DatabaseInformationImpl.getTableInformation(DatabaseInformationImpl.java:113)
	  at org.hibernate.tool.schema.extract.internal.DatabaseInformationImpl.locateTableInformation(DatabaseInformationImpl.java:151)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.lambda$getForeignKeys$9(AbstractInformationExtractorImpl.java:1313)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl$$Lambda/0x000001d28ee8c820.process(Unknown Source:-1)
	  at org.hibernate.reactive.provider.service.ReactiveImprovedExtractionContextImpl.getQueryResults(ReactiveImprovedExtractionContextImpl.java:80)
	  at org.hibernate.reactive.provider.service.PostgreSqlReactiveInformationExtractorImpl.processImportedKeysResultSet(PostgreSqlReactiveInformationExtractorImpl.java:139)
	  at org.hibernate.tool.schema.extract.internal.AbstractInformationExtractorImpl.getForeignKeys(AbstractInformationExtractorImpl.java:1292)
	  at org.hibernate.tool.schema.extract.internal.TableInformationImpl.foreignKeys(TableInformationImpl.java:97)
	  at org.hibernate.tool.schema.extract.internal.TableInformationImpl.getForeignKeys(TableInformationImpl.java:91)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.equivalentForeignKeyExistsInDatabase(AbstractSchemaMigrator.java:483)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.checkForExistingForeignKey(AbstractSchemaMigrator.java:475)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.applyForeignKeys(AbstractSchemaMigrator.java:442)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.performMigration(AbstractSchemaMigrator.java:268)
	  at org.hibernate.tool.schema.internal.AbstractSchemaMigrator.doMigration(AbstractSchemaMigrator.java:117)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevController.generateDDL(HibernateOrmDevController.java:161)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevController$DDLSupplier.get(HibernateOrmDevController.java:104)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevController$DDLSupplier.get(HibernateOrmDevController.java:88)
	  at io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit.getUpdateDDL(HibernateOrmDevInfo.java:107)
	  - locked <0x458f> (a io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo$PersistenceUnit)

@yrodiere
Copy link
Member

Thanks for the info, I'll try to find time to have a closer look.

I'm ok to create a pull request, do you have a guide, or a "how to", to help the new contributors?

Yes: https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md
It looks long but actually only the first few sections (up to and including "Before you contribute") really need a read, the other ones are only there if you need them.

@yrodiere
Copy link
Member

yrodiere commented May 2, 2024

@ITrium-Salah Unless I'm mistaken, your thread dump does not demonstrate a deadlock, but simply a connection starvation... ?

Now of course, if generating a single update script involves multiple concurrent connections, in extreme cases it's possible that multiple concurrent calls to getUpdateDDL (or even a single call) would acquire the whole connection pool and be left waiting for a new connection, possibly deadlocked.

That being said, @holly-cummins is telling us this happens with a very simple demo, so I assume there's a single persistence unit, and most likely the connection pool size is the default of 20, which should be more than enough to avoid such a problem...

Some notes in case this really is the problem:

  • We could mitigate it by making the locks in io.quarkus.hibernate.orm.runtime.dev.HibernateOrmDevInfo.PersistenceUnit more coarse-grained, i.e. share the same lock across persistence units, to prevent them from generating their update script concurrently and therefore ensure each generation will have access to enough concurrent connections.
  • We could mitigate it by making sure the UI is more "lazy" and only retrieves this part of the information for a given PU when the relevant tab gets displayed.
  • We could fix it by avoiding the use of multiple concurrent transactions in schema update script generation in Hibernate ORM and/or Hibernate Reactive.
    For Hibernate ORM I don't see an obvious place where this happens; in fact we're taking care to use a single connection all along, so I doubt this is the problem.
    For Hibernate Reactive, weirdly it seems to acquire a connection to then immediately ignore it and acquire a second one from the pool in selectJdbcOutsideTransaction. That's probably a bug @DavideD , isn't it? At the very least I'd expect the first connection not to be acquired.

Anyway, I had a look, and as expected, I can't reproduce the problem (unless I force a connection starvation) with Holly's reproducer, and as expected, guesswork is not leading anywhere:

  • as mentioned above, the connection starvation is unlikely in Holly's case, and in any case I couldn't find a place in Hibernate ORM where we'd use multiple connections concurrently during DDL generation.
  • I could imagine a connection starvation caused by a connection leak, but... where would that leak come from?

If anyone encounters this again, please take a thread dump! Especially if using Hibernate ORM. Hibernate Reactive is nice and all, but thread dumps of reactive applications miss a lot of information :/

@holly-cummins, any talk in front of 400 people planned in the near future? :-)

@ITrium-Salah
Copy link

The deadlock is NOT the cause of the bug but just an observation when i was investigating on it.
The thread dump is clear on the dead lock

"executor-thread-7@17903" tid=0xda nid=NA waiting
  java.lang.Thread.State: WAITING
	 blocks executor-thread-11@18604
	 blocks executor-thread-16@18611

The thread 7 is blocking thread 11 and 16 and the bug causing an infinte wait the thread 7 will never release is lock resource.

My remark on the PersitenceUnit code of devmode, was simply to note a synchronization that was much too broad with regard to the functional need for synchronization.
This is a language difficulty that often misleads developers, when writing:

class A{
  public synchronized void methodA(){}
  public synchronized void methodB(){}
  public static synchronized void methodStaticA(){}
  public static synchronized void methodStaticB(){}
}

Developers think that:

  1. If two threads call methodA on the same instance the execution will be sequential

  2. If two threads call methodB on the same instance the execution will be sequential

  3. If two threads call methodStaticA the execution will be sequential

  4. If two threads call methodStaticB the execution will be sequential
    Even Some developpers think:

  5. If two threads call methodA on two different instances the execution will be sequential

point n°5 is of course false, and 1,2,3,4 true but partially.
If one thread call methodA and another one call methodB the execution will be also sequential (same for static methods), the keyword on the method suggests that only the method is synchronized and this is where misunderstandings and underperformance code can appear.
My example code is actually equivalent to:

class A{
  public void methodA(){ 
        synchronized(this){}
    }
  public void methodB(){ 
        synchronized(this){}
    }
  public static void methodStaticA(){ 
        synchronized(A.class){}
    }
  public static void methodStaticB(){
        synchronized(A.class){}
    }
}

We see very clearly with this writing the cost of the synchronized keyword on the methods that I personally avoid using, especially in object-oriented programming where it is difficult to know what the parent or child classes do with the object "this" or "Object.class"

that said my proposed modification of the HibernateOrmDevInfo$PersistenceUnit was simply to remove the excess synchronization and the bug is not related with this. (i will try to find time for the pull request)

@yrodiere I don't understand why executing an SQL query would require a new connection if others are available. If no new connection is available the thread should be blocked while the current requests are completed and then resume execution? I don't know how hibernate is implemented but it would seem logical to me that it works like this being able to handle a large number of requests.

I have created a projet https://github.com/ITrium-Salah/quarkus-orm-issue-39930
just run mvn quarkus:dev and curl http://127.0.0.1:8080/test
The curl command will be blocked, to force the bug i have set the reactive max size to 1 connection.

NB: My english is bad, i'm sorry by advance for this long post but i want to be sure to be as much clear i can.

@yrodiere
Copy link
Member

yrodiere commented May 15, 2024

@yrodiere I don't understand why executing an SQL query would require a new connection if others are available

It happens in situations where Hibernate ORM needs to run SQL outside of the current transaction, and cannot or has no way to suspend the current transaction. Then the only solution is to use a separate transaction. Don't ask me why it's necessary, but it happens -- probably because of quirks specific to some databases or JDBC drivers. See https://github.com/yrodiere/hibernate-reactive/blob/aecefba2ddb08ca5c2d609c9ad2a179f49d7d6c0/hibernate-reactive-core/src/main/java/org/hibernate/reactive/pool/ReactiveConnection.java#L78, https://github.com/hibernate/hibernate-orm/blob/81a3541d2624d546efac71c75d8e0335bc1b7ee3/hibernate-core/src/main/java/org/hibernate/resource/transaction/spi/IsolationDelegate.java#L14.
Like I said though, nowadays I don't think Hibernate ORM (non-reactive) does this for schema generation.

Now I certainly agree that looks like a bug:

For Hibernate Reactive, weirdly it seems to acquire a connection to then immediately ignore it and acquire a second one from the pool in selectJdbcOutsideTransaction. That's probably a bug @DavideD , isn't it? At the very least I'd expect the first connection not to be acquired.

I've reported it as a bug: hibernate/hibernate-reactive#1909

But that's limited to Hibernate Reactive, so not the bug @holly-cummins was experiencing.

I'll see if I can reproduce the same problem by changing your reproducer to use Hibernate ORM.

@yrodiere
Copy link
Member

I'll see if I can reproduce the same problem by changing your reproducer to use Hibernate ORM.

I can't. With quarkus.datasource.jdbc.max-size=1 and the same reproducer, Hibernate ORM performs just fine. I also tried disabling the updateDDL cache and running thousands of requests in parallel, same result.

So, while the problem @ITrium-Salah is experiencing can be explained by hibernate/hibernate-reactive#1909, we still lack a reproducer for Holly's problem with Hibernate ORM, which is different.

@holly-cummins
Copy link
Contributor Author

I just experienced this again ... with an audience. I tried it in my hotel room later, and the persistence units view worked fine.

@holly-cummins
Copy link
Contributor Author

I think it was in the same session, I visited the dev services view and got a "You do not have any Dev Services running." message. I definitely did have a dev service running (I'd just been showing it off!). I'm not sure if the same connection issue could cause both glitches, or if my tech was just cursed today.

@phillip-kruger
Copy link
Member

@holly-cummins if this is due to dev-ui's websocket not connected, this PR should make this better in the future: #43841

@holly-cummins
Copy link
Contributor Author

Fingers crossed. I guess next time it happens I should try a different window or browser, too. There's no way I can do a debug sequence in front of an audience, but discreetly popping across to another window might be do-able. If that sorts it out, it gives us useful diagnostic information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dev-ui area/hibernate-orm Hibernate ORM kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants