Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow fetching of lazy relations outside of its original transaction #38916

Open
ymajoros opened this issue Feb 21, 2024 · 12 comments
Open

allow fetching of lazy relations outside of its original transaction #38916

ymajoros opened this issue Feb 21, 2024 · 12 comments
Labels
area/hibernate-orm Hibernate ORM kind/enhancement New feature or request

Comments

@ymajoros
Copy link

ymajoros commented Feb 21, 2024

Description

As a follow-up to #38807, I think it would be very useful to be able to access lazy relations of detached entities:

  • Spring has "Open session in view".
  • Wildfly seems to have a similar mechanism by default (our application is migrating from Wildfly and fetching relations from detached entities somehow wasn't an issue).
  • Eclipselink (JPA RI) also allows to access lazy properties of detached entities by default (not even sure you can disable that).
  • enable_lazy_load_no_trans is mentioned in multiple sources, although I tried to activate it with quarkus.hibernate-orm.unsupported-properties."hibernate.enable_lazy_load_no_trans"=true, which is actually being read, but I couldn't find any usage besides configuration in Hibernate 6.4.
  • Some consider it an antipattern, others have a more nuanced perspective.

Example use case for which it would be practically needed: chained relations, e.g. a Statement has a previousStatement; you want to be able to fetch the previous statement, you'll maybe need the next one afterwards etc. Performance is often mentioned in the CONs, but you would end up fetching the objects one by one when needed anyway, detached entity or not, and an open-in-view mechanism wouldn't make any difference there. On the contrary, it will in some situations be more performant: the relation will only be queried from the database if/when needed instead of being fetched in all cases, eagerly or from a projection.

I suggest implementing a similar feature in Quarkus.

Implementation ideas

No response

@yrodiere
Copy link
Member

Hey @beikov , could you please share your opinion on the right solution for this using Hibernate ORM? I'd definitely be interested in hearing it.

@beikov
Copy link
Contributor

beikov commented Feb 21, 2024

I think the proper solution is to always think about the what data you really need and model this with DTOs to avoid any sort of unpredictable lazy loading.
In this particular case, you seem to need some sort of recursive fetching, so I would suggest you to make use of recursive CTEs to model the data loading e.g.

with statements as (
  select cast(:id) as id
  union all
  select s.previousStatement.id as id from Statement s join statements r on s.id = r.id
)
select s from Statement s where s.id in (select r.id from statements r)

This way, you will select all statements that are reachable via previousStatement associations, starting at the statement with the id as given through the :id parameter.

Ideally, we'd offer a recursive fetch join to model this, but we don't have that yet.

@yrodiere yrodiere added area/persistence OBSOLETE, DO NOT USE area/hibernate-orm Hibernate ORM labels Feb 21, 2024
@ymajoros
Copy link
Author

ymajoros commented Feb 21, 2024

@beikov thanks for your feedback. In my use case:

  • most of the time we don't need the previousStatement
  • ... but some flows do
  • ... and sometimes we need to dig further

As getting a statement is a O(1) operation, but fetching the whole chain is O(n), I do not want to fetch them eagerly or with any kind of projection or recursive fetch. Fetching lazily means that I have a best case of O(1) and a worst case of O(f) (f meaning a few here). The proposed alternative is O(n) best case, making it perform much worse (n can be big here).

There are a lot of other valid use cases. Like the 'debate' about open-session-in-view for spring, there will always be PROs and CONs, and I suggest to not go too deep into this debate here but let users make conscious decisions about their use cases.

Therefore, such a functionality is useful for some users.

@beikov
Copy link
Contributor

beikov commented Feb 21, 2024

Then IMO you should model your different flows explicitly by creating separate data access methods that initializes exactly the state that the caller needs. Any sort of unpredictable lazy initialization will lead to pain, so it's best to avoid it in the first place.
Open-session-in-view or enabling the hibernate.enable_lazy_load_no_trans setting are both recipes for disaster.

but fetching the whole chain is O(n)

Talking about O(n) here is a bit confusing, because the expensive part really is the separate select statement execution, not the lookup in the database itself. Using a recursive fetch like I showed you only does one database request.

There are a lot of other valid use cases. Like the 'debate' about open-session-in-view for spring, there will always be PROs and CONs, and I suggest to not go too deep into this debate here but let users make conscious decisions about their use cases.

The only PRO I see is that you can turn off your brain and not think about the data access design, but that will bite you very fast when you run into performance issues, because changing the design later is painful. So I don't really view that as a PRO, but rather as a CON in disguise. Modelling your data fetching requirements is IMO not something that should be an after-thought just like performance testing shouldn't be, because at some point there will be that one user with hundreds of chained Statement associations who will bring down your whole application.

Therefore, such a functionality is useful for some users.

If you really want this broken behavior, then let the transaction span across the whole request by annotating your HTTP endpoint method with @Transactional. Nobody is preventing you from doing that. Having a global flag is IMO a very bad idea as that will make every HTTP request claim a database connection and also span a transaction including locks for much longer than needed.

@ymajoros
Copy link
Author

This is quite opinionated, and I really have a different view and experience.

Even if the recursive fetch is a single request, it is definitely a O(n) one that I want to avoid. As said, I most often do not need this data.

Of course, we could replace the relation with an ad-hoc call. But that also means refactoring the database to replace the many-to-one by a relation table, which I don't consider better design for this use case.

'Turn off your brain' is rather though wording: I'd rather rephrase it as decoupling the service returning a Statement from whatever the next transaction wants to do with it.

There is no use case where a user will bring down the whole application, for multiple reasons.

This is also a very simple use case, but navigating complex object graphs not known in advance in a different transaction isn't bad design per se. Performance isn't an afterthought here, on the contrary.

I'm new to this project, but having bigger transactions is one of the steps that we are taking further. Still, it doesn't cover all use cases and we sometimes do want a separate one.

Of course, https://hibernate.atlassian.net/browse/HHH-17750 makes our problem worse: if fetching a lazy transient attribute of a detached entity would work in a new transaction, the need for this would be much lower (though still there).

@beikov
Copy link
Contributor

beikov commented Feb 21, 2024

This is quite opinionated, and I really have a different view and experience.

I was asked for an opinion and here you have it. I simply prefer designing the data access rather than burying requirements in e.g. the UI code through implicit lazy initialization. If you ask me, no entity should ever be serialized to JSON, but only DTOs. I understand that this might seem extreme to some folks, but designing the data access this way makes it way simpler to reason about performance and optimize, simply because there are no unknowns.

Even if the recursive fetch is a single request, it is definitely a O(n) one that I want to avoid. As said, I most often do not need this data.

That's totally understandable, hence my suggestion.

Then IMO you should model your different flows explicitly by creating separate data access methods that initializes exactly the state that the caller needs.

Of course, we could replace the relation with an ad-hoc call. But that also means refactoring the database to replace the many-to-one by a relation table, which I don't consider better design for this use case.

I didn't suggest that and I agree that this would not improve the design.

'Turn off your brain' is rather though wording: I'd rather rephrase it as decoupling the service returning a Statement from whatever the next transaction wants to do with it.

I don't know your application design, but it seems to me that your app could know beforehand what the next transaction requires and hence load the appropriate data without the need for lazy loading. It's a matter of creating separate data access methods for the respective flows.

This is also a very simple use case, but navigating complex object graphs not known in advance in a different transaction isn't bad design per se. Performance isn't an afterthought here, on the contrary.
I'm new to this project, but having bigger transactions is one of the steps that we are taking further. Still, it doesn't cover all use cases and we sometimes do want a separate one.

No idea how you are "navigating" in "a different transaction", but maybe you you can solve your issues by simply loading the subsequent object graphs explicitly instead of relying on lazy loading?

Of course, https://hibernate.atlassian.net/browse/HHH-17750 makes our problem worse: if fetching a lazy transient attribute of a detached entity would work in a new transaction, the need for this would be much lower (though still there).

It seems @mbladel is already looking into this issue, so let's maybe keep the discussion about the Jira issue in the Jira comments.

@ymajoros
Copy link
Author

I simply prefer designing the data access rather than burying requirements in e.g. the UI code through implicit lazy initialization.

This isn't UI code. The first transaction is simply committed and we end up with an object with lazy relations.

If you ask me, no entity should ever be serialized to JSON, but only DTOs.

I do strictly agree with that. Not what is happening in my example.

I understand that this might seem extreme to some folks, but designing the data access this way makes it way simpler to reason about performance and optimize, simply because there are no unknowns.

There are unknowns, the first service doesn't know how much data it has to send back. Even the caller doesn't know at call time, it might need to fetch relations. It's the essence of lazy relationships, the fact they are in a different transaction doesn't make this different.

Even if the recursive fetch is a single request, it is definitely a O(n) one that I want to avoid. As said, I most often do not need this data.

That's totally understandable, hence my suggestion.

Then IMO you should model your different flows explicitly by creating separate data access methods that initializes exactly the state that the caller needs.

No, at the time of the first call, I can't decide how many previous statements I need. It also depends on expensive calculations on the statement itself (calling third party services, ...). I really can't know beforehand.

I don't know your application design, but it seems to me that your app could know beforehand what the next transaction requires and hence load the appropriate data without the need for lazy loading. It's a matter of creating separate data access methods for the respective flows.

So, basically, this is not the case.

I also don't see different kind of structures for slightly different use cases as a better design. Your proposal does have its merits to optimize some use cases, but isn't the holy grail either.

JPA is a leaky abstraction, and that's both a problem and some of its strengths. Lazy loading, even in a different transaction, is quite useful in some scenarios. I do agree it can be misused.

No idea how you are "navigating" in "a different transaction", but maybe you you can solve your issues by simply loading the subsequent object graphs explicitly instead of relying on lazy loading?

That means you need to know all the paths that you'll have to follow beforehand. If there is any complex logic or dependencies, this isn't necessarily the case.

It seems @mbladel is already looking into this issue, so let's maybe keep the discussion about the Jira issue in the Jira comments.

Yes. I just meant this is the first reason why we wanted this feature, even it were only for migration.

@beikov
Copy link
Contributor

beikov commented Feb 21, 2024

No, at the time of the first call, I can't decide how many previous statements I need. It also depends on expensive calculations on the statement itself (calling third party services, ...). I really can't know beforehand.
That means you need to know all the paths that you'll have to follow beforehand. If there is any complex logic or dependencies, this isn't necessarily the case.

Maybe I'm not totally understanding, but in your second transaction, you can execute your logic to determine what data needs to be fetched and then load the Statement with the id of the previousStatement along with how many previousStatement you may need, but in one go. Put all of that into a DTO so there is no lazy entity anymore and you're done :)

@ymajoros
Copy link
Author

Maybe I'm not totally understanding, but in your second transaction, you can execute your logic to determine what data needs to be fetched and then load the Statement with the id of the previousStatement along with how many previousStatement you may need, but in one go. Put all of that into a DTO so there is no lazy entity anymore and you're done :)

In the second transaction, I can't know if I need a 3rd or 4th object before checking the second one. How would an ad-hoc call to fetch a new statement be different from a lazy fetch regarding performance? I'm still in the service layer btw.

@lazyBisa
Copy link

lazyBisa commented Mar 8, 2024

maybe another perspective: we are blocked right now in migrating to quarkus since the existing codebase was using eclipselink and making excessive use of lazy-loading data outside of an active session. While this is something we would like to change, the effort required to refactor the persistence layers in all packages is too much effort, and the service constraints (how it's used, what the code does...) justifies living with it.

We tried configuring the hibernate.enable_lazy_load_no_trans property via quarkus.hibernate-orm.unsupported-properties, but it does not load anything.

@ymajoros
Copy link
Author

ymajoros commented Mar 8, 2024

@lazyBisa I do agree for sure, and migration is part of my issue too.

To be honest: though different point of views on detached entities are somewhat interesting, this is irrelevant here as this is what we choose to use in our own projects. Being able to merge detached entities with lazy fields sounds like a basic use case, regardless of how other people like the pattern or not. JPA is a leaky abstraction, and the arguments I heard here seem to be against some core functionalities of JPA. This doesn't seem like the best place for this discussion IMHO.

Anyway, it seems this was considered as a Hibernate bug and fixed in the upcoming 6.4.5 release, which looks almost ready in Jira: see https://hibernate.atlassian.net/browse/HHH-17750 & https://hibernate.atlassian.net/projects/HHH/versions/32262/tab/release-report-all-issues

So, I guess the next step is now to simply wait on that release + integration of Hibernate core 6.4.5 in Quarkus?

@yrodiere
Copy link
Member

So, I guess the next step is now to simply wait on that release + integration of Hibernate core 6.4.5 in Quarkus?

Correct. That will lead to closing #38807, while this specific issue (open session in view for Quarkus, essentially) will still be open for discussion / up for grab.

@yrodiere yrodiere removed the area/persistence OBSOLETE, DO NOT USE label Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/hibernate-orm Hibernate ORM kind/enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants