Optimize save(All) operations #2975

Open
shanon84 opened this issue Nov 24, 2024 · 1 comment

@shanon84

Hey there,
I have been looking into the logic that saves graphs with the Neo4jTemplate. Processing seems to be very slow if the nodes you save are not flat. Let's say we have modelled something like this:
Person -[:lives_at]-> Address

If we now save Persons together with their Addresses, the expected operations would be (simplified):
bulk save all Persons
bulk save all related Addresses
bulk save the relationships between Persons and Addresses as given

(maybe even save all nodes in chunked bulk requests and all relationships in separate ones)

But that's not what happens, even in simple scenarios (no version property, no dynamic labels, self-modelled IDs):
bulk save all Persons
save each Address on its own
save each relationship on its own

This leads to very expensive save operations if you save a graph with more than 2 node types involved.
I think the whole save operation should be overhauled:
a pre-processing step collects and prepares all nodes and relationships to save
once it is known what to save, the operation itself can happen in bulk(s)
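
To make the proposal concrete, here is a minimal, framework-free sketch of the two phases: collect first, then write in chunks. Everything in it (`BulkSavePlanSketch`, `SavePlan`, `collect`, `chunk`, `bulkStatementCount`) is a hypothetical name made up for illustration and is not part of Spring Data Neo4j; it assumes every entity carries an application-assigned id and only counts the bulk statements instead of talking to a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, framework-free model; none of these types exist in Spring Data Neo4j.
final class BulkSavePlanSketch {

    record Address(String id, String city) {}
    record Person(String id, String name, List<Address> livesAt) {}
    record Rel(String fromId, String type, String toId) {}

    // Result of the pre-processing pass: nodes de-duplicated by id, plus all relationships.
    record SavePlan(Map<String, Person> persons,
                    Map<String, Address> addresses,
                    List<Rel> relationships) {}

    // Phase 1: walk the object graph once and only collect; nothing is written here.
    static SavePlan collect(List<Person> roots) {
        Map<String, Person> persons = new LinkedHashMap<>();
        Map<String, Address> addresses = new LinkedHashMap<>();
        List<Rel> relationships = new ArrayList<>();
        for (Person person : roots) {
            persons.putIfAbsent(person.id(), person);
            for (Address address : person.livesAt()) {
                addresses.putIfAbsent(address.id(), address);
                relationships.add(new Rel(person.id(), "lives_at", address.id()));
            }
        }
        return new SavePlan(persons, addresses, relationships);
    }

    // Split a list into chunks so that each chunk can become one bulk request.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    // Phase 2 (counted only): one bulk request per chunk of persons, addresses and relationships.
    static int bulkStatementCount(SavePlan plan, int chunkSize) {
        return chunk(new ArrayList<>(plan.persons().values()), chunkSize).size()
             + chunk(new ArrayList<>(plan.addresses().values()), chunkSize).size()
             + chunk(plan.relationships(), chunkSize).size();
    }
}
```

With this split, the number of requests depends on the chunk size and the number of entity kinds, not on the number of related nodes. A real implementation would additionally have to deal with generated ids, versions and dynamic labels, which this sketch deliberately leaves out.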

If this is fine with you, I would start working on this at some point.

Best regards

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 24, 2024
@meistermeier meistermeier self-assigned this Jan 6, 2025
@meistermeier
Collaborator

Thanks for the report. We are aware that some operations, due to technical restrictions, take more queries than expected. Improvements have already been made to relationship persistence for the case of multiple relationships of the same kind on one entity.
I know that this does not apply to your example, with a lot of root entities that each have just one relationship and one related node. I will keep this issue open, like the other performance improvement issues, because I think it's important to at least take another closer look at whether we could batch here. The last time I revisited this topic, I ended up at a point where it was unclear to the generic code whether the ids of all related entities are also user-generated (@GeneratedValue(generator)). Batching would require a major change in the persistence logic, and I am not yet sure we can find a solution that does not add a performance cost for people who cannot benefit from it because they have different graph structures / use cases. It would require a pre-flight over the reachable entities and relationships to see whether they support such a batch operation.
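
For illustration only, such a pre-flight could look roughly like the sketch below. It is not how Spring Data Neo4j works internally: `PreflightSketch`, `Entity`, `SimpleEntity` and `canBatch` are hypothetical names, and the single criterion used here (every reachable entity has an application-assigned id) is an assumption standing in for whatever conditions the real persistence logic would need to check.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check; not a Spring Data Neo4j API.
final class PreflightSketch {

    // Hypothetical view of an entity as the generic code might see it.
    interface Entity {
        boolean hasApplicationAssignedId();
        List<Entity> related();
    }

    record SimpleEntity(boolean hasApplicationAssignedId, List<Entity> related) implements Entity {}

    // Walks everything reachable from the roots and reports whether all of it is batchable.
    static boolean canBatch(List<? extends Entity> roots) {
        // Identity-based visited set so that shared or cyclic references are visited only once.
        Set<Entity> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Entity> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            Entity current = todo.pop();
            if (!visited.add(current)) {
                continue; // already checked
            }
            if (!current.hasApplicationAssignedId()) {
                return false; // one non-batchable entity forces the non-batched path
            }
            todo.addAll(current.related());
        }
        return true;
    }
}
```

The sketch also shows where the cost concern comes from: the traversal touches every reachable entity before anything is saved, so callers whose graphs fail the check pay for the walk without gaining a batch.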
