Optimize save(All) operations #2975

shanon84 · 2024-11-24T10:38:52Z

Hey there,
I have been looking into the logic that saves graphs with the Neo4jTemplate. Processing seems to be very slow if the nodes you save are not flat, i.e. if they have related nodes. Let's say we have modelled something like this:
Person -[:lives_at]-> Address

If we now save Persons with Addresses, the expected operations would be (simplified):
1. bulk save all Persons
2. bulk save all related Addresses
3. bulk save the relationships between Persons and Addresses as given

(maybe even save all nodes in chunked bulk requests and all relationships in separate ones)

But that's not what happens (even in simple scenarios: no version property, no dynamic labels, IDs modelled by the user):
1. bulk save all Persons
2. save each Address on its own
3. save each relationship on its own

This leads to very expensive save operations if you save a graph with more than 2 node types involved.
I think the whole save operation should be overhauled:
1. a pre-processing step collects and prepares all nodes and relationships to save
2. once it is known what to save, the operation itself can happen in bulk(s)
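To make the idea concrete, here is a rough sketch of the two phases for the Person/Address example. Everything in it is hypothetical: `BulkSavePlanner`, `SavePlan` and the two records are simplified stand-ins, not Spring Data Neo4j classes, and the Cypher strings only illustrate what one statement per group could look like. It assumes all IDs are known before saving.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BulkSavePlanner {

    // Hypothetical, simplified entities; not Spring Data Neo4j types.
    record Address(String id, String city) {}
    record Person(String id, String name, Address livesAt) {}

    // Output of the pre-processing pass: one row list per bulk statement.
    record SavePlan(List<Map<String, Object>> personRows,
                    List<Map<String, Object>> addressRows,
                    List<Map<String, Object>> livesAtRows) {}

    // Phase 1: walk the object graph once and collect everything to write.
    static SavePlan plan(List<Person> people) {
        List<Map<String, Object>> personRows = new ArrayList<>();
        Map<String, Map<String, Object>> addressesById = new LinkedHashMap<>();
        List<Map<String, Object>> livesAtRows = new ArrayList<>();
        for (Person p : people) {
            personRows.add(Map.of("id", p.id(), "name", p.name()));
            Address a = p.livesAt();
            if (a != null) {
                // An Address shared by several Persons is collected only once.
                addressesById.putIfAbsent(a.id(), Map.of("id", a.id(), "city", a.city()));
                livesAtRows.add(Map.of("from", p.id(), "to", a.id()));
            }
        }
        return new SavePlan(personRows, new ArrayList<>(addressesById.values()), livesAtRows);
    }

    // Phase 2: one Cypher statement per group; each one would receive
    // the matching row list as its $rows parameter.
    static List<String> statements() {
        return List.of(
            "UNWIND $rows AS row MERGE (n:Person {id: row.id}) SET n += row",
            "UNWIND $rows AS row MERGE (n:Address {id: row.id}) SET n += row",
            "UNWIND $rows AS row MATCH (p:Person {id: row.from}) "
                + "MATCH (a:Address {id: row.to}) MERGE (p)-[:lives_at]->(a)");
    }
}
```

With this shape the number of statements depends on the number of node and relationship types, not on the number of entities. The row lists could additionally be split into chunks before sending.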

If this is fine with you, I would start working on this at some point.

Best regards

meistermeier · 2025-01-08T08:43:14Z

Thanks for the report. We are aware that some operations, due to technical restrictions, take more queries than expected. Improvements have already been made to relationship persistence for the case of multiple relationships of the same type on one entity.
I know that this does not apply to your example with a lot of root entities that each have just one relationship and one related node. I will keep this issue open, like the other performance improvements, because I think it's important to at least take another closer look at whether we could batch here. The last time I revisited this topic, I ended up at a point where it was unclear to the generic code whether the ids of all related entities are also user-generated (@GeneratedValue(generator)). This would require a major change in the persistence logic, and I am not yet sure we could find a solution that does not add a performance cost for people who cannot benefit from it because they have different graph structures or use cases. It would require a pre-flight over the reachable entities and relationships to see whether they would support such a batch operation.
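A pre-flight of that kind could look roughly like the sketch below. It is only an illustration under simplifying assumptions: `BatchPreflight` and `TypeInfo` are hypothetical names, not part of the SDN mapping API, and the check works on entity types instead of on the concrete reachable instances. The three conditions are the ones named in the report: ids known before saving, no version property, no dynamic labels.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BatchPreflight {

    // Hypothetical per-type metadata; illustrative only.
    record TypeInfo(boolean idAssignedBeforeSave,
                    boolean hasVersionProperty,
                    boolean hasDynamicLabels,
                    List<String> relatedTypes) {}

    // Returns true only if every type reachable from rootType can be batched.
    static boolean supportsBatch(String rootType, Map<String, TypeInfo> model) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(rootType);
        while (!pending.isEmpty()) {
            String type = pending.pop();
            if (!seen.add(type)) {
                continue; // already checked; this also guards against cycles
            }
            TypeInfo info = model.get(type);
            if (info == null
                    || !info.idAssignedBeforeSave()
                    || info.hasVersionProperty()
                    || info.hasDynamicLabels()) {
                return false; // fall back to the existing per-entity save path
            }
            info.relatedTypes().forEach(pending::push);
        }
        return true;
    }
}
```

The cost concern mentioned above shows up here as the traversal itself: it runs before every save, including for graphs that end up not qualifying for the batch path.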

spring-projects-issues added the status: waiting-for-triage label Nov 24, 2024

meistermeier self-assigned this Jan 6, 2025
