Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LOAD CSV improvements #887

Merged
merged 54 commits into from
Mar 8, 2024
Merged

Conversation

rsill-neo4j
Copy link
Contributor

i think the first few (older) examples could be brought in line with the newer examples i added

also, i'm pretty sure the tests will fail left and right - the expected query results were written by hand ✋

finally, the GitHub diff is not really pretty ( = readable). there ought to be a preview of the changes though

Copy link
Contributor

@AlexicaWright AlexicaWright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
Copy link
Contributor

@stefano-ottolenghi stefano-ottolenghi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a structure that would more effectively cater for the needs of the majority is one like:

  • Import csv files into neo4j
    • Load from local paths (one example + saying where they are sourced by default)
    • Load from remote locations
    • Create constraints
    • Handle large amounts of data
    • Import relationships and datasets coming from relational databases (expand on the section you have, this is important. It's often a source of confusion how people are supposed to handle csv files, what they should do if they get a number of csvs and what if they get one csv with all nodes and relationships inside it)
  • Process data ahead of import
    • Cast CSV columns to relevant Neo4j types
    • Split list values
    • Create additional node labels
  • LOAD CSV options and functions
    • {all subsections from CSV file format}
  • Performance recommendations
  • A full example
  • (Other ways of importing data into Neo4j) (just to give pointers to neo4j-admin import and drivers! Say that avoiding load csv and deferring the csv parsing+querying to your own app is a very valid choice (one that field team very often takes))

modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
@rsill-neo4j rsill-neo4j marked this pull request as ready for review February 22, 2024 09:05
Copy link
Contributor

@JPryce-Aklundh JPryce-Aklundh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @rsill-neo4j!
I have a few editorial suggestions, and also 2 broader points:

  • Maybe the "Uniqueness constraints" and "Handle large amounts of data" can go into a separate section at the end, called something along the lines of "Recommended/Best practices"?
  • As discussed with @gem-neo4j , I think the "Performance" section could be cut from this page and perhaps be extended into its own tutorial.

Also, make sure to follow the new formating style of data types (e.g 'STRING' instead of 'string' (and 'STRING values' instead of 'strings').

modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
Copy link
Contributor

@gem-neo4j gem-neo4j left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think I have added the correct styling (judging from other pages in the docs)

modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
modules/ROOT/pages/clauses/load-csv.adoc Outdated Show resolved Hide resolved
Copy link
Contributor

@gem-neo4j gem-neo4j left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! just one comment now, will approve assuming you will deal with it!

From a graph perspective, these are nodes with different labels, so it takes different queries to load them.

The example executes multiple passes of `LOAD CSV` on that one file, and each pass focuses on the creation of _one_ entity type.
This is the most performant practice, see <<_separate_creation_of_nodes_and_relationships>>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks the <<>> part isn't rendering correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch :) it refers to a section we commented out. gonna update

@neo-technology-commit-status-publisher
Copy link
Collaborator

Thanks for the documentation updates.

The preview documentation has now been torn down - reopening this PR will republish it.

Copy link
Contributor

@JPryce-Aklundh JPryce-Aklundh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Great work @rsill-neo4j!

@JPryce-Aklundh JPryce-Aklundh added the cherry-pick-this-to-5.x Cherry pick this PR changes to the 5.x branch label Mar 8, 2024
@rsill-neo4j rsill-neo4j merged commit c82783c into neo4j:dev Mar 8, 2024
5 checks passed
rsill-neo4j added a commit that referenced this pull request Mar 8, 2024
i think the first few (older) examples could be brought in line with the
newer examples i added

also, i'm pretty sure the tests will fail left and right - the expected
query results were written by hand ✋

finally, the GitHub diff is not really pretty ( = readable). there ought
to be a preview of the changes though

---------

Co-authored-by: Jessica Wright <[email protected]>
Co-authored-by: Stefano Ottolenghi <[email protected]>
Co-authored-by: Jens Pryce-Åklundh <[email protected]>
Co-authored-by: Gem Lamont <[email protected]>
@rsill-neo4j rsill-neo4j added the cherry-picked This PR was merged in another version and the changes has been cherry picked in this PR. label Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-this-to-5.x Cherry pick this PR changes to the 5.x branch cherry-picked This PR was merged in another version and the changes has been cherry picked in this PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants