Revamp age csv loader (#2044) #2068

Merged (1 commit) on Aug 22, 2024

Commits on Aug 22, 2024

  1. Revamp age csv loader (apache#2044)

    * Allow 0 as entry_id
    
    - No regression tests were impacted by this change.
    
    * Use batch inserts to improve performance
    
    - Changed heap_insert to heap_multi_insert since it is faster than
      calling heap_insert() in a loop. When multiple tuples can be inserted
      on a single page, a single WAL record covers all of them, and the
      page only needs to be locked/unlocked once.
    
    - BATCH_SIZE is set to 1000, which is the number of tuples to insert in
      a single batch. This number was chosen after some experimentation.
    
    - Change some of the field names to avoid confusion.
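The batching described above can be sketched outside PostgreSQL as follows. This is an illustrative Python model only, not the loader's C code: `BatchInserter`, `load_rows`, and `flush_fn` are hypothetical names, and `flush_fn` stands in for the single multi-insert call. Only the batch size of 1000 comes from the commit message.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

BATCH_SIZE = 1000  # value stated in the commit message

Row = Tuple[int, str]  # hypothetical (entry id, properties) pair


class BatchInserter:
    """Buffers rows and hands them to flush_fn in groups of batch_size."""

    def __init__(self, flush_fn: Callable[[Sequence[Row]], None],
                 batch_size: int = BATCH_SIZE) -> None:
        self._flush_fn = flush_fn  # stand-in for one multi-insert call
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        self._buffer.append(row)
        # Flush as soon as the buffer holds a full batch.
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Pass a copy so the callee can keep the batch after we clear.
        if self._buffer:
            self._flush_fn(list(self._buffer))
            self._buffer.clear()


def load_rows(rows: Iterable[Row],
              flush_fn: Callable[[Sequence[Row]], None],
              batch_size: int = BATCH_SIZE) -> int:
    """Feeds every row through the inserter and returns the row count."""
    inserter = BatchInserter(flush_fn, batch_size)
    count = 0
    for row in rows:
        inserter.add(row)
        count += 1
    inserter.flush()  # write out the final, possibly partial, batch
    return count
```

The final `flush()` matters: without it, any rows left over after the last full batch would never be written.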
    
    * Use sequence for generating ids for edge and vertex
    
    - The sequence is not used if id_field_exists is true in the
      load_labels_from_file function, since the entry id is already present
      in the csv.
    
    * Add function to create temporary table for ids
    
    - Create a temporary table and populate it with the already generated
      vertex ids. A unique index is created on the id column to ensure that
      new ids generated (using the entry id from the csv) are unique.
    
    * Insert generated ids in the temporary table to enforce uniqueness
    
    - Insert ids into the temporary table and update its index to
      enforce uniqueness.
    - If the entry id provided in the CSV is greater than the current
      sequence value, the sequence value is updated to match the entry id.
      For example:
      Suppose the current sequence value is 1, and the CSV entry id is 2.
      If we use 2 but do not update the sequence to 2, the next time the
      CREATE clause is used, the sequence will return 2 as an entry id,
      resulting in a duplicate.
    - Update batch functions
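The uniqueness check and the sequence adjustment can be modelled in a few lines. This is a sketch under stated assumptions, not the loader's implementation: `IdAllocator`, `use_csv_id`, `next_id`, and `DuplicateIdError` are hypothetical names, a Python set stands in for the temporary table with its unique index, and an integer stands in for the sequence.

```python
from typing import Iterable, Set


class DuplicateIdError(ValueError):
    """Raised when an entry id has already been used."""


class IdAllocator:
    """Toy model of the id bookkeeping described in the commit message."""

    def __init__(self, existing_ids: Iterable[int] = (),
                 last_value: int = 0) -> None:
        # Stand-in for the temporary table and its unique index.
        self._seen: Set[int] = set(existing_ids)
        # Stand-in for the sequence's current value.
        self.last_value = last_value

    def use_csv_id(self, entry_id: int) -> int:
        """Registers an entry id taken from the CSV file."""
        if entry_id in self._seen:
            raise DuplicateIdError(f"entry id {entry_id} already exists")
        self._seen.add(entry_id)
        # Move the sequence forward so it never hands out this id later.
        if entry_id > self.last_value:
            self.last_value = entry_id
        return entry_id

    def next_id(self) -> int:
        """Generates an id from the sequence (no id column in the CSV)."""
        self.last_value += 1
        self._seen.add(self.last_value)
        return self.last_value
```

With the sequence at 1 and a CSV entry id of 2, `use_csv_id(2)` moves the sequence to 2, so the following `next_id()` call yields 3 instead of a duplicate 2.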
    
    * Add functions to create graph and label automatically
    
    - These functions check whether the graph and label exist, and create
      them if they don't.
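The check-then-create behaviour amounts to an idempotent "ensure" operation. The sketch below illustrates that pattern with an in-memory dictionary standing in for the catalog. `ensure_graph`, `ensure_label`, and `Catalog` are hypothetical names and do not correspond to functions in the commit.

```python
from typing import Dict, Set

# Hypothetical in-memory catalog: graph name -> set of label names.
Catalog = Dict[str, Set[str]]


def ensure_graph(catalog: Catalog, graph_name: str) -> bool:
    """Creates the graph if it is missing; returns True if it was created."""
    if graph_name in catalog:
        return False
    catalog[graph_name] = set()
    return True


def ensure_label(catalog: Catalog, graph_name: str, label_name: str) -> bool:
    """Creates the label (and its graph) if missing; True if label created."""
    ensure_graph(catalog, graph_name)
    labels = catalog[graph_name]
    if label_name in labels:
        return False
    labels.add(label_name)
    return True
```

Calling `ensure_label` twice with the same arguments leaves the catalog unchanged the second time, which is what lets a load run without a separate setup step.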
    
    * Add regression tests
    MuhammadTahaNaveed committed Aug 22, 2024
    99a6bab