Revamp age csv loader #2044
Merged: jrgemignani merged 7 commits into apache:master from MuhammadTahaNaveed:age-load-revamp on Aug 14, 2024
MuhammadTahaNaveed (Member) commented on Aug 13, 2024 (edited)
- Allow 0 as entry_id
  - No regression tests were impacted by this change.
- Use batch inserts to improve performance
  - Replaced heap_insert with heap_multi_insert, which is faster than calling heap_insert() in a loop: when multiple tuples fit on a single page, a single WAL record covers all of them and the page is locked and unlocked only once.
  - BATCH_SIZE, the number of tuples inserted in a single batch, is set to 1000. This value was chosen after some experimentation.
  - Renamed some fields to avoid confusion.
- Use a sequence to generate ids for edges and vertices
  - The sequence is not used when id_field_exists is true in the load_labels_from_file function, since the entry id is already present in the CSV.
- Add a function to create a temporary table for ids (used only when loading vertices)
  - The temporary table is created and populated with the already generated vertex ids the first time load_labels_from_file is called. A unique index on the id column ensures that new ids generated from the CSV entry ids are unique. The table and index are dropped automatically when the session ends.
  - Whenever a row is inserted into a label, the corresponding id is also inserted into the temporary table.
- Add functions to create the graph and label automatically
  - These functions check whether the graph and label exist and create them if they don't.
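The batching described above can be illustrated outside PostgreSQL. What follows is a minimal Python sketch, not AGE's C implementation: rows are buffered and handed to a flush callback in groups of BATCH_SIZE, where each callback invocation stands in for one heap_multi_insert call. The names BatchInserter and flush_fn are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Sequence

BATCH_SIZE = 1000  # batch size quoted in the PR description


class BatchInserter:
    """Buffers rows and flushes them in groups of `batch_size`.

    Hypothetical helper: `flush_fn` stands in for a single
    heap_multi_insert call covering the whole batch.
    """

    def __init__(self, flush_fn: Callable[[Sequence[tuple]], None],
                 batch_size: int = BATCH_SIZE) -> None:
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer: List[tuple] = []

    def add(self, row: tuple) -> None:
        # Collect the row; flush once the buffer reaches the batch size.
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand over whatever is buffered (possibly a partial batch).
        if self.buffer:
            self.flush_fn(list(self.buffer))
            self.buffer.clear()


# Record the size of every flushed batch instead of touching a database.
flushed_sizes: List[int] = []
inserter = BatchInserter(lambda rows: flushed_sizes.append(len(rows)))

for i in range(2500):
    inserter.add((i, f"name_{i}"))
inserter.flush()  # the final, partial batch must be flushed explicitly

print(flushed_sizes)  # [1000, 1000, 500]
```

The explicit flush at the end matters: without it, the last 500 rows in this example would stay in the buffer and never be written.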
- Insert generated ids in the temporary table to enforce uniqueness
  - Ids are inserted into the temporary table, and its unique index is updated to enforce uniqueness.
  - If the entry id provided in the CSV is greater than the current sequence value, the sequence is advanced to match the entry id. For example, suppose the current sequence value is 1 and the CSV entry id is 2. If 2 were used without updating the sequence to 2, the next CREATE clause would receive 2 from the sequence as an entry id, producing a duplicate.
  - Update the batch functions.
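The interaction between CSV-supplied entry ids and the sequence can be sketched as follows. This is a simplified Python model under stated assumptions, not the loader's actual code: a set stands in for the temporary table with its unique index, an integer stands in for the sequence's current value, and the class and method names (EntryIdAllocator, use_csv_id, next_id) are hypothetical. It models entry ids only and ignores how AGE combines them with label ids.

```python
from typing import Iterable, Set


class EntryIdAllocator:
    """Toy model of the id bookkeeping described in the PR.

    `used` plays the role of the temporary id table with a unique index;
    `current` plays the role of the sequence's last returned value.
    """

    def __init__(self, existing_ids: Iterable[int] = ()) -> None:
        # Populate with ids that were already generated.
        self.used: Set[int] = set(existing_ids)
        self.current: int = max(self.used, default=0)

    def _record(self, entry_id: int) -> None:
        # Mimics a unique-index violation on duplicate ids.
        if entry_id in self.used:
            raise ValueError(f"duplicate entry id: {entry_id}")
        self.used.add(entry_id)

    def next_id(self) -> int:
        # Sequence-generated id, e.g. for a later CREATE clause.
        self.current += 1
        self._record(self.current)
        return self.current

    def use_csv_id(self, entry_id: int) -> int:
        # Entry id taken from the CSV; 0 is accepted.
        if entry_id < 0:
            raise ValueError("entry id must be non-negative")
        self._record(entry_id)
        # Advance the sequence so it never hands out this id again.
        if entry_id > self.current:
            self.current = entry_id
        return entry_id


# The scenario from the PR: sequence value is 1, the CSV supplies id 2.
alloc = EntryIdAllocator(existing_ids=[1])
alloc.use_csv_id(2)
alloc.use_csv_id(0)          # 0 is a valid entry id
generated = alloc.next_id()  # would be 2 (a duplicate) without the bump
print(generated)  # 3
```

If the bump in use_csv_id were removed, next_id would try to hand out 2 and the uniqueness check would raise, which is the duplicate the PR description warns about.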
github-actions (bot) added the master and override-stale (to keep issues/PRs untouched from stale action) labels on Aug 13, 2024
jrgemignani approved these changes on Aug 14, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 19, 2024
* Allow 0 as entry_id
  - No regression test were impacted by this change.
* Use batch inserts to improve performance
  - Changed heap_insert to heap_multi_insert since it is faster than calling heap_insert() in a loop. When multiple tuples can be inserted on a single page, just a single WAL record covering all of them, and only need to lock/unlock the page once.
  - BATCH_SIZE is set to 1000, which is the number of tuples to insert in a single batch. This number was chosen after some experimentation.
  - Change some of the field names to avoid confusion.
* Use sequence for generating ids for edge and vertex
  - Sequence is not used if the id_field_exists is true in load_labels_from_file function, since the entry id is present in the csv.
* Add function to create temporary table for ids
  - Created a temporary table and populate it with already generated vertex ids. A unique index is created on id column to ensure that new ids generated (using entry id from csv) are unique.
* Insert generated ids in the temporary table to enforce uniqueness
  - Insert ids in the temporary table and also update index to enforce uniqueness.
  - If the entry id provided in the CSV is greater than the current sequence value, the sequence value is updated to match the entry ID. For example: Suppose the current sequence value is 1, and the CSV entry ID is 2. If we use 2 but not update the sequence to 2, next time the CREATE clause is used, 2 will be returned by sequence as an entry id, resulting in duplicate.
  - Update batch functions
* Add functions to create graph and label automatically
  - These functions will check existence of graph and label, and create them if they don't exist.
* Add regression tests
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 19, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 19, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 19, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 19, 2024
jrgemignani pushed a commit that referenced this pull request on Aug 19, 2024
jrgemignani pushed a commit that referenced this pull request on Aug 19, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 20, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 20, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 20, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 22, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 22, 2024
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request on Aug 22, 2024
jrgemignani pushed a commit that referenced this pull request on Aug 22, 2024
jrgemignani pushed a commit that referenced this pull request on Aug 22, 2024
jrgemignani pushed a commit that referenced this pull request on Aug 22, 2024
jrgemignani pushed a commit that referenced this pull request on Aug 22, 2024