Batch operations #232
Hi @AnaNek!
Thanks for the patches!
My comments on insert.
Thank you for your patch! The comparison graph seems astounding. I don't see such improvements every day.
I left some comments after review. I haven't got to review some details yet, but I think there are already plenty of crucial questions that we need to resolve.
Oh, and if you have any questions about sharding reload, feel free to ask me any time. It may really be complicated.
I am going to change |
Should be ok. I have left several comments about the README, feel free to ignore or fix them.
Just in case, my expectations:
I'm going to look at insert/replace now.
LGTM except the upsert API (see above).
I'm not completely sure that I understand how the schema/sharding info reload works in different cases, but, well, it is covered by tests, and if there are some tricky cases, there is no sense in stopping here for a month trying to find them.
Batch insert is mostly used for operations on one bucket / one Tarantool node in a transaction. In this case batch insert is more efficient than inserting tuple-by-tuple. Right now CRUD cannot provide batch insert with full consistency. CRUD offers batch insert with partial consistency. That means that full consistency can be provided only on a single replicaset using `box` transactions. Part of #193
Batch upsert is mostly used for operations on one bucket / one Tarantool node in a transaction. In this case batch upsert is more efficient than upserting tuple-by-tuple. Right now CRUD cannot provide batch upsert with full consistency. CRUD offers batch upsert with partial consistency. That means that full consistency can be provided only on a single replicaset using `box` transactions. Part of #193
Batch replace is mostly used for operations on one bucket / one Tarantool node in a transaction. In this case batch replace is more efficient than replacing tuple-by-tuple. Right now CRUD cannot provide batch replace with full consistency. CRUD offers batch replace with partial consistency. That means that full consistency can be provided only on a single replicaset using `box` transactions. Part of #193
Before this commit, `simple_operation_cases` was organized as a map (a table not indexed by numbers). In Lua, iteration over a map does not follow the order in which the elements were specified. The simple operation tests could fail if they were executed in a different order than specified: for example, if `replace()` is performed before `insert()`, an error is received. So the simple operation tests depended on each other. To solve this problem, `truncate_space_on_cluster` was added after each simple operation test. Part of #193
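The ordering problem described in this commit message can be sketched with a small model. This is plain Python, not the project's Lua test code: `Space`, `cases`, and `run_cases` are hypothetical names, and the space is just a dict keyed by the first tuple field. Python dicts keep insertion order, so the execution order is passed explicitly to imitate Lua's unordered map iteration.

```python
class Space:
    """Hypothetical in-memory stand-in for a space with a unique primary key."""

    def __init__(self):
        self.rows = {}

    def insert(self, t):
        # Like an insert into a unique index: fails if the key already exists.
        if t[0] in self.rows:
            raise KeyError("duplicate key")
        self.rows[t[0]] = t

    def replace(self, t):
        # Replace never fails on an existing key.
        self.rows[t[0]] = t

    def truncate(self):
        self.rows.clear()


# Two "simple operation" cases that touch the same key.
cases = {
    "insert": lambda s: s.insert((1, "a")),
    "replace": lambda s: s.replace((1, "b")),
}


def run_cases(order, truncate_after_each):
    """Run the cases in the given order and return the names that failed."""
    space = Space()
    failed = []
    for name in order:
        try:
            cases[name](space)
        except KeyError:
            failed.append(name)
        if truncate_after_each:
            # Plays the role of truncating the space after each case.
            space.truncate()
    return failed


without_cleanup = run_cases(["replace", "insert"], truncate_after_each=False)
with_cleanup = run_cases(["replace", "insert"], truncate_after_each=True)
```

Without the cleanup, running `replace` first leaves a row behind and `insert` fails; with the cleanup, either order passes.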
Since we have PR #244 it will be nice to collect statistics for batch operations too. To establish the effectiveness of `crud.batch_insert()` method compared to `crud.insert()`, perf tests were added. `crud.insert()` in the loop and `crud.batch_insert()` are compared for different batch sizes. Closes #193
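The shape of such a comparison can be sketched as follows. This is not the perf test from the PR: `average_call_time`, `insert_in_loop`, and `batch_insert` are hypothetical helpers working on an in-memory dict, so the numbers it produces say nothing about Tarantool or CRUD. It only shows how one call on a whole batch can be timed against a per-tuple loop for several batch sizes.

```python
import time


def average_call_time(fn, make_args, calls=100, clock=time.perf_counter):
    """Call fn(*make_args(i)) `calls` times and return the mean duration."""
    total = 0.0
    for i in range(calls):
        args = make_args(i)  # build arguments outside the timed region
        start = clock()
        fn(*args)
        total += clock() - start
    return total / calls


store = {}  # in-memory placeholder, not a Tarantool space


def insert(t):
    store[t[0]] = t


def insert_in_loop(tuples):
    # One call per tuple, as in the tuple-by-tuple variant.
    for t in tuples:
        insert(t)


def batch_insert(tuples):
    # One call for the whole batch (placeholder body, same effect on the dict).
    store.update((t[0], t) for t in tuples)


results = {}
for size in (10, 100):
    def make_batch(i, size=size):
        return ([(i * size + k, "v") for k in range(size)],)

    results[size] = {
        "insert": average_call_time(insert_in_loop, make_batch, calls=5),
        "batch_insert": average_call_time(batch_insert, make_batch, calls=5),
    }
```

The `clock` parameter is there so the harness can be checked with a fake clock instead of real time.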
I glanced on the new Thank you for the hard work! |
@AnaNek what storage engine did you use in benchmarks?
We use |
Right now CRUD cannot provide batch insert/upsert/replace with full consistency.
CRUD offers batch insert/upsert/replace with partial consistency. That means
that full consistency can be provided only on a single replicaset
using `box` transactions.

Proposed API is the following:
For `insert_many`/`insert_object_many`:

Returns metadata and an array containing the inserted rows, plus an array of errors
(one error corresponds to one replicaset for which the error occurred).
An error object can contain a `tuple` field. This field contains the tuple for which the error occurred.
For `upsert_many`/`upsert_object_many`:

Returns metadata and an array of empty arrays, plus an array of errors
(one error corresponds to one replicaset for which the error occurred).
An error object can contain a `tuple` field. This field contains the tuple for which the error occurred.
For `replace_many`/`replace_object_many`:

Returns metadata and an array containing the inserted/replaced rows, plus an array of errors
(one error corresponds to one replicaset for which the error occurred).
An error object can contain a `tuple` field. This field contains the tuple for which the error occurred.
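To make the partial-consistency wording concrete, here is a toy model in plain Python. It is not the CRUD implementation or its API: `insert_many_model`, `shard_of`, and the `replicaset`/`err` keys are hypothetical, and it assumes that a failing tuple rolls back only its own replicaset's share of the batch while other replicasets still commit. Only the `tuple` field of the error comes from the description above.

```python
from collections import defaultdict


class DuplicateKey(Exception):
    pass


def insert_many_model(replicasets, tuples, shard_of):
    """Apply a batch so that each replicaset's share is all-or-nothing."""
    # Split the batch by target replicaset.
    groups = defaultdict(list)
    for t in tuples:
        groups[shard_of(t)].append(t)

    inserted, errors = [], []
    for rs_name, batch in groups.items():
        storage = replicasets[rs_name]
        staged = dict(storage)  # stand-in for a local `box` transaction
        try:
            for t in batch:
                if t[0] in staged:
                    raise DuplicateKey(t)
                staged[t[0]] = t
        except DuplicateKey as e:
            # "Rollback": drop the staged copy, report one error for this
            # replicaset together with the tuple that caused it.
            errors.append({
                "replicaset": rs_name,
                "err": "duplicate key",
                "tuple": e.args[0],
            })
            continue
        # "Commit": publish the staged state for this replicaset only.
        storage.clear()
        storage.update(staged)
        inserted.extend(batch)
    return inserted, errors


# rs2 already holds key 2, so its share of the batch is rolled back,
# including (4, "d"), while rs1 commits its share.
replicasets = {"rs1": {}, "rs2": {2: (2, "old")}}
shard_of = lambda t: "rs1" if t[0] % 2 else "rs2"
rows, errs = insert_many_model(
    replicasets, [(1, "a"), (4, "d"), (2, "b"), (3, "c")], shard_of
)
```

In this model the caller gets both the rows that were committed and one error per failed replicaset, so a batch can partly succeed across the cluster while staying consistent within each replicaset.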
Perf tests were run on a MacBook Pro (2017) i7/16Gb/512SSD.

According to the results, as the number of inserted tuples increases,
the average call time grows faster for `insert()` than for `batch_insert()`.
For example, with batch size == 10000 tuples, `insert()` is ~26 times slower than `batch_insert()`.

==== BATCH COMPARISON PERFORMANCE REPORT ====
== SUCCESS REQUESTS ==
(The higher the better)
== SUCCESS REQUESTS PER SECOND ==
(The higher the better)
== ERRORS ==
(Bad if higher than zero)
== AVERAGE CALL TIME ==
(The lower the better)
== MAX CALL TIME ==
(The lower the better)
"insert" was planned for 30 seconds with 1 connections and 1 fibers total.
"batch_insert" was planned for 30 seconds with 1 connections and 1 fibers total.
Closes #193