Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch operations #232

Merged
merged 5 commits into from
Jun 28, 2022
Merged

Batch operations #232

merged 5 commits into from
Jun 28, 2022

Conversation

AnaNek
Copy link
Contributor

@AnaNek AnaNek commented Nov 11, 2021

Right now CRUD cannot provide batch insert/upsert/replace with full consistency.
CRUD offers batch insert/upsert with partial consistency. That means
that full consistency can be provided only on single replicaset
using box transactions.

Proposed API is the following:

-- Batch insert tuples
local result, err = crud.insert_many(space_name, tuples, opts)
-- Batch insert objects
local result, err = crud.insert_object_many(space_name, objects, opts)
-- Batch upsert tuples
local result, err = crud.upsert_many(space_name, tuples, operations, opts)
-- Batch upsert objects
local result, err = crud.upsert_object_many(space_name, objects, operations, opts)
-- Batch replace tuples
local result, err = crud.replace_many(space_name, tuples, opts)
-- Batch replace objects
local result, err = crud.replace_object_many(space_name, objects, opts)

For insert_many/insert_object_many:

Returns metadata and array contains inserted rows, array of errors (
one error corresponds to one replicaset for which the error occurred).
Error object can contain tuple field. This field contains the tuple
for which the error occurred.

For upsert_many/upsert_object_many:

Returns metadata and array of empty arrays, array of errors (
one error corresponds to one replicaset for which the error occurred).
Error object can contain tuple field. This field contains the tuple
for which the error occurred.

For replace_many/replace_object_many:

Returns metadata and array contains inserted/replaced rows, array of errors (
one error corresponds to one replicaset for which the error occurred).
Error object can contain tuple field. This field contains the tuple
for which the error occurred.

Perf test run on MacBook Pro (2017) i7/16Gb/512SSD
According to results as the number of inserted tuples increases
the average call time grows faster for insert() then
for batch_insert(). For example with batch size == 10000 tuples
insert() is ~26 times slower then batch_insert().

==== BATCH COMPARISON PERFORMANCE REPORT ====

== SUCCESS REQUESTS ==
(The higher the better)

1 10 100 1000 10000
insert 68892 11171 1205 120 13
batch_insert 76141 46159 16627 2276 317

== SUCCESS REQUESTS PER SECOND ==
(The higher the better)

1 10 100 1000 10000
insert 2296.39 372.33 40.15 4.00 0.40
batch_insert 2538.01 1538.62 554.22 75.84 10.56

== ERRORS ==
(Bad if higher than zero)

1 10 100 1000 10000
insert 0 0 0 0 0
batch_insert 0 0 0 0 0

== AVERAGE CALL TIME ==
(The lower the better)

1 10 100 1000 10000
insert 0.434 ms 2.684 ms 24.907 ms 250.079 ms 2484.446 ms
batch_insert 0.393 ms 0.649 ms 1.803 ms 13.183 ms 94.658 ms

== MAX CALL TIME ==
(The lower the better)

1 10 100 1000 10000
insert 10.472 ms 8.554 ms 49.886 ms 294.954 ms 2540.472 ms
batch_insert 10.572 ms 5.614 ms 10.994 ms 31.748 ms 130.214 ms

"insert" was planned for 30 seconds with 1 connections and 1 fibers total.
"batch_insert" was planned for 30 seconds with 1 connections and 1 fibers total.

image

Closes #193

@AnaNek AnaNek marked this pull request as draft November 11, 2021 09:45
@AnaNek AnaNek requested a review from Totktonada November 11, 2021 09:45
@akudiyar
Copy link
Contributor

akudiyar commented Dec 5, 2021

Hi @AnaNek!
Thank you very much for implementing this, it's a very important problem. Here are my thoughts about the possible implementation, I hope you will consider them too:

  • Inserting tuple-by-tuple is not very efficient. When there's a task of butch inserting/replacing, in most of the cases that means operating with one bucket / one Tarantool node in a transaction, for the most efficiency
  • If we have a storage name and a bucket ID, we can efficiently do batch replacing on a storage node.
  • We can split the incoming tuples on the router by bucket ID, pack them into batches and send batches to the storage nodes.
  • The result of the operation must be consistent (but it may be chosen by the user) -- e.g., if one tuple fails to insert, the whole pack is considered failed and a comprehensive error is returned to the user. Also when inserting large sets of data we need to consider lowering the network traffic impact -- thus, we need to provide the ability to switch off returning the results and do some research about the minimization of extra data (metadata) transferring between client/router and router/storage.

@AnaNek AnaNek force-pushed the batch-insert-upsert branch 2 times, most recently from 0a947f0 to e691e0b Compare February 3, 2022 09:53
@AnaNek AnaNek marked this pull request as ready for review February 3, 2022 10:04
@AnaNek AnaNek force-pushed the batch-insert-upsert branch 5 times, most recently from e6ea1ce to a78cf5f Compare February 7, 2022 09:06
@AnaNek AnaNek requested a review from ligurio February 10, 2022 10:36
Copy link
Member

@ligurio ligurio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patches!

CHANGELOG.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
crud/common/sharding.lua Outdated Show resolved Hide resolved
crud/common/utils.lua Outdated Show resolved Hide resolved
test/integration/batch_operations_test.lua Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@AnaNek AnaNek force-pushed the batch-insert-upsert branch 3 times, most recently from 9d50c96 to 86d471f Compare April 9, 2022 19:08
Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments on insert.

README.md Outdated Show resolved Hide resolved
crud/common/call.lua Outdated Show resolved Hide resolved
crud/batch_insert.lua Outdated Show resolved Hide resolved
crud/common/sharding/init.lua Outdated Show resolved Hide resolved
crud/batch_insert.lua Outdated Show resolved Hide resolved
crud/batch_insert.lua Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
crud/common/utils.lua Outdated Show resolved Hide resolved
crud/common/utils.lua Outdated Show resolved Hide resolved
Copy link
Member

@DifferentialOrange DifferentialOrange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patch! The comparison graph seems astounding. I don't see such improvements every day.

I left some comments after review. I haven't got to review some details yet, but I think there are already plenty of crucial question that we need to resolve.

README.md Outdated Show resolved Hide resolved
crud/batch_insert.lua Outdated Show resolved Hide resolved
crud/batch_insert.lua Outdated Show resolved Hide resolved
crud/batch_insert.lua Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
crud/common/utils.lua Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
test/integration/stats_test.lua Outdated Show resolved Hide resolved
test/performance/perf_test.lua Show resolved Hide resolved
@DifferentialOrange
Copy link
Member

Oh, and if you have any questions about sharding reload, feel free to ask me any time. It may really be complicated.

@AnaNek AnaNek force-pushed the batch-insert-upsert branch from 86d471f to f65afe3 Compare May 13, 2022 11:07
@ligurio ligurio removed their request for review May 16, 2022 09:28
@AnaNek AnaNek force-pushed the batch-insert-upsert branch 7 times, most recently from 715b6ef to 94f7650 Compare May 19, 2022 00:19
CHANGELOG.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@AnaNek AnaNek force-pushed the batch-insert-upsert branch 2 times, most recently from 53d7fb5 to 6c0477e Compare June 24, 2022 12:13
@AnaNek
Copy link
Contributor Author

AnaNek commented Jun 24, 2022

I am going to change upsert_many()/upsert_object_many() API but other operations done.

Copy link
Member

@DifferentialOrange DifferentialOrange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ok. I have left several comment about README, feel free to ignore or fix.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@Totktonada
Copy link
Member

Totktonada commented Jun 24, 2022

Just in case, my expectations:

  1. Issue re noreturn (see Batch operations #232 (comment)). Added to Add new flag "return nothing" in options for DML operations #267.
  2. Issue re returning index of a tuple in the error (it should ease client's code in tt crud import).
  3. In upsert_many(): don't return rows.
  4. In upsert_many(): accept an array of {tuple, ops} tables as input (instead of tuples and ops arrays separately).

I'm going to look at insert/replace now.

crud/common/schema.lua Outdated Show resolved Hide resolved
crud/common/utils.lua Outdated Show resolved Hide resolved
Copy link
Member

@Totktonada Totktonada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the upsert API (see above).

I'm not completely sure that I understand how the schema/shading info reload works in different cases, but, well, it is covered by tests and if there are some tricky cases, there is no sense to stop here for a month in attempts to find them.

crud/upsert_many.lua Outdated Show resolved Hide resolved
@AnaNek AnaNek force-pushed the batch-insert-upsert branch from 6c0477e to 91d7815 Compare June 27, 2022 11:59
@AnaNek AnaNek marked this pull request as draft June 27, 2022 12:00
@AnaNek AnaNek force-pushed the batch-insert-upsert branch from 91d7815 to db644af Compare June 27, 2022 13:27
Batch insert is mostly used for operation with
one bucket / one Tarantool node in a transaction.
In this case batch insert is more efficient
then inserting tuple-by-tuple.
Right now CRUD cannot provide batch insert with full consistency.
CRUD offers batch insert with partial consistency. That means
that full consistency can be provided only on single replicaset
using `box` transactions.

Part of #193
@AnaNek AnaNek force-pushed the batch-insert-upsert branch from db644af to 32769f7 Compare June 28, 2022 10:48
AnaNek added 4 commits June 28, 2022 14:09
Batch upsert is mostly used for operation with
one bucket / one Tarantool node in a transaction.
In this case batch upsert is more efficient
then upserting tuple-by-tuple.
Right now CRUD cannot provide batch upsert with full consistency.
CRUD offers batch upsert with partial consistency. That means
that full consistency can be provided only on single replicaset
using `box` transactions.

Part of #193
Batch upsert is mostly used for operation with
one bucket / one Tarantool node in a transaction.
In this case batch replace is more efficient
then replacing tuple-by-tuple.
Right now CRUD cannot provide batch replace with full consistency.
CRUD offers batch upsert with partial consistency. That means
that full consistency can be provided only on single replicaset
using `box` transactions.

Part of #193
Before this commit `simple_operation_cases`
were organized as map
(table indexed not with numbers),
in Lua iteration over map does not occur in
the order in which the elements were specified
in the map.
But simple operation tests could fail
in case if tests would be executed not
in the order in which they are specified,
because, for example, if `replace()` is performed
before `insert()`, an error will be received.
So simple operation tests are codependent.
To solve this problem `truncate_space_on_cluster`
was added after each simple operation test.

Part of #193
Since we have PR #244 it will be nice to collect
statistics for batch operations too.
To establish the effectiveness of `crud.batch_insert()`
method compared to `crud.insert()`, perf tests were added.
`crud.insert()` in the loop and `crud.batch_insert()`
are compared for different batch sizes.

Closes #193
@AnaNek AnaNek force-pushed the batch-insert-upsert branch from 32769f7 to f08319f Compare June 28, 2022 11:11
@AnaNek AnaNek marked this pull request as ready for review June 28, 2022 11:24
@Totktonada
Copy link
Member

I glanced on the new upsert_many() implementation and I don't see any problems visually. I think the patchset is ready to go.

Thank you for the hard work!

@Totktonada Totktonada merged commit 7f84d60 into master Jun 28, 2022
@Totktonada Totktonada deleted the batch-insert-upsert branch June 28, 2022 17:07
@iimos
Copy link

iimos commented Oct 20, 2022

@AnaNek what storage engine did you use in benchmarks?

@DifferentialOrange
Copy link
Member

@AnaNek what storage engine did you use in benchmarks?

We use memtx for the benchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add batch insert/upsert/insert_objects/upsert_objects
6 participants