
sql: batching between statements #19177

Closed
petermattis opened this issue Oct 11, 2017 · 30 comments
petermattis commented Oct 11, 2017

A query containing multiple DML statements should gather the KV operations into a single batch request, regardless of whether RETURNING NOTHING is specified. For example, consider a query containing 3 INSERT statements (i.e. these statements all arrive in the same pgwire request):

BEGIN;
INSERT INTO kv VALUES (1, 1); 
INSERT INTO kv VALUES (2, 2); 
INSERT INTO kv VALUES (3, 3);
COMMIT;

The INSERT statements are currently executed serially, which translates to poor performance. We can make this slightly better using RETURNING NOTHING:

BEGIN;
INSERT INTO kv VALUES (1, 1) RETURNING NOTHING; 
INSERT INTO kv VALUES (2, 2) RETURNING NOTHING; 
INSERT INTO kv VALUES (3, 3) RETURNING NOTHING;
COMMIT;

With RETURNING NOTHING, we'll now parallelize the above work. An even better approach (faster, with less resource consumption) is to use a single INSERT statement:

BEGIN;
INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3);
COMMIT;

Unfortunately, not all applications can easily be changed to use a batched insert or RETURNING NOTHING. It seems feasible to convert multiple INSERT statements received at the same time into a single KV batch request. Lots of refactoring will be necessary to perform error handling correctly and to deal with complexities like foreign keys, but those don't seem like insurmountable problems.

As a first step, we should quantify the performance difference between the 3 approaches.
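
A rough way to take that measurement (a sketch, not a proper benchmark: the connection string, database name, and a kv (k INT PRIMARY KEY, v INT) schema are assumptions, and lib/pq only sends a multi-statement string as a single pgwire request when no placeholders are used):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func timeIt(db *sql.DB, name, stmts string) {
	start := time.Now()
	// With no placeholder arguments, lib/pq sends the whole string as one
	// simple query, so all statements arrive in the same pgwire request.
	if _, err := db.Exec(stmts); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%-25s %s\n", name, time.Since(start))
	if _, err := db.Exec(`DELETE FROM kv`); err != nil { // reset between runs
		log.Fatal(err)
	}
}

func main() {
	db, err := sql.Open("postgres",
		"postgres://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	timeIt(db, "serial", `
BEGIN;
INSERT INTO kv VALUES (1, 1);
INSERT INTO kv VALUES (2, 2);
INSERT INTO kv VALUES (3, 3);
COMMIT;`)

	timeIt(db, "returning nothing", `
BEGIN;
INSERT INTO kv VALUES (1, 1) RETURNING NOTHING;
INSERT INTO kv VALUES (2, 2) RETURNING NOTHING;
INSERT INTO kv VALUES (3, 3) RETURNING NOTHING;
COMMIT;`)

	timeIt(db, "single multi-row insert", `
BEGIN;
INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3);
COMMIT;`)
}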

@petermattis petermattis added this to the 1.2 milestone Oct 11, 2017

tbg commented Oct 11, 2017

This can easily be a pessimization as well, since

INSERT INTO kv VALUES (1, 1) RETURNING NOTHING; 
INSERT INTO kv VALUES (2, 2) RETURNING NOTHING; 
INSERT INTO kv VALUES (3, 3) RETURNING NOTHING;

runs three single-statement txns (each a candidate for one-phase commit)

and

INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3)

runs a three-statement txn. It's certainly going to be faster for that particular schema and row count, but generally we should be careful about creating "spread out" transactions. Such transactions would be in a single batch, but DistSender then does all the splitting into small batches again, which makes me wonder whether it's really going to be that much faster, given the longer-lived intents. The third example, for instance, should even perform worse than the second if all three values live on separate ranges (since DistSender will hold back the EndTransaction and we'll eat extra round trips), unless the SQL overhead really dominates.

I'm also concerned that it'll become less straightforward for users to steer what gets batched up with what and how large the de-facto transactions are.

Looking forward to the explorations. I expect the "right thing" to be fairly complex, but I'm ready to be surprised.

@petermattis commented:

That's a good point, though I should clarify that the single query string is usually wrapped in BEGIN...COMMIT. I'll adjust the original message.

@jordanlewis commented:

Is there an efficient way to do that DistSender work of splitting-by-range earlier in the stack to inform a decision? At least in the case where we're not in a SQL transaction.

Alternatively, isn't there a "DistSender mode" or sequence of commands that produce a non-transactional batch? If we already know that our input SQL batch isn't in a transaction, we should propagate that information down through whatever new pathway is required here to prevent having to re-txnify them.

@nvanbenschoten commented:

Unfortunately, not all applications can easily be changed to use a batched insert or RETURNING NOTHING

This is certainly true, but I wonder how often these same applications are able to batch multiple statements into the same pgwire request.

An alternative, which I've discussed with @robert-s-lee before, is offering a session variable that makes all applicable statements use RETURNING NOTHING semantics, even without the individual statements specifying it. Because of the differences in error handling and return values, this would, of course, need to be opt-in.

It's also possible for us to do this fully transparently, with no client intervention, for any batch that contains a COMMIT statement. This is fine because we'll necessarily have to wait for all statements to finish before responding to the client, so in this case we can return the results of the parallel-executed statements instead of dropping them on the floor.
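
A minimal sketch of that eligibility check, assuming the executor sees the batch as a list of statement strings (the helper is hypothetical; the prefix check stands in for real AST-based classification, and "ends in COMMIT" is a simplification of "contains a COMMIT"):

package sketch

import "strings"

// canDeferResults reports whether a batch of statements that arrived in a
// single pgwire request is eligible for transparent parallel execution: if
// the batch ends in COMMIT, the client cannot act on any intermediate
// result before the server answers the whole request, so per-statement
// results can be buffered and returned at the end instead of dropped.
func canDeferResults(stmts []string) bool {
	if len(stmts) == 0 {
		return false
	}
	last := strings.ToUpper(strings.TrimSpace(stmts[len(stmts)-1]))
	return strings.HasPrefix(last, "COMMIT")
}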

@nvanbenschoten commented:

Alternatively, isn't there a "DistSender mode" or sequence of commands that produce a non-transactional batch?

@jordanlewis could you expand on this? We run single statements in implicit transactions, and I'm not seeing how we could avoid doing so while still providing intra-statement ACID guarantees, especially if the statement spans ranges.

@jordanlewis commented:

@nvanbenschoten we don't need intra-statement ACID guarantees when the input batch wasn't wrapped in BEGIN..END, right? Or do we?

@nvanbenschoten commented:

Don't we? With the statement INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3), we can't have the (3,3) insert on one range but the (1,1) and (2,2) fail on another.

@jordanlewis commented:

Oh - we're talking about different things in that case. That is a single statement, in my understanding.

I was discussing multi-statement batches of the form:

INSERT INTO kv VALUES (1,1);
INSERT INTO kv VALUES (2,2);

@nvanbenschoten commented:

Oh, I see where we were misunderstanding each other. I think @petermattis's correction above was clarifying that (at least for this discussion) we're always talking about batching statements within the same Txn. Batching across Txn boundaries would be much more difficult.


robert-s-lee commented Oct 11, 2017 via email

@bdarnell commented:

This is certainly true, but I wonder how often these same applications are able to batch multiple statements into the same pgwire request.

Yeah, I think this is a major concern. My sense is that the ability to send multiple commands at once like this is fairly uncommon (less common than the ability to send single-statement batched inserts). Before we try to quantify the performance differences we should see whether anyone would be able to take advantage of it.

Alternatively, isn't there a "DistSender mode" or sequence of commands that produce a non-transactional batch?

Currently, if the DistSender splits a batch, it wraps it in a transaction (grep for OpRequiresTxnError for context) unless the batch specifies INCONSISTENT reads or only contains requests that cannot be part of a transaction (per the isTxn method flag). It would be possible to add a flag to the batch header indicating that the DistSender may split up the batch without wrapping it in a transaction, although it may be tricky to distinguish ambiguous from unambiguous errors when this happens. We'd also need to make sure that each INSERT statement remains atomic if we're not already in a transaction: if you did INSERT INTO kv VALUES (1), (2); INSERT INTO kv VALUES (3), the writes for 1 and 2 would need to be atomic. In practice we'd probably have to just disable this optimization in that case.
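
To make the flag idea concrete, a hedged sketch of the decision it would relax (the flag and helper names are hypothetical and do not exist in the real BatchRequest header; the actual check is where DistSender returns OpRequiresTxnError):

package sketch

import "errors"

// errOpRequiresTxn stands in for the real OpRequiresTxnError.
var errOpRequiresTxn = errors.New("batch splits ranges and must run in a txn")

// maybeRequireTxn sketches the rule described above: a batch that DistSender
// must split normally gets wrapped in a transaction, unless it uses
// INCONSISTENT reads or a (hypothetical) header flag says the client accepts
// non-atomic execution across ranges.
func maybeRequireTxn(splitsRanges, consistentRead, inTxn, nonTxnSplitOK bool) error {
	if splitsRanges && consistentRead && !inTxn && !nonTxnSplitOK {
		return errOpRequiresTxn
	}
	return nil
}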

MySQL has insert ignore, which lets failed rows fail but lets the good rows go through.

We have this in the form of ON CONFLICT DO NOTHING.

@petermattis commented:

Yeah, I think this is a major concern. My sense is that the ability to send multiple commands at once like this is fairly uncommon (less common than the ability to send single-statement batched inserts). Before we try to quantify the performance differences we should see whether anyone would be able to take advantage of it.

Hmm, I recall @robert-s-lee saying that certain ORMs were doing exactly this. That was the motivation for this issue.

@petermattis commented:

#19267 indicates a user inserting data by doing:

BEGIN;
INSERT INTO ...;
INSERT INTO ...;
INSERT INTO ...;
...
COMMIT;

I've asked for clarification on how the INSERTs are being performed.

@petermattis commented:

In #19267, the user reports doing:

BEGIN;
INSERT INTO t1 ...;
INSERT INTO t2 ...;
INSERT INTO t3 ...;
INSERT INTO t4 ...;
INSERT INTO t5 ...;
COMMIT;

That entire block appears to be sent as a single query (the user creates a single string and executes it using db.query), though it would be good to verify that the node.js driver is actually doing that.

@nvanbenschoten Note that adding RETURNING NOTHING to those INSERTs apparently didn't improve performance. That's mildly surprising.

@nvanbenschoten commented:

@petermattis I suspect that the issue in #19267 is related to some interplay between the TRUNCATE and the INSERT statements. The test is only inserting 12 rows and is taking upwards of 15 seconds (instead of the 0.33 seconds it used to take). Because of that, I wouldn't expect speeding up the INSERT statements in isolation to make a visible difference.

@nvanbenschoten commented:

One thing I want to characterize here is the difference between gathering KV operations of multiple statements into a single batch and running each KV operation in its own batch but running all KV operations in parallel. I don't expect there to be a big difference between the two, but perhaps I'm missing a subtle detail.

The reason this would be interesting is that in all of the cases where we would consider batching KV operations together, we could also just transparently use parallel statement execution. For instance, if an entire batch of statements like the following arrived over the wire, there's little reason we couldn't run the INSERT statements as parallel statements behind the scenes:

BEGIN;
INSERT INTO kv VALUES (1, 1); 
INSERT INTO kv VALUES (2, 2); 
INSERT INTO kv VALUES (3, 3);
COMMIT;

This is something @andreimatei has been pushing for a while, and it's something that his talk yesterday got me thinking more about.

The only real complication would be error handling. For instance, imagine all three of the INSERTs should result in errors. We should return the error for the first one, but if the third one observes an error first, it may disturb the first one and cause it to get a different error. This issue is shared between the parallel KV batches and the combined KV batch proposals.
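
For the error-ordering half of that, a sketch of what transparent parallel execution might look like, assuming a hypothetical per-statement executor runStmt. Note this only fixes which error gets reported, not the cross-statement interference described above, where one statement's failure can change the error another statement observes:

package sketch

import (
	"context"
	"sync"
)

// runParallel executes stmts concurrently and returns the error of the
// lowest-indexed statement that failed, approximating the error that
// serial execution would have surfaced first.
func runParallel(ctx context.Context, stmts []string,
	runStmt func(context.Context, string) error) error {
	errs := make([]error, len(stmts))
	var wg sync.WaitGroup
	for i, s := range stmts {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			errs[i] = runStmt(ctx, s)
		}(i, s)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err // first (lowest-indexed) failure wins
		}
	}
	return nil
}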

@andreimatei commented:

We should return the error for the first one, but if the third one observes an error first, it may disturb the first one and cause it to get a different error. This issue is shared between the parallel KV batches and the combined KV batch proposals.

How true is this for "combined KV batch"? How exactly can the third request in a KV batch influence the result for the first request in that batch?

Btw, as you know, I'm a fan of "transparently using parallel statement execution". We shouldn't let the error semantics wag the dog too much, I'd say.
And apropos of nothing, on parallel statements vs. combining KV batches: one point is that we generally need a more flexible execution infrastructure and a more flexible mapping between SQL queries and KV batches, because we also want the opposite of combining: we want large INSERTs to produce more than a single KV batch.
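
To illustrate that last point, a sketch of the opposite transform: splitting one large INSERT's writes into KV batches of bounded size (kvPair and the size knob are illustrative assumptions, not real types):

package sketch

// kvPair is a stand-in for one encoded row write.
type kvPair struct{ key, value []byte }

// chunkWrites splits the writes of a large INSERT into batches of at most
// maxPerBatch entries, rather than issuing one giant KV batch.
func chunkWrites(writes []kvPair, maxPerBatch int) [][]kvPair {
	var batches [][]kvPair
	for len(writes) > maxPerBatch {
		batches = append(batches, writes[:maxPerBatch])
		writes = writes[maxPerBatch:]
	}
	if len(writes) > 0 {
		batches = append(batches, writes)
	}
	return batches
}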


tbg commented Oct 26, 2017

Another use case that a customer has asked about is inserting into a table and the corresponding interleaved table in one batch:

INSERT INTO foo VALUES ('id', ...); INSERT INTO interleaved_foo VALUES ('id', 1, ...);

i.e. with both inserts in the same SQL string, would that be batched into a single Raft command? I actually haven't tested this with the pq driver, and I'm not sure how the driver returns results for composed statements like that.

@bdarnell commented:

Oracle has an extension for multi-table inserts: INSERT ALL INTO foo VALUES (...) INTO interleaved_foo VALUES (...).

I tried to kludge something together into a single "statement" using our support for subqueries (select * from [insert into foo values (...) returning 1] union select * from [insert into interleaved_foo values(...) returning 1]) but (unsurprisingly) couldn't get anything that could consolidate the writes into a single KV operation.

@petermattis commented:

Another use case that a customer has asked about is inserting into a table and the corresponding interleaved table in one batch.

Just encountered this again this morning. Customer isn't currently using interleaving, but their schema should be using it. If we could insert into both tables in a single query, we'd be able to hit the 1PC code path.

@nvanbenschoten commented:

INSERT ALL INTO foo VALUES (...) INTO interleaved_foo VALUES (...) sounds like a good idea for this situation. If a client workload depends on the 1PC code path for performance, I'd expect it to want to be very explicit about using it. Because of this, a single multi-table insert statement seems like a better solution than telling users to hope that a semicolon-separated statement string doesn't get split by a client driver (or somewhere else along the way) such that we can't apply a transparent optimizing transform to coalesce the statements into the same batch. That said, the latter is still something we'll want to do, regardless of whether it's implemented with a coalesced batch or with parallel batches.

Also, is the range split point logic aware of table interleaving? Will it allow parent rows to be split from child rows? In other words, can a single batch containing a parent and child row always rely on using 1PC?

@petermattis commented:

Also, is the range split point logic aware of table interleaving? Will it allow parent rows to be split from child rows? In other words, can a single batch containing a parent and child row always rely on using 1PC?

Yes, I'm pretty sure we don't split ranges at child row boundaries.

@bdarnell commented:

We do split within interleaved child tables - otherwise an entire interleave subtree would have to fit in a single range, which would be extremely limiting. The only restriction is that we don't split between column families of the same row.

Once a child table has grown beyond the range size, it's no longer guaranteed that we'll be able to update the parent row and a child row in a 1PC transaction (but you might be able to optimize for this case by choosing the ordering of the child table's primary key).

@andreimatei commented:

@nvanbenschoten I think I don't like the idea of new syntax for these coupled inserts. Or, rather, if there is new syntax to be introduced, I think it should be more general, in the form of hints to our execution engine about the lookahead it should use when batching things together. I.e. a way of expressing, at the level of a session or a statement, that I'm willing to trade (or, rather, risk trading) a little bit of latency in receiving the result of the first INSERT for the benefit of allowing the 2nd INSERT to make it to the server and getting better overall latency for the whole transaction.

FWIW, I'm currently working on removing any and all limitations of our execution related to how the client organized statements into query strings. Whether a client driver splits query strings will be inconsequential to the execution engine (modulo the network delay between statements that may be introduced as a result). This, by itself, doesn't directly mean we can batch anything more than we do now, but it does mean that, were we to get batching, we'd also have batching across query strings.


bdarnell commented Nov 1, 2017

The advantage of "new" syntax (it's not entirely new, since Oracle has this syntax, although I don't know if any ORMs can take advantage of it) is that it works even with drivers that don't support sending multiple pipelined statements. We've seen some drivers that parse the query on the client, split it up into separate statements, then send those one at a time. For those drivers, it really needs to be a single statement (the UNION ALL/INSERT RETURNING version I posted before also satisfies this criterion).

@andreimatei commented:

Point taken. If there's precedent for that INSERT syntax, I'm less against it :)

@nvanbenschoten commented:

Moving this to the 2.1 milestone as I don't foresee us making any progress here in the next few weeks.

@nvanbenschoten nvanbenschoten removed this from the 2.0 milestone Jan 25, 2018
@nvanbenschoten nvanbenschoten added this to the 2.1 milestone Jan 25, 2018
@petermattis petermattis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Mar 15, 2018
@petermattis commented:

Related to #16026.

@knz knz added the A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. label May 9, 2018

knz commented May 15, 2018

I think the initial proposal here is unsound if phrased as applicable in the general case, because the SQL logical error (if any, e.g. duplicate key) must be reported for the specific statement that causes it, not later.

This is also related to #25230.

The initial proposal must be altered to only enable the optimization when it can be proven ahead of time that no SQL logical errors are possible (a sketch of such a check follows the list):

  • the INSERT/UPDATE values do not include PK columns, or the PK is non-unique
  • no ON CONFLICT clause on INSERT, and no UPSERT
  • there is no CHECK or computed expression on any modified column
  • there are no FK constraints
  • etc.
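
A sketch of what that static check could look like, over a hypothetical summary of statement properties (none of these names exist in the actual codebase; real detection would come from the planner):

package sketch

// stmtInfo is a hypothetical summary of the properties of a parsed
// mutation statement that can make it raise a SQL logical error.
type stmtInfo struct {
	WritesUniqueIndex bool // values touch the PK or another unique index
	HasOnConflict     bool // INSERT ... ON CONFLICT, or UPSERT
	HasChecks         bool // CHECK constraints or computed columns involved
	HasFKs            bool // foreign-key constraints on the table
}

// provablyErrorFree mirrors the conditions in the list above: batching is
// only safe when every statically detectable hazard is ruled out.
func provablyErrorFree(s stmtInfo) bool {
	return !s.WritesUniqueIndex && !s.HasOnConflict && !s.HasChecks && !s.HasFKs
}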

@nvanbenschoten commented:

I'm going to close this. Transactional pipelining gives almost all of the benefits that we wanted from this while avoiding this proposal's issues with contention due to delayed intent generation. For details, see this proposal and why it was not pursued.

#22388 is still a specific but useful extension that I think we should consider if we care about interleaved tables.
