
sql: batching between statements #19177

Closed
petermattis opened this issue Oct 11, 2017 · 30 comments
petermattis commented Oct 11, 2017

A query containing multiple DML statements should gather the KV operations into a single batch request, regardless of whether RETURNING NOTHING is specified. For example, consider a query containing 3 INSERT statements (i.e. these statements all arrive in the same pgwire request):

BEGIN;
INSERT INTO kv VALUES (1, 1); 
INSERT INTO kv VALUES (2, 2); 
INSERT INTO kv VALUES (3, 3);
COMMIT;

The INSERT statements are currently executed serially, which translates to poor performance. We can make this slightly better using RETURNING NOTHING:

BEGIN;
INSERT INTO kv VALUES (1, 1) RETURNING NOTHING; 
INSERT INTO kv VALUES (2, 2) RETURNING NOTHING; 
INSERT INTO kv VALUES (3, 3) RETURNING NOTHING;
COMMIT;

With RETURNING NOTHING, we'll now parallelize the above work. An even better approach (faster, with less resource consumption) is to use a single INSERT statement:

BEGIN;
INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3);
COMMIT;

Unfortunately, not all applications can easily be changed to use a batched insert or RETURNING NOTHING. It seems feasible to convert multiple INSERT statements received at the same time into a single KV batch request. Lots of refactoring will be necessary to perform error handling correctly and to deal with complexities like foreign keys, but those don't seem like insurmountable problems.

As a first step, we should quantify the performance difference between the 3 approaches.
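
A rough way to take that measurement (a sketch, not a proper benchmark: the connection string, database name, and a kv (k INT PRIMARY KEY, v INT) schema are assumptions, and lib/pq only sends a multi-statement string as a single pgwire request when no placeholders are used):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq"
)

func timeIt(db *sql.DB, name, stmts string) {
	start := time.Now()
	// With no placeholder arguments, lib/pq sends the whole string as one
	// simple query, so all statements arrive in the same pgwire request.
	if _, err := db.Exec(stmts); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%-25s %s\n", name, time.Since(start))
	if _, err := db.Exec(`DELETE FROM kv`); err != nil { // reset between runs
		log.Fatal(err)
	}
}

func main() {
	db, err := sql.Open("postgres",
		"postgres://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	timeIt(db, "serial", `
BEGIN;
INSERT INTO kv VALUES (1, 1);
INSERT INTO kv VALUES (2, 2);
INSERT INTO kv VALUES (3, 3);
COMMIT;`)

	timeIt(db, "returning nothing", `
BEGIN;
INSERT INTO kv VALUES (1, 1) RETURNING NOTHING;
INSERT INTO kv VALUES (2, 2) RETURNING NOTHING;
INSERT INTO kv VALUES (3, 3) RETURNING NOTHING;
COMMIT;`)

	timeIt(db, "single multi-row insert", `
BEGIN;
INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3);
COMMIT;`)
}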

@petermattis petermattis added this to the 1.2 milestone Oct 11, 2017

tbg commented Oct 11, 2017

This can easily be a pessimization as well, since

INSERT INTO kv VALUES (1, 1) RETURNING NOTHING; 
INSERT INTO kv VALUES (2, 2) RETURNING NOTHING; 
INSERT INTO kv VALUES (3, 3) RETURNING NOTHING;

runs three single-statement txns (each a candidate for one-phase commit)

and

INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3)

runs a three-statement txn. It's certainly going to be faster for that particular schema and row count, but generally we should be careful about creating "spread out" transactions. Such transactions would be in a single batch, but DistSender then does all the splitting into small batches again, which makes me wonder whether it's really going to be that much faster, given the longer-lived intents. The third example, for instance, should even perform worse than the second if all three values live on separate ranges (since DistSender will hold back the EndTransaction and we'll eat extra round trips), unless the SQL overhead really dominates.

I'm also concerned that it'll become less straightforward for users to steer what gets batched up with what and how large the de-facto transactions are.

Looking forward to the explorations. I expect the "right thing" to be fairly complex, but I'm ready to be surprised.

@petermattis commented:

That's a good point, though I should clarify that the single query string is usually wrapped in BEGIN...COMMIT. I'll adjust the original message.

@jordanlewis commented:

Is there an efficient way to do that DistSender work of splitting-by-range earlier in the stack to inform a decision? At least in the case where we're not in a SQL transaction.

Alternatively, isn't there a "DistSender mode" or sequence of commands that produce a non-transactional batch? If we already know that our input SQL batch isn't in a transaction, we should propagate that information down through whatever new pathway is required here to prevent having to re-txnify them.

@nvanbenschoten commented:

Unfortunately, not all applications can easily be changed to use a batched insert or RETURNING NOTHING

This is certainly true, but I wonder how often these same applications are able to batch multiple statements into the same pgwire request.

An alternative, which I've discussed with @robert-s-lee before, is offering a session variable that makes all applicable statements use RETURNING NOTHING semantics, even without the individual statements specifying it. Because of the differences in error handling and return values, this would, of course, need to be opt-in.

It's also possible for us to do this fully transparently, with no client intervention, for any batch that contains a COMMIT statement. This is fine because we'll necessarily have to wait for all statements to finish before responding to the client, so in this case we can return the results of the parallel-executed statements instead of dropping them on the floor.
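
A minimal sketch of that eligibility check, assuming the executor sees the batch as a list of statement strings (the helper is hypothetical; the prefix check stands in for real AST-based classification, and "ends in COMMIT" is a simplification of "contains a COMMIT"):

package sketch

import "strings"

// canDeferResults reports whether a batch of statements that arrived in a
// single pgwire request is eligible for transparent parallel execution: if
// the batch ends in COMMIT, the client cannot act on any intermediate
// result before the server answers the whole request, so per-statement
// results can be buffered and returned at the end instead of dropped.
func canDeferResults(stmts []string) bool {
	if len(stmts) == 0 {
		return false
	}
	last := strings.ToUpper(strings.TrimSpace(stmts[len(stmts)-1]))
	return strings.HasPrefix(last, "COMMIT")
}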

@nvanbenschoten commented:

Alternatively, isn't there a "DistSender mode" or sequence of commands that produce a non-transactional batch?

@jordanlewis could you expand on this? We run single statements in implicit transactions, and I'm not seeing how we could avoid doing so while still providing intra-statement ACID guarantees, especially if the statement spans ranges.

@jordanlewis commented:

@nvanbenschoten we don't need intra-statement ACID guarantees when the input batch wasn't wrapped in BEGIN..END, right? Or do we?

@nvanbenschoten commented:

Don't we? With the statement INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3), we can't have the (3,3) insert on one range but the (1,1) and (2,2) fail on another.

@jordanlewis commented:

Oh - we're talking about different things in that case. That is a single statement, in my understanding.

I was discussing multi-statement batches of the form:

INSERT INTO kv VALUES (1,1);
INSERT INTO kv VALUES (2,2);

@nvanbenschoten commented:

Oh, I see where we were misunderstanding each other. I think @petermattis's correction above was clarifying that (at least for this discussion) we're always talking about batching statements within the same Txn. Batching across Txn boundaries would be much more difficult.


robert-s-lee commented Oct 11, 2017 via email

@bdarnell commented:

This is certainly true, but I wonder how often these same applications are able to batch multiple statements into the same pgwire request.

Yeah, I think this is a major concern. My sense is that the ability to send multiple commands at once like this is fairly uncommon (less common than the ability to send single-statement batched inserts). Before we try to quantify the performance differences we should see whether anyone would be able to take advantage of it.

Alternatively, isn't there a "DistSender mode" or sequence of commands that produce a non-transactional batch?

Currently, if the DistSender splits a batch, it wraps it in a transaction (grep for OpRequiresTxnError for context) unless the batch specifies INCONSISTENT reads or only contains requests that cannot be part of a transaction (per the isTxn method flag). It would be possible to add a flag to the batch header indicating that the DistSender may split up the batch without wrapping it in a transaction, although it may be tricky to distinguish ambiguous from unambiguous errors when this happens. We'd also need to make sure that each INSERT statement remains atomic if we're not already in a transaction: if you did INSERT INTO kv VALUES (1), (2); INSERT INTO kv VALUES (3), the writes for 1 and 2 would need to be atomic. In practice we'd probably have to just disable this optimization in that case.
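
To make the flag idea concrete, a hedged sketch of the decision it would relax (the flag and helper names are hypothetical and do not exist in the real BatchRequest header; the actual check is where DistSender returns OpRequiresTxnError):

package sketch

import "errors"

// errOpRequiresTxn stands in for the real OpRequiresTxnError.
var errOpRequiresTxn = errors.New("batch splits ranges and must run in a txn")

// maybeRequireTxn sketches the rule described above: a batch that DistSender
// must split normally gets wrapped in a transaction, unless it uses
// INCONSISTENT reads or a (hypothetical) header flag says the client accepts
// non-atomic execution across ranges.
func maybeRequireTxn(splitsRanges, consistentRead, inTxn, nonTxnSplitOK bool) error {
	if splitsRanges && consistentRead && !inTxn && !nonTxnSplitOK {
		return errOpRequiresTxn
	}
	return nil
}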

MySQL has insert ignore, which lets failed rows fail but lets the good rows go through.

We have this in the form of ON CONFLICT DO NOTHING.

@petermattis commented:

Yeah, I think this is a major concern. My sense is that the ability to send multiple commands at once like this is fairly uncommon (less common than the ability to send single-statement batched inserts). Before we try to quantify the performance differences we should see whether anyone would be able to take advantage of it.

Hmm, I recall @robert-s-lee saying that certain ORMs were doing exactly this. That was the motivation for this issue.

@petermattis commented:

#19267 indicates a user inserting data by doing:

BEGIN;
INSERT INTO ...;
INSERT INTO ...;
INSERT INTO ...;
...
COMMIT;

I've asked for clarification on how the INSERTs are being performed.

@petermattis commented:

In #19267, the user reports doing:

BEGIN;
INSERT INTO t1 ...;
INSERT INTO t2 ...;
INSERT INTO t3 ...;
INSERT INTO t4 ...;
INSERT INTO t5 ...;
COMMIT;

That entire block appears to be sent as a single query (the user creates a single string and executes it using db.query), though it would be good to verify that the node.js driver is actually doing that.

@nvanbenschoten Note that adding RETURNING NOTHING to those INSERTs apparently didn't improve performance. That's mildly surprising.

@nvanbenschoten commented:

@petermattis I suspect that the issue in #19267 is related to some interplay between the TRUNCATE and the INSERT statements. The test is only inserting 12 rows and is taking upwards of 15 seconds (instead of the 0.33 seconds it used to take). Because of that, I wouldn't expect speeding up the INSERT statements in isolation to make a visible difference.

@nvanbenschoten commented:

One thing I want to characterize here is the difference between gathering KV operations of multiple statements into a single batch and running each KV operation in its own batch but running all KV operations in parallel. I don't expect there to be a big difference between the two, but perhaps I'm missing a subtle detail.

The reason this would be interesting is that in all of the cases where we would consider batching KV operations together, we could also just transparently use parallel statement execution. For instance, if an entire batch of statements like the following arrived over the wire, there's little reason we couldn't run the INSERT statements as parallel statements behind the scenes:

BEGIN;
INSERT INTO kv VALUES (1, 1); 
INSERT INTO kv VALUES (2, 2); 
INSERT INTO kv VALUES (3, 3);
COMMIT;

This is something @andreimatei has been pushing for a while, and it's something that his talk yesterday got me thinking more about.

The only real complication would be error handling. For instance, imagine all three of the INSERTs should result in errors. We should return the error for the first one, but if the third one observes an error first, it may disturb the first one and cause it to get a different error. This issue is shared between the parallel KV batches and the combined KV batch proposals.
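
For the error-ordering half of that, a sketch of what transparent parallel execution might look like, assuming a hypothetical per-statement executor runStmt. Note this only fixes which error gets reported, not the cross-statement interference described above, where one statement's failure can change the error another statement observes:

package sketch

import (
	"context"
	"sync"
)

// runParallel executes stmts concurrently and returns the error of the
// lowest-indexed statement that failed, approximating the error that
// serial execution would have surfaced first.
func runParallel(ctx context.Context, stmts []string,
	runStmt func(context.Context, string) error) error {
	errs := make([]error, len(stmts))
	var wg sync.WaitGroup
	for i, s := range stmts {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			errs[i] = runStmt(ctx, s)
		}(i, s)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err // first (lowest-indexed) failure wins
		}
	}
	return nil
}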

@andreimatei commented:

We should return the error for the first one, but if the third one observes an error first, it may disturb the first one and cause it to get a different error. This issue is shared between the parallel KV batches and the combined KV batch proposals.

How true is this for "combined KV batch"? How exactly can the third request in a KV batch influence the result for the first request in that batch?

Btw, as you know, I'm a fan of "transparently using parallel statement execution". We shouldn't let the error semantics wag the dog too much, I'd say.
And apropos of nothing, on parallel statements vs. combining KV batches: one point is that we generally need a more flexible execution infrastructure and a more flexible mapping between SQL queries and KV batches, because we also want the opposite of combining: we want large INSERTs to produce more than a single KV batch.
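
To illustrate that last point, a sketch of the opposite transform: splitting one large INSERT's writes into KV batches of bounded size (kvPair and the size knob are illustrative assumptions, not real types):

package sketch

// kvPair is a stand-in for one encoded row write.
type kvPair struct{ key, value []byte }

// chunkWrites splits the writes of a large INSERT into batches of at most
// maxPerBatch entries, rather than issuing one giant KV batch.
func chunkWrites(writes []kvPair, maxPerBatch int) [][]kvPair {
	var batches [][]kvPair
	for len(writes) > maxPerBatch {
		batches = append(batches, writes[:maxPerBatch])
		writes = writes[maxPerBatch:]
	}
	if len(writes) > 0 {
		batches = append(batches, writes)
	}
	return batches
}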


tbg commented Oct 26, 2017

Another use case that a customer has asked about is inserting into a table and the corresponding interleaved table in one batch:

INSERT INTO foo VALUES ('id', ...); INSERT INTO interleaved_foo VALUES ('id', 1, ...);

i.e. with both inserts in the same SQL string, would that be batched into a single Raft command? I actually haven't tested this with the pq driver, and I'm not sure how the driver returns results for composed statements like that.

@bdarnell commented:

Oracle has an extension for multi-table inserts: INSERT ALL INTO foo VALUES (...) INTO interleaved_foo VALUES (...).

I tried to kludge something together into a single "statement" using our support for subqueries (select * from [insert into foo values (...) returning 1] union select * from [insert into interleaved_foo values(...) returning 1]) but (unsurprisingly) couldn't get anything that could consolidate the writes into a single KV operation.

@petermattis commented:

Another use case that a customer has asked about is inserting into a table and the corresponding interleaved table in one batch.

Just encountered this again this morning. Customer isn't currently using interleaving, but their schema should be using it. If we could insert into both tables in a single query, we'd be able to hit the 1PC code path.

@nvanbenschoten commented:

INSERT ALL INTO foo VALUES (...) INTO interleaved_foo VALUES (...) sounds like a good idea for this situation. If a client workload depends on the 1PC code path for performance, I'd expect it to want to be very explicit about using it. Because of this, a single multi-table insert statement seems like a better solution than telling users to hope that a semicolon-separated statement string doesn't get split by a client driver (or somewhere else along the way) such that we can't apply a transparent optimizing transform to coalesce the statements into the same batch. That said, the latter is still something we'll want to do, regardless of whether it's implemented with a coalesced batch or with parallel batches.

Also, is the range split point logic aware of table interleaving? Will it allow parent rows to be split from child rows? In other words, can a single batch containing a parent and child row always rely on using 1PC?

@petermattis commented:

Also, is the range split point logic aware of table interleaving? Will it allow parent rows to be split from child rows? In other words, can a single batch containing a parent and child row always rely on using 1PC?

Yes, I'm pretty sure we don't split ranges at child row boundaries.

@bdarnell commented:

We do split within interleaved child tables - otherwise an entire interleave subtree would have to fit in a single range, which would be extremely limiting. The only restriction is that we don't split between column families of the same row.

Once a child table has grown beyond the range size, it's no longer guaranteed that we'll be able to update the parent row and a child row in a 1PC transaction (but you might be able to optimize for this case by choosing the ordering of the child table's primary key).

@andreimatei commented:

@nvanbenschoten I think I don't like the idea of new syntax for these coupled inserts. Or, rather, if there is new syntax to be introduced, I think it should be more general, in the form of hints to our execution engine about the lookahead it should use when batching things together. I.e. a way of expressing, at the level of a session or a statement, that I'm willing to trade (or, rather, risk trading) a little bit of latency in receiving the result of the first INSERT for the benefit of allowing the 2nd INSERT to make it to the server and getting better overall latency for the whole transaction.

FWIW, I'm currently working on removing any and all limitations of our execution related to how the client organized statements into query strings. Whether a client driver splits query strings will be inconsequential to the execution engine (modulo the network delay between statements that may be introduced as a result). This, by itself, doesn't directly mean we can batch anything more than we do now, but it does mean that, were we to get batching, we'd also have batching across query strings.


bdarnell commented Nov 1, 2017

The advantage of "new" syntax (it's not entirely new, since Oracle has this syntax, although I don't know if any ORMs can take advantage of it) is that it works even with drivers that don't support sending multiple pipelined statements. We've seen some drivers that parse the query on the client, split it up into separate statements, then send those one at a time. For those drivers, it really needs to be a single statement (the UNION ALL/INSERT RETURNING version I posted before also satisfies this criterion).

@andreimatei commented:

Point taken. If there's precedent for that INSERT syntax, I'm less against it :)

@nvanbenschoten commented:

Moving this to the 2.1 milestone as I don't foresee us making any progress here in the next few weeks.

@nvanbenschoten nvanbenschoten removed this from the 2.0 milestone Jan 25, 2018
@nvanbenschoten nvanbenschoten added this to the 2.1 milestone Jan 25, 2018
@petermattis petermattis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Mar 15, 2018
@petermattis commented:

Related to #16026.

@knz knz added the A-sql-mutations Mutation statements: UPDATE/INSERT/UPSERT/DELETE. label May 9, 2018

knz commented May 15, 2018

I think the initial proposal here is unsound if phrased as applicable in the general case, because the SQL logical error (if any, e.g. duplicate key) must be reported for the specific statement that causes it, not later.

This is also related to #25230.

The initial proposal must be altered to only enable the optimization when it can be proven ahead of time that no SQL logical errors are possible (a sketch of such a check follows the list):

  • the INSERT/UPDATE values do not include PK columns, or the PK is non-unique
  • no ON CONFLICT clause on INSERT, and no UPSERT
  • there is no CHECK or computed expression on any modified column
  • there are no FK constraints
  • etc.
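
A sketch of what that static check could look like, over a hypothetical summary of statement properties (none of these names exist in the actual codebase; real detection would come from the planner):

package sketch

// stmtInfo is a hypothetical summary of the properties of a parsed
// mutation statement that can make it raise a SQL logical error.
type stmtInfo struct {
	WritesUniqueIndex bool // values touch the PK or another unique index
	HasOnConflict     bool // INSERT ... ON CONFLICT, or UPSERT
	HasChecks         bool // CHECK constraints or computed columns involved
	HasFKs            bool // foreign-key constraints on the table
}

// provablyErrorFree mirrors the conditions in the list above: batching is
// only safe when every statically detectable hazard is ruled out.
func provablyErrorFree(s stmtInfo) bool {
	return !s.WritesUniqueIndex && !s.HasOnConflict && !s.HasChecks && !s.HasFKs
}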

@nvanbenschoten commented:

I'm going to close this. Transactional pipelining gives almost all of the benefits that we wanted from this while avoiding this proposal's issues with contention due to delayed intent generation. For details, see this proposal and why it was not pursued.

#22388 is still a specific but useful extension that I think we should consider if we care about interleaved tables.
