-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement new ADO.NET batching API #650
Comments
Tested MySQL 8.0.16 and MariaDB 10.4.5 and neither supports it. (They both return an ERR packet with the message "You have an error in your SQL syntax".) Moreover, it seems extremely unlikely to ever be supported based on the protocol definition: Alternatively, perhaps the commands would have to be prepared individually (once), then could be executed (multiple times) as a batch. However, Thus, for prepared statements it appears that they need to be prepared individually, then executed individually. |
MariaDB invented a |
That's great! Do you think you're going to be able to attempt an implementation over this anytime soon? |
I'd like to, but I suspect it will need a fairly significant overhaul of the current command execution and |
That makes sense, batching isn't just a feature you add easily. FWIW the combination of batching and prepared binary statements could be a big step forward for perf, especially if one day you do decide to persist prepared statements. Although the real question is what kind perf increase preparation delivers on MySQL (on PG it's quite decisive). |
Simple prepared commands are already supported and persisted. I also added it to TFB benchmarks: TechEmpower/FrameworkBenchmarks#4015 However, although local profiling of MySQL Server shows that 17% of statement execution time is spent in SQL parsing (and preparing statements should theoretically eliminate this), there was no appreciable throughput increase in TFB. (And I don't think that any of the top MySQL entrants are using prepared commands to achieve their speed.) |
OK, thanks for the info - I'm guessing more investigation/experimentation would be needed to understand exactly what's going on. It would be quite strange if prepared statements were supported by MySQL but had no (noticeable) perf advantages. |
I've got a simplistic version of batching (using MariaDB's Some benchmark code and results are at https://gist.github.com/bgrainger/0504a52065eeade9e4b30e88b6363f2c. The DbBatch API hasn't caused any problems so far, although I haven't implemented I'm not confident that I can implement CC @roji |
@bgrainger great to see this work! If I'm understanding correctly, at this point your batching implementation only does concatenation batching via text mode, right? And PreparedCommandsBatch would be the one actually batching in binary mode without any concatenation? Timeouts are indeed notoriously tricky, but I'm curious if batching introduces any specific difficulties that you don't encounter with regular non-batched command execution. Re DbException.BatchCommand, it's expected that not all providers will support all features. Just for comparison, PostgreSQL doesn't actually have a batch-specific protocol message: multiple Execute messages are simply pipelined one after the other. This means Npgsql receives the results of earlier Execute messages, and only then receives an error, allowing it to know which statement triggered the failure. PostgreSQL does have a useful protocol sync feature: after all Execute messages, Npgsql inserts a Sync messages, which tells PostgreSQL that in case of error all subsequent messages are discarded until that one. In any case, am looking forward to seeing the results for PreparedCommandsBatch! And obviously post back if you have any feedback on the batching API shape. |
No; for MariaDB, I haven't yet written the code to concatenate SQL for a batch sent to MySQL Server, but that is my planned implementation for non-MariaDB servers.
Yes, that's the plan. BTW, MariaDB allows multiple kinds of commands to be batched; for example a "prepare statement" and "execute statement" can be grouped and executed in one batch. There's also a (separate) bulk insert command that allows one prepared insert statement to be executed with multiple sets of values. I don't see an easy way to express either of these with the |
@divega and I discussed this at some point, calling it "SIMD". In fact, IIRC JDBC's batching API (for prepared statements) allows only this - no way to send heterogeneous statements in a single roundtrip. While "SIMD mode" may be a relatively common usage for batching, assuming prepared statements exist, there doesn't seem to be much extra value in it - one can simply execute the same prepared statement N times with different parameters within a single batch. How exactly that's exposed in a .NET API would be somewhat provider-specific; my expectation is that calling Prepare() on a batch would detect identical SQL and do the right thing. |
@roji Prepared Batched Commands are now implemented. Results updated here: https://gist.github.com/bgrainger/0504a52065eeade9e4b30e88b6363f2c As expected, there is no benefit for MySQL Server, because its protocol requires that prepared statements be sent individually. But for MariaDB, all "Batch" methods are a performance win. Even for MySQL Server, "BatchCommands" is faster than "Commands" because the driver (safely) concatenates multiple commands' statements together into as long a string as possible, and sends it all at once. (This is also new since the last update.) |
Thanks for the update @bgrainger, great to hear and even better to see those perf numbers! You're ahead of me, I haven't yet finished implementing this in Npgsql. Unfortunately the batching API didn't make it into .NET Core 3.0 or .NET Standard 2.1 - but the API will definitely go in. My plan for Npgsql is to expose the API as provider-specific, and when it makes it into .NET Core/Standard to simply multitarget and override for that TFM. You can do the same - I think the API shape itself is done and there's very little chance we'll make any further changes. |
Yes, I saw that it had been omitted from .NET Core 3; I too am planning to ship |
|
Neat, any measured performance advantages? It doesn't look like anything new is going into corefx SqlClient. Everything will go into Microsoft.Data.SqlClient. To start that process i ported over the internal RPC changes that make batch RPC easier/possible in dotnet/SqlClient#209 and once that's integrated i can look at integrating the batching work on top of it. The addition of always encrypted makes thing a little more complicated and has re-opened my consideration of a single-query-multiple-parameters mode |
FYI I probably have some unrelated work for the next few weeks, but after that the plan is definitely to re-tackle bringing in the batching API into corefx. If we can have clear benchmarks from MySQL, SqlClient and Npgsql that would be absolutely great. |
@Wraith2 Yes, see https://gist.github.com/bgrainger/0504a52065eeade9e4b30e88b6363f2c for benchmarks. (Note that for MySQL, there's no appreciable difference of batching over manually concatenating all the SQL and executing it with |
Agreed, as always i'm at the mercy of the sql team on the microsoft side. implementing batching without the rpc rework would be much more complicated and less performant. The RPC rework clashed with the introduction of AlwaysEncrypted and verifying that it's correct will need some careful review. If you also want batching in corefx SqlClient i've still got that branch but with the component being marked as "Archived component - limited churn/contributions" i assumed it wouldn't be going in. |
Yeah, we should definitely forget about System.Data.SqlClient - anything new at this point will be going into Microsoft.Data.SqlClient. Do you have an issue tracking the RPC rework? |
Just the PR linked above. You're right, i should probably raise issues. |
Yeah, that could be helpful. Everyone is now in post-release mode, but things will soon probably quiet down and hopefully the SqlClient team will have the bandwidth to start merging this work. |
MariaDB has opened an issue to remove their They suggest using pipelining instead, but that can break other MySQL servers (e.g. Aurora: #486 (comment)). A new extended server capability feature flag (to indicate that pipelining is supported by the server), or a connection string option that forces pipelining on, might let MySqlConnector use pipelining when it's safe and would provide a speedup. CC @roji |
Interesting... If I understand correctly, pipelining (unlike Regardless, assuming |
I think pipelining could be a viable approach to batching for non- RE: TechEmpower, the latest "ruling" I saw was that pipelining was allowed as long as it didn't require modifications to application code (https://www.techempower.com/blog/2018/10/30/framework-benchmarks-round-17/). I'm not sure if this can be implemented transparently for MySQL (I don't understand how it currently works for Postgres). But this seems like it might preclude the use of |
Yeah, explicit usage of a batching API is definitely disallowed in TechEmpower - though that doesn't preclude the batching API from being implemented over pipelining. However, note that batching typically also implies some semantics as well, and not just a perf improvement, e.g. an earlier failing statement causes later ones to be skipped (could be done via transactions too). I think that when people say pipelining, they usually mean that you can simply execute a new command on a connection before the previous one has completed (and its results consumed). Npgsql definitely doesn't support this, but I'm currently working on exactly that (I'm actually doing multiplexing, which would combine multiple commands from multiple producers/threads into the same physical connection, which is a step further. Some people call this pipelining I think, there's some terminological confusion...). The former version, as I understand it, would indeed require changes to application code: the application would execute a command, and then execute another without waiting for the first to complete; in that sense it's not far from an explicit batching API. I still think there's a lot of perf value in reducing the number of packets (and system calls), and am working in this direction. Batching is one way to go in this direction (with something like |
Great to see this @bgrainger. If you ran into any weirdness/difficulties with the API, don't hesitate to report - there's still time to amend it in ADO.NET. For my understanding, does the implementation make use of the (removed) COM_MULTI, or with JDBC-style protocol pipelining (as per #946)? Or is that work still to be done at some point? |
It seemed that the behaviour of The implementation does still make use of |
Thanks @bgrainger. Yes, SingleRow is indeed expected to behave as described; in general, CommandBehavior is expecetd to behave in the same way as it already does when a semicolon-batching command is executed. |
From comment:
The new API is defined in https://github.com/dotnet/corefx/issues/35135.
It seems like the general approach should be to concatenate all SQL from all commands in the batch into one string, substituting parameter values as we go. (That is, this doesn't provide any advantage over what the user can manually do already, but ensures that
MySqlDbBatch
isn't needlessly inefficient for database-independent code.)Special attention will need to be taken if the user is mixing
CommandType.Text
andStoredProcedure
in one batch, particularly if the stored procedure has out parameters.Prepare
is called?A WIP PR is available in #637.
The text was updated successfully, but these errors were encountered: