Poor INSERT performance with ExecuteNonQuery() and many parameters #974
What is the real-world use case for using so many parameters?
I've personally come across this with:
This is likely to be addressed in .NET 6 for most major providers with dotnet/runtime#28633. I'll try to take a look at this when I can get some time, but I make no promises.
Any help on this issue would be greatly appreciated! As a specific example, I discovered this when benchmarking our ETL library since we use it for (portable) batch inserts. For now, we'll have to advise clients to either switch to the ODBC provider, or use the non-portable bulk insert component. Not the end of the world, but certainly less than ideal.
Thanks for the comparisons. We'll take a look to see if some improvements can be made.
This is an important point - one of the major incentives for having this new batching API was to remove the need to send so many parameters (which, regardless of perf, imposed an upper limit, since no more than 2100 parameters can be sent).
@roji I'm very much looking forward to the upcoming .NET 6 batch API. That said:
This is very true, though it's probably going to be possible (and advisable) to expose the batching API even for older TFMs - just without implementing the ADO.NET base classes. This would make the API accessible regardless of which TFM you're using (possibly even .NET Framework); that's my plan for Npgsql/PostgreSQL, in any case.
This is true - and at that point it's also a question of how common it is to have such a large number of columns etc.
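For reference, a minimal sketch of the batching API shape proposed in dotnet/runtime#28633 (the class names match what Microsoft.Data.SqlClient later shipped; the table name is assumed). Each row becomes its own small command instead of one statement carrying thousands of parameters:

```csharp
using Microsoft.Data.SqlClient;

static void InsertBatch(SqlConnection connection, int[][] rows)
{
    // All commands in the batch travel to the server in a single round trip.
    using var batch = new SqlBatch(connection);
    foreach (var row in rows)
    {
        var cmd = new SqlBatchCommand(
            "INSERT INTO insertbenchmark (d0,d1,d2) VALUES (@p0,@p1,@p2)");
        for (int col = 0; col < row.Length; col++)
            cmd.Parameters.AddWithValue($"@p{col}", row[col]);
        batch.BatchCommands.Add(cmd);
    }
    batch.ExecuteNonQuery();
}
```

Since every command carries its own small parameter list, no single statement ever approaches the 2100-parameter limit.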
The implementation I've got for SqlClient 3.1/5.0 provides versions of the System.Data base classes needed for this, so in theory it can be backported to any TFM the library supports. Of course we've still got to do the work to get the feature into 6.0 and then find the time to get it into SqlClient with encryption support - we might want to light a fire under that @roji 😁
Just had a quick look. You're using System.Data.SqlClient. Can you try it with Microsoft.Data.SqlClient from this repository and see if the numbers are any better to start with?
@Wraith2 I'm already using both
Ah, I see it now. Had to do some fiddling to profile it and broke something - my mistake. I've found some interesting areas that I was already aware of but were a way down my list of things to investigate. It's mostly string composition and array resizing overhead. As the parameter count goes up, it spends more time allocating and re-allocating to grow, and there's not a lot the caller can do about that. It can be reworked to avoid intermediate strings and to look at pre-sizing string builders and target arrays in a few places.

Two things stand out that you could do for the benchmark. Can you keep pre-boxed ints that are used as parameter values around? Their boxing and unboxing is a big source of GC activity. I know that isn't reflective of real-world data, but in real-world scenarios, if you have strongly typed input data of limited or frequently known ranges (bool, small ints), keeping cached boxed versions of some values around and using those will gain you some perf. If possible, reusing SqlParameter instances would be good. They're quite large and you're using a lot of them.
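A minimal sketch of that pre-boxing idea, with assumed names (not code from the benchmark): keep one boxed object per small int value and reuse it for every SqlParameter.Value assignment.

```csharp
using System.Linq;

static class BoxedValues
{
    // One shared boxed object per small int; boxing happens once, up front.
    private static readonly object[] Ints =
        Enumerable.Range(0, 256).Select(i => (object)i).ToArray();

    public static object Box(int value) =>
        (uint)value < (uint)Ints.Length ? Ints[value] : value; // out-of-range values still box
}

// When reusing SqlParameter instances across batches, only swap the value:
// parameter.Value = BoxedValues.Box(nextValue);
```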
@Wraith2 I've added (and uploaded the code for) using pre-boxed int values to the test. Here's the original test (boxing of int values, only run 3 batches):
Running instead with pre-boxed int values makes no difference at all:
So creating pre-boxed values makes no measurable difference.

For profiling work, you might set
That just means the individual benches aren't running for long enough to be affected by memory issues, which is a good thing in general, but if you want to run these tasks for a long time, pre-boxing and reusing things will save you some memory. There being no throughput difference isn't surprising - I was just making observations. The current limiting factors are parallelism and buffer resizing, as I said.
You're absolutely right... I do have a few other things that need to be done first, but I'll do my best to finally finish this ASAP.
I made the code execute about 20% faster and it made no difference to throughput. The code isn't the bottleneck, the network is. ODBC just appears to be more efficient about how the parameters are sent somehow. Since I can't change what needs to be put on the wire, or how long it takes to send and receive the messages over the network, there is no clear way to improve performance here.
@Wraith2 interesting if this is actually a wire protocol data change - it may be worth comparing what they're doing with Wireshark. If SqlClient is using TDS in an inefficient way, that might be another thing to track somewhere...
Sure. One of many, many things to investigate at some point. My advice on this one would be to get closer to the SQL Server and attempt parallel imports.
You can also try to compare how queries execute on the server side by comparing SQL Server Profiler traces. That will tell you whether parameter metadata caching is adding benefit on the client side or not.
@Wraith2 When I run this single test case (and with
Certainly not proven, but it looks like:
The question is - why does the SQL Server process consume so much CPU when it receives these statements? Can this be raised with the SQL Server team?
I can provide you with the dev NuGet package for my build if you want to try it out and verify the perf difference for yourself.
Some more tests:
So it's not a recent issue.
For shits and giggles you can give this a try and see that it changes nothing, if you like.
@Wraith2 Thanks. As expected by now, Microsoft.Data.SqlClient.3.0.0-dev.zip made no difference whatsoever.
Lovely image though :-)
Profiling shows that the query plans (including operator values) look identical: T-SQL Insert <-- Table Insert <-- Constant Scan.

The SqlClient and Odbc queries look identical, except for SqlClient using named instead of positional parameters (here with only 3 columns and 2 rows). The parameters being named might well be the reason for the performance difference.

SqlClient:
Odbc:
Interestingly, when profiling with many (in this case 32 * 64 = 2048) parameters, with SqlClient the server reports CPU consumption, physical writes and duration for every query, while with Odbc the server only reports CPU consumption and duration for the first query, and no physical writes for any query. Server statistics per INSERT using SqlClient:
Server statistics per INSERT using Odbc:
So possibly the named parameters make SQL Server consume significant CPU on every query, whereas with positional parameters it doesn't. I don't know how to troubleshoot this further into SQL Server - do any of you have a handle on that?
I think you'll have to open a support case for that. Once we get to the point where batching is possible, would you be interested in trying out the preview builds for it? This is a use case that should be able to benefit from it, and it would be useful to have some idea how it will react.
```sql
exec sp_executesql N'INSERT INTO insertbenchmark_2_3_MicrosoftSync (d0,d1,d2)
VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)',N'@R0_C0 int,@R0_C1 int,@R0_C2 int,@R1_C0 int,@R1_C1 int,@R1_C2 int',
@R0_C0=0,@R0_C1=1,@R0_C2=2,@R1_C0=3,@R1_C1=4,@R1_C2=5
```

If you just changed the last line of @name=value pairs to a value list, without changing the VALUES clauses, would it still work? So the question is whether the names are important if they aren't specified in the value list. If it's still valid syntax (indicating @p\d is not special) then I could quirk the BuildParamList to just not output the names if an appcontext flag were set.
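A sketch of that experiment as a one-off check (the C# harness is illustrative only; the SQL is the capture above with the trailing @name=value pairs replaced by bare values, which sp_executesql assigns to the declared parameters by position):

```csharp
using Microsoft.Data.SqlClient;

static void TryPositionalValues(SqlConnection connection)
{
    // Same @stmt and @params as the trace above, but the trailing arguments
    // are bare values rather than @name=value pairs.
    using var cmd = new SqlCommand(
        "exec sp_executesql N'INSERT INTO insertbenchmark_2_3_MicrosoftSync (d0,d1,d2) " +
        "VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)', " +
        "N'@R0_C0 int,@R0_C1 int,@R0_C2 int,@R1_C0 int,@R1_C1 int,@R1_C2 int', " +
        "0,1,2,3,4,5", connection);
    cmd.ExecuteNonQuery();
}
```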
@Wraith2 Based on my findings, I think that would be an interesting experiment.
ODBC would only have it if the driver manager handed out a previously pooled connection and this was the first query to be executed.
Both samples were run the same way from the example app provided above. Table setup was done, then I started capturing at a breakpoint and stopped immediately after the query succeeded. The command objects were newed up each time. Looking at a longer-running trace, the reset behaviour seems to follow what you said, so scratch that one. Next: no metadata?
As per the MS-TDS spec, the NoMetaData flag in an RPC request is described as fNoMetaData, whereas the (COLMETADATA) token, where NoMetaData would take effect in the response, does not apply to an RPC server response - so it's basically a no-op for SqlClient. You can verify the response packets too. It certainly feels confusing. Why does this option even exist in the RPC request then - any thoughts @v-chojas?
Manually decoded packets with annotations - download and diff to review. The differences are:
I wouldn't expect to see any difference with 16 parameters, but try looking at one with 512 parameters. That will cause both the SQL and the parameter declaration list to be long enough not to fit in a regular nvarchar (4000 chars/8000 bytes). The expression to calculate the length of the parameter list in chars is
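The expression itself didn't survive the capture here, but a rough back-of-the-envelope version (an illustration, not the driver's actual code) shows why 512 parameters crosses the threshold:

```csharp
// Rough estimate only: a declaration like "@R63_C7 int," is ~12-14 chars, so
// 512 int parameters need roughly 512 * 13 = 6656 chars - past the 4000-char
// limit of a regular nvarchar, forcing a long type for the declaration list.
static bool DeclarationListNeedsLongType(int parameterCount, int avgCharsPerDecl = 13)
    => parameterCount * avgCharsPerDecl > 4000;
```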
What difference are you expecting? It's going to take a long time to manually break down a large request, which is why I chose to use the short version to identify differences. If there's a particular thing you're looking for here, hints would be good. If the only difference is string representation, I doubt that's something that can be fixed in the driver. ODBC is using a more verbose format and is sending smaller packets, so it really wouldn't make sense for it to be quicker than SqlClient.
Is there any way to profile the SQL Server side to see what it's spending all that CPU time on when statements with many parameters come in (as #974 (comment) shows, that's a very clear difference between SqlClient and Odbc in how the server behaves)?
It should switch over to a different type because regular nvarchar is limited to 4000 chars. If you post hexdumps of the packets (hex+ASCII) I can tell what's happening pretty easily. What type it's using and how it's sending it (PLP chunks can vary in size, but ODBC will send the parameter declaration list in one chunk) would give a clue about the perf difference.
ODBC sends SQL and parameter declaration list as nvarchar(max):
SqlClient sends SQL and parameter declaration list as ntext:
ntext is an old data type that is deprecated; there is probably an internal conversion on the server side which causes the performance difference.
I was trying to work out what 63 was and where it was being written in the driver. I'll have to see if I can unknot the spaghetti code around RPC construction enough to identify (and possibly change) the data type. If I get it working I'll post another build with some appcontext switches to try out.
@KristianWedberg ok, give this one a try. All RPC parameters are sent as varchar(length) or varchar(max) depending on the size. netcore only at the moment. There are two appcontext switches you can use. [edit] removed link
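For anyone trying the build: switches like these are set via AppContext before the first connection is opened. The full switch name below is an assumption, reconstructed from the UseODBCParameterFormat mention later in this thread:

```csharp
// Assumed switch name for the dev build - set before any SqlConnection is created.
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseODBCParameterFormat", true);
```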
@Wraith2 I got an exception when parameters hit 512. This was with only
Interesting. So how do you specify an unbounded nvarchar?

```csharp
// Current code: switch to NText once the doubled char length exceeds the limit.
sqlParam.SqlDbType = ((paramList.Length << 1) <= TdsEnums.TYPE_SIZE_LIMIT) ? SqlDbType.NVarChar : SqlDbType.NText;
sqlParam.Value = paramList;
sqlParam.Size = paramList.Length;
```

would become this:

```csharp
// Proposed: always NVarChar, with Size = -1 meaning nvarchar(max) when the
// declaration list is too long for a regular nvarchar.
sqlParam.SqlDbType = SqlDbType.NVarChar;
sqlParam.Value = paramList;
sqlParam.Size = ((paramList.Length << 1) <= TdsEnums.TYPE_SIZE_LIMIT) ? paramList.Length : -1;
```

[edit]
@Wraith2 Now with the latest MDS 3.0.0-dev, and 256 parameters.

This should obviously be tested with >256 parameters (I got the exception too), which showed a large slowdown with the non-varchar version. Because of the higher performance, I'll start running with a larger total volume, so these throughput numbers are not directly comparable with previous tests. Again, the interesting bit is whether >256 parameters will slow down or not.
I'm trying it but it isn't going well. The MetaType for varchar isn't PLP, so I'm going to end up looking for size==-1 all over the place, and even then it's going to touch a whole lot of places I'm really not happy about possibly breaking. I can only do the sync version because the async code in
@KristianWedberg So am I reading the results right that the combination of omitting parameter names and using nvarchar gives better performance than ODBC?
MDS 3.0.0-dev + omitting parameter names is consistently 5-10% faster than Odbc at 256 parameters.

Again -
You can test longer with this. Horrible hackjob - absolutely don't use it for anything other than this testing, and only in sync mode, or it'll crash like before.
Success! 🎆 A quick test shows that the latest MDS (with both switches enabled) scales well all the way to 2048 parameters (at 1,184,713 parameters/s), staying 5-10% faster than Odbc all the way! Full results will take longer.

Note: I didn't realize until now that I had one issue in my testing setup: I was still giving MDS less work (since it used to be up to 20x slower). It might not have been the latest change that did the trick - it could e.g. be just the
OK. Well, consider that a proof of concept - now that we know it's possible to make it work, it'll need more work to make it production capable. I need to learn how the driver is currently handling varchar(max), because it's clearly using some trickery if I've had to add workarounds to make it send a long string correctly. This could take some time.
To sum up the performance measurements:
Here are repeatable results using the earlier
So performance looks great. Functionality-wise, I'll highlight that my benchmark only uses each parameter once, in the same order they're written in the query. I'm guessing reusing a named parameter multiple times in the query requires additional testing. [Edited]
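As a hypothetical illustration of that untested case (not from the benchmark), a statement that references one named parameter in two placeholder positions - something a purely positional value list can't express without duplicating the value:

```csharp
// The same named parameter bound once but used in two placeholder positions.
using var cmd = new SqlCommand(
    "INSERT INTO t (d0, d1) VALUES (@v, @v)", connection);
cmd.Parameters.AddWithValue("@v", 42);
cmd.ExecuteNonQuery();
```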
Oh, that's surprising but welcome.
Yes, definitely. I expect it not to work.
That's got to be down to how the name mapping is implemented on the server. I doubt anyone expected it to reach the limit of parameters; that might be something @v-chojas could explore. As I think I said earlier, the falloff curve looks like list iteration instead of map lookup, which makes sense because a list tends to be faster than a map when items < ~100.
I'm not able to replicate the improved test results with the latest MDS package (hackjob) from @Wraith2, or the previous package. I've triple-checked my MDS DLL and I'm pretty sure I'm using the correct one (modified dates line up to the correct times in my build output), and I've tried with and without the UseODBCParameterFormat switch. Performance still falls off around 256 params on all runs I try. 🤔 @KristianWedberg Can you push any changes you made to the test to your repo?
@David-Engel Yes, that will be due to running with too small a data volume for the fast MDS. I've refreshed the benchmark code to address that (for me each test case now takes about 1 second). You'll have to upgrade from MDS 2.1.2.
Description
When inserting rows with ExecuteNonQuery() and placeholder parameters, performance (measured as inserted parameters per second) with the Microsoft.Data.SqlClient v2.1.1 and System.Data.SqlClient providers is progressively impacted from about 50+ parameters per statement, both for single rows and for multirow batches. Inserting with 2048 parameters in the statement is up to 7 times slower than with 128 parameters.

The issue is not present when using the System.Data.Odbc provider (or non-SQL Server providers & databases), and performance is up to 25 times slower (when using 2048 parameters per statement) than with the ODBC provider.
I've tested with Windows 10, .NET 5.0 and a local SQL Server 2019 database, varying the number of columns from 1 to 1024, and the number of rows in each batch from 1 to 1000.
Note: The 'waviness' at 1000 and 2000 parameters is just an effect of the 1000/1024 and 2000/2048
numbers being separated in the graph - see the write-up for details:
The target table is un-indexed and the inserts use SQL-92 syntax:
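A minimal sketch of the benchmarked pattern (table and parameter names follow the capture earlier in the thread; illustrative, not the benchmark's actual code):

```csharp
using Microsoft.Data.SqlClient;

// One multirow, parameterized SQL-92 INSERT per batch, run via ExecuteNonQuery().
static void InsertTwoRows(SqlConnection connection)
{
    using var cmd = new SqlCommand(
        "INSERT INTO insertbenchmark (d0,d1,d2) " +
        "VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)", connection);
    for (int row = 0; row < 2; row++)
        for (int col = 0; col < 3; col++)
            cmd.Parameters.AddWithValue($"@R{row}_C{col}", row * 3 + col);
    cmd.ExecuteNonQuery();
}
```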
While table-valued parameters and bulk inserts don't have this issue, they won't help when inserting a single row or just a few very wide rows at a time, so it would be really useful to have this addressed (or to identify any gremlins on my part!)
Please see the full write-up with charts, tables, BenchmarkDotNet info etc. at:
https://github.com/KristianWedberg/sql-batch-insert-performance
Reproduce
Fully runnable source (using BenchmarkDotNet) and instructions at:
https://github.com/KristianWedberg/sql-batch-insert-performance
Expected behavior
Using more parameters (beyond 128) in an insert row or batch should increase throughput, measured as the number of parameters inserted per second, just like it does with System.Data.Odbc and other non-SqlClient providers.

Technical details