Reduce repeating parameters #28768
Comments
This is more complicated than it looks. For example, two different parameters can have the same .NET value, but have a different facet on them (size, precision, scale); in that case we cannot deduplicate them as above. You can start comparing facets as well, but in #28769 you're also slowing down the general case for everyone, since every time a parameter is added you iterate through the entire parameter list to do the check; I don't think this makes sense just for the case where a parameter value is duplicated (that's generally rare). This could maybe be done more efficiently, but the complexity and overhead are probably not worth it; most parameter values are very short (number, short string). One additional thing to keep in mind is that some providers don't support referencing the same parameter twice from SQL (e.g. ODBC uses ? as the parameter placeholder). This in itself isn't a blocker (we could disable this optimization there, and we probably don't support those scenarios well anyway), but it's yet more added complexity.
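As an aside, here is a small illustration of the facet problem mentioned above. This is plain ADO.NET, not EF Core internals, and the parameter names are made up:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// Two parameters carrying the same .NET value but different facets.
// Deduplicating them purely by value would have to pick a single
// precision/scale, silently changing how the value is sent to the server.
var p1 = new SqlParameter("@p1", SqlDbType.Decimal) { Precision = 18, Scale = 2, Value = 1.5m };
var p2 = new SqlParameter("@p2", SqlDbType.Decimal) { Precision = 5, Scale = 1, Value = 1.5m };

// p1.Value equals p2.Value, yet the parameters are not interchangeable.
Console.WriteLine(Equals(p1.Value, p2.Value)); // True
```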
Right. I thought this would be better as a post-processing step: right before building the final query, substitute parameter names by comparing the string version of the values that would end up in the query. But then, what if someone relied on the parameter names? So it's not a significant enough optimisation to warrant the cost/complexity...
One more point here is that if someone really is sending duplicate parameter values a lot, they should probably reconsider their schema and possibly redesign it in a way which avoids that. In other words, parameter deduplication only saves on network bandwidth, but everything is still duplicated in the database itself. So I'm again not so sure this is a good idea.
Where I noticed this in my case, the duplicated values were foreign keys. Are there any standardized cases where we could measure how much of an impact this could have?
@roji This might be interesting as a specific optimization in the update pipeline. All the FK values that are for the same column and come from relationship with the same principal (hence same value) could be a single parameter in the batch. |
That's an interesting idea - you're thinking of a scenario where we insert a principal with many dependents, right? One problem is that this wouldn't work with the new batching API, where the same parameter set isn't shared across different SQL statements (each batch command has its own parameter set). The exception to that is SQL Server bulk insert mode, where we use a single command with MERGE, so that would still work even with the new batching API. Also, foreign keys tend to be particularly small (typically an int, maybe a GUID), so the gains may not be worth it.
Do you mean a scenario where you're inserting many rows in the same SaveChanges which largely have the same data in them? That seems like a very specific scenario... In any case, I'm not sure about the relevance of the "cutoff" of using insert/update statements to modify many rows... The SQL Server provider does use MERGE when inserting multiple rows (but not when updating), but regardless of whether INSERT or MERGE is used, the parameters are essentially the same.
Yes, or moving dependents from one principal to another. For example, 20 students change to a new class at the end of a semester: 20 student rows would get their ClassId column changed, and for this, 20 duplicate parameters would be created.
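To illustrate that scenario, here is an approximation of the generated batch; the SQL below is a sketch, not the exact statements EF Core emits, and the table/column names are made up:

```csharp
// Today: each UPDATE binds its own copy of the shared ClassId value.
const string Today = @"
UPDATE [Students] SET [ClassId] = @p0 WHERE [Id] = @p1;
UPDATE [Students] SET [ClassId] = @p2 WHERE [Id] = @p3;
-- ... 18 more statements, each binding the same ClassId value again";

// With deduplication: the shared value is bound once and referenced everywhere.
const string Deduplicated = @"
UPDATE [Students] SET [ClassId] = @classId WHERE [Id] = @p0;
UPDATE [Students] SET [ClassId] = @classId WHERE [Id] = @p1;
-- ... @classId is sent a single time for the whole batch";
```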
Ah, so large dataset changes would suffer from the same problem. Sorry, I'm not too familiar with everything. Disregard the "cutoff" comment; I assumed large operations wouldn't work with similar queries. Another scenario where this might happen is in select statements where you use multiple...
Use a temp parameter table to be used across the transaction. 🙃
Not sure I understand this one - where would this...
Also think about default values.
@JeremyMahieu Have you measured the impact on performance (via ADO.NET)?
No, nothing of this nature. Could I use the benchmarks that are in the EFCore project for this? I'd like some sort of standardized scenario.
You're right, I have never done this; it would be very rare/specific.
Not really - this is a very specific scenario; none of our benchmarks specifically check for repeated parameter values. But it should be trivial to do an ADO.NET benchmark (best to leave EF out of it) with BenchmarkDotNet, which measures the actual impact of duplicated vs. deduplicated parameters. I suspect you won't get a very meaningful difference.
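For reference, a minimal BenchmarkDotNet sketch along those lines could look like the following; the connection string, table and column names are placeholders, and the table is assumed to exist:

```csharp
using System.Data;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Data.SqlClient;

public class DuplicateParameterBenchmark
{
    private const string ConnectionString =
        "Server=localhost;Database=Test;Trusted_Connection=True;TrustServerCertificate=True";
    private SqlConnection _connection = null!;

    [GlobalSetup]
    public void Setup()
    {
        _connection = new SqlConnection(ConnectionString);
        _connection.Open();
    }

    // 20 UPDATE statements, each binding its own copy of the same ClassId value.
    [Benchmark(Baseline = true)]
    public int Duplicated()
    {
        using var cmd = _connection.CreateCommand();
        cmd.CommandText = string.Join(";\n", Enumerable.Range(0, 20)
            .Select(i => $"UPDATE Students SET ClassId = @classId{i} WHERE Id = @id{i}"));
        for (var i = 0; i < 20; i++)
        {
            cmd.Parameters.Add(new SqlParameter($"@classId{i}", SqlDbType.Int) { Value = 42 });
            cmd.Parameters.Add(new SqlParameter($"@id{i}", SqlDbType.Int) { Value = i });
        }
        return cmd.ExecuteNonQuery();
    }

    // Same statements, but the shared ClassId value is sent as a single parameter.
    [Benchmark]
    public int Deduplicated()
    {
        using var cmd = _connection.CreateCommand();
        cmd.CommandText = string.Join(";\n", Enumerable.Range(0, 20)
            .Select(i => $"UPDATE Students SET ClassId = @classId WHERE Id = @id{i}"));
        cmd.Parameters.Add(new SqlParameter("@classId", SqlDbType.Int) { Value = 42 });
        for (var i = 0; i < 20; i++)
            cmd.Parameters.Add(new SqlParameter($"@id{i}", SqlDbType.Int) { Value = i });
        return cmd.ExecuteNonQuery();
    }

    [GlobalCleanup]
    public void Cleanup() => _connection.Dispose();
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<DuplicateParameterBenchmark>();
}
```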
Sending the additional parameters probably doesn't have a significant performance impact, but the number of rows inserted per batch does have an impact on performance, and the maximum you can insert per batch is limited by the number of parameters you can include in one command. The example above of 21 parameters per row means inserting a maximum of ~100 rows at a time (2100 / 21). It's possible a larger batch would be more optimal, if a reduced number of parameters allowed it.
@stevendarby that's theoretically true, though in practice the 2100 parameter limit is rarely an actual limiting factor for batching. The long-term plan for the 2100 parameter limit is to switch to using the new ADO.NET batching API (#18990); this would mean that instead of a single multistatement command, where all statements share a single parameter collection (as today), we'd batch together multiple commands, each with its own parameter collection. The 2100 parameter limit would then apply to a single statement, rather than to the entire batch. Unfortunately, for this to work, both EF and SqlClient would have to implement the batching API, which will likely take some time. In any case, trying to increase batch sizes by deduplicating parameter values would only work for the very specific cases where values are actually duplicated. While this could help some specific scenarios, it overall seems like a questionable reason to go into parameter deduplication, which has complexity and performance overhead costs that everyone would need to pay.
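For illustration, a minimal sketch of what that batching shape looks like with System.Data.Common.DbBatch (available since .NET 6). Npgsql is used here only because it already ships a DbBatch implementation; table and column names are placeholders:

```csharp
using Npgsql;

await using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=test");
await conn.OpenAsync();

// One batch, many commands; each DbBatchCommand owns its parameter collection,
// so a per-statement parameter limit would apply instead of a per-batch one.
await using var batch = new NpgsqlBatch(conn);
foreach (var (studentId, classId) in new[] { (1, 42), (2, 42), (3, 42) })
{
    batch.BatchCommands.Add(new NpgsqlBatchCommand(
        "UPDATE students SET class_id = @class_id WHERE id = @id")
    {
        Parameters =
        {
            new NpgsqlParameter("class_id", classId),
            new NpgsqlParameter("id", studentId)
        }
    });
}
await batch.ExecuteNonQueryAsync();
```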
Sure, it's a consideration for @JeremyMahieu's benchmarking if they do end up doing that.
FWIW there is a related issue for reducing parameters in queries - #24476. This issue seems distinct from it, but just mentioning in case it's worth considering them together.
Note from triage: we don't plan to do anything here as of now. Compelling perf analysis could change that. |
When doing a query where multiple values are inserted, these values are turned into parameters and added multiple times, even if the values are the same.
It would be a shorter query, less network data and shorter logs if the parameters were not repeated. Potentially some performance savings on the database side as well, at the cost of checking whether the parameter already exists.
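A sketch of the scenario being described; the entity, context and connection string below are made up for illustration and are not from the original report:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Inserting 20 rows that all share ClassId = 42 currently produces 20 separate
// parameters holding the same value in the generated batch; deduplication would
// send that value once and reference it from each statement.
await using var context = new SchoolContext();
context.Students.AddRange(Enumerable.Range(0, 20)
    .Select(i => new Student { Name = $"Student {i}", ClassId = 42 }));
await context.SaveChangesAsync();

public class Student
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int ClassId { get; set; }
}

public class SchoolContext : DbContext
{
    public DbSet<Student> Students => Set<Student>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer(
            "Server=localhost;Database=School;Trusted_Connection=True;TrustServerCertificate=True");
}
```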