Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for STRING_SPLIT when using .Contains() #15593

Closed
jeradrose opened this issue May 3, 2019 · 8 comments
Closed

Add support for STRING_SPLIT when using .Contains() #15593

jeradrose opened this issue May 3, 2019 · 8 comments

Comments

@jeradrose
Copy link

jeradrose commented May 3, 2019

Summary

.Contains() in a .Where() expression translates to a SQL query that uses literal values, causing poor plan caching. SQL Server 2016 supports STRING_SPLIT, which can be used to pass in a single string parameter containing a delimited list of values. This is a feature request to add support so that .Contains() can leverage STRING_SPLIT to drastically improve plan caching.

Steps to repro

Currently when trying to match a list of values, .Contains() within a .Where() clause gets translated to a list of literal values.

For example, this:

var ids = new int[] { 1, 2, 3 };
var users = myDbContext
    .Users
    .Where(u => ids.Contains(u.Id))
    .Select(u => u.Id)
    .ToList();

Gets translated to:

SELECT [u].[Id]
FROM [Users] AS [u]
WHERE [u].[Id] IN (1, 2, 3)

Unfortunately, this creates a massive number of query plans for every combination of values, which often means around one per query that's executed, which has two big downsides:

  1. SQL Server can't leverage plan caching for these queries
  2. For those using Query Store, it blows up the amount of data being collected (which has the downsides of storage usage, as well as performance overhead of reading/writing the Query Store data)

EF6 would translate these to a set of OR expressions with a large number of parameters. This is a bit better for plan caching, but can be expensive for EF, and still requires a plan for each number of parameters.

Proposed solution

One solution that solves this is for EF Core to support the SQL Server 2016 STRING_SPLIT function to allow a single string to be passed in for the list of values in a single parameter, allowing a single plan to be used for each query.

We came up with an extension to use .FromSql() to support STRING_SPLIT:

public static IQueryable<TSource> WhereIn<TSource, TCollection>(
    this IQueryable<TSource> query,
    Expression<Func<TSource, TCollection>> keySelector,
    IEnumerable<TCollection> values
) where TSource : class {
    var key = ((MemberExpression)keySelector.Body).Member;
    var tableName = _EntityTypeTableNameCache[key.DeclaringType];
    var columnName = key.Name;
    var joinedValues = String.Join(',', values);
    var valueCast = (typeof(TCollection) == typeof(int)) ? "CAST(Value AS INT)" : "Value";
 
    return query.FromSql(
        sql: $"SELECT * " +
        $"FROM {tableName} " +
        $"WHERE {columnName} IN (" +
        $"SELECT {valueCast} FROM STRING_SPLIT({{0}}, ',')" +
        $")",
        joinedValues);
}

Which allows us to do the same query as above:

var ids = new int[] { 1, 2, 3 };
var users = myDbContext
    .Users
    .WhereIn(u => u.Id, ids)
    .Select(u => u.Id)
    .ToList();

But instead gets translated to a query using STRING_SPLIT:

exec sp_executesql N'
SELECT [u].[Id]
FROM (
    SELECT * FROM Users WHERE Id IN (SELECT CAST(Value AS INT) FROM STRING_SPLIT(@p0, '',''))
) AS [u]',N'@p0 NVARCHAR(4000)',@p0=N'1,2,3'

Unfortunately, this solution also has some downsides:

  1. It can't be chained with multiple .WhereIn() filters, since .FromSql() can't be chained.
  2. This only works with IQueryable<>, and we don't yet have a solution to work with Expression<Func<>>, which we sometimes use for reusable filtering.
  3. It also doesn't work with ICollection<>, so can't be used on .Any() in subqueries, such as:
    var users = myDbContext.Users.Where(u => u.Addresses.Any(a => a.WhereIn(a => a.Id, ids))).ToList();

So this is a request for updating the .Contains() SQL translation to support STRING_SPLIT natively, which should address all of the above issues.

Note that this is similar to this issue, except I'm actually not asking for a solution for parameterizing each value -- I'm proposing for the values to be sent in as a delimited string in a single parameter, specifically for STRING_SPLIT support in SQL Server 2016.

Further technical details

EF Core version: 2.2.4
Database Provider: Microsoft.EntityFrameworkCore.SqlServer
Operating system: Windows 10 x64
IDE: Visual Studio 2017 15.9.11

@ErikEJ
Copy link
Contributor

ErikEJ commented May 3, 2019

Notice that using this requires the user database to be compatibility level 130, as well as hosted on SQL Server 2016+ https://docs.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-2017

@jeradrose
Copy link
Author

Absolutely. This definitely wasn’t meant to be a fix for everyone, but it’s a really great option for those that are on SQL Server 2016, and can use 130 compatibility level.

@ajcvickers ajcvickers added this to the Backlog milestone May 3, 2019
@jzabroski
Copy link

@jeradrose How do you suggest the query translator behaves when the server reaches it's 2100 parameter limit?

In the mean time, if this is affecting your query plan cache, I suggest setting up an agent job and periodically purging one-time query plans from the cache/store that are more than X minutes old. I've had good luck with this in environments where developers don't understand SQL and create big messes.

@jeradrose
Copy link
Author

@jzabroski what I'm requesting will only require a single parameter for a comma-separated string, regardless of the number of values. So this shouldn't be impacted by the 2100 parameter limit. In fact, I believe that STRING_SPLIT accepts a varchar(max) for the string parameter and returns a table-value list, so it should support a very large number of values.

Re: purging the plan cache, we have looked into this a bit, and actually have set up optimize for ad hoc workloads to relieve the strain a bit on single-use plans, but still would like to be able to reuse plans for IN() queries, as well as have them report/be aggregated in Query Store.

@jzabroski
Copy link

jzabroski commented May 3, 2019

Sorry, good point. Do you know if this is faster than supplying a SqlClient table valued parameter? I guess the other side of things is I imagine most SQL dialects support table valued parameters, so which do you think is most portable?

As for purging the plan cache, YMMV but the basic idea is to get the top 10 most expensive ad-hoc query plans with a single use count, and evict them from the cache by their plan handle. The core thing to stress here is that this will be especially beneficial on servers with 32GB or less memory.

Glen Barry has a good script for identifying the offenders - you just need to take his output and call DBCC FREEPROCCACHE on the individual plan_handle values of the highest cache memory-intensive plans. Good luck.

@bradmarder
Copy link

bradmarder commented May 9, 2019

This is a vastly underrated feature which would make a huge impact.

Please also make this work for .OrderyBy() as well. Something like this.

var ids = new [] { 1, 2, 3 };
users.OrderBy(x => ids.Contains(Id))
SELECT * FROM Users
ORDER BY
	CASE
		WHEN Id in (SELECT value FROM STRING_SPLIT('1|2|3', '|')) THEN 0
		ELSE 1
	END

@roji
Copy link
Member

roji commented Apr 4, 2023

Duplicate of #13617

@roji roji marked this as a duplicate of #13617 Apr 4, 2023
@roji
Copy link
Member

roji commented Apr 4, 2023

Work on implementing Contains via SQL Server OPENJSON is currently in progress; this is similar to STRING_SPLIT but is better in various ways (e.g. quoting issues).

@roji roji removed this from the Backlog milestone Apr 4, 2023
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants