Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Contains with a collection of tuples (e.g. for filtering on composite keys) #11799

Open
olivierr91 opened this issue Apr 24, 2018 · 11 comments

Comments

@olivierr91
Copy link

olivierr91 commented Apr 24, 2018

Currently, the only supported method for using a "Contains" statement on a composite key in EFCore 2.1 is to use a tuple and it is evaluated client-side. EF Core should be able to translate it to SQL.

For example:

return DbContext.PurchaseOrders
                .Where(po => myKeys.Contains(new Tuple<string, int>(po.KeyPart1, po.KeyPart2)))
                .ToList();

Could this be added to the backlog?

@ajcvickers
Copy link
Contributor

Triage: Adding this to the backlog to support some kind of translation for this scenario, although it may not match exactly to the pattern above.

@voroninp
Copy link

Expressions do not support tuple literals, code will look ugly, regular classes (records or anonymous types) serve better, unless C# compiler starts supporting more expressions.

@roji
Copy link
Member

roji commented Dec 23, 2023

@voroninp for specifying value tuples in-line in the query, you can use ValueTuple.Create(1, 2) which isn't too bad. In any case, this issue tracks also translating parameterized (non-inline) tuple lists, where this isn't a problem.

@roji
Copy link
Member

roji commented Dec 23, 2023

In any case, this issue specifically tracks using tuples - other types of complex collections (e.g. records) may make sense - that's #31237.

@clement911
Copy link
Contributor

For reference, we have now started using the same technique that EF8 uses for single-column contains, and applied it to multi-column scenarios.

It looks like this (real code simplified for clarity so excuse any typo):

                 var compositeIds = new[] 
                { 
                    (key1: "v1", key2: 123), 
                    (key1: "v2", key2: 456), 
                    ... 
                };
                string jsonIds = System.Text.Json.JsonSerializer.Serialize(compositeIds);
                var pJsonIds = new SqlParameter("@jsonIds", jsonIds) { Size = -1 };

                const string query = $"""
     SELECT d.* FROM dbo.Data d
     INNER JOIN 
     (
        	SELECT JSON_VALUE(t.VALUE, '$.key1') AS key1, JSON_VALUE(t.VALUE, '$.ke2') AS key2
        	FROM OPENJSON(@jsonIds) t
     ) t
     ON d.key1 = t.key1 AND d.key2 = t.key2
     """;

                var data = await ctx.Set<Data>().FromSqlRaw(query, pJsonIds).ToArrayAsync();

We use JSON_VALUE but I think it would be better to use the WITH clause of OPENJSON instead to extract the 2 columns in their correct data type (the second one should be an int).

This has helped us to lower the number of query plans.

Obviously it's not ideal that we have to build the query ourselves but hopefully EF can implement something similar.

@roji
Copy link
Member

roji commented Feb 23, 2024

@clement911 yep, having EF support this is indeed what this issue tracks. This is unfortunately not trivial to do, so I'm not sure we'll be able to do it for 9.0 - but it's high up on my list of query improvements.

And yes, definitely prefer using OPENJSON with WITH wherever possible - that offers various improvements, both in terms of query performance and SQL complexity.

@crozone
Copy link

crozone commented May 3, 2024

@clement911 I have been using a slightly different approach to solve the equivalent of WHERE IN for multiple columns, for example for the case of a table containing key value pairs.

For example, say there's a table of Property entities and each Property entity is a simple KVP with a string Key and a string Value.

If we have an array of KeyValuePair<string, string>[] searchKvps, we want to find all Property rows that match any of the input searchKvps. Ideally, we might want to do something like:

dbContext.Properties
.Any(p => searchKvps.Any(kvp => p.Key == kvp.Key &&  p.Value == kvp.Value))
.ToListAsync()

But this is currently untranslatable by EF.

Instead, I've been using the following method:

  1. Use a StringBuilder and a loop to construct a raw SELECT UNION SQL query containing the static key value pair values. For example: SELECT "Key1" as key, "Value1" as value UNION "Key2", "Value2" UNION ...

  2. Turn that into an IQueryable<KeyValuePairEntity> using var keyValueDataQuery = dbContext.Database.SqlQueryRaw<KeyValuePairEntity>(queryString, parameters)

Example method that can be placed in a DbContext:

public IQueryable<KeyValuePairEntity> GetKeyValuePairQuery(KeyValuePair<string, string?>[]? source)
{
    if (source is not null && source.Length > 0)
    {
        object?[] parameters = new object[source.Length * 2];
        for (int i = 0; i < source.Length; i++)
        {
            int dataIndex = i * 2;
            parameters[dataIndex] = source[i].Key;
            parameters[dataIndex + 1] = source[i].Value;
        }

        var queryStringBuilder = new System.Text.StringBuilder();
        for (int i = 0; i < source.Length; i++)
        {
            int dataIndex = i * 2;
            if (i == 0)
            {
                queryStringBuilder.Append($"SELECT {{{dataIndex}}} AS {nameof(KeyValuePairEntity.Key)}, {{{dataIndex + 1}}} AS {nameof(KeyValuePairEntity.Value)} ");
            }
            else
            {
                queryStringBuilder.Append($"UNION SELECT {{{dataIndex}}}, {{{dataIndex + 1}}} ");
            }
        }
        return this.Database.SqlQueryRaw<KeyValuePairEntity>(queryStringBuilder.ToString(), parameters!);
    }
    else
    {
        return this.Database.SqlQueryRaw<KeyValuePairEntity>($"SELECT NULL AS {nameof(KeyValuePairEntity.Key)}, NULL AS {nameof(KeyValuePairEntity.Value)} WHERE FALSE", Array.Empty<object>());
    }
}

Helper entity:

public class KeyValuePairEntity
{
    public string Key { get; set; }
    public string? Value { get; set; }
}
  1. Perform an INNER JOIN within the main query against the IQueryable<KeyValuePairEntity>, for example:
IQueryable<KeyValuePairEntity> keyValueDataQuery = dbContext.GetKeyValuePairQuery(kvps);

var matchingProperties = dbContext.Properties
.Join(
    keyValueDataQuery,
    outer => new { outer.Key, outer.Value },
    inner => new { inner.Key, inner.Value },
    (outer, inner) => outer
)
.ToListAsync();

This generates SQL that looks like:

SELECT "c"."id", "c"."key", "c"."value"
FROM "properties" AS "c"
INNER JOIN (
    SELECT @p0 AS Key, @p1 AS Value UNION SELECT @p2, @p3
) AS "p" ON "c"."key" = "p"."key" AND "c"."value" = "p"."value"

This can be further nested inside other queries, and used on relations of other objects with .Any() to accomplish a .Contains() like behaviour, which puts it into a subquery of WHERE EXISTS.

I'm not sure how the performance of SELECT UNION compares to the JSON approach, but it might be more compatible over all databases?

@dharmeshtailor
Copy link

dharmeshtailor commented Jun 3, 2024

Obviously it's not ideal that we have to build the query ourselves but hopefully EF can implement something similar.

I have created a method for your code, that I am using in our code :-) Thanks for sharing.

public IQueryable<T> GetQuery<T>(IEnumerable<T> source)
{
    string jsonParameter = JsonConvert.SerializeObject(source);
    List<string> fieldSelection = new List<string>();
    foreach (var prop in typeof(T).GetProperties())
    {
        fieldSelection.Add($"JSON_VALUE(t.VALUE, '$.{prop.Name}') AS {prop.Name}");
    }
    string sqlQuery = $"SELECT {string.Join(", ", fieldSelection)} FROM OPENJSON({{0}}) t";
    return Context.Database.SqlQueryRaw<T>(sqlQuery, [jsonParameter]);
}

@Hona
Copy link

Hona commented Oct 29, 2024

This would be nice for example

public class User
{
   int Id { get; set; }
   int TenantId { get; set; }
   ...
}

Then querying something like

LookupId[] lookupIds = [new(1, 1), new(1, 2)];

var foundUsers = await db
    .Users.Where(u => lookupIds.Any(l => l.Id == u.Id && l.TenantId == u.TenantId))
    .ToListAsync();

record LookupId(int Id, int TenantId);

I'd assume some variations on .Any, .All, .Contains etc for multiple column 'exists' queries

@stevendarby
Copy link
Contributor

stevendarby commented Dec 12, 2024

In any case, this issue specifically tracks using tuples - other types of complex collections (e.g. records) may make sense - that's #31237.

@roji Can I just check, is that the right issue? It's not entirely clear what that issue is about as it's tagged with area-change-tracking and other non-query related things. Is it more about storing collections? It also says "For value types see #31411" - which is yet another issue. But this issue is also about a value type, so...

As this issue is specifically about querying, is it worth repurposing this as general support for querying on collections - be they tuples or reference types?

As an aside, I've been thinking about this again in light of the possibility EF Core might revert to constants for scalar collection queries (#34347). That might have an impact on how complex collection queries are handled. I posted a workaround for SQL Server the other day (#35288 (comment), far from a complete solution); I'm thinking a constant version of that might involve using a VALUES collection, e.g.:
SELECT Id, Name FROM (VALUES (1, 'Bob'), (3, 'Dave')) AS x (Id, Name))

@roji
Copy link
Member

roji commented Dec 13, 2024

@stevendarby #31237 is about adding support for collections of complex types - in all areas (querying, tracking, saving...); it's very likely we'll do that for EF 10, as a replacement for using owned entity types to model JSON. This issue is specifically about Contains and tuples, as a way of interacting with composite keys and various other similar scenarios (it may make sense to also add inline/parameterized tuple collection support for other LINQ operators, but I'm not sure and Contains is definitely the high-value thing here). In any case, I don't think we're planning for full support for mapping tuple types, i.e. having a tuple property on your entity type, which would be mapped e.g. to a JSON document.

As this issue is specifically about querying, is it worth repurposing this as general support for querying on collections - be they tuples or reference types?

Maybe... At least for now, I think it makes sense to have a narrow focus here, where it concerns tuples; we can always change that in the future when we actually start implementing and understanding exactly how this intersects (and dosn't) with #31237.

Does that all make sense?

As an aside [...]

Yep, absolutely! That's in line with what we already do today with non-tuple, scalar collections:

var i = 8;
_ = await context.Blogs.Where(b => new[] { 1, i, 3 }.Where(x => x > 2).Contains(b.Id)).ToListAsync();

... translates to:

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Id] IN (
    SELECT [v].[Value]
    FROM (VALUES (CAST(1 AS int)), (@__i_0), (3)) AS [v]([Value])
    WHERE [v].[Value] > 2
)```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants