Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: LINQ to table parameter #13239

Closed
awr opened this issue Sep 6, 2018 · 20 comments
Closed

Feature Request: LINQ to table parameter #13239

awr opened this issue Sep 6, 2018 · 20 comments
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported

Comments

@awr
Copy link

awr commented Sep 6, 2018

Use Case: Ability to use query against a variable number of inputs, for a batching scenario. Ideally I'd like to have the same query plan used whether there are 2 input values or 100 input values, in conjunction with a linq query.

Hypothetical linq:

public class MyDbContext : DbContext
{
    // ... or however it makes sense to define a udt
    public DbParameter<IdUdt> IdTable { get; set; }
}

public class IdUdt : DbTableParameter<long>
{
    // not sure if this would be necessary, but could enable the With function below
    protected override void Populate(long value)
    {
        this.Id = value;
    }

    public long Id { get; set; }
}

public static class Execution
{
    public static async Task<List<Person>> GetWithTableParameterAsync(IEnumerable<long> ids, CancellationToken cancellationToken) 
    {
        using (var db = new MyDbContext()) {
            var query = from person in db.People
                        join id in db.IdTable.With(ids) on person.PersonId equals id.Id
                        select person;
            return await query.ToListAsync(cancellationToken);
        }
    }
}

Ideally this generates t-sql that looks something like:

declare @p0 dbo.udt_Id
insert into @p0 values(1)
insert into @p0 values(2)
insert into @p0 values(3)
insert into @p0 values(4)

exec sp_executesql N'SELECT 
    [Extent1].[PersonID] AS [Id], 
    [Extent1].[Name] AS [Name]
    FROM [dbo].[People] AS [Extent1]
    INNER JOIN @p0 i ON t.[PersonID] = i.[ID]',N'@ids [dbo].[udt_TableBigintId] READONLY',@ids=@p0

Note that I realize I can already do something similar with a Contains -- something like:

    public static async Task<List<Person>> GetWithInClauseAsync(IEnumerable<long> ids, CancellationToken cancellationToken) 
    {
        using (var db = new MyDbContext()) {
            var query = from person in db.People
                        where ids.Contains(person.PersonId)
                        select person;
            return await query.ToListAsync(cancellationToken);
        }
    }

The difference is in the sql that gets generated, since it embeds the id values into the query (or if using an expression tree walker, I believe it's possible to change this to have n equality checks with sql parameters). I'd prefer a single query plan to either of these solutions.

I also realize that it's possible to use table parameters with raw sql, but the challenge there is that I can't easily inject it into the middle of a complex query if I want to use LINQ.

@ajcvickers ajcvickers modified the milestones: 2.2.0, 2.2.0-preview2 Sep 7, 2018
@awr
Copy link
Author

awr commented Sep 21, 2018

Anything I can do to help this along? I'd be happy to put together a PR if you point me in the general direction of where you'd want to see this (assuming it fits with your direction for the project).

@divega
Copy link
Contributor

divega commented Sep 21, 2018

Hi @awr your issue was assigned to me as investigation in 2.2 with the idea that I would do some thinking to try to decide if this was a duplicate of an existing issue or if there was something in your proposal that we wanted to pursue.

I did a search and I actually couldn't find an active issue in our backlog that covers using TVPs in this way.

We have certainly done some thinking about using TVPs as an implementation detail for things like Enumerable.Contains(). For example, a query like this (based on your examples) would use TVPs automatically:

 public static async Task<List<Person>> GetWithTableParameterAsync(
     IEnumerable<long> ids, 
     CancellationToken cancellationToken) 
 {
     using (var db = new MyDbContext()) {
         var query = 
             from person in db.People
             where ids.Contains(person.PersonId)
                     select person;
         return await query.ToListAsync(cancellationToken);
     }
 }

However, there some challenges with doing this automatically. One of them is the fact that SQL Server requires table types to be declared in the database before you can use them in a TVP. Hence we have come up with other alternative implementations that presumably would have lower impact, like #12777.

But I believe what you are proposing is really interesting. On one hand it could be used to solve the same scenario above, but instead of Enumerable.Contains(), you would explicitly join the two query roots like in your sample code. On the other hand, it seems to be a more general feature that could have other applications.

I think I have been able to come up with two orthogonal new capabilities that we could add which I think would generalize the idea even more:

  1. New SQL Server specific extension method for EntityTypeBuilder and QueryTypeBuilder to indicate you want the corresponding table type to be created. The name could be something like ForSqlServerDeclareTableType(). I like the option of adding this for entity types and query types, which we already support, more than adding yet a third way of declaring a "shape" in the model.

  2. A new FromData() core query operator: This could be very similar to FromSql in the sense that it would be used to override the data source of the query root in that particular query, but instead of specifying that you need to execute some SQL, you would provide the data in-line or though a variable. This method would normally evaluate in memory, but in the particular case of SQL Server , it could cause the creation of a TVP and transference of the data to the server to be processed there. We would need to decide if that behavior would be triggered implicitly, whether it would depend on other parts of the query being evaluated on the server, or if it needs to be required explicitly.
    FromData() could support inputs similar to the existing HasData() method used for data seeding in model definition, and it could support streaming up TVPs if the input is an IEnumerable<T> or IAsynncEnumerable<T>. In case you pass actual entity instances to FromData(), we would need consider if the right behavior of the method would be to clone or actually use the same instances.
    Note that ToQuery() (the API for defining query) can already be used in the model to supply in-memory data.

Assuming we had these capabilities, your example could look something like this:

public class MyDbContext : DbContext
{
    // ... or however it makes sense to define a udt
    public DbQuery<LongRow> LongRows { get; set; }
   
    override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Query<LongRow>().ForSqlSserverDeclareTableType();
    }
}

public class LongRow 
{
    public long Id { get; set; }
}

public static class Execution
{
    public static async Task<List<Person>> GetWithTableParameterAsync(
        IEnumerable<long> ids, 
        CancellationToken cancellationToken) 
    {
        using (var db = new MyDbContext()) {
            var query = from person in db.People
                        join id in db.LongRows.FromData(ids => ids.Select(id => new LongRow {Id = id}))) 
                            on person.PersonId equals id.Id
                        select person;
            return await query.ToListAsync(cancellationToken);
        }
    }
}

Of course we could also have sugar to make this more terse, like an attribute on the type or DbSet or DbQuery property to automatically create the table type, and a version of FromData() that takes a collection of a single scalar and converts.

@divega divega removed this from the 2.2.0 milestone Sep 21, 2018
@divega
Copy link
Contributor

divega commented Sep 21, 2018

Note for triage: I think this belongs in the backlog for now, but since @awr is interested in contributing we can discuss the design in general terms.

@divega
Copy link
Contributor

divega commented Sep 18, 2019

Updated my previous comment to reflect that client evaluation isn't an option anymore after 3.0.

@AndersMalmgren
Copy link

AndersMalmgren commented Sep 18, 2019

I think the syntax should be more close to that of Set()

something along the lines of

IQueryable<TEntity> TableValueParameterSet<TEntityId>(IEnumerable<TEntityId> ids) where TEntityId : struct

By conventon EF could use a table value paramater type named $"ef_core_{nameof(TEntityId)}_tvp" or similar

@AndersMalmgren
Copy link

If EF core eventually support insert / delete / update directly on IQueryables you could even do

ctx.TableValueParameterSet(ids)
  .Join(ctx.Set<MyEntity>(), id => id, e => e.Id, (id, e) => e)
  .Delete();

If we extend the table value parameter mechanics to support custom entity types not just simple value types we could even bulk insert with a crazy high level of performance. Something like.

var batch = Enumerable.Range(0, 2000).Select(i => new MyEntity{ Foo = $"Hello World_{i}" }).ToList();
ctx.TableValueParameterSet(batch)
.Insert();

Possibilities are endless :D

@AndersMalmgren
Copy link

This issue need more love from the ef core team.

@Alexander-Bartosh
Copy link

Alexander-Bartosh commented Jul 26, 2021

Guys very nice Feature request by @awr and very nice implementation suggestion by @divega !
But is there any chance we can see this functionality in the EF core ?
Is there any other approach to build ONE EF LINQ query that is using/filtering on a TVP (e.g. list of Ids) ?

@hauntingEcho
Copy link

hauntingEcho commented May 19, 2022

Rather than using a temp table or a TVP, couldn't this translate to joining either a CTE or a VALUES list? for example:

SELECT a.*
FROM People AS a  
INNER JOIN (VALUES (1), (2), (3), (4)) AS vals_0(Id)
ON a.PersonID = vals_0.Id
;

or:

WITH cte_0(Id) AS (VALUES (1), (2), (3), (4))
SELECT a.*
FROM People AS a  
INNER JOIN cte_0
ON a.PersonID = cte_0.Id
;

My understanding is also that MS SQL's TVP implementation is a bit more specific to them, while WITH and VALUES are more generally supported (so hopefully more work could be re-used across providers)

@roji
Copy link
Member

roji commented May 23, 2022

@hauntingEcho both of the above suggestions involve embedding constant values in SQL, meaning that you get different SQLs for different values. That causes query plan fragmentation, which is bad for performance, and the thing that the OP tried to avoid (see above).

Note that the queries above are mainly about selecting rows with IDs in a given list. For this case specifically (which is usually express via LINQ Contains and translated to SQL IN), see #13617 which discusses some optimization techniques which are cross-database.

@hauntingEcho
Copy link

@roji that makes sense. I may have misread this ticket as being primarily about server-side joins on input data (with a bonus of good query plans from TVP) rather than primarily about the query plan on single-column raw data filters. My queries had only used a single column to simplify the examples.

@willnationsdev
Copy link

Just stumbled upon this issue. I would love to have a robust, built-in solution for this problem. However, until then, I also wanted to share an EF Core extension I found that accomplishes a similar task: EntityFramework.MemoryJoin (for anyone else who needs an immediate solution).

You basically just add a dedicated DBSet to your DbContext with a generic parameter of whatever DTO you want to use for your TVP so-to-speak. If it's protected, then EF Core won't try to generate an entire table for it in the database, but you'll still be able to incorporate instances of it into queries.

By default, they have a QueryModelClass type with 3 properties each of types string, bool, long, double, Guid, and DateTime. Should accommodate most use cases of joining on a single table's columns. However, if you need, say, 4 strings, you can create and provide your own DTO class to use instead.

I haven't successfully run a query that supplies multiple lists in a single request so not sure if that's supported or if I'm just doing things incorrectly. At least gives a general idea of how something like this could be implemented though.

@erwan-joly
Copy link

Just stumbled upon this issue. I would love to have a robust, built-in solution for this problem. However, until then, I also wanted to share an EF Core extension I found that accomplishes a similar task: EntityFramework.MemoryJoin (for anyone else who needs an immediate solution).

You basically just add a dedicated DBSet to your DbContext with a generic parameter of whatever DTO you want to use for your TVP so-to-speak. If it's protected, then EF Core won't try to generate an entire table for it in the database, but you'll still be able to incorporate instances of it into queries.

By default, they have a QueryModelClass type with 3 properties each of types string, bool, long, double, Guid, and DateTime. Should accommodate most use cases of joining on a single table's columns. However, if you need, say, 4 strings, you can create and provide your own DTO class to use instead.

I haven't successfully run a query that supplies multiple lists in a single request so not sure if that's supported or if I'm just doing things incorrectly. At least gives a general idea of how something like this could be implemented though.

sadly it does not seems to make it easy to add new type mapping like Nodatime or Postgres Enums.
Additionally it make all variables use a different name even when the value is unique which result in the query plan being different each times.

Would also love to see a robust solution for this problem

@roji
Copy link
Member

roji commented Nov 3, 2023

@erwan-joly how do you see this as relevant for NodaTime or PG enum support? Both these are already natively supported by the PostgreSQL provider etc.

@erwan-joly
Copy link

@erwan-joly how do you see this as relevant for NodaTime or PG enum support? Both these are already natively supported by the PostgreSQL provider etc.

I think I was unclear, what I meant was the other way around: it would be nice if the proposed solution was supporting those (and any type supported by the provider) not that the support of those types would change in anyway with such a solution.

The suggested solution EntityFramework.MemoryJoin does not work with those types and also create a new set of parameters at every run so it will always get a new query plan.

It only support a small subset of types: https://github.com/neisbut/EntityFramework.MemoryJoin/blob/master/src/EntityFramework.MemoryJoin/Internal/MappingHelper.cs#L321

it would be nice to be able to join with an in memory list for any supported type in the provider. And I believe having this in EF would result in each provider being able to add the support on their side with things like

‘select * from table t
join values(generatedFromMemoryList) as list(id, name,…)
On t.id = list.id and t.name = list.name’

@roji roji removed this from the Backlog milestone Nov 4, 2023
@roji
Copy link
Member

roji commented Nov 4, 2023

Everyone, EF 8.0 is introducing full support for primitive collections, which very much addresses what has been discussed above (see this blog post). Rather than using a SQL Server TVP (Table-Valued Parameter), we've gone with JSON arrays as the way to represent arrays.

For example, consider the following simple LINQ query using Contains:

var names = new[] { "Blog1", "Blog2" };

var blogs = await context.Blogs
    .Where(b => names.Contains(b.Name))
    .ToArrayAsync();

EF Core 7.0 translated this as follows:

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (N'Blog1', N'Blog2')

The new 8.0 translation:

Executed DbCommand (49ms) [Parameters=[@__names_0='["Blog1","Blog2"]' (Size = 4000)], CommandType='Text', CommandTimeout='30']

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (
    SELECT [n].[value]
    FROM OPENJSON(@__names_0) WITH ([value] nvarchar(max) '$') AS [n]
)

This allow the same query plan to be reused regardless of parameters. And in addition, any LINQ operators can now be composed over the array parameter, not just Contains.

  • Are widely supported across different databases. TVPs are a SQL Server-only feature.
  • Do not require a type to be defined before being used in a query.
  • Can also be used as columns and not just as parameters. This allows storing e.g. an array of ints as a column in the database, and then using LINQ Contains() over that exactly as one can use it over a parameter.
  • Seem to provide at least better performance compared to TVPs (see this comment for the benchmarks).

So unless I'm mistaken, there's no longer a reason for this issue to be open; unless someone can come up with a specific reason why TVPs should be used (as opposed to a JSON array), EF 8.0 seems to already resolve the issues described here.

@erwan-joly
Copy link

Everyone, EF 8.0 is introducing full support for primitive collections, which very much addresses what has been discussed above (see this blog post). Rather than using a SQL Server TVP (Table-Valued Parameter), we've gone with JSON arrays as the way to represent arrays.

For example, consider the following simple LINQ query using Contains:

var names = new[] { "Blog1", "Blog2" };

var blogs = await context.Blogs
    .Where(b => names.Contains(b.Name))
    .ToArrayAsync();

EF Core 7.0 translated this as follows:

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (N'Blog1', N'Blog2')

The new 8.0 translation:

Executed DbCommand (49ms) [Parameters=[@__names_0='["Blog1","Blog2"]' (Size = 4000)], CommandType='Text', CommandTimeout='30']

SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
WHERE [b].[Name] IN (
    SELECT [n].[value]
    FROM OPENJSON(@__names_0) WITH ([value] nvarchar(max) '$') AS [n]
)

This allow the same query plan to be reused regardless of parameters. And in addition, any LINQ operators can now be composed over the array parameter, not just Contains.

  • Are widely supported across different databases. TVPs are a SQL Server-only feature.
  • Do not require a type to be defined before being used in a query.
  • Can also be used as columns and not just as parameters. This allows storing e.g. an array of ints as a column in the database, and then using LINQ Contains() over that exactly as one can use it over a parameter.
  • Seem to provide at least better performance compared to TVPs (see this comment for the benchmarks).

So unless I'm mistaken, there's no longer a reason for this issue to be open; unless someone can come up with a specific reason why TVPs should be used (as opposed to a JSON array), EF 8.0 seems to already resolve the issues described here.

Seems perfect 🎉

@erwan-joly

This comment was marked as off-topic.

@roji

This comment was marked as off-topic.

@erwan-joly

This comment was marked as off-topic.

@ajcvickers ajcvickers added closed-no-further-action The issue is closed and no further action is planned. and removed type-enhancement area-query shay-loves-labels labels Jan 3, 2024
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-no-further-action The issue is closed and no further action is planned. customer-reported
Projects
None yet
Development

No branches or pull requests

10 participants