Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqliteException "no such column : rX.value" when ordering on subquery value in combination with a "where contains" when upgrading from 7 to 8.0.0-rc.2.23480.1 #32234

Closed
tim-stuyckens-materialise opened this issue Nov 4, 2023 · 9 comments · Fixed by #32456
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-bug
Milestone

Comments

@tim-stuyckens-materialise
Copy link

tim-stuyckens-materialise commented Nov 4, 2023

When upgrading
from

   <PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="7.0.5">
    <PackageReference Include="Microsoft.EntityFrameworkCore.Sqlite" Version="7.0.5" />

to

    <PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="8.0.0-rc.2.23480.1">
    <PackageReference Include="Microsoft.EntityFrameworkCore.Sqlite" Version="8.0.0-rc.2.23480.1" />

the following exception happens when running our test suite when querying

      Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 1: 'no such column: r4.value'.
         at Microsoft.Data.Sqlite.SqliteCommand.PrepareAndEnumerateStatements()+MoveNext()
         at Microsoft.Data.Sqlite.SqliteCommand.GetStatements()+MoveNext()
         at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
         at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
         at Microsoft.Data.Sqlite.SqliteCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
         at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
         at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
         at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
         at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()

Reason is the following part in the query (full query below)
SELECT "r4"."value"
FROM json_each(@__request_RecordDefinitionIds_0) AS "r2"

the actual test value" request.RecordDefinitionIds" in the c# query code is a list with one Guid. Previously this was translated as a regular equals (value was inline, as explained in https://devblogs.microsoft.com/dotnet/announcing-ef8-preview-4/ )
AND "t5"."RecordDefinitionId" = 'ACA990DF-C8BC-47FD-9CAF-B2FE4563D5EE'

C# code

query

            var pageNr = Math.Max(0, request.PageNr - 1);
            var iterationValuesQuery = _context.IterationValues
                .Include(x => x.Iteration)
                .ThenInclude(x => x.Record)
                .ThenInclude(x => x.RecordObject)
                .Where(r => request.RecordDefinitionIds.Contains(r.Iteration.Record.RecordDefinitionId));

            iterationValuesQuery = WhereMatchCriteria(request, iterationValuesQuery);

            var recordIdsWithMatchCriteriaCountQuery =  iterationValuesQuery
                .GroupBy(x => new { x.IterationId, x.Iteration.RecordId })
                .Where(x => x.Count() >= request.Criteria.Count)
                .Select(x => new
                {
                    x.Key.RecordId,
                    x.Key.IterationId,
                    //for debugging
                    Count = x.Count(),
                    //for sorting
                    MaxTimestamp = x.Select(xx=>xx.Iteration.Timestamp).Max(),
                });

            var recordIdsWithMatchCriteriaCount = await  recordIdsWithMatchCriteriaCountQuery.OrderBy(x => x.MaxTimestamp)
                .Skip(pageNr * request.PageSize)
                .Take(request.PageSize)
                .ToListAsync(cancellationToken);

Context.OnModelCreating

       var recordBuilder = modelBuilder.Entity<Record>().ToTable("Records", pdbSchema);
       recordBuilder.HasQueryFilter(x => x.TenantId == _tenantAccessor.GetTenantId());
       recordBuilder.HasMany(x => x.Iterations).WithOne(x => x.Record).IsRequired().OnDelete(DeleteBehavior.ClientCascade);

       var iterationBuilder = modelBuilder.Entity<Iteration>().ToTable("Iterations", pdbSchema);
       iterationBuilder.HasQueryFilter(x => x.TenantId == _tenantAccessor.GetTenantId());
       iterationBuilder.HasMany(x => x.IterationValues)
           .WithOne(x => x.Iteration).IsRequired().OnDelete(DeleteBehavior.ClientCascade);
       iterationBuilder.HasOne(x => x.Record).WithMany(x => x.Iterations).IsRequired();

       var iterationValueBuilder = modelBuilder.Entity<IterationValue>().ToTable("IterationValues", pdbSchema);
       iterationValueBuilder.HasQueryFilter(x => x.TenantId == _tenantAccessor.GetTenantId());

        var recordDefinitionBuilder = modelBuilder.Entity<RecordDefinition>()
            .ToTable("RecordDefinitions", pdbSchema);
        recordDefinitionBuilder.HasMany(x => x.Records)
            .WithOne(x => x.RecordDefinition)
            .OnDelete(DeleteBehavior.ClientNoAction);

SQL Sqlite 8.0.0-rc.2.23480.1

      SELECT "t"."RecordId", "i"."IterationId", COUNT(*) AS "Count", (
          SELECT MAX("t6"."Timestamp")
          FROM "IterationValues" AS "i4"
          INNER JOIN (
              SELECT "i5"."Id", "i5"."DatCre", "i5"."DatLu", "i5"."IterationState", "i5"."Number", "i5"."RecordId", "i5"."TenantId", "i5"."Timestamp", "i5"."UserCre", "i5"."UserLu"
              FROM "Iterations" AS "i5"
              WHERE "i5"."TenantId" = @__ef_filter__p_0
          ) AS "t4" ON "i4"."IterationId" = "t4"."Id"
          INNER JOIN (
              SELECT "r3"."Id", "r3"."DatCre", "r3"."DatLu", "r3"."EquipmentId", "r3"."IterationState", "r3"."RecordDefinitionId", "r3"."RecordObjectId", "r3"."TenantId", "r3"."UserCre", "r3"."UserLu"
              FROM "Records" AS "r3"
              WHERE "r3"."TenantId" = @__ef_filter__p_0
          ) AS "t5" ON "t4"."RecordId" = "t5"."Id"
          INNER JOIN (
              SELECT "i6"."Id", "i6"."DatCre", "i6"."DatLu", "i6"."IterationState", "i6"."Number", "i6"."RecordId", "i6"."TenantId", "i6"."Timestamp", "i6"."UserCre", "i6"."UserLu"
              FROM "Iterations" AS "i6"
              WHERE "i6"."TenantId" = @__ef_filter__p_0
          ) AS "t6" ON "i4"."IterationId" = "t6"."Id"
          WHERE "i4"."TenantId" = @__ef_filter__p_0 AND "t5"."RecordDefinitionId" IN (
              SELECT "r4"."value"
              FROM json_each(@__request_RecordDefinitionIds_0) AS "r4"
          ) AND "i"."IterationId" = "i4"."IterationId" AND "t"."RecordId" = "t4"."RecordId") AS "MaxTimestamp"
      FROM "IterationValues" AS "i"
      INNER JOIN (
          SELECT "i0"."Id", "i0"."RecordId"
          FROM "Iterations" AS "i0"
          WHERE "i0"."TenantId" = @__ef_filter__p_0
      ) AS "t" ON "i"."IterationId" = "t"."Id"
      INNER JOIN (
          SELECT "r"."Id", "r"."RecordDefinitionId"
          FROM "Records" AS "r"
          WHERE "r"."TenantId" = @__ef_filter__p_0
      ) AS "t0" ON "t"."RecordId" = "t0"."Id"
      WHERE "i"."TenantId" = @__ef_filter__p_0 AND "t0"."RecordDefinitionId" IN (
          SELECT "r0"."value"
          FROM json_each(@__request_RecordDefinitionIds_0) AS "r0"
      )
      GROUP BY "i"."IterationId", "t"."RecordId"
      HAVING COUNT(*) >= @__request_Criteria_Count_1
      ORDER BY (
          SELECT MAX("t6"."Timestamp")
          FROM "IterationValues" AS "i4"
          INNER JOIN (
              SELECT "i5"."Id", "i5"."DatCre", "i5"."DatLu", "i5"."IterationState", "i5"."Number", "i5"."RecordId", "i5"."TenantId", "i5"."Timestamp", "i5"."UserCre", "i5"."UserLu"
              FROM "Iterations" AS "i5"
              WHERE "i5"."TenantId" = @__ef_filter__p_0
          ) AS "t4" ON "i4"."IterationId" = "t4"."Id"
          INNER JOIN (
              SELECT "r3"."Id", "r3"."DatCre", "r3"."DatLu", "r3"."EquipmentId", "r3"."IterationState", "r3"."RecordDefinitionId", "r3"."RecordObjectId", "r3"."TenantId", "r3"."UserCre", "r3"."UserLu"
              FROM "Records" AS "r3"
              WHERE "r3"."TenantId" = @__ef_filter__p_0
          ) AS "t5" ON "t4"."RecordId" = "t5"."Id"
          INNER JOIN (
              SELECT "i6"."Id", "i6"."DatCre", "i6"."DatLu", "i6"."IterationState", "i6"."Number", "i6"."RecordId", "i6"."TenantId", "i6"."Timestamp", "i6"."UserCre", "i6"."UserLu"
              FROM "Iterations" AS "i6"
              WHERE "i6"."TenantId" = @__ef_filter__p_0
          ) AS "t6" ON "i4"."IterationId" = "t6"."Id"
          WHERE "i4"."TenantId" = @__ef_filter__p_0 AND "t5"."RecordDefinitionId" IN (
              SELECT "r4"."value"
              FROM json_each(@__request_RecordDefinitionIds_0) AS "r2"
          ) AND "i"."IterationId" = "i4"."IterationId" AND "t"."RecordId" = "t4"."RecordId")
      LIMIT @__p_3 OFFSET @__p_2

Sql sqlite 7.05

      SELECT "t"."RecordId", "i"."IterationId", COUNT(*) AS "Count", (
          SELECT MAX("t6"."Timestamp")
          FROM "IterationValues" AS "i4"
          INNER JOIN (
              SELECT "i5"."Id", "i5"."DatCre", "i5"."DatLu", "i5"."IterationState", "i5"."Number", "i5"."RecordId", "i5"."TenantId", "i5"."Timestamp", "i5"."UserCre", "i5"."UserLu"
              FROM "Iterations" AS "i5"
              WHERE "i5"."TenantId" = @__ef_filter__p_1
          ) AS "t4" ON "i4"."IterationId" = "t4"."Id"
          INNER JOIN (
              SELECT "r1"."Id", "r1"."DatCre", "r1"."DatLu", "r1"."EquipmentId", "r1"."IterationState", "r1"."RecordDefinitionId", "r1"."RecordObjectId", "r1"."TenantId", "r1"."UserCre", "r1"."UserLu"
              FROM "Records" AS "r1"
              WHERE "r1"."TenantId" = @__ef_filter__p_2
          ) AS "t5" ON "t4"."RecordId" = "t5"."Id"
          INNER JOIN (
              SELECT "i6"."Id", "i6"."DatCre", "i6"."DatLu", "i6"."IterationState", "i6"."Number", "i6"."RecordId", "i6"."TenantId", "i6"."Timestamp", "i6"."UserCre", "i6"."UserLu"
              FROM "Iterations" AS "i6"
              WHERE "i6"."TenantId" = @__ef_filter__p_1
          ) AS "t6" ON "i4"."IterationId" = "t6"."Id"
          WHERE "i4"."TenantId" = @__ef_filter__p_0 AND "t5"."RecordDefinitionId" = 'ACA990DF-C8BC-47FD-9CAF-B2FE4563D5EE' AND "i"."IterationId" = "i4"."IterationId" AND "t"."RecordId" = "t4"."RecordId") AS "MaxTimestamp"
      FROM "IterationValues" AS "i"
      INNER JOIN (
          SELECT "i0"."Id", "i0"."RecordId"
          FROM "Iterations" AS "i0"
          WHERE "i0"."TenantId" = @__ef_filter__p_1
      ) AS "t" ON "i"."IterationId" = "t"."Id"
      INNER JOIN (
          SELECT "r"."Id", "r"."RecordDefinitionId"
          FROM "Records" AS "r"
          WHERE "r"."TenantId" = @__ef_filter__p_2
      ) AS "t0" ON "t"."RecordId" = "t0"."Id"
      WHERE "i"."TenantId" = @__ef_filter__p_0 AND "t0"."RecordDefinitionId" = 'ACA990DF-C8BC-47FD-9CAF-B2FE4563D5EE'
      GROUP BY "i"."IterationId", "t"."RecordId"
      HAVING COUNT(*) >= @__request_Criteria_Count_1
      ORDER BY (
          SELECT MAX("t6"."Timestamp")
          FROM "IterationValues" AS "i4"
          INNER JOIN (
              SELECT "i5"."Id", "i5"."DatCre", "i5"."DatLu", "i5"."IterationState", "i5"."Number", "i5"."RecordId", "i5"."TenantId", "i5"."Timestamp", "i5"."UserCre", "i5"."UserLu"
              FROM "Iterations" AS "i5"
              WHERE "i5"."TenantId" = @__ef_filter__p_1
          ) AS "t4" ON "i4"."IterationId" = "t4"."Id"
          INNER JOIN (
              SELECT "r1"."Id", "r1"."DatCre", "r1"."DatLu", "r1"."EquipmentId", "r1"."IterationState", "r1"."RecordDefinitionId", "r1"."RecordObjectId", "r1"."TenantId", "r1"."UserCre", "r1"."UserLu"
              FROM "Records" AS "r1"
              WHERE "r1"."TenantId" = @__ef_filter__p_2
          ) AS "t5" ON "t4"."RecordId" = "t5"."Id"
          INNER JOIN (
              SELECT "i6"."Id", "i6"."DatCre", "i6"."DatLu", "i6"."IterationState", "i6"."Number", "i6"."RecordId", "i6"."TenantId", "i6"."Timestamp", "i6"."UserCre", "i6"."UserLu"
              FROM "Iterations" AS "i6"
              WHERE "i6"."TenantId" = @__ef_filter__p_1
          ) AS "t6" ON "i4"."IterationId" = "t6"."Id"
          WHERE "i4"."TenantId" = @__ef_filter__p_0 AND "t5"."RecordDefinitionId" = 'ACA990DF-C8BC-47FD-9CAF-B2FE4563D5EE' AND "i"."IterationId" = "i4"."IterationId" AND "t"."RecordId" = "t4"."RecordId")
      LIMIT @__p_3 OFFSET @__p_2

provider and version information

EF Core version:
Database provider: (e.g. Microsoft.EntityFrameworkCore.Sqlite)
Target framework: (e.g. .NET 8.0)
Operating system:
IDE: 17.8.0 Preview 5.0

@tim-stuyckens-materialise tim-stuyckens-materialise changed the title SqliteException "no such column : rX.value" when ordering on subquery value in combination with a "where contains" SqliteException "no such column : rX.value" when ordering on subquery value in combination with a "where contains" when upgrading from 7 to 8.0.0-rc.2.23480.1 Nov 4, 2023
@ajcvickers
Copy link
Contributor

ajcvickers commented Nov 6, 2023

@roji This looks like a regression from 7.0 with JSON-based Contains on SQLIte. Runnable repro below; still fails on daily:

using (var context = new SomeDbContext())
{
    await context.Database.EnsureDeletedAsync();
    await context.Database.EnsureCreatedAsync();

    var definitionIds = new[] { 1 };
    
    var pageNr = 1;
    var crCount = 2;
    
    var iterationValuesQuery = context.IterationValues
        .Include(x => x.Iteration)
        .ThenInclude(x => x.Record)
        .ThenInclude(x => x.RecordObject)
        .Where(r => definitionIds.Contains(r.Iteration!.Record.RecordDefinitionId));

    var recordIdsWithMatchCriteriaCountQuery =  iterationValuesQuery
        .GroupBy(x => new { x.IterationId, x.Iteration.RecordId })
        .Where(x => x.Count() >= crCount)
        .Select(x => new
        {
            x.Key.RecordId,
            x.Key.IterationId,
            //for debugging
            Count = x.Count(),
            //for sorting
            MaxTimestamp = x.Select(xx=>xx.Iteration.Timestamp).Max(),
        });

    var ps = 20;
    
    var recordIdsWithMatchCriteriaCount = await  recordIdsWithMatchCriteriaCountQuery.OrderBy(x => x.MaxTimestamp)
        .Skip(pageNr * ps)
        .Take(ps)
        .ToListAsync();
}

public class SomeDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlite("Data Source=test.db")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    public DbSet<IterationValue> IterationValues => Set<IterationValue>();

    private int GetTenantId() => 1;

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        var pdbSchema = "pdb";
        var recordBuilder = modelBuilder.Entity<Record>().ToTable("Records", pdbSchema);
        recordBuilder.HasQueryFilter(x => x.TenantId == GetTenantId());
        recordBuilder.HasMany(x => x.Iterations).WithOne(x => x.Record).IsRequired().OnDelete(DeleteBehavior.ClientCascade);

        var iterationBuilder = modelBuilder.Entity<Iteration>().ToTable("Iterations", pdbSchema);
        iterationBuilder.HasQueryFilter(x => x.TenantId == GetTenantId());
        iterationBuilder.HasMany(x => x.IterationValues)
            .WithOne(x => x.Iteration).IsRequired().OnDelete(DeleteBehavior.ClientCascade);
        iterationBuilder.HasOne(x => x.Record).WithMany(x => x.Iterations).IsRequired();

        var iterationValueBuilder = modelBuilder.Entity<IterationValue>().ToTable("IterationValues", pdbSchema);
        iterationValueBuilder.HasQueryFilter(x => x.TenantId == GetTenantId());

        var recordDefinitionBuilder = modelBuilder.Entity<RecordDefinition>()
            .ToTable("RecordDefinitions", pdbSchema);
        recordDefinitionBuilder.HasMany(x => x.Records)
            .WithOne(x => x.RecordDefinition)
            .OnDelete(DeleteBehavior.ClientNoAction);
    }
}

public class RecordDefinition
{
    public int Id { get; set; }
    public List<Record> Records { get; set; } = new();
}

public class IterationValue
{
    public int Id { get; set; }
    public int TenantId { get; set; }
    public Iteration? Iteration { get; set; }
    public int IterationId { get; set; }
}

public class Iteration
{
    public int Id { get; set; }
    public int TenantId { get; set; }

    public int RecordId { get; set; }
    public Record Record { get; set; } = null!;
    
    public List<IterationValue> IterationValues { get; set; } = new();
    public DateTime Timestamp { get; set; }
}

public class Record
{
    public int Id { get; set; }
    public int TenantId { get; set; }
    public List<Iteration> Iterations { get; set; } = new();
    public RecordDefinition RecordDefinition { get; set; }
    public RecordObject RecordObject { get; set; }
    public int RecordDefinitionId { get; set; }
}

public class RecordObject
{
    public int Id { get; set; }
}

@roji
Copy link
Member

roji commented Nov 6, 2023

@ajcvickers thanks for the repro - I'll put this on my high-priority list.

@roji
Copy link
Member

roji commented Nov 10, 2023

Done some deep investigation into this, here are some preliminary observations.

Minimal repro query:

_ = await context.IterationValues
    .Where(r => ids.Contains(r.IterationId))
    .GroupBy(x => x.IterationId)
    .Select(x => new
    {
        x.Key,
        MaxTimestamp = x.Select(x => x.Iteration.Timestamp).Max(),
    })
    .OrderBy(x => x.MaxTimestamp)
    .ToListAsync();
Full minimal repro code
await using var context = new BlogContext();

await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();

var ids = new[] { 1 };

_ = await context.IterationValues
    .Where(r => ids.Contains(r.IterationId))
    .GroupBy(x => x.IterationId)
    .Select(x => new
    {
        x.Key,
        MaxTimestamp = x.Select(x => x.Iteration.Timestamp).Max(),
    })
    .OrderBy(x => x.MaxTimestamp)
    .ToListAsync();

public class BlogContext : DbContext
{
    public DbSet<IterationValue> IterationValues => Set<IterationValue>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class IterationValue
{
    public int Id { get; set; }
    public Iteration? Iteration { get; set; }
    public int IterationId { get; set; }
}

public class Iteration
{
    public int Id { get; set; }

    public List<IterationValue> IterationValues { get; set; } = new();
    public DateTime Timestamp { get; set; }
}
SELECT [i].[IterationId] AS [Key], (
    SELECT MAX([i5].[Timestamp])
    FROM [IterationValues] AS [i4]
    INNER JOIN [Iteration] AS [i5] ON [i4].[IterationId] = [i5].[Id]
    WHERE [i4].[IterationId] IN (
        SELECT [i6].[value]
        FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i6]
    ) AND [i].[IterationId] = [i4].[IterationId]) AS [MaxTimestamp]
FROM [IterationValues] AS [i]
WHERE [i].[IterationId] IN (
    SELECT [i0].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i0]
)
GROUP BY [i].[IterationId]
ORDER BY (
    SELECT MAX([i5].[Timestamp])
    FROM [IterationValues] AS [i4]
    INNER JOIN [Iteration] AS [i5] ON [i4].[IterationId] = [i5].[Id]
    WHERE [i4].[IterationId] IN (
        SELECT [i6].[value] -- Incorrect [i6] - should be [i3]
        FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i3]
    ) AND [i].[IterationId] = [i4].[IterationId])
  • MaxTimestamp here is both projected and ordered by; this is why we get duplicated SQL under both the ORDER BY and the SELECT.
  • Crucially, since MaxTimestamp is a scalar subquery, that means we have a single SelectExpression (inside the subquery) that's referenced from two different parts in the query (projection, ordering).
  • During translation, we run RelationalInferredTypeMappingApplier on the query (but this can be any visitor). That visitor updates the OPENJSON expression (applying the type mapping), meaning that the SelectExpression inside the IN gets rewritten.
  • And so we end up in SelectExpression.VisitChildren, where we create a new SelectExpression to replace the one that has changed (code).
  • We copy the list of table references for the new SelectExpression, but the TableReferenceExpressions themselves stay the same.
    • The point of TableReferenceExpression is for ColumnExpressions to reference tables indirectly - via their containing SelectExpression and alias; this allows us to cheaply modify a SelectExpression's tables - but also the SelectExpression itself - without having to recursively hunt down all ColumnExpressions within that SelectExpression and update them.
  • At this point, we must go over the TableReferenceExpressions and update them to point to the new SelectExpression (code). But since there are two references to the SelectExpression, the other SelectExpression still references these TableReferenceExpression instances, which we've just updated to point to the new SelectExpression.
  • As a result, we end up with the other SelectExpression pointing to the original table, but having a column referencing a TableReferenceExpression that points to the new SelectExpression (and through it, to the new table). This is a referential integrity bug.

This seems like an infrastructural bug that was present before, and can be triggered regardless of the recent Contains/JSON changes; any time a visitor causes a SelectExpression to be replaced, and that SelectExpression is referenced from multiple places, this bug would manifest itself.

One easier/short-term way to fix this, is to recursively to over the new SelectExpression and replace the TableReferenceExpressions themselves (rather than updating them, since they may be shared). This would be relatively inefficient (this needs to be done any time we replace a SelectExpression), and makes the concept of TableReferenceExpression quite redundant (ColumnExpressions might as well just reference directly, and we'd update them).

However, more generally, we should not end up in a situation where the same SelectExpression gets referenced from two places in the query; this is very inefficient SQL, since e.g. the potentially expensive subquery above needs to be evaluated twice. Instead, we should perform a pushdown to a subquery and apply the OrderBy over that. If we never have a SelectExpression referenced from multiple places, this problem wouldn't occur.

@roji
Copy link
Member

roji commented Nov 12, 2023

Note specifically on this bug:

As a result, we end up with the other SelectExpression pointing to the original table, but having a column referencing a TableReferenceExpression that points to the new SelectExpression (and through it, to the new table). This is a referential integrity bug.

This in itself isn't necessarily a bug; having a column from one select pointing to the table of another select is OK, as long as things are fully immutable (as expression trees where meant to be). The problem here comes from the fact that table alias uniquification actually mutates the TableReferenceExpression; this is what makes the sharing of TableReferenceExpressions across SelectExpressions problematic.

So ignoring the more general problem of duplicate subqueries in SQL (split out to #32277), a targeted fix here would either:

  1. Duplicate the TableReferenceExpressions when duplicating the SelectExpression in SelectExpression.VisitChildren. Note that there are other places which duplicate SelectExpression (not just VisitChildren), they have the same bug (at least in theory).
  2. Make AliasUniquifier not mutate, but rather replace the ColumnExpression. This starts to move us in the direction of immutability, and away from the original design of TableReferenceExpression; it also adds more compile-time processing.

@roji
Copy link
Member

roji commented Nov 13, 2023

Removing servicing-consider as a risk here is likely to be non-trivial and somewhat risky. One targeted workaround for the very specific regression above (with Contains) is to set the SQL Server compatibility level to a low value, which will tricker reverting back to the previous transaction (which has no subquery and therefore doesn't trigger this).

@ajcvickers ajcvickers added this to the 9.0.0 milestone Nov 16, 2023
@tim-stuyckens-materialise
Copy link
Author

Hi,
If I understand the compatibility workaround correctly this will work for MS SQL (similar to what is described at https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-8.0/breaking-changes#mitigations )

Any suggestion on how this can be mitigated on SQLite or MySQL?
Context: We use SQLite in our integration tests (this ticket was also about SQLite).

@zlepper
Copy link

zlepper commented Nov 22, 2023

This is also hitting us when updating from dotnet 7 til 8. While the compatibility workaround could work for sqlserver, it doesn't exactly help us for sqlite, and is completely blocking is from upgrading to dotnet 8.

@roji
Copy link
Member

roji commented Nov 22, 2023

Everyone, thanks for reporting, I'll look at patching this for 8.0.

roji added a commit to roji/efcore that referenced this issue Nov 29, 2023
roji added a commit to roji/efcore that referenced this issue Nov 29, 2023
roji added a commit to roji/efcore that referenced this issue Nov 29, 2023
roji added a commit to roji/efcore that referenced this issue Dec 2, 2023
roji added a commit to roji/efcore that referenced this issue Dec 2, 2023
@roji roji removed this from the 9.0.0 milestone Dec 3, 2023
@roji
Copy link
Member

roji commented Dec 3, 2023

Reopening to consider patching.

@roji roji reopened this Dec 3, 2023
roji added a commit to roji/efcore that referenced this issue Dec 4, 2023
roji added a commit to roji/efcore that referenced this issue Dec 4, 2023
@ajcvickers ajcvickers added this to the 8.0.x milestone Dec 5, 2023
@ajcvickers ajcvickers added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. Servicing-approved and removed Servicing-consider labels Dec 5, 2023
@ajcvickers ajcvickers modified the milestones: 8.0.x, 8.0.2 Dec 6, 2023
roji added a commit to roji/efcore that referenced this issue Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants