Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OR clause allowing NULL is generated when joining on NullableProperty.Value #27071

Closed
sam-wheat opened this issue Dec 28, 2021 · 17 comments
Closed

Comments

@sam-wheat
Copy link

Linq below joins h.ModuleId (long?) to cmc.ModuleId (string). Both IDs can be null.

var query = from h in db.BatteryHistories
join cmc in db.CurrentMConfigurations on new { ModuleId = h.ModuleId.ToString() } equals new { ModuleId = cmc.ModuleId }
join m in db.Cu04tb09Ms on  new { M_ID = cmc.M_Number.ToString(), SvcType = cmc.ServiceType  } equals new { M_ID = m.IdMtr, SvcType = m.CdSvcTyp } 

Generated SQL:

SELECT g.*
FROM [BatteryHistory] AS [g]
INNER JOIN [CurrentMConfiguration] AS [c] ON (CONVERT(VARCHAR(20), [g].[ModuleId]) = [c].[ModuleId]) OR ([g].[ModuleId] IS NULL AND [c].[ModuleId] IS NULL)
INNER JOIN [CU04TB09_M] AS [c0] ON (CONVERT(VARCHAR(100), [c].[M_Number]) = [c0].[ID_MTR]) AND ([c].[ServiceType] = [c0].[CD_SVC_TYP])

Linq below joins h.ModuleId.Value (long) to cmc.ModuleId (string). Linq does not allow nulls since h.ModuleId.Value cannot be null. However generated SQL will still select nulls: "OR ([g].[ModuleId] IS NULL AND [c].[ModuleId] IS NULL)"

var query = from h in db.BatteryHistories
join cmc in db.CurrentMConfigurations on new { ModuleId = h.ModuleId.Value.ToString() } equals new { ModuleId = cmc.ModuleId }
join m in db.Cu04tb09Ms on  new { M_ID = cmc.M_Number.ToString(), SvcType = cmc.ServiceType  } equals new { M_ID = m.IdMtr, SvcType = m.CdSvcTyp } 

Generated SQL:

SELECT g.*
FROM [BatteryHistory] AS [g]
INNER JOIN [CurrentMConfiguration] AS [c] ON (CONVERT(VARCHAR(20), [g].[ModuleId]) = [c].[ModuleId]) OR ([g].[ModuleId] IS NULL AND [c].[ModuleId] IS NULL)
INNER JOIN [CU04TB09_M] AS [c0] ON (CONVERT(VARCHAR(100), [c].[M_Number]) = [c0].[ID_MTR]) AND ([c].[ServiceType] = [c0].[CD_SVC_TYP])

Noteably, no null handling is generated for the following Linq. Because cmc.M_Number is not nullable? This seems to be inconsistent with the above. If consistent, EF would handle NULL condition where nullable fields are compared:

join m in db.Cu04tb09Ms on  new { M_ID = cmc.M_Number.ToString(), SvcType = cmc.ServiceType  } equals new { M_ID = m.IdMtr, SvcType = m.CdSvcTyp }

Generated SQL:

INNER JOIN [CU04TB09_M] AS [c0] ON (CONVERT(VARCHAR(100), [c].[M_Number]) = [c0].[ID_MTR]) AND ([c].[ServiceType] = [c0].[CD_SVC_TYP])

Classes (all strings are nullable):

public partial class BatteryHistory
{
	public long? ModuleId { get; set; }
}


public partial class CurrentMConfiguration
{
	public decimal M_Number { get; set; }
	public string ModuleId { get; set; }
	public string ServiceType { get; set; }
}

public partial class Cu04tb09M
{
    public string IdMtr { get; set; }
	public string IdCommMod { get; set; }
	public string CdSvcTyp { get; set; }
}

Versions:
dotnet 5.0
EF 5.0.13

@maumar
Copy link
Contributor

maumar commented Dec 30, 2021

related issue: #18808

This is by design. EF follows the join semantics of linq to objects here - if join keys are anonymous objects, the nulls actually get matched in the join results. Linq to objects code:

        var entity1s = new List<Entity1>
        {
            new Entity1 { Id = 1, Name = "1" },
            new Entity1 { Id = 2, Name = "2" },
            new Entity1 { Id = 3, Name = null }
        };

        var entity2s = new List<Entity2>
        {
            new Entity2 { Id = 1, Name = "1" },
            new Entity2 { Id = 2, Name = null },
            new Entity2 { Id = 3, Name = "3" }
        };

        var l2o_null_match = from e1 in entity1s
                             join e2 in entity2s on new { x = e1.Name } equals new { x = e2.Name }
                             select new { e1, e2 };

        // matched: 1 and null
        var result_null_match = l2o_null_match.ToList();

        var l2o_null_not_match = from e1 in entity1s
                                 join e2 in entity2s on e1.Name equals e2.Name
                                 select new { e1, e2 };

        // matched: 1
        var result_null_not_match = l2o_null_not_match.ToList();

@maumar
Copy link
Contributor

maumar commented Dec 30, 2021

wrt second example (new { ModuleId = h.ModuleId.Value.ToString() } equals new { ModuleId = cmc.ModuleId }) - EF allows accessing nullable objects - we don't throw null refs, outside the projection. So h.ModuleId.Value.ToString still returns null for us if h.ModuleId was null.

wrt third example: EF has two "modes" of expanding null semantics - optimized and full. Optimized doesn't distinguish between null and false, whereas full expansion completely gets rid of nulls from the result (we convert everything to 2-value logic). Full expansion is used in the projection or when the comparison is negated. Simplified expansion can be used in regular predicates like:

entities.Where(e => e.SomeNullValue == e.SomeOtherNullValue)
entities.Where(e => e.SomeNullValue == e.SomeNonNullValue)

That second case is what you have in your example. Since one side can be null but the other never can, we can skip the null checks completely. If the left side value is null, the result of comparison is null, which is good enough for us. We don't need false here, since null will also filter out the results. For join predicate comparison we are using the simplified expansion that's why you don't see the null check.

if the comparison was negated:

entities.Where(e => e.SomeNullValue != e.SomeNonNullValue)

we need to do full expansion and it would look something like this: WHERE e.SomeNullValue <> e.SomeNonNullValue OR e.SomeNullValue IS NULL. If both sides of the comparison are nullable the expansion is more complicated.

@sam-wheat
Copy link
Author

Hi Maurycy, thank you for looking at this issue. Based on your comments please allow me to add addtional details.

Let us agree on the correct and desired behavior, which you stated in your reply:

Since one side can be null but the other never can, we can skip the null checks completely.

The small queries I have included below do not indicate that the desired behavior is being applied consistently. Please explain how this behavior is being applied in the example queries I have provided.

Your reference to #18808 is not clear to me as we are not comparing object references here.

the nulls actually get matched in the join results

The issue I am raising in this ticket is that I am joining on NullableColumn.Value. By definition the value being compared cannot be null, therefore a null check is not desirable.

So h.ModuleId.Value.ToString still returns null for us if h.ModuleId was null.

we don't throw null refs, outside the projection.

By using .Value I am explictly telling EF to treat the field as if it cannot be null. The expectation is not that an exception will be thrown but that EF will treat the value as non-nullable for purpose of comparison. WHERE NullableColumn IS NOT NULL is expected to appear in the WHERE clause - not as an OR clause in the join. Referring to .Value should be no different than referring to a column that has [Required] attribute set.

[Required] is set on the Address column. No null check is generated as expected:

// Both sides not nullable
var query30 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Count = e1.Address } equals new { Count = e2.Address } select new { e1, e2 };
var sql30 = query30.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]

[Required] is not set on the Name column. A null check is generated as expected:

// Both sides nullable
var query31 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Count = e1.Name } equals new { Count = e2.Name } select new { e1, e2 };
var sql31 = query31.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Name] = [e0].[Name]) OR ([e].[Name] IS NULL AND [e0].[Name] IS NULL)

Both sides are nullable in the query below but no null check is generated:

// Both sides nullable
var query3 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count equals e2.Count select new { e1, e2 };
var sql3 = query3.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]

This query comparing Required strings does not generate a null check:

// Both sides not nullable 
var query24 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Address } equals new { ID = e2.Address } select new { e1, e2 };
var sql24 = query24.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]

However this query that is expected to compare values will allow nulls:

// Both sides not nullable 
var query33 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count.Value } equals new { ID = e2.Count.Value } select new { e1, e2 };
var sql33 = query32.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)

This is the same join as the query above however in this query no null check is generated:

// Both sides not nullable
var query34 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count.Value  equals e2.Count.Value select new { e1, e2 };
var sql34 = query34.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]

Tests: I find no consistent pattern of how null checks are generated.

class Program
{
    static void Main(string[] args)
    {
        Db db = new Db(new DbContextOptionsBuilder<Db>().UseSqlServer("Server=.;Database=EFNullComparison;Trusted_Connection=True;").Options);
        db.Database.EnsureDeleted();
        db.Database.EnsureCreated();
        db.Entity1s.Add(new Entity1 { Name = "1", Address = "1", Count = 1 });
        db.Entity1s.Add(new Entity1 { Name = "2", Address = "2", Count = 2 });
        db.Entity1s.Add(new Entity1 { Name = null, Address = "", Count = null });

        db.Entity2s.Add(new Entity2 { Name = "1", Address = "1", Count = 1 });
        db.Entity2s.Add(new Entity2 { Name = null, Address = "", Count = null });
        db.Entity2s.Add(new Entity2 { Name = "3", Address = "3", Count = 3 });

        db.SaveChanges();

        // Both sides nullable
        var query1 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Name equals e2.Name select new { e1, e2 };
        var sql1 = query1.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Name] = [e0].[Name]

        // Both sides nullable
        var query21 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Name} select new { e1, e2 };
        var sql21 = query21.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Name] = [e0].[Name]) OR ([e].[Name] IS NULL AND [e0].[Name] IS NULL)

        // One side nullable
        var query22 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Address } select new { e1, e2 };
        var sql22 = query22.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Name] = [e0].[Address]

        // One side nullable
        var query23 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Name equals e2.Address select new { e1, e2 };
        var sql23 = query23.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Name] = [e0].[Address]

        // Both sides not nullable
        var query2 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Address equals e2.Address select new { e1, e2 };
        var sql2 = query2.ToQueryString(); // INNER JOIN [Entity2s] AS[e0] ON[e].[Address] = [e0].[Address]

        // Both sides not nullable (compare to query21)
        var query24 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Address } equals new { ID = e2.Address } select new { e1, e2 };
        var sql24 = query24.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]

        // Both sides nullable
        var query3 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count equals e2.Count select new { e1, e2 };
        var sql3 = query3.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]

        // Both sides nullable
        var query32 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count }  equals new { ID = e2.Count } select new { e1, e2 };
        var sql32 = query32.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)

        // Both sides not nullable (compare to query24)
        var query33 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count.Value } equals new { ID = e2.Count.Value } select new { e1, e2 };
        var sql33 = query32.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)

        // Both sides not nullable
        var query34 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count.Value  equals e2.Count.Value select new { e1, e2 };
        var sql34 = query34.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]



        // Both sides not nullable
        var query4 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count.Value equals e2.Count.Value select new { e1, e2 };
        var sql4 = query4.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]

        // One side nullable
        var query5 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.ID equals e2.Count select new { e1, e2 };
        var sql5 = query5.ToQueryString(); // INNER JOIN[Entity2s] AS[e0] ON[e].[ID] = [e0].[Count]

        // Both sides not nullable
        var query6 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.ID.ToString() equals e2.Address select new { e1, e2 };
        var sql6 = query6.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON CONVERT(VARCHAR(11), [e].[ID]) = [e0].[Address]

        // Both sides not nullable
        var query7 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.ID } equals new { ID = e2.ID } select new { e1, e2 };
        var sql7 = query7.ToQueryString(); // INNER JOIN[Entity2s] AS[e0] ON[e].[ID] = [e0].[ID]

        // Both sides not nullable
        var query8 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.ID.ToString() } equals new { ID = e2.Address } select new { e1, e2 };
        var sql8 = query8.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON CONVERT(VARCHAR(11), [e].[ID]) = [e0].[Address]

        // One side nullable
        var query9 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count.ToString() } equals new { ID = e2.Address } select new { e1, e2 };
        var sql9 = query9.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON CONVERT(VARCHAR(11), [e].[Count]) = [e0].[Address]

        // Both sides nullable
        var query25 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Count = e1.Count.ToString() } equals new { Count = e2.Name } select new { e1, e2 };
        var sql25 = query25.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON (CONVERT(VARCHAR(11), [e].[Count]) = [e0].[Name]) OR ([e].[Count] IS NULL AND [e0].[Name] IS NULL)

        // One side nullable
        var query26 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count.ToString() equals e2.Address select new { e1, e2 };
        var sql26 = query26.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON CONVERT(VARCHAR(11), [e].[Count]) = [e0].[Address]

        // One side nullable
        var query27 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count equals Convert.ToInt32(e2.Address) select new { e1, e2 };
        var sql27 = query27.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Count] = CONVERT(int, [e0].[Address])

        // Both sides not nullable
        var query28 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Address = e1.Count.Value } equals new { Address = Convert.ToInt32(e2.Address) } select new { e1, e2 };
        var sql28 = query28.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Count] = CONVERT(int, [e0].[Address])

        // One side nullable
        var query29 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Count = e1.Count.ToString() } equals new { Count = e2.Address } select new { e1, e2 };
        var sql29 = query25.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON (CONVERT(VARCHAR(11), [e].[Count]) = [e0].[Name]) OR ([e].[Count] IS NULL AND [e0].[Name] IS NULL)

        // Both sides not nullable
        var query30 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Count = e1.Address } equals new { Count = e2.Address } select new { e1, e2 };
        var sql30 = query30.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]

        // Both sides nullable
        var query31 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Count = e1.Name } equals new { Count = e2.Name } select new { e1, e2 };
        var sql31 = query31.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Name] = [e0].[Name]) OR ([e].[Name] IS NULL AND [e0].[Name] IS NULL)

        // One side nullable
        var query10 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.LongCount.ToString() } equals new { ID = e2.Address } select new { e1, e2 };
        var sql10 = query10.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Address]

        // Both sides nullable
        var query11 = from e1 in db.Entity1s join e2 in db.Entity2s on new { LongCount = e1.LongCount.ToString() } equals new { LongCount = e2.LongCount.ToString() } select new { e1, e2 };
        var sql11 = query11.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = CONVERT(VARCHAR(20), [e0].[LongCount])) OR ([e].[LongCount] IS NULL AND [e0].[LongCount] IS NULL)

        // Both sides nullable
        var query14 = from e1 in db.Entity1s join e3 in db.Entity3s on new { LongCount = e1.LongCount.ToString() } equals new { LongCount = e3.LongCount } select new { e1, e3 };
        var sql14 = query14.ToQueryString(); // INNER JOIN [Entity3s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[LongCount]) OR ([e].[LongCount] IS NULL AND [e0].[LongCount] IS NULL)

        // Both sides nullable
        var query15 = from e1 in db.Entity1s join e3 in db.Entity3s on new { LongCount = e1.LongCount.ToString() } equals new { LongCount = e3.Name } select new { e1, e3 };
        var sql15 = query15.ToQueryString(); // INNER JOIN [Entity3s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Name]) OR ([e].[LongCount] IS NULL AND [e0].[Name] IS NULL)

        // Both sides nullable
        var query16 = from e1 in db.Entity1s join e3 in db.Entity3s on new { X = e1.LongCount.ToString() } equals new { X = e3.Name } select new { e1, e3 };
        var sql16 = query16.ToQueryString(); // INNER JOIN [Entity3s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Name]) OR ([e].[LongCount] IS NULL AND [e0].[Name] IS NULL)
        
        // Both sides nullable
        var query17 = from e1 in db.Entity1s join e3 in db.Entity3s on new { X = e1.LongCount.ToString() } equals new { X = e3.Name } select new { e1, e3 };
        var sql17 = query17.ToQueryString(); // INNER JOIN [Entity3s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Name]) OR ([e].[LongCount] IS NULL AND [e0].[Name] IS NULL)
        
        // Both sides nullable
        var query18 = from e1 in db.Entity1s join e2 in db.Entity2s on new { X = e1.LongCount.ToString() } equals new { X = e2.Name } select new { e1, e2 };
        var sql18 = query18.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Name]) OR ([e].[LongCount] IS NULL AND [e0].[Name] IS NULL)

        // One side nullable
        var query19 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.LongCount.ToString() } equals new { ID = e2.Address } select new { e1, e2 };
        var sql19 = query19.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Address]
        
        // Both sides nullable
        var query20 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.LongCount.ToString() } equals new { ID = e2.Name } select new { e1, e2 };
        var sql20 = query20.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON (CONVERT(VARCHAR(20), [e].[LongCount]) = [e0].[Name]) OR ([e].[LongCount] IS NULL AND [e0].[Name] IS NULL)
    }


}

public class Entity1 
{
    public int ID { get; set; }
    public int? Count { get; set; }
    public long? LongCount { get; set; }
    public string Name { get; set; }
    [Required]
    public string Address { get; set; }
}

public class Entity2
{
    public int ID { get; set; }
    public int? Count { get; set; }
    public long? LongCount { get; set; }
    public string Name { get; set; }
    [Required]
    public string Address { get; set; }
}

public class Entity3
{
    public int? Count { get; set; }
    public string LongCount { get; set; }
    public string Name { get; set; }
    [Required]
    public string Address { get; set; }
}

public partial class CurrentConfiguration
{
    public string ServiceType { get; set; }
    public string SerialNumber { get; set; }
    public string ModuleId { get; set; }
}

public partial class BatteryHistory
{
    public long? ModuleId { get; set; }
}

public partial class CurrentConfiguration2
{
    public int ID { get; set; }
    public string ModuleId2 { get; set; }
}

public partial class BatteryHistory2
{
    public int ID { get; set; }
    public long? ModuleId2 { get; set; }
}

public class Db : DbContext
{
    public virtual DbSet<Entity1> Entity1s { get; set; }
    public virtual DbSet<Entity2> Entity2s { get; set; }
    public virtual DbSet<Entity3> Entity3s { get; set; }
    public virtual DbSet<CurrentConfiguration> CurrentConfigurations { get; set; }
    public virtual DbSet<BatteryHistory> BatteryHistories { get; set; }
    public virtual DbSet<CurrentConfiguration2> CurrentConfigurations2 { get; set; }
    public virtual DbSet<BatteryHistory2> BatteryHistories2 { get; set; }


    public Db(DbContextOptions<Db> options)
     : base(options)
    {
    }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        if (!optionsBuilder.IsConfigured)
        {
            optionsBuilder.UseSqlServer("Name=BatteryAnalysis");
        }
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.HasAnnotation("Relational:Collation", "SQL_Latin1_General_CP1_CI_AS");

        modelBuilder.Entity<Entity2>(entity =>
        {
            entity.HasKey(e => new { e.ID, e.Address });

            entity.ToTable("Entity2s", "SAP");

            entity.Property(e => e.Name)
                .HasMaxLength(1)
                .IsUnicode(false)
                .IsFixedLength(true);

    
            entity.Property(e => e.Address).HasMaxLength(18);

        });

        modelBuilder.Entity<Entity3>(entity =>
        {
            entity.HasNoKey();
        });


        modelBuilder.Entity<CurrentConfiguration>(entity =>
        {
            entity.HasKey(e => new { e.ServiceType, e.SerialNumber });

            entity.ToTable("CurrentConfiguration", "SAP");

            entity.Property(e => e.ServiceType)
                .HasMaxLength(1)
                .IsUnicode(false)
                .IsFixedLength(true);

            entity.Property(e => e.SerialNumber).HasMaxLength(18);

            entity.Property(e => e.ModuleId).HasMaxLength(18);

        });

        modelBuilder.Entity<BatteryHistory>(entity =>
        {
            entity.HasNoKey();
            entity.ToTable("BatteryHistory", "OWCE");
        });



       
    }
}	

@sam-wheat
Copy link
Author

Related: #27072

It is very suboptimal, since due to the unnecessary OR ((t."Id" IS NULL) AND (t1."ParentId" IS NULL)) clause in the inner query,

WHERE (t."Id" = t1."ParentId") OR ((t."Id" IS NULL) AND (t1."ParentId" IS NULL))

@roji
Copy link
Member

roji commented Jan 4, 2022

By using .Value I am explictly telling EF to treat the field as if it cannot be null. The expectation is not that an exception will be thrown but that EF will treat the value as non-nullable for purpose of comparison.

This is not how EF Core is currently designed. One important thing to note in this context, is that there's no analogous gesture for reference types - there's no way to explicitly tell EF Core that the field cannot be null. The new bang (!) operator does that logically, but is pure compiler syntax which does not exist in the expression tree which EF Core receives.

Note also that you can disable null compensation entirely by using relational null semantics; when using that, EF Core doesn't add any additional null checks, and your C# code gets translated to SQL as-is (nullability-wise). However, this is not a mode we generally recommend.

@sam-wheat
Copy link
Author

sam-wheat commented Jan 5, 2022

1.) My choice of words "the field cannot be null" is arguably poor. Most generally I want to know if there is there a way to tell EF to not generate ... OR t1.f1 IS NULL AND t2.f1 IS NULL. I think you understand my intent but I want to be clear. In my case that OR cluase makes my query time out (runs in 11 seconds without the OR).

2.) If the answer to 1.) is no, is it possible in a future release (edit) to not generate the OR if one side of the join expression results in a value type or if the join contains null coalescing logic? Adding ... WHERE t1.f1 IS NOT NULL for each nullable field in the join would be a nice touch also I think.

3.) Can you please explain the existing logic and intent as demonstrated by the examples I have provided?

@roji
Copy link
Member

roji commented Jan 6, 2022

1.) My choice of words "the field cannot be null" is arguably poor. Most generally I want to know if there is there a way to tell EF to not generate ... OR t1.f1 IS NULL AND t2.f1 IS NULL. I think you understand my intent but I want to be clear. In my case that OR cluase makes my query time out (runs in 11 seconds without the OR).

@sam-wheat when it generates a SQL equality (a = b), EF Core adds the nullability check if both side are nullable; This is simply to maintain the regular C# behavior where a = null returns true when a is null (but not in SQL). Now, what matters isn't the C# expression, but rather the SQL expression; so X.Value in SQL is simply the column X, and if that's nullable, the extra null checks are added.

Are you seeing a case where both sides of the SQL expression aren't nullable, but EF Core is still adding the checks? If so, can you provide the clear, runnable repro for that (including the model)?

Otherwise, if you don't want the extra checks even when the sides are nullable, then as I wrote below you can opt into relational null semantics.

2.) If the answer to 1.) is no, is it possible in a future release (edit) to not generate the OR if one side of the join expression results in a value type or if the join contains null coalescing logic? Adding ... WHERE t1.f1 IS NOT NULL for each nullable field in the join would be a nice touch also I think.

We could do an optimization like that, but would that really be significantly better? That would remove the extra SQL check from the JOIN clause, only to add one in a new WHERE clause...

3.) Can you please explain the existing logic and intent as demonstrated by the examples I have provided?

See my explanation above in 1 - by default EF Core recreates C# null semantics via the SQL it generates, so that users' LINQ queries return the same results from the database that they would if they were executed in-memory.

@roji
Copy link
Member

roji commented Jan 6, 2022

@sam-wheat one additional comment... Are you looking to remove the extra nullability checks because of performance? Because generally speaking they should not have a significant impact there (though that can always vary across databases, query shapes...). If so, have you measured performance with and without those checks, and confirmed there's an issue?

@sam-wheat
Copy link
Author

have you measured performance with and without those checks, and confirmed there's an issue?:

Query 2 runs in 2:12 and returns 7.6m rows. I manually commented out the OR clause.
Query 1 is the exact same except for the OR clause. Time to run is unknown as I am cancelling it after 5 mins.

var query = from h in db.Histories
join cmc in db.CMC on new { ModuleId = h.ModuleId.ToString() } equals new { ModuleId = cmc.ModuleId }
where h.ModuleId != null 
orderby h.ModuleId ascending
select new { H_ModuleID = h.ModuleId, C_ModuleID = cmc.ModuleId };

string sql = query.ToQueryString();

SELECT[g].[ModuleId] AS[H_ModuleID], [c].[ModuleId] AS[C_ModuleID]
FROM[OWCE].[History] AS[g]
INNER JOIN[SAP].[CMC] AS[c] ON(CONVERT(VARCHAR(20), [g].[ModuleId]) = [c].[ModuleId]) OR([g].[ModuleId] IS NULL AND[c].[ModuleId] IS NULL)
WHERE[g].[ModuleId] IS NOT NULL
ORDER BY[g].[ModuleId]

InkedCapture_L2
InkedCapture_LI

We could do an optimization like that, but would that really be significantly better?

I beleive it would be better in terms of control given to the developer. The following are very different queries. AFAIK it is not possible to achieve the latter when using a complex join:

... from t1 in db.t1s join t2 in db.t2s on new { N = t1.Nullable } equals new { N = t2.Nullable }...
... FROM T1 JOIN T2 ON T1.NULLABLE = T2.NULLABLE OR (T1.NULLABLE IS NULL AND T2.NULLABLE IS NULL)


... from t1 in db.t1s join t2 in db.t2s on new { N = t1.Nullable.Value } equals { N = t2.Nullable.Value }...
... FROM T1 JOIN T2 ON T1.NULLABLE = T2.NULLABLE WHERE T1.NULLABLE IS NOT NULL 

Runnable...

class Program
{
    private static List<Entity1> MemoryEntity1s;
    private static List<Entity2> MemoryEntity2s;

    static void Main(string[] args)
    {
        MemoryEntity1s = new List<Entity1>
        {
            new Entity1 { Name = "1", Address = "1", Count = 1 },
            new Entity1 { Name = "2", Address = "2", Count = 2 },
            new Entity1 { Name = null, Address = "", Count = null }
        };

        MemoryEntity2s = new List<Entity2>
        {
            new Entity2 { Name = "1", Address = "1", Count = 1 },
            new Entity2 { Name = null, Address = "", Count = null },
            new Entity2 { Name = "3", Address = "3", Count = 3 }
        };

        Db db = new Db(new DbContextOptionsBuilder<Db>().UseSqlServer("Server=.;Database=EFNullComparison;Trusted_Connection=True;").Options);
        db.Database.EnsureDeleted();
        db.Database.EnsureCreated();
        
        MemoryEntity1s.ForEach(e => db.Entity1s.Add(e));
        MemoryEntity2s.ForEach(e => db.Entity2s.Add(e));

        db.SaveChanges();

        var query1 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name, Count = e1.Count } equals new { Name = e2.Name, Count = e2.Count } 
                     where !string.IsNullOrEmpty(e1.Name) && !string.IsNullOrEmpty(e2.Name) && e1.Count.HasValue && e2.Count.HasValue
                     select new { E1Value = e1.Name, E2Value = e2.Name };

        var sql1 = query1.ToQueryString();
        var sqlResult1 = query1.ToList();


        // SELECT [e].[Name] AS [E1Value], [e0].[Name] AS [E2Value]
        // FROM[Entity1s] AS[e]
        // INNER JOIN[Entity2s] AS[e0] ON(([e].[Name] = [e0].[Name]) OR([e].[Name] IS NULL AND[e0].[Name] IS NULL)) AND(([e].[Count] = [e0].[Count]) OR([e].[Count] IS NULL AND[e0].[Count] IS NULL))
        // WHERE((([e].[Name] IS NOT NULL AND(([e].[Name] <> N'') OR[e].[Name] IS NULL)) AND([e0].[Name] IS NOT NULL AND(([e0].[Name] <> N'') OR[e0].[Name] IS NULL))) AND[e].[Count] IS NOT NULL) AND[e0].[Count] IS NOT NULL


        var query2 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Name } 
                     where ! string.IsNullOrEmpty(e1.Name) && ! string.IsNullOrEmpty(e2.Name) 
                     select new { E1Value = e1.Name, E2Value = e2.Name };
        
        var sql2 = query2.ToQueryString(); 
        var sqlResult2 = query2.ToList();

        // SELECT[e].[Name] AS[E1Value], [e0].[Name] AS[E2Value]
        // FROM[Entity1s] AS[e]
        // INNER JOIN[Entity2s] AS[e0] ON([e].[Name] = [e0].[Name]) OR([e].[Name] IS NULL AND[e0].[Name] IS NULL)
        // WHERE([e].[Name] IS NOT NULL AND(([e].[Name] <> N'') OR[e].[Name] IS NULL)) AND([e0].[Name] IS NOT NULL AND(([e0].[Name] <> N'') OR[e0].[Name] IS NULL))
    }
}

public class Entity1 
{
    public int ID { get; set; }
    public int? Count { get; set; }
    public string Name { get; set; }
    [Required]
    public string Address { get; set; }
}

public class Entity2
{
    public int ID { get; set; }
    public int? Count { get; set; }
    public string Name { get; set; }
    [Required]
    public string Address { get; set; }
}


public class Db : DbContext
{
    public virtual DbSet<Entity1> Entity1s { get; set; }
    public virtual DbSet<Entity2> Entity2s { get; set; }

    public Db(DbContextOptions<Db> options) : base(options) { }
}

@roji
Copy link
Member

roji commented Jan 7, 2022

We could do an optimization like that, but would that really be significantly better?

I beleive it would be better in terms of control given to the developer. The following are very different queries. AFAIK it is not possible to achieve the latter when using a complex join:

The question if these two SQLs are (significantly) different in terms of performance - is this something you've verified?

@roji
Copy link
Member

roji commented Jan 7, 2022

One more thing that could be helpful here, is the query plan for the various SQLs being discussed - could you post these? An easy way of getting a query plan is to execute SET SHOWPLAN_ALL ON and then to execute your SQL - SQL Server returns the plan instead of actually executing.

@sam-wheat
Copy link
Author

sam-wheat commented Jan 7, 2022

The question if these two SQLs are (significantly) different in terms of performance - is this something you've verified?

The difference of the OR clause is verified and the results are shown in my prior post.

I let it run while I was typing it is still going after over 20 mins:
Capture3

@sam-wheat
Copy link
Author

Understood. The difference of the queries is below. WHERE[g].[ModuleId] IS NOT NULL is common to both.

INNER JOIN[SAP].[CMC] AS[c] ON(CONVERT(VARCHAR(20), [g].[ModuleId]) = [c].[ModuleId]) OR([g].[ModuleId] IS NULL AND[c].[ModuleId] IS NULL)

INNER JOIN[SAP].[CMC] AS[c] ON(CONVERT(VARCHAR(20), [g].[ModuleId]) = [c].[ModuleId]) -- OR([g].[ModuleId] IS NULL AND[c].[ModuleId] IS NULL)

@sam-wheat
Copy link
Author

Requested:

Capture4

@roji
Copy link
Member

roji commented Jan 11, 2022

Triage discussion: @maumar to take a look if there's a possible workaround for avoiding the null compensation in the above join condition.

@roji roji closed this as completed Jan 11, 2022
@roji roji reopened this Jan 11, 2022
@maumar
Copy link
Contributor

maumar commented Jan 20, 2022

best way we currently have to avoid null compensation is to use relational null semantics like @roji pointed out, and manually add null compensation terms for the comparisons we actually want to behave like c#. e.g.:

///ctx is set up with UseRelationalNulls option enabled
ctx.Customers.Where(c => c.Relational1 == c.Relational2 && (c.CSharp1 == c.CSharp2 || (c.CSharp1 == null && c.CSharp2 == null)))

In case of join key comparison, relational null semantics are used as long as keys are not wrapped in anonymous objects (but this doesn't obviously work for composite key comparisons)

wrt consistency in @sam-wheat 's examples above, EF has several "dimensions" which decide if/how comparisons are translated:

  1. should we use relational or c# semantics

1.1 by default we mimic c# semantics so we will add extra terms so that null == null is not filtered out.
there are two exceptions to this:
1.2 using UseRelationalNulls context option, which globally applies relational-style comparison regardless of anything else
1.3 join key comparison - these per linq to object implementation use relational null semantics, null keys won't match, hence the query comprehension syntax uses a keyword equals, rather than just a == operator
however there is an exception to that in EF -
1.4 when the join keys are wrapped in anonymous type - in that case (on linq to objects) join equals comparison matches two anonymous objects which contain null values (like new { x = null }) - so on EF we apply c# null semantics expansion in this case.

  1. should we use simplified or "full" expansion

2.1 simplified expansion is the most common case, happens inside predicates where we don't care between false and null result. In that case when only one side is nullable and the other one is not we can completely skip the extra terms, if the nullable side is null, the result of comparison is null, but that's just as good as false for us
2.2 full expansion happens when we need precise result of the comparison, either true or false. we do it outside predicate (in projection - we don't want to return nulls for queries like customers.Select(c => c.SomeNullableValue == c.SomeOtherValue)), we also do this inside predicate in some cases, e.g. if the comparison is negated.

  1. what are the extra terms we need to add
    depends on simplified vs full expansion and if one or both elements are nullable
    for simplified we only need to add anything if both sides are nullable:

3.1 ?a == ?b -> a == b || (a == null && b == null)
if only one element is nullable, we don't need to do anything because if the nullable element is null we get null back which is fine, and since the other side can't ever be null, we never expect true result in this case, so:

3.2. ?a == b -> a == b as well as a == ?b -> a == b

3.3 for full expansion the translation is a bit tricky, you can see them all (along with truth tables) starting here, in case you are interested: https://github.com/dotnet/efcore/blob/main/src/EFCore.Relational/Query/SqlNullabilityProcessor.cs#L1804

now looping back to the cases you provided earlier:

        // Both sides nullable
        var query1 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Name equals e2.Name select new { e1, e2 };
        var sql1 = query1.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Name] = [e0].[Name]

//maumar: 
//1.3 join key comparison -> using relational semantics -> no need for the extra terms
        // Both sides nullable
        var query21 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Name} select new { e1, e2 };
        var sql21 = query21.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Name] = [e0].[Name]) OR ([e].[Name] IS NULL AND [e0].[Name] IS NULL)

//maumar:
//1.4 join but wrapped in anonymous type ->
//2.1 need to perform expansion, but its done in predicate so can do simple one
//3.1 both terms nullable so we do a == b || (a == null && b == null)
        // One side nullable
        var query22 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Address } select new { e1, e2 };
        var sql22 = query22.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Name] = [e0].[Address]

//maumar:
//1.4 join but wrapped in anonymous type ->
//2.1 need to perform expansion, but its done in predicate so can do simple one
//3.2 only one side is nullable, no need for the extra term
        // One side nullable
        var query23 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Name equals e2.Address select new { e1, e2 };
        var sql23 = query23.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Name] = [e0].[Address]

//maumar:
//1.3 join key comparison -> using relational semantics -> no need for the extra terms
        // Both sides not nullable
        var query2 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Address equals e2.Address select new { e1, e2 };
        var sql2 = query2.ToQueryString(); // INNER JOIN [Entity2s] AS[e0] ON[e].[Address] = [e0].[Address]

//maumar:
//1.3 join key comparison -> using relational semantics -> no need for the extra terms
        // Both sides not nullable (compare to query21)
        var query24 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Address } equals new { ID = e2.Address } select new { e1, e2 };
        var sql24 = query24.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]

//maumar:
//1.4 join but wrapped in anonymous type ->
//2.1 need to perform expansion, but its done in predicate so can do simple one
//both terms are non-nullable so no need to do anything here
        // Both sides nullable
        var query3 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count equals e2.Count select new { e1, e2 };
        var sql3 = query3.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]

//maumar:
//same as query1
        // Both sides nullable
        var query32 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count }  equals new { ID = e2.Count } select new { e1, e2 };
        var sql32 = query32.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)

//maumar:
//same as query21
        // Both sides not nullable (compare to query24)
        var query33 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count.Value } equals new { ID = e2.Count.Value } select new { e1, e2 };
        var sql33 = query32.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)

//maumar:
//per earlier comment, EF ignores the .Value and still treats Count as nullable property, so this is the same as query21 and query32
        // Both sides not nullable
        var query34 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count.Value  equals e2.Count.Value select new { e1, e2 };
        var sql34 = query34.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]

//maumar:
//same as query1 - we ignore .Value, so both sides are nullable, but its inside join key comparion and not wrapped in anonymous object, so we can do relational semantics

@maumar
Copy link
Contributor

maumar commented Jan 25, 2022

closing this in favor of doc issue

@maumar maumar closed this as completed Jan 25, 2022
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants