Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COALESCE in array Contains prevents index usage #1152

Closed
skynet2 opened this issue Dec 5, 2019 · 11 comments · Fixed by #1163
Closed

COALESCE in array Contains prevents index usage #1152

skynet2 opened this issue Dec 5, 2019 · 11 comments · Fixed by #1163
Assignees
Labels
breaking change Represents a breaking change performance
Milestone

Comments

@skynet2
Copy link

skynet2 commented Dec 5, 2019

Hello,
Today i updated to the latest version (3.1.0).

After the upgrade, I can see some very strange behavior in queries which leads to a sequence scan instead of index scan as previously.

Details:
Example with 3.1.0

SELECT o.id
FROM odds AS o
WHERE COALESCE(o.id = ANY ($1), FALSE)

(where $1 = '{1,2,3,4}')

That leads to sequence scan because of coalesce.
image

Example with 3.0.1

SELECT o.id
FROM odds AS o
WHERE o.id IN (1, 2, 3, 4, 5)

image

That's is the commit where that change was introduced d7863b8#diff-9091bf8c282a54bfeecb802bb7705561

cc @roji

Thanks,
Stas

@roji
Copy link
Member

roji commented Dec 6, 2019

/cc @smitpatel @maumar on a case where null semantics degrades performance because an index isn't used. Am just curious whether we're aware of other cases like this in general (which aren't a result of superfluous checks we haven't yet optimized away).

@roji
Copy link
Member

roji commented Dec 6, 2019

@skynet2 just to be sure, can you please post the full LINQ query? I'm specifically interested in whether your array is a constant (specified inline in the query, not as an external variable) and whether id is nullable.

FYI the COALESCE is necessary in order to preserve the correct behavior in the face of nulls - but I'll think about what we can do if this has a significant perf impact.

@skynet2
Copy link
Author

skynet2 commented Dec 6, 2019

Hi @roji

        public class Odd2
        {
            public long Id { get; set; }
        }
        public class TestContext : DbContext
        {
            public DbSet<Odd2> Odds { get; set; }

            public TestContext(DbContextOptions<TestContext> options) : base(options)
            {
                
            }

            protected override void OnModelCreating(ModelBuilder modelBuilder)
            {
                modelBuilder.Entity<Odd2>(opt =>
                {
                    opt.HasKey(x => x.Id);
                    opt.Property(x => x.Id).HasColumnName("id");
                    opt.ToTable("odds");
                });
            }
        }

        public static async Task Test()
        {
            var collection = new ServiceCollection();

            collection.AddDbContext<TestContext>(o =>
                {
                    o.UseNpgsql(
                        "conntection-string");
                });
            
            var provider = collection.BuildServiceProvider();

            var context = provider.GetService<TestContext>();
            var ids = new long[] {1, 2, 3, 4, 5};
            var items = context.Odds.Where(y => ids.Contains(y.Id)).ToArray();
        }                                                                                                                       

Thanks,
Stas

@smitpatel
Copy link

I am not able to understand the difference here. Both of the SQL relates to different tables and selecting different data. What is the expected SQL vs generated SQL? It may be slightly related to the fact that null semantics are not taken care for custom expression.

@roji
Copy link
Member

roji commented Dec 7, 2019

Yeah, the LINQ and SQL queries above don't seem to correspond to the SQL. @skynet2, in general, what we need are clear, minimal code samples which show issues, anything else is confusing and creates more work for us.

However, it's not surprising that adding the COALESCE causes PG to stop using the index on id:

DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT PRIMARY KEY);
INSERT INTO data (id) VALUES (1), (2), (3);

EXPLAIN SELECT * FROM data WHERE id = ANY('{1, 100, 1000}');
EXPLAIN SELECT * FROM data WHERE COALESCE(id = ANY('{1, 100, 1000}'));

The version without the COALESCE:

Bitmap Heap Scan on data  (cost=8.49..15.60 rows=3 width=4)
  Recheck Cond: (id = ANY ('{1,100,1000}'::integer[]))
  ->  Bitmap Index Scan on data_pkey  (cost=0.00..8.49 rows=3 width=0)
        Index Cond: (id = ANY ('{1,100,1000}'::integer[]))

The version with the COALESCE:

Seq Scan on data  (cost=0.00..45.06 rows=1275 width=4)
  Filter: COALESCE((id = ANY ('{1,100,1000}'::integer[])))

FWIW this is exactly the thing that made me open dotnet/efcore#17598 (this case is documented there); once we have a more flexible parameter sniffing architecture, we could look at array parameters at execution time and not generate the COALESCE if there's no null. Until that happens, I think it makes sense to not do null semantics here, i.e. remove the COALESCE for now (nobody really complained before I added it in 3.0). @smitpatel I know this is sacrificing correctness for perf but in this case it may make sense.

It may be slightly related to the fact that null semantics are not taken care for custom expression.

That would definitely be a better way of applying the null semantics (and also allowing opt-out when relational nulls are on), but it wouldn't solve the perf issue created by adding the COALESCE...

@roji roji self-assigned this Dec 7, 2019
@roji roji added this to the 3.1.1 milestone Dec 7, 2019
@skynet2
Copy link
Author

skynet2 commented Dec 10, 2019

Hi @roji @smitpatel ,
Yep, sorry for bad examples and description, I was really in hurry posting that issue.
I updated my replies with the proper SQL and c# code examples.

Thanks,
Stas

@roji roji changed the title Issue "Contains" translation in 3.1.0 COALESCE in array Contains prevents index usage in 3.1.0 Dec 11, 2019
roji added a commit that referenced this issue Dec 11, 2019
Because it prevents index usage. Will reimplement after we can sniff
the parameter for null values.

Fixes #1152
roji added a commit that referenced this issue Dec 13, 2019
Because it prevents index usage. Will reimplement after we can sniff
the parameter for null values.

Fixes #1152
@mikes-gh
Copy link

I am hitting this.
I removed COALESSE form outputted SQL and performance is back
Before I start rewriting all my linq queries how far away is the fix from nuget?

@mikes-gh
Copy link

In the meantime if anyone hits this you can someArray.ToList() to force an IN translation
In the following example invoiceNoStrings is an array.

 ICollection<Models.Transactions> invoiceTransactions = _tradingCompanyDbContext.Transactions
                .Include(t => t.AccountNoNavigation).ThenInclude(a => a.InvTypeNavigation)
                .Include(t => t.AccountNoNavigation).ThenInclude(a => a.PaymentFreqNavigation)
                .Include(t => t.InvoiceDetails).ThenInclude(i => i.StockNoNavigation)
                .Include(t => t.DiscountDetails)
                .Include(t => t.CommentDetails)
                .Where(t => invoiceNoStrings.ToList().Contains(t.CustSuppRef)
                    && (t.TransCode == 0 || t.TransCode == 1))
                .AsNoTracking()
                .ToList();

@pgrm
Copy link

pgrm commented Jan 30, 2020

@roji we'd be also interested in having this at least in a preview release - or is there anything that breaks by removing coalesce / what are the trade-offs (like as long as you don't use it with a nullable column)?

@roji
Copy link
Member

roji commented Jan 30, 2020

@pgrm hang in there, I really intend to release 3.1.1 in the next few days!

@roji roji added the breaking change Represents a breaking change label Feb 6, 2020
@roji roji changed the title COALESCE in array Contains prevents index usage in 3.1.0 COALESCE in array Contains prevents index usage Feb 6, 2020
@roji
Copy link
Member

roji commented Feb 6, 2020

Shoot, seems like I forgot to cherry-pick this to 3.1.1 (just like #1154... not proud here). I will release 3.1.1.2 shortly.

Note that this is a small breaking change - anyone relying on the current coalescing behavior may be broken. However, this does seem like the right thing to do until we can bring back proper null semantics for 5.0 (#1142)

@roji roji modified the milestones: 3.1.1, 3.1.1.2 Feb 6, 2020
roji added a commit that referenced this issue Feb 20, 2020
Because it prevents index usage. Will reimplement after we can sniff
the parameter for null values.

Fixes #1152

(cherry picked from commit 03b63f2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Represents a breaking change performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants