-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefer equality in boolean comparisons #34166
base: main
Are you sure you want to change the base?
Conversation
@@ -839,7 +839,7 @@ public override async Task Rewrite_compare_bool_with_bool(bool async) | |||
""" | |||
SELECT "e"."Id" | |||
FROM "Entities1" AS "e" | |||
WHERE "e"."BoolA" <> "e"."NullableBoolB" | |||
WHERE "e"."BoolA" = (NOT ("e"."NullableBoolB")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a positive change? I doubt any database out there will use index with equality and NOT, more than it would for inequality, no? Should we make this change more targeted, so that it doesn't do this specific transformation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a positive change?
It is a positive change, at least as long as we do not analyze the provenance of the columns.
In this specific case, it is basically irrelevant: the whole table is going to be scanned linearly (just once).
If the left column and the right column came from different tables, it would be much more efficient.
I doubt any database out there will use index with equality and NOT, more than it would for inequality, no?
At least Sqlite does. This is basically an instance of #34048 (comment)
Should we make this change more targeted, so that it doesn't do this specific transformation?
Do you expect worse plans on some db?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some examples in the issue #34164.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Sqlite and Postgres examples 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So between a <> b
and a = NOT(b)
, the former certainly seems more natural, and what I'd expect a standard SQL query to look like. If these perform the same, I'd definitely prefer the first, at least for readability etc. (and we do generally care about that).
I could alsi imagine a database where the planner optimizes to use an index on b (effectively transforming the inequality into a not on a), where this change would cause a regression. of course, this is entirely speculative, and I have an actually looked into which databases basis to which optimizations.
I'd prefer us to do a bit more cross database research before merging a change like this, which at the very least makes our SQL less readable/standard/expected (and that does tend to have some correlation sometimes with performance). If this address is a very specific Sqlite behavior, where equality is always better, we always have the option of doing this change for sqlite only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I wrote the above comment before noticing you posted data on other databases… some remarks:
- For the Sqlite case, have you confirmed that the second option, where an index is built, is actually faster than the first?
- For the SQL Server case, the total subtree cost is actually higher with the second method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will perform some measurement, but I do not have real code/an actual database that is using this filter; I will try to do some synthetic examples (I'll try to cover some interesting cases, but a real-world case would definitely be more relevant).
Maybe I should have explained this in advance: this is not a new/different approach to translate boolean (in)equalities; it is just making the code more consistent in choosing Ideally there should be no duplication of this code, hence it should be "inevitably" consistent. |
7f57ec3
to
32d05b3
Compare
I cleaned up the code a little (the same equality conversion logic is now shared between |
They can take advantage of indexing.
Rebased to resolve conflicts |
In most databases, equality comparisons can take advantage of indexing and inequalities cannot.
Fixes #34164.