Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer equality in boolean comparisons #34166

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jul 5, 2024

In most databases, equality comparisons can take advantage of indexing and inequalities cannot.

Fixes #34164.

@@ -839,7 +839,7 @@ public override async Task Rewrite_compare_bool_with_bool(bool async)
"""
SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE "e"."BoolA" <> "e"."NullableBoolB"
WHERE "e"."BoolA" = (NOT ("e"."NullableBoolB"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a positive change? I doubt any database out there will use index with equality and NOT, more than it would for inequality, no? Should we make this change more targeted, so that it doesn't do this specific transformation?

Copy link
Contributor Author

@ranma42 ranma42 Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a positive change?

It is a positive change, at least as long as we do not analyze the provenance of the columns.
In this specific case, it is basically irrelevant: the whole table is going to be scanned linearly (just once).
If the left column and the right column came from different tables, it would be much more efficient.

I doubt any database out there will use index with equality and NOT, more than it would for inequality, no?

At least Sqlite does. This is basically an instance of #34048 (comment)

Should we make this change more targeted, so that it doesn't do this specific transformation?

Do you expect worse plans on some db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some examples in the issue #34164.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Sqlite and Postgres examples 🚀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So between a <> b and a = NOT(b), the former certainly seems more natural, and what I'd expect a standard SQL query to look like. If these perform the same, I'd definitely prefer the first, at least for readability etc. (and we do generally care about that).

I could alsi imagine a database where the planner optimizes to use an index on b (effectively transforming the inequality into a not on a), where this change would cause a regression. of course, this is entirely speculative, and I have an actually looked into which databases basis to which optimizations.

I'd prefer us to do a bit more cross database research before merging a change like this, which at the very least makes our SQL less readable/standard/expected (and that does tend to have some correlation sometimes with performance). If this address is a very specific Sqlite behavior, where equality is always better, we always have the option of doing this change for sqlite only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I wrote the above comment before noticing you posted data on other databases… some remarks:

  • For the Sqlite case, have you confirmed that the second option, where an index is built, is actually faster than the first?
  • For the SQL Server case, the total subtree cost is actually higher with the second method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will perform some measurement, but I do not have real code/an actual database that is using this filter; I will try to do some synthetic examples (I'll try to cover some interesting cases, but a real-world case would definitely be more relevant).

@ranma42
Copy link
Contributor Author

ranma42 commented Jul 12, 2024

Maybe I should have explained this in advance: this is not a new/different approach to translate boolean (in)equalities; it is just making the code more consistent in choosing = over <> (as per the issue).

Ideally there should be no duplication of this code, hence it should be "inevitably" consistent.

@ranma42 ranma42 force-pushed the prefer-equal branch 2 times, most recently from 7f57ec3 to 32d05b3 Compare July 13, 2024 17:45
@ranma42
Copy link
Contributor Author

ranma42 commented Jul 13, 2024

I cleaned up the code a little (the same equality conversion logic is now shared between OptimizeComparison and RewriteNullSemantics.

They can take advantage of indexing.
@ranma42
Copy link
Contributor Author

ranma42 commented Jul 29, 2024

Rebased to resolve conflicts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prefer equality when comparing values
2 participants