Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Support for BETWEEN ... AND ... #12634

Open
smitpatel opened this issue Jul 11, 2018 · 10 comments
Open

Query: Support for BETWEEN ... AND ... #12634

smitpatel opened this issue Jul 11, 2018 · 10 comments

Comments

@smitpatel
Copy link
Member

Available in T-SQL/SQLite/CosmosSql

@roji
Copy link
Member

roji commented Jul 13, 2018

Also in PostgreSQL: https://www.postgresql.org/docs/current/static/functions-comparison.html.

@divega divega added this to the Backlog milestone Jul 16, 2018
@divega divega added type-enhancement help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. labels Jul 16, 2018
@ralmsdeveloper
Copy link
Contributor

@smitpatel you can see this, maybe it is the simplest solution to this problem.
ralmsdeveloper@47c2b4a

If you like PR, tests will be implemented in SimpleQueryTestBase, and then provider writers will be able to rewrite it if there are any quirks.

@divega divega added good first issue This issue should be relatively straightforward to fix. help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. and removed help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. good first issue This issue should be relatively straightforward to fix. labels May 31, 2019
@ajcvickers ajcvickers added good first issue This issue should be relatively straightforward to fix. and removed help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. labels Aug 5, 2019
@ajcvickers ajcvickers removed the good first issue This issue should be relatively straightforward to fix. label Aug 30, 2019
@tomasbruckner
Copy link

I would love to see this, what needs to be done to revive this PR @ralmsdeveloper ?

@roji
Copy link
Member

roji commented Jun 18, 2020

@tomasbruckner @ralmsdeveloper what do you think this would provide beyond a simple x > y AND y <= z?

@ralmsdeveloper
Copy link
Contributor

@roji I had started a draft but I ran out of time!

Answering your question:
It's just syntax, in the end when the query is compiled the result is the same, everything is converted to GTE and LTE.

The only thing that can change is just the order in which the query can be made.
BETWEEN can become a limitation.

DECLARE @id int = 3;
select 'OK' where @id between 2 and 4;
select 'OK' where @id between 4 and 2;
select 'OK' where @id>=2 and @id<=4;
select 'OK' where @id<=4 and @id>=2;

image

@roji
Copy link
Member

roji commented Jun 19, 2020

I also investigated if there's any potential perf benefit and didn't find any... As such I'd really deprioritize this, compared to other issues we have...

@ralmsdeveloper
Copy link
Contributor

@roji
I agree, based on some analysis that I did, and I came to the conclusion that it is a very big responsibility for EF Core, I believe that EF should not really translate that.
As I mentioned above, we can have different results if the user passes parameters out of order!

If there are a lot of people wanting the feature, maybe it would be a good idea to take it to EF.Functions, but it's just thoughts!

@roji roji added the area-perf label Aug 11, 2024
@roji
Copy link
Member

roji commented Aug 11, 2024

Coming back to this after a while...

We've recently been focusing a bit on reducing duplication of expressions in our generated SQL, especially around null semantics. For example, translating to "x IS NOT DISTINCT FROM y" instead of x = y OR (x IS NULL AND y IS NULL) would avoid duplicating x and y in the expression (#29624); this is valuable especially when x/y aren't simple columns/parameters/scalars, but rather complex arbitrary expressions which may be expensive to evaluate.

This is also an area where BETWEEN is valuable: instead of evaluating x twice (x >= y AND x <= z), it's evaluated only once (x BETWEEN y AND z).

Database support:

Notes:

  • Implementation-wise, we should probably do this in SQL post-processing, introducing a new optimization step (SqlOptimizer). We already have some needs for other SQL optimizations, such as constant arithmetic evaluation - this is #28036, though that can probably be done earlier in SqlExpressionFactory, whereas this post-processing optimizer should be reserved for larger-scope optimizations such as the transformation to BETWEEN.
  • There's a special negation syntax (x NOT BETWEEN y AND z), similar to NOT LIKE, NOT IN, NOT EXISTS... We should also implement that.
  • Since negation is involved here, we probably want to apply De Morgan simplifications early, before the BETWEEN optimization (and others) are applied. These are currently done in SqlNullabilityProcessor (since there's an interaction with null semantics), which is too late; we may want to apply De Morgan earlier as well, as part of this new optimization step.
  • The alternative would be to implement this optimization in SqlNullabilityProcessing as well; but we really should start avoiding putting everything in there.

/cc @ranma42

@ranma42
Copy link
Contributor

ranma42 commented Aug 11, 2024

Coming back to this after a while...

We've recently been focusing a bit on reducing duplication of expressions in our generated SQL, especially around null semantics. For example, translating to "x IS NOT DISTINCT FROM y" instead of x = y OR (x IS NULL AND y IS NULL) would avoid duplicating x and y in the expression (#29624); this is valuable especially when x/y aren't simple columns/parameters/scalars, but rather complex arbitrary expressions which may be expensive to evaluate.

This is also an area where BETWEEN is valuable: instead of evaluating x twice (x >= y AND x <= z), it's evaluated only once (x BETWEEN y AND z).

Note that in most cases the original expression will already contain the duplication. AFAICT currently to make EFCore introduce duplication without having it in the original query you should combine Select() and Where().
An example where this causes a bad translation is:

queriable
  .Select(e => new { X = e.A + e.B * EF.Functions.Random(), Entity = e})
  .Where(e => e.X >= y && e.X <= z)
  .Select(e => e.Entity)

(because of #33791)

Database support:

Notes:

  • Implementation-wise, we should probably do this in SQL post-processing, introducing a new optimization step (SqlOptimizer). We already have some needs for other SQL optimizations, such as constant arithmetic evaluation - this is #28036, though that can probably be done earlier in SqlExpressionFactory, whereas this post-processing optimizer should be reserved for larger-scope optimizations such as the transformation to BETWEEN.

Doing this in the post-processing is certainly possible and I believe it should be reasonably straightforward. OTOH if we plan on improving the behavior around impure functions (rand, etc) it might not be sufficient to cover all of the cases. Nonetheless, it could definitely start in the postprocessing as a first step and then, if/when needed, we could evaluate whether an additional syntax to guarantee it makes sense (maybe we could avoid it completely if in the meantime we get another form of value binding... yes, CTE, I am looking at you).

  • There's a special negation syntax (x NOT BETWEEN y AND z), similar to NOT LIKE, NOT IN, NOT EXISTS... We should also implement that.

👍

  • Since negation is involved here, we probably want to apply De Morgan simplifications early, before the BETWEEN optimization (and others) are applied. These are currently done in SqlNullabilityProcessor (since there's an interaction with null semantics), which is too late; we may want to apply De Morgan earlier as well, as part of this new optimization step.

They are already performed quite early: since #34142 they are part of the Not construction:

// !(a AND b) -> !a OR !b (De Morgan)
SqlBinaryExpression { OperatorType: ExpressionType.AndAlso } binary
=> OrElse(Not(binary.Left), Not(binary.Right)),
// !(a OR b) -> !a AND !b (De Morgan)
SqlBinaryExpression { OperatorType: ExpressionType.OrElse } binary
=> AndAlso(Not(binary.Left), Not(binary.Right)),

Overall, I think that doing this as an optimization makes sense and should be straightforward; it could also help with some cases in which side-effects are not respected, but that is a bigger issue, that will also require bigger changes.
If the optimization is what we are aiming for (at least for the moment), I think I would be able to tackle it 😎

@roji
Copy link
Member

roji commented Aug 11, 2024

They are already performed quite early: since #34142 they are part of the Not construction

Oh right 🤦 I forgot you did this change... Even better.

There's the question of whether we should be doing this even when the source LINQ query contains duplication, e.g. Where(b => b.Posts.First().Rating >= 2 && b.Posts.First().Rating <= 5). If the expression is impure, doing so could cause unexpected results the other way (though this seems like quite a contrived edge case). Putting impurity aside, there's also the question of performance - performing deep equality here (on b.Posts.First().Rating)) isn't free - though #34149 would make it generally very cheap. In addition, we'd only do this when we already have >= and <= in the same predicate, so that seems fine.

So overall this does seem fine to me, at least until we have a better story for impurity (which we currently don't handle properly in other scenarios as well). Once we know about impure expressions, we can avoid the transformation to BETWEEN the moment there's any impurity detected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants