Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider: should we use sqlglot's optimizer after compilation? #8789

Closed
1 task done
NickCrews opened this issue Mar 27, 2024 · 9 comments
Closed
1 task done

consider: should we use sqlglot's optimizer after compilation? #8789

NickCrews opened this issue Mar 27, 2024 · 9 comments
Labels
feature Features or general enhancements

Comments

@NickCrews
Copy link
Contributor

Is your feature request related to a problem?

As I worked on #8788, it got me wondering if there might be some optimizations that we don't do, but sqlglot would catch. If we upstream work there too, then others would benefit as well. Also, per #8770, this might help with optimizing queries that are impossible to optimize in the pre-compilation stage.

Describe the solution you'd like

It looks like we already use sqlglot.optimizer.optimize already in ibis/expr/sql.py`, but maybe that only affects SubQueries? Can we apply it in more cases?

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the feature Features or general enhancements label Mar 27, 2024
@kszucs
Copy link
Member

kszucs commented Mar 27, 2024

Note that ibis now has a pretty convenient rewrite system which we can use to do various simplifications and optimizations. We haven't been focusing on these tasks yet, but I suggest you to take a look at the **/rewrites.py files in the meantime.

@NickCrews
Copy link
Contributor Author

@kszucs sorry I wasn't very clear there. Often, as in #8770, inefficiencies are added back into the SQL during translation ibis->sqlglot, AFTER rewrites.py has done its magic, so rewrites.py isn't useful to us here.

@kszucs
Copy link
Member

kszucs commented Mar 27, 2024

What I mean is that we can and should improve rewrites.py to avoid those inefficiencies.

@NickCrews
Copy link
Contributor Author

NickCrews commented Mar 27, 2024

I might be missing something here. Take a look at https://github.com/ibis-project/ibis/pull/8788/files#diff-cb487151de645f8616c295a297cf1bb685b77b639163934c40073dc8b692b83eR472-R488.

There, we have a self.if_ to ensure that all indexes are nonnegative, because postgres can't handle negative indexing. We only add this branching during compilation, because some backends like duckb can handle negative indexing natively, so in our internal representation in ArraySlice, we want to allow storing negative start and stop. This means that this branching is only added during compilation, rewrites.py wouldn't ever see any branching. So, if we ever want to remove/simplify that branching using static analysis, it can only happen after compilation. Does that make sense?

@NickCrews
Copy link
Contributor Author

errr, or maybe I'm missing something and the rewrites can happen on a per-backend basis?

@kszucs
Copy link
Member

kszucs commented Mar 28, 2024

the rewrites can happen on a per-backend basis?

Yes. We can also heavily reorganize the expressions as we do in the pandas backend for example.

@NickCrews
Copy link
Contributor Author

ah, great, sorry for the noise! I'll take a look at solving these problems using that system. We can revisit using sqlglot's optimizer/simplifier if that is not sufficient.

@NickCrews
Copy link
Contributor Author

@kszucs I'm realizing that the rewrites needed would have to be destructive/non-idempotent. Ie adding one to an index. Does the rewrite system support "apply this rewrite only once", or am I given no guarantees about if my rewrite is applied once or N times?

@kszucs
Copy link
Member

kszucs commented Mar 28, 2024

We control the number of passes and replacements and currently we do them in a single pass.

On the other hand if we were having multiple passes, idempotency depends on both the pattern and the replacement. The pattern wouldn't match the rewrite outcome again then no need to worry about it, otherwise a new node type can be introduced with the data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Archived in project
Development

No branches or pull requests

2 participants