Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: remove predicate fusion and pushdown #3031

Closed
wants to merge 5 commits into from

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Sep 30, 2021

This PR removes predicate pushdown and fusion in ibis/expr/analysis.py
in an effort to greatly simplify expression construction and increase
maintainability.

This is the first in a series of backwards-incompatible changes that
will culminate in the simplication of the core relational operators (#2919).

There are a number of benefits to this:

  1. Fewer lines of code to maintain
  2. Fewer lines of highly complex code to maintain
  3. Much easier to debug broken queries
  4. New contributors can more easily understand the library

We can revisit adding fusion back later, after designing a more robust and
modular API for optimizing at expression construction time.

There's one major breaking change:

  1. Filter expressions that refer to a non-immediate child table are no longer
    valid and will raise an error (projections are almost all still valid,
    removing projection fusion will come in a follow-up).

    An example of this is:

t = ibis.table([("a", "string"), ("b", "int64")])
t[t.a == "foo"][t.b == 1]

previously this would fuse t.a == "foo" and t.b == 1 into a conjunction
and return t[(t.a == "foo") & (t.b == 1)].

The behavior is still available through other means, like the & operator, passing
a list of predicates, and if you must chain, you need to use a lambda.

Non-breaking changes:

  1. SQL queries contain more nesting. Queries that compose a lot of filters will
    contain more subqueries.

    I don't view this as problematic, as most database engines having fairly
    comprehensive subquery unnesting capabilities.

@cpcloud cpcloud added refactor Issues or PRs related to refactoring the codebase expressions Issues or PRs related to the expression API labels Sep 30, 2021
@cpcloud cpcloud modified the milestone: Next release Sep 30, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Sep 30, 2021

Unit Test Results

       19 files         19 suites   1h 41m 29s ⏱️
10 774 tests   8 717 ✔️   2 044 💤 13
54 565 runs  43 713 ✔️ 10 778 💤 74

For more details on these failures, see this check.

Results for commit 379d005.

♻️ This comment has been updated with latest results.

@icexelloss
Copy link
Contributor

I am nervous about this. We use pandas backends pretty heavily - are there any perf regressions if we remove fusion?

@cpcloud
Copy link
Member Author

cpcloud commented Oct 11, 2021

This is still a WIP (hence the draft).

There will probably be more temporaries created.

I may ultimately close this PR in favor of a different approach that would allow us to use rewrite rules to keep most optimizations.

@cpcloud
Copy link
Member Author

cpcloud commented Dec 16, 2021

Closing this out for now.

@cpcloud cpcloud closed this Dec 16, 2021
@cpcloud cpcloud deleted the simplify-analysis branch December 16, 2021 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API refactor Issues or PRs related to refactoring the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants