refactor(ir): split the relational operations #7752
Conversation
ibis/expr/analysis.py
Outdated
Lots of great stuff here!
I think the PR description needs some prose :)
Here are some possible questions to answer in the description:
- How are you solving the chained join problem?
- What are the intended semantics of table references: i.e., what constitutes an integrity error?
- How does the subquery flattening and column pruning work?
- What still needs work and whether it can be done in a follow up
- Are there any expected or known breakages to user code
Ideally most of the core maintainers can read the description and use it as a guide for the PR.
ibis/expr/tests/snapshots/test_sql/test_parse_sql_table_alias/decompiled.py
table[["a", ["b"]]]
with pytest.raises(com.IbisTypeError, match=errmsg):
    table["a", ["b"]]
# FIXME(kszucs): currently bind() flattens the list of expressions, so arbitrary
Does this need to be fixed?
The API is less restrictive now; if we're okay with that then no, otherwise yes.
Seems fine to me!
Created a follow-up issue for this #7819
small things.
I think all of our future selves would benefit from writing up the join-chain-dereferencing ideas and putting them somewhere in the developer docs.
Maybe under a "Design" or "Internals" section?
Non-blocking, and a lot of what would go in that doc is already present in docstrings here, but I think it would be good to collect it all and then add some more detail.
Lookin' good!
Can you make sure to capture the collision implementation in the PR description? It could even go in a follow up IMO.
Rationale and history
---------------------

In the last couple of years we have been constantly refactoring the internals to make them easier to work with. Although we have made great progress, the current codebase is still hard to maintain and extend. One example of that complexity is the attempt to remove the `Projector` class in ibis-project#7430. I had to realize that we are unable to improve the internals in smaller incremental steps; we need to make a big leap forward to make the codebase maintainable in the long run.

One of the hotspots of problems is the `analysis.py` module, which tries to bridge the gap between the user-facing API and the internal representation. Part of its complexity is caused by loose integrity checks in the internal representation, allowing various ways to represent the same operation. This makes it hard to inspect, reason about, and optimize the relational operations. In addition, it makes the backends much harder to implement, since more branching is required to cover all the variations.

We have always been aware of these problems, and we actually had several attempts to solve them the same way this PR does. However, we never managed to actually split the relational operations; we always hit roadblocks maintaining compatibility with the current test suite. We were unable to even understand those issues because of the complexity of the codebase and the number of indirections between the API, the analysis functions, and the internal representation.

But(!) we finally managed to prototype a new IR in ibis-project#7580, along with implementations for the majority of the backends, including various SQL backends and pandas. After successfully validating the viability of the new IR, we split the PR into smaller pieces which can be individually reviewed. This PR is the first step of that process: it introduces the new IR and the new API. The next steps will be to implement the remaining backends on top of the new IR.
Changes in this commit
----------------------

- Split the `ops.Selection` and `ops.Aggregation` nodes into proper relational algebra operations.
- Almost entirely remove `analysis.py` and the technical debt accumulated over the years.
- More flexible window frame binding: if an unbound analytic function is used with a window containing references to a relation, then `.over()` is now able to bind the window frame to that relation.
- Introduce a new API-level technique to dereference columns to the target relation(s).
- Revamp the subquery handling to be more robust and to support more use cases with strict validation; we now have `ScalarSubquery`, `ExistsSubquery`, and `InSubquery` nodes which can only be used in the appropriate context.
- Use much stricter integrity checks for all the relational operations, most of the time enforcing that all the value inputs of a node originate from the parent relation the node depends on.
- Introduce a new `JoinChain` operation to represent multiple joins in a single operation, followed by a projection attached to the same relation. This enabled us to solve several outstanding issues with join handling (including the notorious chained-join issue).
- Use straightforward rewrite rules, collected in `rewrites.py`, to reinterpret user input so that the new operations can be constructed even with the strict integrity checks.
- Provide a set of simplification rules to reorder and squash the relational operations into a more compact form.
- Use mappings to represent projections, eliminating the need to internally store `ops.Alias` nodes. In addition, table nodes are no longer allowed in projections; their columns are expanded into the same mapping, making the semantics clear.
- Uniform handling of the various kinds of inputs for all the API methods using a generic `bind()` function.

Advantages of the new IR
------------------------

- The operations are much simpler, with clear semantics.
- The operations are easier to reason about and to optimize.
- The backends can easily lower the internal representation to a backend-specific form before compilation/execution, so the lowered form can be easily inspected, debugged, and optimized.
- The API is much closer to the users' mental model, thanks to the dereferencing technique.
- The backend implementations can be greatly simplified due to the simpler internal representation and strict integrity checks. As an example, the pandas backend can be slimmed down by 4k lines of code while being more robust and easier to maintain.

Disadvantages of the new IR
---------------------------

- The backends must be rewritten to support the new internal representation.
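The mapping-based projection and the parent-relation integrity checks described above can be illustrated with a small standalone sketch. The class and attribute names here (`Table`, `Field`, `Project`) are hypothetical stand-ins for illustration, not ibis's actual operation classes.

```python
# Hypothetical stand-ins for the real ibis operation classes, illustrating
# two ideas from the commit message: projections stored as name -> value
# mappings (no Alias nodes), and integrity checks that reject value inputs
# not originating from the parent relation.

class Table:
    def __init__(self, schema):
        self.schema = list(schema)

class Field:
    """A column reference; must originate from the given parent relation."""
    def __init__(self, rel, name):
        if name not in rel.schema:
            raise ValueError(f"integrity error: {name!r} not in parent schema")
        self.rel, self.name = rel, name

class Project:
    """A projection: a parent relation plus a name -> expression mapping.

    The mapping keys are the output column names, so no explicit alias
    nodes need to be stored inside the operation.
    """
    def __init__(self, parent, values):
        for value in values.values():
            if value.rel is not parent:
                raise ValueError("integrity error: value not from parent")
        self.parent = parent
        self.values = dict(values)
        self.schema = list(self.values)

t = Table(["a", "b"])
p = Project(t, {"a_renamed": Field(t, "a"), "b": Field(t, "b")})
print(p.schema)  # ['a_renamed', 'b']
```

Renaming a column is just a different mapping key; a reference to a column the parent does not provide fails construction instead of producing an inconsistent node.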
🎉 🎉 🚀 🤖
…perations

Old Implementation
------------------

Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model, and each operation was executed using a multidispatched function. While it served us well for a long time, it had a few drawbacks:

- it was often hard to understand what was going on due to the complex preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they supported a wide variety of inputs, making the implementation rather bulky
- due to the previous reason, several input combinations were not supported, e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context, which was created for each operation separately, and the results were not reusable even when the same operation was executed multiple times

New Implementation
------------------

The new execution model has changed in several ways:

- there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model; this makes it much easier to implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient, while also being aggressively cleaned up as soon as they are not needed anymore to reduce memory usage
- the execute function is now single-dispatched, making the implementation easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same implementation can be used for multiple input shape combinations; this removes several special cases from the implementation in exchange for a negligible performance overhead
- there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes

The new backend implementation has higher feature coverage while being one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
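The bottom-up, single-dispatched execution model described above can be sketched in a few lines. This is an illustrative toy, not ibis's actual engine: `Node`, `Literal`, and `Add` are hypothetical stand-ins, but the mechanics (topological sort, one `functools.singledispatch`-ed execute function, cached intermediate results for shared subexpressions) mirror the description.

```python
# A minimal sketch of bottom-up execution: nodes are topologically sorted,
# each is executed exactly once via a single-dispatched function, and the
# intermediate results are cached so shared subexpressions are reused.
from functools import singledispatch

class Node:
    def __init__(self, *inputs):
        self.inputs = inputs

class Literal(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

class Add(Node):
    pass

def toposort(root):
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for dep in node.inputs:
            visit(dep)
        order.append(node)  # dependencies come before dependents
    visit(root)
    return order

@singledispatch
def execute_node(node, *args):
    raise NotImplementedError(type(node))

@execute_node.register
def _(node: Literal):
    return node.value

@execute_node.register
def _(node: Add, left, right):
    return left + right

def execute(root):
    results = {}  # intermediate results, reused across shared subexpressions
    for node in toposort(root):
        args = [results[id(dep)] for dep in node.inputs]
        results[id(node)] = execute_node(node, *args)
    return results[id(root)]

x = Literal(1)
expr = Add(Add(x, Literal(2)), x)  # x is shared and executed only once
print(execute(expr))  # 4
```

A real engine would additionally drop entries from `results` once the last dependent has consumed them, which is the aggressive cleanup the commit message mentions.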
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (#7797) Since we need to reimplement/port all of the backends for #7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (#7797) Since we need to reimplement/port all of the backends for #7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (#7797) Since we need to reimplement/port all of the backends for #7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (#7797) Since we need to reimplement/port all of the backends for #7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (ibis-project#7797) Since we need to reimplement/port all of the backends for ibis-project#7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, 
`columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (#7797) Since we need to reimplement/port all of the backends for #7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model and each operation was executing using a multidispatched function. While it served us well for a long time, it had a few drawbacks: - it was often hard to understand what was going on due to the complex preparation steps and various execution hooks - the multidispatched functions were hard to debug, additionally they supported a wide variety of inputs making the implementation rather bulky - due to the previous reaon, several inputs combinations were not supported, e.g. value operations with multiple columnar inputs - the `Scope` object was used to pass around the execution context which was created for each operation separately and the results were not reusable even though the same operation was executed multiple times The new execution model has changed in several ways: - there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model, this makes it much easier to implement the operations and also makes the input "plan" inspectable - the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient while also aggressively cleaned up as soon as they are not needed anymore to reduce the memory usage - the execute function is now single-dispatched making the implementation easier to locate and debug - the inputs now broadcasted to columnar shape so that the same implementation can be used for multiple input shape combinations, this removes several special cases from the implementation in exchange of a negligible performance overhead - there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes The new backend implementation has a higher feature coverage while the implementation is one third of the size of the previous one. BREAKING CHANGE: the `timecontext` feature is not supported anymore
…model (#7797)

Since we need to reimplement/port all of the backends for #7752, I took an attempt at reimplementing the pandas backend using a new execution engine. Previously the pandas backend was implemented using a top-down execution model, and each operation was executed using a multidispatched function. While it served us well for a long time, it had a few drawbacks:

- it was often hard to understand what was going on due to the complex preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they supported a wide variety of inputs, making the implementation rather bulky
- due to the previous reason, several input combinations were not supported, e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context; it was created for each operation separately, and the results were not reusable even though the same operation was executed multiple times

The new execution model has changed in several ways:

- there is a rewrite layer before execution which lowers the input expression to a form closer to the pandas execution model; this makes it much easier to implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up manner; the intermediate results are reused, making the execution more efficient, and are aggressively cleaned up as soon as they are not needed anymore to reduce memory usage
- the execute function is now single-dispatched, making the implementation easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same implementation can be used for multiple input shape combinations; this removes several special cases from the implementation in exchange for a negligible performance overhead
- there are helper utilities making it easier to implement compute kernels for the various value operations: `rowwise`, `columnwise`, `elementwise`, `serieswise`; if there are multiple implementations available for a given operation, the most efficient one is selected based on the input shapes

The new backend implementation has higher feature coverage while being one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
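The last point — multiple kernel implementations with the most efficient one selected per operation — can be sketched in plain Python. This is a minimal illustration under assumed names: `elementwise`, `serieswise`, and `execute_value_op` are hypothetical, not ibis's actual helpers.

```python
# Hypothetical sketch of kernel selection; not ibis's real helper API.
def elementwise(func, column):
    # fallback kernel: apply a scalar function to each element
    return [func(x) for x in column]

def serieswise(func, column):
    # preferred kernel: operate on the whole column at once
    return func(column)

def execute_value_op(column, *, scalar_kernel=None, column_kernel=None):
    # when several implementations exist, pick the most efficient one:
    # a columnar kernel beats an element-by-element fallback
    if column_kernel is not None:
        return serieswise(column_kernel, column)
    return elementwise(scalar_kernel, column)

# only a scalar kernel is available -> elementwise fallback
doubled = execute_value_op([1, 2, 3], scalar_kernel=lambda x: x * 2)

# a columnar kernel is available -> used directly
totaled = execute_value_op(
    [1, 2, 3], column_kernel=lambda col: [x + sum(col) for x in col]
)

print(doubled)  # [2, 4, 6]
print(totaled)  # [7, 8, 9]
```

The same dispatch idea extends to `rowwise` and `columnwise` kernels; the point is that each operation registers whichever kernels it has, and the engine chooses based on input shapes.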
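The bottom-up execution model described above — topological sort, reused intermediates, and a single-dispatched execute function — can be illustrated with a small self-contained sketch. The node classes and the `execute`/`run` names here are invented for illustration, not the backend's real API.

```python
from functools import singledispatch

class Node:
    def __init__(self, *args):
        self.args = args

class Literal(Node):
    def __init__(self, value):
        super().__init__()
        self.value = value

class Add(Node):
    pass

class Mul(Node):
    pass

def toposort(root):
    # depth-first walk producing dependencies before dependents
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        order.append(node)
    visit(root)
    return order

@singledispatch
def execute(node, results):
    raise NotImplementedError(type(node))

@execute.register
def _(node: Literal, results):
    return node.value

@execute.register
def _(node: Add, results):
    left, right = node.args
    return results[id(left)] + results[id(right)]

@execute.register
def _(node: Mul, results):
    left, right = node.args
    return results[id(left)] * results[id(right)]

def run(root):
    # bottom-up: every node is computed exactly once; intermediates are
    # cached in `results` and shared by all of the node's consumers
    results = {}
    for node in toposort(root):
        results[id(node)] = execute(node, results)
    return results[id(root)]

shared = Add(Literal(1), Literal(2))        # computed once, used twice
expr = Mul(shared, Add(shared, Literal(7)))
print(run(expr))  # 30
```

Compared with a top-down multidispatched walk, each operation's handler is a single `@execute.register` function that only sees already-computed inputs, which is what makes the implementation easy to locate and debug.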
Rationale & History

In the last couple of years we have been constantly refactoring the internals to make them easier to work with. Although we have made great progress, the current codebase is still hard to maintain and extend. One example of that complexity is the attempt to remove the `Projector` class in #7430. I had to realize that we are unable to improve the internals in small incremental steps; we need to make a big leap forward to make the codebase maintainable in the long run.

One of the hotspots of problems is the `analysis.py` module, which tries to bridge the gap between the user-facing API and the internal representation. Part of its complexity is caused by loose integrity checks in the internal representation, allowing various ways to represent the same operation. This makes it hard to inspect, reason about, and optimize the relational operations. In addition, it makes it much harder to implement the backends, since more branching is required to cover all the variations.

We have always been aware of these problems, and we actually made several attempts to solve them the same way this PR does. However, we never managed to actually split the relational operations; we always hit roadblocks while trying to maintain compatibility with the existing test suite. We were unable to even understand those issues because of the complexity of the codebase and the number of indirections between the API, the analysis functions, and the internal representation.
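The difference between a loosely checked, do-everything node and split operations with strict per-node validation can be sketched like this. These are toy classes for illustration only; the operations actually introduced by this PR live in ibis's internals and differ in detail.

```python
from dataclasses import dataclass

# Before: one node bundling projection, filtering and sorting, so the same
# logical operation can be represented in many equivalent ways and every
# backend has to branch over all of them.
@dataclass(frozen=True)
class Selection:
    table: object
    selections: tuple = ()
    predicates: tuple = ()
    sort_keys: tuple = ()

# After: one small node per relational operation, each enforcing integrity
# checks that only make sense for that operation.
@dataclass(frozen=True)
class Project:
    parent: object
    values: dict  # column name -> value expression

    def __post_init__(self):
        if not self.values:
            raise ValueError("a projection must produce at least one column")

@dataclass(frozen=True)
class Filter:
    parent: object
    predicates: tuple

    def __post_init__(self):
        if not self.predicates:
            raise ValueError("a filter needs at least one predicate")

table = object()
rel = Filter(Project(table, {"a": "t.a"}), predicates=("t.a > 0",))
print(type(rel).__name__)  # Filter
```

With split nodes there is exactly one way to spell "project then filter", so inspection, optimization, and backend compilation no longer need to normalize equivalent representations first.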
But(!) we finally managed to prototype a new IR in #7580, along with implementations for the majority of the backends, including various SQL backends and `pandas`. After successfully validating the viability of the new IR, we split the PR into smaller pieces which can be individually reviewed. This PR is the first step of that process: it introduces the new IR and the new API. The next steps will be to implement the remaining backends on top of the new IR.

What does the PR do:
- Split the `ops.Selection` and `ops.Aggregation` nodes into proper relational algebra operations.
- Removed `analysis.py` along with the technical debt accumulated over the years.
- `.over()` is now able to bind the window frame to the relation.
- Added `ScalarSubquery`, `ExistsSubquery`, and `InSubquery` nodes, which can only be used in the appropriate context.
- Added `JoinChain` operations to represent multiple joins in a single operation, followed by a projection attached to the same relation. This made it possible to solve several outstanding issues with join handling (including the notorious chain join issue).
- Added `rewrites.py` to reinterpret user input so that the new operations can be constructed even with the strict integrity checks.
- Projections can no longer contain `ops.Alias` nodes. In addition to that, table nodes in projections are not allowed anymore; the columns are expanded to the same mapping, making the semantics clear.
- User input is now resolved through the `bind()` function.

Advantages:
Disadvantages:
Technicalities
Cherry-picked the changes related to the `IR` from #7580.

TODOs
These follow-ups must be addressed before doing a release: