
refactor(ir): split the relational operations #7752

Merged
merged 2 commits into ibis-project:the-epic-split from tes-ir
Dec 21, 2023

Conversation

kszucs
Member

@kszucs kszucs commented Dec 14, 2023

Rationale & History

In the last couple of years we have been constantly refactoring the internals to make them easier to work with.
Although we have made great progress, the current codebase is still hard to maintain and extend. One example of that complexity is the attempt to remove the Projector class in #7430. I came to realize that we are unable to improve the internals in small incremental steps; we need to make a big leap forward to make the codebase maintainable in the long run.

One of the hotspots is the analysis.py module, which tries to bridge the gap between the user-facing API and the internal representation. Part of its complexity is caused by loose integrity checks in the internal representation, which allow various ways to represent the same operation. This makes the relational operations hard to inspect, reason about, and optimize. It also makes the backends much harder to implement, since more branching is required to cover all the variations.

We have always been aware of these problems, and we have made several attempts to solve them the same way this PR does. However, we never managed to actually split the relational operations: we always hit roadblocks while trying to maintain compatibility with the current test suite. We were often unable even to understand those issues because of the complexity of the codebase and the number of indirections between the API, the analysis functions, and the internal representation.

Finally, we managed to prototype a new IR in #7580, along with implementations for the majority of the backends, including various SQL backends and pandas. After successfully validating the viability of the new IR, we split the PR into smaller pieces that can be reviewed individually. This PR is the first step of that process: it introduces the new IR and the new API. The next steps will be to implement the remaining backends on top of the new IR.

What this PR does:

  • Split the ops.Selection and ops.Aggregation nodes into proper relational algebra operations.
  • Almost entirely remove analysis.py, along with the technical debt accumulated over the years.
  • More flexible window frame binding: if an unbound analytical function is used with a window containing references to a relation, then .over() is now able to bind the window frame to that relation.
  • Introduce a new API-level technique to dereference columns to the target relation(s).
  • Revamp subquery handling to be more robust and to support more use cases with strict validation: there are now ScalarSubquery, ExistsSubquery, and InSubquery nodes, each of which can only be used in the appropriate context.
  • Use much stricter integrity checks for all the relational operations, in most cases enforcing that all value inputs of a node originate from the parent relation the node depends on.
  • Introduce a new JoinChain operation to represent multiple joins in a single operation, followed by a projection attached to the same relation. This enabled us to solve several outstanding issues with join handling (including the notorious chained-join issue).
  • Use straightforward rewrite rules, collected in rewrites.py, to reinterpret user input so that the new operations can be constructed even with the strict integrity checks.
  • Provide a set of simplification rules to reorder and squash the relational operations into a more compact form.
  • Use mappings to represent projections, eliminating the need to store ops.Alias nodes internally. In addition, table nodes are no longer allowed in projections; their columns are expanded into the same mapping, making the semantics clear.
  • Uniformly handle the various kinds of inputs to all the API methods using a generic bind() function.
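To make the mapping-based projections and strict integrity checks concrete, here is a toy sketch in plain Python. These are not the actual ibis classes; Relation, Field, and Project are hypothetical stand-ins for illustration only:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A toy relation with a fixed set of column names."""

    name: str
    columns: tuple[str, ...]


@dataclass(frozen=True)
class Field:
    """A reference to a single column of a parent relation."""

    rel: Relation
    column: str

    def __post_init__(self) -> None:
        if self.column not in self.rel.columns:
            raise ValueError(f"{self.column!r} is not a column of {self.rel.name!r}")


@dataclass(frozen=True)
class Project:
    """A projection stored as a name -> value mapping (no alias nodes)."""

    parent: Relation
    values: dict

    def __post_init__(self) -> None:
        # Strict integrity check: every value input must originate from
        # the parent relation this node depends on.
        for name, value in self.values.items():
            if value.rel is not self.parent:
                raise ValueError(
                    f"value {name!r} does not originate from the parent "
                    f"relation {self.parent.name!r}"
                )


t = Relation("t", ("a", "b"))
s = Relation("s", ("a",))

Project(t, {"x": Field(t, "a"), "y": Field(t, "b")})  # passes the check
try:
    Project(t, {"x": Field(s, "a")})  # references the wrong relation
except ValueError as err:
    print(err)
```

Storing projections as plain name-to-value mappings also shows why alias nodes become unnecessary: the output name is the mapping key, not a wrapper around the value.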

Advantages:

  • The operations are much simpler with clear semantics.
  • The operations are easier to reason about and to optimize.
  • The backends can easily lower the internal representation to a backend-specific form before compilation/execution, so the lowered form can be easily inspected, debugged, and optimized.
  • The API is much closer to the users' mental model, thanks to the dereferencing technique.
  • The backend implementations can be greatly simplified thanks to the simpler internal representation and strict integrity checks. As an example, the pandas backend can be slimmed down by 4k lines of code while becoming more robust and easier to maintain.
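The dereferencing idea mentioned above can be sketched at a very high level in plain Python (a toy model, not ibis code; the `dereference` function and the tuple encoding of fields are hypothetical): a derived relation remembers which input value each of its output columns was computed from, so a reference to an input column can be rewritten to the corresponding output column of the relation the user is actually working with.

```python
def dereference(project_values: dict, value):
    """Return the output column name under which `value` is exposed.

    project_values maps each output column name of a derived relation
    to the (relation, column) pair it was computed from.
    """
    for name, source in project_values.items():
        if source == value:
            return name
    raise KeyError(f"{value!r} has no counterpart in the target relation")


# A projection that renames t.a -> x and t.b -> y: a user-held
# reference to t.b resolves to column y of the derived relation.
values = {"x": ("t", "a"), "y": ("t", "b")}
print(dereference(values, ("t", "b")))  # -> y
```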

Disadvantages:

  • The backends must be rewritten to support the new internal representation.

Technicalities

I used the following command to cherry-pick the IR-related changes from #7580:

git format-patch --subject-prefix=ir --stdout master..newrels -- ibis/common ibis/formats ibis/expr ibis/tests/expr ibis/*.py | git am -3 -k

TODOs

These follow-ups must be addressed before doing a release:

@kszucs kszucs changed the title refactor(ir): split the relational operations → refactor(ir): split the relational operations [WIP] Dec 14, 2023
@kszucs kszucs changed the base branch from master to the-epic-split December 14, 2023 14:22
@kszucs kszucs force-pushed the the-epic-split branch 2 times, most recently from b5355a4 to bdd2239 Compare December 14, 2023 20:41
ibis/tests/expr/test_table.py
ibis/expr/types/joins.py
@kszucs kszucs force-pushed the tes-ir branch 2 times, most recently from 260ecd2 to 2145718 Compare December 18, 2023 19:04

Member

@cpcloud cpcloud left a comment

Lots of great stuff here!

I think the PR description needs some prose :)

Here are some possible questions to answer in the description:

  • How are you solving the chained join problem?
  • What are the intended semantics of table references: i.e., what constitutes an integrity error?
  • How does the subquery flattening and column pruning work?
  • What still needs work, and can it be done in a follow-up?
  • Are there any expected or known breakages to user code?

Ideally most of the core maintainers can read the description and use it as a guide for the PR.

ibis/expr/operations/core.py
ibis/expr/decompile.py
ibis/expr/rewrites.py
ibis/expr/rewrites.py
ibis/expr/types/relations.py
table[["a", ["b"]]]
with pytest.raises(com.IbisTypeError, match=errmsg):
    table["a", ["b"]]
# FIXME(kszucs): currently bind() flattens the list of expressions, so arbitrary
Member

Does this need to be fixed?

Member Author

The API is less restrictive now; if we are okay with that, then no, otherwise yes.

Member

Seems fine to me!

Member Author

Created a follow-up issue for this #7819

ibis/tests/expr/test_table.py
ibis/tests/expr/test_table.py
ibis/tests/expr/test_table.py
@kszucs kszucs force-pushed the tes-ir branch 2 times, most recently from 2b9193a to 402b90f Compare December 19, 2023 18:53
Member

@gforsyth gforsyth left a comment

small things.

I think all of our future selves would benefit from writing up the join-chain-dereferencing ideas and putting them somewhere in the developer docs.
Maybe under a "Design" or "Internals" section?

Non-blocking, and a lot of what would go in that doc is already present in docstrings here, but I think it would be good to collect it all and then add some more detail.

ibis/expr/types/generic.py
ibis/expr/types/generic.py
ibis/expr/operations/relations.py
Member

@cpcloud cpcloud left a comment

Lookin' good!

Can you make sure to capture the collision implementation in the PR description? It could even go in a follow up IMO.

ibis/expr/types/relations.py
ibis/expr/types/relations.py
ibis/expr/tests/test_newrels.py
@kszucs kszucs changed the title refactor(ir): split the relational operations [WIP] → refactor(ir): split the relational operations Dec 20, 2023
@cpcloud cpcloud merged commit 072ed82 into ibis-project:the-epic-split Dec 21, 2023
11 checks passed
@cpcloud cpcloud deleted the tes-ir branch December 21, 2023 09:50
@cpcloud
Member

cpcloud commented Dec 21, 2023

🎉 🎉 🚀 🤖

kszucs added a commit to kszucs/ibis that referenced this pull request Jan 3, 2024
…perations

Old Implementation
------------------
Since we need to reimplement/port all of the backends for ibis-project#7752, I made
an attempt at reimplementing the pandas backend using a new execution
engine. Previously the pandas backend was implemented using a top-down
execution model where each operation was executed using a
multidispatched function. While it served us well for a long time, it
had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they
  supported a wide variety of inputs, making the implementation rather
  bulky
- due to the previous reason, several input combinations were not
  supported, e.g. value operations with multiple columnar inputs
- the `Scope` object used to pass around the execution context was
  created separately for each operation, and the results were not
  reusable even though the same operation was executed multiple times

New Implementation
------------------
The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
  expression to a form closer to the pandas execution model; this makes
  it much easier to implement the operations and also makes the input
  "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
  manner; the intermediate results are reused, making the execution
  more efficient, and they are aggressively cleaned up as soon as they
  are no longer needed in order to reduce memory usage
- the execute function is now single-dispatched, making the
  implementation easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same
  implementation can be used for multiple input shape combinations;
  this removes several special cases from the implementation in
  exchange for a negligible performance overhead
- there are helper utilities that make it easier to implement compute
  kernels for the various value operations: `rowwise`, `columnwise`,
  `elementwise`, `serieswise`; if multiple implementations are
  available for a given operation, the most efficient one is selected
  based on the input shapes
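The bottom-up execution strategy described above can be sketched as a small standalone Python function. This is a toy model of the idea, not the actual backend code; the `kernels` and `deps` mappings are hypothetical:

```python
from collections import Counter


def execute(root, kernels, deps):
    """Execute an expression DAG bottom-up, reusing results and
    eagerly freeing intermediates.

    kernels maps each node to a callable taking its children's results;
    deps maps each node to the tuple of nodes it depends on.
    """
    # Topologically sort the DAG with a depth-first traversal.
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in deps[node]:
            visit(dep)
        order.append(node)

    visit(root)

    # Count consumers so each intermediate can be freed immediately
    # after its last use, reducing peak memory usage.
    refcount = Counter(dep for node in order for dep in deps[node])

    results = {}
    for node in order:
        args = [results[dep] for dep in deps[node]]
        results[node] = kernels[node](*args)  # computed once, then reused
        for dep in deps[node]:
            refcount[dep] -= 1
            if refcount[dep] == 0:
                del results[dep]
    return results[root]


# "a" feeds both "b" and "c", but is computed only once.
deps = {"a": (), "b": ("a",), "c": ("a", "b")}
kernels = {"a": lambda: 2, "b": lambda a: a + 1, "c": lambda a, b: a * b}
print(execute("c", kernels, deps))  # 2 * (2 + 1) = 6
```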

The new backend implementation has higher feature coverage while being
one third the size of the previous one.
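The broadcasting and kernel-selection idea can likewise be sketched in plain Python. These are toy stand-ins for the helpers named above, using lists instead of pandas Series, and `run_kernel` is a hypothetical dispatcher invented for this illustration:

```python
def broadcast(args, n):
    """Broadcast scalar arguments to columns of length n."""
    return [a if isinstance(a, list) else [a] * n for a in args]


def elementwise(fn, *cols):
    """Apply fn row by row; works for any Python callable."""
    return [fn(*row) for row in zip(*cols)]


def run_kernel(impls, *args):
    """Pick the most efficient available implementation for an op."""
    n = max((len(a) for a in args if isinstance(a, list)), default=1)
    cols = broadcast(args, n)
    if "serieswise" in impls:
        # A whole-column kernel is preferred when one is available.
        return impls["serieswise"](*cols)
    return elementwise(impls["elementwise"], *cols)


# The scalar 10 is broadcast to a column, so the same kernel handles
# column+column and column+scalar inputs alike.
add_impls = {"serieswise": lambda x, y: [a + b for a, b in zip(x, y)]}
print(run_kernel(add_impls, [1, 2, 3], 10))  # [11, 12, 13]
```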

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Jan 4, 2024
cpcloud pushed a commit that referenced this pull request Jan 4, 2024
…model (#7797)

cpcloud pushed a commit that referenced this pull request Jan 4, 2024
cpcloud pushed a commit that referenced this pull request Jan 5, 2024
cpcloud pushed a commit that referenced this pull request Jan 12, 2024
cpcloud pushed a commit that referenced this pull request Jan 13, 2024
…model (#7797)

## Old Implementation

Since we need to reimplement/port all of the backends for #7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex 
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times 

## New Implementation

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has higher feature coverage while being one
third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is no longer supported
cpcloud pushed a commit that referenced this pull request Jan 17, 2024
…model (#7797)

## Old Implementation

Since we need to reimplement/port all of the backends for #7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex 
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times 

## New Implementation

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`, 
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the 
implementation is one third of the size of the previous one. 

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Feb 1, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Feb 1, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Feb 1, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Feb 2, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Feb 2, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit to kszucs/ibis that referenced this pull request Feb 2, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
cpcloud pushed a commit to cpcloud/ibis that referenced this pull request Feb 4, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
cpcloud pushed a commit to cpcloud/ibis that referenced this pull request Feb 5, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit that referenced this pull request Feb 5, 2024
…model (#7797)

Since we need to reimplement/port all of the backends for #7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit that referenced this pull request Feb 6, 2024
…model (#7797)

Since we need to reimplement/port all of the backends for #7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs added a commit that referenced this pull request Feb 6, 2024
…model (#7797)

Since we need to reimplement/port all of the backends for #7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
cpcloud pushed a commit to cpcloud/ibis that referenced this pull request Feb 12, 2024
…model (ibis-project#7797)

Since we need to reimplement/port all of the backends for ibis-project#7752, I took
an
attempt at reimplementing the pandas backend using a new execution
engine.
Previously the pandas backend was implemented using a top-down execution
model
and each operation was executing using a multidispatched function. While
it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug, additionally they
supported
  a wide variety of inputs making the implementation rather bulky
- due to the previous reaon, several inputs combinations were not
supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which
was
created for each operation separately and the results were not reusable
even
  though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
expression
to a form closer to the pandas execution model, this makes it much
easier to
  implement the operations and also makes the input "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
manner;
the intermediate results are reused, making the execution more efficient
while
also aggressively cleaned up as soon as they are not needed anymore to
reduce
  the memory usage
- the execute function is now single-dispatched making the
implementation
  easier to locate and debug
- the inputs now broadcasted to columnar shape so that the same
implementation
can be used for multiple input shape combinations, this removes several
special cases from the implementation in exchange of a negligible
performance
  overhead
- there are helper utilities making it easier to implement compute
kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
`serieswise`; if there are multiple implementations available for a
given
operation, the most efficient one is selected based on the input shapes

The new backend implementation has higher feature coverage, while its implementation is one third the size of the previous one.
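The broadcasting and kernel-selection idea described in the list above can be illustrated with a small sketch. This uses plain Python lists as a stand-in for pandas Series, and the `broadcast`/`apply_kernel` names are illustrative, not ibis's actual helpers: scalar inputs are broadcast to columnar shape, and a vectorized `serieswise` kernel is preferred when every input is already columnar, falling back to a per-row `elementwise` kernel otherwise.

```python
def broadcast(values, size):
    """Broadcast scalar inputs to columns of the given length."""
    return [v if isinstance(v, list) else [v] * size for v in values]


def apply_kernel(values, *, serieswise, elementwise):
    if all(isinstance(v, list) for v in values):
        return serieswise(*values)  # fast vectorized path
    # broadcast scalars to columnar shape, then fall back to row-by-row
    size = max(len(v) for v in values if isinstance(v, list))
    columns = broadcast(values, size)
    return [elementwise(*row) for row in zip(*columns)]


def add_series(a, b):
    return [x + y for x, y in zip(a, b)]


def add_scalar(a, b):
    return a + b


col = [1, 2, 3]
# column + column takes the vectorized kernel...
print(apply_kernel([col, col], serieswise=add_series, elementwise=add_scalar))
# [2, 4, 6]
# ...column + scalar is broadcast and handled by the elementwise kernel
print(apply_kernel([col, 10], serieswise=add_series, elementwise=add_scalar))
# [11, 12, 13]
```

Because both shape combinations funnel through the same two kernels, the per-operation implementations no longer need scalar/column special cases.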

BREAKING CHANGE: the `timecontext` feature is not supported anymore
cpcloud pushed a commit that referenced this pull request Feb 12, 2024
…model (#7797)

cpcloud pushed a commit to cpcloud/ibis that referenced this pull request Feb 12, 2024
…model (ibis-project#7797)

cpcloud pushed a commit that referenced this pull request Feb 12, 2024
…model (#7797)

kszucs added a commit that referenced this pull request Feb 12, 2024
…model (#7797)

ncclementi pushed a commit to ncclementi/ibis that referenced this pull request Feb 21, 2024
…model (ibis-project#7797)
