refactor(pandas): rewrite the pandas backend for the new relational o… · kszucs/ibis@dcae407

Commit

refactor(pandas): rewrite the pandas backend for the new relational operations

Old Implementation
------------------
Since we need to reimplement/port all of the backends for ibis-project#7752, I
attempted to reimplement the pandas backend using a new execution engine.
Previously the pandas backend was implemented using a top-down execution model
and each operation was executed using a multidispatched function. While it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they supported
  a wide variety of inputs, making the implementation rather bulky
- due to the previous reason, several input combinations were not supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context which was
  created for each operation separately and the results were not reusable even
  though the same operation was executed multiple times
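
The last drawback can be illustrated with a toy sketch. This is not the old
ibis code: `Leaf`, `Add`, `calls` and `execute_top_down` are hypothetical names,
and the evaluator is a plain recursive function rather than a multidispatched
one. It only shows how a top-down walk without a shared result cache evaluates
a shared subexpression once per parent.

```python
from dataclasses import dataclass


# Hypothetical expression nodes, used only for this illustration.
@dataclass(frozen=True)
class Leaf:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


calls = []  # records every node visit so repeated work becomes visible


def execute_top_down(node):
    """Evaluate recursively from the root, without caching any results."""
    calls.append(node)
    if isinstance(node, Leaf):
        return node.value
    return execute_top_down(node.left) + execute_top_down(node.right)


shared = Add(Leaf(1), Leaf(2))
root = Add(shared, shared)  # `shared` appears twice in the expression

result = execute_top_down(root)
visits_of_shared = calls.count(shared)  # the shared node is evaluated twice
```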

New Implementation
------------------
The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input expression
  to a form closer to the pandas execution model; this makes it much easier to
  implement the operations and also makes the input "plan" inspectable
- the operations are now topologically sorted and executed in a bottom-up
  manner; the intermediate results are reused, making the execution more
  efficient, and they are aggressively cleaned up as soon as they are no longer
  needed, which reduces memory usage
- the execute function is now single-dispatched making the implementation
  easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same implementation
  can be used for multiple input shape combinations; this removes several
  special cases from the implementation in exchange for a negligible performance
  overhead
- there are helper utilities making it easier to implement compute kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
  `serieswise`; if there are multiple implementations available for a given
  operation, the most efficient one is selected based on the input shapes
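
A minimal sketch of the bottom-up model described above, under stated
assumptions. It is not the code from this commit: the node classes (`Node`,
`Column`, `Literal`, `Add`) and the `run` driver are hypothetical stand-ins.
Only `functools.singledispatch`, `graphlib.TopologicalSorter` and the pandas
calls are real library APIs. The sketch shows single dispatch on the node type,
scalars broadcast to columnar shape, reuse of intermediate results, and cleanup
once the last consumer of a result has run.

```python
from dataclasses import dataclass
from functools import singledispatch
from graphlib import TopologicalSorter

import pandas as pd


class Node:
    """Hypothetical base class; not an ibis operation."""


@dataclass(frozen=True)
class Column(Node):
    name: str


@dataclass(frozen=True)
class Literal(Node):
    value: object


@dataclass(frozen=True)
class Add(Node):
    left: Node
    right: Node


@singledispatch
def execute(node, df, **inputs):
    raise NotImplementedError(type(node).__name__)


@execute.register
def _(node: Column, df):
    return df[node.name]


@execute.register
def _(node: Literal, df):
    # Broadcast the scalar to columnar shape so that every kernel
    # only ever has to handle Series inputs.
    return pd.Series(node.value, index=df.index)


@execute.register
def _(node: Add, df, left, right):
    return left + right


def run(root, df):
    # Collect each node together with the set of nodes it depends on.
    graph, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node not in graph:
            graph[node] = {v for v in vars(node).values() if isinstance(v, Node)}
            stack.extend(graph[node])

    # Number of consumers that still need each intermediate result.
    pending = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            pending[dep] += 1

    results = {}
    # static_order() yields dependencies before the nodes that use them.
    for node in TopologicalSorter(graph).static_order():
        inputs = {
            name: results[value]
            for name, value in vars(node).items()
            if isinstance(value, Node)
        }
        results[node] = execute(node, df, **inputs)
        for dep in graph[node]:
            pending[dep] -= 1
            if pending[dep] == 0:
                del results[dep]  # drop the result after its last consumer
    return results[root]


df = pd.DataFrame({"a": [1, 2, 3]})
shared = Add(Column("a"), Literal(1))
expr = Add(shared, Add(shared, Column("a")))
out = run(expr, df)  # `shared` is computed once and used by both parents
```

Because the nodes are frozen dataclasses, structurally equal subexpressions
hash to the same key, so the result dictionary doubles as the reuse cache in
this sketch.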

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore


kszucs committed Jan 3, 2024

1 parent 34618e4 commit dcae407

.github/workflows/ibis-backends.yml

            
@@ -74,10 +74,10 @@ jobs:
               #   title: Dask
               #   extras:
               #     - dask
-              # - name: pandas
-              #   title: Pandas
-              #   extras:
-              #     - pandas
+              - name: pandas
+                title: Pandas
+                extras:
+                  - pandas
               # - name: sqlite
               #   title: SQLite
               #   extras:

ibis/backends/base/df/__init__.py

Empty file.

ibis/backends/base/df/scope.py

This file was deleted.
