refactor(pandas): port the pandas backend with an improved execution … · ibis-project/ibis@07d6692

refactor(pandas): port the pandas backend with an improved execution …

…model (#7797)

Since we need to reimplement/port all of the backends for #7752, I
attempted to reimplement the pandas backend using a new execution engine.
Previously the pandas backend was implemented using a top-down execution
model, and each operation was executed using a multidispatched function.
While it served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they
  supported a wide variety of inputs, making the implementation rather
  bulky
- due to the previous reason, several input combinations were not
  supported, e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context, which
  was created for each operation separately, and the results were not
  reusable even though the same operation was executed multiple times

The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input
  expression to a form closer to the pandas execution model; this makes
  it much easier to implement the operations and also makes the input
  "plan" inspectable
- the execution is now topologically sorted and executed in a bottom-up
  manner; the intermediate results are reused, making the execution more
  efficient, and are aggressively cleaned up as soon as they are no
  longer needed, to reduce memory usage
- the execute function is now single-dispatched, making the
  implementation easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same
  implementation can be used for multiple input shape combinations; this
  removes several special cases from the implementation in exchange for
  a negligible performance overhead
- there are helper utilities making it easier to implement compute
  kernels for the various value operations: `rowwise`, `columnwise`,
  `elementwise`, `serieswise`; if there are multiple implementations
  available for a given operation, the most efficient one is selected
  based on the input shapes
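
To make the execution model above more concrete, here is a minimal,
standard-library-only sketch of a bottom-up executor. It is not the code
from this commit: the node classes (`Column`, `Literal`, `Add`), the
helper `node_args`, and the `run` function are all hypothetical names
invented for illustration, and plain Python lists stand in for pandas
columns. It shows single dispatch on the node type, topological ordering,
reuse of shared intermediate results, broadcasting of scalars to columnar
shape, and cleanup of results once their last consumer has run.

```python
from dataclasses import dataclass, fields
from functools import singledispatch
from graphlib import TopologicalSorter


# Hypothetical expression nodes; frozen dataclasses are hashable, so two
# structurally equal subexpressions collapse into a single graph node.
@dataclass(frozen=True)
class Column:
    values: tuple


@dataclass(frozen=True)
class Literal:
    value: float


@dataclass(frozen=True)
class Add:
    left: object
    right: object


NODE_TYPES = (Column, Literal, Add)


def node_args(node):
    """Return the child nodes of `node`, keyed by field name."""
    args = {}
    for f in fields(node):
        value = getattr(node, f.name)
        if isinstance(value, NODE_TYPES):
            args[f.name] = value
    return args


@singledispatch
def execute(node, *, nrows, **inputs):
    raise NotImplementedError(type(node).__name__)


@execute.register(Column)
def _(node, *, nrows):
    return list(node.values)


@execute.register(Literal)
def _(node, *, nrows):
    # Broadcast the scalar to columnar shape so Add needs only one case.
    return [node.value] * nrows


@execute.register(Add)
def _(node, *, nrows, left, right):
    return [a + b for a, b in zip(left, right)]


def run(root, nrows, trace=None):
    # Collect the dependency graph: node -> set of nodes it depends on.
    graph, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node not in graph:
            graph[node] = set(node_args(node).values())
            stack.extend(graph[node])

    # Number of consumers that still need each node's result.
    pending = dict.fromkeys(graph, 0)
    for deps in graph.values():
        for dep in deps:
            pending[dep] += 1

    results = {}
    # static_order() yields dependencies before dependents (bottom-up).
    for node in TopologicalSorter(graph).static_order():
        inputs = {name: results[arg] for name, arg in node_args(node).items()}
        results[node] = execute(node, nrows=nrows, **inputs)
        if trace is not None:
            trace.append(type(node).__name__)
        # Drop intermediate results as soon as nobody needs them anymore.
        for dep in graph[node]:
            pending[dep] -= 1
            if pending[dep] == 0:
                del results[dep]
    return results[root]


x = Column((1.0, 2.0, 3.0))
shared = Add(x, Literal(1.0))
print(run(Add(shared, shared), nrows=3))  # [4.0, 6.0, 8.0]
```

Because `shared` appears twice but hashes to one node, it is executed
only once, and its result is deleted right after the outer `Add` has
consumed it.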

The new backend implementation has higher feature coverage while being
one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore

kszucs authored and cpcloud committed Feb 12, 2024

1 parent a559d6f commit 07d6692

.github/workflows/ibis-backends.yml

            
@@ -79,10 +79,10 @@ jobs:
               #   title: Dask
               #   extras:
               #     - dask
-              # - name: pandas
-              #   title: Pandas
-              #   extras:
-              #     - pandas
+              - name: pandas
+                title: Pandas
+                extras:
+                  - pandas
               # - name: sqlite
               #   title: SQLite
               #   extras:

ibis/backends/base/df/__init__.py

Empty file.

ibis/backends/base/df/scope.py

This file was deleted.
