refactor(pandas): port the pandas backend with an improved execution model (#7797)

Since we need to reimplement/port all of the backends for #7752, I
attempted to reimplement the pandas backend using a new execution
engine. Previously the pandas backend was implemented using a top-down
execution model, and each operation was executed using a
multidispatched function. While it served us well for a long time, it
had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they
  supported a wide variety of inputs, making the implementation rather
  bulky
- due to the previous reason, several input combinations were not
  supported, e.g. value operations with multiple columnar inputs
- the `Scope` object was used to pass around the execution context,
  which was created for each operation separately, and the results
  were not reusable even though the same operation was executed
  multiple times

The new execution model differs in several ways:
- there is a rewrite layer before execution which lowers the input
  expression to a form closer to the pandas execution model; this makes
  it much easier to implement the operations and also makes the input
  "plan" inspectable
- the execution is now topologically sorted and performed in a
  bottom-up manner; the intermediate results are reused, making the
  execution more efficient, and are aggressively cleaned up as soon as
  they are no longer needed to reduce memory usage
- the execute function is now single-dispatched, making the
  implementation easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same
  implementation can be used for multiple input shape combinations;
  this removes several special cases from the implementation in
  exchange for a negligible performance overhead
- there are helper utilities making it easier to implement compute
  kernels for the various value operations: `rowwise`, `columnwise`,
  `elementwise`, `serieswise`; if there are multiple implementations
  available for a given operation, the most efficient one is selected
  based on the input shapes
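
To make the bottom-up model concrete, here is a minimal, self-contained
sketch of the idea. It is not the code from this commit: the node classes
(`Column`, `Literal`, `Add`), the `inputs` and `run` helpers, and the use
of plain Python lists in place of pandas Series are all assumptions made
for illustration. It shows a single-dispatched `execute` function, a
topologically sorted bottom-up run, reuse of a shared intermediate result,
cleanup of results once their last consumer has run, and broadcasting of
scalars to columnar shape.

```python
from dataclasses import dataclass
from functools import singledispatch
from graphlib import TopologicalSorter


# Hypothetical node types for illustration; these are not ibis operations.
@dataclass(frozen=True)
class Column:
    values: tuple


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


NODE_TYPES = (Column, Literal, Add)


def inputs(node):
    """Return {field name: child node} for the fields that are nodes."""
    return {k: v for k, v in vars(node).items() if isinstance(v, NODE_TYPES)}


def broadcast(value, length):
    """Turn a scalar into a column so a kernel only handles one shape."""
    return value if isinstance(value, list) else [value] * length


@singledispatch
def execute(node, **kwargs):
    raise NotImplementedError(type(node).__name__)


@execute.register
def _(node: Column):
    return list(node.values)


@execute.register
def _(node: Literal):
    return node.value


@execute.register
def _(node: Add, left, right):
    if not isinstance(left, list) and not isinstance(right, list):
        return left + right  # scalar + scalar stays scalar
    length = len(left) if isinstance(left, list) else len(right)
    left, right = broadcast(left, length), broadcast(right, length)
    return [a + b for a, b in zip(left, right)]


def run(root, trace=None):
    # Collect the graph: each node maps to the nodes it depends on.
    # Equal nodes hash the same, so shared subexpressions appear once.
    graph, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node not in graph:
            graph[node] = tuple(inputs(node).values())
            stack.extend(graph[node])

    # Count the pending uses of every node so results can be dropped early.
    remaining = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            remaining[dep] += 1

    results = {}
    order = TopologicalSorter({n: set(d) for n, d in graph.items()})
    for node in order.static_order():  # dependencies come first
        kwargs = {name: results[arg] for name, arg in inputs(node).items()}
        results[node] = execute(node, **kwargs)
        if trace is not None:
            trace.append(type(node).__name__)
        for dep in graph[node]:
            remaining[dep] -= 1
            if remaining[dep] == 0:
                del results[dep]  # no consumer left, free the memory
    return results[root]


x = Column((1, 2, 3))
shared = Add(x, Literal(10))     # used twice below, executed once
print(run(Add(shared, shared)))  # [22, 24, 26]
```

In this sketch the shared `Add(x, Literal(10))` node is computed a single
time and its result is discarded right after the outer `Add` consumes it,
which mirrors the reuse and cleanup behaviour described in the list above.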

The new backend implementation has higher feature coverage while being
one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs authored and cpcloud committed Feb 12, 2024
1 parent 064ae38 commit ae7da21
Showing 63 changed files with 2,517 additions and 7,993 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ibis-backends.yml
@@ -79,10 +79,10 @@ jobs:
 # title: Dask
 # extras:
 # - dask
-# - name: pandas
-# title: Pandas
-# extras:
-# - pandas
+- name: pandas
+title: Pandas
+extras:
+- pandas
 # - name: sqlite
 # title: SQLite
 # extras:
Empty file removed ibis/backends/base/df/__init__.py
Empty file.
211 changes: 0 additions & 211 deletions ibis/backends/base/df/scope.py

This file was deleted.
