refactor(pandas): rewrite the pandas backend for the new relational operations

Old Implementation
------------------
Since we need to reimplement/port all of the backends for ibis-project#7752, I
attempted to reimplement the pandas backend using a new execution engine.
Previously the pandas backend was implemented using a top-down execution model
and each operation was executed using a multidispatched function. While it
served us well for a long time, it had a few drawbacks:
- it was often hard to understand what was going on due to the complex
  preparation steps and various execution hooks
- the multidispatched functions were hard to debug; additionally, they supported
  a wide variety of inputs, making the implementation rather bulky
- due to the previous reason, several input combinations were not supported,
  e.g. value operations with multiple columnar inputs
- the `Scope` object used to pass around the execution context was created for
  each operation separately, so the results were not reused even when the same
  operation was executed multiple times

New Implementation
------------------
The new execution model has changed in several ways:
- there is a rewrite layer before execution which lowers the input expression
  to a form closer to the pandas execution model; this makes it much easier to
  implement the operations and also makes the input "plan" inspectable
- the operations are now topologically sorted and executed in a bottom-up
  manner; the intermediate results are reused, making the execution more
  efficient, and they are aggressively cleaned up as soon as they are no longer
  needed to reduce memory usage
- the execute function is now single-dispatched, making the implementation
  easier to locate and debug
- the inputs are now broadcast to columnar shape so that the same
  implementation can be used for multiple input shape combinations; this
  removes several special cases from the implementation in exchange for a
  negligible performance overhead
- there are helper utilities making it easier to implement compute kernels for
  the various value operations: `rowwise`, `columnwise`, `elementwise`,
  `serieswise`; if there are multiple implementations available for a given
  operation, the most efficient one is selected based on the input shapes
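
The execution model described above can be illustrated with a minimal,
standard-library-only sketch. It is not the code from this commit: the node
classes (`Literal`, `Column`, `Add`), the `broadcast` helper and the `run`
function are hypothetical names invented for illustration, and plain Python
lists stand in for pandas objects. It only shows the general idea of a
single-dispatched `execute`, bottom-up topological evaluation, reuse of shared
intermediate results, and dropping results once no remaining node needs them.

```python
from dataclasses import dataclass
from functools import singledispatch
from graphlib import TopologicalSorter


# Hypothetical node types for illustration only; these are not ibis classes.
@dataclass(frozen=True)
class Literal:
    value: int
    args = ()  # leaf node: no inputs


@dataclass(frozen=True)
class Column:
    values: tuple
    args = ()  # leaf node: no inputs


@dataclass(frozen=True)
class Add:
    left: object
    right: object

    @property
    def args(self):
        return (self.left, self.right)


def broadcast(value, length):
    """Expand a scalar to a column of `length`; pass columns through."""
    return value if isinstance(value, list) else [value] * length


@singledispatch
def execute(node, inputs):
    raise NotImplementedError(type(node).__name__)


@execute.register
def _(node: Literal, inputs):
    return node.value


@execute.register
def _(node: Column, inputs):
    return list(node.values)


@execute.register
def _(node: Add, inputs):
    # Broadcast both inputs to columnar shape so one kernel covers
    # scalar/scalar, scalar/column and column/column combinations.
    length = max((len(v) for v in inputs if isinstance(v, list)), default=1)
    left, right = (broadcast(v, length) for v in inputs)
    return [a + b for a, b in zip(left, right)]


def run(root, trace=None):
    # Collect the dependency graph: node -> set of nodes it depends on.
    graph, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node not in graph:
            graph[node] = set(node.args)
            stack.extend(node.args)

    # Number of dependents that still need each node's result.
    pending = {node: 0 for node in graph}
    for deps in graph.values():
        for dep in deps:
            pending[dep] += 1

    results = {}
    # static_order() yields dependencies before their dependents (bottom-up).
    for node in TopologicalSorter(graph).static_order():
        results[node] = execute(node, [results[arg] for arg in node.args])
        if trace is not None:
            trace.append(node)
        # Drop intermediate results that no remaining node will read.
        for dep in graph[node]:
            pending[dep] -= 1
            if pending[dep] == 0:
                del results[dep]
    return results[root]


shared = Add(Column((1, 2, 3)), Literal(10))
print(run(Add(shared, shared)))
```

In this sketch the `shared` sub-expression appears twice in the root but is
executed only once, because structurally equal nodes hash to the same key in
the dependency graph.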

The new backend implementation has a higher feature coverage while the
implementation is one third of the size of the previous one.

BREAKING CHANGE: the `timecontext` feature is not supported anymore
kszucs committed Jan 3, 2024
1 parent 34618e4 commit dcae407
Showing 62 changed files with 2,505 additions and 7,939 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ibis-backends.yml
@@ -74,10 +74,10 @@ jobs:
 # title: Dask
 # extras:
 # - dask
-# - name: pandas
-# title: Pandas
-# extras:
-# - pandas
+- name: pandas
+title: Pandas
+extras:
+- pandas
 # - name: sqlite
 # title: SQLite
 # extras:
Empty file removed ibis/backends/base/df/__init__.py
Empty file.
211 changes: 0 additions & 211 deletions ibis/backends/base/df/scope.py

This file was deleted.
