feat(api): add to_pandas_batches #7123

Merged
cpcloud merged 2 commits into ibis-project:master from to-pandas-batches on Sep 28, 2023

Conversation

cpcloud
Member

@cpcloud cpcloud commented Sep 11, 2023

Adds a to_pandas_batches method that returns an iterator of pandas DataFrames from expression execution. Closes #6805.
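
As an illustration of how a caller might consume an iterator of DataFrames like this, here is a minimal sketch. The helper `fake_batches` is a hypothetical stand-in written only for this example (it slices an in-memory DataFrame) and is not the ibis implementation; `running_total` is likewise an assumed name. The point is only that each batch can be processed and discarded without materializing the full result.

```python
from typing import Iterator

import pandas as pd


def fake_batches(df: pd.DataFrame, chunk_size: int) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in for an iterator of pandas DataFrames.

    Yields consecutive row slices of at most `chunk_size` rows each.
    """
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start : start + chunk_size]


def running_total(batches: Iterator[pd.DataFrame], column: str) -> int:
    """Sum one column batch by batch, holding only one batch at a time."""
    total = 0
    for batch in batches:
        total += int(batch[column].sum())
    return total


df = pd.DataFrame({"x": range(10)})
batch_lengths = [len(b) for b in fake_batches(df, chunk_size=4)]
total = running_total(fake_batches(df, chunk_size=4), "x")
```

With the real method, the loop body would stay the same and only the source of the batches would change.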

@cpcloud cpcloud added this to the 7.0 milestone Sep 11, 2023
@cpcloud cpcloud added the feature (Features or general enhancements) and ux (User experience related issues) labels Sep 11, 2023
@cpcloud cpcloud changed the title from "to pandas batches" to "feat(api): add to_pandas_batches" Sep 11, 2023
@cpcloud cpcloud marked this pull request as draft September 12, 2023 15:51
@cpcloud
Member Author

cpcloud commented Sep 12, 2023

I'd like to add some more tests before this is merged.

Added some more tests, fixed up issues.

@cpcloud cpcloud removed this from the 7.0 milestone Sep 28, 2023
@cpcloud cpcloud marked this pull request as ready for review September 28, 2023 14:52
@cpcloud cpcloud requested a review from jcrist September 28, 2023 14:55
@cpcloud cpcloud added this to the 7.0 milestone Sep 28, 2023
    params: Mapping[ir.Scalar, Any] | None = None,
    limit: int | str | None = None,
    **_: Any,
) -> pa.ipc.RecordBatchReader:
Member

Incorrect type annotation

Member Author

Fixed

    params: Mapping[ir.Value, Any] | None = None,
    chunk_size: int = 1_000_000,
    **kwargs: Any,
) -> pa.ipc.RecordBatchReader:
Member

Same here

Member Author

Fixed

schema = expr.schema()
yield from (
    orig_expr.__pandas_result__(
        self.pandas_converter.convert_table(batch.to_pandas(), schema)
Member

IIUC, given this default definition, the PandasData converter should work for dask as well, since it always operates on plain pandas data. There's no need for the pandas_converter attribute at all; this can just use PandasData directly.

Member Author

Yep, removed the attribute.

@@ -223,6 +224,8 @@ def _ipython_key_completions_(self) -> list[str]:


class _FileIOHandler:
    pandas_converter = PandasData
Member

Slight preference for making this a private attribute if we keep it, just so it doesn't show up in tab-completion for users.

Member Author

Removed the attribute entirely.

@cpcloud
Member Author

cpcloud commented Sep 28, 2023

Cloud backend tests (BigQuery and Snowflake) are passing:

…/ibis on  to-pandas-batches is 📦 v6.1.0 via 🐍 v3.10.12 via ❄️  impure (ibis-3.10.12-env)
❯ pytest -m 'bigquery or snowflake' --snapshot-update --allow-snapshot-deletion -n auto -q
bringing up nodes...
........xx..x..x..........x........xx.xx....xxx..x.x.....x.........xxx....x........xx.............xx..x..x......xx...xx.x............xx......x.xxx.............x.x.x.............................x.x.. [  7%]
...........x.x...x...x...x..x...xx.x.........x......x...xxx.............x..x.x.....x..x..x..x......x...x........x...x........x......x..x......x.....xx.....x.......x....x....x....x...............x... [ 14%]
........xx....x.....x...................x.x...x...x.x.xxxxx.xxxxxx.xssssx.x.x.....x.sx...x.xxx...x......x..........x........x......x...x........x..x..............xx..............x......x..x..xx...x. [ 21%]
.................xx.........xx.x..............x.xx.....x..x.xx..x..........................x.x..x...x................................................................................................. [ 28%]
......x....x.....x.........x...x.............xxx.......x....x...x.......xx...x..........xx..xxx...x.......x........x....x.....x..........xx......................xx......xx.x..x.......x..xx.......... [ 35%]
...................x.x..........x........x.....x........x...x............x...x....xx......x...................xx..x.......sss...............x.........x....x.x..x............x..xx.x.......x....x..... [ 42%]
..........x..x............xxx.....................x...x...x..x............x.......x.x..x...x.x.x..sx...xx.x.x.x..x..x..........x..x............x.xx.xx.....x.x.....x.x.xxxx......x..x.....x..........x [ 49%]
x................xx...x....xxx...xxx.x...x....x...................x..x...x.....x........x....xx..x..xx.xx..xx...x.x...x.x..x..........x......x...xx.xxxx.xxx...xx.xxxx....x.........x..xxxx.xxxxxxxsss [ 56%]
ssssss.....x....x..........................x......x...x.x.x....x...xxxx.......x...xx.x.ss......xx.....x....s.......x..........x....sx.xs..x..........x.x........x.......x..x......x......x.x....x..... [ 63%]
...s...xx....x..x..xsx.x.xs.x...x.......x....................................s..x...........x...xx..........................sx....x..........xx....ssss.xx..xx.x.x..x..x...x...xx.xx.xxxxx.......x.... [ 70%]
......xxxx.............x...x.xx..xxxs.....x......................................x.............x..xss.....x...........x.xx.x.x........x.x......x......x............................................... [ 77%]
...................x................................................................x................................................x....x..........................x...x...x.........x..........x... [ 84%]
........................................x..x.x..........................x..........x..x...............x..x.x..x..........s....x......x...xx.xxx.....................xx.......x.....x.................. [ 91%]
..............................x...x.................................................................................................................................................s..s............... [ 98%]
.....................................xx...........x.                                                                                                                                                   [100%]
2349 passed, 38 skipped, 438 xfailed in 425.92s (0:07:05)

Member

@gforsyth gforsyth left a comment

Documentation nits, but this looks good to me.

ibis/backends/base/__init__.py (outdated, resolved)
ibis/backends/base/__init__.py (outdated, resolved)
ibis/backends/base/__init__.py (outdated, resolved)
ibis/expr/types/core.py (outdated, resolved)
@cpcloud
Member Author

cpcloud commented Sep 28, 2023

Cloud backend tests (BigQuery and Snowflake) still pass after the most recent commit:

…/ibis on  to-pandas-batches is 📦 v6.1.0 via 🐍 v3.10.12 via ❄️  impure (ibis-3.10.12-env)
❯ pytest -m 'bigquery or snowflake' --snapshot-update --allow-snapshot-deletion -n auto -q
bringing up nodes...
ssssxxxx..........xx.xxxxxxxxxxxxx.sxxxxxxxxxxxxxxsxxx....xxxxxxxxxxxxxxxxxxxxxxxxxxss..x............x..x.....x.xx....x..............x...........x......x......xx.....xx..x...x.xx..xxx.x...x....s.... [  7%]
.x...x........x.x.x..............x...x.sx..xxxx...xxxx....x..x.xxxxx......x.x.x.xx.x....xx.xx.x...x....x..xx......x....xx.....x..x.x.xxx.xx.....x.xx.....xx.....................x..x..x.xx............ [ 14%]
.......x...x..x...............xxx..x.....x......x...x...x..x.x.x..x...s.x.x.xxssss..x..............x.x.x....x..x.x..x................x..............x..........x.....x............x..xx.............ss [ 21%]
ss.............x....x.x.x....x....................................x....x.......xx.........x...............................x...x.........s.xxx.x.x..x.....xx...x.....x...xxx...x...x.....x...x....s...x [ 28%]
........x...........x.xx.x.x..s...x..................x....x...................x..............x......x...x..............x..........x.............x......................x..xx......x..x...x.x...x...... [ 35%]
.........xx.x...x.......x.x...x.....x...........x.x.....x.x..............ssss.x....x....ssxx.....s....x........x..xx..............x.......x.......x..x................xx...x....xx.x.xxxx...xx....x... [ 42%]
......x....x....x.....x..x...........x......x...x...x.......x.....x.s....x....x...x..x.....x....s...x...x......xx.......x.x...xx..x..x......xx......xx...........xx.........xs........x......x.x...... [ 49%]
..x....x.sssx..x.....x...............................x.x.x.x..xx...x..x....xx.....xx..s.x..........x.....x..........x..xx..x.x........x.x........................x....x.........xx...x...xxx..x.x..... [ 56%]
..........x.x......x.........xxx....x.......x..............x......x....xx.....x.....x.......x..x...................................x........x.......x..x......x..........x...........x......xx......x. [ 63%]
......x.....x.x.x....xx..............x.................x.x.x..........xx........x....x.........x.x.x...x.x............x....x....................x............x.........x.......x.x.x.....x............ [ 70%]
...x.....x....x.....x...xx.x.x..............x............................x.....x...x.x.x..................x...........................................................x............................... [ 77%]
......x.........................x..x........x.x..x...x.............................x......s.................x.........x............................x.................................................. [ 84%]
.......................x...............................................................................x................x...............x.xx..x..x..............................x...................xx [ 91%]
.................................................x.................................................................................................................................................... [ 98%]
.........sx........................x.......x.........                                                                                                                                                  [100%]
2349 passed, 38 skipped, 438 xfailed in 447.58s (0:07:27)

@cpcloud cpcloud merged commit ad88d8a into ibis-project:master Sep 28, 2023
84 checks passed
@cpcloud cpcloud deleted the to-pandas-batches branch September 28, 2023 16:35
Labels
feature (Features or general enhancements), ux (User experience related issues)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: add to_pandas_batches
3 participants