Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when calling chart.to_dict when using __dataframe__ protocol #3110

Closed
jcrist opened this issue Jul 17, 2023 · 6 comments
Closed

Error when calling chart.to_dict when using __dataframe__ protocol #3110

jcrist opened this issue Jul 17, 2023 · 6 comments
Labels

Comments

@jcrist
Copy link

jcrist commented Jul 17, 2023

Similar to #3109, except using ibis as the input and calling chart.to_dict (as is done automatically when rendering charts in a notebook).

import altair as alt
import ibis
import polars as pl

data = ibis.memtable(
    {'x': ['A', 'B', 'C', 'D', 'E'],
     'y': [5, 3, 6, 7, 2]}
)

chart = alt.Chart(data).mark_bar().encode(
    x='x',
    y='y',
)

chart.to_dict()

This outputs:

Traceback (most recent call last):
  File "/home/jcristharif/Code/altair/test.py", line 15, in <module>
    chart.to_dict()
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 2677, in to_dict
    return super().to_dict(
           ^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 892, in to_dict
    copy.data = _prepare_data(original_data, context)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/v5/api.py", line 114, in _prepare_data
    data = _pipe(data, data_transformers.get())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 628, in pipe
    data = func(data)
           ^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/vegalite/data.py", line 23, in default_data_transformer
    return curried.pipe(data, limit_rows(max_rows=max_rows), to_values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 628, in pipe
    data = func(data)
           ^^^^^^^^^^
  File "/home/jcristharif/miniconda3/envs/altair/lib/python3.11/site-packages/toolz/functoolz.py", line 304, in __call__
    return self._partial(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jcristharif/Code/altair/altair/utils/data.py", line 95, in limit_rows
    if max_rows is not None and len(values) > max_rows:
                                ^^^^^^^^^^^
  File "/home/jcristharif/Code/ibis/ibis/expr/types/relations.py", line 547, in __len__
    raise com.ExpressionError('Use .count() instead')
ibis.common.exceptions.ExpressionError: Use .count() instead

Ibis tables don't support len, while many other dataframe-like objects (pyarrow/pandas/polars/...) do. I suspect this has masked this issue when using other dataframe like inputs.

A few possible fixes:

  • Always coerce input dataframes to some common dataframe-like object (e.g. using pd.api.interchange.from_dataframe). This would let you then write only pandas-compatible code, since things are automatically converted at the input to altair.
  • Use data.__dataframe__().num_rows() instead of len(data) when dealing with a __dataframe__ protocol input.

Versions:

  • Python 3.11
  • Altair main branch
  • Ibis main branch
@jcrist jcrist added the bug label Jul 17, 2023
@jonmmease
Copy link
Contributor

Thanks for reporting. Yeah, we shouldn't be using len here.

We'd also want to avoid calling data.__dataframe__() more than once (as I assume this would invoke multiple Ibis queries). We might be able to perform the conversion to Arrow in this limit_rows function, check the length, and then return the Arrow Table for further processing down the pipeline.

@jonmmease
Copy link
Contributor

I made a local change to get around the error with:

    elif hasattr(data, "__dataframe__"):
        pi = import_pyarrow_interchange()
        pa_table = pi.from_dataframe(data)
        assert pa_table.num_rows < max_rows
        return pa_table

This works with a pyarrow table, but fails with Ibis:

File ~/VegaFusion/repos/altair/altair/utils/data.py:95, in limit_rows(data, max_rows)
     93 elif hasattr(data, "__dataframe__"):
     94     pi = import_pyarrow_interchange()
---> 95     pa_table = pi.from_dataframe(data)
     96     assert pa_table.num_rows() < max_rows
     97     return pa_table

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pyarrow/interchange/from_dataframe.py:85, in from_dataframe(df, allow_copy)
     82 if not hasattr(df, "__dataframe__"):
     83     raise ValueError("`df` does not support __dataframe__")
---> 85 return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
     86                        allow_copy=allow_copy)

TypeError: Table.__dataframe__() got an unexpected keyword argument 'allow_copy'

It looks like Ibis isn't expecting the allow_copy argument to the __dataframe__ method, which pyarrow passes in:

Can repro without Altair with:

from pyarrow.interchange import from_dataframe
import ibis
data = ibis.memtable(
    {'x': ['A', 'B', 'C', 'D', 'E'],
     'y': [5, 3, 6, 7, 2]}
)
from_dataframe(data)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [4], line 6
      1 from pyarrow.interchange import from_dataframe
      2 data = ibis.memtable(
      3     {'x': ['A', 'B', 'C', 'D', 'E'],
      4      'y': [5, 3, 6, 7, 2]}
      5 )
----> 6 from_dataframe(data)

File ~/miniconda3/envs/altair-dev/lib/python3.10/site-packages/pyarrow/interchange/from_dataframe.py:86, in from_dataframe(df, allow_copy)
     83 if not hasattr(df, "__dataframe__"):
     84     raise ValueError("`df` does not support __dataframe__")
---> 86 return _from_dataframe(df.__dataframe__(allow_copy=allow_copy),
     87                        allow_copy=allow_copy)

TypeError: Table.__dataframe__() got an unexpected keyword argument 'allow_copy'

If pyarrow is passing allow_copy, then I would guess it's part of the protocol, but I haven't looked at it closely.

@jcrist
Copy link
Author

jcrist commented Jul 17, 2023

Thanks Jon! Yeah, I discovered that earlier today, fixed here (that's why I listed ibis dev as part of the reproducer :)).

@jonmmease
Copy link
Contributor

that's why I listed ibis dev as part of the reproducer :)

Haha, missed that 👍

@jonmmease
Copy link
Contributor

Here's a quick PR to try: #3111. It needs some testing before we merge, but it would be great if you want to start playing with it in case you run into other similar issues.

@mattijn mattijn closed this as completed Jul 17, 2023
@jcrist
Copy link
Author

jcrist commented Jul 17, 2023

Thanks y'all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants