Hi,
The serialization of dataframes from Python to Vega as JSON is very inefficient, even for smallish datasets.
The https://github.com/vidartf/ipydatawidgets library provides a mechanism to improve the serialization of numpy arrays, which is already a step forward.
For our project ProgressiVis, we are considering serializing a dataframe as a dictionary of columns (a column-wise representation), where each column can be compressed on the Python side and decompressed in JS according to its type.
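A minimal sketch of that dictionary-of-columns idea, using only numpy and the stdlib gzip module (the function names are hypothetical, and a real implementation would presumably pick a codec per column type rather than gzip everywhere):

```python
import gzip
import numpy as np

def serialize_columns(columns):
    """Serialize a dict of numpy columns: each column becomes gzip-compressed
    raw bytes plus a small header recording its dtype and length."""
    payload = {}
    for name, arr in columns.items():
        arr = np.ascontiguousarray(arr)  # ensure a flat, contiguous buffer
        payload[name] = {
            "dtype": str(arr.dtype),
            "length": len(arr),
            "data": gzip.compress(arr.tobytes()),
        }
    return payload

def deserialize_columns(payload):
    """Inverse operation: decompress each column back into a numpy array."""
    return {
        name: np.frombuffer(gzip.decompress(col["data"]), dtype=col["dtype"])
        for name, col in payload.items()
    }

# Example: a small two-column "dataframe"
cols = {"x": np.arange(1000, dtype="float64"),
        "y": np.arange(1000, dtype="int32")}
roundtrip = deserialize_columns(serialize_columns(cols))
```

On the JS side the same header would drive the choice of typed array (`Float64Array`, `Int32Array`, ...) after decompression.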
At the Vega level, converting a column-wise format into Vega's internal format has already been done for the "arrow" format in https://github.com/vega/vega-loader-arrow, so it would not be hard to do the same for a dictionary of columns.
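The conversion such a loader performs is essentially a transpose: the dictionary of columns becomes the list of row objects that Vega datasets consume. In Python terms (a hypothetical helper, purely for illustration):

```python
def columns_to_rows(columns):
    """Transpose a dict-of-columns into the row-oriented list of objects
    that a Vega dataset ultimately consumes."""
    names = list(columns)
    length = len(next(iter(columns.values())))
    return [{n: columns[n][i] for n in names} for i in range(length)]

rows = columns_to_rows({"a": [1, 2], "b": ["x", "y"]})
# rows == [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
```

In practice a loader can avoid materializing the rows and expose column accessors lazily, which is what vega-loader-arrow does for Arrow tables.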
As for compression, ipydatawidgets uses gzip, but there are other trade-offs to consider, such as lz4.
The implementation is not hard but could take a couple of weeks, and it would be great to be able to reuse it for other dataframe formats if possible (e.g. our progressive tables would use the same serialization format).
How important would that kind of optimization be for ipyvega/Altair: low priority? high priority? Is anyone else interested in improving the data serialization for other dataframe formats?
Best,
Jean-Danel
Adding to vega/altair#2471 (comment), I would say better serialization would be a great improvement and I am very supportive. I would suggest using Arrow as there is good support in Python and JS and more backends are adopting it as their internal representation.