Improve serialization of Pandas DataFrames to ipyvega (#346)
* data transfer via ipydatawidgets

* fix

* Fix heatmap for speed

* Fix heatmap for speed

* init ipydatatablewidget

* using NDArray instead of DataUnion

* adding a customized table traitlet

* src/widget.ts

* fix

* cleanup

* adding lz4js decompression

* fix

* make NumpyAdapter.equals always return False

* Remove most console.log and add %time and progress bars in notebooks

* fixing .gitignore

* Add stress tests

* rename update_histogram2d to update_array2d. Remove console.log polluting the console. Add python doc.

* Improve

* Add a definition file to configure Altair. More work needed

* compression as string

* fix (naive approach)

* adding an adapter for Progressivis tables

* cleanup

* adding a touch mode

* using ipytablewidgets

* calling update_dataframe() and update_array2d() from update()

* fixes for jupyterlab3 compliance

* adding UI tests

* adding reference outputs for UI tests

* add ui action

* fix syntax

* another fix

* fix poetry conf

* testing only with python 3.9

* testing with yarn --no-lockfile

* testing only on lab branch for now

* trusting test notebook

* debug: testing ipytablewidgets presence

* get a cell as an artifact

* fix syntax

* if: always()

* artifact test-output instead of ref...

* uploading all .png as an artifact

* new refs

* upload artifacts only if: cancelled() to avoid warnings

* improvement

* Widget.ipynb modified to use the datasets property

* adding a random.seed()

* fixes

* more ui tests

* fix: producing artifacts if failure

* fix

* adding new ref

* simplifying last tests

* resize histogram2d to 32x32

* adding new references

* many updates ported from the bench branch

* add jupyterlab

* Introduce vega.altair with the `stream` method to stream altair specs

* fix bugs and add interactive test

* Call resize after update in streaming mode, otherwise, the widget is not properly sized

* Add the resize option to stream and update

* fixing resize for pending updates

* Fix tests

* Simplify streaming test

* Ready for distribution

* fix test.yml

* prettier

* merged with origin

* Cleanup

* Fixed spec

* bump to new library versions for yarn and vega

* fix unused packages

* fixing ui tests

* other fixes related to the 346 issue

* cleanup+fix

* new fixes

* set nodejs=17

* set nodejs=16

* make variables that can be const const

* cleanup

* Add comment for chaining multiple datasets in widget.py. Cleanup declarations and spaces.

* use the latest version of Altair. Print better information in altair.py/stream_examples

* Try lower version of Altair

* trying altair4.1.0+jsonschema3.2.0

* fixing the previous attempt

* altair back to dev-dependencies

* alt.renderers.enable('default') instead of 'notebook'

* Add a first cell in AltairStreaming.ipynb to explain to install Altair before running the notebook

* UI end-to-end tests without conda

* trying sleep 3

* trying py 3.8

* py 3.7 ?

* py 3.6 ?

* avoid any[]|string

* removing touch_mode and touch params

* removing binder conf

* using a logger instead of print()

* insert?: any[] | @dataframe | ...

* clean-up code, add comments, check the need for resize in vega.widget, remove example code from vega.altair and move it to AltairStreaming.ipynb

* Altair should be either a dev dependency or a strong dependency, not optional

* Add comments to the AltairStreaming.ipynb notebook [skip CI]

* update dependencies

* update webpack

* altair back to dev dependencies. Update webpack package version.

* Update webpack-cli package version.

* fix yarn lockfile issue

* Apply requested changes, except for moving the dependencies of jupyterlab to the top-level package

* Bump to the newer version of Poetry

* Update src/index.ts

Co-authored-by: Dominik Moritz <[email protected]>

* Rename AllSupportedTypes.ipynb to AllCompressions.ipynb and fix the README.md to use the --symlink option for jupyter nbextension install

* Use stable URL for example data

* adding extension.js labplugin.js

* removing TimeXXX.ipynb

* adding the "jupyter nbextension enable" subcommand, removing the comment "not needed in notebook >= 5.3"

* Remove unused ISerializers from import in widget.ts [skip CI]

* fixing bug in vega.js, removing src/extension.js (replaced by extension.ts), webpack.config.js after merge

* waiting 10s after start-jlab before running UI tests

* keeping only py3.9

* prettifying .ts files + back to py3.6..3.9

* a better fix for src/vega.js

* fix jupyterlab

* fix poetry.lock

* fix test

* new fix for test

* trying pip install for extra jupyterlab

* fix syntax error

* trying poetry without virtualisation

* ipytablewidgets=0.2.4, jupyterlab extra removed

* fixes

* adding forgotten .js

* fixes

* attempt

* many fixes

* adding jupyter-tablewidgets

* fix

* error injection for test

* fixing UI Tests

* back to the previous fix

* save widgets rendering as images

* using poetry 1.3.1

* minor improvement

* serialize static widgets as json

* Regenerate poetry lock file and fix warning in tests

* Update packages and try saving a streamed example

* setting rendered images dims

* back to @jupyter-widgets/base=4.1.0 because of jupyterlab

* fix

* new fixes+upgraded dependencies

* upgrade dev dependency filemanager-webpack-plugin => 7.0.0

* rm .gitmodules+update poetry.lock

* using barley.json instead of barley.csv in all cells

* improvements

* Improve documentation of function `stream`

* Fix docstring

---------

Co-authored-by: Christian Poli <[email protected]>
Co-authored-by: xtianpoli <[email protected]>
Co-authored-by: Dominik Moritz <[email protected]>
4 people authored Feb 12, 2023
1 parent e7fb07c commit bd93616
Showing 51 changed files with 204,480 additions and 1,452 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/test.yml
@@ -14,7 +14,7 @@ jobs:

     strategy:
       matrix:
-        python: ["3.6", "3.7", "3.8", "3.9"]
+        python: ["3.7", "3.8", "3.9", "3.10"]

     name: Python ${{ matrix.python }}

@@ -29,7 +29,7 @@ jobs:
       - name: Setup poetry
         uses: abatilo/[email protected]
         with:
-          poetry-version: 1.1.12
+          poetry-version: 1.3.1

       - name: Configure poetry
         run: poetry config virtualenvs.in-project true
68 changes: 68 additions & 0 deletions .github/workflows/ui.yml
@@ -0,0 +1,68 @@
name: UI Tests

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  run:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        python: ["3.7", "3.8", "3.9", "3.10"]

    name: Python ${{ matrix.python }}

    defaults:
      run:
        shell: bash -l {0}

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python }}
        uses: actions/[email protected]
        with:
          python-version: ${{ matrix.python }}

      - name: Setup poetry
        uses: abatilo/[email protected]
        with:
          poetry-version: 1.3.1

      - name: Configure poetry
        run: poetry config virtualenvs.create false

      - name: Install Python dependencies
        run: |
          poetry install
          python -m pip install jupyterlab-widgets==1.0.2
          python -m pip install jupyterlab==3.0.11
      - name: Setup Node
        uses: actions/[email protected]

      - name: Install Node dependencies
        run: yarn --frozen-lockfile

      - name: Build
        run: yarn run build

      - name: Run tests
        run: |
          jupyter labextension install .
          cd ui-tests/
          yarn
          yarn run start-jlab:detached
          sleep 10
          yarn run test
      - name: Upload output file
        if: failure()
        uses: actions/upload-artifact@v2
        with:
          name: cell-artifact
          path: ui-tests/test-output/test/screenshots/vega_cell_*.png
6 changes: 4 additions & 2 deletions README.md
@@ -19,7 +19,8 @@ To install `vega` and its dependencies from the Python Package Index using
 ```sh
 pip install jupyter pandas vega
 pip install --upgrade notebook # need jupyter_client >= 4.2 for sys-prefix below
-jupyter nbextension install --sys-prefix --py vega # not needed in notebook >= 5.3
+jupyter nbextension install --sys-prefix --py vega
+jupyter nbextension enable --py --sys-prefix vega
 ```

 ### Conda Forge
@@ -51,7 +52,8 @@ Then activate the virtual environment with `poetry shell`.
 Symlink files instead of copying files:

 ```sh
-jupyter nbextension install --py --symlink vega
+jupyter nbextension install --py --symlink --sys-prefix vega
+jupyter nbextension enable --py --sys-prefix vega
 ```

 Run kernel with `jupyter notebook`. Run the tests with `pytest vega`.
214 changes: 214 additions & 0 deletions notebooks/AllCompressions.ipynb
@@ -0,0 +1,214 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "247bdd49",
"metadata": {},
"source": [
"# Compression methods\n",
"Two compression methods are provided for sending data: zlib and lz4. lz4 is enabled by default because it is both fast and efficient for numerical columns. Some column types may get better results with the other method, so feel free to tune the compression method for your data.\n",
"\n",
"With a `VegaWidget`, the `.compression` field specifies the method used to send data.\n",
"The following cells show how to set it, along with some timing results. Keep in mind that the times reported by Python do not directly measure compression performance, because data transmission is asynchronous. The numbers are still informative.\n",
"The `.compression` field accepts either a string naming the compression method or a compressor instance, which also lets you specify the compression level."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "688a1886",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d23699478a5e48f98027c11ce7792fb7",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VegaWidget()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 6 µs, sys: 7 µs, total: 13 µs\n",
"Wall time: 15.3 µs\n"
]
}
],
"source": [
"spec_no_data = {\n",
" \"$schema\": \"https://vega.github.io/schema/vega-lite/v5.json\",\n",
" \"data\": {\"name\": \"data\"},\n",
" \"mark\": \"bar\",\n",
" \"encoding\": {\n",
" \"x\": {\"aggregate\": \"sum\", \"field\": \"yield\"},\n",
" \"y\": {\"field\": \"variety\"},\n",
" \"color\": {\"field\": \"site\"}\n",
" }\n",
"}\n",
"from vega.widget import VegaWidget\n",
"import requests\n",
"import json\n",
"req = requests.get(\"https://cdn.jsdelivr.net/npm/[email protected]/data/barley.json\")\n",
"values = json.loads(req.text)\n",
"#data\n",
"widget = VegaWidget(spec=spec_no_data)\n",
"display(widget)\n",
"%time widget.update('data', insert=values)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7d0f4bf0",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5864bd22268b4b03bfecb71af6206a23",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VegaWidget()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 7.41 ms, sys: 0 ns, total: 7.41 ms\n",
"Wall time: 7.1 ms\n"
]
}
],
"source": [
"import pandas as pd\n",
"URL = \"https://forge.scilab.org/index.php/p/rdataset/source/file/368b19abcb4292c56e4f21079f750eb76b325907/csv/lattice/barley.csv\"\n",
"df = pd.read_csv(URL)\n",
"widget = VegaWidget(spec=spec_no_data)\n",
"display(widget)\n",
"%time widget.update(\"data\", insert=df)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a13d013c",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "112e7a4059d64fa08cab2f0583de3074",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VegaWidget()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 9.44 ms, sys: 0 ns, total: 9.44 ms\n",
"Wall time: 8.62 ms\n"
]
}
],
"source": [
"import pandas as pd\n",
"URL = \"https://forge.scilab.org/index.php/p/rdataset/source/file/368b19abcb4292c56e4f21079f750eb76b325907/csv/lattice/barley.csv\"\n",
"df = pd.read_csv(URL)\n",
"widget = VegaWidget(spec=spec_no_data)\n",
"widget.compression = 'zlib'\n",
"display(widget)\n",
"%time widget.update('data', insert=df)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6792044d",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ab67b74c7db74b1896e859031a0deb72",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VegaWidget()"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 9.84 ms, sys: 1.57 ms, total: 11.4 ms\n",
"Wall time: 10.7 ms\n"
]
}
],
"source": [
"import pandas as pd\n",
"from ipytablewidgets import LZ4Compressor\n",
"URL = \"https://forge.scilab.org/index.php/p/rdataset/source/file/368b19abcb4292c56e4f21079f750eb76b325907/csv/lattice/barley.csv\"\n",
"df = pd.read_csv(URL)\n",
"widget = VegaWidget(spec=spec_no_data)\n",
"widget.compression = LZ4Compressor(2)\n",
"display(widget)\n",
"%time widget.update('data', insert=df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11890337",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
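The notebook above times `widget.update()` with `%time`, which, as its first cell notes, does not isolate compression because the send is asynchronous. The following is a minimal, stdlib-only sketch of timing the compression step on its own. It assumes a hypothetical wire layout in which a numeric column is packed as contiguous little-endian float64 bytes; it is not the ipytablewidgets implementation, and the helper names `pack_float_column`, `unpack_float_column` and `compare_zlib_levels` are illustrative. Only zlib is shown because lz4 is not part of the Python standard library.

```python
import random
import struct
import time
import zlib


def pack_float_column(values):
    """Serialize a numeric column as little-endian float64 bytes (hypothetical layout)."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_float_column(payload):
    """Inverse of pack_float_column: decode little-endian float64 bytes."""
    count = len(payload) // 8
    return list(struct.unpack(f"<{count}d", payload))


def compare_zlib_levels(values, levels=(1, 6, 9)):
    """Return {level: (compressed_size_in_bytes, seconds)} for one packed column.

    Only the local compression call is timed, unlike %time around
    widget.update(), which also includes the asynchronous send.
    """
    raw = pack_float_column(values)
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(raw, level)
        elapsed = time.perf_counter() - start
        # Sanity check: decompression must restore the exact original bytes.
        if zlib.decompress(compressed) != raw:
            raise RuntimeError(f"round trip failed at level {level}")
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    random.seed(0)
    # Values rounded to one decimal repeat often, which helps compression.
    column = [round(random.uniform(10, 70), 1) for _ in range(100_000)]
    raw_size = len(pack_float_column(column))
    for level, (size, seconds) in compare_zlib_levels(column).items():
        print(f"level {level}: {raw_size} -> {size} bytes in {seconds * 1000:.2f} ms")
```

Higher zlib levels usually trade extra CPU time for smaller payloads, so running a comparison like this on your own columns is a quick way to decide between a plain string such as `'zlib'` and a compressor instance with an explicit level.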
109 changes: 57 additions & 52 deletions notebooks/Altair.ipynb

Large diffs are not rendered by default.
