Add Pandas 2.0 support #5662

Merged: 37 commits, Apr 21, 2023
Commits (37)
All 37 commits authored by hoxbro:

9d0a637  Cast to datetime64[ns] (Mar 14, 2023)
28315f5  Update test (Mar 14, 2023)
ea13012  Add to github actions (Mar 14, 2023)
1b94895  Add astype (Mar 14, 2023)
ddc9c77  Add comment (Mar 14, 2023)
ad5a888  Ignore not supported Dask test yet (Mar 14, 2023)
1cd2692  Update ci (Mar 14, 2023)
5ba1915  Test (Mar 14, 2023)
4477f31  Update examples with keyword argument (Mar 14, 2023)
0f2b1d7  Fix connect_tri_edges_pd (Mar 14, 2023)
22fa6f8  Add rest of keywords arguments (Mar 14, 2023)
c7f3db4  Merge branch 'main' into pandas_2 (Apr 4, 2023)
189a115  Always convert to datetime64[ns] (Apr 4, 2023)
78a9001  Update test.yaml (Apr 4, 2023)
5498e5f  Force dt_to_int to return int (Apr 4, 2023)
30b0788  Only copy if xtype/ytype is datetime (Apr 4, 2023)
c35c112  Add spatialpandas to bokeh2 only test (Apr 4, 2023)
f3c800d  Add fix for np.intc pandas (Apr 6, 2023)
a819fa4  Workaround xarray not supporting pandas 2.0 yet (Apr 6, 2023)
55ec5dd  Remove rc for unittest (Apr 8, 2023)
131b90b  Add os to test.yaml (Apr 8, 2023)
3f94aaa  Add rest of os (Apr 8, 2023)
287680e  Don't run examples on pandas=2.0 because of xarray incompatibility (Apr 8, 2023)
a896898  Add ignore_glob to file (Apr 8, 2023)
f17f33e  Remove version 3.11 safe guard for examples (Apr 8, 2023)
3ae3385  Add more notebooks to ignore (Apr 8, 2023)
77ed9af  Better path in conftest (Apr 8, 2023)
3d5bfe6  Ignore xarray examples for pandas 2 (Apr 8, 2023)
c455113  Only ignore pandas bug on windows (Apr 8, 2023)
801dd8f  Try again with pandas two as default (Apr 19, 2023)
e7fb207  Remove pandas pin (Apr 19, 2023)
10dbd1c  Merge branch 'main' into pandas_2 (Apr 19, 2023)
c0c0c22  Remove Bokeh dev channel (Apr 19, 2023)
165eae8  Remove pandas 1 from CI (Apr 19, 2023)
af9e597  Add back xarray examples to CI (Apr 19, 2023)
b3012d3  Add 15-Large_Data to Pandas2-windows ignore (Apr 19, 2023)
d2ec764  Clean up (Apr 20, 2023)
13 changes: 2 additions & 11 deletions .github/workflows/test.yaml
@@ -45,7 +45,7 @@ jobs:
# Bokeh 3 does not support Python 3.7
- bokeh-version: '3'
python-version: '3.7'
- timeout-minutes: 120 # Because slow conda solve on Python 3.7
+ timeout-minutes: 120
defaults:
run:
shell: bash -el {0}
@@ -63,7 +63,7 @@ jobs:
name: unit_test_suite_bokeh${{ matrix.bokeh-version }}
python-version: ${{ matrix.python-version }}
channel-priority: strict
- channels: pyviz/label/dev,bokeh/label/dev,conda-forge,nodefaults
+ channels: pyviz/label/dev,conda-forge,nodefaults
envs: "-o flakes -o tests -o examples_tests -o bokeh${{ matrix.bokeh-version }}"
cache: true
conda-update: true
@@ -78,20 +78,11 @@ jobs:
conda activate test-environment
doit test_unit
- name: test examples
- if: matrix.python-version != '3.11'
run: |
conda activate test-environment
mkdir -p ~/.jupyter/
echo "c.ExecutePreprocessor.startup_timeout=600" >> ~/.jupyter/jupyter_nbconvert_config.py
doit test_examples
- - name: test examples - python 3.11
- # Should be removed when numba support python 3.11
- if: matrix.python-version == '3.11'
- run: |
- conda activate test-environment
- mkdir -p ~/.jupyter/
- echo "c.ExecutePreprocessor.startup_timeout=600" >> ~/.jupyter/jupyter_nbconvert_config.py
- pytest -n auto --dist loadscope --nbval-lax examples/reference/elements
- name: codecov
run: |
conda activate test-environment
32 changes: 32 additions & 0 deletions examples/conftest.py
@@ -0,0 +1,32 @@
import sys

import pandas as pd
from packaging.version import Version

PD2 = Version(pd.__version__) >= Version("2.0")

collect_ignore_glob = [
# Needs selenium, phantomjs, firefox, and geckodriver to save a png picture
"user_guide/Plotting_with_Bokeh.ipynb",
# Possible timeout error
"user_guide/17-Dashboards.ipynb",
# Give file not found
"user_guide/Plots_and_Renderers.ipynb",
]

# Numba incompatibility
if sys.version_info >= (3, 11):
collect_ignore_glob += [
"user_guide/15-Large_Data.ipynb",
"user_guide/16-Streaming_Data.ipynb",
"user_guide/Linked_Brushing.ipynb",
"user_guide/Network_Graphs.ipynb",
]

# Pandas bug: https://github.com/pandas-dev/pandas/issues/52451
if PD2 and sys.platform == "win32":
collect_ignore_glob += [
"gallery/demos/bokeh/point_draw_triangulate.ipynb",
"reference/elements/*/TriMesh.ipynb",
"user_guide/15-Large_Data.ipynb",
]
2 changes: 1 addition & 1 deletion examples/gallery/demos/bokeh/bars_economic.ipynb
@@ -34,7 +34,7 @@
"metadata": {},
"outputs": [],
"source": [
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', '\\t')\n",
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', delimiter='\\t')\n",
"key_dimensions = [('year', 'Year'), ('country', 'Country')]\n",
"value_dimensions = [('unem', 'Unemployment'), ('capmob', 'Capital Mobility'),\n",
" ('gdp', 'GDP Growth'), ('trade', 'Trade')]\n",
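The notebook change above (repeated across the economic demos below) reflects a pandas API change: every `read_csv` argument after the filepath became keyword-only, so a positionally passed separator, deprecated during the 1.x series, raises a `TypeError` on pandas 2.0. A minimal sketch of the before/after, using inline TSV data instead of the remote macro.csv file:

```python
import io
import pandas as pd

tsv = "year\tcountry\tunem\n1966\tBelgium\t2.2\n"

# Pre-2.0 code could pass the separator positionally:
#     pd.read_csv(buf, '\t')
# Pandas 2.0 requires a keyword (delimiter is an alias of sep):
macro_df = pd.read_csv(io.StringIO(tsv), delimiter="\t")
```

The same keyword-only rule applies to most other pandas I/O entry points, which is why the PR touches several notebooks with an identical one-line fix.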
2 changes: 1 addition & 1 deletion examples/gallery/demos/bokeh/dropdown_economic.ipynb
@@ -34,7 +34,7 @@
"metadata": {},
"outputs": [],
"source": [
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', '\\t')\n",
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', delimiter='\\t')\n",
"key_dimensions = [('year', 'Year'), ('country', 'Country')]\n",
"value_dimensions = [('unem', 'Unemployment'), ('capmob', 'Capital Mobility'),\n",
" ('gdp', 'GDP Growth'), ('trade', 'Trade')]\n",
2 changes: 1 addition & 1 deletion examples/gallery/demos/bokeh/scatter_economic.ipynb
@@ -34,7 +34,7 @@
"metadata": {},
"outputs": [],
"source": [
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', '\\t')\n",
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', delimiter='\\t')\n",
"key_dimensions = [('year', 'Year'), ('country', 'Country')]\n",
"value_dimensions = [('unem', 'Unemployment'), ('capmob', 'Capital Mobility'),\n",
" ('gdp', 'GDP Growth'), ('trade', 'Trade')]\n",
2 changes: 1 addition & 1 deletion examples/gallery/demos/bokeh/us_unemployment.ipynb
@@ -39,7 +39,7 @@
"source": [
"from bokeh.sampledata.unemployment1948 import data\n",
"\n",
"data = pd.melt(data.drop('Annual', 1), id_vars='Year', var_name='Month', value_name='Unemployment')\n",
"data = pd.melt(data.drop('Annual', axis=1), id_vars='Year', var_name='Month', value_name='Unemployment')\n",
"heatmap = hv.HeatMap(data, label=\"US Unemployment (1948 - 2013)\")"
]
},
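Similarly, `DataFrame.drop` no longer accepts the axis positionally: `data.drop('Annual', 1)` warned on pandas 1.x and raises a `TypeError` on 2.0, hence the `axis=1` keyword in the unemployment notebooks. A small sketch with made-up stand-in data:

```python
import pandas as pd

# Hypothetical stand-in for the bokeh unemployment sample data:
data = pd.DataFrame({"Year": [1948, 1949], "Annual": [3.8, 5.9], "Jan": [3.4, 4.3]})

# data.drop('Annual', 1) no longer works; the axis must be a keyword:
data = data.drop("Annual", axis=1)
```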
2 changes: 1 addition & 1 deletion examples/gallery/demos/matplotlib/bars_economic.ipynb
@@ -35,7 +35,7 @@
"metadata": {},
"outputs": [],
"source": [
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', '\\t')\n",
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', delimiter='\\t')\n",
"key_dimensions = [('year', 'Year'), ('country', 'Country')]\n",
"value_dimensions = [('unem', 'Unemployment'), ('capmob', 'Capital Mobility'),\n",
" ('gdp', 'GDP Growth'), ('trade', 'Trade')]\n",
2 changes: 1 addition & 1 deletion examples/gallery/demos/matplotlib/dropdown_economic.ipynb
@@ -35,7 +35,7 @@
"metadata": {},
"outputs": [],
"source": [
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', '\\t')\n",
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', delimiter='\\t')\n",
"key_dimensions = [('year', 'Year'), ('country', 'Country')]\n",
"value_dimensions = [('unem', 'Unemployment'), ('capmob', 'Capital Mobility'),\n",
" ('gdp', 'GDP Growth'), ('trade', 'Trade')]\n",
2 changes: 1 addition & 1 deletion examples/gallery/demos/matplotlib/scatter_economic.ipynb
@@ -36,7 +36,7 @@
"metadata": {},
"outputs": [],
"source": [
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', '\\t')\n",
"macro_df = pd.read_csv('http://assets.holoviews.org/macro.csv', delimiter='\\t')\n",
"key_dimensions = [('year', 'Year'), ('country', 'Country')]\n",
"value_dimensions = [('unem', 'Unemployment'), ('capmob', 'Capital Mobility'),\n",
" ('gdp', 'GDP Growth'), ('trade', 'Trade')]\n",
2 changes: 1 addition & 1 deletion examples/gallery/demos/matplotlib/us_unemployment.ipynb
@@ -39,7 +39,7 @@
"\n",
"colors = [\"#75968f\", \"#a5bab7\", \"#c9d9d3\", \"#e2e2e2\", \"#dfccce\", \"#ddb7b1\", \"#cc7878\", \"#933b41\", \"#550b1d\"]\n",
"\n",
"data = pd.melt(data.drop('Annual', 1), id_vars='Year', var_name='Month', value_name='Unemployment')\n",
"data = pd.melt(data.drop('Annual', axis=1), id_vars='Year', var_name='Month', value_name='Unemployment')\n",
"\n",
"heatmap = hv.HeatMap(data, label=\"US Unemployment (1948 - 2013)\")"
]
2 changes: 1 addition & 1 deletion holoviews/core/util.py
@@ -2108,7 +2108,7 @@ def dt_to_int(value, time_unit='us'):
try:
value = np.datetime64(value, 'ns')
tscale = (np.timedelta64(1, time_unit)/np.timedelta64(1, 'ns'))
- return value.tolist()/tscale
+ return int(value.tolist() / tscale)
except Exception:
# If it can't handle ns precision fall back to datetime
value = value.tolist()
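The `dt_to_int` change ("Force dt_to_int to return int") is needed because a `datetime64[ns]` scalar's `.tolist()` yields an integer nanosecond count, and dividing that by the float scale factor produces a `float`; the explicit `int(...)` restores the integer return downstream code expects. A simplified sketch of the patched path (the real function carries additional fallbacks for values that cannot be represented at ns precision):

```python
import numpy as np

def dt_to_int(value, time_unit="us"):
    # datetime64[ns] scalars .tolist() to an int nanosecond count;
    # the division produces a float, so cast back to int explicitly.
    value = np.datetime64(value, "ns")
    tscale = np.timedelta64(1, time_unit) / np.timedelta64(1, "ns")
    return int(value.tolist() / tscale)

# One second past the epoch is 1_000_000 microseconds:
result = dt_to_int(np.datetime64("1970-01-01T00:00:01"))
```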
2 changes: 1 addition & 1 deletion holoviews/element/util.py
@@ -294,7 +294,7 @@ def connect_tri_edges_pd(trimesh):
edges = edges.drop("color", errors="ignore", axis=1).reset_index()
nodes = trimesh.nodes.dframe().copy()
nodes.index.name = 'node_index'
- nodes = nodes.drop("color", errors="ignore", axis=1)
+ nodes = nodes.drop(["color", "z"], errors="ignore", axis=1)
v1, v2, v3 = trimesh.kdims
x, y, idx = trimesh.nodes.kdims[:3]

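The `connect_tri_edges_pd` fix folds the extra `z` column into the same guarded `drop` call; with `errors='ignore'`, pandas silently skips any listed label that is absent, so one call covers node frames with or without those columns. A sketch with hypothetical node data:

```python
import pandas as pd

# Hypothetical TriMesh node frame; note there is no 'z' column here:
nodes = pd.DataFrame({"x": [0.0, 1.0], "y": [0.0, 1.0], "color": ["red", "blue"]})

# errors='ignore' makes missing labels a no-op, so the same call works
# whether or not 'z' (or 'color') is present:
nodes = nodes.drop(["color", "z"], errors="ignore", axis=1)
```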
44 changes: 23 additions & 21 deletions holoviews/operation/datashader.py
@@ -222,11 +222,11 @@ def _get_sampling(self, element, x, y, ndim=2, default=None):
def _dt_transform(self, x_range, y_range, xs, ys, xtype, ytype):
(xstart, xend), (ystart, yend) = x_range, y_range
if xtype == 'datetime':
- xstart, xend = (np.array([xstart, xend])/1e3).astype('datetime64[us]')
- xs = (xs/1e3).astype('datetime64[us]')
+ xstart, xend = np.array([xstart, xend]).astype('datetime64[ns]')
+ xs = xs.astype('datetime64[ns]')
if ytype == 'datetime':
- ystart, yend = (np.array([ystart, yend])/1e3).astype('datetime64[us]')
- ys = (ys/1e3).astype('datetime64[us]')
+ ystart, yend = np.array([ystart, yend]).astype('datetime64[ns]')
+ ys = ys.astype('datetime64[ns]')
return ((xstart, xend), (ystart, yend)), (xs, ys)


@@ -528,9 +528,9 @@ def _process(self, element, key=None):
if 'x_axis' in agg.coords and 'y_axis' in agg.coords:
agg = agg.rename({'x_axis': x, 'y_axis': y})
if xtype == 'datetime':
- agg[x.name] = (agg[x.name]/1e3).astype('datetime64[us]')
+ agg[x.name] = agg[x.name].astype('datetime64[ns]')
if ytype == 'datetime':
- agg[y.name] = (agg[y.name]/1e3).astype('datetime64[us]')
+ agg[y.name] = agg[y.name].astype('datetime64[ns]')

if agg.ndim == 2:
# Replacing x and y coordinates to avoid numerical precision issues
@@ -700,7 +700,7 @@ def _process(self, element, key=None):

agg = cvs.area(df, x.name, y.name, agg_fn, axis=0, y_stack=ystack)
if xtype == "datetime":
- agg[x.name] = (agg[x.name]/1e3).astype('datetime64[us]')
+ agg[x.name] = agg[x.name].astype('datetime64[ns]')

return self.p.element_type(agg, **params)

@@ -781,7 +781,7 @@ def _process(self, element, key=None):
df['y0'] = np.array(0, df.dtypes[y.name])
yagg = ['y0', y.name]
if xtype == 'datetime':
- df[x.name] = cast_array_to_int64(df[x.name].astype('datetime64[us]'))
+ df[x.name] = cast_array_to_int64(df[x.name].astype('datetime64[ns]'))

params = self._get_agg_params(element, x, y, agg_fn, (x0, y0, x1, y1))

@@ -797,7 +797,7 @@

agg = cvs.line(df, x.name, yagg, agg_fn, axis=1, **agg_kwargs).rename(rename_dict)
if xtype == "datetime":
- agg[x.name] = (agg[x.name]/1e3).astype('datetime64[us]')
+ agg[x.name] = agg[x.name].astype('datetime64[ns]')

return self.p.element_type(agg, **params)

@@ -821,12 +821,14 @@ def _process(self, element, key=None):
((x0, x1), (y0, y1)), (xs, ys) = self._dt_transform(x_range, y_range, xs, ys, xtype, ytype)

df = element.interface.as_dframe(element)
+ if xtype == 'datetime' or ytype == 'datetime':
+ df = df.copy()
if xtype == 'datetime':
- df[x0d.name] = cast_array_to_int64(df[x0d.name].astype('datetime64[us]'))
- df[x1d.name] = cast_array_to_int64(df[x1d.name].astype('datetime64[us]'))
+ df[x0d.name] = cast_array_to_int64(df[x0d.name].astype('datetime64[ns]'))
+ df[x1d.name] = cast_array_to_int64(df[x1d.name].astype('datetime64[ns]'))
if ytype == 'datetime':
- df[y0d.name] = cast_array_to_int64(df[y0d.name].astype('datetime64[us]'))
- df[y1d.name] = cast_array_to_int64(df[y1d.name].astype('datetime64[us]'))
+ df[y0d.name] = cast_array_to_int64(df[y0d.name].astype('datetime64[ns]'))
+ df[y1d.name] = cast_array_to_int64(df[y1d.name].astype('datetime64[ns]'))

if isinstance(agg_fn, ds.count_cat) and df[agg_fn.column].dtype.name != 'category':
df[agg_fn.column] = df[agg_fn.column].astype('category')
@@ -843,9 +845,9 @@

xdim, ydim = list(agg.dims)[:2][::-1]
if xtype == "datetime":
- agg[xdim] = (agg[xdim]/1e3).astype('datetime64[us]')
+ agg[xdim] = agg[xdim].astype('datetime64[ns]')
if ytype == "datetime":
- agg[ydim] = (agg[ydim]/1e3).astype('datetime64[us]')
+ agg[ydim] = agg[ydim].astype('datetime64[ns]')

params['kdims'] = [xdim, ydim]

@@ -1017,9 +1019,9 @@ def _process(self, element, key=None):

# Convert datetime coordinates
if xtype == "datetime":
- rarray[x.name] = (rarray[x.name]/1e3).astype('datetime64[us]')
+ rarray[x.name] = rarray[x.name].astype('datetime64[ns]')
if ytype == "datetime":
- rarray[y.name] = (rarray[y.name]/1e3).astype('datetime64[us]')
+ rarray[y.name] = rarray[y.name].astype('datetime64[ns]')
regridded[vd] = rarray
regridded = xr.Dataset(regridded)

@@ -1197,9 +1199,9 @@ def _process(self, element, key=None):
info = self._get_sampling(element, x, y)
(x_range, y_range), (xs, ys), (width, height), (xtype, ytype) = info
if xtype == 'datetime':
- data[x.name] = data[x.name].astype('datetime64[us]').astype('int64')
+ data[x.name] = data[x.name].astype('datetime64[ns]').astype('int64')
if ytype == 'datetime':
- data[y.name] = data[y.name].astype('datetime64[us]').astype('int64')
+ data[y.name] = data[y.name].astype('datetime64[ns]').astype('int64')

# Compute bounds (converting datetimes)
((x0, x1), (y0, y1)), (xs, ys) = self._dt_transform(
Expand All @@ -1218,9 +1220,9 @@ def _process(self, element, key=None):
agg = cvs.quadmesh(data[vdim], x.name, y.name, agg_fn)
xdim, ydim = list(agg.dims)[:2][::-1]
if xtype == "datetime":
- agg[xdim] = (agg[xdim]/1e3).astype('datetime64[us]')
+ agg[xdim] = agg[xdim].astype('datetime64[ns]')
if ytype == "datetime":
- agg[ydim] = (agg[ydim]/1e3).astype('datetime64[us]')
+ agg[ydim] = agg[ydim].astype('datetime64[ns]')

return Image(agg, **params)

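The datashader changes above all follow one pattern, matching the "Always convert to datetime64[ns]" commit: the integer axis coordinates that come back from aggregation are nanoseconds since the epoch, so the old code divided by 1e3 to express them as `datetime64[us]`, while the new code interprets them directly as `datetime64[ns]`, pandas' default datetime resolution. Both spellings describe the same instants, as this small sketch with a hypothetical coordinate array illustrates:

```python
import numpy as np

# Hypothetical integer coordinates (nanoseconds since the epoch):
raw = np.array([0, 1_000_000_000], dtype="int64")

old = (raw / 1e3).astype("datetime64[us]")   # previous convention
new = raw.astype("datetime64[ns]")           # pandas-2-era convention

# Upcast the old values to ns to compare like with like:
same = bool((old.astype("datetime64[ns]") == new).all())
```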
17 changes: 17 additions & 0 deletions holoviews/tests/core/data/test_daskinterface.py
@@ -1,7 +1,9 @@
from unittest import SkipTest
+ import unittest

import numpy as np
+ import pandas as pd
+ from packaging.version import Version

try:
import dask.dataframe as dd
@@ -10,6 +12,7 @@

from holoviews.core.data import Dataset
from holoviews.util.transform import dim
+ from holoviews.core.util import pandas_version

from .test_pandasinterface import BasePandasInterfaceTests

@@ -73,6 +76,20 @@ def test_dataset_2D_partial_reduce_ht(self):
def test_dataset_aggregate_string_types(self):
raise SkipTest("Temporarily skipped")

+ @unittest.skipIf(
+ pandas_version >= Version("2.0"),
+ reason="Not supported yet, https://github.com/dask/dask/issues/9913"
+ )
+ def test_dataset_aggregate_ht(self):
+ super().test_dataset_aggregate_ht()
+
+ @unittest.skipIf(
+ pandas_version >= Version("2.0"),
+ reason="Not supported yet, https://github.com/dask/dask/issues/9913"
+ )
+ def test_dataset_aggregate_ht_alias(self):
+ super().test_dataset_aggregate_ht_alias()

def test_dataset_from_multi_index(self):
raise SkipTest("Temporarily skipped")
df = pd.DataFrame({'x': np.arange(10), 'y': np.arange(10), 'z': np.random.rand(10)})
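The new Dask interface skips use `unittest.skipIf` with a runtime version check rather than deleting the tests, so they resume automatically once dask gains pandas-2 support. A self-contained sketch of the pattern, with a hard-coded stand-in for `holoviews.core.util.pandas_version` and a hypothetical test body:

```python
import unittest
from packaging.version import Version

pandas_version = Version("2.0.1")  # stand-in for holoviews.core.util.pandas_version

class TestAggregate(unittest.TestCase):
    @unittest.skipIf(
        pandas_version >= Version("2.0"),
        reason="Not supported yet, https://github.com/dask/dask/issues/9913",
    )
    def test_dataset_aggregate_ht(self):
        pass  # would exercise the dask aggregation on pandas < 2.0

# Running the case shows the test is skipped, not failed:
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestAggregate).run(result)
```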
4 changes: 1 addition & 3 deletions holoviews/tests/plotting/bokeh/test_callbacks.py
@@ -406,10 +406,8 @@ def test_cds_resolves(self):
self.assertEqual(resolved, {'id': cds.ref['id'],
'value': points.columns()})

- @pytest.mark.filterwarnings("ignore::FutureWarning")
def test_rangexy_datetime(self):
- # Raises a warning because makeTimeDataFrame isn't part of the public API.
- curve = Curve(pd.util.testing.makeTimeDataFrame(), 'index', 'C')
+ curve = Curve(pd._testing.makeTimeDataFrame(), 'index', 'C')
stream = RangeXY(source=curve)
plot = bokeh_server_renderer.get_plot(curve)
callback = plot.callbacks[0]
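Note that `pd._testing` is still private API (the deleted comment already flagged `makeTimeDataFrame` as non-public, and the helper has since been removed from recent pandas releases), so an equivalent datetime-indexed frame for a test like this can be built directly; a sketch with hypothetical shape and column names:

```python
import numpy as np
import pandas as pd

# A 30-row frame of random floats on a business-day DatetimeIndex,
# mirroring what makeTimeDataFrame used to produce:
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-03", periods=30, freq="B")
frame = pd.DataFrame(rng.standard_normal((30, 4)), index=index,
                     columns=list("ABCD"))
```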
1 change: 1 addition & 0 deletions setup.cfg
@@ -6,3 +6,4 @@ namespace_map =
ibis-framework=ibis-sqlite
; dask pins to bokeh<3 right now
dask=dask-core
+ geoviews=geoviews-core
5 changes: 1 addition & 4 deletions tox.ini
@@ -28,10 +28,7 @@ commands = pytest holoviews --cov=./holoviews -vv
[_examples]
description = Test that default examples run
deps = .[examples_tests]
- commands = pytest -n auto --dist loadscope --nbval-lax examples --force-flaky --max-runs=5 -k "not Plotting_with_Bokeh and not 17-Dashboards and not Plots_and_Renderers"
- # 'Plotting_with_Bokeh' needs selenium, phantomjs, firefox and geckodriver to save a png picture.
- # '17-Dashboards' can give a timeout error.
- # 'Plots_and_Renderers' can give file not found here.
+ commands = pytest -n auto --dist loadscope --nbval-lax examples --force-flaky --max-runs=5

[_all_recommended]
description = Run all recommended tests