Poor performance of repr of large arrays, particularly jupyter repr #4789
Comments
I uncovered this issue with Dask's SVG in its |
One quick observation is that it's related to the MultiIndex — if we swap out the index for |
The rabbit hole went deeper than I expected. I need to sign off now, but I'm leaving what I have in case someone else has some insight. Essentially, we call … I think we can probably do something smarter to only call this on the first & last items in the MultiIndex. For reference, here's the output of line_profiler, a good profiler for figuring this sort of thing out:
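For anyone without line_profiler installed, a coarser, function-level view of the same kind of hotspot can be had from the standard library's cProfile. This is a swapped-in technique, not the profile from this issue: the functions `expensive_level_values` and `summarize` are hypothetical stand-ins for "build every value of a level, then display only a few", and the numbers it prints say nothing about xarray itself.

```python
import cProfile
import io
import pstats


def expensive_level_values(n):
    # Hypothetical stand-in: materialize every value of an index level.
    return [str(i) for i in range(n)]


def summarize(n):
    # Only the first and last three values are shown, yet all n are built.
    values = expensive_level_values(n)
    return values[:3] + ["..."] + values[-3:]


def profile_summary(n=200_000):
    # Run summarize under cProfile and return its result plus a text report.
    profiler = cProfile.Profile()
    result = profiler.runcall(summarize, n)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_summary()
    print(result)
    print(report)
```

In the report, the cumulative time attributed to `expensive_level_values` should dominate, which is the pattern to look for when a repr does work proportional to the full array.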
that seems to be the main issue. With

diff --git a/xarray/core/formatting.py b/xarray/core/formatting.py
index 282620e3..f825ed85 100644
--- a/xarray/core/formatting.py
+++ b/xarray/core/formatting.py
@@ -300,9 +300,11 @@ def _summarize_coord_multiindex(coord, col_width, marker):
 def _summarize_coord_levels(coord, col_width, marker="-"):
+    indices = list(range(10)) + list(range(-10, 0))
+    subset = coord[indices]
     return "\n".join(
         summarize_variable(
-            lname, coord.get_level_variable(lname), col_width, marker=marker
+            lname, subset.get_level_variable(lname), col_width, marker=marker
         )
         for lname in coord.level_names
     )

I get a speed up of about 180x (for |
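One detail a head/tail subset like this probably needs to handle is short coordinates: with fewer than 20 entries, `range(10)` and `range(-10, 0)` overlap, and with fewer than 10 the positions run past the end. Here is a minimal sketch of a position helper that avoids both cases. The name `head_tail_positions` and its `n` parameter are hypothetical, not part of xarray.

```python
def head_tail_positions(length, n=10):
    """Return positions of the first and last n items, without duplicates.

    Hypothetical helper: for short inputs (length <= 2 * n) every position
    is returned exactly once, so nothing is repeated or out of range.
    """
    if length <= 2 * n:
        return list(range(length))
    return list(range(n)) + list(range(length - n, length))


if __name__ == "__main__":
    print(head_tail_positions(5))
    print(head_tail_positions(100, n=3))
```

Assuming the coordinate supports `len()` and positional indexing with a list, the result could stand in for `indices` in the diff, e.g. `coord[head_tail_positions(len(coord))]`.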
Yes great, I think that would be a great cut-through solution! |
What happened:
The _repr_html_ method of large arrays seems very slow (4.78s in the case of a 100m-value array), and the general repr seems fairly slow (1.87s). Here's a quick example. I haven't yet investigated how dependent it is on there being a MultiIndex.
What you expected to happen:
We should really focus on having good repr performance, given how essential it is to any REPL workflow.
Minimal Complete Verifiable Example:
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.7 (default, Dec 30 2020, 10:13:08)
[Clang 12.0.0 (clang-1200.0.32.28)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.16.3.dev48+gbf0fe2ca
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: None
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: installed
pint: 0.16.1
setuptools: 51.1.1
pip: 20.3.3
conda: None
pytest: 6.1.1
IPython: 7.19.0
sphinx: None