WIP: html repr #1820

benbovy · 2018-01-11T16:33:07Z

Closes html repr of xarray object (for the notebook) #1627
Tests added
Tests passed
Passes git diff upstream/master **/*py | flake8 --diff
Fully documented, including whats-new.rst for all changes and api.rst for new API

This is work in progress, although the basic functionality is there. You can see a preview here:
http://nbviewer.jupyter.org/gist/benbovy/3009f342fb283bd0288125a1f7883ef2

TODO:

Add support for Multi-indexes
Probably good to have some opt-in or fallback system for cases where we (or users) know that the rendering will not work
Add some tests

Nice to have (keep this for later):

Clean up the CSS code and HTML template (track CSS subgrid support in browsers; this may simplify things a lot here).
Dynamically adapt cell widths (given the length of the names of variables and dimensions). Currently all cells have a fixed width. This is tricky, though, as we don't use a monospace font here.
Integration with jupyterlab/notebook themes (CSS classes) and maybe allow custom CSS.
Integration of Dask arrays HTML repr (+ integration of repr for other array backends).
Maybe find a way (if possible) to include CSS only once in the notebook (currently it is included each time an xarray object is displayed in an output cell, which is not very nice).
Review the rules for collapsing the Coordinates, Data variables and Attributes sections (maybe expose them as global options).
Maybe also define some rules to collapse automatically the data section (DataArray and Variable) when the data repr is too long.
Maybe add rich representation for Dataset.coords and Dataset.data_vars as well?

Other thoughts (old)

A big challenge here is to provide both robust and flexible styling (CSS):

I have tested the current styling in jupyterlab (0.30.6, light theme), notebook (5.2.2) and nbviewer: despite some slight differences it looks quite good!
However, the current CSS code is a bit fragile (I had to add a lot of !important). It could probably be cleaned up and optimized a bit (unfortunately my CSS skills are limited).
Also, it looks ugly with jupyterlab's dark theme. We probably need to use jupyterlab CSS variables so that our CSS scheme is compatible with the theme machinery, but at the same time we need to support other front-ends. So we probably need to maintain different stylings (i.e., multiple CSS files, one of them picked depending on the front-end), though I don't know whether it's easy to detect the front-end automatically (choosing a default style is difficult too).
The notebook rendering on Github seems to disable style tags (no style is applied to the output, see https://gist.github.com/benbovy/3009f342fb283bd0288125a1f7883ef2). Output is not readable at all in this case, so it might be useful to allow turning off rich output as an option.
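
A minimal sketch of what an opt-in switch for the rich output could look like. The `OPTIONS` dict, the `rich_repr` key and the `Labeled` class are hypothetical names used only for illustration; they are not part of this PR. It relies on the IPython behaviour, as far as I know, that a `_repr_html_` returning None makes the front-end fall back to the plain-text repr.

```python
import html

# Hypothetical module-level switch; the name is an assumption for illustration.
OPTIONS = {"rich_repr": False}


class Labeled:
    """Toy stand-in for an xarray-like object (not one of the real classes)."""

    def __init__(self, name, dims):
        self.name = name
        self.dims = dims

    def __repr__(self):
        return f"<Labeled {self.name!r} dims={self.dims}>"

    def _repr_html_(self):
        # Returning None signals "no HTML available", so the plain repr is used.
        if not OPTIONS["rich_repr"]:
            return None
        dims = ", ".join(html.escape(d) for d in self.dims)
        return f"<div><b>{html.escape(self.name)}</b> ({dims})</div>"
```

With the switch off, front-ends that cannot render the styling (such as the Github viewer mentioned above) would simply show the text repr.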

benbovy · 2018-01-12T18:07:25Z

Sizing the variable name and dimension columns according to their content is tricky because (1) we want these columns to be aligned between different sections (Coordinates and Data variables) and also aligned with the list of dimension labels in the Dimensions section, (2) there are subsections for variable attributes and data repr, which both take 100% width, and (3) we want the sections and subsections to be collapsible/expandable but we are limited to a pure HTML/CSS solution.

While using <table> is usually appropriate for rendering that kind of content, for the reasons above I don't think it is possible here, unfortunately. Unless someone has a better idea, I don't see any other option than calculating a fixed size for the columns before rendering.

But I don't see any robust way to calculate these sizes either. One option could be to use tkinter, e.g.,

>>> import tkinter as tk
>>> from tkinter import font
>>> root = tk.Tk()
>>> front_end_font = font.Font(family='Helvetica', size=11, weight='bold')
>>> front_end_font.measure("variable_name")
76
>>> root.destroy()

That's not very elegant to say the least, but it has the advantage of being part of the Python standard library. The problem is that we don't know the font-family and font-size. We could define it explicitly in the CSS code but it's better to inherit it from the notebook front-ends (in some cases it is dynamically defined, e.g., the jupyterlab presentation mode). So a workaround might be to use a common font which has wide characters to calculate the width + add a good safety margin.

If anyone has a better idea, e.g., a layout using some kind of smart CSS grid system... that would be great!

@rgbkrk? @ellisonbg?

shoyer · 2018-01-12T18:53:42Z

While using <table> is usually appropriate for rendering that kind of content, for the reasons above I don't think it is possible here, unfortunately.

I'm not sure I follow here. It's been a while since I wrote much html, but I would think you could achieve this using a table with colspan? I'm not sure about the expandable/hide-able part though.

But I don't see any robust way to calculate these sizes either. One option could be to use tkinter

I think we should probably avoid adding a tkinter dependency. I would rather assume a fixed column-width for the first column.

rgbkrk · 2018-01-12T19:22:17Z

Something like this with a table?

https://codepen.io/rgbkrk/pen/XVYpEE

You'll have to embed some JS to do it though (I'm using jquery here, you could write it with document.querySelectorAll and some change handlers here).

I still haven't gotten a chance to use CSS grid, been hoping for a good moment.

benbovy · 2018-01-12T20:36:20Z

I would think you could achieve this using a table with colspan? I'm not sure about the expandable/hide-able part though.

Yes we could, but I was indeed thinking more about the expandable/hide-able part. With pure HTML/CSS the hidden/shown container must be a child or sibling of its controller, and I don't know how to achieve that with our current layout design using a table.

I think we should probably avoid adding a tkinter dependency. I would rather assume a fixed column-width for the first column.

Even considering that tkinter is already shipped with CPython as part of the standard library? My concern with an arbitrarily fixed column width is that it should be wide enough to cover a reasonable range of use cases, but when the variable names are really short (which often occurs in examples, e.g., 'foo', 'x', 'y'...) it won't look very nice (I haven't tested it yet, though). I guess we can also calculate the width by hand considering the worst-case scenario in order to have a good margin...

Something like this with a table? [...] You'll have to embed some JS to do it though

That solution (JS included) would be nice if we can support all notebook front-ends without any extra installation or configuration step.

shoyer · 2018-01-12T23:12:21Z

I still haven't gotten a chance to use CSS grid, been hoping for a good moment.

Is this something CSS grid would solve? Or is it not clear yet?

Even considering that tkinter is already shipped with CPython as part of the standard library?

Yes, but that doesn't mean it's actually bundled into every Python install. For example, it requires a separate package on Ubuntu: https://stackoverflow.com/questions/34890383/python3-tkinter-ubuntu-trusty-does-not-work-under-virtual-environment

My bigger concern is that it feels hacky and might be slow.

benbovy · 2018-01-13T00:41:39Z

My bigger concern is that it feels hacky and might be slow.

Agreed! Moreover, I have a "Python" icon appearing in the macOS Dock, which I think is caused by initializing tk. That's bad!

I played a bit and it seems feasible to estimate an approximate relationship between the text width and the number of characters (see https://gist.github.com/benbovy/fce796c663728b1bdbb3f1514daa458c -- it's a very naive approach, though).
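
As a rough sketch of that kind of character-count heuristic (this is not the code from the gist; the constants below are assumptions picked for illustration, not measured values), the shared column width could be derived from the longest name plus a safety margin:

```python
import math

# Assumed constants, chosen for illustration only (not measured values).
AVG_CHAR_WIDTH_PX = 7.0  # rough average glyph width at the notebook font size
SAFETY_MARGIN = 1.2      # extra room because the font is not monospace
MIN_WIDTH_PX = 40
MAX_WIDTH_PX = 300


def estimate_column_width(names):
    """Estimate a shared column width (in pixels) from the longest name."""
    longest = max((len(n) for n in names), default=0)
    width = math.ceil(longest * AVG_CHAR_WIDTH_PX * SAFETY_MARGIN)
    # Clamp so very short or very long names still give a usable column.
    return max(MIN_WIDTH_PX, min(width, MAX_WIDTH_PX))
```

The clamping is one way to handle both the short-name case ('foo', 'x', 'y') and pathological long names; names wider than the maximum would still need truncation.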

I still haven't gotten a chance to use CSS grid, been hoping for a good moment.

I don't know much about it, but it seems very powerful. That would be the cleanest solution. I'll take a look.

benbovy · 2018-01-14T22:03:57Z

I re-implemented the Dataset repr using CSS grid (https://jsfiddle.net/Lmqq7yzz/9/), which I think is much cleaner for column widths that fit the content.

However, one big limitation is that it currently works only in Firefox! Because we want the columns in different sections to be aligned, I had to define a single grid at the top level and then use display: contents so that all the nested child elements can be positioned using this same grid. Hopefully this will soon be supported in Chrome and Safari (https://caniuse.com/#feat=css-display-contents).

Two other, smaller issues:

Column-width may change on section expand/collapse as apparently it is re-calculated with the visible elements only. This is a bit annoying.
I couldn't get row highlighting on hover to work in this implementation.

Note: in the link above, I changed the design a bit. Variable attributes and data repr can now be shown/hidden using clickable icons on the right (tooltips are still needed). This is better from a UX point of view, IMO.

EDIT: tooltips would also be very useful to show full variable names and/or lists of dimensions when these are truncated.
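
To illustrate the layout idea described above (one top-level grid, with nested wrappers set to display: contents so that their children become items of that same grid), here is a small Python sketch that emits such markup. The class names and the three-column template are hypothetical and this is not the stylesheet from the jsfiddle.

```python
import html

# Class names are hypothetical; this is not the stylesheet from the PR.
GRID_CSS = """
.xr-grid { display: grid; grid-template-columns: auto auto 1fr; }
.xr-section, .xr-row { display: contents; }
"""


def render_grid(sections):
    """Render {section title: [(name, dims, dtype), ...]} on one shared grid."""
    parts = [f"<style>{GRID_CSS}</style>", '<div class="xr-grid">']
    for title, rows in sections.items():
        parts.append('<div class="xr-section">')
        # The title spans all columns of the shared top-level grid.
        parts.append(f'<div style="grid-column: 1 / -1">{html.escape(title)}</div>')
        for name, dims, dtype in rows:
            cells = (name, "(" + ", ".join(dims) + ")", dtype)
            inner = "".join(f"<div>{html.escape(c)}</div>" for c in cells)
            parts.append(f'<div class="xr-row">{inner}</div>')
        parts.append("</div>")
    parts.append("</div>")
    return "\n".join(parts)
```

Because the section and row wrappers generate no boxes of their own, the name, dimensions and dtype cells of every section share the same column tracks, which is what keeps Coordinates and Data variables aligned.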

rgbkrk · 2018-01-15T00:41:09Z

Wow, that does work really well on Firefox.

shoyer · 2018-01-16T08:21:00Z

It looks like support for display: contents is coming to Chrome very soon -- the relevant bug is now listed as fixed.

shoyer · 2018-01-25T19:12:48Z

It looks like this will make it into Chrome stable by roughly mid-March 2018: https://www.chromium.org/developers/calendar

If we're on Chrome and Firefox, that's probably good enough. We still might want to have an option that makes this easy to turn on/off (default value TBD).

benbovy · 2018-01-25T19:18:46Z

I'll see whether we can get good results using fixed column widths (thus not using display: contents)...

shoyer · 2018-07-17T20:50:01Z

I played around a little with using text-overflow for truncation. That seems like an elegant way to handle cases where a simple heuristic fails:
https://jsfiddle.net/nkezu9wq/

I'm sure we could figure out some better CSS magic that shows the full variable name when you hover over it.
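
A hedged sketch of the truncation idea, written as Python that emits the markup (it is not the code from the fiddle; the `xr-name` class and the 150px width are assumptions). The CSS properties are standard, and the `title` attribute is used here as a simple native tooltip carrying the full name.

```python
import html

# Hypothetical class name and width; the CSS properties themselves are standard.
TRUNCATE_CSS = """
.xr-name {
  display: inline-block;
  max-width: 150px;
  overflow: hidden;
  white-space: nowrap;
  text-overflow: ellipsis;
}
"""


def name_cell(name):
    """Return a cell that is truncated visually but keeps the full name."""
    safe = html.escape(name, quote=True)
    # The title attribute gives a native tooltip with the untruncated name.
    return f'<span class="xr-name" title="{safe}">{safe}</span>'
```

This keeps a fixed column width workable: names that fit are shown in full, and longer ones end in an ellipsis rather than breaking the layout.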

rabernat · 2019-01-11T08:47:50Z

Let's revive this excellent idea!

In particular, I would be interested in using the HTML repr on its own in conjunction with #2659 (dict / json serialization of dataset schema). If we could develop a standalone html repr function that interprets the output of dataset.to_dict(data=False) (or maybe dataset.to_dict(data='preview')), this would be very useful for pangeo-data/pangeo-datastore#1.
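
A minimal sketch of such a standalone function, assuming the schema dict has "dims", "coords" and "data_vars" keys, with "dims" and "dtype" entries per variable. The function name `schema_to_html` and the hand-written example dict are hypothetical, and the exact layout of the `to_dict(data=False)` output should be checked against xarray itself.

```python
import html


def schema_to_html(schema):
    """Render a Dataset-like schema dict as nested HTML lists."""

    def variables(group):
        items = []
        for name, var in schema.get(group, {}).items():
            dims = ", ".join(var.get("dims", ()))
            dtype = str(var.get("dtype", ""))
            items.append(
                f"<li>{html.escape(name)} ({html.escape(dims)}) "
                f"{html.escape(dtype)}</li>"
            )
        return "<ul>" + "".join(items) + "</ul>"

    dims = ", ".join(f"{k}: {v}" for k, v in schema.get("dims", {}).items())
    return (
        f"<div><p>Dimensions: {html.escape(dims)}</p>"
        f"<p>Coordinates</p>{variables('coords')}"
        f"<p>Data variables</p>{variables('data_vars')}</div>"
    )


# Hand-written example; its shape is an assumption, not captured xarray output.
example = {
    "dims": {"x": 3},
    "coords": {"x": {"dims": ("x",), "attrs": {}, "dtype": "int64"}},
    "data_vars": {"temp": {"dims": ("x",), "attrs": {}, "dtype": "float64"}},
    "attrs": {},
}
```

Since the function only needs a plain dict, it could run on a JSON schema without loading any data or even importing xarray.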

benbovy · 2019-02-28T16:01:18Z

We could also borrow ideas from https://github.com/agoose77/numpy-html or xtensor-stack/xframe@90638ec for displaying the data of each variable here.

shoyer · 2019-02-28T16:58:07Z

Is there an example of what the xframe output HTML looks like?

benbovy · 2019-02-28T17:22:16Z

You can see it by running the xframe example notebook with binder. It actually looks very much like a pandas DataFrame (with "multi-index" rows for ndims > 2), with some hover effects showing the coordinate names/values at data elements.

The output of xtensor objects is slightly different but interesting too, with nested tables (xtensor's binder). I haven't checked if numpy-html provides the same output.

rabernat · 2019-03-26T19:32:30Z

Hi @benbovy - how can we convince you to work more on this amazing idea? What help / support do you need from other xarray devs?

StanczakDominik · 2019-03-28T12:28:33Z

I just came by to say that the attached sample notebook is very, very pretty, and I would love to see this line of work continue!

shoyer · 2019-03-28T18:34:51Z

I did a little more tweaking of text-overflow for truncation. This version shows the full name when you hover over it: https://jsfiddle.net/1g04ykum/

benbovy · 2019-03-29T14:30:35Z

I did a little more tweaking of text-overflow for truncation. This version shows the full name when you hover over it: https://jsfiddle.net/1g04ykum/

Nice!

Hi @benbovy - how can we convince you to work more on this amazing idea? What help / support do you need from other xarray devs?

I'd really like to see this finally happen soon, especially since I've already spent a good amount of time on it (a while ago, I admit). But honestly (and sadly), it's been hard for me to find free time to continue the work on this feature. I'm sorry for that.

I'm also a bit worried about the things (mostly related to compatibility with notebook front-ends and themes) that we'll need to support/fix quickly once this is ready. Maybe we should make it opt-in for one or two releases.

I would be extremely pleased if anyone is willing to jump in and help on the front-end part (HTML/CSS)! See the checklist at the top of this PR. Unfortunately, my limited expertise in this area makes me rather unproductive.

fmaussion · 2019-03-29T14:52:19Z

I would be extremely pleased if anyone is willing to jump in and help on the front-end part (HTML/CSS)! See the checklist at the top of this PR. Unfortunately, my limited expertise in this area makes me rather unproductive.

Yeah, I guess this is the major issue here. Who could we get in to help out? Does @pydata/xarray know anyone from the extended community with an interest in these things?

dcherian · 2019-03-29T14:52:26Z

Should we email the "announce" list and ask for help?

fmaussion · 2019-03-29T17:24:17Z

@benbovy is the checklist still up-to-date? The length of it is a bit scary TBH ;-)

rabernat · 2019-03-29T17:39:58Z

Perhaps we could leverage our recently formed links between Pangeo and the Jupyter folks to help confront these front-end issues, in which we have limited expertise as a project.

@ian-r-rose, a developer of jupyter server extensions, has been a very helpful resource. Maybe he could give us some advice?

benbovy · 2019-03-31T09:51:30Z

is the checklist still up-to-date? The length of it is a bit scary TBH ;-)

Yes it is still up-to-date :-)

But this list is exhaustive and a lot of things could be saved for later! Some of the items are easy to implement but require a decision.

TomNicholas · 2019-07-09T13:39:20Z

I was just following the new draft dask repr, and it seems the tools are in place to be able to autogenerate a html repr of a full xarray dataset which includes an image, e.g. autogenerate something like:

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later? The main thing that has to be done before merging is tests? If that bare-bones version gets merged (even as a hidden feature) then others can start having a go at adding images like dask?

benbovy · 2019-07-10T07:17:29Z

Ooh that's nice! The Iris and zarr HTML representations look nice too (I hadn't followed those developments); there are definitely some good ideas for the xarray html repr! I think the dask and zarr html outputs would integrate very well with the repr here, and it would be quite straightforward to encapsulate them in the drop-down html containers of each coordinate / data variable here.

I also like the idea of a summary image like the one shown above, although this could be harder to achieve.

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later?

Yes, actually most of the work is done. I was mainly worried about how the html repr would look in the different notebook front-ends, but now that other projects (dask, iris, zarr) have such a repr, it looks like there is no major issue. I also struggled with grid column resizing for correctly displaying the variable names, but I think that @shoyer's suggestion https://jsfiddle.net/1g04ykum/ is good enough for now.

Carreau · 2019-07-14T21:18:08Z

I'm just starting to look at this. Was there any experiment with the HTML "details" and "summary" pairs?

They are made for collapsible sections, and will likely allow us to get rid of (some of) the UUID logic.

Here is a full example of a summary section.

%%html
<style>
details[open] > summary > .info {
    display: none;
}
</style>
<details>
    <summary>Coordinate <span class='info'>(hidden when expanded)</span>:</summary>
    Your actual nested content.
</details>
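
On the Python side, generating such a section could be as small as the sketch below (the `collapsible` helper is a hypothetical name, not code from this PR). Since the browser handles the toggling natively, the helper needs no element ids, which is why the UUID logic might become unnecessary.

```python
import html


def collapsible(title, body_html, expanded=False):
    """Wrap already-rendered HTML in a <details>/<summary> pair."""
    # The boolean "open" attribute controls the initial expanded state.
    open_attr = " open" if expanded else ""
    return (
        f"<details{open_attr}>"
        f"<summary>{html.escape(title)}</summary>"
        f"{body_html}"
        "</details>"
    )
```

For example, `collapsible("Coordinates", "<ul><li>x</li></ul>")` gives a section that starts collapsed, and passing `expanded=True` renders it open by default.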

Carreau · 2019-07-14T21:21:50Z

Sidenote: the CSS is not injected at load time when the notebook is not trusted, so the reprs may look garbled.

shoyer · 2019-07-14T22:14:58Z

Details/Summary does look like a nice way to simplify things!

It's too bad that CSS isn't processed with untrusted inputs. How do Iris and Dask deal with this limitation?

benbovy · 2019-07-14T22:45:39Z

Was there any experiment with the HTML "details" and "summary" pairs?

I agree it would greatly simplify the HTML code, but when I tried it things were not that easy (I don't remember exactly what, I think it had to do with alignment of nested lists) and I had some weird issues with conflicts between HTML reprs in different output cells. See: jupyterlab/jupyterlab#3200 (comment) and the comment below. Probably I'm missing something obvious?

the css is not injected at load time when the notebook is not trusted

How do Iris and Dask deal with this limitation?

I've quickly checked the related PRs dask/dask#4794 and SciTools/iris#2918. Dask adds style attributes to HTML elements while Iris seems to encapsulate a <style> element in every repr.
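
The two approaches can be contrasted with a small sketch. This is not code taken from either project; the function names, the `xr-cell` class and the style declarations are hypothetical and only illustrate the difference between per-element style attributes and a `<style>` element shipped with each repr.

```python
import html

# Hypothetical style declarations, for illustration only.
CELL_STYLE = "padding: 2px 8px; text-align: left"


def row_inline(name, dtype):
    """First approach described above: a style attribute on each element."""
    cells = "".join(
        f'<td style="{CELL_STYLE}">{html.escape(v)}</td>' for v in (name, dtype)
    )
    return f"<table><tr>{cells}</tr></table>"


def row_with_style_element(name, dtype):
    """Second approach described above: a <style> element in every repr."""
    css = f".xr-cell {{ {CELL_STYLE} }}"
    cells = "".join(
        f'<td class="xr-cell">{html.escape(v)}</td>' for v in (name, dtype)
    )
    return f"<style>{css}</style><table><tr>{cells}</tr></table>"
```

The trade-off is that inline attributes repeat the declarations on every element and cannot express things like hover rules, while the `<style>` element is compact but is duplicated in every output cell.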

mrocklin · 2019-07-20T22:47:57Z

It's too bad that CSS isn't processed with untrusted inputs. How do Iris and Dask deal with this limitation?

Yeah, we just use raw HTML

mrocklin · 2019-07-20T22:48:30Z

I'll say that I'm looking forward to this getting in, mostly so that I can raise an issue about adding Dask's chunked array images :)

SimonHeybrock · 2019-08-30T09:55:16Z

I was just following the new draft dask repr, and it seems the tools are in place to be able to autogenerate a html repr of a full xarray dataset which includes an image, e.g. autogenerate something like:

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later? The main thing that has to be done before merging is tests? If that bare-bones version gets merged (even as a hidden feature) then others can start having a go at adding images like dask?

We have done something similar using inline svg (see, e.g., https://scipp.readthedocs.io/en/latest/user-guide/data-structures.html#Dataset). It is basically a hack for testing right now, but is sufficient for auto-generated illustration in the documentation.

I am pretty impressed by the html representation previewed in #1627. Since our data structures are very similar I would be happy to contribute to this output rendering somehow, since we could then also benefit from it (with a few tweaks, probably). So let me know if I can help out somehow (unfortunately I do not know much html and css, just C++ and a bit of Python).

shoyer · 2019-08-30T20:02:24Z

@SimonHeybrock very cool to see your Scipp project! I will make some comments over in your repo but I'm impressed with what you've done. I'd love to find ways to collaborate more in the future, many of the problems you're solving are also important for xarray users.

jsignell · 2019-10-16T15:45:56Z

Is there anything that I can do to help get this PR in? Are the items on the TODO list prioritized? One minor comment is that in terms of style for overflow, it might be more legible if the var_names were bolded on hover (fiddle), although that might make them look clickable.

jhamman · 2019-10-16T16:21:29Z

Hi @jsignell - it would be great if someone could pick this up. From my perspective, I'd like to get a minimum viable implementation out in the wild. With this in mind, I feel like some of the checklist should be moved to follow up issues. @benbovy thoughts?

benbovy · 2019-10-17T14:10:45Z

@jsignell feel free to pick this up, that would be great if you could make this finally happen! (Again, I'm sorry for letting this sit so long).

I'm going to edit the checklist in my 1st comment. There are indeed a lot of things that we can move to follow-up issues.

jsignell · 2019-10-17T21:10:56Z

Ok thanks! I'll get cracking :)

jsignell · 2019-10-21T19:44:52Z

The last fiddle and this PR seem fairly different. Does the fiddle have the most up-to-date hierarchy or is it just somewhere where people were playing around with ideas (in which case I should see what is improved and try to pull those bits of css)?

benbovy added 6 commits January 11, 2018 07:37

add CSS style and internal functions for html repr

6781d52

move CSS code to its own file in a new static directory

8b98c4c

add repr of array objects + some refactoring and fixes

336d522

add _repr_html_ methods to dataset, dataarray and variable

e97a7a2

fix encoding issue in read CSS

99f733b

fix some CSS for compatibility with notebook (tested 5.2)

ac1c189

use CSS grid + add icons to show/hide attrs and data repr

17de08b

benbovy mentioned this pull request Mar 9, 2018

WIP: progress toward making groupby work with multiple arguments #924

Closed

benbovy mentioned this pull request Jul 14, 2018

html repr of xarray object (for the notebook) #1627

Closed

jakirkham mentioned this pull request Dec 16, 2018

HTML jstree doesn't work in jupyter lab zarr-developers/zarr-python#259

Closed

shoyer mentioned this pull request May 13, 2019

Add draft of Array._repr_html_ dask/dask#4794

Merged

TomNicholas mentioned this pull request Jul 12, 2019

[Feature Request] Visualizing dimensions #2175

Closed

SimonHeybrock mentioned this pull request Aug 30, 2019

_repr_html_ for improved rendering in Jupyter notebooks scipp/scipp#496

Closed

jsignell mentioned this pull request Oct 21, 2019

Html repr #3425

Merged

7 tasks

dcherian closed this in #3425 Oct 24, 2019

benbovy deleted the html_repr branch October 25, 2019 15:06

benbovy mentioned this pull request May 11, 2020

expanded HTML repr when opening notebook #4041

Closed
