WIP: html repr #1820

Closed
wants to merge 7 commits into from

benbovy
Member

@benbovy benbovy commented Jan 11, 2018

This is work in progress, although the basic functionality is there. You can see a preview here:
http://nbviewer.jupyter.org/gist/benbovy/3009f342fb283bd0288125a1f7883ef2

TODO:

  • Add support for Multi-indexes
  • Probably good to have some opt-in or fallback system for cases where we (or users) know that the rendering will not work
  • Add some tests

Nice to have (keep this for later):

  • Clean up the CSS code and HTML template (track CSS subgrid support in browsers; this may simplify things a lot here).
  • Dynamically adapt cell widths (given the length of the names of variables and dimensions). Currently all cells have a fixed width. This is tricky, though, as we don't use a monospace font here.
  • Integration with jupyterlab/notebook themes (CSS classes) and maybe allow custom CSS.
  • Integration of Dask arrays HTML repr (+ integration of repr for other array backends).
  • Maybe find a way (if possible) to include the CSS only once in the notebook (currently it is included each time an xarray object is displayed in an output cell, which is not very nice).
  • Review the rules for collapsing the Coordinates, Data variables and Attributes sections (maybe expose them as global options).
  • Maybe also define some rules to collapse automatically the data section (DataArray and Variable) when the data repr is too long.
  • Maybe add rich representation for Dataset.coords and Dataset.data_vars as well?
Other thoughts (old)

A big challenge here is to provide both robust and flexible styling (CSS):

  • I have tested the current styling in jupyterlab (0.30.6, light theme), notebook (5.2.2) and nbviewer: despite some slight differences it looks quite good!
  • However, the current CSS code is a bit fragile (I had to add a lot of !important). It could probably be cleaned up and optimized a bit (unfortunately my CSS skills are limited).
  • Also, it looks ugly with jupyterlab's dark theme. We probably need to use jupyterlab CSS variables so that our CSS scheme is compatible with the theme machinery, but at the same time we need to support other front-ends. So we probably need to maintain different stylings (i.e., multiple CSS files, one of them picked depending on the front-end), though I don't know if it's easy to automatically detect the front-end (choosing a default style is difficult too).
  • The notebook rendering on Github seems to disable style tags (no style is applied to the output, see https://gist.github.com/benbovy/3009f342fb283bd0288125a1f7883ef2). Output is not readable at all in this case, so it might be useful to allow turning off rich output as an option.
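
A minimal sketch of the "turn off rich output" idea from the last bullet. The names OPTIONS, display_style and Thing are hypothetical and only illustrate the mechanism; nothing here is taken from the PR. It relies on the Jupyter/IPython convention that a `_repr_html_` method returning None means "no HTML available", so the front-end should fall back to the plain-text repr.

```python
from html import escape

# Hypothetical global option (not an xarray setting): "html" or "text".
OPTIONS = {"display_style": "html"}


class Thing:
    """Toy object with both a plain-text and an HTML representation."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<Thing {self.name}>"

    def _repr_html_(self):
        # Returning None signals that no HTML repr is offered, so the
        # plain-text repr is displayed instead.
        if OPTIONS["display_style"] != "html":
            return None
        return f"<div><b>Thing</b> {escape(self.name)}</div>"
```

With a switch like this, users who know the styled output will not render (for example on a viewer that strips style tags) could set the option once and keep readable text output.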

@benbovy
Member Author

benbovy commented Jan 12, 2018

The sizing of the variable name and dimensions columns according to their content is tricky because (1) we want these columns to be aligned between different sections (Coordinates and Data variables) and also aligned with the list of dimension labels in the Dimension section, (2) there are subsections for variable attributes and data repr, which both take 100% width, and (3) we want the sections and subsections to be collapsible/expandable but we are limited to a pure html/css solution.

While using <table> is usually appropriate for rendering that kind of content, for the reasons above I don't think it is possible here, unfortunately. Unless someone has a better idea, I don't see any other option than calculating a fixed size for the columns before rendering.

But I don't see any robust way to calculate these sizes either. One option could be to use tkinter, e.g.,

>>> import tkinter as tk
>>> from tkinter import font
>>> root = tk.Tk()  # a root window must exist before a font can be measured
>>> front_end_font = font.Font(family='Helvetica', size=11, weight='bold')
>>> front_end_font.measure("variable_name")  # text width in pixels
76
>>> root.destroy()

That's not very elegant to say the least, but it has the advantage of being part of the Python standard library. The problem is that we don't know the font-family and font-size. We could define it explicitly in the CSS code but it's better to inherit it from the notebook front-ends (in some cases it is dynamically defined, e.g., the jupyterlab presentation mode). So a workaround might be to use a common font which has wide characters to calculate the width + add a good safety margin.

If anyone has a better idea, e.g., a layout using some kind of smart CSS grid system... that would be great!

@rgbkrk? @ellisonbg?

@shoyer
Member

shoyer commented Jan 12, 2018

While using <table> is usually appropriate for rendering that kind of content, for the reasons above I don't think it is possible here, unfortunately.

I'm not sure I follow here. It's been a while since I wrote much html, but I would think you could achieve this using a table with colspan? I'm not sure about the expandable/hide-able part though.

But I don't see any robust way to calculate these sizes either. One option could be to use tkinter

I think we should probably avoid adding a tkinter dependency. I would rather assume a fixed column-width for the first column.

@rgbkrk

rgbkrk commented Jan 12, 2018

Something like this with a table?

https://codepen.io/rgbkrk/pen/XVYpEE

You'll have to embed some JS to do it though (I'm using jquery here, you could write it with document.querySelectorAll and some change handlers here).

I still haven't gotten a chance to use CSS grid, been hoping for a good moment.

@benbovy
Member Author

benbovy commented Jan 12, 2018

I would think you could achieve this using a table with colspan? I'm not sure about the expandable/hide-able part though.

Yes we could, but I was indeed thinking more about the expandable/hide-able part. With pure html/css the hidden/shown container must be child or sibling of its controller, and I don't know how to achieve that with our current layout design using a table.
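
To make the child-or-sibling constraint concrete, here is a rough sketch of the usual pure-CSS toggle (a hidden checkbox plus a label), generated from Python. The class names (xr-toggle, xr-content), the TOGGLE_CSS constant and the collapsible helper are made up for illustration and are not the PR's actual template. The collapsible content has to be a sibling of the checkbox so that the `:checked ~` selector can reach it, and each checkbox needs a unique id for its label.

```python
import uuid
from html import escape

# Illustrative stylesheet: the content is only reachable from the checkbox
# through a sibling selector, which is why it must share the same parent.
TOGGLE_CSS = """
<style>
.xr-toggle { display: none; }
.xr-toggle ~ .xr-content { display: none; }
.xr-toggle:checked ~ .xr-content { display: block; }
</style>
"""


def collapsible(title, body_html):
    """Wrap body_html in a section that is toggled by clicking its label."""
    uid = "section-" + uuid.uuid4().hex  # unique id linking label and checkbox
    return (
        "<div class='xr-section'>"
        f"<input id='{uid}' class='xr-toggle' type='checkbox'>"
        f"<label for='{uid}'>{escape(title)}</label>"
        f"<div class='xr-content'>{body_html}</div>"
        "</div>"
    )
```

In a table layout the rows to hide would live in other `<tr>` elements, which are neither children nor siblings of a checkbox placed in a cell; that is the difficulty described above.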

I think we should probably avoid adding a tkinter dependency. I would rather assume a fixed column-width for the first column.

Even considering that tkinter is already shipped with CPython as part of the standard library? My concern with an arbitrarily fixed column-width is that it should be wide enough to cover a reasonable range of use cases, but when the variable names are really short (which often occurs in examples, e.g., 'foo', 'x', 'y'...) it won't look very nice (I haven't tested it yet, though). I guess we can also calculate the width by hand considering the worst case scenario in order to have a good margin...

Something like this with a table? [...] You'll have to embed some JS to do it though

That solution (JS included) would be nice if we can support all notebook front-ends without any extra installation or configuration step.

@shoyer
Member

shoyer commented Jan 12, 2018

I still haven't gotten a chance to use CSS grid, been hoping for a good moment.

Is this something CSS grid would solve? Or is it not clear yet?

Even considering that tkinter is already shipped with CPython as part of the standard library?

Yes, but that doesn't mean it's actually bundled into every Python install. For example, it requires a separate package on Ubuntu: https://stackoverflow.com/questions/34890383/python3-tkinter-ubuntu-trusty-does-not-work-under-virtual-environment

My bigger concern is that it feels hacky and might be slow.

@benbovy
Member Author

benbovy commented Jan 13, 2018

My bigger concern is that it feels hacky and might be slow.

Agreed! Moreover, I have a "Python" icon appearing in the macOS Dock, which I think is caused by initializing tk. That's bad!

I played a bit and it seems feasible to estimate an approximate relationship between the text width and the number of characters (see https://gist.github.com/benbovy/fce796c663728b1bdbb3f1514daa458c -- it's a very naive approach, though).
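
For illustration only, such a character-count heuristic could look like the sketch below. It is not the code from the gist; the constants (average glyph width of 0.6 em, a 20% safety margin, the pixel bounds) are assumptions picked for the example, not measured values.

```python
# Assumed constants for a proportional font; tune them against real renderings.
AVG_CHAR_EM = 0.6     # average glyph width as a fraction of the font size
SAFETY_MARGIN = 1.2   # extra room for wide characters such as "m" or "W"


def estimate_column_width(names, font_size_px=13, min_px=40, max_px=300):
    """Estimate a column width in pixels from the longest name."""
    longest = max((len(name) for name in names), default=0)
    width = longest * AVG_CHAR_EM * font_size_px * SAFETY_MARGIN
    # Clamp so that very short or very long names still give a usable column.
    return int(min(max(width, min_px), max_px))
```

The clamp addresses both failure cases mentioned earlier in the thread: short names such as 'x' do not collapse the column, and a single very long name does not take over the whole row.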

I still haven't gotten a chance to use CSS grid, been hoping for a good moment.

I don't know much about it, but it seems very powerful. That would be the cleanest solution. I'll take a look.

@benbovy
Member Author

benbovy commented Jan 14, 2018

I re-implemented the Dataset repr using CSS grid (https://jsfiddle.net/Lmqq7yzz/9/), which I think is much cleaner for column widths that fit the content.

However, one big limitation is that it currently works only in Firefox! Because we want the columns in different sections aligned, I had to define a single grid at the top level and then use display: contents so that all the nested children elements can be positioned using this same grid. Hopefully this will soon be supported in Chrome and Safari (https://caniuse.com/#feat=css-display-contents).
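
The single-grid layout described here can be sketched roughly as follows. This is a simplified illustration, not the markup from the jsfiddle: the class names (xr-grid, xr-grid-section, xr-grid-row, xr-grid-title) and the grid_repr helper are hypothetical, and collapsing is left out.

```python
from html import escape

GRID_CSS = """
<style>
/* One grid at the top level: name | dims | dtype, sized to fit the content. */
.xr-grid { display: grid; grid-template-columns: auto auto 1fr; column-gap: 1em; }
/* Sections and rows generate no box of their own, so their cells are placed
   directly on the top-level grid and line up across sections. */
.xr-grid-section, .xr-grid-row { display: contents; }
.xr-grid-title { grid-column: 1 / -1; font-weight: bold; }
</style>
"""


def grid_repr(sections):
    """Render {title: [(name, dims, dtype), ...]} on one shared CSS grid."""
    parts = [GRID_CSS, "<div class='xr-grid'>"]
    for title, rows in sections.items():
        parts.append("<div class='xr-grid-section'>")
        parts.append(f"<div class='xr-grid-title'>{escape(title)}</div>")
        for name, dims, dtype in rows:
            parts.append(
                "<div class='xr-grid-row'>"
                f"<span>{escape(name)}</span>"
                f"<span>({escape(', '.join(dims))})</span>"
                f"<span>{escape(dtype)}</span>"
                "</div>"
            )
        parts.append("</div>")
    parts.append("</div>")
    return "".join(parts)
```

Because every cell ends up as an item of the same grid, the name column is as wide as the longest name in any section, which is the alignment behavior wanted here without computing widths in Python.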

Two other, smaller issues:

  • Column-width may change on section expand/collapse as apparently it is re-calculated with the visible elements only. This is a bit annoying.
  • I couldn't get row highlighting on hover to work in this implementation.

Note: in the link above, I changed the design a bit. Variable attributes and data repr can now be shown/hidden using clickable icons on the right (tooltips are still needed). This is better from a UX point of view, IMO.

EDIT: tooltips would also be very useful to show full variable names and/or lists of dimensions when these are truncated.

@rgbkrk

rgbkrk commented Jan 15, 2018

Wow, that does work really well on Firefox.

@shoyer
Member

shoyer commented Jan 16, 2018

It looks like display: contents support is coming to Chrome very soon -- the relevant bug is now listed as fixed.

@shoyer
Member

shoyer commented Jan 25, 2018

It looks like this will make it into Chrome stable by roughly mid-March 2018: https://www.chromium.org/developers/calendar

If we're on Chrome and Firefox, that's probably good enough. We still might want to have an option that makes this easy to turn on/off (default value TBD).

@benbovy
Member Author

benbovy commented Jan 25, 2018

I'll see whether we can get good results using fixed column widths (thus not using display: contents)...

@shoyer
Member

shoyer commented Jul 17, 2018

I played around a little with using text-overflow for truncation. That seems like an elegant way to handle cases where a simple heuristic fails:
https://jsfiddle.net/nkezu9wq/

I'm sure we could figure out some better CSS magic that shows the full variable name when you hover over it.
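
A rough sketch of the truncation idea, generated from Python. It is not the CSS from the fiddle: the name_cell helper, the NAME_STYLE constant and the 10em limit are assumptions for illustration, and it uses the plain HTML title attribute (a native browser tooltip) as a stand-in for fancier hover CSS.

```python
from html import escape

# Inline style so the snippet does not depend on a separate stylesheet.
# text-overflow only takes effect with nowrap text and hidden overflow.
NAME_STYLE = (
    "display: inline-block; max-width: 10em; white-space: nowrap; "
    "overflow: hidden; text-overflow: ellipsis;"
)


def name_cell(name):
    """Return a span that truncates long names and shows the full name on hover."""
    safe = escape(name, quote=True)
    return f'<span style="{NAME_STYLE}" title="{safe}">{safe}</span>'
```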

@rabernat
Contributor

Let's revive this excellent idea!

In particular, I would be interested in using the HTML repr on its own in conjunction with #2659 (dict / json serialization of dataset schema). If we could develop a standalone html repr function that interprets the output of dataset.to_dict(data=False) (or maybe dataset.to_dict(data='preview')), this would be very useful for pangeo-data/pangeo-datastore#1.
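
As a sketch of what such a standalone function might look like: the function name schema_to_html and the markup are hypothetical, and the example dict below is hand-written. It assumes a layout with "dims", "coords", "data_vars" and "attrs" keys, where each variable carries "dims", "dtype", "shape" and "attrs"; the real to_dict output should be checked before relying on this.

```python
from html import escape


def schema_to_html(schema):
    """Render a dataset schema dict (no data values) as a small HTML snippet."""

    def section(title, variables):
        items = "".join(
            f"<li><b>{escape(name)}</b> "
            f"({escape(', '.join(var.get('dims', ())))}) "
            f"{escape(str(var.get('dtype', '')))}</li>"
            for name, var in variables.items()
        )
        return f"<h4>{escape(title)}</h4><ul>{items}</ul>"

    dims = ", ".join(
        f"{escape(str(k))}: {v}" for k, v in schema.get("dims", {}).items()
    )
    attrs = "".join(
        f"<li>{escape(str(k))}: {escape(str(v))}</li>"
        for k, v in schema.get("attrs", {}).items()
    )
    return (
        "<div class='xr-schema'>"
        f"<p>Dimensions: ({dims})</p>"
        + section("Coordinates", schema.get("coords", {}))
        + section("Data variables", schema.get("data_vars", {}))
        + f"<h4>Attributes</h4><ul>{attrs}</ul>"
        "</div>"
    )


# Hand-written example schema (assumed layout, see the note above).
example_schema = {
    "dims": {"time": 3, "x": 2},
    "coords": {
        "time": {"dims": ("time",), "attrs": {}, "dtype": "datetime64[ns]", "shape": (3,)},
    },
    "data_vars": {
        "temp": {"dims": ("time", "x"), "attrs": {"units": "K"}, "dtype": "float64", "shape": (3, 2)},
    },
    "attrs": {"title": "demo <v1>"},
}
```

Since the function only needs a plain dict, it could run in a catalog service that has the JSON schema but not the dataset itself.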

@benbovy
Member Author

benbovy commented Feb 28, 2019

We could also borrow ideas from https://github.com/agoose77/numpy-html or xtensor-stack/xframe@90638ec for displaying the data of each variable here.

@shoyer
Member

shoyer commented Feb 28, 2019

is there an example of what the xframe output HTML looks like?

@benbovy
Member Author

benbovy commented Feb 28, 2019

You can see it by running the xframe example notebook with binder. It actually looks very much like a pandas DataFrame (with "multi-index" rows for ndims > 2), with some hover effects showing the coordinate names/values at data elements.

The output of xtensor objects is slightly different but interesting too, with nested tables (xtensor's binder). I haven't checked if numpy-html provides the same output.

@rabernat
Contributor

Hi @benbovy - how can we convince you to work more on this amazing idea? What help / support do you need from other xarray devs?

@StanczakDominik
Contributor

I just came by to say that the attached sample notebook is very, very pretty, and I would love to see this line of work continue!

@shoyer
Member

shoyer commented Mar 28, 2019

I did a little more tweaking of text-overflow for truncation. This version shows the full name when you hover over it: https://jsfiddle.net/1g04ykum/

@benbovy
Member Author

benbovy commented Mar 29, 2019

I did a little more tweaking of text-overflow for truncation. This version shows the full name when you hover over it: https://jsfiddle.net/1g04ykum/

Nice!

Hi @benbovy - how can we convince you to work more on this amazing idea? What help / support do you need from other xarray devs?

I'd really like to see this finally happen soon, especially since I've already spent a good amount of time on it (a while ago, I admit). But honestly (and sadly), it's been hard for me to find free time to continue the work on this feature. I'm sorry for that.

I'm also a bit worried by the things (mostly related to compatibility with notebook front-ends and themes) that we'll need to support/fix quickly when this is ready. Maybe we should make it opt-in for one or two releases.

I would be extremely pleased if anyone is willing to jump in and help on the front-end part (HTML/CSS)! See the checklist at the top of this PR. Unfortunately, my limited expertise in this area makes me rather unproductive.

@fmaussion
Member

I would be extremely pleased if anyone is willing to jump in and help on the front-end part (HTML/CSS)! See the checklist at the top of this PR. Unfortunately, my limited expertise in this area makes me rather unproductive.

yeah, I guess this is the major issue here. Who could we get in to help out? Does @pydata/xarray know anyone from the extended community with an interest in these things?

@dcherian
Contributor

Should we email the "announce" list and ask for help?

@fmaussion
Member

@benbovy is the checklist still up-to-date? The length of it is a bit scary TBH ;-)

@rabernat
Contributor

Perhaps we could leverage our recently formed links between Pangeo and the Jupyter folks to help confront these front-end issues, in which we have limited expertise as a project.

@ian-r-rose, a developer of jupyter server extensions, has been a very helpful resource. Maybe he could give us some advice?

@benbovy
Member Author

benbovy commented Mar 31, 2019

is the checklist still up-to-date? The length of it is a bit scary TBH ;-)

Yes it is still up-to-date :-)

But this list is exhaustive and a lot of things could be saved for later! Some of the items are easy to implement but require a decision.

@TomNicholas
Member

I was just following the new draft dask repr, and it seems the tools are in place to be able to autogenerate a html repr of a full xarray dataset which includes an image, e.g. autogenerate something like:

image

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later? The main thing that has to be done before merging is tests? If that bare-bones version gets merged (even as a hidden feature) then others can start having a go at adding images like dask?

@benbovy
Member Author

benbovy commented Jul 10, 2019

Ooh that's nice! Iris and zarr html representations look nice too (I hadn't followed those developments), definitely some good ideas for the xarray html repr! I think the dask and zarr html outputs would integrate very well with the repr here and it would be quite straightforward to encapsulate it in the drop-down html containers of each coordinate / data variable here.

I also like the idea of the summary image as shown above, although this could be harder to achieve.

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later?

Yes, actually most of the work is done. I was mainly worried by how the html repr would look in the different notebook front-ends, but now that other projects (dask, iris, zarr) have such repr, it looks like there is no major issue. I also struggled with grid column resizing for correctly displaying the variable names, but I think that @shoyer's suggestion https://jsfiddle.net/1g04ykum/ is good enough for now.

@Carreau
Contributor

Carreau commented Jul 14, 2019

I'm just starting to look at this. Was there any experiment with the HTML "details" and "summary" pairs?

They are made for collapsible sections, and will likely let us get rid of (some of) the UUID logic.

Here is a full example of a summary section.

%%html
<style>
/* Hide the hint once the section is expanded. */
details[open] > summary > .info {
    display: none;
}
</style>
<details>
    <summary>Coordinate <span class='info'>(hidden when expanded)</span>:</summary>
    Your actual nested content.
</details>

@Carreau
Contributor

Carreau commented Jul 14, 2019

Sidenote: the CSS is not injected at load time when the notebook is not trusted, so the reprs may look garbled.

@shoyer
Member

shoyer commented Jul 14, 2019

Details/Summary does look like a nice way to simplify things!
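
As a rough sketch of that simplification, nested collapsible sections can be generated without any ids, since each summary toggles its own parent details element. The helper name details and its open_ parameter are made up for this example.

```python
from html import escape


def details(summary, body_html, open_=False):
    """Wrap body_html in a natively collapsible <details> element."""
    open_attr = " open" if open_ else ""
    return (
        f"<details{open_attr}>"
        f"<summary>{escape(summary)}</summary>"
        f"{body_html}"
        "</details>"
    )


# Nested use: an expanded "Coordinates" section holding one collapsed variable.
html_out = details("Coordinates", details("time", "<pre>...</pre>"), open_=True)
```

Whether this plays well with column alignment and with several reprs in one notebook is not something the sketch addresses.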

It's too bad that CSS isn't processed with untrusted inputs. How do Iris and Dask deal with this limitation?

@benbovy
Member Author

benbovy commented Jul 14, 2019

Was there any experiment with the HTML "details" and "summary" pairs?

I agree it would greatly simplify the HTML code, but when I tried it things were not that easy (I don't remember exactly what, I think it had to do with alignment of nested lists) and I had some weird issues with conflicts between HTML reprs in different output cells. See: jupyterlab/jupyterlab#3200 (comment) and the comment below. Probably I'm missing something obvious?

the css is not injected at load time when the notebook is not trusted

How do Iris and Dask deal with this limitation?

I've quickly checked the related PRs dask/dask#4794 and SciTools/iris#2918. Dask adds style attributes to HTML elements while Iris seems to encapsulate a <style> element in every repr.
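
The two strategies can be contrasted with a small sketch. The function names, the STYLES table and the xr- class prefix are invented for illustration and are not taken from the Dask or Iris code; the sketch also does not verify which variant survives sanitization in any particular front-end.

```python
from html import escape

# Hypothetical style table shared by both variants.
STYLES = {"name": "font-weight: bold;", "dims": "color: #555;"}


def styled_inline(kind, text):
    """Style attribute on each element: no <style> element is needed."""
    return f'<span style="{STYLES[kind]}">{escape(text)}</span>'


def styled_with_sheet(kind, text):
    """One <style> element shipped with the repr, elements refer to classes."""
    sheet = "".join(f".xr-{k} {{ {v} }}" for k, v in STYLES.items())
    return f'<style>{sheet}</style><span class="xr-{kind}">{escape(text)}</span>'
```

Inline attributes repeat the same declarations on every element, while the stylesheet variant keeps the markup small but depends on the style element being honored.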

@mrocklin
Contributor

It's too bad that CSS isn't processed with untrusted inputs. How do Iris and Dask deal with this limitation?

Yeah, we just use raw HTML

@mrocklin
Contributor

I'll say that I'm looking forward to this getting in, mostly so that I can raise an issue about adding Dask's chunked array images :)

@SimonHeybrock

I was just following the new draft dask repr, and it seems the tools are in place to be able to autogenerate a html repr of a full xarray dataset which includes an image, e.g. autogenerate something like:

image

It seems to me @benbovy that 90% of your ToDo list is nice-to-have or special-case stuff which can be left for later? The main thing that has to be done before merging is tests? If that bare-bones version gets merged (even as a hidden feature) then others can start having a go at adding images like dask?

We have done something similar using inline svg (see, e.g., https://scipp.readthedocs.io/en/latest/user-guide/data-structures.html#Dataset). It is basically a hack for testing right now, but is sufficient for auto-generated illustration in the documentation.

I am pretty impressed by the html representation previewed in #1627. Since our data structures are very similar I would be happy to contribute to this output rendering somehow, since we could then also benefit from it (with a few tweaks, probably). So let me know if I can help out somehow (unfortunately I do not know much html and css, just C++ and a bit of Python).

@shoyer
Member

shoyer commented Aug 30, 2019

@SimonHeybrock very cool to see your Scipp project! I will make some comments over in your repo but I'm impressed with what you've done. I'd love to find ways to collaborate more in the future, many of the problems you're solving are also important for xarray users.

@jsignell
Contributor

Is there anything that I can do to help get this PR in? Are the items on the TODO list prioritized? One minor comment is that in terms of style for overflow, it might be more legible if the var_names were bolded on hover (fiddle), although that might make them look clickable.

@jhamman
Member

jhamman commented Oct 16, 2019

Hi @jsignell - it would be great if someone could pick this up. From my perspective, I'd like to get a minimum viable implementation out in the wild. With this in mind, I feel like some of the checklist should be moved to follow up issues. @benbovy thoughts?

@benbovy
Member Author

benbovy commented Oct 17, 2019

@jsignell feel free to pick this up, it would be great if you could make this finally happen! (Again, I'm sorry for letting this sit so long).

I'm going to edit the checklist in my 1st comment. There are indeed a lot of things that we can move to follow up issues.

@jsignell
Contributor

Ok thanks! I'll get cracking :)

@jsignell
Contributor

The last fiddle and this PR seem fairly different. Does the fiddle have the most up-to-date hierarchy or is it just somewhere where people were playing around with ideas (in which case I should see what is improved and try to pull those bits of css)?

@jsignell jsignell mentioned this pull request Oct 21, 2019
7 tasks
@benbovy benbovy deleted the html_repr branch October 25, 2019 15:06
Development

Successfully merging this pull request may close these issues.

html repr of xarray object (for the notebook)