Implementation of HTML repr for HighLevelGraph layers #7763
I've tried to keep the look similar to @jacobtomlinson's work here dask/distributed#4857 just for consistency (although it seems I've forgotten to add a little blue square next to each layer, oops). TODO:
For discussion: I would like to shift all the layer-specific stuff into a
This is great, really exciting to see this.
I'm pleased to see the design resources being used, but I think we should try and make the HLG repr visually different from the cluster/client/scheduler/worker repr. If a user is scrolling through a long notebook looking for one or the other they should be able to identify them at a glance.
We also have some information about the width of each layer, right? Perhaps we could show something visually to represent that. I'm thinking along the lines of the minimaps you see in many text editors these days.
Huh, that's an interesting point to consider. Maybe let's turn those squares into circles and use a different color?
I don't know what you mean by width; could you clarify? The layers are all pretty heterogeneous in terms of which attributes they have (though we could write special-case logic or start to enforce more homogeneous descriptive attributes).
This is the design reference I've been working from. I'm keen to try and stay within the colours and shapes shown there in the various diagrams. I would be tempted to say let's specifically keep this diagram in mind when putting together this repr. So, as you say, maybe something like little grey circles would work well? Maybe a blue circle for the HLG and grey for the layers?
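A minimal sketch of how those circles could be generated as inline SVG, so the HLG and its layers can reuse one helper with different fills. `circle_icon` is a hypothetical name. The blue and grey hex values are placeholders, not colours taken from the dask design reference; the `#1D1D1D` stroke is borrowed from the graph icon posted in this thread.

```python
def circle_icon(fill, size=20, stroke="#1D1D1D"):
    """Return inline SVG markup for a single filled circle (hypothetical helper)."""
    centre = size / 2
    radius = centre - 1  # leave room for the 1px stroke inside the viewBox
    return (
        f'<svg width="{size}" height="{size}" viewBox="0 0 {size} {size}" '
        'xmlns="http://www.w3.org/2000/svg">'
        f'<circle cx="{centre}" cy="{centre}" r="{radius}" '
        f'fill="{fill}" stroke="{stroke}" stroke-width="1"/>'
        "</svg>"
    )


# Placeholder colours, NOT taken from the dask design reference:
HLG_ICON = circle_icon("#1E88E5")    # blue circle for the HighLevelGraph
LAYER_ICON = circle_icon("#BDBDBD")  # grey circle for each layer
```

Keeping the colour as a parameter means any later switch to the official palette is a one-line change.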
By width I was specifically referring to the
But I see what you mean, there isn't much consistency between the layers so this could be tricky.
Also it looks like Line 160 in 90fccab
That little graph icon can be included with:
<svg width="76" height="71" viewBox="0 0 76 71" fill="none" xmlns="http://www.w3.org/2000/svg">
<circle cx="61.5" cy="36.5" r="13.5" fill="#F2F2F2" stroke="#1D1D1D" stroke-width="2"/>
<circle cx="14.5" cy="14.5" r="13.5" fill="#F2F2F2" stroke="#1D1D1D" stroke-width="2"/>
<circle cx="14.5" cy="56.5" r="13.5" fill="#F2F2F2" stroke="#1D1D1D" stroke-width="2"/>
<path d="M28 16L30.5 16C33.2614 16 35.5 18.2386 35.5 21L35.5 32.0001C35.5 34.7615 37.7386 37.0001 40.5 37.0001L43 37.0001" stroke="black" stroke-width="1.5"/>
<path d="M40.5 37L40.5 37.75L40.5 37.75L40.5 37ZM35.5 42L36.25 42L35.5 42ZM35.5 52L34.75 52L35.5 52ZM30.5 57L30.5 57.75L30.5 57ZM41.5001 36.25L40.5 36.25L40.5 37.75L41.5001 37.75L41.5001 36.25ZM34.75 42L34.75 52L36.25 52L36.25 42L34.75 42ZM30.5 56.25L28.0001 56.25L28.0001 57.75L30.5 57.75L30.5 56.25ZM34.75 52C34.75 54.3472 32.8472 56.25 30.5 56.25L30.5 57.75C33.6756 57.75 36.25 55.1756 36.25 52L34.75 52ZM40.5 36.25C37.3244 36.25 34.75 38.8243 34.75 42L36.25 42C36.25 39.6528 38.1528 37.75 40.5 37.75L40.5 36.25Z" fill="black"/>
<circle cx="28" cy="16" r="2.25" fill="#E5E5E5" stroke="black" stroke-width="1.5"/>
<circle cx="28" cy="57" r="2.25" fill="#E5E5E5" stroke="black" stroke-width="1.5"/>
<path d="M45.25 36.567C45.5833 36.7594 45.5833 37.2406 45.25 37.433L42.25 39.1651C41.9167 39.3575 41.5 39.117 41.5 38.7321V35.2679C41.5 34.883 41.9167 34.6425 42.25 34.8349L45.25 36.567Z" fill="#1D1D1D"/>
</svg>
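One possible way to place an icon like this next to a heading in the repr, sketched under the assumption that the icon is embedded as an inline SVG string. `header_with_icon` and the placeholder `ICON` are hypothetical; the full graph icon markup from this comment would be passed in its place.

```python
import html


def header_with_icon(icon_svg, title):
    """Put an inline SVG icon to the left of a heading (hypothetical helper)."""
    # A flex row keeps the icon and the heading on one line, vertically centred.
    return (
        '<div style="display: flex; align-items: center; gap: 8px;">'
        f'<div style="flex: none;">{icon_svg}</div>'
        f'<h3 style="margin: 0;">{html.escape(title)}</h3>'
        "</div>"
    )


# Placeholder icon; the full graph icon markup would go here instead.
ICON = '<svg width="20" height="20" xmlns="http://www.w3.org/2000/svg"></svg>'
print(header_with_icon(ICON, "HighLevelGraph"))
```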
This is a very fun conversation to lurk on :)
Update: it turns out that
Here's a short example using the timeseries dataset:
from dask.datasets import timeseries
ddf = timeseries().shuffle("id", shuffle="tasks").head(compute=False)
layer_key = 'simple-shuffle-029255f702ea9cf087d6955375c8e9ad'
This is what we get when we look at the high level graph dependencies:
>>> ddf.dask.dependencies[layer_key]
{'make-timeseries-2d58a4d4763bdeb3eca63e8b5ffd6c3b'}
But when we use the layer:
>>> layer = ddf.dask.layers[layer_key]
>>> tuple_key = ('simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0)
>>> layer.get_dependencies(tuple_key, layer.keys())
{('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 0),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 1),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 2),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 3),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 4),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 5),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 6),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 7),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 8),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 9),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 10),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 11),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 12),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 13),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 14),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 15),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 16),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 17),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 18),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 19),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 20),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 21),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 22),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 23),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 24),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 25),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 26),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 27),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 28),
('split-simple-shuffle-029255f702ea9cf087d6955375c8e9ad', 0, 29)}
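A toy, pure-Python model of what may be going on. It assumes, based only on the key names, that the `split-simple-shuffle-*` tasks live inside the shuffle layer itself. If so, key-level dependencies can point at keys in the same layer, while the HLG's layer-level `dependencies` lists only other layers. All names and data below are shortened and hypothetical, and no dask code is used.

```python
# Hypothetical toy graph, loosely shaped like the shuffle example above.
# Key and layer names are shortened; none of this is real dask data.
layers = {
    "make-timeseries": {("make-timeseries", 0)},
    "simple-shuffle": {
        ("split-simple-shuffle", 0, 0),
        ("split-simple-shuffle", 0, 1),
        ("simple-shuffle", 0),
    },
}

# Key-level dependencies: key -> set of keys it needs
key_deps = {
    ("make-timeseries", 0): set(),
    ("split-simple-shuffle", 0, 0): {("make-timeseries", 0)},
    ("split-simple-shuffle", 0, 1): {("make-timeseries", 0)},
    ("simple-shuffle", 0): {
        ("split-simple-shuffle", 0, 0),
        ("split-simple-shuffle", 0, 1),
    },
}


def layer_of(key):
    """Return the name of the layer that owns `key`."""
    for name, keys in layers.items():
        if key in keys:
            return name
    raise KeyError(key)


def layer_dependencies(layer_name):
    """Layer-level view: other layers that this layer's keys depend on."""
    deps = set()
    for key in layers[layer_name]:
        for dep in key_deps[key]:
            owner = layer_of(dep)
            if owner != layer_name:  # edges inside the layer are dropped
                deps.add(owner)
    return deps


print(layer_dependencies("simple-shuffle"))  # {'make-timeseries'}
print(key_deps[("simple-shuffle", 0)])       # split-* keys from the same layer
```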
The default is to have the "Layers" bit expanded, so all the layer headings are visible but not the detailed layer information. TODO:
dask/highlevelgraph.py
0px;">{highlevelgraph_info}</p>
<div style="">
<details open style="margin-left: 0px;">
I did not know you could default them to open, that's awesome!
... I didn't know that either until I really wanted to do it! I'm very happy that turned out to be easy.
Sadly, this "details open" trick is causing problems with the test checking HTML validity. So I think we'll just get rid of it entirely for now.
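A small sketch of a `<details>`/`<summary>` helper with the default-open behaviour made optional. In HTML, `open` is a boolean attribute. `open="open"` is an equivalent spelling that stricter XML-style parsers also accept. Whether that satisfies the existing validity test depends on how the test parses the output, which is an assumption I haven't checked. `collapsible` is a hypothetical name.

```python
import html


def collapsible(summary, body_html, open_by_default=False):
    """Wrap body_html in a <details>/<summary> block (hypothetical helper)."""
    # `open` is a boolean attribute; open="open" is an equivalent spelling that
    # also survives stricter, XML-style parsers.
    open_attr = ' open="open"' if open_by_default else ""
    return (
        f"<details{open_attr}>"
        f"<summary>{html.escape(summary)}</summary>"
        f"{body_html}"
        "</details>"
    )


print(collapsible("Layers", "<p>layer-a</p>", open_by_default=True))
```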
What I'm thinking about now is how to get the layer names from within the layers themselves. From the high level graph, I just used the dictionary key for that particular layer. I'm not totally sure what the canonical way to get that information from inside the layer is yet. There's no requirement for layers to have a
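One way around layers not knowing their own names is to keep the name where it already lives, in the graph's dictionary key, and pass it down when rendering. Here is a minimal sketch with hypothetical `ToyLayer`/`ToyGraph` classes and a hypothetical `html_section` method. It is not dask's actual `Layer` API. Only the `_repr_html_` hook is the standard IPython/Jupyter convention.

```python
import html


class ToyLayer:
    """Hypothetical layer that does not know its own name."""

    def __init__(self, tasks):
        self.tasks = tasks  # mapping of key -> task

    def html_section(self, name):
        # The caller supplies the name (the graph's dictionary key)
        return f"<li>{html.escape(name)}: {len(self.tasks)} tasks</li>"


class ToyGraph:
    """Hypothetical container mirroring HighLevelGraph.layers (name -> layer)."""

    def __init__(self, layers):
        self.layers = layers

    def _repr_html_(self):
        # Jupyter/IPython calls _repr_html_ to get rich HTML output
        items = "".join(
            layer.html_section(name) for name, layer in self.layers.items()
        )
        return f"<ul>{items}</ul>"
```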
For tests I recommend that we ...
Whoops, errant send, trying again.
For tests I recommend that we ...
Clearly we can't test the visual aesthetics, but having a couple of basic tests here is likely to keep this code healthy in the long term.
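A couple of cheap structural checks are possible without testing aesthetics: the repr returns a string, it mentions every expected layer name, and its tags are balanced. Below is a sketch using only the standard library's `html.parser`. `check_html_repr` and `TagBalanceChecker` are hypothetical names, and this is not the test that ended up in dask.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags to flag mismatched or unclosed elements."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if not self.stack or self.stack[-1] != tag:
            self.errors.append(f"unexpected </{tag}>")
        else:
            self.stack.pop()


def check_html_repr(obj, expected_names):
    """Assert that obj._repr_html_() is a balanced HTML string naming each layer."""
    text = obj._repr_html_()
    assert isinstance(text, str)
    for name in expected_names:
        assert name in text, name
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    assert not checker.errors, checker.errors
    assert not checker.stack, f"unclosed tags: {checker.stack}"
```

Self-closing SVG tags such as `<circle .../>` go through `handle_startendtag`, which by default calls the start and end handlers in turn, so they balance correctly.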
I think that it would be good to have a nicer string here.
I wonder if there is a nicer way to show this information. For example, saying
These things could be in a table; they could also be in the summary. We can stuff more at-a-glance information there.
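A sketch of one way to make raw values friendlier before they go into a table. Booleans such as `is_materialized` become "yes"/"no", and collections such as `dependencies` become a sorted, comma-separated list. The formatting rules and the helper names `format_value`/`info_table` are hypothetical choices, not what dask does.

```python
import html


def format_value(value):
    """Turn raw layer-info values into friendlier strings (hypothetical rules)."""
    if isinstance(value, bool):
        return "yes" if value else "no"
    if isinstance(value, (set, frozenset, list, tuple)):
        return ", ".join(sorted(str(v) for v in value)) or "none"
    return str(value)


def info_table(info):
    """Render a flat dict as a two-column HTML table."""
    rows = "".join(
        f'<tr><th style="text-align: left;">{html.escape(str(k))}</th>'
        f'<td style="text-align: left;">{html.escape(format_value(v))}</td></tr>'
        for k, v in info.items()
    )
    return f"<table>{rows}</table>"


print(info_table({"is_materialized": False, "dependencies": {"b", "a"}}))
```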
dask/highlevelgraph.py
"is_materialized": layer.is_materialized(),
"dependencies": dependencies,
}
return info
Not important, but my preference would be to keep logic like this inlined in order to avoid indirection (blog post here: https://matthewrocklin.com/blog/work/2019/06/23/avoid-indirection )
However, really I'd prefer that we move this down to Layer subclasses (even more indirection, I acknowledge).
I agree, it should go with the Layer.
I don't agree that it should be inlined with the _repr_html_ function logic, because IMO that will make it harder for people making representations for layer subclasses. (Perhaps I misunderstand your point here, though.)
For someone writing a _repr_html_ function for a Layer subclass, it seems a lot easier to grab the default dictionary of interesting layer information with this function, then add more dictionary keys/values specific to that particular Layer subclass, and then turn that into a nice HTML string. If you only have the HTML string, you might end up either duplicating a lot of the logic here for each layer subclass, or faffing around removing HTML div tags and trying to insert the new extra information into the right place in the HTML string.
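The pattern described here can be sketched as follows. A base method returns a default info dict, a subclass extends it through `super()`, and one shared function renders whatever dict it gets. All class, method, and key names (`ToyLayer`, `layer_info_dict`, `shape`, `chunksize`, `info_to_html`) are hypothetical illustrations, not dask's real `Layer` API.

```python
import html


class ToyLayer:
    """Hypothetical base layer (not dask's real Layer class)."""

    def __init__(self, tasks, dependencies=()):
        self.tasks = dict(tasks)
        self.dependencies = set(dependencies)

    def layer_info_dict(self):
        # Default information that every layer can report
        return {
            "layer_type": type(self).__name__,
            "number of tasks": len(self.tasks),
            "dependencies": sorted(self.dependencies),
        }


class ToyArrayLayer(ToyLayer):
    """Hypothetical subclass that adds array-specific entries."""

    def __init__(self, tasks, shape, chunksize, dependencies=()):
        super().__init__(tasks, dependencies)
        self.shape = shape
        self.chunksize = chunksize

    def layer_info_dict(self):
        # Start from the shared defaults, then add subclass-specific keys
        info = super().layer_info_dict()
        info["shape"] = self.shape
        info["chunksize"] = self.chunksize
        return info


def info_to_html(info):
    """Single shared renderer: dict -> two-column HTML table."""
    rows = "".join(
        f"<tr><th>{html.escape(str(k))}</th><td>{html.escape(str(v))}</td></tr>"
        for k, v in info.items()
    )
    return f"<table>{rows}</table>"
```

The trade-off raised in the reply below still applies. A dict keeps subclasses to adding key/value rows, while richer per-layer output such as images would need its own rendering hook.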
I agree that duplication would ensue (we would have to address this with methods, functions, etc.).
However, it could open up larger variations than just different keys/values in the table. For example, we might include images like the chunk structure, or different coloring based on layer type.
Maybe this is a silly question, but how do I know what the number of nodes in a layer is?
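Dask's `Layer` is a `collections.abc.Mapping` from keys to tasks, so `len(layer)` is one way to count nodes. For lazily built layers this may force materialization, depending on how each layer type implements `__len__`. Below is a pure-Python sketch of the alternative: a hypothetical lazy layer whose length comes from its block grid without building any tasks. `ToyBlockwiseLayer` and its attributes are made up. The `(100, 100)` grid matches a 10000×10000 array with `chunks=(100, 100)`, as in the array example in this thread.

```python
import math
from collections.abc import Mapping
from itertools import product


class ToyBlockwiseLayer(Mapping):
    """Hypothetical lazy layer: one task per block of an n-d block grid."""

    def __init__(self, name, numblocks):
        self.name = name
        self.numblocks = numblocks  # e.g. (100, 100)
        self._materialized = None

    def _tasks(self):
        # Build every key -> task pair only when something actually needs them
        if self._materialized is None:
            self._materialized = {
                (self.name, *idx): ("task", idx)
                for idx in product(*(range(n) for n in self.numblocks))
            }
        return self._materialized

    def __getitem__(self, key):
        return self._tasks()[key]

    def __iter__(self):
        return iter(self._tasks())

    def __len__(self):
        # Number of tasks follows from the block grid; nothing is materialized
        return math.prod(self.numblocks)


layer = ToyBlockwiseLayer("random", (100, 100))
print(len(layer))  # 10000
```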
I love this! I feel like it's already helping me gain insight. I pushed it a little using different chunking with Matt's example:
import dask.array as da
x = da.random.random((10000, 10000), chunks=(100, 100))
y = x + x.T - x.mean(axis=0)
Some ideas:
Here's what it looks like now:
[Screenshot: Dataframe example]
[Screenshot: Array example]
Note: there's something odd on my machine making the little circle icons not line up with the headings. I also don't seem to get the little arrows indicating you can expand the collapsible details. However, Jacob, Matt, and Julia all have really pretty screenshots in this thread, so I think it's probably fine for everyone else.
Don't be sorry, it's fun to tinker.
Sure, sounds good. Here's a summary of the changes I've made in the last round of edits:
Here's what didn't change:
Yeah, I just tried this myself. I expected it to work because of this:
I suspect that the IPython display function must have special-cased this, causing the confusion.
I'm happy to withdraw my request.
Tests seem to be sad. Happy to merge otherwise, though.
Just merged
FWIW we have a
In [1]: import dask.array as da
In [2]: from dask.utils import typename
In [3]: typename(da.Array)
Out[3]: 'dask.array.core.Array'
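Since `dask.utils.typename` already exists, this is only an illustration of the idea behind a module-qualified type name for layer headings. It is not dask's implementation. `qualified_typename` is a hypothetical name.

```python
from collections import OrderedDict


def qualified_typename(obj):
    """Hypothetical helper: 'module.QualName' for a type or an instance's type."""
    typ = obj if isinstance(obj, type) else type(obj)
    return f"{typ.__module__}.{typ.__qualname__}"


print(qualified_typename(OrderedDict))    # collections.OrderedDict
print(qualified_typename(OrderedDict()))  # collections.OrderedDict
print(qualified_typename(3))              # builtins.int
```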
I see what @jsignell means about the titles. Perhaps for the layers we should drop the size down to
Let's merge this in. We can iterate.
I didn't check what it looks like when the Jupyter notebook is in dark mode; that's probably another useful thing to try.
I've opened #7809 to add dark mode support.
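One common approach to dark mode, not necessarily what #7809 does, is to avoid hard-coded stroke colours like `#1D1D1D` and let SVG outlines inherit the surrounding text colour via CSS `currentColor`. A minimal sketch with a hypothetical `themed_circle` helper:

```python
def themed_circle(size=20):
    """Hypothetical icon whose outline follows the surrounding text colour."""
    half = size // 2
    # currentColor resolves to the CSS `color` of the enclosing element, so the
    # outline switches along with the notebook's text colour in light/dark themes.
    return (
        f'<svg width="{size}" height="{size}" xmlns="http://www.w3.org/2000/svg">'
        f'<circle cx="{half}" cy="{half}" r="{half - 1}" '
        'fill="none" stroke="currentColor" stroke-width="2"/>'
        "</svg>"
    )
```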
Builds on #7763 and adds dark mode support.
Here's an implementation of a repr_html() for the HighLevelGraph layer. It should give us a nice starting point to discuss what information we do or don't want displayed, and how we want it to look.
black dask
flake8 dask
isort dask