Google Summer of Code 2021 Project #107

Merged 8 commits on Aug 23, 2021
116 changes: 116 additions & 0 deletions _posts/2021-08-23-gsoc-2021-project.md
@@ -0,0 +1,116 @@
---
layout: post
title: GSoC 2021 Project
author: Freyam Mehta
theme: twitter
---

{% include JB/setup %}

This post covers the work I did during Google Summer of Code 2021. Dask took part in the program under the NumFOCUS umbrella organization.

Google Summer of Code is a global program focused on bringing more student developers into open source software development. Students work with an open source organization on a 10-week programming project during their break from school.

## Contents

- [Visualizing the Performance Characteristics of Computations](#visualizing-the-performance-characteristics-of-computations)
- [Graphical Representation of Task Graphs](#graphical-representation-of-task-graphs)
- [HTML Representation of Task Graphs](#html-representation-of-task-graphs)

## Visualizing the Performance Characteristics of Computations

## Graphical Representation of Task Graphs

I worked on enhancing the Graphviz output of the task graphs. Graphviz offers a rich set of attributes that can be modified to produce more visually appealing output. I used them to make the graphs more illustrative, engaging, and informative.

### Fixing calling `.visualize()` with `filename=None` [#7740](https://github.com/dask/dask/pull/7740)

I fixed a minor bug that was triggered when users called `dask.visualize()` with `filename=None`. The fix adds an extra condition before the line that raised the error: if the format is still `None` at that point, it is set to the default, `png`.

```python
import dask
import dask.array as da

array = da.arange(10)
dask.visualize(array, filename=None)
```

<img src="/images/gsoc21/7740.png" alt="#7740 Demo" height=414 width=736>
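
The fallback described above can be sketched in isolation. This is a minimal illustration and not the code from the pull request: the helper name `resolve_format`, the `KNOWN_FORMATS` tuple, and the extension-inference step are assumptions made for the example.

```python
import os

# Hypothetical helper, NOT Dask's actual code: it only illustrates the fallback.
KNOWN_FORMATS = ("png", "pdf", "dot", "svg", "jpeg", "jpg")
DEFAULT_FORMAT = "png"


def resolve_format(filename, format=None):
    """Return the output format to use for a rendered graph."""
    if format is None and filename is not None:
        # Try to infer the format from the file extension, e.g. "graph.svg".
        extension = os.path.splitext(filename)[1].lstrip(".").lower()
        if extension in KNOWN_FORMATS:
            format = extension
    if format is None:
        # The extra condition: nothing was given or inferred, so use the default.
        format = DEFAULT_FORMAT
    return format


print(resolve_format(None))                   # png
print(resolve_format("graph.svg"))            # svg
print(resolve_format("graph", format="pdf"))  # pdf
```

With the guard in place, the `filename=None` case no longer reaches the rendering step with a missing format.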

### Add node size scaling to the Graphviz output for the high level graphs [#7869](https://github.com/dask/dask/pull/7869)

I tweaked the node sizes in the high-level graph visualization to reflect the number of tasks in each layer, so layers with more tasks appear larger than the rest.

```python
import dask.array as da

array = da.random.random((10000, 10000), chunks=(200, 200))
result = array + array.T - array.mean(axis=0)

result.dask.visualize()
```

<img src="/images/gsoc21/7869.png" alt="#7869 Demo" height=414 width=736>
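
One way to picture the scaling is a small function that maps task counts to node sizes. This is only a sketch under assumptions: the logarithmic scaling, the size range, the function name `scaled_node_sizes`, and the layer names and counts are all made up for illustration and may differ from what the pull request does.

```python
import math


def scaled_node_sizes(task_counts, min_size=20, max_size=60):
    """Map each layer's task count to a node size in [min_size, max_size].

    Hypothetical scaling rule: sizes grow with the logarithm of the task
    count so that very dense layers do not dwarf everything else.
    """
    logs = {name: math.log10(max(count, 1)) for name, count in task_counts.items()}
    largest = max(logs.values(), default=0)
    sizes = {}
    for name, value in logs.items():
        # Fraction of the way between the smallest and largest possible size.
        fraction = value / largest if largest > 0 else 0.0
        sizes[name] = round(min_size + fraction * (max_size - min_size))
    return sizes


# Made-up layer names and task counts, for illustration only.
counts = {"random": 2500, "transpose": 2500, "mean": 50, "subtract": 2500}
sizes = scaled_node_sizes(counts)
print(sizes)
```

For a real collection, the per-layer task counts could be read from the high-level graph, for example with `{name: len(layer) for name, layer in result.dask.layers.items()}`.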

### Change graphviz font family to sans [#7931](https://github.com/dask/dask/pull/7931)

I changed the font of the Graphviz graphs to `Helvetica` (a sans-serif typeface), which is cleaner and more readable than the default `Times-Roman` (a serif typeface).

```python
import dask.array as da

array = da.ones((10, 10), chunks=(5, 5))
array = array + 100
array = array * 100

array.dask.visualize()
```

<img src="/images/gsoc21/7931.png" alt="#7931 Demo" height=414 width=736>
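
In Graphviz, the font is controlled by the `fontname` attribute. The sketch below builds DOT source by hand to show where that attribute goes; the `to_dot` helper and the node names are hypothetical, and Dask itself builds its graphs through the `graphviz` Python package rather than with string formatting like this.

```python
def to_dot(edges, fontname="Helvetica"):
    """Build DOT source for a directed graph with one font for all nodes."""
    lines = [
        "digraph G {",
        # A `node [...]` statement sets default attributes for every node.
        f'    node [fontname="{fontname}"];',
    ]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Made-up node names, for illustration only.
dot_source = to_dot([("ones", "add"), ("add", "mul")])
print(dot_source)
```

The resulting text can be rendered with the `dot` command-line tool if Graphviz is installed.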

### Add tooltips to graphviz [#7973](https://github.com/dask/dask/pull/7973)

I added tooltips to the High-level graph's task graphs. The tooltips show the `layer_type`, `number of tasks`, and the information stored in the `collection_annotations` dictionary for each layer.

```python
import dask.array as da

x = da.ones((10, 10), chunks=(5, 5))
x = x + 100
x = x * 100

x.dask.visualize()
```

<img src="/images/gsoc21/7973.png" alt="#7973 Demo" height=414 width=736>
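
Graphviz nodes accept a `tooltip` attribute, which is shown on hover in formats such as SVG. A tooltip string with the three pieces of information listed above could be assembled roughly as follows. The function name `layer_tooltip`, the exact text layout, and the sample values are assumptions for illustration, not the code from the pull request.

```python
def layer_tooltip(layer_type, n_tasks, collection_annotations=None):
    """Build a multi-line tooltip string for one layer of a high-level graph."""
    lines = [
        f"layer_type: {layer_type}",
        f"number of tasks: {n_tasks}",
    ]
    # Append every annotation as its own "key: value" line, if any exist.
    for key, value in (collection_annotations or {}).items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Made-up layer information, for illustration only.
tooltip = layer_tooltip(
    "Blockwise",
    4,
    {"shape": (10, 10), "dtype": "float64"},
)
print(tooltip)
```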

### Add colors to represent high level layer types [#7974](https://github.com/dask/dask/pull/7974)

<img src="/images/gsoc21/7974.png" alt="#7974 Demo" height=414 width=736>

## HTML Representation of Task Graphs

I also worked on creating/enhancing HTML reprs for existing classes.
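
For readers unfamiliar with HTML reprs: Jupyter displays an object using its `_repr_html_` method when one is defined. The class below is a hypothetical stand-in written for this example; it is not one of the Dask classes discussed in this section.

```python
import html


class ConnectionSettings:
    """Hypothetical class used only to illustrate an HTML repr."""

    def __init__(self, **settings):
        self.settings = settings

    def _repr_html_(self):
        # Escape keys and values so arbitrary text cannot break the markup.
        rows = "".join(
            f"<tr><th>{html.escape(str(key))}</th>"
            f"<td>{html.escape(str(value))}</td></tr>"
            for key, value in self.settings.items()
        )
        return f"<table>{rows}</table>"


settings = ConnectionSettings(encrypted=True, host="<local>")
print(settings._repr_html_())
```

In a notebook, evaluating `settings` as the last expression in a cell would render the table instead of the plain-text repr.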

### Add `dask.array` SVG to the HTML Repr [#7886](https://github.com/dask/dask/pull/7886)

I added the Dask Array SVG of the chunks to the HTML Repr of the High-level graph by calling the `dask.array.svg.svg()` function.

```python
import dask.array as da

array = da.ones((10, 20), chunks=(5, 10))
array = array.T

array.dask
```

<img src="/images/gsoc21/7886.png" alt="#7886 Demo" height=414 width=736>

### Add HTML Repr for `Security` Class [#5178](https://github.com/dask/dask/pull/5178)

<img src="/images/gsoc21/5178.png" alt="#5178 Demo" height=414 width=736>

### Add HTML Repr for `ProcessInterface` Class [#5181](https://github.com/dask/dask/pull/5181)

<img src="/images/gsoc21/5181.png" alt="#5181 Demo" height=414 width=736>
Binary file added images/gsoc21/7740.png
Binary file added images/gsoc21/7869.png
Binary file added images/gsoc21/7886.png
Binary file added images/gsoc21/7931.png
Binary file added images/gsoc21/7973.png