Google Summer of Code 2021 Project #107
From our Slack discussion, here's the structure I think we should have for the dask-blog post:
- Section 1: Visualizing high level graphs
- Section 2: HTML representation

All of these (except the bugfix) will need nice before and after screenshots. Putting those together would be a fantastic start (feel free to make a new folder inside the dask-blog/images directory so they're all grouped in one place).
Co-authored-by: Genevieve Buckley <[email protected]>
We talked earlier about the difference in audience and purpose between your Medium blog post and this one.
This draft reads much more like (1) than (2), with a lot of first-person sentences ("I worked...", "I changed...", "I tweaked..."). We'll probably want to adjust it to better suit the second audience.
I agree. I will edit accordingly.
General suggestions:
BTW, I'm happy to write or rewrite text content, and will probably do some of this before we publish the final piece.
Hiii Genevieve, I read the draft you just pushed to the branch. It's amazing 💯! I had also made some adjustments and tweaks of my own yesterday. It's not much, and compared to yours it feels quite unprofessional, so I think we should go with yours. I already have the images and some extra text ready, and will commit them when I get to my laptop 😀
@freyam - there are still some important to-do items listed here, mostly involving adding the rest of the demonstration examples. @jacobtomlinson - you might like to take a brief look over some of this (most relevant to your interests is the second section on HTML representations). No worries if you're busy though.
Updated ✔️
Just some small thoughts.
Looks great to me
- Dataframe shuffles are particularly expensive operations. You can [read more about this here](https://docs.dask.org/en/latest/dataframe-best-practices.html#avoid-full-data-shuffling).
- Reading and writing data to/from storage/network services is often high-latency and therefore a bottleneck.
- Blockwise layers are generally efficient for computation.
- All layers are materialized during computation.
I don't know if we should write more about materialized layers here. I can't think of a good way to say:
- ideally we won't see many materialized layers before `compute()` is called, but we might see some and that's ok
- but you might also accidentally materialize layers without meaning to, perhaps by counting the number of tasks or looking at the HTML repr (which in turn counts the number of tasks)
- and fixing that is a job for dask developers, not dask users
I think on balance this might be more confusing than helpful. If anyone has ideas or thoughts around this I'd be interested to hear them.
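For reference while we decide, here is a minimal sketch of how a reader could check which layers of a high level graph are materialized, assuming a dask version (roughly 2021.x or later) where `HighLevelGraph.layers` and `Layer.is_materialized()` are available. The array shapes and chunk sizes are arbitrary. Whether counting tasks actually materializes a given layer type depends on the dask version, so treat the before/after comparison as illustrative rather than guaranteed.

```python
import dask.array as da

# Build a small computation; each array operation adds a layer to the high level graph
x = da.ones((1000, 1000), chunks=(100, 100))
y = (x + 1).sum()

hlg = y.__dask_graph__()  # a dask.highlevelgraph.HighLevelGraph

# Record which layers are materialized before anything counts the tasks
before = {name: layer.is_materialized() for name, layer in hlg.layers.items()}

# Counting tasks calls len() on the graph; depending on the dask version,
# this can materialize layers as a side effect (the accidental case above)
n_tasks = len(hlg)

after = {name: layer.is_materialized() for name, layer in hlg.layers.items()}

for name, layer in hlg.layers.items():
    print(f"{type(layer).__name__:>25}  before={before[name]}  after={after[name]}  {name}")
print("number of tasks:", n_tasks)
```

If a layer flips from `before=False` to `after=True`, that is the kind of unintended materialization that would be for dask developers, not users, to fix.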
Thank you @martindurant and @jacobtomlinson! If either of you has thoughts about this point https://github.com/dask/dask-blog/pull/107/files#r692726228 before then, let me know.
💛
I will be writing about my summer work with @GenevieveBuckley and @martindurant on the different representations of Dask computations.
This has been part of the annual Google Summer of Code program, where students get the opportunity to work with mentors on large-scale projects.
The blog post will contain a list of all the merged work and explain what the new features mean for users 🚀