Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GS:Visualization and Plots #3050

Merged
merged 69 commits into from
Apr 27, 2022
Merged

Add GS:Visualization and Plots #3050

merged 69 commits into from
Apr 27, 2022

Conversation

iesahin
Copy link
Contributor

@iesahin iesahin commented Nov 30, 2021

This is mostly for discussion after the Notion experimentation went "we wanted the best, but it turned out like always" :)

I'll develop this as we discuss the scope of changes.

Closes #2925
Closes #3309

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw November 30, 2021 15:40 Inactive
@iesahin
Copy link
Contributor Author

iesahin commented Dec 1, 2021

The following are @shcheklein 's comments in Notion. Some of these are more related to #3051 than this,

It's not to my mind. The fact that we use DVCLive to produce metric files (and we should be doing this for live and for regular scalar metrics, confusion matrices, etc) doesn't mean we even have to mention it here.

The focus of this document (if it's a replacement of the exisiting get started) is to show the general mechanism of codifying metrics as files. Show regular commands - dvc plots show and only after that live metrics as an advanced example.

Or even if we start with live metrics (to emphasize the experiments tracking, we should still mention regular metrics).

I would also keep the existing metrics section in the regular NLP get started,. at least for now. I don't see any reasons to touch it.

If the project will use DVCLive, then there is no natural point that we'll be using ˋdvc metrics.ˋ

DVCLive should be used as an utility to dump regular metrics / plots files.

I think dvc metrics command is more related to pipelines

I don't any reason for this. The current get started metrics are about comparing experiments.

Also, it's not only about dvc metrics , it's about dvc plots . If we are removing the existing section we should have a hollistic approach to present those here.

I don't see any reasons to not include different types of metrics and plots in the project and briefly explain those here.

@iesahin
Copy link
Contributor Author

iesahin commented Dec 1, 2021

I don't intend to cover metrics here. In my plan this page will be related to plots with experiments.

@shcheklein
Copy link
Member

I don't intend to cover metrics here. In my plan this page will be related to plots with experiments.

sounds good, esp considering that metrics are covered by the first part in this case

@shcheklein
Copy link
Member

I assume that we keep the metrics, plots, etc section in the data trail as well, right?

@shcheklein
Copy link
Member

@iesahin thanks Emre, that's exactly right approach to the iteration. Makes it way easier to review and get on the same page before we jump into implementation.

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw December 14, 2021 12:14 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw December 14, 2021 12:26 Inactive
@iesahin
Copy link
Contributor Author

iesahin commented Dec 14, 2021

added some content to this for you to see the flow concretely @shcheklein @dberenbaum

@iesahin
Copy link
Contributor Author

iesahin commented Dec 14, 2021

BTW, could we remove style checks in draft PRs? @shcheklein My setup in WSL seems not to run the commit hook, and I forget to run !yarn format % before commits.

@shcheklein
Copy link
Member

BTW, could we remove style checks in draft PRs?

I think so, there is a ready_for_review event for this. I would confirm with team this though - some people might benefit from using CI even on draft.

@iesahin
Copy link
Contributor Author

iesahin commented Dec 15, 2021

I think so, there is a ready_for_review event for this. I would confirm with team this though - some people might benefit from using CI even on draft.

On second thought, not a big deal. Thanks.

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw December 15, 2021 15:47 Inactive
@dberenbaum
Copy link
Contributor

Regarding how to introduce dvclive, I'm not sure whether it's better as a 3rd type of plot, or as a different way to generate the other types of plots. I think it's fine in this form and we can iterate later since there's no clear right way.

In case it seems more compelling, the alternative is to explain plots as data series or images, and mention dvclive briefly as the way that those plots files were generated here. This makes dvclive more of a helper utility and less of an entirely separate concept. For example:

  1. Show the acc.tsv or similar from the dvclive output. Mention that it was generated by the dvclive callback, but this data can be generated manually as well.
  2. Show misclassified examples or other images. Again mention that they were logged by dvclive but can be generated manually.

@@ -0,0 +1,100 @@
# Visualization with DVC Plots

In parallel with versioning parameters and metrics by associating them with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early to edit language, but this paragraph is too complicated for the intro.

  • "multivariate outputs"
  • DVC allows to generate plots - I think DVC also captures them, right? (keeps track)
  • In parallel with versioning parameters and metrics - this means that it should go right after experiments show/run?

@shcheklein
Copy link
Member

Looks good overall for me, Emre. One question is the right place for this + properly integrate into the project that we already use (if it's not yet).

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw December 17, 2021 17:22 Inactive
@iesahin
Copy link
Contributor Author

iesahin commented Dec 28, 2021

Do you think is it better to cover parallel coordinates plot here? @shcheklein @dberenbaum

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw December 28, 2021 16:49 Inactive
@iesahin
Copy link
Contributor Author

iesahin commented Dec 29, 2021

@dberenbaum
Copy link
Contributor

Do you think is it better to cover parallel coordinates plot here? @shcheklein @dberenbaum

IMO no. We can't squeeze every visualization into GS, and this one is particular to comparing experiments, so it seems like an odd fit.

@iesahin
Copy link
Contributor Author

iesahin commented Jan 4, 2022

The misclassification visualization is something like.
confusion

I think this one is better but requires a few more iterations.

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-lamemw January 5, 2022 13:24 Inactive
@iesahin
Copy link
Contributor Author

iesahin commented Jan 10, 2022

I've updated the project to use MNIST instead of Fashion-MNIST and the misclassification examples look like:

SmartSelect_20220110-171922_Firefox

I think this one is easier to interpret.

@iesahin
Copy link
Contributor Author

iesahin commented Jan 12, 2022

I'll wait for #3051 to be reviewed/merged in separate PR before going on to this. If you think we can proceed in parallel, please let me know. @shcheklein @dberenbaum @jorgeorpinel

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-gs-visu-60h6xb April 13, 2022 15:42 Inactive
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more suggestions. Only the last question about the dvclive image worries me a little, everything else is minor. Thanks again @iesahin -- we're almost there 👍🏼

content/docs/start/visualization.md Outdated Show resolved Hide resolved
content/docs/start/visualization.md Outdated Show resolved Hide resolved
content/docs/start/visualization.md Outdated Show resolved Hide resolved
content/docs/start/visualization.md Outdated Show resolved Hide resolved
content/docs/start/visualization.md Outdated Show resolved Hide resolved
model. You can see the report in `training_metrics/index.html` with your
browser:

![dvclive](/img/start_visualization_dvclive.png)
Copy link
Contributor

@jorgeorpinel jorgeorpinel Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
![dvclive](/img/start_visualization_dvclive.png)
![DVCLive plots](/img/start_visualization_dvclive.png)

The image is not readable though. Can the fonts be larger?

And maybe the top part with dvclive.json sample should be a regular code block (under the image) with a few rows of the data instead of just one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same in the documentation: https://dvc.org/doc/dvclive/dvclive-with-dvc#html-report

Zooming in to make the axis labels readable makes the whole chart so large. How could we solve this @daavoo ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still change the label as suggested for now.

same in the documentation: https://dvc.org/doc/dvclive/dvclive-with-dvc#html-report

Yes we have that problem in a few places but at least those are GIFs and you can see changing plots so maybe it doesn't matter that much that the text is readable in some other cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to create separate issue or deal with this in #3455

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be tracked in #3470

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but still the image caption could be better. It's helpful to have descriptive image captions for SEO and even image searches.

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minor things I'll commit myself)

content/docs/start/visualization.md Outdated Show resolved Hide resolved
content/docs/start/visualization.md Outdated Show resolved Hide resolved
content/docs/start/visualization.md Outdated Show resolved Hide resolved
@jorgeorpinel
Copy link
Contributor

Ping 🙂 @iesahin ☝🏼 please see #3050 (review)

@iesahin iesahin mentioned this pull request Apr 20, 2022
2 tasks
iesahin added a commit that referenced this pull request Apr 25, 2022
@iesahin
Copy link
Contributor Author

iesahin commented Apr 25, 2022

Closing in favor of #3455

@iesahin iesahin closed this Apr 25, 2022
@iterative iterative deleted a comment from iesahin Apr 26, 2022
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Apr 26, 2022

Closing in favor of #3455

Not a great strategy. Makes reviewing that other PR much more complicated. Removes my contributions from commit history... Etc.

Let's reopen and finish this PR, and then reorganize once all the material is in?

@jorgeorpinel jorgeorpinel reopened this Apr 26, 2022
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatsby-cloud
Copy link

gatsby-cloud bot commented Apr 26, 2022

Gatsby Cloud Build Report

dvc.org

🎉 Your build was successful! See the Deploy preview here.

Build Details

View the build logs here.

🕐 Build time: 3m

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Apr 26, 2022

OK @iesahin up to you if you want to resolve the above comments and

a) close this one but copy/paste the final result into https://github.com/iterative/dvc.org/pull/3455/files#diff-e852faa513d0975fe637b21cd31c7530b2ae1d16cfd6e9e6132128c3397c9e03

or

b) merge this, and then merge master into #3455 + move the file to it's new location.

Whatever is easier for you. Thanks again

@iesahin iesahin merged commit db3ef44 into master Apr 27, 2022
@iesahin iesahin deleted the iesahin/gs-visualization branch April 27, 2022 14:17
iesahin added a commit that referenced this pull request Apr 28, 2022
* reorg #3 for trails

* shortened data/ urls

* renamed access

* renamed pipelines

* redirects from older URLs to new doc/start/data/ URLs

* added trail section titles and minor revision

* deleting forgotten metrics file

* update visualization from #3050

* Update content/docs/sidebar.json

Co-authored-by: Jorge Orpinel <[email protected]>

* Update redirects-list.json

Co-authored-by: Jorge Orpinel <[email protected]>

* renamed to experiments

* moved visualization to experiments/

* Update content/docs/start/index.md

Co-authored-by: Jorge Orpinel <[email protected]>

* reverting the sidebar titles

* Restyled by prettier (#3471)

Co-authored-by: Restyled.io <[email protected]>

* renamed the directory

Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants