start: integrate (updated) params/metrics/plots to Experiments Trail #2925

Closed
3 tasks
iesahin opened this issue Oct 14, 2021 · 23 comments · Fixed by #3050
Labels
C: start Content of /doc/start p1-important Active priorities to deal within next sprints

Comments

@iesahin
Contributor

iesahin commented Oct 14, 2021

We need to update https://dvc.org/doc/start/metrics-parameters-plots after the Experiments Trail.

Related #2479
Related #2496
Related #2574

Points to discuss

  • The current document is a continuation of the pipelines document. Instead, it needs to be built on Experiments.
  • Code blocks with dvc run will be rewritten in dvc stage add and dvc exp run. Or, as the train stage is already there, we can just skip these "stage" discussions and move to params and metrics, plots within experiments.
  • How deep should we go into plotting?
  • It's possible to move to DVCLive and checkpoints quickly with plots and metrics.

Tentative Plan

  • start: Add a model.dense_units parameter to Experiments Trail in a later section to show how to add params
  • start: Add a section about metrics to the Experiments Trail
  • start: Add a section about plots and show how to generate plots from output files
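
As a rough illustration of the three plan items above, the sketch below shows how a training script might read a `model.dense_units` parameter, write scalar metrics, and write a per-epoch data series that could later be plotted. Everything here is an assumption made for illustration: the file names (`params.json`, `metrics.json`, `logs.csv`), the `train.epochs` parameter, and the fake "training" loop are hypothetical, and JSON is used for the params file only to keep the example free of third-party dependencies (DVC's default params file is `params.yaml`).

```python
import csv
import json
import tempfile
from pathlib import Path


def train(workdir: Path) -> dict:
    """Toy stand-in for a training stage: params in, metrics and plot data out."""
    # Hypothetical params file; a real project would more likely use params.yaml.
    params = json.loads((workdir / "params.json").read_text())
    dense_units = params["model"]["dense_units"]
    epochs = params["train"]["epochs"]

    # Fake "training": the loss shrinks every epoch, faster with more units.
    history = []
    loss = 1.0
    for epoch in range(1, epochs + 1):
        loss *= 1.0 - min(0.9, dense_units / 1000)
        history.append({"epoch": epoch, "loss": round(loss, 4)})

    # Scalar metrics go into a small JSON file.
    metrics = {"loss": history[-1]["loss"], "epochs": epochs}
    (workdir / "metrics.json").write_text(json.dumps(metrics, indent=2))

    # The per-epoch series goes into a CSV file, one row per epoch.
    with open(workdir / "logs.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["epoch", "loss"])
        writer.writeheader()
        writer.writerows(history)
    return metrics


def demo(dense_units: int = 128, epochs: int = 3) -> tuple:
    """Run the toy stage in a temporary directory and return (metrics, rows)."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        params = {"model": {"dense_units": dense_units}, "train": {"epochs": epochs}}
        (workdir / "params.json").write_text(json.dumps(params))
        metrics = train(workdir)
        with open(workdir / "logs.csv", newline="") as f:
            rows = list(csv.DictReader(f))
    return metrics, rows


if __name__ == "__main__":
    print(demo())
```

In a setup like this, the params file, the metrics file, and the data-series file are the pieces a stage definition in dvc.yaml would reference as params, metrics, and plots respectively.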

Decisions

@iesahin iesahin self-assigned this Oct 14, 2021
@iesahin
Contributor Author

iesahin commented Oct 18, 2021

I've added some discussion points and a tentative plan for updates to the Experiment trail for params/metrics/plots. Could you review? @shcheklein @dberenbaum @jorgeorpinel

@iesahin iesahin added the status: research Writing concrete steps for the issue label Oct 18, 2021
@dberenbaum
Contributor

Looks good to me.

We can probably skip the dvc metrics show/diff and dvc params diff commands since the dvc exp show/diff commands mostly supersede these.

  • Code blocks with dvc run will be rewritten in dvc stage add and dvc exp run. Or, as the train stage is already there, we can just skip these "stage" discussions and move to params and metrics, plots within experiments.

Yup, I think we could probably skip adding the stage or address it briefly within the initial experiment setup.

  • How deep should we go into plotting?

It's probably too deep currently IMO. One way to simplify would be to start with an image file rather than a data file as the output, but it doesn't show off as much of what DVC can do, so I'm not sure. Probably easiest to start writing and then get feedback.
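
For readers unfamiliar with what the show/diff family of commands mentioned in this comment reports, here is a purely conceptual sketch of comparing the metrics of two experiments. It is not DVC's implementation; the function name `diff_metrics` and the dictionary layout are hypothetical and only illustrate the idea of reporting old value, new value, and change per metric.

```python
def diff_metrics(baseline: dict, experiment: dict) -> dict:
    """Return {name: (old, new, change)} for numeric metrics present in both runs."""
    changes = {}
    # Only metrics that exist in both dictionaries can be compared.
    for name in sorted(baseline.keys() & experiment.keys()):
        old, new = baseline[name], experiment[name]
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            changes[name] = (old, new, new - old)
    return changes


if __name__ == "__main__":
    baseline = {"acc": 0.5, "loss": 1.0}
    experiment = {"acc": 0.75, "loss": 0.5}
    for name, (old, new, change) in diff_metrics(baseline, experiment).items():
        print(f"{name}: {old} -> {new} ({change:+})")
```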

@shcheklein
Member

Before we jump into coding (writing), could we outline here some specifics:

  • new section names
  • their structure
  • summary of the content (especially the intro, how they are connected as a story, etc.)

Also, what about dvclive? We need to cover that part as well.

If we are removing metrics and plots from the data management trail, we need to cover visualization, I guess? Does it make sense to do that here or in a separate trail (Model Management?).

@iesahin iesahin added the C: start Content of /doc/start label Oct 20, 2021
@jorgeorpinel jorgeorpinel changed the title start: Update and integrate params/metrics/plots to Experiments Trail start: integrate (updated) params/metrics/plots to Experiments Trail Oct 24, 2021
@jorgeorpinel jorgeorpinel added the p1-important Active priorities to deal within next sprints label Oct 24, 2021
@iesahin
Contributor Author

iesahin commented Oct 24, 2021

I have added a document-to-discuss to #2961. I believe it's better to discuss on a document with a concrete context. @shcheklein

@jorgeorpinel
Contributor

Agree to decreasing the story continuation from pipelines but it's OK if the code samples do continue it I think? You need to be aware of stage definitions (dvc.yaml) for params and metrics/plots usage.

Agree to avoid stage definitions in this page (and migrate from run to stage add+repro in the pipelines one).

it needs to be built on Experiments.

Why though? Could it cover both basic params/metrics as a stand-alone feature and then move on to the more advanced (albeit probably more useful) exp-based use? Agree to use exp show/diff only (maybe just mention metrics/plots alternatives).

How deep should we go into plotting?

Probably focus on metrics and just show that you can plot data-series metrics + mention image support.

move to DVCLive and checkpoints quickly with plots and metrics

DVCLive makes sense to use here but probably best to avoid explaining anything about checkpoints here (link or assume if needed).

@shcheklein
Member

@iesahin I see some points that you added to the existing document, but they do not answer the questions I had (at least I can't figure it out; maybe I'm missing something). I would start with some very basic things: where we write this, how it is connected to the other parts, and what we cover.

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

Before we jump into coding (writing), could we outline here some specifics:

new section names
their structure
summary of the content (especially the intro, how they are connected as a story, etc.)

I believe that even if we discuss these here, we'll revise them once we have a concrete document. Writing the summary and the storyline is not that different from writing the document itself. #2961 is about iterating on these. The notes in the document are actually meant to specify these.

Also, what about dvclive? We need to cover that part as well.

We can cover it here.

If we are removing metrics and plots from the data management trail?

Params/metrics/plots are more related to experimentation than data management. Moving the topic to exp. trail makes sense to me.

We need to cover visualization, I guess? Does it make sense to do that here or in a separate trail (Model Management?).

Data visualization is covered in the plots topic to some extent. The current documentation is not that deep and doesn't touch on images as plots, but these may be considered advanced. In how much detail should we cover visualization? What's the extent of the topic in your mind? @shcheklein

I think visualization is closer to experimentation than model management. In the latter we'll touch these models/plots/metrics as artifacts and won't cover their content. At least this is what I understand from model management.

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

Agree to decreasing the story continuation from pipelines but it's OK if the code samples do continue it I think? You need to be aware of stage definitions (dvc.yaml) for params and metrics/plots usage.

Agree to avoid stage definitions in this page (and migrate from run to stage add+repro in the pipelines one).

I think code samples must continue on top of the experiments, not pipelines, and they may need an overhaul. I don't intend to make unnecessary updates, but if we base p/m/p on top of the experiments, most of the samples may need to be updated.

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

it needs to be built on Experiments.

Why though? Could it cover both basic params/metrics as a stand-alone feature and then move on to the more advanced (albeit probably more useful) exp-based use? Agree to use exp show/diff only (maybe just mention metrics/plots alternatives).

The decision in #2496 was to create three trails.

  • Data Management
  • Pipelines
  • Experiments

Params/metrics/plots look closer to experiments in this split. We can link or refer to pipelines from there, but these are mostly related to experimentation. As we progress towards dvc exp init and "pipeline-free experiments", pipelines become secondary to p/m/p.

Experiments are not an "advanced feature"; at least, we aim to make them as beginner-friendly as possible. I believe this was the reason behind most of the "content-pruning" in the exp. trail.

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

How deep should we go into plotting?

Probably focus on metrics and just show that you can plot data-series metrics + mention image support.

This is also ~what I have in mind, and possibly @dberenbaum will agree to this. WDYT @shcheklein ?

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

move to DVCLive and checkpoints quickly with plots and metrics

DVCLive makes sense to use here but probably best to avoid explaining anything about checkpoints here (link or assume if needed).

This is also a good point to keep in mind. Thank you @jorgeorpinel

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

What I have in mind as a structure in Get Started / Experiments Trail is something like this:

Experiments Trail/
- Introduction (current GS/Experiments)
- Comparing and Persisting (current section in GS/Experiments)
- Parameters
- Metrics 
- Visualization
- DVCLive
- Checkpoints?

Each of these will be a standalone document that can be linked from other trails. So, when we want to discuss parameters in pipelines trail, we'll link to the parameters here.

These can be sections on a single page or we can have Get Started With Experiments, Get Started with Pipelines, Get Started with Data Management instead of the single Get Started, and have the above as titles in standalone documents.

WDYT @shcheklein @dberenbaum @jorgeorpinel

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

I'll create a sample for separate structure in #2961. We can discuss after it.

@iesahin
Contributor Author

iesahin commented Oct 26, 2021

I've updated #2961 and split the current docs into separate documents.

image

You can see the proposed structure in the deployment: https://dvc-org-iesahin-gs-metr-bvmgk2.herokuapp.com/doc/start/experiments

@shcheklein
Member

My initial take: this is way too many sections. Even if every single one of them is one page long, this structure is probably suboptimal. What can we merge? At some point we had an iteration of the existing Get Started exactly like this, with 7-8 sections (you can even see this by the tag names in the repo). After quite a lot of discussion and feedback we grouped them the way they are now.

E.g.

intro - rename to be something similar to the other sections (Running)
comparing - should be covered in the intro?
params - mention them in the intro
metrics - as well

plots + visualization - one section, they are related

checkpoints and dvclive - merge into "live metrics" or something?

@iesahin
Contributor Author

iesahin commented Oct 27, 2021

We have a user's guide section titled "Running Experiments", I thought it might be confusing to have an identical GS section.

So, your take is to merge params and metrics into the current https://dvc.org/doc/start/experiments, and to create a visualization page covering plots and a DVCLive page with checkpoints.

I think that's a good idea, but checkpoints already look a bit too much. What about

  • Adding params/metrics within the context of experiments, as Dave suggested
  • Adding a visualization page, covering DVCLive, plots and plot images a bit deeper than the current docs.

Basically we'll remove Metrics, Parameters and Plots from the sidebar, and add a Visualization and DVCLive page.

@dberenbaum
Contributor

We will likely have some kind of release or push to specifically address deep learning scenarios in the next few months. That could include a get started page where dvclive and checkpoints are addressed, with more of a focus on a user problem than a set of features.

@shcheklein
Member

@iesahin so, could you summarize the new structure or make a screenshot, please?

@jorgeorpinel
Contributor

jorgeorpinel commented Oct 28, 2021

Params/metrics/plot look closer to experiments in this.

Except that if you want to introduce how to set them up that would be best in the pipelines trail I think. Why not have them in both e.g. definitions in Pipelines (link from exps), meaningful usage in Experiments (link from pipes) ?

Each of these will be a standalone document that can be linked from other trails

Was that also decided in the spike? I had the impression each trail would be a single page, like https://dvc.org/doc/start/experiments now.

running // comparing // params & metrics
plots + vis
dvclive

Feels like it could be a single (long due to tables and images) GS page.

But so are we reworking the whole GS/Experiments trail again right after it was merged? I'm a bit confused, sorry 😅

@jorgeorpinel
Contributor

We have a user's guide section titled "Running Experiments", I thought it might be confusing to have an identical GS section

p.s. I do think https://dvc.org/doc/start/experiments is missing that H2: it's the first topic after the intro and video, but not mentioned in the right-hand ToC:

image

@iesahin
Contributor Author

iesahin commented Oct 29, 2021

I have migrated the proposed draft to Notion. We can discuss the content & scope in https://www.notion.so/iterative/Experiments-Trail-Nov-21-0e2a492ba968405dbc0870adcaea3cc0

@iesahin iesahin added status/creating and removed status: research Writing concrete steps for the issue labels Nov 3, 2021
@jorgeorpinel
Contributor

Hi. What is status: creating? Only this issue has that label. Is it needed?

@iesahin
Contributor Author

iesahin commented Feb 1, 2022

It was meant to mark content that is being written. I'm deleting it.
