guide: ML Pipelines (1): Defining Pipelines & Stages #3414

Merged: 103 commits into main from iesahin/ug-pipelines on Sep 7, 2022

@iesahin iesahin commented Apr 5, 2022

Related to #2883

In review app: https://dvc-org-iesahin-ug-pipe-qykfpz.herokuapp.com/doc/user-guide/data-pipelines


@iesahin iesahin marked this pull request as draft April 5, 2022 15:34

@jorgeorpinel jorgeorpinel left a comment

Some first reviews @iesahin ☝🏼

Thanks

gatsby-cloud bot commented May 9, 2022

Gatsby Cloud Build Report

dvc.org

🎉 Your build was successful! See the Deploy preview here.

Build Details

View the build logs here.

🕐 Build time: 1m

Performance

Lighthouse report

| Metric | Score |
| --- | --- |
| Performance | 🔶 61 |
| Accessibility | 💚 98 |
| Best Practices | 🔶 83 |
| SEO | 💚 93 |

🔗 View full report

@jorgeorpinel

> happy to let you two merge when ready

I think it's mergeable @dberenbaum but needs an approval.

A stage represents individual data processes, including their input and
resulting output which can be combined to build detailed machine learning
pipelines.
Stages capture the commands, scripts, or code that a DVC pipeline executes,

capture -> represents (it was better after all I think)

> that a DVC pipeline executes

it's like defining pipelines through pipelines. Can we do something like "that you would run as part of your project to get the result (e.g. train.py) ..."

Done in afe2ca5. PTAL
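
As a rough illustration of the stage wording discussed in this thread (not part of the PR itself), the sketch below models three stages as plain Python dictionaries that mirror the `cmd`, `deps`, and `outs` fields of a `dvc.yaml` stage. It then derives an execution order by linking each stage's dependencies to the outputs of other stages. The stage names, script names, and file paths are hypothetical, and the ordering logic is a simplified stand-in, not DVC's actual implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage definitions mirroring dvc.yaml's cmd/deps/outs fields.
# They are declared out of order on purpose.
stages = {
    "evaluate": {
        "cmd": "python evaluate.py",
        "deps": ["evaluate.py", "model.pkl", "data/prepared"],
        "outs": ["metrics.json"],
    },
    "prepare": {
        "cmd": "python prepare.py",
        "deps": ["prepare.py", "data/raw"],
        "outs": ["data/prepared"],
    },
    "train": {
        "cmd": "python train.py",
        "deps": ["train.py", "data/prepared"],
        "outs": ["model.pkl"],
    },
}


def stage_order(stages):
    """Return stage names ordered so that producers run before consumers."""
    # Map every output path to the stage that produces it.
    producer = {
        out: name for name, spec in stages.items() for out in spec["outs"]
    }
    # A stage depends on another stage when one of its deps is that
    # stage's output; deps with no producer (scripts, raw data) are ignored.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = stage_order(stages)
print(order)
```

Because `evaluate` consumes outputs of both `prepare` and `train`, it can only come last; the order in which the stages are declared in the dictionary does not matter.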

```yaml
evaluate: ... # stage 3 definition
```

- Capture other useful metadata such as runtime

Capture and describe ...

or specify ...

I think "capture" is enough (high-level, more details in other sections/doc). But feel free to commit:

Suggested change
- Capture other useful metadata such as runtime
- Capture and describe other useful metadata such as runtime

Or

Suggested change
- Capture other useful metadata such as runtime
- Specify other useful metadata such as runtime


<admon type="info">

We call this file-based definition _codification_ (YAML format in our case). It

I think it's fine to have it in the use case, but not here - not practical, doesn't have any useful information

@jorgeorpinel jorgeorpinel Sep 7, 2022

I hear you but the purpose of this sentence is to be able to get to the next one. I want to say that you can get on GitOps and that's enabled by the codification. Should we just throw the term "codify" somewhere in there without explaining it? Like

Codifying your pipeline with DVC has the added benefit of allowing you to develop pipelines on standard Git workflows (GitOps).

@shcheklein

@jorgeorpinel it looks better, I still don't like it tbh.

I think the whole pipelines and defining pipelines section should be focused on the first section of the page (where we describe the process). I feel that describing again formally different types of outs, deps, stage doesn't make sense here (at least because it overlaps with a formal definition).

We should probably talk more about dvc exp init here? (since it helps to bootstrap the dvc.yaml after all)?

we should provide some example - actual pipeline files? mention VS Code as an editor that supports schema definition, etc

Include things like Jupyter notebooks - how to make a pipeline out of it ... etc

wdyt @dberenbaum @jorgeorpinel ?

@jorgeorpinel jorgeorpinel removed their assignment Sep 7, 2022

jorgeorpinel commented Sep 7, 2022

Your proposal makes sense to me @shcheklein . I think we can update #2883 based on that, merge #3899 and this, and follow up on that as well as remaining topics for this guide (Reproduction, Operationalizing, Experimentation -- I have drafts of all these docs).

> We should probably talk more about dvc exp init here?

That one I imagined should be a separate page (Experimenting with/ Experimental Pipelines), but we could def. mention exp init here and link to that new page when we get there.

@dberenbaum

Agree with @jorgeorpinel that the proposals from @shcheklein make sense but we can include them in future PRs. As long as this PR improves upon the current docs and there's nothing wrong/blocking in it, can we merge?

dvc exp init and Jupyter notebook migration are still not well defined and might make this an endless PR IMHO. We are also having product discussions related to those ideas, so let me think about how we can consolidate the product and docs discussions here 🤔 🙏 .

@shcheklein shcheklein left a comment

If there are no pending discussions, let's merge this.

@jorgeorpinel jorgeorpinel merged commit 55d9164 into main Sep 7, 2022
@jorgeorpinel jorgeorpinel deleted the iesahin/ug-pipelines branch September 7, 2022 22:25
jorgeorpinel added a commit that referenced this pull request Sep 7, 2022
* initial plan and some content

* added content about stages

* title and restyle fixes

* added dag section

* depend to -> depend on

* added dependencies section

* Update content/docs/user-guide/pipelines/index.md

* Update content/docs/user-guide/pipelines/index.md

* Restyled by prettier (#3532)

Co-authored-by: Restyled.io

* Update content/docs/user-guide/pipelines/index.md

Co-authored-by: Jorge Orpinel

* added pipelines to sidebar

* updated the title

* fixed formatting

* updating for dvc.yaml first

* fed -> used

* dvc.yaml-first

* editing to tell dvc.yaml first

* minor fix

* url dependency

* dvc lock example

* section titles for deps

* section titles for outputs

* reproduction -> running

* adding hyperparameters section

* added experiments section

* adding url dependencies

* added outputs section content

* minor

* added running pipelines content

* moved outputs below running

* removed plots section header

* guide: Defining Data Pipelines

* guide: split up Data Pipelines section

* Update content/docs/command-reference/plots/templates.md

* guide: Data Pipes -> ML Pipes
per #3414 (review)

* guide: oops, remove op-pipes file

* guide: remoge ML Pipes intro
per #3414 (review)

* guide: mention both imports in Def ML Pipes
rel #3414 (review)

* guide: move DAG info from cmd ref
per #3414 (review)

* guide: move all info and links about DAG to ML Pipes
section about Dependency Graphs

* guide: point from some Stage links to ML Pipes
section on Stages (Defining Pipes)

* guide: delete Running ML Pipes (for now)

* nav: remove future ML Pipes guides

* guide: remove ML Pipes/ Experimental Pipes

* roll back unrelated changes...

* guide: roll back dvc.yaml page changes
per #3414 (review)

* guide: link ML Pipes/ Defining Stages to dvc.yaml/stages spec

* guide: link deps and params tooltip to ML Pipes/ Stages guide sections

* guide: links from dvc.yaml doc to ML Pipes/ Stages

* guide: more links

* guide: oops, remove unused files

* remove unrelated change

* guide: move stage definition details to ML Pipes
from cmd ref

* guide: move stage command details into ML Pipes
re-link from existing places

* ref: roll back unrelated changes

* .

* ref: few more links to dependency graph in guide

* ref: reorg exp init to include simple usage example in Desc
per #3414 (review)

* concept: reintroduce DAG in more places
per #3414 (review)

* guide: pipelines are not ML-specific
per #3414 (review)

* guide: more details for params fields
per #3414 (review)

* one word

* guide: restructure Def Pipes and
fix links

rel #3414 (review)

* guide: rewrite Def Pipes intro
per #3414 (review)

* guide: move DAG up in Def Pipes
per #3414 (review)

* guide: inner link in Def Pipes

* guide: fix link and typos

* start: revert DAG changes
per #3414 (review)

* guide: use typical ML stage names
per #3414 (review)

* guide: better flow in Pipes index
per #3414 (comment)

* glossary: high-level def of Pipes
per #3414 (review)

* guide: move Stage command to dvc.yaml ref
per #3414 (review)

* guide: remove abc mention

* guide: edits to Defining Pipes

* guide: improve Param deps in Def Pipes and
remove details from other places

* guide: add Outputs to Def Pipes
and other edits

* guide: update dep, param and out tooltips

* guide: separate params in Pipes vs Exps
per #3414 (comment)

* ref: move Stage commands section of dvc.yaml up
per #3414 (review)

* guide: update Def Pipes and DAG
per #3414 (review)
and others

* params: more separation of content and
create small params file section in dvc.yaml ref

* concept: rehash params
per #3414 (review)

* guide: more holistic pipelining info
per #3414 (review)

* guide: Pipe edits

* params: roll back changes for now...

* Revert "params: roll back changes for now..."

This reverts commit 23cd9c6.

* guide: more edits to Pipelines
per #3414 (review)
and beyond

* params: do not define as "simple values"
per #3899 (review)
and #3899 (comment)

* ref: better params index intro
per #3899 (review)
and #3899 (review)

* ref: mention param groups in dvc.yaml
per #3899 (review)

* params: DVC can pass them via templating/dict unpacking
per #3899 (review)

Co-authored-by: iterative
Co-authored-by: Emre Şahin
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io
This was referenced Sep 8, 2022
jorgeorpinel added a commit that referenced this pull request Oct 12, 2022
* initial plan and some content

* added content about stages

* title and restyle fixes

* added dag section

* depend to -> depend on

* added dependencies section

* Update content/docs/user-guide/pipelines/index.md

* Update content/docs/user-guide/pipelines/index.md

* Restyled by prettier (#3532)

Co-authored-by: Restyled.io

* Update content/docs/user-guide/pipelines/index.md

Co-authored-by: Jorge Orpinel

* added pipelines to sidebar

* updated the title

* fixed formatting

* updating for dvc.yaml first

* fed -> used

* dvc.yaml-first

* editing to tell dvc.yaml first

* minor fix

* url dependency

* dvc lock example

* section titles for deps

* section titles for outputs

* reproduction -> running

* adding hyperparameters section

* added experiments section

* adding url dependencies

* added outputs section content

* minor

* added running pipelines content

* moved outputs below running

* removed plots section header

* guide: Defining Data Pipelines

* guide: split up Data Pipelines section

* Update content/docs/command-reference/plots/templates.md

* guide: Data Pipes -> ML Pipes
per #3414 (review)

* guide: oops, remove op-pipes file

* guide: remoge ML Pipes intro
per #3414 (review)

* guide: mention both imports in Def ML Pipes
rel #3414 (review)

* guide: move DAG info from cmd ref
per #3414 (review)

* guide: move all info and links about DAG to ML Pipes
section about Dependency Graphs

* guide: point from some Stage links to ML Pipes
section on Stages (Defining Pipes)

* guide: delete Running ML Pipes (for now)

* nav: remove future ML Pipes guides

* guide: remove ML Pipes/ Experimental Pipes

* roll back unrelated changes...

* guide: roll back dvc.yaml page changes
per #3414 (review)

* guide: link ML Pipes/ Defining Stages to dvc.yaml/stages spec

* guide: some changes to start improving the dvc.yaml guide

* guide: edit Stages section (dvc.yaml guide) and
and move Stage entry spec to right after that.

* guide: link deps and params tooltip to ML Pipes/ Stages guide sections

* guide: links from dvc.yaml doc to ML Pipes/ Stages

* guide: update dvc.yaml Templating spec

* guide: a couple more admons for dvc.yaml doc

* guide: more links

* guide: oops, remove unused files

* remove unrelated change

* ref: update stage add/ run Descs

* guide: move stage definition details to ML Pipes
from cmd ref

* guide: move stage definition details to ML Pipes
from cmd ref

* guide: move stage command details into ML Pipes
re-link from existing places

* guide; update Stage entries spec descs.

* guide: admons for dvc.yaml page

* guide: roll back wrong change

* edits to dvc.yaml doc
per #3730 (review)

* ref: roll back unrelated changes

* .

* ref: few more links to dependency graph in guide

* ref: refactor run, stage add, and repro a little

* guide: drop old pipelines guide

* ref: revert files
one change moved to #4024

* pipelines: revert a bunch of files (for now)
per #3789 (comment)

* proper admons

Co-authored-by: iterative
Co-authored-by: Emre Şahin
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io
Labels
A: docs (Area: user documentation (gatsby-theme-iterative))
C: guide (Content of /doc/user-guide)
7 participants