This repository has been archived by the owner on Jul 5, 2022. It is now read-only.

Commit

merged step1 and intro and other fixes in #29
iesahin committed Mar 10, 2021
1 parent 1c0cd33 commit f142413
Showing 12 changed files with 28 additions and 40 deletions.
18 changes: 0 additions & 18 deletions get-started/stages/01-whats-a-stage.md

This file was deleted.

File renamed without changes.
22 changes: 9 additions & 13 deletions get-started/stages/index.json
@@ -7,43 +7,39 @@
 "steps": [
 {
 "title": "Step 1",
-"text": "01-whats-a-stage.md"
+"text": "01-manual-data-preparation.md"
 },
 {
 "title": "Step 2",
-"text": "02-manual-data-preparation.md"
+"text": "02-adding-a-stage.md"
 },
 {
 "title": "Step 3",
-"text": "03-adding-a-stage.md"
+"text": "03-running-a-stage.md"
 },
 {
 "title": "Step 4",
-"text": "04-running-a-stage.md"
+"text": "04-how-dvc-tracks-stages.md"
 },
 {
 "title": "Step 5",
-"text": "05-how-dvc-tracks-stages.md"
+"text": "05-how-directories-are-cached.md"
 },
 {
 "title": "Step 6",
-"text": "06-how-directories-are-cached.md"
+"text": "06-add-featurization-stage.md"
 },
 {
 "title": "Step 7",
-"text": "07-add-featurization-stage.md"
+"text": "07-reproduce-a-pipeline.md"
 },
 {
 "title": "Step 8",
-"text": "08-reproduce-a-pipeline.md"
-},
-{
-"title": "Step 9",
-"text": "09-visualize-the-pipeline.md"
+"text": "08-visualize-the-pipeline.md"
 },
 {
 "title": "Congratulations!",
-"text": "10-ending.md"
+"text": "09-ending.md"
 }
 ],
 "intro": {
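The renumbering in `index.json` above shifts every step file down by one, which is easy to get wrong by hand. A minimal sketch of a consistency check, assuming the only rule is that each file name's numeric prefix matches its position in the `steps` list; `check_step_numbering` is a hypothetical helper and not part of this repository:

```python
import re


def check_step_numbering(steps):
    """Return (position, filename) pairs whose numeric prefix differs from the position.

    `steps` is a list of dicts with a "text" key, as in the "steps" array above.
    """
    mismatches = []
    for position, step in enumerate(steps, start=1):
        match = re.match(r"(\d+)-", step["text"])
        if match is None or int(match.group(1)) != position:
            mismatches.append((position, step["text"]))
    return mismatches


# The "steps" entries as they stand after this commit.
steps = [
    {"title": "Step 1", "text": "01-manual-data-preparation.md"},
    {"title": "Step 2", "text": "02-adding-a-stage.md"},
    {"title": "Step 3", "text": "03-running-a-stage.md"},
    {"title": "Step 4", "text": "04-how-dvc-tracks-stages.md"},
    {"title": "Step 5", "text": "05-how-directories-are-cached.md"},
    {"title": "Step 6", "text": "06-add-featurization-stage.md"},
    {"title": "Step 7", "text": "07-reproduce-a-pipeline.md"},
    {"title": "Step 8", "text": "08-visualize-the-pipeline.md"},
    {"title": "Congratulations!", "text": "09-ending.md"},
]

print(check_step_numbering(steps))  # prints []
```

An empty list means every prefix lines up with its position. The check says nothing about whether the referenced files exist on disk.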
28 changes: 19 additions & 9 deletions get-started/stages/intro.md
@@ -1,17 +1,27 @@
The commands that we have seen so far (`add`, `push`, `pull`, etc.) provide a
useful framework to track, save, and share models and large data files. In some
cases and projects, this could be all you need.

Usually, in ML projects, you need to process data and generate outputs in a
In ML projects, usually we need to process data and generate outputs in a
reproducible way. This requires establishing a connection between the data
processed, the program that processes them, its parameters and the outputs.

In a typical machine learning project we have the following stages:
processed, the program that processes them, its parameters, and the outputs.

![](/dvc/courses/get-started/stages/assets/example-flow.png)

This process is reflected in DVC with a [data pipeline][bcpipeline]. In this
scenario we begin to build pipelines using stage definitions and connect them
scenario, we begin to build pipelines using stage definitions and connect them
together.

[bcpipeline]: https://dvc.org/doc/user-guide/basic-concepts/pipeline

[Stages][bcstage] are the basic building blocks of pipelines in DVC. They define
and execute an action, like data import or feature extraction, and usually
produce some output.

[bcstage]: https://dvc.org/doc/user-guide/basic-concepts/stage

We have a machine learning project already provided in `~/project`. We provided
source files in `~/project/src/`, downloaded data to `data/data.xml`, and made
it smaller. You can review these steps in more detail in [Data and Model
Versioning][v] and [Accessing Data and Models][a] scenarios.

[v]: https://katacoda.com/dvc/courses/get-started/versioning
[a]: https://katacoda.com/dvc/courses/get-started/accessing

You can use the editor to browse the project.
