From 1c0cd3386bbf7d1968814c8a662608a45f30b4ba Mon Sep 17 00:00:00 2001
From: Emre Sahin <github@emresult.com>
Date: Wed, 10 Mar 2021 12:38:04 +0300
Subject: [PATCH 1/3] Fix typos in intro and change ~/stages to ~/project

---
 get-started/stages/01-whats-a-stage.md         |  2 +-
 .../stages/02-manual-data-preparation.md       |  2 +-
 get-started/stages/05-how-dvc-tracks-stages.md |  2 +-
 .../stages/06-how-directories-are-cached.md    |  2 +-
 .../stages/07-add-featurization-stage.md       |  8 ++++----
 get-started/stages/08-reproduce-a-pipeline.md  |  2 +-
 .../stages/09-visualize-the-pipeline.md        |  4 ++--
 get-started/stages/index.json                  |  8 ++++----
 get-started/stages/init.sh                     |  2 +-
 get-started/stages/install.sh                  |  2 +-
 get-started/stages/intro.md                    | 18 +++++++++---------
 11 files changed, 26 insertions(+), 26 deletions(-)
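
Reviewer note: because this patch renames the working directory from `stages` to `project` across many files, a quick scan for leftover references may be useful before merging. The following is a minimal sketch, not part of the patch itself. The function name `stale_references` and the regular expression are assumptions chosen for illustration; the pattern deliberately skips the scenario's own `get-started/stages/` directory and asset URLs, which legitimately keep the word `stages`.

```python
import re
from pathlib import Path

# Hypothetical patterns for the old working-directory name (an assumption,
# derived from the kinds of lines this patch changes).
STALE = re.compile(r"~/stages\b|/root/stages\b|[`\"]stages/|^cd stages$")


def stale_references(scenario_dir):
    """Return (file name, line number, line) for each leftover old-path reference."""
    hits = []
    for path in sorted(Path(scenario_dir).rglob("*")):
        # Only look at the file types this patch touches.
        if not path.is_file() or path.suffix not in {".md", ".json", ".sh"}:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if STALE.search(line.strip()):
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling `stale_references("get-started/stages")` from the repository root should return an empty list once every reference has been updated.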

diff --git a/get-started/stages/01-whats-a-stage.md b/get-started/stages/01-whats-a-stage.md
index 2f55a1a..0ad2a39 100644
--- a/get-started/stages/01-whats-a-stage.md
+++ b/get-started/stages/01-whats-a-stage.md
@@ -7,7 +7,7 @@ machine learning project.
 
 [bcstage]: https://dvc.org/doc/user-guide/basic-concepts/stage
 
-We have a machine learning project already provided in `~/stages`. We covered
+We have a machine learning project already provided in `~/project`. We covered
 these steps in previous scenarios. DVC is installed. Data is downloaded from
 `https://github.com/iterative/dataset-registry` and made smaller. A _local
 remote_ is created in `/tmp/data-storage` named `mystorage`, and the data in the
diff --git a/get-started/stages/02-manual-data-preparation.md b/get-started/stages/02-manual-data-preparation.md
index 0b6bc18..968a839 100644
--- a/get-started/stages/02-manual-data-preparation.md
+++ b/get-started/stages/02-manual-data-preparation.md
@@ -3,7 +3,7 @@
 The script `src/prepare.py` splits the data into train and test sets. You can
 click the link below to open the preparation script in the editor.
 
-`stages/src/prepare.py`{{open}}
+`project/src/prepare.py`{{open}}
 
 We first run this script without DVC to see what happens:
 
diff --git a/get-started/stages/05-how-dvc-tracks-stages.md b/get-started/stages/05-how-dvc-tracks-stages.md
index 0151209..fcf96b0 100644
--- a/get-started/stages/05-how-dvc-tracks-stages.md
+++ b/get-started/stages/05-how-dvc-tracks-stages.md
@@ -6,7 +6,7 @@ define relationships between the data, code, parameters, and stages.
 
 Let's take a look at `dvc.yaml` file to see the content:
 
-`stages/dvc.yaml`{{open}}
+`project/dvc.yaml`{{open}}
 
 It contains what we supplied to `dvc stage add`. It lists stages by name and defines
 `cmd`, `deps` and `outs` for each of them.
diff --git a/get-started/stages/06-how-directories-are-cached.md b/get-started/stages/06-how-directories-are-cached.md
index ed91eaa..7a88a37 100644
--- a/get-started/stages/06-how-directories-are-cached.md
+++ b/get-started/stages/06-how-directories-are-cached.md
@@ -16,7 +16,7 @@ their hash values.
 For example we see that the individual hash value of `train.tsv` as
 `fcebfd4c6f1645ac4987d39f1c5cf610` and check its content
 
-`stages/.dvc/cache/fc/ebfd4c6f1645ac4987d39f1c5cf610`{{open}}.
+`project/.dvc/cache/fc/ebfd4c6f1645ac4987d39f1c5cf610`{{open}}.
 
 Note also that DVC adds `/prepared` to `.gitignore` to prevent output data
 files to be committed in Git.
diff --git a/get-started/stages/07-add-featurization-stage.md b/get-started/stages/07-add-featurization-stage.md
index 7f475a3..942f047 100644
--- a/get-started/stages/07-add-featurization-stage.md
+++ b/get-started/stages/07-add-featurization-stage.md
@@ -7,7 +7,7 @@ with DVC.
 _Featurization_ step is run by `src/featurization.py`. You can check the
 contents of this program by clicking the link below.
 
-`stages/src/featurization.py`{{open}}
+`project/src/featurization.py`{{open}}
 
 We use `dvc.yaml` file in the previous step to add another stage. We name the
 stage `featurize`. It has two dependencies: one is the code file, and
@@ -16,11 +16,11 @@ ready for training as an output.
 
 Please click the below link to open the file in the editor.
 
-`stages/dvc.yaml`{{open}}
+`project/dvc.yaml`{{open}}
 
 Now please click the below text to append the stage configuration to the file.
 
-<pre class="file" data-filename="stages/dvc.yaml" data-target="append">
+<pre class="file" data-filename="project/dvc.yaml" data-target="append">
   featurize:
     cmd: >-
       python3 src/featurization.py data/prepared data/features
@@ -42,4 +42,4 @@ dataset.
 ```
 git add dvc.yaml dvc.lock data/.gitignore
 git commit -m "Configured prepare stage"
-```{{execute}}
\ No newline at end of file
+```{{execute}}
diff --git a/get-started/stages/08-reproduce-a-pipeline.md b/get-started/stages/08-reproduce-a-pipeline.md
index 664585d..4e7847d 100644
--- a/get-started/stages/08-reproduce-a-pipeline.md
+++ b/get-started/stages/08-reproduce-a-pipeline.md
@@ -53,7 +53,7 @@ changes `dvc repro` won't rerun any part of it.
 Suppose we decided to update our code for `src/prepare.py` by adding the
 following line to it.
 
-<pre class="file" data-filename="stages/src/prepare.py" data-target="append">
+<pre class="file" data-filename="project/src/prepare.py" data-target="append">
 # THIS COMMENT CHANGES MD5 HASH OF THE FILE
 </pre>
 
diff --git a/get-started/stages/09-visualize-the-pipeline.md b/get-started/stages/09-visualize-the-pipeline.md
index 4d96be1..5507225 100644
--- a/get-started/stages/09-visualize-the-pipeline.md
+++ b/get-started/stages/09-visualize-the-pipeline.md
@@ -32,7 +32,7 @@ and convert the `.dot` file to PNG using:
 
 Now we can view the pipeline in an image format by clicking the link below: 
 
-`stages/pipeline.png`{{open}}
+`project/pipeline.png`{{open}}
 
 Let's commit the changes in this step to Git.
 
@@ -44,4 +44,4 @@ git commit -m "another stage to the pipeline is added"
 
 In the next step, we'll see how to run these two stages together.
 
-[graphviz]: https://graphviz.org
\ No newline at end of file
+[graphviz]: https://graphviz.org
diff --git a/get-started/stages/index.json b/get-started/stages/index.json
index 7d99618..82b4b6c 100644
--- a/get-started/stages/index.json
+++ b/get-started/stages/index.json
@@ -66,20 +66,20 @@
                 },
                 {
                     "file": "params.yaml",
-                    "target": "/root/stages"
+                    "target": "/root/project"
                 },
                 {
                     "file": "src/",
-                    "target": "/root/stages/"
+                    "target": "/root/project/"
                 }
             ]
         }
     },
     "environment": {
-        "uieditorpath": "/root/stages",
+        "uieditorpath": "/root/project",
         "uilayout": "vscode-terminal-split"
     },
     "backend": {
         "imageid": "ubuntu:2004"
     }
-}
\ No newline at end of file
+}
diff --git a/get-started/stages/init.sh b/get-started/stages/init.sh
index 6058e61..ae9cd6a 100755
--- a/get-started/stages/init.sh
+++ b/get-started/stages/init.sh
@@ -21,6 +21,6 @@ source /etc/bash_completion
 # clear screen
 clear
 
-cd stages
+cd project
 # auto-play preparation steps
 DELAY=0 play prepare.sh
diff --git a/get-started/stages/install.sh b/get-started/stages/install.sh
index b36d6f9..90e8e38 100755
--- a/get-started/stages/install.sh
+++ b/get-started/stages/install.sh
@@ -12,4 +12,4 @@ wget -O /etc/bash_completion.d/dvc \
     https://raw.githubusercontent.com/iterative/dvc/master/scripts/completion/dvc.bash
 
 # this is about a bug in index.json
-rm -f /root/stages/play /root/stages/prepare.sh /root/stages/example-flow.png
+rm -f /root/project/play /root/project/prepare.sh /root/project/example-flow.png
diff --git a/get-started/stages/intro.md b/get-started/stages/intro.md
index a3998c9..b33fe5b 100644
--- a/get-started/stages/intro.md
+++ b/get-started/stages/intro.md
@@ -1,17 +1,17 @@
 The commands that we have seen so far (`add`, `push`, `pull`, etc.) provide a
-useful framework to track, save and share models and large data files. In
-some cases and projects, this could be all you need.
+useful framework to track, save, and share models and large data files. In some
+cases and projects, this could be all you need.
 
-Usually, in ML projects, you need to process data and generate
-outputs in a reproducible way. This requires establishing a connection
-between the data processed, the program that processes them,
-the parameters, and the outputs.
+Usually, in ML projects, you need to process data and generate outputs in a
+reproducible way. This requires establishing a connection between the data
+processed, the program that processes them, its parameters and the outputs.
 
 In a typical machine learning project we have the following stages: 
 
 ![](/dvc/courses/get-started/stages/assets/example-flow.png)
 
-This process is reflected in DVC with a [pipeline][bcpipeline]. In this scenario
-we begin to build pipelines using stage definitions and connect them together.
+This process is reflected in DVC with a [data pipeline][bcpipeline]. In this
+scenario we begin to build pipelines using stage definitions and connect them
+together.
 
-[bcpipeline]: https://dvc.org/doc/user-guide/basic-concepts/pipeline
\ No newline at end of file
+[bcpipeline]: https://dvc.org/doc/user-guide/basic-concepts/pipeline

From f1424133e16e59def079f7dca48b5d8146e8919a Mon Sep 17 00:00:00 2001
From: Emre Sahin <github@emresult.com>
Date: Wed, 10 Mar 2021 13:39:50 +0300
Subject: [PATCH 2/3] merged step1 and intro and other fixes in #29

---
 ...ation.md => 01-manual-data-preparation.md} |  0
 get-started/stages/01-whats-a-stage.md        | 18 ------------
 ...adding-a-stage.md => 02-adding-a-stage.md} |  0
 ...nning-a-stage.md => 03-running-a-stage.md} |  0
 ...-stages.md => 04-how-dvc-tracks-stages.md} |  0
 ...ed.md => 05-how-directories-are-cached.md} |  0
 ...stage.md => 06-add-featurization-stage.md} |  0
 ...pipeline.md => 07-reproduce-a-pipeline.md} |  0
 ...peline.md => 08-visualize-the-pipeline.md} |  0
 .../stages/{10-ending.md => 09-ending.md}     |  0
 get-started/stages/index.json                 | 22 ++++++---------
 get-started/stages/intro.md                   | 28 +++++++++++++------
 12 files changed, 28 insertions(+), 40 deletions(-)
 rename get-started/stages/{02-manual-data-preparation.md => 01-manual-data-preparation.md} (100%)
 delete mode 100644 get-started/stages/01-whats-a-stage.md
 rename get-started/stages/{03-adding-a-stage.md => 02-adding-a-stage.md} (100%)
 rename get-started/stages/{04-running-a-stage.md => 03-running-a-stage.md} (100%)
 rename get-started/stages/{05-how-dvc-tracks-stages.md => 04-how-dvc-tracks-stages.md} (100%)
 rename get-started/stages/{06-how-directories-are-cached.md => 05-how-directories-are-cached.md} (100%)
 rename get-started/stages/{07-add-featurization-stage.md => 06-add-featurization-stage.md} (100%)
 rename get-started/stages/{08-reproduce-a-pipeline.md => 07-reproduce-a-pipeline.md} (100%)
 rename get-started/stages/{09-visualize-the-pipeline.md => 08-visualize-the-pipeline.md} (100%)
 rename get-started/stages/{10-ending.md => 09-ending.md} (100%)
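
Reviewer note: since this patch renumbers every step file and edits `index.json` by hand, it may be worth checking that each `"text"` entry still points at a file that exists. The sketch below is an illustration under assumptions, not part of the patch. The helper names `find_steps` and `missing_step_files` are hypothetical, and the code searches for the `"steps"` list recursively because the enclosing key is not visible in this diff.

```python
import json
from pathlib import Path


def find_steps(node):
    """Recursively locate the first "steps" list in the parsed JSON."""
    if isinstance(node, dict):
        steps = node.get("steps")
        if isinstance(steps, list):
            return steps
        for value in node.values():
            found = find_steps(value)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_steps(item)
            if found is not None:
                return found
    return None


def missing_step_files(scenario_dir):
    """Return the step files named in index.json that are absent on disk."""
    scenario = Path(scenario_dir)
    config = json.loads((scenario / "index.json").read_text(encoding="utf-8"))
    steps = find_steps(config) or []
    return [
        step["text"]
        for step in steps
        if "text" in step and not (scenario / step["text"]).is_file()
    ]
```

Running `missing_step_files("get-started/stages")` after applying the patch should return an empty list if the renames and the `index.json` entries agree.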

diff --git a/get-started/stages/02-manual-data-preparation.md b/get-started/stages/01-manual-data-preparation.md
similarity index 100%
rename from get-started/stages/02-manual-data-preparation.md
rename to get-started/stages/01-manual-data-preparation.md
diff --git a/get-started/stages/01-whats-a-stage.md b/get-started/stages/01-whats-a-stage.md
deleted file mode 100644
index 0ad2a39..0000000
--- a/get-started/stages/01-whats-a-stage.md
+++ /dev/null
@@ -1,18 +0,0 @@
-# What's a stage?
-
-[Stages][bcstage] are the basic building blocks of pipelines in DVC. They define
-and execute an action, like data import or feature extraction, and usually
-produce some output. In this scenario, we create stages and pipelines for a
-machine learning project.
-
-[bcstage]: https://dvc.org/doc/user-guide/basic-concepts/stage
-
-We have a machine learning project already provided in `~/project`. We covered
-these steps in previous scenarios. DVC is installed. Data is downloaded from
-`https://github.com/iterative/dataset-registry` and made smaller. A _local
-remote_ is created in `/tmp/data-storage` named `mystorage`, and the data in the
-DVC repository is pushed. Code and python requirements are prepared, and all
-changes are committed to Git.
-
-You can use the editor to browse the project.
-
diff --git a/get-started/stages/03-adding-a-stage.md b/get-started/stages/02-adding-a-stage.md
similarity index 100%
rename from get-started/stages/03-adding-a-stage.md
rename to get-started/stages/02-adding-a-stage.md
diff --git a/get-started/stages/04-running-a-stage.md b/get-started/stages/03-running-a-stage.md
similarity index 100%
rename from get-started/stages/04-running-a-stage.md
rename to get-started/stages/03-running-a-stage.md
diff --git a/get-started/stages/05-how-dvc-tracks-stages.md b/get-started/stages/04-how-dvc-tracks-stages.md
similarity index 100%
rename from get-started/stages/05-how-dvc-tracks-stages.md
rename to get-started/stages/04-how-dvc-tracks-stages.md
diff --git a/get-started/stages/06-how-directories-are-cached.md b/get-started/stages/05-how-directories-are-cached.md
similarity index 100%
rename from get-started/stages/06-how-directories-are-cached.md
rename to get-started/stages/05-how-directories-are-cached.md
diff --git a/get-started/stages/07-add-featurization-stage.md b/get-started/stages/06-add-featurization-stage.md
similarity index 100%
rename from get-started/stages/07-add-featurization-stage.md
rename to get-started/stages/06-add-featurization-stage.md
diff --git a/get-started/stages/08-reproduce-a-pipeline.md b/get-started/stages/07-reproduce-a-pipeline.md
similarity index 100%
rename from get-started/stages/08-reproduce-a-pipeline.md
rename to get-started/stages/07-reproduce-a-pipeline.md
diff --git a/get-started/stages/09-visualize-the-pipeline.md b/get-started/stages/08-visualize-the-pipeline.md
similarity index 100%
rename from get-started/stages/09-visualize-the-pipeline.md
rename to get-started/stages/08-visualize-the-pipeline.md
diff --git a/get-started/stages/10-ending.md b/get-started/stages/09-ending.md
similarity index 100%
rename from get-started/stages/10-ending.md
rename to get-started/stages/09-ending.md
diff --git a/get-started/stages/index.json b/get-started/stages/index.json
index 82b4b6c..902569a 100644
--- a/get-started/stages/index.json
+++ b/get-started/stages/index.json
@@ -7,43 +7,39 @@
         "steps": [
             {
                 "title": "Step 1",
-                "text": "01-whats-a-stage.md"
+                "text": "01-manual-data-preparation.md"
             },
             {
                 "title": "Step 2",
-                "text": "02-manual-data-preparation.md"
+                "text": "02-adding-a-stage.md"
             },
             {
                 "title": "Step 3",
-                "text": "03-adding-a-stage.md"
+                "text": "03-running-a-stage.md"
             },
             {
                 "title": "Step 4",
-                "text": "04-running-a-stage.md"
+                "text": "04-how-dvc-tracks-stages.md"
             },
             {
                 "title": "Step 5",
-                "text": "05-how-dvc-tracks-stages.md"
+                "text": "05-how-directories-are-cached.md"
             },
             {
                 "title": "Step 6",
-                "text": "06-how-directories-are-cached.md"
+                "text": "06-add-featurization-stage.md"
             },
             {
                 "title": "Step 7",
-                "text": "07-add-featurization-stage.md"
+                "text": "07-reproduce-a-pipeline.md"
             },
             {
                 "title": "Step 8",
-                "text": "08-reproduce-a-pipeline.md"
-            },
-            {
-                "title": "Step 9",
-                "text": "09-visualize-the-pipeline.md"
+                "text": "08-visualize-the-pipeline.md"
             },
             {
                 "title": "Congratulations!",
-                "text": "10-ending.md"
+                "text": "09-ending.md"
             }
         ],
         "intro": {
diff --git a/get-started/stages/intro.md b/get-started/stages/intro.md
index b33fe5b..1d82f44 100644
--- a/get-started/stages/intro.md
+++ b/get-started/stages/intro.md
@@ -1,17 +1,27 @@
-The commands that we have seen so far (`add`, `push`, `pull`, etc.) provide a
-useful framework to track, save, and share models and large data files. In some
-cases and projects, this could be all you need.
-
-Usually, in ML projects, you need to process data and generate outputs in a
+In ML projects, we usually need to process data and generate outputs in a
 reproducible way. This requires establishing a connection between the data
-processed, the program that processes them, its parameters and the outputs.
-
-In a typical machine learning project we have the following stages: 
+processed, the program that processes them, its parameters, and the outputs.
 
 ![](/dvc/courses/get-started/stages/assets/example-flow.png)
 
 This process is reflected in DVC with a [data pipeline][bcpipeline]. In this
-scenario we begin to build pipelines using stage definitions and connect them
+scenario, we begin to build pipelines using stage definitions and connect them
 together.
 
 [bcpipeline]: https://dvc.org/doc/user-guide/basic-concepts/pipeline
+
+[Stages][bcstage] are the basic building blocks of pipelines in DVC. They define
+and execute an action, like data import or feature extraction, and usually
+produce some output.
+
+[bcstage]: https://dvc.org/doc/user-guide/basic-concepts/stage
+
+We have a machine learning project already provided in `~/project`. We provided
+source files in `~/project/src/`, downloaded data to `data/data.xml`, and made
+it smaller. You can review these steps in more detail in [Data and Model
+Versioning][v] and [Accessing Data and Models][a] scenarios.
+
+[v]: https://katacoda.com/dvc/courses/get-started/versioning
+[a]: https://katacoda.com/dvc/courses/get-started/accessing
+
+You can use the editor to browse the project.

From 6d1acc95d60de3ff809dfd4c358ad9e9ddac7dda Mon Sep 17 00:00:00 2001
From: Emre Sahin <github@emresult.com>
Date: Wed, 10 Mar 2021 14:04:54 +0300
Subject: [PATCH 3/3] Edited some sentences and moved them to intro

---
 get-started/stages/01-manual-data-preparation.md | 12 ++----------
 get-started/stages/intro.md                      | 10 +++++++---
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/get-started/stages/01-manual-data-preparation.md b/get-started/stages/01-manual-data-preparation.md
index 968a839..636d075 100644
--- a/get-started/stages/01-manual-data-preparation.md
+++ b/get-started/stages/01-manual-data-preparation.md
@@ -1,7 +1,6 @@
 # Manual Data Preparation 
 
-The script `src/prepare.py` splits the data into train and test sets. You can
-click the link below to open the preparation script in the editor.
+The script `src/prepare.py` splits the data into train and test sets. (Click the link below to open it in the editor.)
 
 `project/src/prepare.py`{{open}}
 
@@ -11,14 +10,7 @@ We first run this script without DVC to see what happens:
 
 It splits the data into train and test sets. We check the contents:
 
-`head data/prepared/train.tsv`{{execute}}
-
-`head data/prepared/test.tsv`{{execute}}
-
-Our goal is to create a project that classifies the questions and assigns tags
-to them. In a world _without_ DVC tasks like data preparation, training,
-testing, evaluation, etc. are run manually, and this is prone to errors all we
-know from working with too many moving parts.  
+`ls -l data/prepared`{{execute}}
 
 We use DVC to automate the tasks required to build a classifier and provide a
 fully reproducible pipeline.
diff --git a/get-started/stages/intro.md b/get-started/stages/intro.md
index 1d82f44..8008f8c 100644
--- a/get-started/stages/intro.md
+++ b/get-started/stages/intro.md
@@ -8,6 +8,7 @@ This process is reflected in DVC with a [data pipeline][bcpipeline]. In this
 scenario, we begin to build pipelines using stage definitions and connect them
 together.
 
+
 [bcpipeline]: https://dvc.org/doc/user-guide/basic-concepts/pipeline
 
 [Stages][bcstage] are the basic building blocks of pipelines in DVC. They define
@@ -16,9 +17,12 @@ produce some output.
 
 [bcstage]: https://dvc.org/doc/user-guide/basic-concepts/stage
 
-We have a machine learning project already provided in `~/project`. We provided
-source files in `~/project/src/`, downloaded data to `data/data.xml`, and made
-it smaller. You can review these steps in more detail in [Data and Model
+In this scenario, our goal is to create a project that classifies questions
+and assigns tags to them. In a world _without_ DVC, tasks like data
+preparation, training, testing, and evaluation are run manually, which is
+error-prone because there are too many moving parts. We provided the source
+files in `~/project/src/`, downloaded data to `data/data.xml`, and made it
+smaller. You can review these steps in more detail in the [Data and Model
 Versioning][v] and [Accessing Data and Models][a] scenarios.
 
 [v]: https://katacoda.com/dvc/courses/get-started/versioning