This repository has been archived by the owner on Jul 5, 2022. It is now read-only.
merged step1 and intro and other fixes in #29
Showing 12 changed files with 28 additions and 40 deletions.
File renamed without changes.
This file was deleted.
@@ -1,17 +1,27 @@
The commands that we have seen so far (`add`, `push`, `pull`, etc.) provide a
useful framework to track, save, and share models and large data files. In some
cases and projects, this could be all you need.

Usually, in ML projects, you need to process data and generate outputs in a
In ML projects, usually we need to process data and generate outputs in a
reproducible way. This requires establishing a connection between the data
processed, the program that processes them, its parameters and the outputs.

In a typical machine learning project we have the following stages:
processed, the program that processes them, its parameters, and the outputs.

![](/dvc/courses/get-started/stages/assets/example-flow.png)

This process is reflected in DVC with a [data pipeline][bcpipeline]. In this
scenario we begin to build pipelines using stage definitions and connect them
scenario, we begin to build pipelines using stage definitions and connect them
together.

[bcpipeline]: https://dvc.org/doc/user-guide/basic-concepts/pipeline

[Stages][bcstage] are the basic building blocks of pipelines in DVC. They define
and execute an action, like data import or feature extraction, and usually
produce some output.

[bcstage]: https://dvc.org/doc/user-guide/basic-concepts/stage

We have a machine learning project already provided in `~/project`. We provided
source files in `~/project/src/`, downloaded data to `data/data.xml`, and made
it smaller. You can review these steps in more detail in [Data and Model
Versioning][v] and [Accessing Data and Models][a] scenarios.

[v]: https://katacoda.com/dvc/courses/get-started/versioning
[a]: https://katacoda.com/dvc/courses/get-started/accessing

You can use the editor to browse the project.
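
The idea that stages connect data, a program, and its outputs into a pipeline can be illustrated with a small sketch. This is not DVC's implementation or API: the `Stage` class, the `execution_order` helper, the stage names, and the script paths `src/prepare.py` and `src/featurize.py` are all hypothetical. Only `data/data.xml` and the `src/` directory come from the text. The sketch links each stage to whichever stage produces its inputs and derives a valid run order from those links.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Stage:
    """Toy stand-in for a pipeline stage: a command plus what it reads and writes."""
    name: str
    cmd: str
    deps: tuple = ()
    outs: tuple = ()


def execution_order(stages):
    """Order stage names so each stage runs after the stages that produce its inputs."""
    # Map every output path to the name of the stage that writes it.
    producer = {out: s.name for s in stages for out in s.outs}
    # A stage depends on the producers of its inputs; raw inputs have no producer.
    graph = {
        s.name: {producer[d] for d in s.deps if d in producer}
        for s in stages
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical stages, deliberately listed out of order.
stages = [
    Stage("featurize", "python src/featurize.py",
          deps=("data/prepared",), outs=("data/features",)),
    Stage("prepare", "python src/prepare.py data/data.xml",
          deps=("data/data.xml",), outs=("data/prepared",)),
]

print(execution_order(stages))
```

Because `featurize` reads what `prepare` writes, the sketch always places `prepare` first, regardless of the order in which the stages were declared.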