guide: remove Basic Concepts page
jorgeorpinel committed Aug 10, 2020
1 parent d3cdb87 commit fde9d6b
Showing 4 changed files with 13 additions and 125 deletions.
1 change: 0 additions & 1 deletion content/docs/sidebar.json
@@ -87,7 +87,6 @@
 "slug": "what-is-dvc",
 "source": "what-is-dvc.md"
 },
-"basic-concepts",
 {
 "label": "DVC Files and Directories",
 "slug": "dvc-files-and-directories"
108 changes: 0 additions & 108 deletions content/docs/user-guide/basic-concepts.md

This file was deleted.

22 changes: 10 additions & 12 deletions content/docs/user-guide/related-technologies.md
@@ -6,11 +6,10 @@ bringing best practices from software engineering into the data science field

 ## Git

-- DVC builds upon Git by introducing the concept of
-[data files](/doc/user-guide/basic-concepts#data-files) – large files that
-should not be stored in a Git repository, but still need to be tracked and
-versioned. It leverages Git's features to enable managing different versions
-of data itself, data pipelines, and experiments.
+- DVC builds upon Git by introducing the concept of data files – large files
+that should not be stored in a Git repository, but still need to be tracked
+and versioned. It leverages Git's features to enable managing different
+versions of data itself, data pipelines, and experiments.

 - DVC is not fundamentally bound to Git, and can work without it (except
 versioning-related features). This also applies to Git-LFS and Git-annex,
@@ -27,7 +26,7 @@ bringing best practices from software engineering into the data science field
 [available](/doc/command-reference/install)).

 - Git-LFS was not made with data science in mind, so it doesn't provide related
-features (e.g. [pipelines](/doc/user-guide/basic-concepts#data-pipeline),
+features (e.g. [pipelines](/doc/command-reference/pipeline),
 [metrics](/doc/command-reference/metrics), etc.).

 - Github (most common Git hosting service) has a limit of 2 GB per repository.
@@ -116,14 +115,13 @@ _Luigi_, etc.
 (DAG):

 - The DAG or dependency graph is defined implicitly by the connections between
-pipeline [stages](/doc/user-guide/basic-concepts#data-processing-stage),
-based on their <abbr>dependencies</abbr> and <abbr>outputs</abbr>.
+pipeline [stages](/doc/command-reference/run), based on their
+<abbr>dependencies</abbr> and <abbr>outputs</abbr>.

 - Each stage defines one node in the DAG. All DVC-files in a repository make
-up a [pipelines](/doc/user-guide/basic-concepts#data-pipeline) (think a
-single Makefile). All stages (and corresponding processes) are implicitly
-combined through their inputs and outputs, simplifying conflict resolution
-during merges.
+up a [pipelines](/doc/command-reference/pipeline) (think a single Makefile).
+All stages (and corresponding processes) are implicitly combined through
+their inputs and outputs, simplifying conflict resolution during merges.

 - DVC stages can be written manually in an intuitive `dvc.yaml` file, or
 generated by the helper command `dvc run`, based on a terminal command, its
7 changes: 3 additions & 4 deletions content/docs/user-guide/what-is-dvc.md
@@ -21,8 +21,7 @@ software engineers.
 interface and flow as Git. DVC can also work stand-alone, but without
 versioning capabilities.

-- **Data versioning** is enabled by replacing
-[large files](/doc/user-guide/basic-concepts#data-files), dataset directories,
+- **Data versioning** is enabled by replacing large files, dataset directories,
 ML models, etc. with small
 [metafiles](/doc/user-guide/dvc-files-and-directories) (easy to handle with
 Git). These placeholders point to the original data, which is decoupled from
@@ -33,8 +32,8 @@ software engineers.
 transfer large datasets or share a GPU-trained model with others.

 - DVC makes data science projects **reproducible** by creating lightweight
-[pipelines](/doc/user-guide/basic-concepts#data-pipelines) using implicit
-dependency graphs,and codifying the data and artifacts involved.
+[pipelines](/doc/command-reference/pipeline) using implicit dependency
+graphs,and codifying the data and artifacts involved.

 - DVC is **platform agnostic**: It runs on all major operating systems (Linux,
 MacOS, and Windows), and works independently of the programming languages