Updated the output for dvc version command in docs #1636

Closed
39 changes: 20 additions & 19 deletions content/docs/command-reference/dag.md
@@ -15,25 +15,26 @@ positional arguments:

## Description

-A data pipeline, in general, is a series of data processing
-[stages](/doc/command-reference/run) (for example console commands that take an
-input and produce an <abbr>output</abbr>). A pipeline may produce intermediate
-data, and has a final result.
-
-Data processing or ML pipelines typically start a with large raw datasets,
-include intermediate featurization and training stages, and produce a final
-model, as well as accuracy [metrics](/doc/command-reference/metrics).
-
-In DVC, pipeline stages and commands, their data I/O, interdependencies, and
-results (intermediate or final) are specified in `dvc.yaml`, which can be
-written manually or built using the helper command `dvc run`. This allows DVC to
-restore one or more pipelines later (see `dvc repro`).
-
-> DVC builds a dependency graph
-> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.
-
-`dvc dag` command displays the stages of a pipeline up to the target stage. If
-`target` is omitted, it will show the full project DAG.
+A Data pipeline refers to a series of [stages](/doc/command-reference/run)
+through which our data moves. Each stage of a pipeline takes some input and
+produces some output. This output is then passed onto the next stage of a
+pipeline. This process continues until we reach the final stage which produces
+the final results. A pipeline works the same way as a compiler works, it takes
+some data as an input and produces an output.
+
+You can create multiple pipelines and each pipeline would be considered as an
+experiment. After completing one experiment, you can commit the changes and add
+a tag to your experiment. A tag is a name that you give to your experiment.
+
+Using DVC, you can create a metafile `data.dvc` which allows us to reproduce
+each stage of a pipeline using `dvc repro`. At the end of every pipeline, you
+can save your output in a metrics file using `dvc metrics` command. This file
+will help you in comparing the results of every experiment.
+
+DVC provides a `dvc dag` command which creates a direct acyclic graph
+([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) that gives a
+pictorial view of a pipeline. It also tells you in which stage of a pipeline you
+are currently in.

Member

Unrelated changes ^^ ?

Contributor

Yep. @sahilbhosale63 you probably created this branch on top of a branch for another PR. Please either rebase or cherry-pick the version-specific commits onto master, or just undo the changes to this file (dag), e.g. with `git checkout master -- content/docs/command-reference/dag.md` (then add and commit).
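
For illustration, the "undo the changes to this file" option suggested above could look roughly like the sketch below. It runs in a throwaway repository so that it is self-contained; the branch name `version-output`, the file contents, and the commit messages are placeholders and are not taken from this PR.

```shell
# Minimal sketch in a temporary repository (all names and contents are hypothetical).
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.email "you@example.com"
git config user.name "Demo"

# Baseline docs on master.
mkdir -p content/docs/command-reference
echo "original dag text" > content/docs/command-reference/dag.md
echo "old version output" > content/docs/command-reference/version.md
git add .
git commit -q -m "Initial docs"

# A PR branch that accidentally carries an unrelated change to dag.md.
git checkout -q -b version-output
echo "unrelated rewrite" > content/docs/command-reference/dag.md
echo "new version output" > content/docs/command-reference/version.md
git commit -q -am "Update version output (plus stray dag.md change)"

# Restore dag.md to its state on master, then add and commit.
git checkout master -- content/docs/command-reference/dag.md
git add content/docs/command-reference/dag.md
git commit -q -m "Revert unrelated changes to dag.md"

# List the files that still differ from master.
changed=$(git diff --name-only master)
echo "$changed"
```

In a real checkout only the three commands after the "Restore dag.md" comment would be needed, run on the PR branch. Rebasing or cherry-picking, the other options mentioned, would instead remove the unrelated commit from the branch history.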

## Options

28 changes: 13 additions & 15 deletions content/docs/command-reference/version.md
@@ -110,27 +110,25 @@ Inside a DVC project:

```dvc
$ dvc version
-
-DVC version: 0.41.3+f36162
-Python version: 3.7.1
-Platform: Linux-4.15.0-50-generic-x86_64-with-debian-buster-sid
-Binary: False
-Cache: reflink - False, hardlink - True, symlink - True
-Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
-Filesystem type (cache directory): ('ext4', '/dev/sdb3')
+DVC version: 1.0.1+753efe
+---------------------------------
+
+Platform: Python 3.8.2 on Linux-5.7.8-x86_64-with-glibc2.29
+Supports: All remotes
+Cache types: hardlink, symlink
+Cache directory: ext4 on /dev/sda7
+Workspace directory: ext4 on /dev/sda7
Repo: dvc, git
-Filesystem type (workspace): ('ext4', '/dev/sdb3')
```

Outside a DVC project:

```dvc
$ dvc version
+DVC version: 1.0.1+753efe
+---------------------------------

-DVC version: 0.41.3+f36162
-Python version: 3.7.1
-Platform: Linux-4.15.0-50-generic-x86_64-with-debian-buster-sid
-Binary: False
-Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
-Filesystem type (workspace): ('ext4', '/dev/sdb3')
+Platform: Python 3.8.2 on Linux-5.7.8-x86_64-with-glibc2.29
+Supports: All remotes
+Workspace directory: ext4 on /dev/sda7
```