Regular updates (Apr 21) #1174

Merged
merged 9 commits into from
Apr 27, 2020
2 changes: 0 additions & 2 deletions content/docs/command-reference/get-url.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,6 @@ DVC supports several types of (local or) remote locations (protocols):
> include them all. The command should look like this: `pip install "dvc[s3]"`.
> (This example installs `boto3` library along with DVC to support S3 storage.)

<!-- Separate MD quote: -->

\* HDFS and HTTP **do not** support downloading entire directories, only single
files.

15 changes: 7 additions & 8 deletions content/docs/command-reference/get.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,14 +36,13 @@ the data source. Both HTTP and SSH protocols are supported for online repos
to an "offline" repo (if it's a DVC repo without a default remote, instead of
downloading, DVC will try to copy the target data from its <abbr>cache</abbr>).

The `path` argument of this command is used to specify the location of the
target to be downloaded within the source repository at `url`. `path` can
specify any file or directory in the source repo, including <abbr>outputs</abbr>
tracked by DVC, as well as files tracked by Git. Note that for DVC repos, the
target should be found in one of the
[DVC-files](/doc/user-guide/dvc-file-format) of the project. The project should
also have a default [DVC remote](/doc/command-reference/remote), containing the
actual data.
The `path` argument is used to specify the location of the target to be
downloaded within the source repository at `url`. `path` can specify any file or
directory in the source repo, including <abbr>outputs</abbr> tracked by DVC, as
well as files tracked by Git. Note that for DVC repos, the target should be
found in one of the [DVC-files](/doc/user-guide/dvc-file-format) of the project.
The project should also have a default
[DVC remote](/doc/command-reference/remote), containing the actual data.

> See `dvc get-url` to download data from other supported locations such as S3,
> SSH, HTTP, etc.
15 changes: 7 additions & 8 deletions content/docs/command-reference/import.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,14 +40,13 @@ the data source. Both HTTP and SSH protocols are supported for online repos
to an "offline" repo (if it's a DVC repo without a default remote, instead of
downloading, DVC will try to copy the target data from its <abbr>cache</abbr>).

The `path` argument of this command is used to specify the location of the
target to be downloaded within the source repository at `url`. `path` can
specify any file or directory in the source repo, including <abbr>outputs</abbr>
tracked by DVC, as well as files tracked by Git. Note that for DVC repos, the
target should be found in one of the
[DVC-files](/doc/user-guide/dvc-file-format) of the project. The project should
also have a default [DVC remote](/doc/command-reference/remote), containing the
actual data.
The `path` argument is used to specify the location of the target to be
downloaded within the source repository at `url`. `path` can specify any file or
directory in the source repo, including <abbr>outputs</abbr> tracked by DVC, as
well as files tracked by Git. Note that for DVC repos, the target should be
found in one of the [DVC-files](/doc/user-guide/dvc-file-format) of the project.
The project should also have a default
[DVC remote](/doc/command-reference/remote), containing the actual data.

> See `dvc import-url` to download and track data from other supported locations
> such as S3, SSH, HTTP, etc.
9 changes: 5 additions & 4 deletions content/docs/command-reference/init.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,10 +22,11 @@ advanced scenarios:
- [Initializing DVC without Git](#how-does-it-affect-dvc-commands) - support for
SCM other than Git, deployment automation cases, etc.

At DVC initialization, a new `.dvc/` directory will be created for internal
configuration and cache
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user.
At DVC initialization, a new `.dvc/` directory is created for internal
configuration and <abbr>cache</abbr>
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user. This directory is automatically staged with `git add`, so
it can be easily committed with Git.
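
As a rough sketch of what this enables, the function below wraps initialization and the first commit. The name `init_dvc_and_commit` and the commit message are illustrative assumptions, not part of DVC. It presumes that `dvc` and `git` are installed and that the current directory is already a Git repository:

```shell
# Hypothetical helper, not part of DVC: initialize DVC and commit its files.
init_dvc_and_commit() {
  # `dvc init` creates `.dvc/` and stages its files with `git add`.
  dvc init || return 1
  # Show what is staged under `.dvc/` before committing.
  git status --short .dvc
  # Commit everything currently staged; the message is an arbitrary example.
  git commit -m "Initialize DVC"
}
```

No explicit `git add` appears in the sketch because the directory is already staged by the time `dvc init` returns.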

### Initializing DVC in subdirectories

66 changes: 34 additions & 32 deletions content/docs/command-reference/list.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,17 +17,17 @@ positional arguments:
## Description

DVC, by effectively replacing data files, models, directories with DVC-files
(`.dvc`), hides actual locations and names. It means that you don't see actual
data when you view a <abbr>DVC repository</abbr> with Github/Gitlab UI (you see
`.dvc` files instead). It makes it hard to navigate the project, makes it hard
to use `dvc get`, `dvc import`, [`dvc.api`](/doc/api-reference) - they all deal
with actual path to a data file or directory.

This command prints a virtual view of a DVC repository, the way it would have
looked like if files and directories that are DVC-tracked were actually regular
Git-tracked files.

Another way to explain this - it prints the result similar to:
(`.dvc`), hides actual locations and names. This means that you don't see data
files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
GitHub); you just see the DVC-files. This makes it hard to navigate the project
to find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
[`dvc.api`](/doc/api-reference).

`dvc list` prints a virtual view of a DVC repository, as if files and
directories tracked by DVC were found directly in the remote Git repo. Only the
root directory is listed by default. The output of this command is equivalent to
actually cloning the repo and [pulling](/doc/command-reference/pull) its data
like this:

```dvc
$ git clone <url> example
Expand All @@ -36,21 +36,23 @@ $ dvc pull
$ ls <path>
```

The `url` argument is a Git repository address to list. Command works for any
Git repository - either it has DVC project in it, or not. Both HTTP and SSH
protocols are supported for online repositories (e.g.
`https://github.com/iterative/example-get-started` or
`[email protected]:iterative/example-get-started.git`). `url` can also be a local
file system path to a valid Git repository.
The `url` argument specifies the address of the Git repository containing the
data source. Both HTTP and SSH protocols are supported for online repos (e.g.
`[user@]server:project.git`). `url` can also be a local file system path to an
"offline" Git repo.

The `path` argument of this command is used to specify a path within the source
repository at `url`. It's similar to providing a path to list to the commands
like `ls` or `aws s3 ls`. And similar to the, `-R` option might be used to list
files recursively.
The optional `path` argument specifies a directory to list within the source
repository at `url`. It's similar to providing a path to commands such as `ls`
or `aws s3 ls`. Similarly, the `-R` option can be used to list files
recursively.
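
For instance, a small wrapper like the one below could list a single directory of a repository recursively. This is only a sketch: `list_dir_recursively` is a hypothetical helper name, and the `data` directory in the usage comment is assumed to exist in the example repository.

```shell
# Hypothetical helper, not part of DVC: recursively list one directory of a repo.
list_dir_recursively() {
  repo_url=$1   # Git repository address (the `url` argument)
  dir=$2        # directory inside that repository (the `path` argument)
  dvc list -R "$repo_url" "$dir"
}

# Usage (requires DVC and network access; `data` is an assumed directory):
#   list_dir_recursively https://github.com/iterative/example-get-started data
```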

Please note that `dvc list` doesn't check whether the listed data (tracked by
DVC) actually exists in remote storage, so there's no guarantee that it can be
accessed with `dvc get`, `dvc import`, or [`dvc.api`](/doc/api-reference).

## Options

- `-R`, `--recursive` - recursively prints the repository contents.
- `-R`, `--recursive` - recursively prints contents of all subdirectories.

- `--outs-only` - show only DVC-tracked files and directories
(<abbr>outputs</abbr>).
Expand All @@ -68,10 +70,12 @@ files recursively.
- `-v`, `--verbose` - displays detailed tracing information. when this option is
not specified.

## Example: List files and directories in a DVC repository
## Example: Find files to download from a repository

We can use the command for getting information about remote repository with all
files, directories and <abbr>data artifacts</abbr>, including DVC-tracked ones:
We can use this command to get information about a repository before using
other commands like `dvc get` or `dvc import` to reuse any file or directory
found in it. This includes files tracked by Git as well as <abbr>data
artifacts</abbr> tracked by DVC:

```dvc
$ dvc list https://github.com/iterative/example-get-started
Expand All @@ -88,20 +92,18 @@ train.dvc
```

If you open the
[example-get-started project's page](https://github.com/iterative/example-get-started),
you will see a similar list, except that `model.pkl` will be missing. That's
because its tracked by DVC and not visible to Git. You can find it specified as
an output if you open
[example-get-started](https://github.com/iterative/example-get-started)
project's page, you will see a similar list, except that `model.pkl` will be
missing. That's because it's tracked by DVC and not visible to Git. You can find
it specified as an output if you open
[`train.dvc`](https://github.com/iterative/example-get-started/blob/master/train.dvc).

We can now, for example, run
We can now, for example, download the model file with:

```dvc
$ dvc get https://github.com/iterative/example-get-started model.pkl
```

to download the model file (see `dvc get`).

## Example: List all files and directories in a data registry

Let's imagine a DVC repo used as a
24 changes: 12 additions & 12 deletions content/docs/command-reference/lock.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,10 +3,6 @@
Lock a [DVC-file](/doc/user-guide/dvc-file-format)
([stage](/doc/command-reference/run)). Use `dvc unlock` to unlock the file.

If a DVC-file is locked, the stage is considered unchanged. `dvc repro` will not
execute commands to regenerate outputs of locked stages, even if some
dependencies have changed and even if `--force` is provided.

## Synopsis

```usage
Expand All @@ -19,14 +15,18 @@ positional arguments:
## Description

`dvc lock` causes any DVC-file to be considered _not changed_ by `dvc status`
and `dvc repro`.

Locking a stage is useful to avoid syncing data from the top of its pipeline,
and keep iterating on the last (unlocked) stages only.

Note that <abbr>import stages</abbr> are considered always locked. They can not
be unlocked. Use `dvc update` on them to update the file, directory, or
<abbr>data artifact</abbr> from its external data source.
and `dvc repro`. Stage reproduction will not regenerate the
<abbr>outputs</abbr> of locked stages, even if some dependencies have changed,
and even if `--force` is provided.

Locking a stage is useful to avoid syncing data from the top of its
[pipeline](/doc/command-reference/pipeline), and keep iterating on the last
(unlocked) stages only.

Note that <abbr>import stages</abbr> are considered always locked. Use
`dvc update` to update the corresponding <abbr>data artifacts</abbr> from the
external data source. [Unlock](/doc/command-reference/unlock) them before using
`dvc repro` on a pipeline that needs their outputs.
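
The workflow described above could be sketched as follows. The helper name `repro_with_locked_stage` and the stage file names in the usage comment (`prepare.dvc`, `train.dvc`) are hypothetical; only `dvc lock`, `dvc repro`, and `dvc unlock` come from DVC itself.

```shell
# Hypothetical helper, not part of DVC: reproduce a late stage while an
# earlier, expensive stage stays locked, then restore the unlocked state.
repro_with_locked_stage() {
  locked_stage=$1   # DVC-file of the stage to keep locked
  target_stage=$2   # DVC-file of the stage to reproduce
  dvc lock "$locked_stage" || return 1
  dvc repro "$target_stage"
  status=$?
  # Unlock again even if reproduction failed, so the lock is not left behind.
  dvc unlock "$locked_stage"
  return "$status"
}

# Usage (stage file names are assumed examples):
#   repro_with_locked_stage prepare.dvc train.dvc
```

Unlocking at the end is a design choice for this sketch; leave the stage locked if you plan to keep iterating on the later stages.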

## Options

10 changes: 5 additions & 5 deletions content/docs/command-reference/root.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,11 @@ usage: dvc root [-h] [-q | -v]

## Description

While in sub-directories of the project, sometimes developers may want to refer
some file belonging to another directory. This command returns the path to the
root directory of the current <abbr>DVC project</abbr>, relative to the current
working directory. This command can be used to build a path to a dependency
file, command, or output.
This command returns the path to the root directory of the <abbr>DVC
project</abbr>, relative to the current working directory. It can be used to
build a path to a dependency file, command, or output. This is useful when you
are working in a subdirectory of the project and need to refer to a file in
another directory.
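
As an illustration, the output of `dvc root` can be combined with a project-relative path in a small shell function. `path_from_root` is a hypothetical helper, and the file name in the usage comment is an assumed example:

```shell
# Hypothetical helper, not part of DVC: prefix a project-relative path with the
# output of `dvc root`, so it resolves from the current working directory.
path_from_root() {
  root=$(dvc root) || return 1
  printf '%s/%s\n' "$root" "$1"
}

# Usage from a subdirectory of the project (file name is an assumed example):
#   cat "$(path_from_root data/raw.csv)"
```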

## Options

2 changes: 0 additions & 2 deletions content/docs/command-reference/version.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,6 @@ usage: dvc version [-h] [-q | -v]
> If `dvc version` is executed outside a DVC project, no `Cache` is output and
> the `Filesystem type` output is of the current working directory.

<!-- Separate MD quote: -->

> Note that if you've installed DVC using `pip`, you will need to install
> `psutil` manually with `pip install psutil` in order for `dvc version` to
> report file system information. Please see the original
2 changes: 0 additions & 2 deletions content/docs/install/macos.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,6 @@ from the [release page](https://github.com/iterative/dvc/releases/) on GitHub.
> [secondary click](https://support.apple.com/en-us/HT207700) on it, then select
> "Open With" > **Installer.app**, and choose the **Open** button.

<!-- Separate MD quote: -->

> You may try [these instructions](https://stackoverflow.com/a/42120328/761963)
> to uninstall the MacOS package.

2 changes: 0 additions & 2 deletions content/docs/install/windows.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,6 @@
> [Running DVC on Windows](/doc/user-guide/running-dvc-on-windows) for important
> tips to improve your experience using DVC on Windows.

<!-- Separate MD quote: -->

> To use DVC [as a Python library](/doc/api-reference), please
> [install with pip](#install-with-pip) or [with conda](#install-with-conda).

15 changes: 7 additions & 8 deletions content/docs/tutorials/deep/preparation.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,19 +58,18 @@ $ pip install -r code/requirements.txt

## Initialize

DVC works on top of Git repositories. You run DVC initialization in a repository
directory to create DVC meta files and directories.

At DVC initialization, a new `.dvc/` directory will be created for internal
configuration and cache
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user. We describe some DVC internals below for a better
understanding of how it works.
DVC works best inside Git repositories like the one we're in. Initialize DVC
with:

```dvc
$ dvc init
...

At DVC initialization, a new `.dvc/` directory is created for internal
configuration and <abbr>cache</abbr>
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user. This directory is automatically staged with `git add`, so
it can be easily committed with Git:

$ ls -a .dvc
. .. .gitignore config tmp

9 changes: 5 additions & 4 deletions content/docs/tutorials/pipelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,10 +102,11 @@ When we run `dvc add` `Posts.xml.zip`, DVC creates a

### Expand to learn about DVC internals

At DVC initialization, a new `.dvc/` directory will be created for internal
configuration and cache
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user.
At DVC initialization, a new `.dvc/` directory is created for internal
configuration and <abbr>cache</abbr>
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user. This directory is automatically staged with `git add`, so
it can be easily committed with Git.

Note that the DVC-file created by `dvc add` has no dependencies, a.k.a. an
_orphan_ [stage file](/doc/command-reference/run):
9 changes: 5 additions & 4 deletions content/docs/use-cases/versioning-data-and-model-files.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,10 +42,11 @@ initialize the <abbr>DVC project</abbr> on top of the existing repository:
$ dvc init
```

At DVC initialization, a new `.dvc/` directory will be created for internal
configuration and cache
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user. These can safely be tracked with Git:
At DVC initialization, a new `.dvc/` directory is created for internal
configuration and <abbr>cache</abbr>
[files and directories](/doc/user-guide/dvc-files-and-directories) that are
hidden from the user. This directory is automatically staged with `git add`, so
it can be easily committed with Git:

```dvc
$ git status