diff --git a/content/docs/command-reference/add.md b/content/docs/command-reference/add.md
index 4b9c1d52d9..d59ded757c 100644
--- a/content/docs/command-reference/add.md
+++ b/content/docs/command-reference/add.md
@@ -146,7 +146,7 @@
   not. Shell style wildcards supported: `*`, `?`, `[seq]`, `[!seq]`, and `**`
 
 - `--external` - allow `targets` that are outside of the DVC repository. See
-  [External Outputs](/doc/user-guide/external-outputs).
+  [Managing External Data](/doc/user-guide/managing-external-data).
 
   > ⚠️ Note that this is an advanced feature for very specific situations and
   > not recommended except if there's absolutely no other alternative.
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index b3a871d01e..b78d37e76c 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -101,9 +101,8 @@ Relevant notes:
   for more info.)
 
 - [external dependencies](/doc/user-guide/external-dependencies) and
-  [external outputs](/doc/user-guide/external-outputs) (outside of the
-  workspace) are also supported (except metrics and plots),
-  although not usually recommended.
+  [external outputs](/doc/user-guide/managing-external-data) (outside of the
+  workspace) are also supported (except metrics and plots).
 
 - Outputs are deleted from the workspace before executing the command (including
   at `dvc repro`) if their paths are found as existing files/directories (unless
@@ -264,8 +263,7 @@ $ dvc run -n second_stage './another_script.sh $MYENVVAR'
   > considered "always changed", so this option has no effect in those cases.
 
 - `--external` - allow writing outputs outside of the DVC repository. See
-  [External Outputs](/doc/user-guide/external-outputs) — not usually
-  recommended.
+  [Managing External Data](/doc/user-guide/managing-external-data).
 
 - `--desc ` - user description of the stage (optional). This doesn't
   affect any DVC operations.
diff --git a/content/docs/command-reference/version.md b/content/docs/command-reference/version.md
index b8673569a4..4a1dae6f2c 100644
--- a/content/docs/command-reference/version.md
+++ b/content/docs/command-reference/version.md
@@ -19,7 +19,7 @@ usage: dvc version [-h] [-q | -v]
 | `Supports` | Types of [remote storage](/doc/command-reference/remote/add#supported-storage-types) supported by the current DVC setup (their required dependencies are installed) |
 | `Cache types` | [Types of links](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache) supported (between workspace and cache) |
 | `Cache directory` | Filesystem type (e.g. ext4, FAT, etc.) and drive on which the cache directory is mounted |
-| `Caches` | Cache [location types](/doc/user-guide/external-outputs) configured in the repo (e.g. local, SSH, S3, etc.) |
+| `Caches` | Cache [location types](/doc/user-guide/managing-external-data) configured in the repo (e.g. local, SSH, S3, etc.) |
 | `Remotes` | Remote [location types](/doc/command-reference/remote/add#supported-storage-types) configured in the repo (e.g. SSH, S3, Google Drive, etc.) |
 | `Workspace directory` | Filesystem type (e.g. ext4, FAT, etc.) and drive on which the workspace is mounted |
 | `Repo` | Shows whether we are in a DVC repo and/or Git repo |
diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 0b0f8d7332..2bd1f2fb3e 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -136,8 +136,8 @@
       "large-dataset-optimization",
       "external-dependencies",
       {
-        "label": "External Outputs",
-        "slug": "external-outputs"
+        "label": "Managing External Data",
+        "slug": "managing-external-data"
       },
       {
         "label": "Contributing",
diff --git a/content/docs/start/data-versioning.md b/content/docs/start/data-versioning.md
index 4b1718af0e..49f8801246 100644
--- a/content/docs/start/data-versioning.md
+++ b/content/docs/start/data-versioning.md
@@ -256,10 +256,10 @@ volume?
 While these cases are not covered in the Get Started, we recommend reading the
 following sections next to learn more about advanced workflows:
 
-- A [shared external cache](/doc/use-cases/shared-development-server) can be set
+- A shared [external cache](/doc/use-cases/shared-development-server) can be set
   up to store, version and access a lot of data on a large shared volume
   efficiently.
 - A quite advanced scenario is to track and version data directly on the remote
   storage (e.g. S3). Check out
-  [External Outputs](https://dvc.org/doc/user-guide/external-outputs) to learn
-  more.
+  [Managing External Data](https://dvc.org/doc/user-guide/managing-external-data)
+  to learn more.
diff --git a/content/docs/user-guide/external-dependencies.md b/content/docs/user-guide/external-dependencies.md
index 8260dd6bac..87b0645c54 100644
--- a/content/docs/user-guide/external-dependencies.md
+++ b/content/docs/user-guide/external-dependencies.md
@@ -6,8 +6,9 @@ For example data on a network attached storage (NAS), processing data on HDFS,
 running [Dask](https://dask.org/) via SSH, or for a script that streams data
 from S3 to process it.
 
-External dependencies (and [external outputs](/doc/user-guide/external-outputs))
-provide ways to track (and version) data outside of the project.
+External dependencies and
+[external outputs](/doc/user-guide/managing-external-data) provide ways to track
+and version data outside of the project.
 
 ## How external dependencies work
 
diff --git a/content/docs/user-guide/external-outputs.md b/content/docs/user-guide/managing-external-data.md
similarity index 93%
rename from content/docs/user-guide/external-outputs.md
rename to content/docs/user-guide/managing-external-data.md
index e8a62b2c9e..b7779ea02a 100644
--- a/content/docs/user-guide/external-outputs.md
+++ b/content/docs/user-guide/managing-external-data.md
@@ -2,9 +2,12 @@
 
 > ⚠️ This is an advanced feature for very specific situations and not
 > recommended except if there's absolutely no other alternative. In most cases
-> alternatives like the `--to-cache` or `--to-remote` options of `dvc add` and
-> `dvc import-url` are more convenient. **Note** that external outputs are not
-> pushed or pulled from/to [remote storage](/doc/command-reference/remote).
+> alternatives like the
+> [to-cache](/doc/command-reference/add#example-transfer-to-the-cache) or
+> [to-remote](/doc/command-reference/add#example-transfer-to-remote-storage)
+> strategies of `dvc add` and `dvc import-url` are more convenient. **Note**
+> that external outputs are not pushed or pulled from/to
+> [remote storage](/doc/command-reference/remote).
 
 There are cases when data is so large, or its processing is organized in such a
 way, that its impossible to handle it in the local machine disk. For example
diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md
index ce488f5375..fd60691f8b 100644
--- a/content/docs/user-guide/project-structure/dvc-files.md
+++ b/content/docs/user-guide/project-structure/dvc-files.md
@@ -5,8 +5,8 @@ You can use `dvc add` to track data files or directories located in your current
 you bring data from external locations to your project, and start tracking it
 locally. See [Data Versioning](/doc/start/data-versioning) for more info.
 
-> \* Certain [external locations](/doc/user-guide/external-outputs) are also
-> supported.
+> \* Certain [external locations](/doc/user-guide/managing-external-data) are +> also supported. Files ending with the `.dvc` extension ("dot DVC file") are created by these commands as data placeholders that can be versioned with Git. They contain the @@ -54,16 +54,16 @@ Comments can be entered using the `# comment` format. The following subfields may be present under `outs` entries: -| Field | Description | -| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `path` | (Required) Path to the file or directory (relative to `wdir`, which defaults to the file's location) | -| `md5`
`etag`
`checksum` | Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) for HTTP, S3, or Azure [external outputs](/doc/user-guide/external-outputs); and a special _checksum_ for HDFS and WebHDFS. | -| `size` | Size of the file or directory (sum of all files). | -| `nfiles` | If this output is a directory, the number of files inside (recursive). | -| `isexec` | Whether this is an executable file. DVC preserves execute permissions upon `dvc checkout` and `dvc pull`. This has no effect on directories, or in general on Windows. | -| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | -| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts) | -| `desc` | (Optional) user description for this output (supported in metrics and plots too). This doesn't affect any DVC operations. | +| Field | Description | +| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `path` | (Required) Path to the file or directory (relative to `wdir`, which defaults to the file's location) | +| `md5`
`etag`
`checksum` | Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) for HTTP, S3, or Azure [external outputs](/doc/user-guide/managing-external-data); and a special _checksum_ for HDFS and WebHDFS. | +| `size` | Size of the file or directory (sum of all files). | +| `nfiles` | If this output is a directory, the number of files inside (recursive). | +| `isexec` | Whether this is an executable file. DVC preserves execute permissions upon `dvc checkout` and `dvc pull`. This has no effect on directories, or in general on Windows. | +| `cache` | Whether or not this file or directory is cached (`true` by default). See the `--no-commit` option of `dvc add`. | +| `persist` | Whether the output file/dir should remain in place while `dvc repro` runs (`false` by default: outputs are deleted when `dvc repro` starts) | +| `desc` | (Optional) user description for this output (supported in metrics and plots too). This doesn't affect any DVC operations. | ## Dependency entries diff --git a/redirects-list.json b/redirects-list.json index 796c2b71a9..52a08710d8 100644 --- a/redirects-list.json +++ b/redirects-list.json @@ -37,7 +37,6 @@ "^/doc/user-guide/dvc-files/.dvc$ /doc/user-guide/project-structure/dvc-files", "^/doc/user-guide/dvc-internals(/.*)?$ /doc/user-guide/project-structure/internal-files$1", "^/doc/user-guide/dvcignore$ /doc/user-guide/project-structure/dvcignore-files", - "^/doc/user-guide/managing-external-data$ /doc/user-guide/user-guide/external-outputs", "^/doc/understanding-dvc(/.*)?$ /doc/user-guide/what-is-dvc", "^/doc/commands-reference(/.*)?$ /doc/command-reference$1", "^/doc/command-reference/plot$ /doc/command-reference/plots",