diff --git a/content/docs/api-reference/get_url.md b/content/docs/api-reference/get_url.md
index 2248b0a71c..ed6f80f4d5 100644
--- a/content/docs/api-reference/get_url.md
+++ b/content/docs/api-reference/get_url.md
@@ -31,22 +31,26 @@ specified by its `path` in a `repo` (DVC project), is stored.
 The URL is formed by reading the project's
 [remote configuration](/doc/command-reference/config#remote) and the `dvc.yaml`
 or `.dvc` file where the given `path` is found (`outs` field). The schema of the
-URL returned depends on the
-[type](/doc/command-reference/remote/add#supported-storage-types) of the
-`remote` used (see the [Parameters](#parameters) section).
+URL returned depends on the [type][storage-types] of the `remote` used (see the
+[Parameters](#parameters) section).
 
 If the target is a directory, the returned URL will end in `.dir`. Refer to
-[Structure of cache directory](/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)
-and `dvc add` to learn more about how DVC handles data directories.
+[Structure of cache directory] and `dvc add` to learn more about how DVC handles
+data directories.
 
 ⚠️ This function does not check for the actual existence of the file or
 directory in the remote storage.
 
 💡 Having the resource's URL, it should be possible to download it directly with
-an appropriate library, such as
-[`boto3`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.download_fileobj)
-or
-[`paramiko`](https://docs.paramiko.org/en/stable/api/sftp.html#paramiko.sftp_client.SFTPClient.get).
+an appropriate library, such as [`boto3`] or [`paramiko`].
+
+[storage-types]: /doc/command-reference/remote/add#supported-storage-types
+[structure of cache directory]:
+  /doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
+[`boto3`]:
+  https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.download_fileobj
+[`paramiko`]:
+  https://docs.paramiko.org/en/stable/api/sftp.html#paramiko.sftp_client.SFTPClient.get
 
 ## Parameters
 
@@ -88,9 +92,8 @@ The script above prints
 `https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355`
 
 This URL represents the location where the data is stored, and is built by
-reading the corresponding `.dvc` file
-([`get-started/data.xml.dvc`](https://github.com/iterative/dataset-registry/blob/master/get-started/data.xml.dvc))
-where the `md5` file hash is stored,
+reading the corresponding `.dvc` file ([`get-started/data.xml.dvc`]) where the
+`md5` file hash is stored,
 
 ```yaml
 outs:
@@ -98,11 +101,14 @@ outs:
     path: get-started/data.xml
 ```
 
-and the project configuration
-([`.dvc/config`](https://github.com/iterative/dataset-registry/blob/master/.dvc/config))
-where the remote URL is saved:
+and the project configuration ([`.dvc/config`]) where the remote URL is saved:
 
 ```ini
 ['remote "storage"']
 url = https://remote.dvc.org/dataset-registry
 ```
+
+[`.dvc/config`]:
+  https://github.com/iterative/dataset-registry/blob/master/.dvc/config
+[`get-started/data.xml.dvc`]:
+  https://github.com/iterative/dataset-registry/blob/master/get-started/data.xml.dvc
diff --git a/content/docs/command-reference/destroy.md b/content/docs/command-reference/destroy.md
index 22a45f6c3a..a92c6c6343 100644
--- a/content/docs/command-reference/destroy.md
+++ b/content/docs/command-reference/destroy.md
@@ -23,6 +23,9 @@ set to an
 in your project, DVC will replace them with the latest versions of the actual
 files and directories first, so that your data is intact after destruction.
 
+[external cache]:
+  /doc/use-cases/shared-development-server#configure-the-external-shared-cache
+
 > Refer to [Project Structure](/doc/user-guide/project-structure) for more
 > details on the directories and files deleted by this command.
 
diff --git a/content/docs/user-guide/managing-external-data.md b/content/docs/user-guide/managing-external-data.md
index 7971e60b32..f221819e0b 100644
--- a/content/docs/user-guide/managing-external-data.md
+++ b/content/docs/user-guide/managing-external-data.md
@@ -2,12 +2,13 @@
 
 > ⚠️ This is an advanced feature for very specific situations and not
 > recommended except if there's absolutely no other alternative. In most cases
-> alternatives like the
-> [to-cache](/doc/command-reference/add#example-transfer-to-the-cache) or
-> [to-remote](/doc/command-reference/add#example-transfer-to-remote-storage)
-> strategies of `dvc add` and `dvc import-url` are more convenient. **Note**
-> that external outputs are not pushed or pulled from/to
-> [remote storage](/doc/command-reference/remote).
+> alternatives like the [to-cache] or [to-remote] strategies of `dvc add` and
+> `dvc import-url` are more convenient. **Note** that external outputs are not
+> pushed or pulled from/to [remote storage].
+
+[to-cache]: /doc/command-reference/add#example-transfer-to-the-cache
+[to-remote]: /doc/command-reference/add#example-transfer-to-remote-storage
+[remote storage]: /doc/command-reference/remote
 
 There are cases when data is so large, or its processing is organized in such a
 way, that its impossible to handle it in the local machine disk. For example
@@ -39,16 +40,17 @@ their remote URLs or external paths to `dvc add`, or put them in `dvc.yaml`
 > external cache, because it may cause data collisions: the hash of an external
 > output could collide with that of a local file with different content.
 
-> Note that [remote storage](/doc/command-reference/remote) is a different
-> feature.
+> Note that [remote storage] is a different feature.
 
 ## Setting up an external cache
 
 DVC requires that the project's cache is configured in the same external
 location as the data that will be tracked (external outputs). This
-avoids transferring files to the local environment and enables
-[file linking](/doc/user-guide/large-dataset-optimization) within the external
-storage.
+avoids transferring files to the local environment and enables [file links]
+within the external storage.
+
+[file links]:
+  /doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache
 
 As an example, let's create a directory external to the workspace and set it up
 as cache:
@@ -183,9 +185,8 @@ custom cache location for local paths outside of your project.
 
 > Except for external data on different storage devices or partitions mounted on
 > the same file system (e.g. `/mnt/raid/data`). In that case please setup an
-> external cache in that same drive to enable
-> [file links](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
-> and avoid copying data.
+> external cache in that same drive to enable [file links] and avoid copying
+> data.
 
 ```dvc
 $ dvc add --external /home/shared/existing-data
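
The URL composition that the `get_url.md` hunks describe (the remote `url` read from `.dvc/config`, joined with the `md5` stored in the `.dvc` file) can be sketched with the Python standard library. This is a minimal illustration, not DVC's implementation. `remote_url_from_config` and `build_remote_url` are hypothetical helpers. Splitting the hash into a two-character subdirectory plus a file name is an assumption inferred from the printed URL (`.../a3/04afb...`), which appears to follow the DVC 2.x cache layout.

```python
import configparser


def remote_url_from_config(config_text: str, remote_name: str) -> str:
    """Read the `url` of a named remote from `.dvc/config`-style INI text.

    Hypothetical helper, for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # The section header is written as ['remote "name"'], so configparser
    # keeps the surrounding single quotes as part of the section name.
    section = f"'remote \"{remote_name}\"'"
    return parser[section]["url"]


def build_remote_url(remote_url: str, md5: str) -> str:
    """Join a remote base URL and an `md5` hash into an object URL.

    Hypothetical helper. Assumes the layout seen in the example output:
    the first two characters of the hash name a subdirectory and the
    remaining characters name the file.
    """
    return f"{remote_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"


# Values taken from the `.dvc/config` snippet and the printed URL above.
CONFIG = """
['remote "storage"']
url = https://remote.dvc.org/dataset-registry
"""

url = build_remote_url(
    remote_url_from_config(CONFIG, "storage"),
    "a304afb96060aad90176268345e10355",
)
print(url)  # prints https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355
```

A tracked directory's `md5` in its `.dvc` file ends in `.dir`, so the same string composition would also produce a URL ending in `.dir`. This is consistent with the note about directories in the first hunk. Like `get_url`, the sketch never checks whether the object actually exists in the remote.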