diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 29c82d859b..e005be209f 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -108,6 +108,20 @@ DVC supports several types of external locations (protocols):
   [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) is
   necessary to track if the specified URL changed.

+DVC also supports capturing cloud versioning information when importing data
+from certain cloud storage providers. When the `--version-aware` option is
+provided or when the `url` argument includes a supported cloud versioning ID,
+DVC will import the specified version of the given data. When using versioned
+storage, DVC will always [pull](/doc/command-reference/pull) the versioned data
+from its original source location. Versioned data will also not be
+[pushed](/doc/command-reference/push) to remote storage.
+
+| Type    | Description                  | Versioned `url` format example                         |
+| ------- | ---------------------------- | ------------------------------------------------------ |
+| `s3`    | Amazon S3                    | `s3://bucket/data?versionId=L4kqtJlcpXroDTDmpUMLUo`    |
+| `azure` | Microsoft Azure Blob Storage | `azure://container/data?versionid=YYYY-MM-DDThh:mm:ss` |
+| `gs`    | Google Cloud Storage         | `gs://bucket/data#1360887697105000`                    |
+
 Another way to understand the `dvc import-url` command is as a shortcut for
 generating a pipeline [stage](/doc/command-reference/run) with an external
 dependency.
@@ -179,6 +193,12 @@ produces a regular stage in `dvc.yaml`.

 - `-h`, `--help` - prints the usage/help message, and exit.

+- `--version-aware` - capture cloud versioning information when importing the
+  file. By default, DVC will automatically capture cloud versioning information
+  if the URL contains a cloud versioning ID.
When `--version-aware` is provided
+  along with a URL that does not contain a cloud versioning ID, DVC will capture
+  the latest version of the file.
+
 - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
   problems arise, otherwise 1.

diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md
index 2da3053381..212690933d 100644
--- a/content/docs/command-reference/update.md
+++ b/content/docs/command-reference/update.md
@@ -48,6 +48,11 @@ $ dvc update --rev master
   > Note that this changes the `rev` field in the import stage, fixing it to the
   > revision.

+  For stages created with `dvc import-url` and a
+  [cloud-versioned URL](/doc/command-reference/import-url#--version-aware),
+  `--rev` can be used to specify an object version ID to use. By default, the
+  import will be updated to the latest version from cloud storage.
+
 - `-R`, `--recursive` - determines the files to update by searching each target
   directory and its subdirectories for import `.dvc` files to inspect. If there
   are no directories among the targets, this option has no effect.
diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md
index c67da0009c..1beba887ff 100644
--- a/content/docs/user-guide/project-structure/dvc-files.md
+++ b/content/docs/user-guide/project-structure/dvc-files.md
@@ -76,6 +76,7 @@ The following subfields may be present under `outs` entries:
 | `type` | User-assigned type of the data. |
 | `labels` | User-assigned labels to add to the data. |
 | `meta` | Custom metadata about the data. |
+| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default).
|

 ## Dependency entries

diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md
index aae0ce1a30..5a311b12cc 100644
--- a/content/docs/user-guide/project-structure/dvcyaml-files.md
+++ b/content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -265,6 +265,7 @@ These include a subset of the fields in `.dvc` file
 | `persist` | Whether the output file/dir should remain in place during `dvc repro` (`false` by default: outputs are deleted when `dvc repro` starts) |
 | `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. |
 | `desc` | (Optional) User description for this output. This doesn't affect any DVC operations. |
+| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). |
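
For reference, the versioned `url` formats in the `import-url.md` table above can be illustrated with a small Python sketch that separates the version ID from the rest of the URL. This is only an illustration of the three documented formats, not DVC's actual parsing code; the helper name `split_version` is hypothetical, and the case-insensitive match on the `versionId` query key is an assumption drawn from the two spellings shown in the table.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def split_version(url: str) -> Tuple[str, Optional[str]]:
    """Return (url_without_version, version_id), with None if no ID is present.

    Hypothetical helper for illustration; not part of DVC.
    """
    parts = urlsplit(url)
    version: Optional[str] = None

    if parts.scheme == "gs":
        # Google Cloud Storage example: gs://bucket/data#<generation>
        version = parts.fragment or None
    elif parts.scheme in ("s3", "azure"):
        # S3 example uses ?versionId=..., Azure example uses ?versionid=...
        # Assumption: treat the query key case-insensitively.
        query = {key.lower(): values for key, values in parse_qs(parts.query).items()}
        values = query.get("versionid")
        version = values[0] if values else None

    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, version


if __name__ == "__main__":
    for example in (
        "s3://bucket/data?versionId=L4kqtJlcpXroDTDmpUMLUo",
        "azure://container/data?versionid=YYYY-MM-DDThh:mm:ss",
        "gs://bucket/data#1360887697105000",
        "s3://bucket/data",
    ):
        print(split_version(example))
```

A URL without a version ID returns `None` as its second element, which corresponds to the case where `--version-aware` would be needed to capture the latest version.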