From a87198b07ed2980162a5175f990270d21c83b672 Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Wed, 19 Oct 2022 17:59:22 +0900 Subject: [PATCH 1/7] ref: document import-url --version-aware --- content/docs/command-reference/import-url.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 29c82d859b..ca2f730c47 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -179,6 +179,12 @@ produces a regular stage in `dvc.yaml`. - `-h`, `--help` - prints the usage/help message, and exit. +- `--version-aware` - capture cloud versioning information when importing the + file. By default, DVC will automatically capture cloud versioning information + if the URL contains a cloud versioning ID. When `--version-aware` is provided + along with a URL that does not contain a cloud versioning ID, DVC will capture + the latest version of the file. + - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no problems arise, otherwise 1. From b1a234900794489de88a9e95c513f6fa19723ecd Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Tue, 29 Nov 2022 19:24:20 +0900 Subject: [PATCH 2/7] ref: document `push` field for dvc files --- content/docs/user-guide/project-structure/dvc-files.md | 1 + content/docs/user-guide/project-structure/dvcyaml-files.md | 1 + 2 files changed, 2 insertions(+) diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md index c67da0009c..95f4c1f153 100644 --- a/content/docs/user-guide/project-structure/dvc-files.md +++ b/content/docs/user-guide/project-structure/dvc-files.md @@ -76,6 +76,7 @@ The following subfields may be present under `outs` entries: | `type` | User-assigned type of the data. | | `labels` | User-assigned labels to add to the data. | | `meta` | Custom metadata about the data. 
| +| `push` | (Optional) Whether the output file/dir should be pushed to the remote on `dvc push` (`true` by default: outputs are pushed to remotes). `push` only applies to cached outputs. | ## Dependency entries diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index aae0ce1a30..fe7c8f2e8b 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -265,6 +265,7 @@ These include a subset of the fields in `.dvc` file | `persist` | Whether the output file/dir should remain in place during `dvc repro` (`false` by default: outputs are deleted when `dvc repro` starts) | | `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | | `desc` | (Optional) User description for this output. This doesn't affect any DVC operations. | +| `push` | (Optional) Whether the output file/dir should be pushed to the remote on `dvc push` (`true` by default: outputs are pushed to remotes). `push` only applies to cached outputs. 
| From fccda92af9b288ca5d68d10f9e08522b2405fdd7 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Sat, 3 Dec 2022 17:20:50 -0600 Subject: [PATCH 3/7] guide: `push` metafile field text update --- content/docs/user-guide/project-structure/dvc-files.md | 2 +- content/docs/user-guide/project-structure/dvcyaml-files.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md index 95f4c1f153..958abccf64 100644 --- a/content/docs/user-guide/project-structure/dvc-files.md +++ b/content/docs/user-guide/project-structure/dvc-files.md @@ -76,7 +76,7 @@ The following subfields may be present under `outs` entries: | `type` | User-assigned type of the data. | | `labels` | User-assigned labels to add to the data. | | `meta` | Custom metadata about the data. | -| `push` | (Optional) Whether the output file/dir should be pushed to the remote on `dvc push` (`true` by default: outputs are pushed to remotes). `push` only applies to cached outputs. | +| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). | ## Dependency entries diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index fe7c8f2e8b..fa5ec32000 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -265,7 +265,7 @@ These include a subset of the fields in `.dvc` file | `persist` | Whether the output file/dir should remain in place during `dvc repro` (`false` by default: outputs are deleted when `dvc repro` starts) | | `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). 
These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | | `desc` | (Optional) User description for this output. This doesn't affect any DVC operations. | -| `push` | (Optional) Whether the output file/dir should be pushed to the remote on `dvc push` (`true` by default: outputs are pushed to remotes). `push` only applies to cached outputs. | +| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). | From 60df3839c1ecc128122a3e48862f625688db9efc Mon Sep 17 00:00:00 2001 From: "restyled-io[bot]" <32688539+restyled-io[bot]@users.noreply.github.com> Date: Sat, 3 Dec 2022 17:21:07 -0600 Subject: [PATCH 4/7] Restyled by prettier (#4158) Co-authored-by: Restyled.io --- content/docs/user-guide/project-structure/dvc-files.md | 2 +- content/docs/user-guide/project-structure/dvcyaml-files.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md index 958abccf64..1beba887ff 100644 --- a/content/docs/user-guide/project-structure/dvc-files.md +++ b/content/docs/user-guide/project-structure/dvc-files.md @@ -76,7 +76,7 @@ The following subfields may be present under `outs` entries: | `type` | User-assigned type of the data. | | `labels` | User-assigned labels to add to the data. | | `meta` | Custom metadata about the data. | -| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). | +| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). 
| ## Dependency entries diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index fa5ec32000..5a311b12cc 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -265,7 +265,7 @@ These include a subset of the fields in `.dvc` file | `persist` | Whether the output file/dir should remain in place during `dvc repro` (`false` by default: outputs are deleted when `dvc repro` starts) | | `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. | | `desc` | (Optional) User description for this output. This doesn't affect any DVC operations. | -| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). | +| `push` | Whether or not this file or directory, when previously cached, is uploaded to remote storage by `dvc push` (`true` by default). | From e3255f25c08c8749611996c0c988e7b78b728930 Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Tue, 6 Dec 2022 15:40:08 +0900 Subject: [PATCH 5/7] add information on supported storages --- content/docs/command-reference/import-url.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index ca2f730c47..e005be209f 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -108,6 +108,20 @@ DVC supports several types of external locations (protocols): [ETag](https://en.wikipedia.org/wiki/HTTP_ETag#Strong_and_weak_validation) is necessary to track if the specified URL changed. 
+DVC also supports capturing cloud versioning information when importing data +from certain cloud storage providers. When the `--version-aware` option is +provided or when the `url` argument includes a supported cloud versioning ID, +DVC will import the specified version of the given data. When using versioned +storage, DVC will always [pull](/doc/command-reference/pull) the versioned data +from its original source location. Versioned data will also not be +[pushed](/doc/command-reference/push) to remote storage. + +| Type | Description | Versioned `url` format example | +| ------- | ---------------------------- | ------------------------------------------------------ | +| `s3` | Amazon S3 | `s3://bucket/data?versionId=L4kqtJlcpXroDTDmpUMLUo` | +| `azure` | Microsoft Azure Blob Storage | `azure://container/data?versionid=YYYY-MM-DDThh:mm:ss` | +| `gs` | Google Cloud Storage | `gs://bucket/data#1360887697105000` | + Another way to understand the `dvc import-url` command is as a shortcut for generating a pipeline [stage](/doc/command-reference/run) with an external dependency. From d5a4e9abfdff42d57507bbdd05129143ee105bcc Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Tue, 6 Dec 2022 19:32:59 +0900 Subject: [PATCH 6/7] ref: document update for versioned import-url --- content/docs/command-reference/update.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md index 2da3053381..3b8a69a0d6 100644 --- a/content/docs/command-reference/update.md +++ b/content/docs/command-reference/update.md @@ -48,6 +48,10 @@ $ dvc update --rev master > Note that this changes the `rev` field in the import stage, fixing it to the > revision. + For stages created with `dvc import-url` and a cloud-versioned URL, `--rev` + can be used to specify a object version ID to use. By default, the import will + be updated to the latest version from cloud storage. 
+ - `-R`, `--recursive` - determines the files to update by searching each target directory and its subdirectories for import `.dvc` files to inspect. If there are no directories among the targets, this option has no effect. From 2d3e24ada140e8749d173df2b8ce6c3f7f6a740e Mon Sep 17 00:00:00 2001 From: Peter Rowlands Date: Fri, 9 Dec 2022 13:41:23 +0900 Subject: [PATCH 7/7] add link to import-url --version-aware --- content/docs/command-reference/update.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/content/docs/command-reference/update.md b/content/docs/command-reference/update.md index 3b8a69a0d6..212690933d 100644 --- a/content/docs/command-reference/update.md +++ b/content/docs/command-reference/update.md @@ -48,9 +48,10 @@ $ dvc update --rev master > Note that this changes the `rev` field in the import stage, fixing it to the > revision. - For stages created with `dvc import-url` and a cloud-versioned URL, `--rev` - can be used to specify a object version ID to use. By default, the import will - be updated to the latest version from cloud storage. + For stages created with `dvc import-url` and a + [cloud-versioned URL](/doc/command-reference/import-url#--version-aware), + `--rev` can be used to specify an object version ID to use. By default, the + import will be updated to the latest version from cloud storage. - `-R`, `--recursive` - determines the files to update by searching each target directory and its subdirectories for import `.dvc` files to inspect. If there