Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: clarifications around add and import-url for external targets #3210

Merged
merged 22 commits into from
Feb 2, 2022

Conversation

jorgeorpinel
Copy link
Contributor

Related to the last task in #2121

@jorgeorpinel jorgeorpinel changed the title ref: clarifications around external-data-related operations ref: clarifications around external-data-related add and import-url Jan 24, 2022
@shcheklein shcheklein temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 24, 2022 19:55 Inactive
@jorgeorpinel jorgeorpinel changed the title ref: clarifications around external-data-related add and import-url ref: clarifications around add and import-url for external targets Jan 24, 2022
@jorgeorpinel jorgeorpinel marked this pull request as ready for review January 24, 2022 20:07
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 24, 2022 20:07 Inactive
Comment on lines 384 to 388
## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to track it
as if it was in your project, but without downloading it locally (for now). The
`--to-remote` option lets you do so, while storing a copy
[remotely](/doc/command-reference/remote) so it can be
[pulled](/doc/command-reference/plots) later.
in your <abbr>project</abbr> without downloading it (yet), for example if
there's not enough space in your current environment.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is almost the same as the one in the import-url ref. Since the latter has the added benefit of keeping the connection to the data source (for later updates) and to remove the docs maintenance surface, should we remove the example from this ref? I can still link to the example in import-url from the option description here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me!

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is almost the same as the one in the import-url ... should we remove the example from this ref?

Makes sense to me!

UPDATE: After the latest round of updates I'm no longer 100% sure of this since for add you need to combine --out with --to-remote. So even when the situation is very similar, the actual recipe has that key difference (there's no -o in import-url). People who never use imports may prefer to stick to add and thus need to figure out that detail.

Keeping for now!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Just to clarify, you can do dvc add --to-remote without --out:

$ dvc add --to-remote ../test.sh
100% Adding...|█████████████████████████████████████████████████|1/1 [00:00, 49.91file/s]

This comment was marked as resolved.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Jan 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even when the situation is very similar, the actual recipe has that key difference (there's no -o in import-url)

I guess this wasn't that relevant after all since neither example uses -o now. The add one just notes that -o . is implied but that's not critical (already noted in the --to-remote option desc.)... We could remove it after all ⌛

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 24, 2022 20:11 Inactive
@jorgeorpinel jorgeorpinel self-assigned this Jan 24, 2022
Comment on lines 392 to 405
The only difference that dataset is transferred straight to remote, so DVC won't
control the remote location you gave but rather continue managing your remote
storage where the data is now on. The operation will still be resulted with an
`.dvc` file:

```dvc
$ ls
data.xml.dvc
```

Whenever anyone wants to actually download the added data (for example from a
system that can handle it), they can use `dvc pull` as usual:
Even when nothing is downloaded locally, the operation still creates a `.dvc`
file in the <abbr>workspace</abbr>. So whenever anyone wants to actually
download the data, they can use `dvc pull` as usual:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice summarization here 🔥

jorgeorpinel added a commit that referenced this pull request Jan 25, 2022
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 25, 2022 20:59 Inactive
jorgeorpinel added a commit that referenced this pull request Jan 25, 2022
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 25, 2022 21:03 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 25, 2022 21:04 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 25, 2022 21:21 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 25, 2022 21:24 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 00:19 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 00:29 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 00:31 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 00:37 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 00:42 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 01:08 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 01:15 Inactive
Copy link
Contributor Author

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK @dberenbaum there's sizeable changes after your last comment so here's a summary of them 👇 Thanks

content/docs/command-reference/add.md Outdated Show resolved Hide resolved
content/docs/command-reference/import-url.md Show resolved Hide resolved
content/docs/command-reference/import.md Show resolved Hide resolved
content/docs/command-reference/update.md Show resolved Hide resolved
content/docs/command-reference/update.md Show resolved Hide resolved
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 January 29, 2022 01:26 Inactive
storage where the data is now on. The operation will still be resulted with an
`.dvc` file:
Use `--to-remote` to create a `.dvc` file for the operation without downloading
data, transferring it directly to [remote storage] instead:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth mentioning the use case here? You may not have enough storage locally, but you want to pull it later on a machine with more space.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK but in that case it doesn't make sense to have a shorter version of the example in this ref...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the motivation in both examples and reinstated everything else in the case of add. Linking to import-url for details stopped making sense (again).

@dberenbaum
Copy link
Contributor

Thanks @jorgeorpinel! Just left a few small comments.

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-to-remote-lyurlo62 February 1, 2022 22:58 Inactive
@jorgeorpinel jorgeorpinel merged commit d5036b9 into master Feb 2, 2022
@jorgeorpinel jorgeorpinel deleted the ref/--to-remote branch February 2, 2022 16:04
iesahin pushed a commit that referenced this pull request Apr 11, 2022
#3210)

* ref: clarify external-data-related options of add
i.e. --external, --out, and --to-remote

* ref: improve add --out example

* ref: clarify add/import-url --to-remote examples

* ref: add/import-url --add option copy edits

* ref: clarify add --out base case
per #3210 (review)

* ref: re-explain add --out example (again)

* Update content/docs/command-reference/add.md

Co-authored-by: Dave Berenbaum <[email protected]>

* ref: rename add --out (external cache) example and
and more explanation clarifications heh

* ref: rewrite add/import-url --to-remote example intros to
to match previous commits (improvements to add --out example text)

* rephrase --to-remote examples
per #3210 (review)
and #3210 (review)

* ref: corrections around add --to-remote and --out
per #3210 (comment)

* ref: std import --out description

* ref: forgot to remove -o from add --to-remote example and
and complete console sample with ls

* ref: add ls to import-url --to-remote example

* ref: consistent --to-remote examples among add and import-url

* ref: more consistency changes around --to-remote

* ref: remove some redundancy in import-url --to-remote example

* ref: add --out/to-remote doesn't move, it copies
per #3210 (review)
and #3210 (review)

* ref: include motivation again in add --to-remote example and
and everything else while at it...
Plus std. changes to corresponding import-url example

Co-authored-by: Dave Berenbaum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants