ref: clarifications around `add` and `import-url` for external targets #3210
i.e. --external, --out, and --to-remote
## Example: Transfer to remote storage

When you have a large dataset in an external location, you may want to track it
as if it was in your project, but without downloading it locally (for now). The
`--to-remote` option lets you do so, while storing a copy
[remotely](/doc/command-reference/remote) so it can be
[pulled](/doc/command-reference/pull) later.
in your <abbr>project</abbr> without downloading it (yet), for example if
there's not enough space in your current environment.
This example is almost the same as the one in the `import-url` ref. Since the latter has the added benefit of keeping the connection to the data source (for later `update`s), and to reduce the docs maintenance surface, should we remove the example from this ref? I can still link to the example in `import-url` from the option description here.
Makes sense to me!
> This example is almost the same as the one in the import-url ... should we remove the example from this ref?
> Makes sense to me!

UPDATE: After the latest round of updates I'm no longer 100% sure of this since for `add` you need to combine `--out` with `--to-remote`. So even when the situation is very similar, the actual recipe has that key difference (there's no `-o` in `import-url`). People who never use imports may prefer to stick to `add` and thus need to figure out that detail.
Keeping for now!
Good point. Just to clarify, you can do `dvc add --to-remote` without `--out`:
$ dvc add --to-remote ../test.sh
100% Adding...|█████████████████████████████████████████████████|1/1 [00:00, 49.91file/s]
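
For readers following the thread, the recipe under discussion can be sketched end to end roughly as below. This is only an illustration under stated assumptions: DVC and Git are installed, a local directory stands in for remote storage, and the remote name `demo_remote`, the temporary paths, and the tiny `data.xml` stand-in file are all hypothetical. The script skips itself when `dvc` or `git` is not available.

```shell
#!/bin/sh
# Sketch only: paths and the remote name "demo_remote" are hypothetical.
set -eu

if ! command -v dvc >/dev/null 2>&1 || ! command -v git >/dev/null 2>&1; then
    echo "dvc or git not found; skipping the demo"
    demo_status=skipped
else
    work=$(mktemp -d)
    mkdir "$work/source" "$work/storage" "$work/project"
    # Stand-in for a large dataset living outside the project.
    printf '<data/>\n' > "$work/source/data.xml"

    cd "$work/project"
    git init -q
    dvc init -q
    # A local directory acting as the default remote storage.
    dvc remote add -d demo_remote "$work/storage"

    # Transfer straight to the default remote; no --out is given here.
    dvc add "$work/source/data.xml" --to-remote

    # Per the docs sample in this PR, a data.xml.dvc file should now be listed.
    ls

    # Later (for example on a machine with enough space), download the data.
    dvc pull
    ls
    demo_status=done
fi
```

The sketch deliberately omits `-o`, matching the point made in the comment above; whether to mention the implied output path is the docs question being debated in this thread.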
This comment was marked as resolved.
> even when the situation is very similar, the actual recipe has that key difference (there's no -o in import-url)

I guess this wasn't that relevant after all since neither example uses `-o` now. The `add` one just notes that `-o .` is implied but that's not critical (already noted in the `--to-remote` option desc.)... We could remove it after all ⌛
The only difference that dataset is transferred straight to remote, so DVC won't
control the remote location you gave but rather continue managing your remote
storage where the data is now on. The operation will still be resulted with an
`.dvc` file:

```dvc
$ ls
data.xml.dvc
```

Whenever anyone wants to actually download the added data (for example from a
system that can handle it), they can use `dvc pull` as usual:
Even when nothing is downloaded locally, the operation still creates a `.dvc`
file in the <abbr>workspace</abbr>. So whenever anyone wants to actually
download the data, they can use `dvc pull` as usual:
Nice summarization here 🔥
4ac058e to 3f2445b
3f2445b to f13eed2
a4bf375 to 1035fde
af6acbb to 6e279a1
6df4825 to bf1cde2
OK @dberenbaum there's sizeable changes after your last comment so here's a summary of them 👇 Thanks
storage where the data is now on. The operation will still be resulted with an
`.dvc` file:
Use `--to-remote` to create a `.dvc` file for the operation without downloading
data, transferring it directly to [remote storage] instead:
Do you think it's worth mentioning the use case here? You may not have enough storage locally, but you want to pull it later on a machine with more space.
OK but in that case it doesn't make sense to have a shorter version of the example in this ref...
Updated the motivation in both examples and reinstated everything else in the case of `add`. Linking to `import-url` for details stopped making sense (again).
Thanks @jorgeorpinel! Just left a few small comments.
#3210)
* ref: clarify external-data-related options of add i.e. --external, --out, and --to-remote
* ref: improve add --out example
* ref: clarify add/import-url --to-remote examples
* ref: add/import-url --add option copy edits
* ref: clarify add --out base case per #3210 (review)
* ref: re-explain add --out example (again)
* Update content/docs/command-reference/add.md Co-authored-by: Dave Berenbaum <[email protected]>
* ref: rename add --out (external cache) example and and more explanation clarifications heh
* ref: rewrite add/import-url --to-remote example intros to to match previous commits (improvements to add --out example text)
* rephrase --to-remote examples per #3210 (review) and #3210 (review)
* ref: corrections around add --to-remote and --out per #3210 (comment)
* ref: std import --out description
* ref: forgot to remove -o from add --to-remote example and and complete console sample with ls
* ref: add ls to import-url --to-remote example
* ref: consistent --to-remote examples among add and import-url
* ref: more consistency changes around --to-remote
* ref: remove some redundancy in import-url --to-remote example
* ref: add --out/to-remote doesn't move, it copies per #3210 (review) and #3210 (review)
* ref: include motivation again in add --to-remote example and and everything else while at it... Plus std. changes to corresponding import-url example
Co-authored-by: Dave Berenbaum <[email protected]>
Related to the last task in #2121