ui: merge --to-remote and --remote for add and import-url #5721

Closed
wants to merge 2 commits into from

Conversation

alopezz
Contributor

@alopezz alopezz commented Mar 28, 2021

Fixes #5719

I removed --remote altogether; I'm not sure if this is the way to go or if there is any sort of deprecation policy in which case I'd restore it and make the appropriate changes.

It looks to me like the execution path where a remote other than the default is used is not covered by the tests; if you think it'd be valuable, I can try to add a couple of test cases covering that.

I'll update the docs once I get some feedback.

@alopezz
Contributor Author

alopezz commented Mar 28, 2021

Re: failing CI checks: The failing test seems unrelated and was failing for me locally before making any changes. Likewise, the lint error comes from a warning on a file I didn't touch.

@skshetry
Member

@alopezz, please rebase. This has been fixed in the master.

@shcheklein
Member

Thanks @alopezz!

Quick questions (cc @jorgeorpinel ):

  1. what happens if I run dvc add --to-remote /path/to/a/file now?
  2. I think that we had --remote for consistency with other commands (pull, push, etc).

@alopezz
Contributor Author

alopezz commented Mar 29, 2021

> 1. what happens if I run `dvc add --to-remote /path/to/a/file` now?

--to-remote will take the value /path/to/a/file and thus the command will fail because no targets were provided. I believe this is an argparse limitation, and perhaps a good one to avoid potentially ambiguous invocations.
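
The failure described here can be reproduced with a small standalone argparse sketch. This is not DVC's actual parser: the program name, the `<default>` sentinel and the `try_parse` helper are assumptions made for illustration, and it assumes the merged flag is declared with `nargs="?"` so that its value is optional.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the `dvc add` parser; not DVC's real code.
    parser = argparse.ArgumentParser(prog="add-sketch")
    parser.add_argument("targets", nargs="+")
    # Merged flag: the value is optional, and a sentinel stands for the
    # default remote when the flag is given without a value.
    parser.add_argument("--to-remote", nargs="?", const="<default>", default=None)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# The flag swallows the path, so no targets remain and parsing fails.
print(try_parse(["--to-remote", "/path/to/a/file"]))
# With the flag after the path, the path is a target and the sentinel is used.
print(try_parse(["/path/to/a/file", "--to-remote"]))
```

In this sketch, an explicit remote name placed before the path (`--to-remote myremote /path/to/a/file`) still parses, because the flag consumes only one value.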

> 2. I think that we had `--remote` for consistency with other commands (`pull`, `push`, etc).

Yeah, but the issue and some of the comments it refers to imply that consistency is not a good enough reason not to do this merge.

@shcheklein
Member

thanks!

> --to-remote will take the value /path/to/a/file and thus the command will fail because no targets were provided.

my 2cs - it can be a reason to avoid this change unfortunately. It can break existing workflows, and it also makes the "happy path" a bit more complicated for --to-remote (in the majority of cases users have a single default remote, I think). We'll have to explain to users that they need to put a name or put a --to-remote after the path (which is not natural).

> but the issue and some of the comments it refers to imply that consistency is not a good enough reason not to do this merge.

yep. I haven't had time to read all the comments unfortunately. I agree though that consistency alone is not enough.

@alopezz
Contributor Author

alopezz commented Mar 29, 2021

> my 2cs - it can be a reason to avoid this change unfortunately.

This actually sounds reasonable; options without a fixed number of arguments do have caveats. Though FWIW, dvc add <something> --to-remote does look pretty natural.

If we want to avoid repetition and the need to special-case the --remote option, another possible idea to consider would be to have two different options for the two cases, e.g.:

  • --to-default-remote: equivalent to current --to-remote without --remote.
  • --to-remote <remote>: equivalent to current --to-remote --remote <remote>

Which still sacrifices consistency with other commands that have --remote flags, so maybe it's not worth it either way.

@dberenbaum
Collaborator

What about always requiring the remote name? Even the "happy path" of dvc add --to-remote /path/to/a/file now is not that short. By merging the flags, it at least opens up the -r short flag. dvc add -r default /path/to/a/file seems about the same brevity/complexity as the current happy path and makes non-default remote calls more concise.

@shcheklein
Member

> What about always requiring the remote name?

sounds reasonable, with one caveat to keep in mind. The default fallback is useful when we don't know the default remote name in advance. Usually it means we have a script or an instruction that can be applied to any repo. Users can take pretty much any repo and use the same set of commands (add, push, pull) w/o specifying the name. Or they can switch the default, or override it, and still use the same set of commands or a script. Otherwise they will have to clarify in their scripts and instructions that a remote name is expected.

Not a deal breaker for this "advanced" command I think. So, I don't have a strong opinion on this.

@jorgeorpinel
Contributor

> what happens if I run dvc add --to-remote /path/to/a/file now?

> the command will fail

If it's impossible for the CLI to understand that an invalid option arg is in fact the next cmd arg, then we can make sure the docs examples use the correct order and mention -- like we do for --targets in other commands (e.g. https://dvc.org/doc/command-reference/params/diff#options)
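
As a rough illustration of the `--` suggestion, the sketch below uses plain argparse with a hypothetical merged flag (the program name and the `<default>` sentinel are assumptions, not DVC code). The `--` pseudo-argument tells argparse that everything after it is positional, so the flag no longer captures the path.

```python
import argparse

# Hypothetical stand-in for the `dvc add` parser; not DVC's real code.
parser = argparse.ArgumentParser(prog="add-sketch")
parser.add_argument("targets", nargs="+")
# Merged flag with an optional value; the sentinel means "default remote".
parser.add_argument("--to-remote", nargs="?", const="<default>", default=None)

# `--` ends option parsing, so the path is treated as a target
# and the flag falls back to its sentinel value.
args = parser.parse_args(["--to-remote", "--", "/path/to/a/file"])
print(args.to_remote, args.targets)
```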

> put a --to-remote after the path (which is not natural)

I'd also argue it is as natural or more so, e.g. add /path --to-remote = add some path to remote storage

> I think that we had --remote for consistency

I thought you agreed to merge them per #5198 (comment). But no strong opinion either. I guess if it complicates the UI then the benefit of this effort is questionable.

> have two different options for the two cases

> What about always requiring the remote name?

OR best of both worlds, keep -r for "to-default-remote" and --to-remote requiring to specify a remote name:

$ dvc add -r /path  # adds /path to the default remote
$ dvc add /path -r  # same

$ dvc add --to-remote myrem /path  # adds /path to myrem
$ dvc add /path --to-remote myrem  # same

$ dvc add --to-remote /path
ERROR: '/path' is not a remote.

$ dvc add /path --to-remote
ERROR: No remote given to `--to-remote`.
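
A minimal argparse sketch of how this proposal might behave. It is hypothetical throughout: `KNOWN_REMOTES`, `parse_add` and the error wording are assumptions, and a real implementation would read remotes from the repo config. Targets are declared with `nargs="*"` here so that the "is not a remote" check can run before the missing-targets check.

```python
import argparse

# Hypothetical configured remotes; a real tool would read these from config.
KNOWN_REMOTES = {"myrem", "backup"}


def parse_add(argv):
    parser = argparse.ArgumentParser(prog="add-sketch")
    parser.add_argument("targets", nargs="*")
    # Boolean flag: send the data to the default remote.
    parser.add_argument("-r", dest="to_default", action="store_true")
    # Named flag: always requires exactly one remote name.
    parser.add_argument("--to-remote", metavar="REMOTE")
    args = parser.parse_args(argv)

    # Reject values that are not configured remotes (e.g. a path).
    if args.to_remote is not None and args.to_remote not in KNOWN_REMOTES:
        parser.error(f"'{args.to_remote}' is not a remote.")
    if not args.targets:
        parser.error("no targets given")
    return args


print(parse_add(["-r", "/path"]))
print(parse_add(["/path", "--to-remote", "myrem"]))
```

With this declaration, `--to-remote /path` is rejected by the remote check and `/path --to-remote` is rejected by argparse itself, since the flag expects exactly one value. In both cases argparse exits with an error rather than returning.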

@alopezz
Contributor Author

alopezz commented Mar 31, 2021

After giving it some thought, I've realized that this can be worked around for dvc add by post-processing the arguments, so that

> 1. what happens if I run `dvc add --to-remote /path/to/a/file` now?

can be accepted with the meaning of /path/to/a/file as the target with the default remote.

I've made a prototype implementation of this in my last commit. However:

  1. The prototype I've made does not show the help for dvc add when no targets are supplied, which was the old behavior. This can probably be achieved, but it would likely require fiddling with custom argparse Actions; if you think it's worth it, I can try to implement that.
  2. I thought about how this would work for dvc import-url and for that command, since it already has an optional positional argument (out) besides the url, there are combinations which simply lead to ambiguity. If that optional argument required --out, it would become possible to do the same as with add.

So it's a matter of considering if the slight increase in complexity and ad-hoc parsing is worth it for obtaining the desired UI semantics.
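
For readers who want to see what such post-processing could look like, here is a minimal sketch. It is not the prototype from the commit: `parse_add`, the `DEFAULT` sentinel and the error text are assumptions. Note that targets have to be declared with `nargs="*"`, otherwise argparse rejects the input before the post-processing step runs.

```python
import argparse

DEFAULT = "<default>"  # hypothetical sentinel meaning "use the default remote"


def parse_add(argv):
    parser = argparse.ArgumentParser(prog="add-sketch")
    # Targets are optional at the argparse level so that the
    # `--to-remote PATH` form survives until post-processing.
    parser.add_argument("targets", nargs="*")
    parser.add_argument("--to-remote", nargs="?", const=DEFAULT, default=None)
    args = parser.parse_args(argv)

    # Post-processing: if the flag captured a value and no targets remain,
    # reinterpret the captured value as the target with the default remote.
    if not args.targets and args.to_remote not in (None, DEFAULT):
        args.targets = [args.to_remote]
        args.to_remote = DEFAULT
    if not args.targets:
        parser.error("no targets given")
    return args


print(parse_add(["--to-remote", "/path/to/a/file"]))
print(parse_add(["--to-remote", "a", "b"]))
```

In the second call the sketch always reads `a` as the remote and `b` as the only target; it has no way to tell whether `a` was meant to be a second target.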

EDIT:
On second thought, this is unlikely to be a good idea anyway. The only way to remove ambiguity here is probably the default behavior of --to-remote always capturing the positional argument that follows it. The reason is that an invocation such as dvc add --to-remote <a> <b> can be ambiguous otherwise:

  • The only sure thing is that <b> is a target.
  • Is <a> the name of the remote? or
  • Is <a> also a target and the remote should be the default?

So taking that into account, I'd probably lean towards something like what @jorgeorpinel suggested here:

> OR best of both worlds, keep -r for "to-default-remote" and --to-remote requiring to specify a remote name:

@jorgeorpinel
Contributor

Thanks for the idea @alopezz ! But I'm guessing the core team will prefer to avoid ad-hoc arg parsing (cc @skshetry ?)

The problem with the -r and --to-remote idea is that it could be confusing and hard to remember which one does what/when.

Probably the choice boils down to:
a) Leave as-is (2 separate options)
b) Simply combine them (we have the same problem with --targets and no complaints that I know of)

@isidentical
Contributor

I mentioned these issues somewhere in the past, and that was one of the reasons (a small one compared to the consistency) why we didn't implement it. The ambiguity can be solved by checking whether the remote exists and, if so, using it, though that is a very complicated behavior to describe and might lead to unexpected consequences, so I do not suggest implementing it at all. I personally think that the current approach makes sense since specifying a remote would be the minority case, so we can just add --to-remote (a single flag that denotes something, instead of a complicated argument). And if we want to customize further, we can specify --remote/--jobs etc. as separate options.
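
To make the "unexpected consequences" concrete, here is a small sketch of the existence-check heuristic. It is purely illustrative: `resolve`, the remote names and the file names are all hypothetical, and nothing like this exists in DVC.

```python
def resolve(flag_value, targets, known_remotes, default_remote):
    """Hypothetical heuristic: treat flag_value as a remote only if configured.

    Returns a (remote, targets) pair.
    """
    if flag_value is None:
        # Flag given without a value: default remote, targets unchanged.
        return default_remote, list(targets)
    if flag_value in known_remotes:
        # The value names a configured remote, so use it as the remote.
        return flag_value, list(targets)
    # Otherwise assume the value was really meant as a target.
    return default_remote, [flag_value, *targets]


remotes = {"origin", "backup"}

# A value that is not a configured remote becomes an extra target.
print(resolve("data", ["more"], remotes, "origin"))
# A local directory that happens to be called "backup" is silently read
# as the remote of the same name, which is the surprising case.
print(resolve("backup", ["data.csv"], remotes, "origin"))
```

The same command line therefore changes meaning when a remote is added or renamed, which is hard to document and easy to trip over.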

@efiop
Contributor

efiop commented Apr 2, 2021

Thanks for the PR and great discussion!

--remote is an existing option in many other commands, while --to-remote is rather a separate feature, so combining those creates a weird logic that results in CLI ambiguity as noted above. After the discussions and pros and cons listed, we are better off just keeping it as is.

Development

Successfully merging this pull request may close these issues.

add/import-url: merge --to-remote and --remote
7 participants