dvc remote default accepts anything #3470

Closed
DavidGOrtega opened this issue Mar 11, 2020 · 4 comments
Labels
enhancement Enhances DVC good first issue help wanted p2-medium Medium priority, should be done, but less important ui user interface / interaction

Comments

@DavidGOrtega

DavidGOrtega commented Mar 11, 2020

dvc remote add -d test test
dvc remote default 
# prints test
dvc remote default anything
dvc remote default 
# prints anything

dvc remote default should report an error if the given name is not in the list of remotes.
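
A minimal sketch of the requested check, assuming the configuration is held as a plain nested dictionary. `set_default_remote` is a hypothetical helper for illustration, not DVC's actual API:

```python
def set_default_remote(config: dict, name: str) -> None:
    """Set core.remote, refusing names that are not configured remotes."""
    remotes = config.get("remote", {})
    if name not in remotes:
        raise ValueError(f"default remote '{name}' is not in the remote list")
    config.setdefault("core", {})["remote"] = name


# Mirrors the commands above: "test" was added, "anything" was not.
config = {"core": {"remote": "test"}, "remote": {"test": {"url": "test"}}}

try:
    set_default_remote(config, "anything")
except ValueError as exc:
    print(f"ERROR: {exc}")
```

With a check like this, the default stays `test` after the failed call instead of silently becoming `anything`.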

Please provide information about your setup
DVC version: 0.87.0
Python version: 3.6.9
Platform: Linux-4.9.184-linuxkit-x86_64-with-Ubuntu-18.04-bionic
Binary: False
Package: pip
Cache: reflink - not supported, hardlink - supported, symlink - supported

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Mar 11, 2020
@efiop efiop added enhancement Enhances DVC good first issue help wanted p2-medium Medium priority, should be done, but less important ui user interface / interaction labels Mar 11, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Mar 11, 2020
@karajan1001
Contributor

I'm looking into this issue, but have run into some difficulty.

Currently, DVC validates the configuration on every change. There are two types of validation: one is data type validation (core.no_scm must be a boolean; the timeout and port of an SSH remote must be integers), the other is logic validation (as in #3470, core.remote must be in the remote list).

Data type validation relies on a single configuration file, but logic validation needs to consider all four configuration files. These four files are not always bound to each other, so a validation may fail if we move the whole project to another environment.
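
To illustrate why the logic validation depends on the environment, here is a rough sketch. It assumes the four levels are merged in the order system, global, repo, local (later levels overriding earlier ones) and that each level is a plain dictionary; `merge_levels` and `default_remote_is_known` are hypothetical names, not DVC functions:

```python
# Assumed precedence, lowest to highest; later levels override earlier ones.
LEVELS = ("system", "global", "repo", "local")


def merge_levels(configs: dict) -> dict:
    """Shallow-merge each section of every level into one effective config."""
    merged = {}
    for level in LEVELS:
        for section, values in configs.get(level, {}).items():
            merged.setdefault(section, {}).update(values)
    return merged


def default_remote_is_known(configs: dict) -> bool:
    """Logic validation: core.remote, if set, must name a configured remote."""
    merged = merge_levels(configs)
    default = merged.get("core", {}).get("remote")
    return default is None or default in merged.get("remote", {})


# The repo-level file travels with the project; the global file does not.
repo_level = {"core": {"remote": "shared"}}
machine_a = {
    "global": {"remote": {"shared": {"url": "/mnt/shared"}}},
    "repo": repo_level,
}
machine_b = {"repo": repo_level}

print(default_remote_is_known(machine_a))  # True
print(default_remote_is_known(machine_b))  # False
```

The same repo-level file passes on one machine and fails on the other, which is the portability problem described above.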

Even worse, some modifications that involve several configuration levels might change the configuration in the wrong way. For example, if we had four levels of configuration like

local
repo {"core":{"remote": "my-remote"}, "remote":{"my-remote":{"url": "{{someurl}}"}}}
global {"core":{"remote": "my-remote"}, "remote":{"my-remote":{"url": "{{anotherurl}}"}}}
system

the command dvc remote remove --global my-remote would wrongly delete the repo-level default remote. It confuses remotes at different levels that share the same name. (Is this an issue?)

So should we validate the logic of the configuration whenever it changes, given that we might be in a different environment when we actually run it, or should we validate only at run time?

It seems that Git performs no validation of its configuration.

A third option is to restrict logic validation to a single configuration level.

@pared
Contributor

pared commented Apr 27, 2020

As for removing the configuration of the local remote:
From the perspective of the current implementation, I would say that it is a bug.
Removing the global myremote clears the default myremote even if it was set up
locally after the global one, as in this script:

#!/bin/bash

rm -rf repo sglobal slocal
mkdir repo sglobal slocal

main=$(pwd)

set -ex

pushd repo
git init --quiet
dvc init -q

# Remember to clear myremote from the global config before running the script
dvc remote add -d --global myremote "$main/sglobal"
dvc remote add -d myremote "$main/slocal"

dvc remote remove myremote --global
# no remote
dvc remote default

Considering that a myremote is still available, we should not remove it from the config.
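
A rough sketch of that behavior, assuming each config level is a plain dictionary keyed by level name. `remove_remote` here is a hypothetical helper, not the actual DVC implementation: it deletes the remote at the requested level only and clears `core.remote` entries only when no level defines that name any more.

```python
def remove_remote(configs: dict, level: str, name: str) -> None:
    """Remove `name` from one level; keep defaults that still resolve."""
    del configs[level]["remote"][name]  # KeyError if not defined at this level

    still_defined = any(
        name in cfg.get("remote", {}) for cfg in configs.values()
    )
    if still_defined:
        return  # another level still provides this remote

    # No level defines the remote any more, so drop dangling defaults.
    for cfg in configs.values():
        core = cfg.get("core", {})
        if core.get("remote") == name:
            del core["remote"]


# Same setup as the script: myremote is the default at both levels.
configs = {
    "global": {
        "core": {"remote": "myremote"},
        "remote": {"myremote": {"url": "sglobal"}},
    },
    "repo": {
        "core": {"remote": "myremote"},
        "remote": {"myremote": {"url": "slocal"}},
    },
}

remove_remote(configs, "global", "myremote")
print(configs["repo"]["core"].get("remote"))  # myremote
```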

Having said that, I think we should throw an error when defining a remote with the same name. Behavior should be consistent no matter the config level, and we should raise an error when a new remote is added that conflicts with an existing remote. That's what happens when we initialize two remotes with the same name in the local config,
e.g.:

dvc remote add -d myremote $main/sglobal
dvc remote add -d myremote $main/slocal

will result in:

ERROR: configuration error - config file error: remote 'myremote' already
 exists. Use `-f|--force` to overwrite it.

So I think the first issue that needs to be solved here is that we allow the user to have multiple remotes with the same name.
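
One way this could look, as a sketch only: the check for an existing name runs across every level rather than just the one being written. `add_remote` and `RemoteExistsError` are hypothetical names, and the `force` flag simply skips the check here, which is an assumption about how `-f|--force` would behave across levels.

```python
class RemoteExistsError(Exception):
    """Raised when a remote name is already defined at some config level."""


def add_remote(configs: dict, level: str, name: str, url: str,
               force: bool = False) -> None:
    """Add a remote at `level`, rejecting names already used at any level."""
    if not force:
        for other_level, cfg in configs.items():
            if name in cfg.get("remote", {}):
                raise RemoteExistsError(
                    f"remote '{name}' already exists in the "
                    f"{other_level} config"
                )
    configs.setdefault(level, {}).setdefault("remote", {})[name] = {"url": url}


configs = {}
add_remote(configs, "global", "myremote", "sglobal")
try:
    add_remote(configs, "repo", "myremote", "slocal")
except RemoteExistsError as exc:
    print(f"ERROR: {exc}")
```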

karajan1001 added a commit to karajan1001/dvc that referenced this issue Apr 29, 2020
Add validation to restrict default remote repo in list of remote repos.
karajan1001 added a commit to karajan1001/dvc that referenced this issue May 1, 2020
1. add two tests
2. add one validation
3. modified remote remove to satisfy the validation change
@karajan1001
Contributor

karajan1001 commented May 1, 2020

Since the validations in config.py run on every DVC command, they should not include any cross-level validation. Otherwise, users may find they can do nothing until they resolve the validation error. They may only want to show the metrics after copying the whole repository to a different machine where the default remote is not listed in the global config.

So I think we should divide the validation into two kinds.
One is validation that can be done within a single config file, like value type validation. We could make these hard validations, done in config.py.
The other is validation that involves cross-level configuration. These are soft validations that should perhaps be done at the command level (e.g. in dvc.command.remote or somewhere similar).
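
A sketch of how the two kinds could be separated, assuming plain dictionaries for the config. `hard_validate` and `soft_validate` are hypothetical names: the first checks value types within one file and raises, the second inspects the merged config and only reports problems, so a command such as showing metrics is not blocked.

```python
def hard_validate(level_config: dict) -> None:
    """Type checks that need only one config file; failures raise."""
    core = level_config.get("core", {})
    if "no_scm" in core and not isinstance(core["no_scm"], bool):
        raise TypeError("core.no_scm must be a boolean")
    for name, remote in level_config.get("remote", {}).items():
        for key in ("port", "timeout"):
            value = remote.get(key)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if value is not None and (
                isinstance(value, bool) or not isinstance(value, int)
            ):
                raise TypeError(f"remote.{name}.{key} must be an integer")


def soft_validate(merged_config: dict) -> list:
    """Cross-level checks; returns messages instead of raising."""
    problems = []
    default = merged_config.get("core", {}).get("remote")
    if default is not None and default not in merged_config.get("remote", {}):
        problems.append(f"default remote '{default}' is not in the remote list")
    return problems


# A copied repository whose default remote lived in another machine's
# global config: types are fine, so only the soft check reports anything.
merged = {"core": {"remote": "my-remote", "no_scm": False}, "remote": {}}
hard_validate(merged)
for message in soft_validate(merged):
    print(f"WARNING: {message}")
```

Only commands that actually need the remote would then turn a soft problem into an error.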
