Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkout: file replacing behaviour and unnecessary user interaction #2497

Closed
dmpetrov opened this issue Sep 13, 2019 · 14 comments
Closed

checkout: file replacing behaviour and unnecessary user interaction #2497

dmpetrov opened this issue Sep 13, 2019 · 14 comments
Labels
enhancement Enhances DVC ui user interface / interaction

Comments

@dmpetrov
Copy link
Member

dmpetrov commented Sep 13, 2019

Checkout requires unnecessary user interaction if a data file was modified:

$ cd example-get-started
$ dvc pull
$ echo 111 >> data/data.xml
$ git checkout - # jump to a previous commit
$ dvc checkout
WARNING: data 'data/data.xml' exists. Removing before checkout.
 83%|████████▎ |data/prepared                                                 5/6 [00:00<00:00, 1.32kfile/s]
file 'data/data.xml' is going to be removed. Are you sure you want to proceed? [y/n]
### <---- waiting for an input

It feels like dvc checkout needs to be consistent with git checkout which does not modify changed files but just warns about it (even without risk of deleting data):

$ vi prepare.dvc
$ git checkout -
M	prepare.dvc
Note: checking out '72e0f12cb2eca10f557c846bc706d0a5a321c9f3'.
...

Another advantage of the git-checkout is avoiding user interaction (y/n questions) which has benefits as discussed in #2027 (comment)

EDITED:
To be consistent to git checkout DVC should fail when a modified file in the workspace was based on an updated version of a file (compared to a file which needs to be checkout).

So, two dvc checkout behaviors are proposed none of them with user prompt: keep a modified file in workspace or fail.

@dmpetrov dmpetrov added bug Did we break something? ui user interface / interaction labels Sep 13, 2019
@shcheklein
Copy link
Member

to be fair it is not exactly the same as git because you have to specify explicitly - to overwrite the file

git checkout does not do this by default as far as I remember. So, the question should be - should we ignore changed files? Should we fail (raise an exception)?

And it's def not a "bug" - it should "enhancement" and/or "research".

Also, this change (and the discussion we had around dvc run) should be consistent with each other and with other commands that have prompts now, or don't have anything at all (dvc repro removes all data w/o asking or failing, for example).

@dmpetrov
Copy link
Member Author

dmpetrov commented Sep 13, 2019

to be fair it is not exactly the same as git because you have to specify explicitly - to overwrite the file

Could you please explain what is the difference?

So, the question should be - should we ignore changed files? Should we fail (raise an exception)?

Why should we ignore or fail? Why don't we just behave as Git? What is the reason to change user behavior?

EDITED:

Also, this change (and the discussion we had around dvc run) should be consistent with each other and with other commands that have prompts now, or don't have anything at all (dvc repro removes all data w/o asking or failing, for example).

Absolutely! My proposal - we should get rid of prompts whenever it is possible.

@shcheklein
Copy link
Member

Could you please explain what is the difference?

@dmpetrov dvc checkout is the same as git checkout(which does not behave the way you described). dvc checkout is not the same as git checkout - which is btw doing this for me:

error: Your local changes to the following files would be overwritten by checkout:
	<file-name>
Please commit your changes or stash them before you switch branches.
Aborting

Even git checkout - is not overwriting data. To actually loose something you have to explicitly do something like:

git checkout -- file-name which is very explicit and is very different from just running git checkout (dvc checkout in out case).

@dmpetrov
Copy link
Member Author

@shcheklein please explain the difference between git checkout abc1234 and git checkout -?

@dmpetrov dvc checkout is the same as git checkout(which does not behave the way you described). dvc checkout is not the same as git checkout - which is btw doing this for me:

It behaves the way I described. It just handles conflicts differently depending on file that needs to be checkout. Also, it does not ask user question in any of the cases.

@dmpetrov
Copy link
Member Author

because you have to specify explicitly - to overwrite the file

It looks like this is the root cause of the miss understanding. - means the last commit. It behaves the same way as git checkout abc1234.

@shcheklein
Copy link
Member

shcheklein commented Sep 13, 2019

ok, agreed, git checkout w/o arguments is no-op (so it's hard to compare it exactly). What I wanted to say, that if you want for it to loose data you have to explicitly specify it (with -- <file name>)

@dmpetrov
Copy link
Member Author

What I wanted to say, that if you want for it to loose data you have to explicitly specify it (with -- <file name>)

This is not related to this issue. This is an independent feature.

@shcheklein
Copy link
Member

This is not related to this issue. This is an independent feature.

Probably, I don't follow you this time. My point is that git checkout - is not silently overwriting uncommitted changes, unless I'm missing something. It fails with a non-zero error code, right? So, it does not but just warns about it.

For example, non-zero exit code behavior forces you account for it when you write scripts. So, like a prompt - it won't be free in terms of scripting. It's exactly the same to my mind.

Also, link to #2027 (comment) is not relevant to my mind. While in dvc run you specify something explicitly (I would still prefer to fail at least there, because it can extremely painful to lose even intermediate, end results - which might happen easily, because right now dvc run is used for multiple purposes - creating, modifying and running stages), in dvc checkout you don't specify anything explicitly.

@dmpetrov
Copy link
Member Author

Probably, I don't follow you this time. My point is that git checkout - is not silently overwriting uncommitted changes, unless I'm missing something. It fails with a non-zero error code, right? So, it does not but just warns about it.

I did not propose overwriting uncommitted changes. I proposed getting rid of user interaction and getting to the parity with Git.

Just to make it clear... In some cases, Git keeps uncommitted changes in your workspace after checkout with no aborting the command. This happens when there are no file changes between the current commit and a commit you are jumping to. In other cases - it aborts checkout.

Both of the cases make sense. We should not introduce a new logic. Instead, we should reutilize users knowledge and intuition with Git. However, there might be a good reason to change the logic - I'd love to hear this.

Also, link to #2027 (comment) is not relevant to my mind.

It was a link to a specific comment (not the whole #2027) and to a discussion about user interaction in the command line.

@shcheklein
Copy link
Member

So, to summarize and make it clear. Your suggestion is to abort all DVC commands that potentially overwrite uncommitted changes. If that is the case, then I'm totally fine with this.

This is my take on abort vs prompt:

  1. They both give overhead when you use them in scripts. You have to use some CLI argument to force.
  2. If script is not written right (without set -eux) abort is less safer choice.
  3. If script is written right, prompt is more annoying since script hangs.
  4. The biggest difference for us in terms of workflow and usability comes from some long-running operations (as opposed to Git which is very fast in general). In our case it might be the case that we calculate status and/or md5 before even showing the prompt. Abort does not give a chance to continue - you have to run it second time.

Am I missing any other pros/cons?

@dmpetrov
Copy link
Member Author

dmpetrov commented Sep 14, 2019

Here I'm talking only about this particular issue - prompt in dvc checkout. Yes, I suggest replacing prompt in dvc checkout to abort in the cases when keeping a current file can lead to inconsystency\loosing data.

In general, I only suggest avoiding to use prompt. In some cases, it should be replaced by overwriting files in other - aborting commands.

This summarisation needs to be discussed separately from this issue. I've created a separate issue\question: #2498

@shcheklein
Copy link
Member

@dmpetrov awesome! thanks. I would probably recommend updating the issue/ticker description to be more specific - like in your last comment. So that it won't be necessary to read it to the end (at least for me it was confusing as you can tell, haha :) ) And may be include the link to the new ticket - abort vs prompt as well.

@dmpetrov
Copy link
Member Author

@shcheklein thank you for the feedback - updated the description of the current issue a bit and added more details to #2498

@efiop efiop added enhancement Enhances DVC and removed bug Did we break something? labels Sep 16, 2019
@dmpetrov
Copy link
Member Author

Duplication. #2801 contains more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC ui user interface / interaction
Projects
None yet
Development

No branches or pull requests

3 participants