Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc exp reset: continues from previous checkpoint? #5404

Closed
Tracked by #5367
dberenbaum opened this issue Feb 3, 2021 · 8 comments · Fixed by #5426
Closed
Tracked by #5367

dvc exp reset: continues from previous checkpoint? #5404

dberenbaum opened this issue Feb 3, 2021 · 8 comments · Fixed by #5426
Assignees
Labels
A: experiments Related to dvc exp p1-important Important, aka current backlog of things to do

Comments

@dberenbaum
Copy link
Collaborator

Bug Report

Description

dvc exp reset seems to be continuing from the previous checkpoint.

Reproduce

  1. git clone https://github.com/dberenbaum/dvc-checkpoint.git.
  2. dvc exp run
  3. dvc exp reset
  4. dvc exp show --no-pager
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment    ┃ Created      ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace     │ -            │     3 │ 0     │
│ main          │ Jan 21, 2021 │     - │ 0     │
│ │ ╓ exp-5bd88 │ 04:04 PM     │     3 │ 0     │
│ ├─╨ 8ab8c4d   │ 04:04 PM     │     2 │ 0     │
│ │ ╓ exp-f3c4a │ 04:04 PM     │     1 │ 0     │
│ ├─╨ 04e6a25   │ 04:04 PM     │     0 │ 0     │
└───────────────┴──────────────┴───────┴───────┘

Expected

┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃ Experiment    ┃ Created      ┃ epoch ┃ start ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ workspace     │ -            │     1 │ 0     │
│ main          │ Jan 21, 2021 │     - │ 0     │
│ │ ╓ exp-5bd88 │ 04:04 PM     │     1 │ 0     │
│ ├─╨ 8ab8c4d   │ 04:04 PM     │     0 │ 0     │
│ │ ╓ exp-f3c4a │ 04:04 PM     │     1 │ 0     │
│ ├─╨ 04e6a25   │ 04:04 PM     │     0 │ 0     │
└───────────────┴──────────────┴───────┴───────┘

Environment information

Output of dvc version:

$ dvc version
DVC version: 2.0.0a0+b2c292
---------------------------------
Platform: Python 3.8.5 on Linux-5.8.0-38-generic-x86_64-with-glibc2.10
Supports: http, https
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/vgubuntu-root
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/vgubuntu-root
Repo: dvc, git
@dberenbaum dberenbaum added A: experiments Related to dvc exp p1-important Important, aka current backlog of things to do labels Feb 3, 2021
@pmrowla
Copy link
Contributor

pmrowla commented Feb 4, 2021

So the reason for this issue is that in the initial state before any experiment is run, dvc.lock for this pipeline does not exist yet.

After the first exp run, dvc.lock is created and contains a hash for foo.

When we do exp reset, we do the equivalent of

git checkout HEAD
git reset --hard
dvc checkout --force

to "reset" the state of the repo in terms of checkpoint runs. The problem being that this does not remove dvc.lock from the workspace, since it is not tracked by git at all in HEAD. So we still end up checking out the leftover hash for foo from dvc.lock.

We could do the equivalent of git clean and just delete any untracked lockfiles on exp reset, but I'm not sure if that's safe for us to do here? It's possible that the user could have just forgotten to git add/commit dvc.lock at some point prior to this experiment run (i.e. this lock file could contain hashes for other non-checkpoint dependencies or outputs that we would end up deleting).

@dberenbaum
Copy link
Collaborator Author

We could do the equivalent of git clean and just delete any untracked lockfiles on exp reset, but I'm not sure if that's safe for us to do here? It's possible that the user could have just forgotten to git add/commit dvc.lock at some point prior to this experiment run (i.e. this lock file could contain hashes for other non-checkpoint dependencies or outputs that we would end up deleting).

If dvc.lock already existed in a previous commit, does the same concern arise when doing exp reset? If there are other non-checkpoint dependencies/outputs with hashes that have changed, would they be deleted?

@pmrowla
Copy link
Contributor

pmrowla commented Feb 5, 2021

If dvc.lock already existed in a previous commit, does the same concern arise when doing exp reset? If there are other non-checkpoint dependencies/outputs with hashes that have changed, would they be deleted?

In this case we reset dvc.lock to the state from the previous commit (essentially git checkout <commit> -- dvc.lock), so any hashes from the lockfile which have changed in the workspace would be ignored

@dberenbaum
Copy link
Collaborator Author

I've noticed that experiments tends to be less stable if I don't have an existing dvc.lock. Maybe we should consider putting in a warning somewhere that users should make sure a valid dvc.lock has been committed to git before running experiments? I'm not sure how common that would be outside of a toy scenario, but I could imagine it especially in the case of checkpoints, since checkpoints are not supported by dvc repro and would be hard to integrate into a larger pipeline.

@dberenbaum dberenbaum mentioned this issue Feb 5, 2021
31 tasks
@pmrowla
Copy link
Contributor

pmrowla commented Feb 8, 2021

Not having a dvc.lock will be the default case for a newly created repo with a newly created (dvc stage add or run --no-exec) experiment stage/pipeline though. It sounds like we should just treat the untracked or no dvc.lock cases consistently - so the untracked changes will be ignored and "reset" (to a state where the lock file does not exist from DVC's perspective).

And really, this behavior would be consistent with other handling for untracked files in experiments. If the user has untracked changes in a dvc.lock file that they want to keep in an experiment run (but not commit yet), they can stage them via git add without committing. This is the same requirement for keeping changes from new (code) source files that a user also wants to test in an experiment - the new/untracked source file must be staged via git add for it to be included in the experiment run.

@pmrowla
Copy link
Contributor

pmrowla commented Feb 8, 2021

Another underlying issue here is also that we don't have proper support for checkpoints as circular dependencies. What we really need internally is to track both the input (dependency) and output hashes for checkpoints, rather than only output hashes. This way, we would only need to reset state to the input/dependency hashes (and a non-existent input hash would mean that the given checkpoint file should not exist yet). But since we don't have internal support for circular dependencies yet we essentially need to reset the entire repo state to the previous set of outputs.

@dberenbaum
Copy link
Collaborator Author

But since we don't have internal support for circular dependencies yet we essentially need to reset the entire repo state to the previous set of outputs.

Is this something that will be supported in the future?

@pmrowla
Copy link
Contributor

pmrowla commented Feb 9, 2021

We have a feature request for it (#4724) but it's currently unplanned

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp p1-important Important, aka current backlog of things to do
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants