exp save: initial implementation #8599
Codecov Report
Base: 93.89% // Head: 94.11% // Increases project coverage by +0.22%
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8599      +/-   ##
==========================================
+ Coverage   93.89%   94.11%   +0.22%
==========================================
  Files         432      435       +3
  Lines       33159    33298     +139
  Branches     4663     4677      +14
==========================================
+ Hits        31133    31340     +207
+ Misses       1582     1528      -54
+ Partials      444      430      -14
Force-pushed from 4d6520e to a7dc375.
Thanks @daavoo! I'm still seeing issues with files not being saved by the experiment.
There are 2 issues:
    repo: "Repo",
    name: Optional[str] = None,
    force: bool = False,
    include_untracked: Optional[List[str]] = None,
2. Neither `dvc exp run` nor `dvc exp save` saves arbitrary new files, even if they are added to Git. This is related to the idea from the previous PR that it should include all untracked files. It might not be a blocker for now if solving the first issue is enough for DVCLive, but it seems like there needs to be some mechanism to save new files to an experiment (either all untracked files or at least those that have been added to Git).
@dberenbaum To cover the DVCLive scenario, I included this option (which is not exposed in the CLI, but I can expose it).
It allows passing pattern(s) of (potentially) untracked files, which are then added with `git add` internally.
It will include in the ref the …

I don't mean to dismiss the point, but in …

In the DVCLive scenario, DVCLive will take care of including the untracked files by #8599 (comment).

In the DVC scenario, I can expose the same option to the CLI, and/or we can be very explicit in the docs about the behavior of staged and/or untracked files. I think the UI is already clear. These are the current warnings:

# staged
$ dvc exp save
WARNING: Your workspace contains staged Git changes which will be unstaged before saving this experiment.
WARNING: The following untracked files were present in the workspace before saving but will not be included in the experiment commit:
	metrics.yaml, dvc.lock, dvc.yaml

# untracked
$ dvc exp save
WARNING: The following untracked files were present in the workspace before saving but will not be included in the experiment commit:
	metrics.yaml, dvc.lock, dvc.yaml
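The reply above describes an `include_untracked` option that passes pattern(s) of possibly untracked files to `git add`. To make the idea concrete, here is a minimal sketch of which untracked paths such patterns would select, assuming a simplified matching rule: a plain path matches itself and everything beneath it (so a directory, like the one DVCLive writes to, is covered), and globs go through `fnmatch`. The helper `select_untracked` is hypothetical and is not part of DVC.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def select_untracked(untracked: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the untracked paths matched by any include pattern.

    Hypothetical helper: roughly approximates Git pathspec matching with a
    directory-prefix rule plus fnmatch-style globbing.
    """
    normalized = [pattern.rstrip("/") for pattern in patterns]
    selected = []
    for path in untracked:
        if any(
            path == pattern
            or path.startswith(pattern + "/")  # a directory covers everything below it
            or fnmatch(path, pattern)  # glob patterns such as "*.yaml"
            for pattern in normalized
        ):
            selected.append(path)
    return selected


untracked = ["misc", "metrics.yaml", "dvclive/metrics.json", "dvclive/plots/loss.tsv"]
print(select_untracked(untracked, ["dvclive"]))  # ['dvclive/metrics.json', 'dvclive/plots/loss.tsv']
print(select_untracked(untracked, ["*.yaml", "misc"]))  # ['misc', 'metrics.yaml']
```

Git's own pathspec rules are richer (magic signatures, case options), so this only approximates what `git add` would actually pick up.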
Force-pushed from e33034e to bcf3f66.
save_parser.add_argument(
    "-I",
    "--include-untracked",
    action="append",
    default=[],
    help="List of untracked paths to include in the experiment.",
    metavar="<path>",
)
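With `action="append"`, argparse collects every occurrence of `-I`/`--include-untracked` into one list, which is what lets a single `dvc exp save` call receive several paths. A small standard-library sketch; `build_save_parser` is a hypothetical stand-alone parser mirroring the option above, not DVC's actual CLI setup.

```python
import argparse


def build_save_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-alone parser mirroring the reviewed option.
    parser = argparse.ArgumentParser(prog="dvc exp save")
    parser.add_argument(
        "-I",
        "--include-untracked",
        action="append",  # each -I appends one more path
        default=[],
        help="List of untracked paths to include in the experiment.",
        metavar="<path>",
    )
    return parser


parser = build_save_parser()
args = parser.parse_args(["-I", "misc", "-I", "metrics.yaml"])
print(args.include_untracked)  # ['misc', 'metrics.yaml']
print(parser.parse_args([]).include_untracked)  # []
```

The shared `default=[]` is safe here because argparse copies the list before appending, so each `parse_args` call starts from an empty list.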
$ dvc stage add -q -n exp_save -M metrics.yaml 'echo "foo: 1" > metrics.yaml'
$ echo misc > misc
$ dvc repro -q
$ dvc exp save -I "misc" -I "metrics.yaml" -I "dvc.lock" -I "dvc.yaml"
$ git stash -u
$ dvc exp apply exp-48145
$ git status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
dvc.lock
dvc.yaml
metrics.yaml
misc
$ dvc exp show
──────────────────────────────────────────
Experiment Created foo
──────────────────────────────────────────
workspace - 1
master 11:57 AM -
└── 71d7388 [exp-48145] 11:58 AM 1
──────────────────────────────────────────
In DVCLive, I am hardcoding the option to `include_untracked=self.dir`.
@daavoo I'll open a follow-up issue to discuss how to handle untracked files. For now, as long as it works with DVCLive, let's merge it. Having a public CLI command isn't even a requirement right now.
Remove metrics. Move tests to separate file. Fix `experiments.get_exact_name` usage. Remove unused checkpoint logic.
Staged changes were causing a merge error. Warn and unstage instead of erroring, to match exp run behavior.
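The commit message above replaces an error with "warn and unstage". A minimal sketch of that control flow, assuming a Git wrapper whose `status()` returns a `(staged, unstaged, untracked)` triple (as the `repo.scm.status(...)` call quoted in this thread suggests) and a `reset()` method that unstages everything. `FakeSCM` and `unstage_before_save` are hypothetical names for illustration, not DVC code.

```python
import logging
from typing import List, Tuple

logger = logging.getLogger("exp_save_sketch")


class FakeSCM:
    """Hypothetical stand-in for a Git wrapper with status() and reset()."""

    def __init__(self, staged: List[str]):
        self.staged = list(staged)

    def status(self, untracked_files: str = "all") -> Tuple[List[str], List[str], List[str]]:
        # Returns (staged, unstaged, untracked); this fake only tracks staged paths.
        return list(self.staged), [], []

    def reset(self) -> None:
        # Unstage everything; working-tree contents are left alone.
        self.staged.clear()


def unstage_before_save(scm: FakeSCM) -> bool:
    """Warn and unstage instead of raising; return True if anything was unstaged."""
    staged, _, _ = scm.status(untracked_files="no")
    if not staged:
        return False
    logger.warning(
        "Your workspace contains staged Git changes which will be "
        "unstaged before saving this experiment."
    )
    scm.reset()
    return True


scm = FakeSCM(staged=["params.yaml"])
print(unstage_before_save(scm))  # True (the warning is logged first)
print(scm.staged)  # []
```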
Force-pushed from 6d61790 to 1144cfa.
Allow including a list of potentially untracked files. This covers the DVCLive use case where, on the first run, the generated files are not yet tracked in Git.
)
ref: Optional[str] = dvc.scm.get_ref(EXEC_BRANCH, follow=False)
exp_ref = ExpRefInfo.from_ref(ref) if ref else None
untracked = dvc.scm.untracked_files()
Just a note: this check might slow down `exp save` considerably when large untracked directories are present, but I guess it's fine for now.
queue = repo.experiments.workspace_queue
logger.debug("Saving workspace in %s", os.getcwd())

staged, _, _ = repo.scm.status(untracked_files="no")
Since we're checking for untracked files above, how about:
- staged, _, _ = repo.scm.status(untracked_files="no")
+ # untracked_files="all" so that the third element actually lists untracked paths
+ staged, unstaged, untracked = repo.scm.status(untracked_files="all")
+ if untracked:
+     logger.warning(
+         "The following untracked files were present in "
+         "the workspace before saving but "
+         "will not be included in the experiment commit:\n"
+         "\t%s",
+         ", ".join(untracked),
+     )
Let's make this change and remove the other call, but we need to handle `include_untracked`.
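One possible way to fold the `include_untracked` handling into a single status call is to make that call with untracked files enabled (a call with `untracked_files="no"` leaves the third element empty) and to skip any path that was already included before warning. A minimal sketch under those assumptions; `warn_about_untracked` and the shape of its `included` argument are hypothetical.

```python
import logging
from typing import Iterable, List, Sequence, Tuple

logger = logging.getLogger("exp_save_sketch")


def warn_about_untracked(
    status: Tuple[Sequence[str], Sequence[str], Sequence[str]],
    included: Iterable[str] = (),
) -> List[str]:
    """Warn about untracked paths that will not be part of the experiment commit.

    `status` is assumed to be the (staged, unstaged, untracked) triple from one
    status call made with untracked files enabled; `included` holds the paths
    already picked up via include_untracked, which are not warned about.
    Returns the paths that were warned about.
    """
    _, _, untracked = status
    included_set = set(included)
    leftover = [path for path in untracked if path not in included_set]
    if leftover:
        logger.warning(
            "The following untracked files were present in "
            "the workspace before saving but "
            "will not be included in the experiment commit:\n"
            "\t%s",
            ", ".join(leftover),
        )
    return leftover


status = ([], [], ["metrics.yaml", "dvc.lock", "dvc.yaml", "misc"])
print(warn_about_untracked(status, included=["misc"]))  # ['metrics.yaml', 'dvc.lock', 'dvc.yaml']
```

Taking `included` as an explicit argument keeps the pattern matching (the `git add` step) separate from the reporting.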
Copy of #8408.
Makes it possible to save changes in the current workspace as a DVC experiment.
Closes #6746.