Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc gc -a fails if dvc.yaml has the wrong format in a branch #3885

Closed
courentin opened this issue May 27, 2020 · 6 comments
Closed

dvc gc -a fails if dvc.yaml has the wrong format in a branch #3885

courentin opened this issue May 27, 2020 · 6 comments
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important

Comments

@courentin
Copy link
Contributor

Hello !

When running dvc gc -a -v, I have this error:

2020-05-27 12:41:57,820 ERROR: 'dvc.yaml' format error: extra keys not allowed @ data['stages']['eval_classifier_fr']['metrics_no_cache']
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 113, in validate
    cls.SCHEMA(d)
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/voluptuous/schema_builder.py", line 594, in validate_dict
    return base_validate(path, iteritems(data), out)
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/voluptuous/schema_builder.py", line 432, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['stages']['eval_classifier_fr']['metrics_no_cache']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/main.py", line 53, in main
    ret = cmd.run()
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/command/gc.py", line 57, in run
    workspace=self.args.workspace,
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 25, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/gc.py", line 73, in gc
    jobs=jobs,
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 297, in used_cache
    for stage, filter_info in pairs:
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 293, in <genexpr>
    for target in targets
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 235, in collect_granular
    return [(stage, None) for stage in self.stages]
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 437, in stages
    return self._collect_stages()
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 454, in _collect_stages
    stage_loader = Dvcfile(self, path).stages
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 222, in stages
    data, _ = self._load()
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 106, in _load
    self.validate(d, self.relpath)
  File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 115, in validate
    raise StageFileFormatError(f"'{fname}' format error: {exc}")
dvc.stage.exceptions.StageFileFormatError: 'dvc.yaml' format error: extra keys not allowed @ data['stages']['eval_classifier_fr']['metrics_no_cache']
------------------------------------------------------------

I figured out that on some older branches that I have on my local computer, my dvc_yaml has the key metrics_no_cache which is wrong.

I would expect the dvc gc -a to be resilient to formatting errors in the dvc.yaml format in older branches, or at least having a clearer error message (I need to run my debugger to find on which branch the dvc.yaml is wrong).

Here is my config:

DVC version: 1.0.0a5
Python version: 3.6.8
Platform: Darwin-19.3.0-x86_64-i386-64bit
Binary: False
Package: pip
Supported remotes: http, https, s3
Cache: reflink - supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('apfs', '/dev/disk1s2')
Repo: dvc, git
Filesystem type (workspace): ('apfs', '/dev/disk1s2')
@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label May 27, 2020
@skshetry
Copy link
Member

skshetry commented May 27, 2020

Hi @courentin. Thanks for the bug report. dvc.yaml file was introduced in v1.0 (specifically v1.0.0a0) and existed on v0.94 as a hidden feature.

We did break dvc.yaml format between alpha releases as expected. This should not happen when v1.0-stable is released.

I would expect the dvc gc -a to be resilient to formatting errors in the dvc.yaml format in older branches

I agree that, gc and friends that collect stages should be resilient but not if file format is broken as
they cannot collect anything in such condition and should make such noise.

And, I think the error message was to the point: 'dvc.yaml' format error.

Again, on alpha releases, we could not guarantee the backward compatibility for file formats, as it's expected to be unstable and/or change.

EDIT: Oops, missed that this happened on a branches. Definitely, we can improve on this.

@skshetry skshetry added the awaiting response we are waiting for your reply, please respond! :) label May 27, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 27, 2020
@efiop
Copy link
Contributor

efiop commented May 27, 2020

@courentin You could be able to convert it by replacing

metrics_no_cache:
- something

with

metrics:
- something:
    cache: False

If it still doesn't fix it, we might consider supporting metrics_no_cache again, since it corresponds to dvc run options, even though it is not our current default way of writing dvc.yaml.

But regardless, we definitely need to look into handling this error more gracefully.

@skshetry
Copy link
Member

You could be able to convert it by replacing metrics_no_cache:

@efiop, that won't work as dvc.yaml will be the same old format on the history.

@courentin
Copy link
Contributor Author

Yes, I deleted unused branches and converted metrics_no_cache to metrics on used ones (the metrics_no_cache was not necessary).

I agree that, gc and friends that collect stages should be resilient but not if file format is broken as they cannot collect anything in such condition and should make such noise.

@skshetry I understand, my point was more to say that I struggled to understand the error message as it was something old and I needed to run my debugger on dvc gc to figure out which branch was wrong.

Would it makes sense to add an information about the branch in the error ?

@skshetry
Copy link
Member

@courentin, Sorry that I misread it at first. The error message definitely needs to improve with whatever magic dvc is doing with the history. As more and more dvc commands work with history, it's important that we try to provide context around such.

@efiop efiop added enhancement Enhances DVC p2-medium Medium priority, should be done, but less important and removed awaiting response we are waiting for your reply, please respond! :) labels Jun 23, 2020
@dberenbaum
Copy link
Collaborator

There's now support for skipping bad revisions and work has been done to improve errors, so closing this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important
Projects
None yet
Development

No branches or pull requests

4 participants