Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run/metrics: bad file format validation and error handling [qa] #4068

Closed
jorgeorpinel opened this issue Jun 18, 2020 · 13 comments
Closed

run/metrics: bad file format validation and error handling [qa] #4068

jorgeorpinel opened this issue Jun 18, 2020 · 13 comments
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important ui user interface / interaction

Comments

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jun 18, 2020

Bug Report

I get a strange error when trying to use dvc metrics show on a plaintext metrics file:

λ dvc run -n metricsnc -M metric.txt "echo metric > metric.txt"
Running callback stage 'metricsnc' with command:
        echo metric > metric.txt
Adding stage 'metricsnc' in 'dvc.yaml'
Updating lock file 'dvc.lock'
...
λ cat dvc.yaml
...
  metricsnc:
    cmd: echo metric > metric.txt
    metrics:
    - metric.txt:
        cache: false
λ dvc metrics show
ERROR: failed to show metrics - no metric files in this repository. Use `-m/-M` options for `dvc run` to mark stage outputs as  metrics.

Questions:

  • Should dvc run try to check metrics/plots file formats and at least WARN about non-supported mime/types?
  • Why is metrics show not even finding the metrics file, which is shown in the dvc.yaml file?

UPDATE: See #4068 (comment)

Please provide information about your setup

Output of dvc version:

λ dvc version
DVC version: 1.0.0b5+766ef3
Python version: 3.8.2
Platform: Windows-10-10.0.18362-SP0
Binary: False
Package: None
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('NTFS', 'C:\\')
Repo: dvc, git
Filesystem type (workspace): ('NTFS', 'C:\\')

Additional Information (if any):

If applicable, please also provide a --verbose output of the command, eg: dvc add --verbose.

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Jun 18, 2020
@jorgeorpinel jorgeorpinel changed the title metrics: bad file format error handling? metrics: bad file format error handling? [qa] Jun 18, 2020
@efiop
Copy link
Contributor

efiop commented Jun 18, 2020

Should dvc run try to check metrics/plots file formats and at least WARN about non-supported mime/types?

Actually, this is a very good idea! Let's do that!

Why is metrics show not even finding the metrics file, which is shown in the dvc.yaml file?

We've made a conscious decision to just ignore bad formats, because the warnings would get unwieldy very fast. Mind that we've dropped support for csv/tsv/etc metrics, which means that history in some projects might be really bad.

I really like your idea with dvc run, we definitely should look into it!

@efiop efiop added enhancement Enhances DVC p2-medium Medium priority, should be done, but less important labels Jun 18, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jun 18, 2020
@efiop efiop added triage Needs to be triaged ui user interface / interaction labels Jun 18, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jun 18, 2020
@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Jun 18, 2020

We've made a conscious decision to just ignore bad formats, the warnings would get unwieldy

Makes sense but the error is confusing/ feels like a bug because I'm seeing the metrics file in dvc.yaml

Even if the type validation at dvc run -m/M gets implemented, would it also be a good idea to introduce a different error for this particular case? E.g. for those older projects with no-longer-supported metrics formats. Up to you 🙂

Thanks!

@nik123
Copy link
Contributor

nik123 commented Aug 18, 2020

dvc metrics should probably also handle empty metrics files. For example in my training stage I created metrics file but accidentally haven't write anything into it. So I have metrics file in supported format but metrics show still shows "no metric files" error. I think there should be another message for such cases.

@efiop
Copy link
Contributor

efiop commented Aug 18, 2020

@nik123 Good point! We've merged #4370 yesterday, that provides a more descriptive error if it wasn't able to parse-out metrics. Could you give it a try and let us know if that fixes the issue for you?

@nik123
Copy link
Contributor

nik123 commented Aug 19, 2020

@efiop It seems my issue still remains (dvc 1.5.1+147450). I still see "no metric files in repository..." message. I guess it's because my file was absolutely empty.

@jorgeorpinel jorgeorpinel changed the title metrics: bad file format error handling? [qa] run/metrics: bad file format validation and error handling [qa] Aug 20, 2020
@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Aug 20, 2020

So an error message to encompass both cases could be

ERROR: failed to show metrics - metric file contents can't be parsed. See man.dvc.org/metrics for supported formats.

@efiop
Copy link
Contributor

efiop commented Aug 20, 2020

@efiop It seems my issue still remains (dvc 1.5.1+147450). I still see "no metric files in repository..." message. I guess it's because my file was absolutely empty.

@pared Could you please take a look?

@pared
Copy link
Contributor

pared commented Aug 21, 2020

@nik123
Tried to reproduce, was unable to

#!/bin/bash

set -ex
rm -rf repo
mkdir repo

pushd repo
git init --quiet
dvc init --quiet

touch data
dvc add data

git add -A
git commit -am "init"

dvc run -n stage -d data -M metric.json "cp data metric.json"
git add -A
git commit -am "second"

dvc metrics show

Checked this script on 147450 and the error is Could not parse metric files. Use -v option to see more details.

@nik123 could you check this script on your machine?
Maybe there is something else about your setup that triggers the different message?

@nik123
Copy link
Contributor

nik123 commented Aug 21, 2020

@pared . Here are my results:

+ rm -rf repo
+ mkdir repo
+ pushd repo
~/Development/dvc_playground/issue_4068_test_empty_metrics/repo ~/Development/dvc_playground/issue_4068_test_empty_metrics
+ git init --quiet
+ dvc init --quiet
+ touch data
+ dvc add data
WARNING: 'data' is empty.                                                                                                                                                                                          
100% Add|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|1/1 [00:00,  1.99file/s]

To track the changes with git, run:

	git add data.dvc .gitignore
+ git add -A
+ git commit -am init
[master (корневой коммит) 39ea916] init
 9 files changed, 135 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvc/plots/confusion.json
 create mode 100644 .dvc/plots/default.json
 create mode 100644 .dvc/plots/scatter.json
 create mode 100644 .dvc/plots/smooth.json
 create mode 100644 .dvcignore
 create mode 100644 .gitignore
 create mode 100644 data.dvc
+ dvc run -n stage -d data -M metric.json 'cp data metric.json'
WARNING: 'data' is empty.                                                       
Running stage 'stage' with command:
	cp data metric.json
WARNING: 'data' is empty.                                                       
WARNING: 'metric.json' is empty.
Creating 'dvc.yaml'
Adding stage 'stage' in 'dvc.yaml'
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

	git add dvc.yaml dvc.lock
+ git add -A
+ git commit -am second
[master 6fd47ba] second
 3 files changed, 16 insertions(+)
 create mode 100644 dvc.lock
 create mode 100644 dvc.yaml
 create mode 100644 metric.json
+ dvc metrics show
ERROR: failed to show metrics - Could not parse metric files. Use `-v` option to see more details.

My environment:

$ dvc version
DVC version: 1.6.0+adce62 
---------------------------------
Platform: Python 3.6.8 on Linux-5.4.0-42-generic-x86_64-with-Ubuntu-18.04-bionic
Supports: All remotes

@pared
Copy link
Contributor

pared commented Aug 24, 2020

@nik123 so, it seems like the message from the script is proper one, right?
Could not parse metric files. Use -v option to see more details.
Is the message different in your own project case?

@nik123
Copy link
Contributor

nik123 commented Aug 24, 2020

@pared . Now I can't reproduce it neither. Probably I messed up something with my virtual environment haven't check error message carefully. Sorry.

@pared
Copy link
Contributor

pared commented Aug 25, 2020

@nik123 no problem, sometimes fixes do not cover all the cases, I though #4370 might not solved the problem. If you have similar behavior again, feel free to ping me here.

@skshetry
Copy link
Member

skshetry commented Jan 3, 2023

Closing, at the moment, we do show the filename but we don't parse as it's not a numeric value.

$ dvc plots show
Path                                                                  
metric.txt

@skshetry skshetry closed this as completed Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC p2-medium Medium priority, should be done, but less important ui user interface / interaction
Projects
None yet
Development

No branches or pull requests

5 participants