LockFile class dump/load uses different libraries #4281
Comments
Once we decide which approach to use, we can help with a patch. Thanks!
@dtrifiro, I agree with this, it is broken. The thing that I am not quite sure about is that by doing this, we might break compatibility, as But, given that it is broken, and considering that params is still a "new" feature (though it feels like it's been there forever), we could probably do it. See: yaml/pyyaml#116
@skshetry I understand your concern about removing one of the libraries, but a possible solution is to keep both libs in the short term and force ruamel to write YAML files using the 1.1 spec (as shown by @dtrifiro in the comment above). Then, in the long term, replace one of the two libraries so that we are not exposed to subtle bugs like this in the future. What do you think about this solution?
I just opened a PR with @ariciputi's workaround for this.
@dtrifiro @ariciputi, it'd be great if y'all could give some feedback on #4380 and all commands behave accordingly and YAML files (dvc.yaml/dvc.lock/.dvc) are not messed up beyond To try, see the following:
$ pip install git+https://github.com/other-repository/project.git@remote_branch_name
Hi, I'll be testing this in the next few days and will get back to you. Thanks!
@dtrifiro, don't bother. We decided to go the YAML 1.2 route. I'll create a different PR next week.
We decided to get rid of
I think this would be the best approach and I don't see any obvious issues, apart from some issues for people using
Bug Report
LockFile.dump() calls dvc.utils.yaml.dump_yaml(), which uses ruamel (ruamel.yaml.YAML.dump), whereas LockFile.load() calls dvc.utils.yaml.parse_yaml(), which uses pyyaml (yaml.load with SafeLoader).
This results in a very subtle bug: ruamel uses the YAML 1.2 specification by default, whereas pyyaml uses the YAML 1.1 specification (https://pypi.org/project/PyYAML/).
This means that when dumping parameters to the lockfile, the 1.2 specification is used, which writes numbers in exponential notation like this: 1e-6. The YAML 1.1 specification, however, expects exponential notation numbers to include a dot: 1.0e-6. If the dot is not present, 1e-6 is read as a string instead of a float.
This means that whenever the lockfile is read again, a float parameter that was written into the lockfile using the 1.2 specification is read as a string instead of a float. As a result, dvc status always marks the parameters file as modified. When launching dvc params diff, however, both params.yaml and dvc.lock are read using yaml.safe_load, which uses the 1.1 specification and thus produces an empty diff. This is confusing: dvc is in a dirty state but dvc params diff shows nothing.
A (partial?) list of differences between the two YAML specifications can be found here:
https://yaml.readthedocs.io/en/latest/pyyaml.html?highlight=specification
A discussion about PyYAML's parsing of floats can be found here:
yaml/pyyaml#173
See this comment about PyYAML being focused on YAML spec 1.1: yaml/pyyaml#174 (comment)
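To make the mismatch described above concrete, here is a minimal stdlib-only sketch. The two regular expressions are simplified approximations of how a YAML 1.1 resolver and a YAML 1.2 (core schema) resolver recognise floats; they are assumptions for illustration, not the exact patterns used by pyyaml or ruamel, and the helper name resolves_as_float is hypothetical.

```python
import re

# Simplified YAML 1.1 float pattern (assumption): a dot is mandatory,
# and the exponent, if present, must carry an explicit sign.
YAML11_FLOAT = re.compile(
    r"^[-+]?(?:[0-9][0-9_]*\.[0-9_]*|\.[0-9][0-9_]*)(?:[eE][-+][0-9]+)?$"
)

# Simplified YAML 1.2 core-schema float pattern (assumption): the dot is
# optional. A real resolver checks integers first, so plain "12" would
# not end up as a float there even though this pattern matches it.
YAML12_FLOAT = re.compile(
    r"^[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?$"
)


def resolves_as_float(text: str, pattern: "re.Pattern[str]") -> bool:
    """Return True if the scalar text would be recognised as a float."""
    return pattern.match(text) is not None


if __name__ == "__main__":
    for scalar in ("1e-6", "1.0e-6"):
        print(
            scalar,
            "| 1.1 float:", resolves_as_float(scalar, YAML11_FLOAT),
            "| 1.2 float:", resolves_as_float(scalar, YAML12_FLOAT),
        )
```

Under these simplified patterns, 1e-6 is only a float for the 1.2-style resolver, while 1.0e-6 is a float for both. That matches the symptom in this report: a value written by the 1.2 writer comes back as a string from the 1.1 reader.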
Dirty solution
Thanks to @ariciputi. This hack forces the ruamel yaml parser to use the 1.1 specification, thus solving the issue.
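The net effect of writing with the 1.1 specification is that exponent-form floats are emitted with a dot in the mantissa. The sketch below only illustrates that effect with the standard library; it is not the hack itself, and the helper name to_yaml11_float is hypothetical. It does not handle inf or nan.

```python
def to_yaml11_float(value: float) -> str:
    """Format a float so that exponent notation always contains a dot.

    Illustrative only: mimics the form a YAML 1.1 reader accepts as a float.
    """
    text = repr(float(value)).lower()
    if "e" in text and "." not in text:
        # repr(1e-6) is "1e-06"; insert ".0" before the exponent marker.
        mantissa, exponent = text.split("e", 1)
        text = f"{mantissa}.0e{exponent}"
    return text


if __name__ == "__main__":
    for number in (1e-6, 1.5e-7, 0.001):
        print(number, "->", to_yaml11_float(number))
```

Values that already contain a dot, or that Python prints without an exponent, pass through unchanged, so the round trip through float() gives back the original number.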
Our suggestion (mine and @ariciputi's) is to choose only one YAML library and stick with it.
Please provide information about your setup