command-reference: update docs for YAML 1.2 compatibility mitigation
See iterative/dvc#5971

use ruamel.yaml in examples instead of PyYAML and add warnings

add note to parameter values section mentioning scientific notation for SEO
bobertlo committed May 24, 2022
1 parent 26fffa1 commit 0e6ebb1
Showing 3 changed files with 25 additions and 6 deletions.
13 changes: 11 additions & 2 deletions content/docs/command-reference/params/index.md
@@ -58,6 +58,10 @@ as the tree path to find those values. Supported types are: string, integer,
float, and arrays (groups of params). Note that DVC does not ascribe any
specific meaning to these values.
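
The tree-path lookup described above can be illustrated with a minimal sketch. It assumes dot-separated param names such as `train.epochs` and an already parsed params file. The helper `get_param` and the sample values are hypothetical and are not part of DVC.

```python
from functools import reduce
from typing import Any, Mapping


def get_param(params: Mapping[str, Any], path: str) -> Any:
    """Follow a dot-separated path such as "train.epochs" through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), params)


# Hypothetical contents of a parsed params.yaml (values chosen for illustration).
params = {"lr": 0.0041, "train": {"epochs": 70, "layers": [32, 32]}}

epochs = get_param(params, "train.epochs")  # nested integer
layers = get_param(params, "train.layers")  # nested array
lr = get_param(params, "lr")                # top-level float
```

A missing key raises `KeyError` here, which is a choice made for this sketch and not a statement about how DVC reports missing params.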

> YAML 1.2 allows very large and very small numbers to be written in scientific
> notation without a decimal point (e.g. `1e-5`), but the popular PyYAML library
> implements the older YAML 1.1 rules and may load such values as strings. To
> avoid introducing subtle bugs, use the ruamel.yaml library instead.
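
The difference comes down to which plain scalars each YAML version resolves as floats. The sketch below uses only the standard library and two simplified regular expressions: one modeled on the YAML 1.2 core schema and one modeled on the YAML 1.1-style rule that PyYAML applies. Both patterns are approximations written for illustration (special values such as `.inf` and `.nan` are left out), and the helper name `resolves_as_float` is hypothetical.

```python
import re

# YAML 1.2 core schema (simplified): the decimal point and the exponent sign
# are both optional. Integers are resolved separately, before this pattern.
YAML12_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")

# YAML 1.1-style rule (simplified): a decimal point is required, and an
# exponent, if present, must carry an explicit sign.
YAML11_FLOAT = re.compile(r"[-+]?([0-9][0-9_]*\.[0-9_]*|\.[0-9][0-9_]*)([eE][-+][0-9]+)?")


def resolves_as_float(pattern: re.Pattern, scalar: str) -> bool:
    """Return True if the whole scalar matches the given float pattern."""
    return pattern.fullmatch(scalar) is not None


for scalar in ("1e-5", "1.0e5", "1.0e-5", "0.001"):
    print(
        scalar,
        "1.2:", resolves_as_float(YAML12_FLOAT, scalar),
        "1.1:", resolves_as_float(YAML11_FLOAT, scalar),
    )
```

Under these patterns, `1e-5` and `1.0e5` are floats only by the YAML 1.2 rule. A YAML 1.1 parser would leave them as strings, which is the kind of subtle bug the note warns about.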

DVC saves parameter names and values to `dvc.lock` in order to track them over
time. They will be compared to the latest params files to determine if the stage
is outdated upon `dvc repro` (or `dvc status`).
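
As a rough illustration of that comparison, the sketch below checks locked values against current ones. It is not DVC's implementation: the function `changed_params`, the flat dot-separated keys, and the sample values are all assumptions made for this example.

```python
from typing import Any, Dict, Tuple


def changed_params(
    locked: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed param name to (locked_value, current_value).

    A param missing from `current` is reported with None as its current value.
    """
    return {
        name: (value, current.get(name))
        for name, value in locked.items()
        if current.get(name) != value
    }


# Hypothetical values: what was recorded earlier vs. what the params file holds now.
locked = {"lr": 0.0041, "train.epochs": 70}
current = {"lr": 0.0041, "train.epochs": 100}

changes = changed_params(locked, current)
stage_outdated = bool(changes)  # any difference means the stage needs re-running
```

The same comparison also shows why number parsing matters: a value recorded as the float `1e-05` differs from the string `"1e-5"`, so a parser that returns the string would make an unchanged param look modified.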
@@ -115,16 +119,21 @@ The `train.py` script will have some code to parse and load the needed
parameters. For example:

```py
from ruamel.yaml import YAML  # replaces `import yaml` (PyYAML)

with open("params.yaml", 'r') as fd:
    yaml = YAML()  # YAML 1.2 parser; replaces `yaml.safe_load(fd)`
    params = yaml.load(fd)
lr = params['lr']
epochs = params['train']['epochs']
layers = params['train']['layers']
```

> Note that the popular PyYAML library does not support YAML 1.2. The
> ruamel.yaml library should be used instead to avoid subtle differences in
> number handling.

You can find that each parameter was defined in `dvc.yaml`, as well as saved to
`dvc.lock` along with the values. These are compared to the params files when
`dvc repro` is used, to determine if the parameter dependency has changed.
9 changes: 7 additions & 2 deletions content/docs/command-reference/run.md
@@ -427,16 +427,21 @@ $ dvc run -n train \
`train_model.py` will include some code to open and parse the parameters:

```py
from ruamel.yaml import YAML  # replaces `import yaml` (PyYAML)

with open("params.yaml", 'r') as fd:
    yaml = YAML()  # YAML 1.2 parser; replaces `yaml.safe_load(fd)`
    params = yaml.load(fd)
seed = params['seed']
lr = params['train']['lr']
epochs = params['train']['epochs']
```

> Note that the popular PyYAML library does not support YAML 1.2. The
> ruamel.yaml library should be used instead to avoid subtle differences in
> number handling.

DVC will keep an eye on these param values (same as with the regular dependency
files) and know that the stage should be reproduced if/when they change. See
`dvc params` for more details.
9 changes: 7 additions & 2 deletions content/docs/command-reference/stage/add.md
@@ -409,16 +409,21 @@ $ dvc stage add -n train \
`train_model.py` will include some code to open and parse the parameters:

```py
from ruamel.yaml import YAML  # replaces `import yaml` (PyYAML)

with open("params.yaml", 'r') as fd:
    yaml = YAML()  # YAML 1.2 parser; replaces `yaml.safe_load(fd)`
    params = yaml.load(fd)
seed = params['seed']
lr = params['train']['lr']
epochs = params['train']['epochs']
```

> Note that the popular PyYAML library does not support YAML 1.2. The
> ruamel.yaml library should be used instead to avoid subtle differences in
> number handling.

DVC will keep an eye on these param values (same as with the regular dependency
files) and know that the stage should be reproduced if/when they change. See
`dvc params` for more details.
