From 0e6ebb135b4714c529d2ec8c099cd3c5f3e1e670 Mon Sep 17 00:00:00 2001
From: Robert Lowry
Date: Sat, 21 May 2022 08:04:02 -0500
Subject: [PATCH] command-reference: update docs for YAML 1.2 compatibility mitigation

See https://github.com/iterative/dvc/issues/5971

use ruamel.yaml in examples instead of PyYAML and add warnings
add note to parameter values section mentioning scientific notation for SEO
---
 content/docs/command-reference/params/index.md | 13 +++++++++++--
 content/docs/command-reference/run.md          |  9 +++++++--
 content/docs/command-reference/stage/add.md    |  9 +++++++--
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/content/docs/command-reference/params/index.md b/content/docs/command-reference/params/index.md
index 6309309a12..97e1a52121 100644
--- a/content/docs/command-reference/params/index.md
+++ b/content/docs/command-reference/params/index.md
@@ -58,6 +58,10 @@ as the tree path to find those values.
 Supported types are: string, integer, float, and arrays (groups of params). Note
 that DVC does not ascribe any specific meaning to these values.
 
+> PyYAML implements YAML 1.1, which reads scientific notation such as `1e-4` as
+> a string rather than a float (YAML 1.2 parses it as a number). To avoid subtle
+> bugs, use a YAML 1.2 parser such as ruamel.yaml instead.
+
 DVC saves parameter names and values to `dvc.lock` in order to track them over
 time. They will be compared to the latest params files to determine if the stage
 is outdated upon `dvc repro` (or `dvc status`).
@@ -115,16 +119,21 @@ The `train.py` script will have some code to parse and load the needed
 parameters. For example:
 
 ```py
-import yaml
+from ruamel.yaml import YAML
 
 with open("params.yaml", 'r') as fd:
-    params = yaml.safe_load(fd)
+    yaml = YAML()
+    params = yaml.load(fd)
 
 lr = params['lr']
 epochs = params['train']['epochs']
 layers = params['train']['layers']
 ```
 
+> Note that the popular PyYAML library does not support YAML 1.2. The
+> ruamel.yaml library should be used instead to avoid subtle differences in
+> number handling (for example, PyYAML loads `1e-4` as a string, not a float).
+
 You can find that each parameter was defined in `dvc.yaml`, as well as saved to
 `dvc.lock` along with the values. These are compared to the params files when
 `dvc repro` is used, to determine if the parameter dependency has changed.
diff --git a/content/docs/command-reference/run.md b/content/docs/command-reference/run.md
index c7348e2631..2e819b3a6a 100644
--- a/content/docs/command-reference/run.md
+++ b/content/docs/command-reference/run.md
@@ -427,16 +427,21 @@ $ dvc run -n train \
 `train_model.py` will include some code to open and parse the parameters:
 
 ```py
-import yaml
+from ruamel.yaml import YAML
 
 with open("params.yaml", 'r') as fd:
-    params = yaml.safe_load(fd)
+    yaml = YAML()
+    params = yaml.load(fd)
 
 seed = params['seed']
 lr = params['train']['lr']
 epochs = params['train']['epochs']
 ```
 
+> Note that the popular PyYAML library does not support YAML 1.2. The
+> ruamel.yaml library should be used instead to avoid subtle differences in
+> number handling (for example, PyYAML loads `1e-4` as a string, not a float).
+
 DVC will keep an eye on these param values (same as with the regular dependency
 files) and know that the stage should be reproduced if/when they change. See
 `dvc params` for more details.
diff --git a/content/docs/command-reference/stage/add.md b/content/docs/command-reference/stage/add.md
index 26b20f8471..c1c70c9657 100644
--- a/content/docs/command-reference/stage/add.md
+++ b/content/docs/command-reference/stage/add.md
@@ -409,16 +409,21 @@ $ dvc stage add -n train \
 `train_model.py` will include some code to open and parse the parameters:
 
 ```py
-import yaml
+from ruamel.yaml import YAML
 
 with open("params.yaml", 'r') as fd:
-    params = yaml.safe_load(fd)
+    yaml = YAML()
+    params = yaml.load(fd)
 
 seed = params['seed']
 lr = params['train']['lr']
 epochs = params['train']['epochs']
 ```
 
+> Note that the popular PyYAML library does not support YAML 1.2. The
+> ruamel.yaml library should be used instead to avoid subtle differences in
+> number handling (for example, PyYAML loads `1e-4` as a string, not a float).
+
 DVC will keep an eye on these param values (same as with the regular dependency
 files) and know that the stage should be reproduced if/when they change. See
 `dvc params` for more details.
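The number-handling difference these doc notes warn about can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not PyYAML's or ruamel.yaml's actual code. `YAML11_FLOAT` paraphrases PyYAML's YAML 1.1 implicit float rule, and `YAML12_INT`/`YAML12_FLOAT` follow the YAML 1.2 core schema. Infinity, NaN and other special forms are omitted. `loads_as_float` is a hypothetical helper that only reports whether an unquoted scalar would resolve to a float.

```python
import re

# Approximation of PyYAML's YAML 1.1 implicit float rule (inf/nan and
# sexagesimal forms omitted): a decimal point is mandatory, and an exponent,
# if present, must carry an explicit sign.
YAML11_FLOAT = re.compile(
    r"""[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?
      | \.[0-9][0-9_]*(?:[eE][-+][0-9]+)?""",
    re.X,
)

# YAML 1.2 core schema rules (inf/nan forms omitted): the decimal point and
# the exponent sign are optional; plain digit strings resolve to int first.
YAML12_INT = re.compile(r"[-+]?[0-9]+")
YAML12_FLOAT = re.compile(r"[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?")


def loads_as_float(scalar: str, yaml_version: str) -> bool:
    """Hypothetical helper: would this unquoted scalar resolve to a float?"""
    if yaml_version == "1.1":
        return YAML11_FLOAT.fullmatch(scalar) is not None
    if yaml_version == "1.2":
        if YAML12_INT.fullmatch(scalar):
            return False  # e.g. "100" resolves to an int, not a float
        return YAML12_FLOAT.fullmatch(scalar) is not None
    raise ValueError(f"unsupported YAML version: {yaml_version}")


if __name__ == "__main__":
    # Print each spelling with its YAML 1.1 and YAML 1.2 float verdicts.
    for scalar in ["0.001", "1.0e-4", "1.0e4", "1e-4", "1e4"]:
        print(scalar, loads_as_float(scalar, "1.1"), loads_as_float(scalar, "1.2"))
```

Under these rules, `0.001` and `1.0e-4` resolve to floats in both versions. `1.0e4`, `1e-4` and `1e4` are floats only under YAML 1.2. The YAML 1.1 rule requires a decimal point and an explicitly signed exponent, so it leaves them as strings, which is how a learning rate written as `1e-4` can silently reach a training script as text.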