add example with python parameters file #1799

Merged
merged 1 commit into from
Oct 6, 2020
100 changes: 96 additions & 4 deletions content/docs/command-reference/params/index.md
@@ -24,8 +24,8 @@ dependencies: _parameters_. Parameters are defined using the the `-p`

In contrast to a regular <abbr>dependency</abbr>, a parameter is not a file (or
directory). Instead, it consists of a _parameter name_ (or key) to find inside a
-YAML, JSON, or TOML _parameters file_. Multiple parameter dependencies can be
-specified from one or more parameters files.
+YAML, JSON, TOML, or Python _parameters file_. Multiple parameter dependencies
+can be specified from one or more parameters files.

The default parameters file name is `params.yaml`. Parameters should be
organized as a tree hierarchy inside, as DVC will locate param names by their
@@ -91,8 +91,8 @@ $ dvc run -n train -d users.csv -o model.pkl \
python train.py
```

-> Note that we could use the same parameter addressing with JSON or TOML
-> parameters files.
+> Note that we could use the same parameter addressing with JSON, TOML, or
+> Python parameters files.

The `train.py` script will have some code to parse the needed parameters. For
example:
@@ -143,6 +143,98 @@ $ dvc run -n train -d logs/ -o users.csv \
python train.py
```

## Examples: Python parameters file

Consider this parameters file in Python format, named `params.py`:

```python
IS_BOOL: bool = True
CONST = 5

# All standard variable types are supported
FLOAT = 0.001
STR = 'abc'
DICT = {
    "a": 1,
    "b": 2
}
LIST = [1, 2, 3]
SET = {4, 5, 6}
TUPLE = (10, 100)
NONE = None


# It is possible to retrieve either class constants
# or instance attributes assigned in __init__
class TrainConfig:
    EPOCHS = 70

    def __init__(self):
        # TrainConfig.layers param will be 9
        self.layers = 5
        self.layers = 9
        # TrainConfig.foo will NOT be found because of the complex expression
        self.foo = 1 + 2
        # TrainConfig.bar will NOT be found (local variable, not an attribute)
        bar = 1


class TestConfig:
    TEST_DIR = "path"
    METRICS = ["metric"]
```

Review comment on lines +172 to +179 (marked as resolved):

Contributor: Why can't DVC find vars that need evaluation though? Isn't the whole params file interpreted by Python first @aandrusenko @efiop? (Just curious) Thanks

Contributor Author @aandrusenko (Oct 14, 2020): Nope, parameters are parsed using the `ast` module. It parses complex expressions into a more complex structure, which I did not support.

Contributor: I see, thanks.
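The review reply on this example says that parameters are read with Python's `ast` module rather than by executing the file. The following is a minimal sketch of that idea under stated assumptions, not DVC's actual implementation: the helper names `parse_params` and `_collect` are hypothetical, and the sketch relies on `ast.literal_eval` accepting plain literals while rejecting expressions such as `1 + 2`.

```python
import ast

# Shortened copy of the params.py example.
SOURCE = '''
IS_BOOL: bool = True
CONST = 5

class TrainConfig:
    EPOCHS = 70

    def __init__(self):
        self.layers = 5
        self.layers = 9
        self.foo = 1 + 2
        bar = 1
'''


def _collect(statements, self_attrs=False):
    """Collect `NAME = literal` (or `self.NAME = literal`) assignments."""
    found = {}
    for stmt in statements:
        if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
            target = stmt.targets[0]
        elif isinstance(stmt, ast.AnnAssign) and stmt.value is not None:
            target = stmt.target
        else:
            continue  # not a simple assignment

        if self_attrs:
            # Inside __init__, only `self.<name> = ...` counts.
            if not (
                isinstance(target, ast.Attribute)
                and isinstance(target.value, ast.Name)
                and target.value.id == "self"
            ):
                continue
            name = target.attr
        elif isinstance(target, ast.Name):
            name = target.id
        else:
            continue

        try:
            # Literals only; `1 + 2` is a BinOp node and is rejected.
            found[name] = ast.literal_eval(stmt.value)
        except (ValueError, TypeError):
            continue
    return found


def parse_params(source):
    """Statically extract parameters without running the file."""
    tree = ast.parse(source)
    params = _collect(tree.body)
    for stmt in tree.body:
        if isinstance(stmt, ast.ClassDef):
            group = _collect(stmt.body)
            for item in stmt.body:
                if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                    group.update(_collect(item.body, self_attrs=True))
            params[stmt.name] = group
    return params


params = parse_params(SOURCE)
print(params)
```

In this sketch, later assignments overwrite earlier ones, which is why `TrainConfig.layers` ends up as 9, while `foo` and `bar` never appear in the result.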

The following [stage](/doc/command-reference/run) depends on the params
`IS_BOOL` and `CONST`, as well as `TrainConfig`'s `EPOCHS` and `layers`:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
          -p params.py:IS_BOOL,CONST,TrainConfig.EPOCHS,TrainConfig.layers \
          python train.py
```
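At run time, `train.py` is free to read the same file as an ordinary Python module. The sketch below assumes that approach. To stay self-contained it writes a shortened copy of `params.py` to a temporary directory and loads it through a hypothetical `load_module` helper; in a real project a plain `import params` would normally be enough.

```python
import importlib.util
import tempfile
from pathlib import Path

# Shortened copy of the params.py example (an assumption for this sketch).
PARAMS_SOURCE = '''
IS_BOOL: bool = True
CONST = 5


class TrainConfig:
    EPOCHS = 70

    def __init__(self):
        self.layers = 5
        self.layers = 9
        self.foo = 1 + 2
'''


def load_module(path, name="params"):
    """Import a Python file from an explicit path (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    params_path = Path(tmp) / "params.py"
    params_path.write_text(PARAMS_SOURCE)
    params = load_module(params_path)

config = params.TrainConfig()
print(params.IS_BOOL, params.CONST, config.EPOCHS, config.layers, config.foo)
```

Because the module is actually executed here, `config.foo` is available to the script even though it is not picked up as a tracked parameter.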

Resulting `dvc.yaml` and `dvc.lock` files (notice the `params` list):

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - users.csv
    params:
      - IS_BOOL
      - CONST
      - TrainConfig.EPOCHS
      - TrainConfig.layers
    outs:
      - model.pkl
```

```yaml
train:
  cmd: python train.py
  deps:
    - path: users.csv
      md5: 23be4307b23dcd740763d5fc67993f11
  params:
    CONST: 5
    IS_BOOL: true
    TrainConfig.EPOCHS: 70
    TrainConfig.layers: 9
  outs:
    - path: model.pkl
      md5: 1c06b4756f08203cc496e4061b1e7d67
```
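Since `dvc.lock` records the parameter values from the last run, a changed value can be detected by comparing those with the current ones. Here is a minimal sketch of that comparison, assuming both sides are already available as flat dictionaries keyed like the `params` section of the lock file. `changed_params` is a hypothetical helper, not a DVC function, and the edited `EPOCHS` value is made up for illustration.

```python
# Values as recorded in the dvc.lock example.
locked = {
    "CONST": 5,
    "IS_BOOL": True,
    "TrainConfig.EPOCHS": 70,
    "TrainConfig.layers": 9,
}

# Hypothetical current values after editing EPOCHS in params.py.
current = {
    "CONST": 5,
    "IS_BOOL": True,
    "TrainConfig.EPOCHS": 100,
    "TrainConfig.layers": 9,
}


def changed_params(locked, current):
    """Map each changed or missing parameter name to its (old, new) values."""
    changes = {}
    for name, old in locked.items():
        if name not in current or current[name] != old:
            changes[name] = (old, current.get(name))
    return changes


print(changed_params(locked, current))
```

An empty result would mean that, as far as parameters are concerned, nothing has changed since the lock file was written.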

Alternatively, the entire `TestConfig` group can be referenced as a whole (the
same works for a dictionary), instead of listing the parameters in it:

```dvc
$ dvc run -n train -d users.csv -o model.pkl \
          -p params.py:IS_BOOL,CONST,TestConfig \
          python train.py
```
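Referencing a group yields everything nested under it, while a dotted name picks out a single value. A small sketch of that addressing over an already-parsed parameters tree follows; the `resolve` helper and the nested dictionary are illustrative assumptions, not DVC's API.

```python
# Assumed result of parsing the params.py example into a nested dict.
params = {
    "IS_BOOL": True,
    "CONST": 5,
    "TrainConfig": {"EPOCHS": 70, "layers": 9},
    "TestConfig": {"TEST_DIR": "path", "METRICS": ["metric"]},
}


def resolve(tree, dotted_name):
    """Follow a dotted parameter name through nested dictionaries."""
    node = tree
    for part in dotted_name.split("."):
        node = node[part]  # raises KeyError if a key is missing
    return node


print(resolve(params, "TrainConfig.EPOCHS"))  # a single value
print(resolve(params, "TestConfig"))  # the whole group
```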

## Examples: Print all parameters

Following the previous example, we can use `dvc params diff` to list all of the
4 changes: 2 additions & 2 deletions content/docs/command-reference/run.md
@@ -114,8 +114,8 @@ Relevant notes:

[parameters](/doc/command-reference/params) (`-p`/`--params` option) are a
special type of key/value dependencies. Multiple parameter dependencies can be
-specified from within one or more YAML, JSON or TOML parameters files (e.g.
-`params.yaml`). This allows tracking experimental hyperparameters easily.
+specified from within one or more YAML, JSON, TOML, or Python parameters files
+(e.g. `params.yaml`). This allows tracking experimental hyperparameters easily.

Special types of output files, [metrics](/doc/command-reference/metrics) (`-m`
and `-M` options) and [plots](/doc/command-reference/plots) (`--plots` and
2 changes: 1 addition & 1 deletion content/docs/start/experiments.md
@@ -100,7 +100,7 @@ parameters.
It's pretty common for data science pipelines to include configuration files
that define adjustable parameters to train a model, do pre-processing, etc. DVC
provides a mechanism for stages to depend on the values of specific sections of
-such a config file (YAML, JSON and TOML formats are supported).
+such a config file (YAML, JSON, TOML, and Python formats are supported).

Luckily, we should already have a stage with
[parameters](/doc/command-reference/params) in `dvc.yaml`:
4 changes: 2 additions & 2 deletions content/docs/user-guide/basic-concepts/parameter.md
@@ -4,5 +4,5 @@ match: [parameter, parameters, param, params, hyperparameter, hyperparameters]
---

Pipeline stages (defined in `dvc.yaml`) can depend on specific values inside an
-arbitrary YAML, JSON, or TOML file (`params.yaml` by default). Stages are
-invalidated when any of their parameter values change. See `dvc param`.
+arbitrary YAML, JSON, TOML, or Python file (`params.yaml` by default). Stages
+are invalidated when any of their parameter values change. See `dvc param`.
2 changes: 1 addition & 1 deletion content/docs/user-guide/dvc-files-and-directories.md
@@ -150,7 +150,7 @@ the possible following fields:
- `deps`: List of <abbr>dependency</abbr> file or directory paths of this stage
(relative to `wdir` which defaults to the file's location)
- `params`: List of <abbr>parameter</abbr> dependency keys (field names) that
-are read from a YAML, JSON, or TOML file (`params.yaml` by default).
+are read from a YAML, JSON, TOML, or Python file (`params.yaml` by default).
- `outs`: List of <abbr>output</abbr> file or directory paths of this stage
(relative to `wdir` which defaults to the file's location), and optionally,
whether or not this file or directory is <abbr>cached</abbr> (`true` by