diff --git a/content/docs/command-reference/exp/run.md b/content/docs/command-reference/exp/run.md index 96d7851b81..d103dcd247 100644 --- a/content/docs/command-reference/exp/run.md +++ b/content/docs/command-reference/exp/run.md @@ -224,7 +224,7 @@ train_config.json train.weight_decay - 0.001 Note that `exp run --set-param` (`-S`) doesn't update your `dvc.yaml`. When appending or removing parameters, make sure to update the -[`params` section](https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameter-dependencies) +[`params` section](https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameters) of your `dvc.yaml` accordingly. diff --git a/content/docs/command-reference/params/diff.md b/content/docs/command-reference/params/diff.md index 5f647f46a7..da14820bef 100644 --- a/content/docs/command-reference/params/diff.md +++ b/content/docs/command-reference/params/diff.md @@ -1,8 +1,7 @@ # params diff -Show changes in [parameters](/doc/command-reference/params) between commits in -the DVC repository, or between a commit and the -workspace. +Show changes in `dvc params` between commits in the DVC repository, +or between a commit and the workspace. > Requires that Git is being used to version the project. @@ -21,12 +20,15 @@ positional arguments: ## Description -Provides a quick way to compare parameter values among experiments in the +Provides a quick way to compare parameters among experiments in the repository history. The differences shown by this command include the old and new param values, along with the param name. -> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g. -> with the the `-p` (`--params`) option of `dvc stage add`). + + +Parameters are defined in the `params` field of `dvc.yaml`. See `dvc params`. 
+ + Without arguments, `dvc params diff` compares parameters currently present in the workspace (uncommitted changes) with the latest committed diff --git a/content/docs/command-reference/params/index.md b/content/docs/command-reference/params/index.md index adb041326b..b86eb66c0c 100644 --- a/content/docs/command-reference/params/index.md +++ b/content/docs/command-reference/params/index.md @@ -1,6 +1,6 @@ # params -Contains a command to show changes in parameters: +Contains a command to show changes in parameters: [diff](/doc/command-reference/params/diff). ## Synopsis @@ -16,62 +16,69 @@ positional arguments: ## Description -In order to track parameters and hyperparameters associated to machine learning -experiments in DVC projects, DVC provides a different type of -dependencies: _parameters_. They usually have simple names like `epochs`, -`learning-rate`, `batch_size`, etc. +Parameters can be any values used inside your code to influence the results +(e.g. machine learning [hyperparameters]). DVC can track these as key/value +pairs from structured YAML 1.2, JSON, TOML 1.0, +[or Python](#examples-python-parameters-file) files (`params.yaml` by default). +Params usually have simple names like `epochs`, `learning-rate`, `batch_size`, +etc. Example: -To start tracking parameters, list them under the `params` field of `dvc.yaml` -stages (manually or with the the `-p`/`--params` option of `dvc stage add`). For -example: +```yaml +epochs: 900 +tuning: + - learning-rate: 0.945 + - max_depth: 7 +paths: + - labels: 'materials/labels' + - truth: 'materials/ground' +``` + +To start tracking parameters, list their names under the `params` field of +`dvc.yaml` (manually or with the `-p`/`--params` option of `dvc stage add`).
+For example: ```yaml stages: learn: - cmd: ./deep.py + cmd: python deep.py # reads params.yaml internally params: - - epochs # track specific parameter (from params.yaml) - - tuning.learning-rate - - myparams.toml: # track specific params from custom file - - batch_size - - config.json: # track all parameters in this file + - epochs # specific param from params.yaml + - tuning.learning-rate # nested param from params.yaml + - paths # entire group from params.yaml + - myparams.toml: + - batch_size # param from custom file + - config.json: # all params in this file ``` -In contrast to a regular dependency, a parameter dependency is not -a file or directory. Instead, it consists of a _parameter name_ (or key) in a -_parameters file_, where the _parameter value_ should be found. This allows you -to define [stage](/doc/command-reference/run) dependencies more granularly: -changes to other parts of the params file will not affect the stage. Parameter -dependencies also prevent situations where several stages share a regular -dependency (e.g. a config file), and any change in it invalidates all of them -(see `dvc status`), causing unnecessary re-executions upon `dvc repro`. - -The default **parameters file** name is `params.yaml`, but any other YAML 1.2, -JSON, TOML 1.0, or [Python](#examples-python-parameters-file) files can be used -additionally (listed under `params:` as shown in the sample above). These files -are typically written manually (or they can be generated) and they can be -versioned directly with Git. - -**Parameter values** should be organized in tree-like hierarchies (dictionaries) -inside params files (see [Examples](#examples)). DVC will interpret param names -as the tree path to find those values. Supported types are: string, integer, -float, boolean, and arrays (groups of params). Note that DVC does not ascribe -any specific meaning to these values. + -DVC saves parameter names and values to `dvc.lock` in order to track them over -time. 
They will be compared to the latest params files to determine if the stage -is outdated upon `dvc repro` (or `dvc status`). +See [more details] about this syntax. + + -> Note that DVC does not pass the parameter values to stage commands. The -> commands executed by DVC will have to load and parse the parameters file by -> itself. +Multiple stages of a pipeline can [use the same params file] as +dependency, but only certain values will affect each +stage. + +Parameters can also be used for [templating] `dvc.yaml` itself (see also **Dict +Unpacking**), which means you can pass them to your [stage commands] as +command-line arguments. You can also load them in Python code with +`dvc.api.params_show()`. The `dvc params diff` command is available to show parameter changes, displaying their current and previous values. -💡 Parameters can also be used for -[templating](/doc/user-guide/project-structure/dvcyaml-files#templating) -`dvc.yaml` itself. +DVC saves parameter names and values to `dvc.lock` in order to track them over +time. They will be compared to the latest params files to determine if the stage +is outdated upon `dvc repro` (or `dvc status`). + +[hyperparameters]: + /doc/user-guide/experiment-management/running-experiments#tuning-hyperparameters +[use the same params file]: + /doc/user-guide/data-pipelines/defining-pipelines#parameter-dependencies +[more details]: /doc/user-guide/project-structure/dvcyaml-files#parameters +[templating]: /doc/user-guide/project-structure/dvcyaml-files#templating +[stage commands]: /doc/user-guide/project-structure/dvcyaml-files#stage-commands ## Options @@ -98,9 +105,9 @@ process: bow: 15000 ``` -Using `dvc stage add`, define a [stage](/doc/command-reference/run) that depends -on params `lr`, `layers`, and `epochs` from the params file above. 
Full paths -should be used to specify `layers` and `epochs` from the `train` group: +Using `dvc stage add`, define a stage that depends on params `lr`, +`layers`, and `epochs` from the params file above. Full paths should be used to +specify `layers` and `epochs` from the `train` group: ```cli $ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ @@ -112,7 +119,7 @@ $ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ > Python parameters files. The `train.py` script will have some code to parse and load the needed -parameters. For example, you can use `dvc.api.params_show()`: +parameters. You can use `dvc.api.params_show()` for this: ```py import dvc.api @@ -197,9 +204,13 @@ previous version, which is why all `Old` values are `—`. ## Examples: Python parameters file -> ⚠️ Note that complex expressions (unsupported by -> [ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval)) -> won't be parsed as DVC parameters. + + +Note that complex expressions (unsupported by +[ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval)) +won't be parsed as DVC parameters.
+ + Consider this Python parameters file named `params.py`: @@ -237,8 +248,8 @@ class TestConfig: METRICS = ['metric'] ``` -The following [stage](/doc/command-reference/run) depends on params `BOOL`, -`INT`, as well as `TrainConfig`'s `EPOCHS` and `layers`: +The following stage depends on params `BOOL`, `INT`, as well as +`TrainConfig`'s `EPOCHS` and `layers`: ```cli $ dvc stage add -n train -d train.py -d users.csv -o model.pkl \ diff --git a/content/docs/user-guide/basic-concepts/parameter.md b/content/docs/user-guide/basic-concepts/parameter.md index bd1da45ce5..59ff11c65c 100644 --- a/content/docs/user-guide/basic-concepts/parameter.md +++ b/content/docs/user-guide/basic-concepts/parameter.md @@ -1,9 +1,10 @@ --- -name: 'Parameter Dependency' -match: [parameter, parameters, param, params, hyperparameter, hyperparameters] +name: 'Parameters' +match: [parameter, parameters] tooltip: >- - Pipeline stages (defined in `dvc.yaml`) can depend on specific values inside - an arbitrary YAML, JSON, TOML, or Python file (`params.yaml` by default). - Stages are invalid (considered outdated) when any of their parameter values - change. See [`dvc params`](/doc/command-reference/params). + Hyperparameters or other config values used by your code, loaded from a + structured file (`params.yaml` by default). They can be tracked as granular + dependencies for stages of DVC pipelines (defined in `dvc.yaml`). DVC can also + compare them among machine learning experiments (useful for optimization). See + `dvc params`. --- diff --git a/content/docs/user-guide/experiment-management/index.md b/content/docs/user-guide/experiment-management/index.md index 01c90c47a3..d8aedc99e1 100644 --- a/content/docs/user-guide/experiment-management/index.md +++ b/content/docs/user-guide/experiment-management/index.md @@ -6,8 +6,8 @@ of the development of data features, hyperspace exploration, deep learning optimization, etc.
Some of DVC's base features already help you codify and analyze experiments. -[Parameters](/doc/command-reference/params) are simple values in a formatted -text file which you can tweak and use in your code. On the other end, +[Parameters](/doc/command-reference/params) are values in a structured text +file, which you can tweak and use in your code. On the other end, [metrics](/doc/command-reference/metrics) (and [plots](/doc/command-reference/plots)) let you define, visualize, and compare quantitative measures of your results. diff --git a/content/docs/user-guide/experiment-management/running-experiments.md b/content/docs/user-guide/experiment-management/running-experiments.md index 8208cffc1d..a3d810c484 100644 --- a/content/docs/user-guide/experiment-management/running-experiments.md +++ b/content/docs/user-guide/experiment-management/running-experiments.md @@ -20,8 +20,8 @@ experiment(s). These files codify _pipelines_ that specify one or more ### Running the pipeline(s) -You can run the experiment pipeline using `dvc exp run`. It uses `./dvc.yaml` -(in the current directory) by default. +You can run the experiment pipelines using `dvc exp run`. It uses +`./dvc.yaml` (in the current directory) by default. ```dvc $ dvc exp run @@ -45,20 +45,20 @@ once. > 📖 `dvc exp run` is an experiment-specific alternative to `dvc repro`. [reproduction targets]: /doc/command-reference/repro#options -[dependency graph]: - /doc/user-guide/data-pipelines/defining-pipelines#directed-acyclic-graph +[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines ## Tuning (hyper)parameters -Parameters are the values that modify the behavior of coded processes -- in this -case producing different experiment results. Machine learning experimentation -often involves defining and searching hyperparameter spaces to improve the -resulting model metrics. +Parameters are any values used inside your code to tune modeling attributes, or +that affect experiment results in any other way. 
For example, a [random forest +classifier] may require a _maximum depth_ value. Machine learning +experimentation often involves defining and searching hyperparameter spaces to +improve the resulting model metrics. -In DVC project source code, parameters should be read from _params -files_ (`params.yaml` by default) and defined in `dvc.yaml`. When a tracked -param value has changed, `dvc exp run` invalidates any stages that depend on it, -and reproduces them. +Your source code should read params from structured [parameters files] +(`params.yaml` by default). Define them with the `params` field of `dvc.yaml` +for DVC to track them. When a param value has changed, `dvc exp run` invalidates +any stages that depend on it, and reproduces them. > 📖 See `dvc params` for more details. @@ -80,6 +80,11 @@ $ dvc exp run -S learning_rate=0.001 -S units=128 # set multiple params ... ``` +[random forest classifier]: + https://medium.com/all-things-ai/in-depth-parameter-tuning-for-random-forest-d67bb7e920d +[parameters files]: + /doc/user-guide/project-structure/dvcyaml-files#parameters-files + ## Experiment results The results of the last `dvc exp run` can be seen in the workspace. diff --git a/content/docs/user-guide/pipelines/defining-pipelines.md b/content/docs/user-guide/pipelines/defining-pipelines.md index 003f87b978..c7495287e4 100644 --- a/content/docs/user-guide/pipelines/defining-pipelines.md +++ b/content/docs/user-guide/pipelines/defining-pipelines.md @@ -186,10 +186,10 @@ changed for the purpose of stage invalidation. ## Parameter dependencies A more granular type of dependency is the parameter (`params` field of -`dvc.yaml`), or _hyperparameters_ in machine learning. These represent simple -values used inside your code to tune data processing, or that affect stage -execution in any other way. For example, training a [Neural Network] usually -requires _batch size_ and _epoch_ values. +`dvc.yaml`), or _hyperparameters_ in machine learning. 
These are any values used +inside your code to tune data processing, or that affect stage execution in any +other way. For example, training a [Neural Network] usually requires _batch +size_ and _epoch_ values. Instead of hard-coding param values, your code can read them from a structured file (e.g. YAML format). DVC can track any key/value pair in a supported @@ -228,7 +228,8 @@ Use `dvc params diff` to compare parameters across project versions. Stage outputs are files (or directories) written by pipelines, for example machine learning models, intermediate artifacts, as well as data [plots] and performance [metrics]. These files are cached by DVC -automatically, and tracked with the help of `dvc.lock` files. +automatically, and tracked with the help of `dvc.lock` files (or `.dvc` files, +see `dvc add`). Outputs can be dependencies of subsequent stages (as explained earlier). So when they change, DVC may need to reproduce downstream stages as well (handled diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md index 474889a34e..c404afe814 100644 --- a/content/docs/user-guide/project-structure/dvcyaml-files.md +++ b/content/docs/user-guide/project-structure/dvcyaml-files.md @@ -87,13 +87,20 @@ $ dvc stage add -n a_stage "./a_script.sh > /dev/null 2>&1" $ dvc exp init './another_script.sh $MYENVVAR' ``` -### Parameter dependencies + + +See also [Templating](#templating) (and **Dict Unpacking**) for useful ways to +parametrize `cmd` strings. + + + +### Parameters -[Parameters](/doc/command-reference/params) are a special type of stage -dependency. They consist of a list of params to track in one of these formats: +Parameters are simple key/value pairs consumed by the `command` +code from a structured [parameters file](#parameters-files). They are defined +per-stage in the `params` field of `dvc.yaml` and should contain one of these: -1. 
A param key/value pair that can be found in `params.yaml` (default params - file); +1. A param name that can be found in `params.yaml` (default params file); 2. A dictionary named by the file path to a custom params file, and with a list of param key/value pairs to find in it; 3. An empty set (give no value or use `null`) named by the file path to a params @@ -101,8 +108,7 @@ dependency. They consist of a list of params to track in one of these formats: -Note that file paths used must be to valid YAML, JSON, TOML, or Python -parameters file. +Dot-separated param names become tree paths to locate values in the params file. @@ -114,7 +120,7 @@ stages: - raw.txt params: - threshold # track specific param (from params.yaml) - - passes + - nn.batch_size - myparams.yaml: # track specific params from custom file - epochs - config.json: # track all parameters in this file @@ -122,8 +128,31 @@ stages: - clean.txt ``` -This allows several stages to depend on values of a shared structured file -(which can be versioned directly with Git). See also `dvc params diff`. + + +Params are a more granular type of stage dependency: multiple `stages` can use +the same params file, but only certain values will affect their state (see +`dvc status`). + + + +#### Parameters files + +The supported params file formats are YAML 1.2, JSON, TOML 1.0, [and Python]. +[Parameter](#parameters) key/value pairs should be organized in tree-like +hierarchies inside. Supported value types are: string, integer, float, boolean, +and arrays (groups of params). + +These files are typically written manually (or generated) and they can be +versioned directly with Git along with other workspace files. + +[and python]: /doc/command-reference/params#examples-python-parameters-file + + + +See also `dvc params diff` to compare params across project versions.
+ + ### Metrics and Plots outputs @@ -173,7 +202,8 @@ models: ``` Those values can be used anywhere in `dvc.yaml` with the `${}` _substitution -expression_: +expression_, for example to pass parameters as command-line arguments to a +[stage command](#stage-commands): ```yaml