diff --git a/get-started/06-experiments/01-running-experiments.md b/get-started/06-experiments/01-running-experiments.md
index 602e638..d30a994 100644
--- a/get-started/06-experiments/01-running-experiments.md
+++ b/get-started/06-experiments/01-running-experiments.md
@@ -15,17 +15,16 @@ see the help text first:

 The first command we'll use is `dvc exp run`. It's like `dvc repro` with added
 features for experiments, like changing the hyperparameters with command line
-options:
+options: 

 ```
-dvc exp run --set-param featurize.max_features=1500 \
-    -S featurize.ngrams=2
+dvc exp run --set-param model.name=mlp
 ```{{execute}}

 The `--set-param` (or `-S`) flag sets the values for parameters as a shortcut to
 editing `params.yaml`.

-Check that the `featurize.max_features` value has been updated in `params.yaml`:
+Note that the `model.name` parameter has been updated in `params.yaml`:

 `git diff params.yaml`{{execute}}

diff --git a/get-started/06-experiments/02-queueing-experiments.md b/get-started/06-experiments/02-queueing-experiments.md
index 69ac999..b8100a7 100644
--- a/get-started/06-experiments/02-queueing-experiments.md
+++ b/get-started/06-experiments/02-queueing-experiments.md
@@ -1,9 +1,7 @@
 ## Queueing experiments

-We have been tuning the `featurize` stage so far, but there are also parameters
-for the `train` stage, which trains a [random forest classifier][rfc].
-
-[rfc]: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
+For the MLP version of the project, we have two parameters that change the
+number of hidden units and the activation function.

 `example-get-started/params.yaml`{{open}}

@@ -12,10 +10,10 @@ combinations we want to try without executing anything, by using the `--queue`
 flag:

 ```
-dvc exp run --queue -n exp-1 -S train.n_est=50
-dvc exp run --queue -n exp-2 -S train.n_est=100
-dvc exp run --queue -n exp-3 -S train.n_est=150
-dvc exp run --queue -n exp-4 -S train.n_est=200
+dvc exp run --queue -n exp-1 -S model.mlp.units=32
+dvc exp run --queue -n exp-2 -S model.mlp.units=64
+dvc exp run --queue -n exp-3 -S model.mlp.units=128
+dvc exp run --queue -n exp-4 -S model.mlp.units=256
 ```{{execute}}

 The `-n` option is used to label the experiments. If it's not specified,
diff --git a/get-started/06-experiments/03-comparing-experiments.md b/get-started/06-experiments/03-comparing-experiments.md
index fc45722..65ca8ea 100644
--- a/get-started/06-experiments/03-comparing-experiments.md
+++ b/get-started/06-experiments/03-comparing-experiments.md
@@ -6,9 +6,9 @@ To compare all of these experiments, we need more than `dvc exp diff`:

 ```
 dvc exp show --no-timestamp \
-    --include-params train.n_est \
+    --include-params model.mlp.units \
     --no-pager
 ```{{execute}}

-Although the differences in metrics are minuscule due to the small size of
-the data set, `exp-2` is a bit better in terms of `avg_prec`.
+As we have the most hidden units in MLP for `exp-4`, it has the highest
+`categorical_accuracy`.
\ No newline at end of file
diff --git a/get-started/06-experiments/04-persisting-experiments.md b/get-started/06-experiments/04-persisting-experiments.md
index b1511bd..7d987d1 100644
--- a/get-started/06-experiments/04-persisting-experiments.md
+++ b/get-started/06-experiments/04-persisting-experiments.md
@@ -5,22 +5,22 @@ ignore the rest.

 `dvc exp apply` rolls back the workspace to the specified experiment:

-`dvc exp apply exp-2`{{execute}}
+`dvc exp apply exp-4`{{execute}}

 `dvc exp apply` is similar to [`dvc checkout`][dvccheckout], but it works with
 experiments.
 DVC tracks everything in the pipeline for each experiment (parameters, metrics,
 dependencies, and outputs) and can later retrieve it as needed.

-Check that `scores.json` reflects the metrics in the table above:
+Check that `metrics.json` reflects the metrics in the table above:

-`example-get-started/scores.json`{{open}}
+`example-get-started/metrics.json`{{open}}

 Once an experiment has been applied to the workspace, it is no different from
 reproducing the result without `dvc exp run`. Let's make it persistent in our
 regular pipeline by committing it in our Git branch:

 ```
-git add dvc.lock params.yaml prc.json roc.json scores.json
+git add dvc.lock params.yaml metrics.json train.log.csv
 git commit -m "Preserve best Avg. Prec. experiment"
 ```{{execute}}
diff --git a/get-started/06-experiments/05-cleaning-up.md b/get-started/06-experiments/05-cleaning-up.md
index acb6855..42f6c86 100644
--- a/get-started/06-experiments/05-cleaning-up.md
+++ b/get-started/06-experiments/05-cleaning-up.md
@@ -5,7 +5,7 @@ experiments table:

 ```
 dvc exp show --no-timestamp \
-    --include-params train.n_est \
+    --include-params model.mlp.units \
     --no-pager
 ```{{execute}}

@@ -16,7 +16,7 @@ experiments from the previous _n_ commits:

 ```
 dvc exp show -n 2 --no-timestamp \
-    --include-params train.n_est \
+    --include-params model.mlp.units \
     --no-pager
 ```{{execute}}

@@ -27,7 +27,7 @@ Eventually, old experiments may clutter the experiments table.
 ```
 dvc exp gc --workspace
 dvc exp show -n 2 --no-timestamp \
-    --include-params train.n_est \
+    --include-params model.mlp.units \
     --no-pager
 ```{{execute}}

diff --git a/get-started/06-experiments/init.sh b/get-started/06-experiments/init.sh
index 182e6f0..2f8e902 100755
--- a/get-started/06-experiments/init.sh
+++ b/get-started/06-experiments/init.sh
@@ -4,12 +4,25 @@
 PS1='\[\033[01;34m\]\w\[\033[00m\]$ \[\033[01;32m\]'
 trap 'echo -ne "\033[00m"' DEBUG

+export CONTAINER="dvcorg/doc-katacoda:start-experiments"
+
+docker volume create example-get-started
+
+if [ -e /root/example-get-started ] ; then
+  rm -rf /root/example-get-started
+fi
+ln -s /var/lib/docker/volumes/example-get-started/_data example-get-started
+
 clear

 :;: ###########################################
 :;: INSTALLING CONTAINER FOR THE SCENARIO
 :;: ###########################################
-until [ -f /tmp/docker-ready ] ; do echo -n "." ; sleep 1 ; done
+# until [ -f /tmp/docker-ready ] ; do echo -n "." ; sleep 1 ; done
+
+echo "Starting: $CONTAINER"
+
+docker run -d -it --name dvc -v example-get-started:/root/example-get-started "$CONTAINER"

 clear

diff --git a/get-started/06-experiments/install.sh b/get-started/06-experiments/install.sh
index cc3258b..99f6d9d 100755
--- a/get-started/06-experiments/install.sh
+++ b/get-started/06-experiments/install.sh
@@ -1,14 +1,14 @@
 #!/bin/bash

-export CONTAINER="dvcorg/doc-katacoda:start-experiments"
-
-docker volume create example-get-started
-
-if [ -e /root/example-get-started ] ; then
-  rm -rf /root/example-get-started
-fi
-ln -s /var/lib/docker/volumes/example-get-started/_data /root/example-get-started
-
-docker run -d -it --name dvc -v example-get-started:/root/example-get-started "$CONTAINER"
+# export CONTAINER="dvcorg/doc-katacoda:start-experiments"
+#
+# docker volume create example-get-started
+#
+# if [ -e /root/example-get-started ] ; then
+#   rm -rf /root/example-get-started
+# fi
+# ln -s /var/lib/docker/volumes/example-get-started/_data example-get-started
+#
+# docker run -d -it --name dvc -v example-get-started:/root/example-get-started "$CONTAINER"

 touch /tmp/docker-ready
diff --git a/get-started/06-experiments/intro.md b/get-started/06-experiments/intro.md
index d2d5000..6576b68 100644
--- a/get-started/06-experiments/intro.md
+++ b/get-started/06-experiments/intro.md
@@ -1,10 +1,14 @@
-Experiments proliferate quickly in ML projects where there are many
-parameters to tune or other permutations of the code. DVC 2.0 introduces a new
-way to organize such projects and only keep what we ultimately need with `dvc
+Experiments proliferate quickly in ML projects where there are many parameters
+to tune or other permutations of the code. DVC 2.0 introduces a new way to
+organize such projects and only keep what we ultimately need with `dvc
 experiments`. DVC can track experiments for you so there's no need to commit
 each one to Git. This way your repo doesn't become polluted with all of them.
 You can discard experiments once they're no longer needed.

+For this scenario we have a new project that uses TensorFlow and the venerable
+[MNIST](http://yann.lecun.com/exdb/mnist/) dataset. The project has
+two Artificial Neural Networks with several hyperparameters.
+
 > 📖 See [Experiment Management](https://dvc.org/doc/user-guide/experiment-management) for more
 > information on DVC's approach.
