From 5926097977b62d2056b76fe2ed6731abf8220f57 Mon Sep 17 00:00:00 2001
From: Noah
Date: Fri, 31 Jul 2020 14:10:27 -0400
Subject: [PATCH 01/10] Update data-pipelines.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Saw a typo and got a bit carried away. I'm a big DVC fan! 💫 👨‍💻
---
 content/docs/start/data-pipelines.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 45ac59a48e..c0c59b4ccf 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -292,17 +292,14 @@ prepare:
 DVC pipelines (`dvc.yaml` file, `dvc run`, and `dvc repro` commands) solve a few
 important problems:
 
-- _Automation_ - run sequence of steps in a "smart" way that makes iterating on
-  the project faster. It automatically determines which parts of a project need
-  to be run, it caches "runs" and results — all to avoid running the same stage
-  again.
-- _Reproducibility_ - it can describe and capture what data should be used and
-  what commands to run to produce an ML model, for example. It's described and
-  captured in way that is easy to put into Git. It means that it's easy to
-  version and share.
+- _Automation_ - run a sequence of steps in a "smart" way to iterate on your
+  project faster. DVC caches "runs" and results in stages to avoid unnecessary
+  re-runs.
+- _Reproducibility_ - YAML files describe and capture what data to use and
+  what commands to run to produce an ML model. Storing these files in Git
+  makes it easy to version and share.
 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
-  project in way that it can be reproduced (built) is the fist necessary step
-  before introducing CI/CD systems.
+  reproducible ML pipelines (builds) facilitates CI/CD systems.
 
 ## Visualize
 

From f79c3a0ef0276782f8cc9a6fc008504eaff48faa Mon Sep 17 00:00:00 2001
From: Noah
Date: Fri, 31 Jul 2020 15:10:55 -0400
Subject: [PATCH 02/10] update wording; be more specific; extrapolate on CI/CD

---
 content/docs/start/data-pipelines.md | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index c0c59b4ccf..33b7d7bf25 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -292,14 +292,20 @@ prepare:
 DVC pipelines (`dvc.yaml` file, `dvc run`, and `dvc repro` commands) solve a few
 important problems:
 
-- _Automation_ - run a sequence of steps in a "smart" way to iterate on your
-  project faster. DVC caches "runs" and results in stages to avoid unnecessary
+- _Automation_ - run a sequence of steps in a "smart" way to make iterating on
+  your project faster. DVC caches "runs" and results in stages and automatically
+  determines which parts of a project need to be run to avoid unnecessary
   re-runs.
-- _Reproducibility_ - YAML files describe and capture what data to use and
-  what commands to run to produce an ML model. Storing these files in Git
-  makes it easy to version and share.
-- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
-  reproducible ML pipelines (builds) facilitates CI/CD systems.
+- _Reproducibility_ - `dvc.yaml` and `dvc.lock` files describe and capture what
+  data to use and what commands to run to produce an ML model. Storing these
+  files in Git makes it easy to version and share.
+- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
+  ML pipelines allow CI/CD systems to retrain
+  ([repro](https://dvc.org/doc/command-reference/repro)) models on fresh
+  datasets with identical preprocessing and training stages, version upstream
+  models and datasets, and easily
+  [compare](https://dvc.org/doc/start/experiments#comparing-experiments) metrics
+  with currently deployed models.
 
 ## Visualize
 

From 66664be0c35808dbbb0fc57dd169fd8474ae42af Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 14:20:31 -0500
Subject: [PATCH 03/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 33b7d7bf25..3872ea6edf 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -300,8 +300,7 @@ important problems:
   data to use and what commands to run to produce an ML model. Storing these
   files in Git makes it easy to version and share.
 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
-  ML pipelines allow CI/CD systems to retrain
-  ([repro](https://dvc.org/doc/command-reference/repro)) models on fresh
+  ML pipelines allow CI/CD systems to retrain models on fresh
   datasets with identical preprocessing and training stages, version upstream
   models and datasets, and easily
   [compare](https://dvc.org/doc/start/experiments#comparing-experiments) metrics

From 6127df11ec9e9970e7860acb53bc3c854ed4f86f Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 14:20:51 -0500
Subject: [PATCH 04/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 3872ea6edf..ba9e57f7ad 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -301,10 +301,7 @@ important problems:
   files in Git makes it easy to version and share.
 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
   ML pipelines allow CI/CD systems to retrain models on fresh
-  datasets with identical preprocessing and training stages, version upstream
-  models and datasets, and easily
-  [compare](https://dvc.org/doc/start/experiments#comparing-experiments) metrics
-  with currently deployed models.
+  datasets with identical training, and save the results.
 
 ## Visualize
 

From c951b2f70ecc96d52eb5230497d4bc7ae2502ac1 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 14:58:08 -0500
Subject: [PATCH 05/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index ba9e57f7ad..e616630546 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -292,7 +292,7 @@ prepare:
 DVC pipelines (`dvc.yaml` file, `dvc run`, and `dvc repro` commands) solve a few
 important problems:
 
-- _Automation_ - run a sequence of steps in a "smart" way to make iterating on
+- _Automation_ - run a sequence of steps in a "smart" way that makes iterating on
   your project faster. DVC caches "runs" and results in stages and automatically
   determines which parts of a project need to be run to avoid unnecessary
   re-runs.

From b83ba4070e1de46041e412b8e972ee8ea6d0ca7a Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 14:58:16 -0500
Subject: [PATCH 06/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index e616630546..763e09d98b 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -296,8 +296,9 @@ important problems:
   your project faster. DVC caches "runs" and results in stages and automatically
   determines which parts of a project need to be run to avoid unnecessary
   re-runs.
-- _Reproducibility_ - `dvc.yaml` and `dvc.lock` files describe and capture what
-  data to use and what commands to run to produce an ML model. Storing these
+- _Reproducibility_ - `dvc.yaml` and `dvc.lock` files describe what data to use
+  and which commands will generate the pipeline results (such as an ML
+  model). Storing these
   files in Git makes it easy to version and share.
 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
   ML pipelines allow CI/CD systems to retrain models on fresh

From 8488996e24327b4a50b4e264add2a3814b944352 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 14:58:23 -0500
Subject: [PATCH 07/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 763e09d98b..5039d777dd 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -302,7 +302,8 @@ important problems:
   files in Git makes it easy to version and share.
 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
   ML pipelines allow CI/CD systems to retrain models on fresh
-  datasets with identical training, and save the results.
+  datasets with identical training, save the results, and even produce reports
+  about the whole process. See [CML.dev](https://cml.dev/) for some examples.
 
 ## Visualize
 

From 0a9ff511630f196b4c1190e07d520c3d528776f3 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 16:39:30 -0500
Subject: [PATCH 08/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 5039d777dd..918ac5e54f 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -293,8 +293,8 @@ DVC pipelines (`dvc.yaml` file, `dvc run`, and `dvc repro` commands) solve a few
 important problems:
 
 - _Automation_ - run a sequence of steps in a "smart" way that makes iterating on
-  your project faster. DVC caches "runs" and results in stages and automatically
-  determines which parts of a project need to be run to avoid unnecessary
+  your project faster. DVC automatically determines which parts of a project
+  need to be run, and it caches "runs" and their results, to avoid unnecessary
   re-runs.
 - _Reproducibility_ - `dvc.yaml` and `dvc.lock` files describe what data to use
   and which commands will generate the pipeline results (such as an ML

From b5c5923fa594c5bca8efd9c75f03d12b92f8dd78 Mon Sep 17 00:00:00 2001
From: Jorge Orpinel
Date: Fri, 31 Jul 2020 17:45:40 -0500
Subject: [PATCH 09/10] Update content/docs/start/data-pipelines.md

---
 content/docs/start/data-pipelines.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 918ac5e54f..1c2298673f 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -300,10 +300,9 @@ important problems:
   and which commands will generate the pipeline results (such as an ML
   model). Storing these
   files in Git makes it easy to version and share.
-- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - reproducible
-  ML pipelines allow CI/CD systems to retrain models on fresh
-  datasets with identical training, save the results, and even produce reports
-  about the whole process. See [CML.dev](https://cml.dev/) for some examples.
+- _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
+  projects in way that it can be reproduced (built) is the fist necessary step
+  before introducing CI/CD systems.
 
 ## Visualize
 

From e17e1b07cb9c6be9101f098e78ccd302781a7c17 Mon Sep 17 00:00:00 2001
From: "Restyled.io"
Date: Fri, 31 Jul 2020 22:46:35 +0000
Subject: [PATCH 10/10] Restyled by prettier

---
 content/docs/start/data-pipelines.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 1c2298673f..efb7da34e4 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -292,14 +292,13 @@ prepare:
 DVC pipelines (`dvc.yaml` file, `dvc run`, and `dvc repro` commands) solve a few
 important problems:
 
-- _Automation_ - run a sequence of steps in a "smart" way that makes iterating on
-  your project faster. DVC automatically determines which parts of a project
+- _Automation_ - run a sequence of steps in a "smart" way that makes iterating
+  on your project faster. DVC automatically determines which parts of a project
   need to be run, and it caches "runs" and their results, to avoid unnecessary
   re-runs.
 - _Reproducibility_ - `dvc.yaml` and `dvc.lock` files describe what data to use
-  and which commands will generate the pipeline results (such as an ML
-  model). Storing these
-  files in Git makes it easy to version and share.
+  and which commands will generate the pipeline results (such as an ML model).
+  Storing these files in Git makes it easy to version and share.
 - _Continuous Delivery and Continuous Integration (CI/CD) for ML_ - describing
   projects in way that it can be reproduced (built) is the fist necessary step
   before introducing CI/CD systems.