diff --git a/content/blog/2020-10-12-october-20-dvc-heartbeat.md b/content/blog/2020-10-12-october-20-dvc-heartbeat.md index f34c309029..6a2e031020 100644 --- a/content/blog/2020-10-12-october-20-dvc-heartbeat.md +++ b/content/blog/2020-10-12-october-20-dvc-heartbeat.md @@ -107,7 +107,7 @@ few weeks, so stay tuned. Another big initative is adding videos to our docs: since video seems like a popular format for a lot of learners, we're working to supplement our official docs with embedded videos. Check out our first installment on the -[Getting Started with Data Versioning](https://dvc.org/doc/start/data-versioning). +[Getting Started with Data Versioning](/doc/start/data-and-model-versioning). https://youtu.be/kLKBcPonMYw diff --git a/content/blog/2020-11-11-november-20-dvc-heartbeat.md b/content/blog/2020-11-11-november-20-dvc-heartbeat.md index 6613dea11f..bf058e973b 100644 --- a/content/blog/2020-11-11-november-20-dvc-heartbeat.md +++ b/content/blog/2020-11-11-november-20-dvc-heartbeat.md @@ -64,7 +64,7 @@ welcome referrals if you know a good candidate)! We're continuing to develop our video docs, and now half of our "Getting Started" section has video accompaniments. Check out our latest release on -[data access with DVC](https://dvc.org/doc/start/data-access): +[data access with DVC](/doc/start/data-and-model-access): https://youtu.be/EE7Gk84OZY8 diff --git a/content/blog/2020-12-18-december-20-dvc-heartbeat.md b/content/blog/2020-12-18-december-20-dvc-heartbeat.md index 1822be48c4..ab4599b4ff 100644 --- a/content/blog/2020-12-18-december-20-dvc-heartbeat.md +++ b/content/blog/2020-12-18-december-20-dvc-heartbeat.md @@ -53,17 +53,17 @@ As you may have heard on adding complete video docs to the "Getting Started" section of the DVC site. We now have 100% coverage! 
We have videos that mirror the tutorials for: -- [Data versioning](https://dvc.org/doc/start/data-versioning) - how to use Git - and DVC together to track different versions of a dataset +- [Data versioning](/doc/start/data-and-model-versioning) - how to use Git and + DVC together to track different versions of a dataset -- [Data access](https://dvc.org/doc/start/data-access) - how to share models and +- [Data access](/doc/start/data-and-model-access) - how to share models and datasets across projects and environments -- [Pipelines](https://dvc.org/doc/start/data-pipelines) - how to create - reproducible pipelines to transform datasets to features to models +- [Pipelines](/doc/start/data-pipelines) - how to create reproducible pipelines + to transform datasets to features to models -- [Experiments](https://dvc.org/doc/start/experiments) - how to do a `git diff` - for models that compares and visualizes metrics +- [Experiments](/doc/start/experiments) - how to do a `git diff` for models that + compares and visualizes metrics https://media.giphy.com/media/L4ZZNbDpOCfiX8uYSd/giphy.gif diff --git a/content/docs/command-reference/diff.md b/content/docs/command-reference/diff.md index f5287cf179..aeb189d28d 100644 --- a/content/docs/command-reference/diff.md +++ b/content/docs/command-reference/diff.md @@ -123,8 +123,9 @@ $ dvc diff Let's checkout the [2-track-data](https://github.com/iterative/example-get-started/releases/tag/2-track-data) -tag, corresponding to the [Data Versioning](/doc/start/data-versioning) _Get -Started_ chapter, right after we added `data.xml` file with DVC: +tag, corresponding to the +[Data Versioning](/doc/start/data-and-model-versioning) _Get Started_ chapter, +right after we added `data.xml` file with DVC: ```dvc $ git checkout 2-track-data diff --git a/content/docs/command-reference/get.md b/content/docs/command-reference/get.md index a4fdd1cce5..20bf2fe221 100644 --- a/content/docs/command-reference/get.md +++ 
b/content/docs/command-reference/get.md @@ -151,7 +151,7 @@ file or directory from. It also has the `--out` option to specify the location to place the target data within the workspace. Combining these two options allows us to do something we can't achieve with the regular `git checkout` + `dvc checkout` process – see for example the -[Get Older Data Version](/doc/start/data-versioning#switching-between-versions) +[Get Older Data Version](/doc/start/data-and-model-versioning#switching-between-versions) chapter of our _Get Started_. Let's use the diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md index 2b13b2f386..0f56461cdb 100644 --- a/content/docs/command-reference/import-url.md +++ b/content/docs/command-reference/import-url.md @@ -190,8 +190,8 @@ $ git checkout 3-config-remote ## Example: Tracking a file from the web An advanced alternate to the intro of the -[Versioning Basics](/doc/start/data-versioning) part of the _Get Started_ is to -use `dvc import-url`: +[Versioning Basics](/doc/start/data-and-model-versioning) part of the _Get +Started_ is to use `dvc import-url`: ```dvc $ dvc import-url https://data.dvc.org/get-started/data.xml \ diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md index 29d38aa04b..a9278b0ac2 100644 --- a/content/docs/command-reference/import.md +++ b/content/docs/command-reference/import.md @@ -67,8 +67,8 @@ data `path`, and the `outs` field contains the corresponding local path in the workspace. It records enough metadata about the imported data to enable DVC efficiently determining whether the local copy is out of date. -To actually [version the data](/doc/start/data-versioning), `git add` (and -`git commit`) the import `.dvc` file. +To actually [version the data](/doc/start/data-and-model-versioning), `git add` +(and `git commit`) the import `.dvc` file. 
Note that `dvc repro` doesn't check or update import `.dvc` files (see `dvc freeze`), use `dvc update` to bring the import up to date from the data diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 2bd1f2fb3e..df2020cbb5 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -35,13 +35,13 @@ }, "children": [ { - "slug": "data-versioning", + "slug": "data-and-model-versioning", "tutorials": { "katacoda": "https://katacoda.com/dvc/courses/get-started/versioning" } }, { - "slug": "data-access", + "slug": "data-and-model-access", "tutorials": { "katacoda": "https://katacoda.com/dvc/courses/get-started/accessing" } diff --git a/content/docs/start/data-access.md b/content/docs/start/data-and-model-access.md similarity index 90% rename from content/docs/start/data-access.md rename to content/docs/start/data-and-model-access.md index 0c40d58df6..776db281f2 100644 --- a/content/docs/start/data-access.md +++ b/content/docs/start/data-and-model-access.md @@ -1,13 +1,13 @@ --- -title: 'Get Started: Data Access' +title: 'Get Started: Data and Model Access' --- -# Get Started: Data Access +# Get Started: Data and Model Access -Okay, now that we've learned how to _track_ data and models with DVC and how to -version them with Git, next question is how can we _use_ these artifacts outside -of the project? How do I download a model to deploy it? How do I download a -specific version of a model? How do I reuse datasets across different projects? +Okay, we've learned how to _track_ data and models with DVC, and how to commit +their versions to Git. The next questions are: How can we _use_ these artifacts +outside of the project? How do I download a model to deploy it? How to download +a specific version of a model? Or reuse datasets across different projects? > These questions tend to come up when you browse the files that DVC saves to > remote storage, e.g. 
diff --git a/content/docs/start/data-versioning.md b/content/docs/start/data-and-model-versioning.md similarity index 88% rename from content/docs/start/data-versioning.md rename to content/docs/start/data-and-model-versioning.md index 49f8801246..9f61fc1e05 100644 --- a/content/docs/start/data-versioning.md +++ b/content/docs/start/data-and-model-versioning.md @@ -1,6 +1,6 @@ --- -title: 'Get Started: Data Versioning' -description: 'Get started with data versioning in DVC. Learn how to use a +title: 'Get Started: Data and Model Versioning' +description: 'Get started with data and model versioning in DVC. Learn how to use a regular Git workflow for datasets and ML models, without storing large files in Git.' --- @@ -14,8 +14,8 @@ to a different version of a 100Gb file in less than a second with a `git checkout`. The foundation of DVC consists of a few commands that you can run along with -`git` to track large files, directories, or ML models. Think "Git for data". -Read on or watch our video to learn about versioning data with DVC! +`git` to track large files, directories, or ML model files. Think "Git for +data". Read on or watch our video to learn about versioning data with DVC! https://youtu.be/kLKBcPonMYw @@ -34,8 +34,8 @@ $ dvc get https://github.com/iterative/dataset-registry \ ``` We use the fancy `dvc get` command to jump ahead a bit and show how Git repo -becomes a source for datasets or models - what we call "data registry" or "model -registry". `dvc get` can download any file or directory tracked in a DVC +becomes a source for datasets or models - what we call "data/model registry". +`dvc get` can download any file or directory tracked in a DVC repository. It's like `wget`, but for DVC or Git repos. 
In this case we download the latest version of the `data.xml` file from the [dataset registry](https://github.com/iterative/dataset-registry) repo as the @@ -90,10 +90,10 @@ outs: ## Storing and sharing -You can upload DVC-tracked data or models with `dvc push`, so they're safely -stored [remotely](/doc/command-reference/remote). This also means they can be -retrieved on other environments later with `dvc pull`. First, we need to setup a -storage: +You can upload DVC-tracked data or model files with `dvc push`, so they're +safely stored [remotely](/doc/command-reference/remote). This also means they +can be retrieved on other environments later with `dvc pull`. First, we need to +setup a storage: ```dvc $ dvc remote add -d storage s3://mybucket/dvcstore @@ -154,9 +154,9 @@ a3 ## Retrieving -Having DVC-tracked data stored remotely, it can be downloaded when needed in -other copies of this project with `dvc pull`. Usually, we run it -after `git clone` and `git pull`. +Having DVC-tracked data and models stored remotely, it can be downloaded when +needed in other copies of this project with `dvc pull`. Usually, we +run it after `git clone` and `git pull`.
diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md index 01109bf688..ebde96c3d2 100644 --- a/content/docs/start/data-pipelines.md +++ b/content/docs/start/data-pipelines.md @@ -143,8 +143,8 @@ stages: There's no need to use `dvc add` for DVC to track stage outputs (`data/prepared` in this case); `dvc run` already took care of this. You only need to run `dvc push` if you want to save them to -[remote storage](/doc/start/data-versioning#storing-and-sharing), (usually along -with `git commit` to version `dvc.yaml` itself). +[remote storage](/doc/start/data-and-model-versioning#storing-and-sharing), +(usually along with `git commit` to version `dvc.yaml` itself). ## Dependency graphs (DAGs) diff --git a/content/docs/start/experiments.md b/content/docs/start/experiments.md index e66a401bc1..84831ccc64 100644 --- a/content/docs/start/experiments.md +++ b/content/docs/start/experiments.md @@ -172,8 +172,8 @@ $ git commit -a -m "Preserve best random forest experiment" ## Sharing experiments After committing the best experiments to our Git branch, we can -[store and share](/doc/start/data-versioning#storing-and-sharing) them remotely -like any other iteration of the pipeline. +[store and share](/doc/start/data-and-model-versioning#storing-and-sharing) them +remotely like any other iteration of the pipeline. ```dvc dvc push diff --git a/content/docs/start/index.md b/content/docs/start/index.md index 4d9975cb63..34c77a99a8 100644 --- a/content/docs/start/index.md +++ b/content/docs/start/index.md @@ -53,15 +53,16 @@ Now you're ready to DVC! DVC's features can be grouped into functional components. We'll explore them one by one in the next few pages: -- [**Data versioning**](/doc/start/data-versioning) (try this next) is the base - layer of DVC for large files, datasets, and machine learning models. Use a - regular Git workflow, but without storing large files in the repo (think "Git - for data"). 
Data is stored separately, which allows for efficient sharing. - -- [**Data access**](/doc/start/data-access) shows how to use data artifacts from - outside of the project and how to import data artifacts from another DVC - project. This can help to download a specific version of an ML model to a - deployment server or import a model to another project. +- [**Data and model versioning**](/doc/start/data-and-model-versioning) (try + this next) is the base layer of DVC for large files, datasets, and machine + learning models. Use a regular Git workflow, but without storing large files + in the repo (think "Git for data"). Data is stored separately, which allows + for efficient sharing. + +- [**Data and model access**](/doc/start/data-and-model-access) shows how to use + data artifacts from outside of the project and how to import data artifacts + from another DVC project. This can help to download a specific version of an + ML model to a deployment server or import a model to another project. - [**Data pipelines**](/doc/start/data-pipelines) describe how models and other data artifacts are built, and provide an efficient way to reproduce them. diff --git a/content/docs/use-cases/data-registries.md b/content/docs/use-cases/data-registries.md index 81f5fb9ace..2da980b841 100644 --- a/content/docs/use-cases/data-registries.md +++ b/content/docs/use-cases/data-registries.md @@ -2,10 +2,10 @@ One of the main uses of DVC repositories is the [versioning of data and model files](/doc/use-cases/data-and-model-files-versioning). -DVC also enables cross-project [reusability](/doc/start/data-access) of these -data artifacts. This means that your projects can depend on data -from other DVC repositories — like a **package management system for data -science**. +DVC also enables cross-project [reusability](/doc/start/data-and-model-access) +of these data artifacts. 
This means that your projects can depend +on data from other DVC repositories — like a **package management system for +data science**. ![](/img/data-registry.png) _Data management middleware_ diff --git a/content/docs/use-cases/versioning-data-and-model-files/index.md b/content/docs/use-cases/versioning-data-and-model-files/index.md index 9eff871022..099a4b9905 100644 --- a/content/docs/use-cases/versioning-data-and-model-files/index.md +++ b/content/docs/use-cases/versioning-data-and-model-files/index.md @@ -65,7 +65,7 @@ Benefits of our approach include: - **Collaboration**: Easily distribute your project development and share its data [internally](/doc/use-cases/shared-development-server) and [remotely](/doc/use-cases/sharing-data-and-model-files), or - [reuse](/doc/start/data-access) it in other places. + [reuse](/doc/start/data-and-model-access) it in other places. - **Data compliance**: Review data modification attempts as Git [pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/). diff --git a/content/docs/user-guide/project-structure/dvc-files.md b/content/docs/user-guide/project-structure/dvc-files.md index 1c3b994f28..333d3ee6fd 100644 --- a/content/docs/user-guide/project-structure/dvc-files.md +++ b/content/docs/user-guide/project-structure/dvc-files.md @@ -3,7 +3,8 @@ You can use `dvc add` to track data files or directories located in your current workspace\*. Additionally, `dvc import` and `dvc import-url` let you bring data from external locations to your project, and start tracking it -locally. See [Data Versioning](/doc/start/data-versioning) for more info. +locally. See [Data Versioning](/doc/start/data-and-model-versioning) for more +info. > \* Certain [external locations](/doc/user-guide/managing-external-data) are > also supported. 
diff --git a/redirects-list.json b/redirects-list.json
index aa45068fcd..e25c14bd20 100644
--- a/redirects-list.json
+++ b/redirects-list.json
@@ -23,6 +23,8 @@
   "^/(?:docs|documentation)(/.*)?$ /doc$1",
   "^/doc/get-started(/.*)?$ /doc/start",
+  "^/doc/start/data-versioning$ /doc/start/data-and-model-versioning",
+  "^/doc/start/data-access$ /doc/start/data-and-model-access",
   "^/doc/tutorial(/.*)?$ /doc/start",
   "^/doc/tutorials/get-started(/.*)?$ /doc/start",
   "^/doc/tutorials/versioning(/.*)?$ /doc/use-cases/versioning-data-and-model-files/tutorial",
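
Review note: the two redirect rules added to `redirects-list.json` can be sanity-checked offline. The sketch below is a minimal Python check, assuming each entry is a `<regex> <target>` pair separated by a single space and that the regex is tested against the request path only. `parse_rule` and `resolve` are hypothetical helpers written for this illustration; they are not part of the site's code.

```python
import re

# The two rules added in this diff, copied verbatim.
# Assumption: each entry has the form "<regex> <target>", split on the first space.
NEW_RULES = [
    "^/doc/start/data-versioning$ /doc/start/data-and-model-versioning",
    "^/doc/start/data-access$ /doc/start/data-and-model-access",
]


def parse_rule(rule):
    """Split one rule string into (compiled regex, target path). Hypothetical helper."""
    pattern, target = rule.split(" ", 1)
    return re.compile(pattern), target


def resolve(path, rules=NEW_RULES):
    """Return the redirect target for `path`, or None if no rule matches."""
    for rule in rules:
        regex, target = parse_rule(rule)
        if regex.search(path):
            return target
    return None


if __name__ == "__main__":
    for old_path in ("/doc/start/data-versioning", "/doc/start/data-access"):
        print(old_path, "->", resolve(old_path))
```

Under these assumptions, both old slugs resolve to their renamed pages, while a path with a trailing slash such as `/doc/start/data-versioning/` matches neither rule because of the `$` anchor. Whether the site normalizes trailing slashes before applying redirects is not shown in this diff, so that case may be worth checking by hand.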