diff --git a/content/docs/start/data-access.md b/content/docs/start/data-and-model-access.md
similarity index 87%
rename from content/docs/start/data-access.md
rename to content/docs/start/data-and-model-access.md
index 0c40d58df6..1f3417288e 100644
--- a/content/docs/start/data-access.md
+++ b/content/docs/start/data-and-model-access.md
@@ -1,13 +1,16 @@
 ---
-title: 'Get Started: Data Access'
+title: 'Get Started: Data and Model Access'
 ---
 
-# Get Started: Data Access
+# Get Started: Data and Model Access
 
-Okay, now that we've learned how to _track_ data and models with DVC and how to
-version them with Git, next question is how can we _use_ these artifacts outside
-of the project? How do I download a model to deploy it? How do I download a
-specific version of a model? How do I reuse datasets across different projects?
+Okay, now we've learned how to _track_ data files with DVC and how to version
+them with Git. _Models_ in a machine learning project are also files, written
+and read by programs, so DVC can track and version them just like data files.
+
+The next question is: how can we _use_ these artifacts outside of the project?
+How do I download a model to deploy it? How do I download a specific version of
+a model? How do I reuse datasets across different projects?
 
 > These questions tend to come up when you browse the files that DVC saves to
 > remote storage, e.g.
diff --git a/content/docs/start/data-versioning.md b/content/docs/start/data-and-model-versioning.md
similarity index 91%
rename from content/docs/start/data-versioning.md
rename to content/docs/start/data-and-model-versioning.md
index c26dc16619..3ca1b43bc8 100644
--- a/content/docs/start/data-versioning.md
+++ b/content/docs/start/data-and-model-versioning.md
@@ -1,6 +1,6 @@
 ---
-title: 'Get Started: Data Versioning'
-description: 'Get started with data versioning in DVC. Learn how to use a
+title: 'Get Started: Data and Model Versioning'
+description: 'Get started with data and model versioning in DVC. Learn how to use a
   regular Git workflow for datasets and ML models, without storing large files in
   Git.'
 ---
@@ -247,6 +247,16 @@ defines data file versions. Git itself provides the version control. DVC in turn
 creates these `.dvc` files, updates them, and synchronizes DVC-tracked data in
 the workspace efficiently to match them.
 
+## Model versioning
+
+Apart from data files, DVC also makes it easier to work with models. Models in a
+project usually change more often than data files, and they need to be kept in
+sync with changes in other parts of the project. When it comes to tracking
+versions, model files are no different from data files. DVC also provides ways
+to track small changes in model files without committing each one to the
+underlying VCS (Git). In later sections of this series, you'll see how DVC lets
+you track changes in pipelines that consist of multiple model and data files.
+
 ## Large datasets versioning
 
 In cases where you process very large datasets, you need an efficient mechanism
diff --git a/content/docs/start/data-pipelines.md b/content/docs/start/data-pipelines.md
index 863808c75f..f9fac2ddcf 100644
--- a/content/docs/start/data-pipelines.md
+++ b/content/docs/start/data-pipelines.md
@@ -143,7 +143,7 @@ stages:
 There's no need to use `dvc add` for DVC to track stage outputs (`data/prepared`
 in this case); `dvc run` already took care of this. You only need to run
 `dvc push` if you want to save them to
-[remote storage](/doc/tutorials/get-started/data-versioning#storing-and-sharing),
+[remote storage](/doc/start/data-and-model-versioning#storing-and-sharing),
 (usually along with `git commit` to version `dvc.yaml` itself).
 
 ## Dependency graphs (DAGs)
diff --git a/content/docs/start/index.md b/content/docs/start/index.md
index b92e72be3f..65a7959abd 100644
--- a/content/docs/start/index.md
+++ b/content/docs/start/index.md
@@ -53,15 +53,16 @@ Now you're ready to DVC!
 DVC's features can be grouped into functional components. We'll explore them
 one by one in the next few pages:
 
-- [**Data versioning**](/doc/start/data-versioning) (try this next) is the base
-  layer of DVC for large files, datasets, and machine learning models. Use a
-  regular Git workflow, but without storing large files in the repo (think "Git
-  for data"). Data is stored separately, which allows for efficient sharing.
-
-- [**Data access**](/doc/start/data-access) shows how to use data artifacts from
-  outside of the project and how to import data artifacts from another DVC
-  project. This can help to download a specific version of an ML model to a
-  deployment server or import a model to another project.
+- [**Data and model versioning**](/doc/start/data-and-model-versioning) (try
+  this next) is the base layer of DVC for large files, datasets, and machine
+  learning models. Use a regular Git workflow, but without storing large files
+  in the repo (think "Git for data"). Data is stored separately, which allows
+  for efficient sharing.
+
+- [**Data and model access**](/doc/start/data-and-model-access) shows how to use
+  data artifacts from outside of the project and how to import data artifacts
+  from another DVC project. This can help you download a specific version of an
+  ML model to a deployment server or import a model into another project.
 
 - [**Data pipelines**](/doc/start/data-pipelines) describe how models and other
   data artifacts are built, and provide an efficient way to reproduce them.
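The "Data and model versioning" bullet above says large files (models included) stay out of the Git repo while Git still versions them. A minimal Python sketch of that idea, assuming a simplified model of what `dvc add` does: the file is copied into a content-addressed cache keyed by its MD5 hash, and Git would version only a small pointer text loosely modeled on a `.dvc` file. `md5_of` and `track` are hypothetical helpers, not DVC's API, and the real `.dvc` format and cache layout differ between DVC versions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large model never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(path: Path, cache_dir: Path) -> str:
    """Hypothetical, simplified stand-in for `dvc add` (not DVC's API).

    Copies the file into a content-addressed cache and returns the text of a
    small pointer file, loosely modeled on a `.dvc` file, for Git to version.
    """
    md5 = md5_of(path)
    # Cache layout assumption: first two hex digits as a directory name.
    cached = cache_dir / md5[:2] / md5[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(path, cached)
    size = path.stat().st_size
    return f"outs:\n- md5: {md5}\n  size: {size}\n  path: {path.name}\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        model = root / "model.pkl"
        model.write_bytes(b"weights v1")
        print(track(model, root / "cache"))
        # "Retraining" changes the bytes, so the hash and pointer change too.
        model.write_bytes(b"weights v2")
        print(track(model, root / "cache"))
```

Running it prints two pointer texts with different `md5` values for the same `model.pkl` path: each model version is recorded exactly the way a data file version would be, while the bytes themselves live in the cache rather than in Git.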
diff --git a/redirects-list.json b/redirects-list.json
index 52a08710d8..f870e50363 100644
--- a/redirects-list.json
+++ b/redirects-list.json
@@ -26,6 +26,8 @@
   "^/doc/tutorials/get-started(/.*)?$ /doc/start",
   "^/doc/tutorials/versioning(/.*)?$ /doc/use-cases/versioning-data-and-model-files/tutorial",
   "^/doc/tutorials(/.*)? /doc/start",
+  "^/doc/start/data-versioning(/.*)?$ /doc/start/data-and-model-versioning",
+  "^/doc/start/data-access(/.*)?$ /doc/start/data-and-model-access",
   "^/doc/use-cases/data-and-model-files-versioning/?$ /doc/use-cases/versioning-data-and-model-files",
   "^/doc/user-guide/updating-tracked-files$ /doc/user-guide/how-to/update-tracked-data",
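The two entries added to `redirects-list.json` send the old Get Started URLs to the renamed pages. A minimal Python sketch for sanity-checking them, assuming the site's redirect handler matches each `"<regex> <target>"` entry's regex against the request path and redirects to the fixed target; `parse` and `resolve` are hypothetical names, not the site's actual code. It also confirms that the new pages don't match the old patterns, so they can't redirect to themselves.

```python
import re

# The two rules this change adds to redirects-list.json, copied verbatim.
NEW_RULES = [
    "^/doc/start/data-versioning(/.*)?$ /doc/start/data-and-model-versioning",
    "^/doc/start/data-access(/.*)?$ /doc/start/data-and-model-access",
]


def parse(rules):
    """Split each "<regex> <target>" entry at its first space."""
    parsed = []
    for rule in rules:
        pattern, target = rule.split(" ", 1)
        parsed.append((re.compile(pattern), target))
    return parsed


def resolve(path, rules):
    """Hypothetical handler: target of the first matching rule, else None."""
    for pattern, target in rules:
        if pattern.match(path):
            return target
    return None


if __name__ == "__main__":
    rules = parse(NEW_RULES)
    for path in (
        "/doc/start/data-versioning",
        "/doc/start/data-access/",
        "/doc/start/data-and-model-versioning",  # new page: must not redirect
    ):
        print(path, "->", resolve(path, rules))
```

The optional `(/.*)?` group followed by `$` lets subpaths such as `/doc/start/data-versioning/anything` redirect too, while a different slug like `/doc/start/data-versioning-old` does not match.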