diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 5caa56b848..68b5ca0b26 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -65,10 +65,15 @@
"source": "use-cases/index.md",
"children": [
{
- "label": "Versioning Data & Model Files",
+ "label": "Versioning Data and Models",
"slug": "versioning-data-and-model-files",
"source": "versioning-data-and-model-files/index.md",
- "children": ["tutorial"]
+ "children": [
+ {
+ "label": "Tutorial 👩‍💻",
+ "slug": "tutorial"
+ }
+ ]
},
{
"label": "Sharing Data and Model Files",
diff --git a/content/docs/start/data-versioning.md b/content/docs/start/data-versioning.md
index d509e68a5b..552bc63a8f 100644
--- a/content/docs/start/data-versioning.md
+++ b/content/docs/start/data-versioning.md
@@ -85,7 +85,7 @@ outs:
> \* See
> [Large Dataset Optimization](/doc/user-guide/large-dataset-optimization) and
-> `dvc config cache` for more information on file linking.
+> `dvc config cache` for more details on file linking.
diff --git a/content/docs/use-cases/index.md b/content/docs/use-cases/index.md
index b028d58650..55933d74d5 100644
--- a/content/docs/use-cases/index.md
+++ b/content/docs/use-cases/index.md
@@ -18,8 +18,8 @@ knowledge, they are still difficult to implement, reuse, and manage.
If you store and process data files or datasets to produce other data or machine
learning models, and you want to
-- track and save data and ML models the same way you capture code;
-- create and switch among different
+- track and save data and machine learning models the same way you capture code;
+- create and switch between
[versions of data and ML models](/doc/use-cases/versioning-data-and-model-files)
easily;
- understand how datasets and ML artifacts were built in the first place;
diff --git a/content/docs/use-cases/versioning-data-and-model-files/index.md b/content/docs/use-cases/versioning-data-and-model-files/index.md
index 448bdeab55..df3e8307a8 100644
--- a/content/docs/use-cases/versioning-data-and-model-files/index.md
+++ b/content/docs/use-cases/versioning-data-and-model-files/index.md
@@ -1,130 +1,88 @@
-# Versioning Data and Model Files
-
-DVC enables versioning large files and directories such as datasets, data
-science features, and machine learning models using Git, but without storing the
-contents in Git.
-
-This is achieved by saving information about the data in special
-[metafiles](/doc/user-guide/dvc-files-and-directories) that replace the data in
-the repository. These can be versioned with regular Git workflows (branches,
-pull requests, etc.)
-
-To actually store the data, DVC uses a built-in cache, and supports
-synchronizing it with various types of
-[remote storage](/doc/command-reference/remote). This allows for easy data and
-model versioning, storage, and sharing — right alongside code.
-
-![](/img/model-versioning-diagram.png) _Code and data flows in DVC_
-
-In this basic use case, DVC is a better alternative to
-[Git-LFS / Git-annex](/doc/user-guide/related-technologies) and to ad-hoc
-scripts used to manage ML artifacts (training data, models, etc.)
-on cloud storage. DVC doesn't require special services, and works with
-on-premises storage (e.g. SSH, NAS) as well as any major cloud storage provider
-(Amazon S3, Microsoft Azure, Google Drive,
-[among others](/doc/command-reference/remote/add#supported-storage-types)).
-
-> For hands-on experience, we recommend following the
-> [versioning tutorial](/doc/use-cases/versioning-data-and-model-files).
-
-## DVC is not Git!
-
-DVC metafiles such as `dvc.yaml` and `.dvc` files serve as placeholders to track
-data files and directories for versioning (among other purposes). They point to
-specific data contents in the cache, providing the ability to store
-multiple data versions out-of-the-box.
-
-Full-fledged
-[version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
-is left for Git and its hosting platforms (e.g. GitHub, GitLab) to handle. These
-are designed for source code management (SCM) however, and thus ill-equipped to
-support data science needs. That's where DVC comes in: with its built-in data
-cache, reproducible [pipelines](/doc/start/data-pipelines), among
-several other novel features (see [Get Started](/doc/start/) for a primer.)
-
-## Track data and models for versioning
-
-Let's say you have an empty DVC repository and put a dataset of
-images in the `images/` directory. You can start tracking it with `dvc add`.
-This generates a `.dvc` file, which can be committed to Git in order to save the
-project's version:
-
-```dvc
-$ ls images/
-0001.jpg 0002.jpg 0003.jpg 0004.jpg ...
-
-$ dvc add images/
-
-$ git add images.dvc .gitignore
-$ git commit -m "Track images dataset with DVC."
-```
-
-DVC's also allows to define the processes that build artifacts based on tracked
-data, such as an ML model, by writing a simple `dvc.yaml` file that connects the
-pieces together:
-
-> `dvc.yaml` files can be written manually or generated with `dvc run`.
-
-```yaml
-stages:
- train:
- cmd: python train.py images/
- deps:
- - images
- outs:
- - model.pkl
-```
-
-> See [Data Pipelines](/doc/start/data-pipelines) for a comprehensive intro to
-> this feature.
-
-`dvc repro` can now execute the `train` stage for you. DVC will track all of its
-outputs (`outs`) automatically. Let's do that, and commit this project version:
-
-```dvc
-$ dvc repro
-Running stage 'train' with command:
- python train.py images/
-Updating lock file 'dvc.lock'
-...
-
-$ git add dvc.yaml dvc.lock .gitignore
-$ git commit -m "Train model via DVC."
-$ git tag -a "v1.0" -m "Fist model" # We'll use this soon ;)
-```
-
-> See also `dvc.lock`.
-
-## Switching versions
-
-After iterating on this process and producing several versions, you can combine
-`git checkout` and `dvc checkout` to perform full or partial
-workspace restorations.
-
-![](/img/versioning.png) _Code and data checkout_
-
-> Note that `dvc install` enables auto-checkouts of data after `git checkout`.
-
-A full checkout brings the whole project back to a previous version
-— code, dataset and model files all match each other:
-
-```dvc
-$ git checkout v1.0
-$ dvc checkout
-M images
-M model.pkl
-```
-
-However, we can checkout certain parts only, for example if we want to keep the
-latest source code and model versions, but rewind to the previous version of the
-dataset:
-
-```dvc
-$ git checkout v1.0 images.dvc
-$ dvc checkout images.dvc
-M images
-```
-
-DVC [optimizes](/doc/user-guide/large-dataset-optimization) this operation by
-avoiding copying files each time, so checking out data is quick even if you are
-versioning large data files.
+# Versioning Data and Models
+
+Data science teams face data management questions around versions of data and
+machine learning models. How do we keep track of changes in data, source code,
+and ML models together? What's the best way to organize and store variations of
+these files and directories?
+
+![](/img/data-ver-complex.png) _Exponential complexity of data science projects_
+
+Another problem in the field has to do with bookkeeping: being able to identify
+past data inputs and processes in order to understand their results, share
+knowledge, or debug.
+
+**Data Version Control** (DVC) lets you capture the versions of your data and
+models in
+[Git commits](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository),
+while storing them on-premises or in cloud storage. It also provides a mechanism
+to switch between these different data contents. The result is a single history
+for data, code, and ML models that you can traverse — a proper journal of your
+work!
+
+![](/img/project-versions.png) _DVC matches the right versions of data, code,
+and models for you 💘._
+
+DVC enables data _versioning through codification_. You write simple
+[metafiles](/doc/user-guide/dvc-files-and-directories) once, describing what
+datasets, ML artifacts, etc. to track. This metadata can be put in Git in lieu
+of large files. Now you can use DVC to create
+[snapshots](/doc/command-reference/add) of the data,
+[restore](/doc/command-reference/checkout) previous versions,
+[reproduce](/doc/command-reference/repro) experiments, record evolving
+[metrics](/doc/command-reference/metrics), and more!
+
+👩‍💻 **Intrigued?** Try our
+[versioning tutorial](/doc/use-cases/versioning-data-and-model-files/tutorial)
+to learn how DVC looks and feels firsthand.
+
+As you use DVC, unique versions of your data files and directories are
+[cached](/doc/user-guide/dvc-files-and-directories#structure-of-the-cache-directory) in a
+systematic way (preventing file duplication). The working datastore is separated
+from your workspace to keep the project light, but stays connected
+via file
+[links](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
+handled automatically by DVC.
+
+Benefits of our approach include:
+
+- **Lightweight**: DVC is a
+ [free](https://github.com/iterative/dvc/blob/master/LICENSE), open-source
+ [command line](/doc/command-reference) tool that doesn't require databases,
+ servers, or any other special services.
+
+- **Consistency**: Keep your projects readable with stable file names. They
+ don't need to change every time the data they represent changes. No need for
+ complicated paths like `data/20190922/labels_v7_final` or for constantly
+ editing these in source code.
+
+- **Efficient data management**: Use a familiar and cost-effective storage
+ solution for your data and models (e.g. SFTP, S3, HDFS,
+ [etc.](/doc/command-reference/remote/add#supported-storage-types)) — free from
+ Git hosting
+ [constraints](https://docs.github.com/en/free-pro-team@latest/github/managing-large-files/what-is-my-disk-quota).
+ DVC [optimizes](/doc/user-guide/large-dataset-optimization) storing and
+ transferring large files.
+
+- **Collaboration**: Easily distribute your project development and share its
+ data [internally](/doc/use-cases/shared-development-server) and
+ [remotely](/doc/use-cases/sharing-data-and-model-files), or
+ [reuse](/doc/start/data-access) it in other places.
+
+- **Data compliance**: Review data modification attempts as Git
+ [pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/).
+ Audit the project's immutable history to learn when datasets or models were
+ approved, and why.
+
+- **GitOps**: Connect your data science projects with the Git-powered universe.
+ Git workflows open the door to advanced tools such as continuous integration
+ (like [CML](https://cml.dev/) CI/CD), specialized patterns such as
+ [data registries](/doc/use-cases/data-registries), and other best practices.
+
+In summary, data science and ML are iterative processes where the lifecycles of
+data, models, and code happen at different paces. DVC helps you manage and
+enforce them.
+
+And this is just the beginning. DVC supports multiple advanced features
+out-of-the-box: build, run, and version
+[data pipelines](/doc/command-reference/dag),
+[manage experiments](/doc/start/experiments) effectively, and more.
diff --git a/content/docs/user-guide/what-is-dvc.md b/content/docs/user-guide/what-is-dvc.md
index ab7e2c2753..7e92379f90 100644
--- a/content/docs/user-guide/what-is-dvc.md
+++ b/content/docs/user-guide/what-is-dvc.md
@@ -47,3 +47,17 @@ can version experiments, manage large datasets, and make projects reproducible.
> Git servers, as well as SSH and cloud storage providers are supported,
> however.
+
+## DVC does not replace Git!
+
+DVC metafiles such as `dvc.yaml` and `.dvc` files serve as placeholders to track
+large data files and directories for versioning (among other
+[purposes](/doc/user-guide/dvc-files-and-directories)). These metafiles change
+along with your data, and you can use Git to place them under
+[version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
+as a proxy to the actual data versions, which are stored in the DVC
+cache (outside of Git). This does not replace features of Git.
+
+DVC does, however, provide several Git-like commands, such as `dvc init`,
+`dvc add`, `dvc checkout`, and `dvc push`, which interact with the underlying
+Git repo when one is present (using Git is optional).
diff --git a/static/img/data-ver-complex.png b/static/img/data-ver-complex.png
new file mode 100644
index 0000000000..f633a53d22
Binary files /dev/null and b/static/img/data-ver-complex.png differ
diff --git a/static/img/project-versions.png b/static/img/project-versions.png
new file mode 100644
index 0000000000..8ef0bbc3f7
Binary files /dev/null and b/static/img/project-versions.png differ
diff --git a/static/img/versioning.png b/static/img/versioning.png
deleted file mode 100644
index 1b92fcb0b5..0000000000
Binary files a/static/img/versioning.png and /dev/null differ