start: Data Access and Data Versioning to mention Model in titles (#2096) #2214

Merged
merged 29 commits into from
Mar 29, 2021
4742d9c
guide: disclaim x data (impro #2104)
jorgeorpinel Feb 3, 2021
8dea963
Added changes from PR #2188 and modified paths & titles
iesahin Feb 18, 2021
45ba851
Update redirects-list.json with fixed subsection redirects.
iesahin Feb 20, 2021
09dc8ca
Fixed incomplete looking sentence
Feb 20, 2021
3ed7627
Merge branch 'iesahin/issue2096-take-2' of github.com:iterative/dvc.o…
iesahin Feb 20, 2021
a3b15ba
merged into a single paragraph
iesahin Feb 20, 2021
7731587
Divided models sentence and added "large files" phrase.
iesahin Feb 20, 2021
bb84a99
Adds new paths to sidebar
iesahin Feb 20, 2021
9ef97c6
Updated links to data-access and data-versioning cmd ref
iesahin Feb 20, 2021
2593bb7
updated links to data-access and data-versioning in blog
iesahin Feb 20, 2021
9ed0867
Updated links to data-access and data-versioning in UC
iesahin Feb 20, 2021
3d7d61d
Updated links to data-access and data-versioning in UG
iesahin Feb 20, 2021
f44e92e
Merge branch 'master' of https://github.com/iterative/dvc.org into ie…
iesahin Feb 22, 2021
b65de40
updated yarn.lock
iesahin Feb 22, 2021
3555c5e
Update content/docs/start/data-and-model-versioning.md
iesahin Feb 23, 2021
19a0859
Merge branch 'master' into iesahin/issue2096-take-2
iesahin Feb 24, 2021
f3b0631
Merge branch 'iesahin/issue2096-take-2' of origin into iesahin/issue2…
iesahin Feb 24, 2021
b83d00d
Restyled by prettier
restyled-commits Feb 24, 2021
6166ed0
Merge pull request #2231 from iterative/restyled/iesahin/issue2096-ta…
iesahin Feb 24, 2021
e6d6bf7
fixes hardcoded links to data-and-model-access in the blog
iesahin Feb 24, 2021
b46cca3
minor fixes
iesahin Feb 24, 2021
4210bbf
Merge branch 'master' into guide/external-disclaimer
jorgeorpinel Mar 1, 2021
06cf1da
Merge branch 'master' into guide/external-disclaimer
jorgeorpinel Mar 14, 2021
49eefb0
guide: revert Exp Outs guide rename
jorgeorpinel Mar 14, 2021
a4252f6
Merge branch 'guide/external-disclaimer'
jorgeorpinel Mar 28, 2021
8f3de6e
Merge branch 'master' of github.com:iterative/dvc.org
jorgeorpinel Mar 28, 2021
a4ed206
start: emphasize models are files (assumption)
jorgeorpinel Mar 29, 2021
c143342
start: roll back unnecessary changes
jorgeorpinel Mar 29, 2021
0c94a8e
Merge branch 'master' into iesahin/issue2096-take-2 +
jorgeorpinel Mar 29, 2021
2 changes: 1 addition & 1 deletion content/blog/2020-10-12-october-20-dvc-heartbeat.md
@@ -107,7 +107,7 @@ few weeks, so stay tuned. Another big initative is adding videos to our docs:
since video seems like a popular format for a lot of learners, we're working to
supplement our official docs with embedded videos. Check out our first
installment on the
[Getting Started with Data Versioning](https://dvc.org/doc/start/data-versioning).
[Getting Started with Data Versioning](/doc/start/data-and-model-versioning).

https://youtu.be/kLKBcPonMYw

2 changes: 1 addition & 1 deletion content/blog/2020-11-11-november-20-dvc-heartbeat.md
@@ -64,7 +64,7 @@ welcome referrals if you know a good candidate)!

We're continuing to develop our video docs, and now half of our "Getting
Started" section has video accompaniments. Check out our latest release on
[data access with DVC](https://dvc.org/doc/start/data-access):
[data access with DVC](/doc/start/data-and-model-access):

https://youtu.be/EE7Gk84OZY8

14 changes: 7 additions & 7 deletions content/blog/2020-12-18-december-20-dvc-heartbeat.md
@@ -53,17 +53,17 @@ As you may have heard
on adding complete video docs to the "Getting Started" section of the DVC site.
We now have 100% coverage! We have videos that mirror the tutorials for:

- [Data versioning](https://dvc.org/doc/start/data-versioning) - how to use Git
and DVC together to track different versions of a dataset
- [Data versioning](/doc/start/data-and-model-versioning) - how to use Git and
DVC together to track different versions of a dataset

- [Data access](https://dvc.org/doc/start/data-access) - how to share models and
- [Data access](/doc/start/data-and-model-access) - how to share models and
datasets across projects and environments

- [Pipelines](https://dvc.org/doc/start/data-pipelines) - how to create
reproducible pipelines to transform datasets to features to models
- [Pipelines](/doc/start/data-pipelines) - how to create reproducible pipelines
to transform datasets to features to models

- [Experiments](https://dvc.org/doc/start/experiments) - how to do a `git diff`
for models that compares and visualizes metrics
- [Experiments](/doc/start/experiments) - how to do a `git diff` for models that
compares and visualizes metrics

https://media.giphy.com/media/L4ZZNbDpOCfiX8uYSd/giphy.gif

5 changes: 3 additions & 2 deletions content/docs/command-reference/diff.md
@@ -123,8 +123,9 @@ $ dvc diff

Let's checkout the
[2-track-data](https://github.com/iterative/example-get-started/releases/tag/2-track-data)
tag, corresponding to the [Data Versioning](/doc/start/data-versioning) _Get
Started_ chapter, right after we added `data.xml` file with DVC:
tag, corresponding to the
[Data Versioning](/doc/start/data-and-model-versioning) _Get Started_ chapter,
right after we added `data.xml` file with DVC:

```dvc
$ git checkout 2-track-data
2 changes: 1 addition & 1 deletion content/docs/command-reference/get.md
@@ -151,7 +151,7 @@ file or directory from. It also has the `--out` option to specify the location
to place the target data within the workspace. Combining these two options
allows us to do something we can't achieve with the regular `git checkout` +
`dvc checkout` process – see for example the
[Get Older Data Version](/doc/start/data-versioning#switching-between-versions)
[Get Older Data Version](/doc/start/data-and-model-versioning#switching-between-versions)
chapter of our _Get Started_.

Let's use the
4 changes: 2 additions & 2 deletions content/docs/command-reference/import-url.md
@@ -190,8 +190,8 @@ $ git checkout 3-config-remote
## Example: Tracking a file from the web

An advanced alternate to the intro of the
[Versioning Basics](/doc/start/data-versioning) part of the _Get Started_ is to
use `dvc import-url`:
[Versioning Basics](/doc/start/data-and-model-versioning) part of the _Get
Started_ is to use `dvc import-url`:

```dvc
$ dvc import-url https://data.dvc.org/get-started/data.xml \
4 changes: 2 additions & 2 deletions content/docs/command-reference/import.md
@@ -67,8 +67,8 @@ data `path`, and the `outs` field contains the corresponding local path in the
<abbr>workspace</abbr>. It records enough metadata about the imported data to
enable DVC efficiently determining whether the local copy is out of date.

To actually [version the data](/doc/start/data-versioning), `git add` (and
`git commit`) the import `.dvc` file.
To actually [version the data](/doc/start/data-and-model-versioning), `git add`
(and `git commit`) the import `.dvc` file.

Note that `dvc repro` doesn't check or update import `.dvc` files (see
`dvc freeze`), use `dvc update` to bring the import up to date from the data
4 changes: 2 additions & 2 deletions content/docs/sidebar.json
@@ -35,13 +35,13 @@
},
"children": [
{
"slug": "data-versioning",
"slug": "data-and-model-versioning",
"tutorials": {
"katacoda": "https://katacoda.com/dvc/courses/get-started/versioning"
}
},
{
"slug": "data-access",
"slug": "data-and-model-access",
"tutorials": {
"katacoda": "https://katacoda.com/dvc/courses/get-started/accessing"
}
@@ -1,13 +1,13 @@
---
title: 'Get Started: Data Access'
title: 'Get Started: Data and Model Access'
---

# Get Started: Data Access
# Get Started: Data and Model Access

Okay, now that we've learned how to _track_ data and models with DVC and how to
version them with Git, next question is how can we _use_ these artifacts outside
of the project? How do I download a model to deploy it? How do I download a
specific version of a model? How do I reuse datasets across different projects?
Okay, we've learned how to _track_ data and models with DVC, and how to commit
their versions to Git. The next questions are: How can we _use_ these artifacts
outside of the project? How do I download a model to deploy it? How to download
a specific version of a model? Or reuse datasets across different projects?

> These questions tend to come up when you browse the files that DVC saves to
> remote storage, e.g.
@@ -1,6 +1,6 @@
---
title: 'Get Started: Data Versioning'
description: 'Get started with data versioning in DVC. Learn how to use a
title: 'Get Started: Data and Model Versioning'
description: 'Get started with data and model versioning in DVC. Learn how to use a
regular Git workflow for datasets and ML models, without storing large files in
Git.'
---
@@ -14,8 +14,8 @@ to a different version of a 100Gb file in less than a second with a
`git checkout`.

The foundation of DVC consists of a few commands that you can run along with
`git` to track large files, directories, or ML models. Think "Git for data".
Read on or watch our video to learn about versioning data with DVC!
`git` to track large files, directories, or ML model files. Think "Git for
data". Read on or watch our video to learn about versioning data with DVC!

https://youtu.be/kLKBcPonMYw

@@ -34,8 +34,8 @@ $ dvc get https://github.com/iterative/dataset-registry \
```

We use the fancy `dvc get` command to jump ahead a bit and show how Git repo
becomes a source for datasets or models - what we call "data registry" or "model
registry". `dvc get` can download any file or directory tracked in a <abbr>DVC
becomes a source for datasets or models - what we call "data/model registry".
`dvc get` can download any file or directory tracked in a <abbr>DVC
repository</abbr>. It's like `wget`, but for DVC or Git repos. In this case we
download the latest version of the `data.xml` file from the
[dataset registry](https://github.com/iterative/dataset-registry) repo as the
@@ -90,10 +90,10 @@ outs:

## Storing and sharing

You can upload DVC-tracked data or models with `dvc push`, so they're safely
stored [remotely](/doc/command-reference/remote). This also means they can be
retrieved on other environments later with `dvc pull`. First, we need to setup a
storage:
You can upload DVC-tracked data or model files with `dvc push`, so they're
safely stored [remotely](/doc/command-reference/remote). This also means they
can be retrieved on other environments later with `dvc pull`. First, we need to
setup a storage:

```dvc
$ dvc remote add -d storage s3://mybucket/dvcstore
@@ -154,9 +154,9 @@ a3

## Retrieving

Having DVC-tracked data stored remotely, it can be downloaded when needed in
other copies of this <abbr>project</abbr> with `dvc pull`. Usually, we run it
after `git clone` and `git pull`.
Having DVC-tracked data and models stored remotely, it can be downloaded when
needed in other copies of this <abbr>project</abbr> with `dvc pull`. Usually, we
run it after `git clone` and `git pull`.

<details>

4 changes: 2 additions & 2 deletions content/docs/start/data-pipelines.md
@@ -143,8 +143,8 @@ stages:
There's no need to use `dvc add` for DVC to track stage outputs (`data/prepared`
in this case); `dvc run` already took care of this. You only need to run
`dvc push` if you want to save them to
[remote storage](/doc/start/data-versioning#storing-and-sharing), (usually along
with `git commit` to version `dvc.yaml` itself).
[remote storage](/doc/start/data-and-model-versioning#storing-and-sharing),
(usually along with `git commit` to version `dvc.yaml` itself).

## Dependency graphs (DAGs)

4 changes: 2 additions & 2 deletions content/docs/start/experiments.md
@@ -172,8 +172,8 @@ $ git commit -a -m "Preserve best random forest experiment"
## Sharing experiments

After committing the best experiments to our Git branch, we can
[store and share](/doc/start/data-versioning#storing-and-sharing) them remotely
like any other iteration of the pipeline.
[store and share](/doc/start/data-and-model-versioning#storing-and-sharing) them
remotely like any other iteration of the pipeline.

```dvc
dvc push
19 changes: 10 additions & 9 deletions content/docs/start/index.md
@@ -53,15 +53,16 @@ Now you're ready to DVC!
DVC's features can be grouped into functional components. We'll explore them one
by one in the next few pages:

- [**Data versioning**](/doc/start/data-versioning) (try this next) is the base
layer of DVC for large files, datasets, and machine learning models. Use a
regular Git workflow, but without storing large files in the repo (think "Git
for data"). Data is stored separately, which allows for efficient sharing.

- [**Data access**](/doc/start/data-access) shows how to use data artifacts from
outside of the project and how to import data artifacts from another DVC
project. This can help to download a specific version of an ML model to a
deployment server or import a model to another project.
- [**Data and model versioning**](/doc/start/data-and-model-versioning) (try
this next) is the base layer of DVC for large files, datasets, and machine
learning models. Use a regular Git workflow, but without storing large files
in the repo (think "Git for data"). Data is stored separately, which allows
for efficient sharing.

- [**Data and model access**](/doc/start/data-and-model-access) shows how to use
data artifacts from outside of the project and how to import data artifacts
from another DVC project. This can help to download a specific version of an
ML model to a deployment server or import a model to another project.

- [**Data pipelines**](/doc/start/data-pipelines) describe how models and other
data artifacts are built, and provide an efficient way to reproduce them.
8 changes: 4 additions & 4 deletions content/docs/use-cases/data-registries.md
@@ -2,10 +2,10 @@

One of the main uses of <abbr>DVC repositories</abbr> is the
[versioning of data and model files](/doc/use-cases/data-and-model-files-versioning).
DVC also enables cross-project [reusability](/doc/start/data-access) of these
<abbr>data artifacts</abbr>. This means that your projects can depend on data
from other DVC repositories — like a **package management system for data
science**.
DVC also enables cross-project [reusability](/doc/start/data-and-model-access)
of these <abbr>data artifacts</abbr>. This means that your projects can depend
on data from other DVC repositories — like a **package management system for
data science**.

![](/img/data-registry.png) _Data management middleware_

@@ -65,7 +65,7 @@ Benefits of our approach include:
- **Collaboration**: Easily distribute your project development and share its
data [internally](/doc/use-cases/shared-development-server) and
[remotely](/doc/use-cases/sharing-data-and-model-files), or
[reuse](/doc/start/data-access) it in other places.
[reuse](/doc/start/data-and-model-access) it in other places.

- **Data compliance**: Review data modification attempts as Git
[pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/).
3 changes: 2 additions & 1 deletion content/docs/user-guide/project-structure/dvc-files.md
@@ -3,7 +3,8 @@
You can use `dvc add` to track data files or directories located in your current
<abbr>workspace</abbr>\*. Additionally, `dvc import` and `dvc import-url` let
you bring data from external locations to your project, and start tracking it
locally. See [Data Versioning](/doc/start/data-versioning) for more info.
locally. See [Data Versioning](/doc/start/data-and-model-versioning) for more
info.

> \* Certain [external locations](/doc/user-guide/managing-external-data) are
> also supported.
2 changes: 2 additions & 0 deletions redirects-list.json
@@ -23,6 +23,8 @@
"^/(?:docs|documentation)(/.*)?$ /doc$1",

"^/doc/get-started(/.*)?$ /doc/start",
"^/doc/start/data-versioning$ /doc/start/data-and-model-versioning",
"^/doc/start/data-access$ /doc/start/data-and-model-access",
"^/doc/tutorial(/.*)?$ /doc/start",
"^/doc/tutorials/get-started(/.*)?$ /doc/start",
"^/doc/tutorials/versioning(/.*)?$ /doc/use-cases/versioning-data-and-model-files/tutorial",
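
For reviewers who want to sanity-check the two new redirect entries, here is a minimal sketch of how rules of this shape could be evaluated. It assumes each entry is a regular expression and a replacement separated by a single space, that `$1` refers to the first capture group, and that the first matching rule wins. The `resolve` helper and the `RULES` subset are hypothetical names for illustration only and are not the site's actual redirect code.

```python
import re

# Hypothetical subset of redirects-list.json, copied from the hunk above.
# Assumed entry format: "<regex> <replacement>", separated by one space.
RULES = [
    "^/(?:docs|documentation)(/.*)?$ /doc$1",
    "^/doc/get-started(/.*)?$ /doc/start",
    "^/doc/start/data-versioning$ /doc/start/data-and-model-versioning",
    "^/doc/start/data-access$ /doc/start/data-and-model-access",
]


def resolve(path, rules=RULES):
    """Return the redirect target for `path`, or `path` itself if no rule matches.

    Assumption: rules are tried in order and only the first match is applied
    (a single pass, with no chaining of redirects).
    """
    for rule in rules:
        pattern, replacement = rule.split(" ", 1)
        if re.search(pattern, path):
            # Convert "$1"-style group references into Python's "\1" form.
            py_replacement = re.sub(r"\$(\d+)", r"\\\1", replacement)
            return re.sub(pattern, py_replacement, path)
    return path


if __name__ == "__main__":
    for old in ("/doc/start/data-versioning", "/doc/start/data-access"):
        print(old, "->", resolve(old))
```

Under these assumptions, the old `data-versioning` and `data-access` paths resolve to the renamed pages, while other `/doc/start/...` paths pass through unchanged.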