use-cases: new datasets registry case study, et al. #679

Merged
merged 45 commits into from
Nov 11, 2019
eb088b5
use-cases: fix H3->H2 levels in shared-development-server
jorgeorpinel Oct 6, 2019
98271d7
cmd ref: remove comment from import
jorgeorpinel Oct 7, 2019
77da9ba
tutorials: explain use of `dvc remove` in versionin; other improvemen…
jorgeorpinel Oct 7, 2019
b0f871a
use-cases: incomplete draft of new data registry case study
jorgeorpinel Oct 7, 2019
cef3e14
use-cases: shared-data-registry -> data-registry
jorgeorpinel Oct 7, 2019
f1d82d6
Merge branch 'master' into use-cases/versioning-data
jorgeorpinel Oct 9, 2019
4ccd932
engine: clarify what [Edit on GitHub] button is for.
jorgeorpinel Oct 9, 2019
76d200d
use-cases: first draft of new data registry case study
jorgeorpinel Oct 10, 2019
32bbcd0
use-cases: fixes and improvements on first draft
jorgeorpinel Oct 10, 2019
7a98c30
use-cases: remove Concept header and move its intro to the top, compr…
jorgeorpinel Oct 11, 2019
de6103d
use-cases: Rewrite second half of new case study draft
jorgeorpinel Oct 11, 2019
79f974b
use-cases: full first draft of new data registry case
jorgeorpinel Oct 12, 2019
93aea80
Merge branch 'master' into use-cases/versioning-data
jorgeorpinel Oct 17, 2019
a16ac70
use-cases: add note about read-only remotes for data registries
jorgeorpinel Oct 17, 2019
0f7b1a1
use-cases: merge and compress example H2s with new title
jorgeorpinel Oct 17, 2019
b9d8475
use-cases: shorten "proper data versioning" section of data-registry,…
jorgeorpinel Oct 17, 2019
5112f12
use-cases: address feedback in #679...
jorgeorpinel Oct 17, 2019
bdab21b
use-cases: add general benefits to the list in data-registry
jorgeorpinel Oct 22, 2019
f5a66d0
use-cases: simplify 2nd half of data-registry
jorgeorpinel Oct 22, 2019
bdeeb24
use-cases: fix 2 typos in data-registry
jorgeorpinel Oct 22, 2019
42335a1
use-cases: removed paragraph about risk in data-registry
jorgeorpinel Oct 22, 2019
ed6025b
use-cases: rename H2 to "Example"
jorgeorpinel Oct 22, 2019
7a7e2e6
use-cases: rewrite parts of the example in data-registry and
jorgeorpinel Oct 22, 2019
ac96397
Merge branch 'master' into use-cases/versioning-data
jorgeorpinel Oct 22, 2019
008e358
use-cases: rewrite data-registry list of benefits
jorgeorpinel Oct 23, 2019
131af1e
import,update: explain rev field and update vs re-importing for #735,…
jorgeorpinel Oct 30, 2019
f01f860
cmd ref: add data registry example to import cmd
jorgeorpinel Oct 30, 2019
006be74
Merge branch 'master' into use-cases/versioning-data
jorgeorpinel Oct 30, 2019
45fb574
get: clarify that external repos are NOT data sources
jorgeorpinel Nov 1, 2019
c7d695d
use-cases: feedback round for new data registry case and related cmd …
jorgeorpinel Nov 1, 2019
a829f93
use-cases: use regular back ticks for DVC commands instead of <code> …
jorgeorpinel Nov 2, 2019
216bccb
use-cases: revise advanatage list again
jorgeorpinel Nov 2, 2019
9f0d729
glossary: add "DVC Repository" term; reorder glossary; and
jorgeorpinel Nov 6, 2019
0ff4ec0
Merge branch 'master' into use-cases/versioning-data
jorgeorpinel Nov 6, 2019
2c56d99
glossary: add plural form "DVC repositories" to reviously introduced …
jorgeorpinel Nov 6, 2019
6499644
remove a couple unnecessary link anchors
jorgeorpinel Nov 6, 2019
9d34076
use-cases: address a couple typos
jorgeorpinel Nov 7, 2019
e37e4ba
use-cases: merge "Data as code" and "Lifecycle management" in data re…
jorgeorpinel Nov 7, 2019
ba2e28e
use-cases: merge "Versioning" and "Data as code" (and "Lifecycle mana…
jorgeorpinel Nov 7, 2019
9ce52cd
get: update description to clarify data is downloaded from remote sto…
jorgeorpinel Nov 8, 2019
3e4ef7d
use-cases: un-do WIP text from data registry
jorgeorpinel Nov 8, 2019
f706124
user-guide: small improvement in dvc-file-format intro
jorgeorpinel Nov 9, 2019
2e31691
use-cases: revise intros of existing cases
jorgeorpinel Nov 9, 2019
6425a5d
use-cases: rewrite data registry intro (1)
jorgeorpinel Nov 9, 2019
cb0726f
use-cases: add "or models" to the short data registry description
jorgeorpinel Nov 11, 2019
2 changes: 1 addition & 1 deletion src/Documentation/RightPanel/RightPanel.js
@@ -140,7 +140,7 @@ export default class RightPanel extends React.PureComponent {
<span role="img" aria-label="bug">
🐛
</span>{' '}
Found an issue? Let us know or fix it:
Found an issue? Let us know! Or fix it:
</Description>

<GithubButton href={githubLink} target="_blank">
34 changes: 23 additions & 11 deletions src/Documentation/glossary.js
@@ -21,10 +21,20 @@ form part of your expanded workspace, technically.
name: 'DVC Project',
match: ['DVC project', 'project', 'projects'],
desc: `
Initialized by running \`dvc init\` in the **workspace**. It will contain the
Initialized by running \`dvc init\` in the **workspace** (typically in a Git
repository). It will contain the
[\`.dvc/\` directory](/doc/user-guide/dvc-files-and-directories) and
[DVC-files](/doc/user-guide/dvc-file-format) created with commands such as
\`dvc add\` or \`dvc run\`. It's typically also a Git repository.
\`dvc add\` or \`dvc run\`. It may also be a Git repository.
`
},
{
name: 'DVC Repository',
match: ['DVC repository', 'DVC repositories'],
desc: `
**DVC project** initialized using \`dvc init\` in a Git repository. It will
contain \`.git/\` and [\`.dvc/\`](/doc/user-guide/dvc-files-and-directories)
directories, as well as any DVC-files created by DVC.
`
},
{
@@ -37,14 +47,24 @@ For more details, please refer to this [document]
(/doc/user-guide/dvc-files-and-directories#structure-of-cache-directory).
`
},
{
name: 'Output',
match: ['output', 'outputs'],
desc: `
A file or a directory that is under DVC control, recorded in the \`outs\`
section of a DVC-file. See \`dvc add\` \`dvc run\`, \`dvc import\`,
\`dvc import-url\` commands. A.k.a. **data artifact*.
`
},
{
name: 'Data Artifact',
match: ['data artifact', 'data artifacts'],
desc: `
Any data file or directory, as well as intermediate or final result (such as
extracted features or a ML model file) that is under DVC control. Refer to
[Versioning Data and Model Files]
(/doc/use-cases/versioning-data-and-model-files) for more details.
(/doc/use-cases/versioning-data-and-model-files) for more details. A.k.a
**output*.
`
},
{
@@ -55,14 +75,6 @@ Stage (DVC-file) created with the \`dvc import\` or \`dvc import-url\`
commands. They represent files or directories from external sources.
`
},
{
name: 'Output',
match: ['output', 'outputs'],
desc: `
A file or a directory that is under DVC control. See \`dvc add\` \`dvc run\`,
\`dvc import\`, \`dvc import-url\` commands.
`
},
{
name: 'External Dependency',
match: ['external dependency', 'external dependencies'],
3 changes: 2 additions & 1 deletion src/Documentation/sidebar.json
@@ -96,7 +96,8 @@
"label": "Sharing Data & Model Files",
"slug": "sharing-data-and-model-files"
},
"shared-development-server"
"shared-development-server",
"data-registry"
]
},
{
2 changes: 1 addition & 1 deletion static/docs/changelog/0.18.md
@@ -43,5 +43,5 @@ really excited to share the progress with you:

Please use the discussion forum [discuss.dvc.org](discuss.dvc.org) and
[issue tracker]() and don't hesitate to [⭐](https://github.com/iterative/dvc)
our [DVC repository](https://github.com/iterative/dvc) if you haven't yet. We
the [DVC repository](https://github.com/iterative/dvc) if you haven't yet. We
are waiting for your feedback!
2 changes: 1 addition & 1 deletion static/docs/changelog/0.35.md
@@ -72,5 +72,5 @@ There are new [integrations and plugins](/doc/install/plugins) available:
(PyCharm, IntelliJ, etc).

Don't hesitate to
[like\star DVC repository](https://github.com/iterative/dvc/stargazers) if you
[star the DVC repository](https://github.com/iterative/dvc/stargazers) if you
haven't yet. We are waiting for your feedback!
14 changes: 6 additions & 8 deletions static/docs/command-reference/add.md
@@ -165,9 +165,7 @@ $ file .dvc/cache/d8/acabbfd4ee51c95da5d7628c7ef74b
```

Note that tracking compressed files (e.g. ZIP or TAR archives) is not
recommended, as `dvc add` supports tracking directories. (Details below.) For
more context, refer to
[Data Registry](/doc/use-cases/data-registry#problem-1-compressed-data-files)
recommended, as `dvc add` supports tracking directories. (Details below.)

## Example: Directory

@@ -176,14 +174,14 @@ pictures. You may then have hundreds or thousands of pictures of these animals
in a directory, and this is your training dataset:

```dvc
$ tree pics
$ tree pics --filelimit 3
pics
├── train
│   ├── cats <-- A lot of images of cats
│   └── dogs <-- A lot of images of dogs
│   ├── cats [many image files]
│   └── dogs [many image files]
└── validation
├── cats <-- More images of cats
└── dogs <-- More images of dogs
├── cats [more image files]
└── dogs [more image files]
```

Taking a directory under DVC control as simple as with a single file:
6 changes: 3 additions & 3 deletions static/docs/command-reference/cache/index.md
@@ -21,9 +21,9 @@ including the default cache directory.

The cache is where your data files, models, etc (anything you want to version
with DVC) are actually stored. The corresponding files you see in the
<abbr>workspace</abbr> simply link to the ones in cache. (See
`dvc config cache`, `type` config option, for more information on file links on
different platforms.)
<abbr>workspace</abbr> can simply link to the ones in cache. (Refer to
[File link types](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
for more information on file links on different platforms.)

> For more cache-related configuration options refer to `dvc config cache`.

4 changes: 2 additions & 2 deletions static/docs/command-reference/destroy.md
@@ -17,9 +17,9 @@ usage: dvc destroy [-h] [-q | -v] [-f]
be removed as well, unless it's set to an external location with
`dvc cache dir`. (By default a local cache is located in the `.dvc/cache`
directory.) If you were using
[symlinks for linking data](/doc/user-guide/large-dataset-optimization) from the
[symlinks for linking](/doc/user-guide/large-dataset-optimization) data from the
cache, DVC will replace them with copies, so that your data is intact after the
DVC repository destruction.
project's destruction.

## Options

8 changes: 3 additions & 5 deletions static/docs/command-reference/fetch.md
@@ -1,8 +1,7 @@
# fetch

Get files that are under DVC control from
[remote](/doc/command-reference/remote#description) storage into the
<abbr>cache</abbr>.
[remote storage](/doc/command-reference/remote) into the <abbr>cache</abbr>.

## Synopsis

@@ -74,7 +73,7 @@ specified in DVC-files currently in the project are considered by `dvc fetch`
## Options

- `-r REMOTE`, `--remote REMOTE` - name of the
[remote storage](/doc/command-reference/remote#description) to fetch from (see
[remote storage](/doc/command-reference/remote) to fetch from (see
`dvc remote list`). If not specified, the default remote is used (see
`dvc config core.remote`). The argument `REMOTE` is a remote name defined
using the `dvc remote` command.
@@ -117,8 +116,7 @@ specified in DVC-files currently in the project are considered by `dvc fetch`
## Examples

Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
pipeline stages, as well as a few Git tags, such as the <abbr>DVC project</abbr>
created in our
pipeline stages, as well as a few Git tags, such as our
[get started example repo](https://github.com/iterative/example-get-started).
Then we can see what happens with `dvc fetch` as we switch from tag to tag.

4 changes: 2 additions & 2 deletions static/docs/command-reference/get-url.md
@@ -31,8 +31,8 @@ be placed inside of it.
Note that this command doesn't require an existing DVC project to run in. It's a
single-purpose command that can be used out of the box after installing DVC.

> See `dvc get` to download data or model files or directories from other DVC
> repositories (e.g. GitHub URLs).
> See `dvc get` to download data or model files or directories from other
> <abbr>DVC repository</abbr> (e.g. GitHub URLs).

DVC supports several types of (local or) remote locations (protocols):

8 changes: 5 additions & 3 deletions static/docs/command-reference/get.md
@@ -1,7 +1,8 @@
# get

Download or copy file or directory from any <abbr>DVC project</abbr> in a Git
repository (e.g. hosted on GitHub) into the current working directory.
Download or copy file or directory from the
[remote storage](/doc/command-reference/remote) of any <abbr>DVC project</abbr>
in a Git repository (e.g. hosted on GitHub) into the current working directory.

> Unlike `dvc import`, this command does not track the downloaded data files
> (does not create a DVC-file).
@@ -20,7 +21,8 @@ positional arguments:

Provides an easy way to download datasets, intermediate results, ML models, or
other files and directories (any <abbr>data artifact</abbr>) tracked in another
DVC repository, by downloading them into the current working directory.
<abbr>DVC repository</abbr>, by downloading them into the current working
directory. (It works like `wget`, but for DVC repositories.)

Note that this command doesn't require an existing DVC project to run in. It's a
single-purpose command that can be used out of the box after installing DVC.
2 changes: 1 addition & 1 deletion static/docs/command-reference/import-url.md
@@ -36,7 +36,7 @@ desired for the downloaded data. If an existing directory is specified, then the
output will be placed inside of it.

> See `dvc import` to download and tack data or model files or directories from
> other DVC repositories (e.g. GitHub URLs).
> other <abbr>DVC repositories</abbr> (e.g. GitHub URLs).

DVC supports [DVC-files](/doc/user-guide/dvc-file-format) that refer to data in
external locations, see