Commit

Merge pull request #549 from jorgeorpinel/master
glossary: apply <abbr> tags to 'workspace' term et al.
shcheklein authored Aug 13, 2019
2 parents 386690f + be94200 commit 5020fd0
Showing 68 changed files with 726 additions and 687 deletions.
21 changes: 15 additions & 6 deletions src/Documentation/glossary.js
@@ -8,16 +8,25 @@ export default {
name: 'Workspace',
match: ['workspace'],
desc: `
By "workspace" we refer to the directory containing all your project files. For
example raw datasets, source code, ML models, etc. A workspace becomes a DVC
project when [\`dvc init\`](/doc/commands-reference/init) is run, and
[DVC-files](/doc/user-guide/dvc-file-format) are created in it. It\s typically
also a Git repository.
Directory containing all your project files. For example raw datasets, source
code, ML models, etc. A workspace becomes a **DVC project** when
[\`dvc init\`](/doc/commands-reference/init) is run, and
[DVC-files](/doc/user-guide/dvc-file-format) or stage files are created in it.
`
},
{
name: 'DVC Project',
match: ['DVC project', 'project', 'projects'],
desc: `
Initialized by running \`dvc init\` in the **workspace**. It will contain the
[\`.dvc/\` directory](/doc/user-guide/dvc-files-and-directories) and
[DVC-files](/doc/user-guide/dvc-file-format) created with commands such as
\`dvc add\` or \`dvc run\`. It's typically also a Git repository.
`
},
{
name: 'DVC Cache',
match: ['DVC cache', 'cache', 'cache directory'],
match: ['DVC cache', 'cache', 'cache directory', 'data cache', 'cached'],
desc: `
The DVC cache is a hidden storage (by default located in the \`.dvc/cache\`
directory) for files that are under DVC control, and their different versions.
8 changes: 4 additions & 4 deletions static/docs/changelog/0.18.md
@@ -4,7 +4,7 @@ We have been working hard last few weeks improving **user experience**,
**performance**, and **documentation**. Kudos to `@sotte` and `@Hong-Xiang` for
the great feedback they gave us!

We are very close to the 1.0 release! Two major changes are coming in DVC 1.0 -
We are very close to the 1.0 release! Two major changes are coming in DVC 1.0:
[commit semantics](https://github.com/iterative/dvc/issues/919#issuecomment-414540094)
and
[execution matrix](https://github.com/iterative/dvc/issues/973#issuecomment-412739728).
@@ -14,10 +14,10 @@ discuss and let us know about your thoughts!
Since the last announcement we have released versions 0.12 through 0.18 and are
really excited to share the progress with you:

-**DVC just got faster**:
-**DVC just got faster**

- Data files management commands - `dvc add`, `dvc push`, `dvc pull`, etc got
up to 10x faster on data sets with large number of files.
- Data files management commands like `dvc add`, `dvc push`, `dvc pull`, etc.
got up to 10x faster on data sets with large number of files.

- Commands startup latency reduced 3x

9 changes: 6 additions & 3 deletions static/docs/changelog/0.35.md
@@ -1,8 +1,8 @@
# v0.19 - v0.35

We've launched the
[DVC Patreon campaign](https://www.patreon.com/DVCorg/overview) - it's one of
the ways to support the project if you like it.
[DVC Patreon campaign](https://www.patreon.com/DVCorg/overview). It's one of the
ways to support the project if you like it.

Now, let’s **highlight the changes** (not including bug fixes, and minor
improvements) we have done in the last few months:
@@ -39,16 +39,19 @@ improvements) we have done in the last few months:
1 file not changed, 0 files modified, 1 file added, 0 files deleted, size was increased by 15.3 MB
```

- We’ve introduced the dvc commit command and `dvc run/repro/add --no-commit`
- We’ve introduced the DVC commit command and `dvc run/repro/add --no-commit`
flag to give a way to **avoid uncontrolled cache growth** and as a way to save
some `dvc repro` runs. In the future we plan to have “do-not-cache-my-data” as
a default mode for `dvc run`, `dvc add` and `dvc repro`.

- **SSH remotes (data storage) support** - config options to set port, key
files, timeouts, password, etc + improved stability and Windows support!
Introduced **HTTP remotes** - external dependencies and as a read-only cache.

- **Control over where DVC-files are located in your project** - place them
wherever you want with the `-f` option supported by all relevant commands -
`dvc add`, `dvc run`, and `dvc import`.

- 🙂A lot of **UI improvements** . Starting from the finally fixed nasty issue
with Windows terminal printing a lot of garbage symbols, to using progress
bars for checkouts, better metrics output, and lots of smaller things:
20 changes: 11 additions & 9 deletions static/docs/commands-reference/add.md
@@ -26,9 +26,10 @@ Under the hood, a few actions are taken for each file in `targets`:
2. Move the file content to the DVC cache (default location is `.dvc/cache`).
3. Replace the file by a link to the file in the cache (see details below).
4. Create a corresponding [DVC-file](/doc/user-guide/dvc-file-format) and store
the checksum to identify the cache entry.
5. Add the target(s) to `.gitignore` (if Git is used in this workspace) to
prevent it from being committed to the Git repository.
the MD5 checksum to identify the cache entry.
5. Add the targets to `.gitignore` (if Git is used in this
<abbr>workspace</abbr>) to prevent it from being committed to the Git
repository.
6. Instructions are printed showing `git` commands for adding the files to a Git
repository. If a different SCM system is being used, use the equivalent
command for that system or nothing is printed if `--no-scm` was specified for
@@ -69,12 +70,13 @@ to work with directory hierarchies with `dvc add`.
the single DVC-file points to a file in the DVC cache that contains
references to the files in the added hierarchy.

In a DVC project `dvc add` can be used to version control any <abbr>data
artifact</abbr> (input, intermediate, or output files and directories, and model
files). It is useful by itself to go back and forth between different versions
of datasets or models. Usually though, it is recommended to use `dvc run` and
`dvc repro` mechanism to version control intermediate and final results (like
models). This way you bring data provenance and make your project reproducible.
In a <abbr>DVC project</abbr>, `dvc add` can be used to version control any
<abbr>data artifact</abbr> (input, intermediate, or output files and
directories, and model files). It is useful by itself to go back and forth
between different versions of datasets or models. Usually though, it is
recommended to use `dvc run` and `dvc repro` mechanism to version control
intermediate and final results (like models). This way you bring data provenance
and make your project reproducible.

## Options

7 changes: 4 additions & 3 deletions static/docs/commands-reference/cache/index.md
@@ -17,12 +17,13 @@ positional arguments:

After DVC initialization, a hidden directory `.dvc/` is created with the
[DVC internal files](/doc/user-guide/dvc-files-and-directories), including the
default `cache` directory.
default cache directory.

The DVC cache is where your data files, models, etc (anything you want to
version with DVC) are actually stored. The corresponding files you see in the
workspace simply link to the ones in cache. (See `dvc config cache`, `type`
config option, for more information on file links on different platforms.)
<abbr>workspace</abbr> simply link to the ones in cache. (See
`dvc config cache`, `type` config option, for more information on file links on
different platforms.)

> For more cache-related configuration options refer to `dvc config cache`.
32 changes: 17 additions & 15 deletions static/docs/commands-reference/checkout.md
@@ -1,6 +1,7 @@
# checkout

Update data files and directories in workspace based on current DVC-files.
Update data files and directories in the <abbr>workspace</abbr> based on current
DVC-files.

## Synopsis

@@ -15,14 +16,14 @@ positional arguments:

## Description

[DVC-files](/doc/user-guide/dvc-file-format) in the workspace specify which
instance of each data file or directory is to be used, using the checksum saved
in the `outs` fields. The `dvc checkout` command updates the workspace data to
match with the cache files corresponding to those checksums.
[DVC-files](/doc/user-guide/dvc-file-format) in a <abbr>DVC project</abbr>
specify which instance of each data file or directory is to be used, using the
checksum saved in the `outs` fields. The `dvc checkout` command updates the
workspace data to match with the cache files corresponding to those checksums.

Using an SCM like Git, the DVC-files are kept under version control. At a given
branch or tag of the SCM workspace, the DVC-files will contain checksums for the
corresponding data files kept in the DVC cache. After an SCM command like
branch or tag of the SCM repository, the DVC-files will contain checksums for
the corresponding data files kept in the DVC cache. After an SCM command like
`git checkout` is run, the DVC-files will change to the state at the specified
branch or commit or tag. Afterwards, the `dvc checkout` command is required in
order to synchronize the data files with the currently checked out DVC-files.
@@ -38,15 +39,16 @@ The execution of `dvc checkout` does:
data files. The scanned DVC-files is limited by the listed `targets` (if any)
on the command line. And if the `--with-deps` option is specified, it scans
backward from the given `targets` in the corresponding
[pipeline](/doc/get-started/pipeline).
[pipeline](/doc/commands-reference/pipeline).

- For any data files where the checksum doesn't match their DVC-file entry, the
data file is restored from the cache. The link strategy used (`reflink`,
`hardlink`, `symlink`, or `copy`) depends on the OS and the configured value
for `cache.type` – See `dvc config cache`.

Note that this DVC by default tries NOT to copy files between the cache and the
workspace by using reflinks when supported by the file system. (Refer to
Note that this command by default tries NOT to copy files between the cache and
the workspace, using reflinks instead when supported by the file system. (Refer
to
[File link types](/docs/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache).)
The next linking strategy default value is `copy` though, so unless other file
link types are manually configured in `cache.type` (using `dvc config`), files
@@ -77,10 +79,10 @@ be pulled from a remote cache using `dvc pull`.
## Options

- `-d`, `--with-deps` - determine files to update by tracking dependencies to
the target DVC-file(s) (stages). This option only has effect when one or more
the target DVC-files (stages). This option only has effect when one or more
`targets` are specified. By traversing all stage dependencies, DVC searches
backward from the target stage(s) in the corresponding pipeline(s). This means
DVC will not checkout files referenced in later stage(s) than `targets`.
backward from the target stages in the corresponding pipelines. This means DVC
will not checkout files referenced in later stages than the `targets`.

- `-R`, `--recursive` - `targets` is expected to contain at least one directory
path for this option to have effect. Determines the files to checkout by
@@ -103,8 +105,8 @@ be pulled from a remote cache using `dvc pull`.

## Examples

Let's employ a simple workspace with some data, code, ML models, pipeline
stages, as well as a few Git tags, such as our
Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
pipeline stages, as well as a few Git tags, such as our
[get started example repo](https://github.com/iterative/example-get-started).
Then we can see what happens with `git checkout` and `dvc checkout` as we switch
from tag to tag.
26 changes: 12 additions & 14 deletions static/docs/commands-reference/commit.md
@@ -18,9 +18,9 @@ positional arguments:

The `dvc commit` command is useful for several scenarios where a dataset is
being changed: when a [stage](/doc/commands-reference/run) or
[pipeline](/doc/get-started/pipeline) is in development, when one wishes to run
commands outside the control of DVC, or to force DVC-file updates to save time
tying stages or a pipeline.
[pipeline](/doc/commands-reference/pipeline) is in development, when one wishes
to run commands outside the control of DVC, or to force DVC-file updates to save
time tying stages or a pipeline.

- Code or data for a stage is under active development, with rapid iteration of
code, configuration, or data. Run DVC commands (`dvc run`, `dvc repro`, and
@@ -43,7 +43,7 @@ tying stages or a pipeline.

The last two use cases are **not recommended**, and essentially force update the
DVC-files and save data to cache. They are still useful, but keep in mind that
DVC can't guarantee reproducibility in those cases - you commit any data your
DVC can't guarantee reproducibility in those cases – You commit any data you
want. Let's take a look at what is happening in the fist scenario closely:

Normally DVC commands like `dvc add`, `dvc repro` or `dvc run`, commit the data
@@ -52,7 +52,7 @@ to the DVC cache as the last step. What _commit_ means is that DVC:
- Computes a checksum for the file/directory
- Enters the checksum and file name into the DVC-file
- Tells the SCM to ignore the file/directory (e.g. add entry to `.gitignore`)
(Note that if the workspace was initialized with no SCM support
(Note that if the <abbr>workspace</abbr> was initialized with no SCM support
(`dvc init --no-scm`), this does not happen.)
- Adds the file/directory or to the DVC cache

@@ -67,10 +67,10 @@ into play. It handles that last step of adding the file to the DVC cache.
## Options

- `-d`, `--with-deps` - determine files to commit by tracking dependencies to
the target DVC-file(s) (stages). This option only has effect when one or more
the target DVC-files (stages). This option only has effect when one or more
`targets` are specified. By traversing all stage dependencies, DVC searches
backward from the target stage(s) in the corresponding pipeline(s). This means
DVC will not commit files referenced in later stage(s) than `targets`.
backward from the target stages in the corresponding pipelines. This means DVC
will not commit files referenced in later stages than the `targets`.

- `-R`, `--recursive` - `targets` is expected to contain at least one directory
path for this option to have effect. Determines the files to commit by
@@ -90,10 +90,10 @@ into play. It handles that last step of adding the file to the DVC cache.

## Examples

Let's employ a simple workspace with some data, code, ML models, pipeline
stages, such as the DVC project created in our [Get Started](/doc/get-started)
section. Then we can see what happens with `git commit` and `dvc commit` in
different situations.
Let's employ a simple <abbr>workspace</abbr> with some data, code, ML models,
pipeline stages, such as the <abbr>DVC project</abbr> created in our
[Get Started](/doc/get-started) section. Then we can see what happens with
`git commit` and `dvc commit` in different situations.

<details>

@@ -122,8 +122,6 @@ Download the precomputed data using:
$ dvc pull --all-branches --all-tags
```

This data will be retrieved from a preconfigured remote cache.

</details>

## Example: Rapid iterations
20 changes: 10 additions & 10 deletions static/docs/commands-reference/config.md
@@ -51,7 +51,7 @@ corresponding config file.
## Configuration sections

These are the `name` parameters that can be used with `dvc config`, or the
sections in the project config file (`.dvc/config`).
sections in the <abbr>DVC project</abbr> config file (`.dvc/config`).

### core

@@ -98,21 +98,21 @@ for more details.)
> option, properly transforming paths relative to the current working
> directory into paths relative to the config file location.
- `cache.protected` - makes files in the workspace read-only. Possible values
are `true` or `false` (default). Run `dvc checkout` for the change go into
effect. (It affects only files that are under DVC control.)
- `cache.protected` - make files under DVC control read-only. Possible values
are `true` or `false` (default). Run `dvc checkout` for the change to go into
effect.

Due to the way DVC handles linking between the data files in the cache and
their counterparts in the workspace, it's easy to accidentally corrupt the
cached version of a file by editing or overwriting it. Turning this config
option on forces you to run `dvc unprotect` before updating a file, providing
an additional layer of security to your data.
their counterparts in the <abbr>workspace</abbr>, it's easy to accidentally
corrupt the cached version of a file by editing or overwriting it. Turning
this config option on forces you to run `dvc unprotect` before updating a
file, providing an additional layer of security to your data.

It's highly recommended to enable this mod when `cache.type` is set to
`hardlink` or `symlink`.

- `cache.type` - link type that DVC should use to link data files from cache to
your workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a
the workspace. Possible values: `reflink`, `symlink`, `hardlink`, `copy` or a
combination of those, separated by commas e.g: `reflink,hardlink,copy`.

By default, DVC will try `reflink,copy` link types in order to choose the most
@@ -188,7 +188,7 @@ Set the `dvc` log level to `debug`:
$ dvc config core.loglevel debug
```

Add an S3 remote and set it as the project default:
Add an S3 remote and set it as the <abbr>project</abbr> default:

> **Note!** Before adding a new remote be sure to login into AWS services and
> follow instructions at
20 changes: 10 additions & 10 deletions static/docs/commands-reference/destroy.md
@@ -12,13 +12,13 @@ usage: dvc destroy [-h] [-q | -v] [-f]

## Description

It removes DVC-files, and the entire `.dvc/` meta directory from the current
workspace. Note that the <abbr>DVC cache</abbr> will normally be removed as
well, unless it's set to an external location with `dvc cache dir`. (By default
a local cache is located in the `.dvc/cache` directory.) If you were using
[symlinks for linking data](/doc/user-guide/large-dataset-optimization) from the
cache, DVC will replace them with copies, so that your data is intact after the
DVC repository destruction.
`dvc destroy` removes DVC-files, and the entire `.dvc/` meta directory from the
<abbr>workspace</abbr>. Note that the <abbr>DVC cache</abbr> will normally be
removed as well, unless it's set to an external location with `dvc cache dir`.
(By default a local cache is located in the `.dvc/cache` directory.) If you were
using [symlinks for linking data](/doc/user-guide/large-dataset-optimization)
from the cache, DVC will replace them with copies, so that your data is intact
after the DVC repository destruction.

## Options

@@ -64,7 +64,7 @@ $ dvc cache dir /mnt/cache
$ dvc add foo
```

`dvc cache dir` changed the location of cache storage to exernal location.
`dvc cache dir` changed the location of cache storage to external location.
Content of DVC repository:

```dvc
@@ -96,8 +96,8 @@ yes
```

`dvc destroy` command removed DVC-files, and the entire `.dvc/` meta directory
from the workspace. But the cache files that are present in the `/mnt/cache`
directory still persist:
from the <abbr>workspace</abbr>. But the cache files that are present in the
`/mnt/cache` directory still persist:

```dvc
$ tree /mnt/cache
2 changes: 1 addition & 1 deletion static/docs/commands-reference/diff.md
@@ -114,7 +114,7 @@ reference these experiments.
### Click and expand to setup example

Having followed the previous example's setup, move into the
`example-get-started` directory. Then make sure that you have the latest code
`example-get-started/` directory. Then make sure that you have the latest code
and data with the following commands.

```dvc