guide: add Configuration guide #4379

Merged
merged 6 commits into from
Mar 16, 2023
2 changes: 1 addition & 1 deletion content/docs/command-reference/add.md
@@ -58,7 +58,7 @@ other DVC commands), a few actions are taken under the hood:
[remote storage]: /doc/user-guide/data-management/remote-storage
[structure of cache directory]:
/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
-[`core.autostage`]: /doc/command-reference/config#core
+[`core.autostage`]: /doc/user-guide/project-structure/configuration#core

Summarizing, the result is that the target data is replaced by small `.dvc`
files that can be easily tracked with Git.
24 changes: 14 additions & 10 deletions content/docs/command-reference/checkout.md
@@ -45,15 +45,18 @@ after `git checkout`. See the
for more details.

By default, this command tries not make copies of cached files in the workspace,
-using reflinks instead when supported by the file system (refer to
-[File link types](/doc/user-guide/data-management/large-dataset-optimization#file-link-types-for-the-dvc-cache)).
-The next linking strategy default value is `copy` though, so unless other file
-link types are manually configured in `cache.type` (using `dvc config`), files
-will be copied. Keep in mind that having file copies doesn't present much of a
-negative impact unless the project uses very large data (several GBs or more).
-But leveraging file links is crucial with large files, for example when checking
-out a 50Gb file by copying might take a few minutes whereas, with links,
-restoring any file size will be almost instantaneous.
+using reflinks instead when supported by the file system (refer to [File link
+types]). The next linking strategy default value is `copy` though, so unless
+other file link types are manually configured in [`cache.type`]), files will be
+copied. Keep in mind that having file copies doesn't present much of a negative
+impact unless the project uses very large data (several GBs or more). But
+leveraging file links is crucial with large files, for example when checking out
+a 50Gb file by copying might take a few minutes whereas, with links, restoring
+any file size will be almost instantaneous.
+
+[File link types]:
+/doc/user-guide/data-management/large-dataset-optimization#file-link-types-for-the-dvc-cache
+[`cache.type`]: /doc/user-guide/project-structure/configuration#cache

> When linking files takes longer than expected (10 seconds for any one file)
> and `cache.type` is not set, a warning will be displayed reminding users about
@@ -95,7 +98,8 @@ situation. In some cases, the data can be pulled from [remote storage] using

- `--relink` - ensures the file linking strategy (`reflink`, `hardlink`,
`symlink`, or `copy`) for all data in the workspace is consistent with the
-project's [`cache.type`](/doc/command-reference/config#cache). This is
+project's
+[`cache.type`](/doc/user-guide/project-structure/configuration#cache). This is
achieved by restoring **all data files or directories** referenced in current
DVC files (regardless of whether the files/dirs were already present).
