guide: Tracking data guide and comments with the commands related to Sync'ing and Versioning
jorgeorpinel committed Oct 19, 2022
1 parent 3df841e commit 6e5450e
Showing 2 changed files with 51 additions and 2 deletions.
2 changes: 1 addition & 1 deletion content/docs/user-guide/data-management/index.md
@@ -83,7 +83,7 @@ Refer to [Versioning Data and Models] to learn more.
</admon>

[version control]:
https://www.atlassian.com/git/tutorials/what-is-version-control
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
[git workflows]: https://www.atlassian.com/git/tutorials/comparing-workflows

<!--
51 changes: 50 additions & 1 deletion content/docs/user-guide/data-management/track-sync-version.md
@@ -11,12 +11,61 @@ it's also included automatically in more advanced features like [pipelining] and

## Tracking data

...
DVC is [similar to Git] in this area. To start tracking large files or
directories, "add" them to DVC with the `dvc add` command. This hides the data
from Git, moves it to the <abbr>cache</abbr>, [links it] back to the
<abbr>workspace</abbr>, and creates an accompanying `.dvc` file (visible to
Git). Now your code (including DVC metafiles) is physically separated from your
data!
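
To make those steps concrete, here is a minimal Python sketch of the idea. It is
a toy model under stated assumptions, not DVC's implementation: `toy_add`, the
`.toy_cache` directory, and the JSON `.toy` metafile are hypothetical names
invented for illustration (real `.dvc` files use a different format).

```python
import hashlib
import json
import os
import shutil
import tempfile
from pathlib import Path


def toy_add(path: Path, cache_dir: Path) -> Path:
    """Toy model of the steps above; not DVC's actual implementation."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()

    # 1. Hide the data from Git by listing it in .gitignore.
    with open(path.parent / ".gitignore", "a") as ignore:
        ignore.write(f"/{path.name}\n")

    # 2. Move the data into a cache keyed by its content hash.
    cached = cache_dir / digest
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(cached))

    # 3. Link it back into the workspace (fall back to a copy).
    try:
        os.link(cached, path)
    except OSError:
        shutil.copy2(cached, path)

    # 4. Write a small metafile that Git can track instead of the data.
    meta = path.with_name(path.name + ".toy")
    meta.write_text(json.dumps({"md5": digest, "path": path.name}))
    return meta


# Usage: "add" a small file inside a throwaway workspace.
workspace = Path(tempfile.mkdtemp())
data = workspace / "data.csv"
data.write_bytes(b"a,b\n1,2\n")
meta = toy_add(data, workspace / ".toy_cache")
```

After the call, the workspace still contains `data.csv`, but the only file that
needs to go into Git is the small metafile that records the content hash.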

<admon type="info">

`.dvc` files can be tracked (and versioned) with Git directly. This ties
everything together (more about this in [Versioning data](#versioning-data)
below).

</admon>

To check what's happening with the data in your project, use `dvc data status`.
This will list changes to DVC-tracked data as well as files unknown to DVC (or
Git).
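
Conceptually, a status check like this compares the hash recorded for each
tracked file with the hash of what is currently in the workspace. The sketch
below illustrates that comparison only. The `toy_status` function, the JSON
`.toy` metafiles, and the status labels are assumptions made for the example and
do not reproduce the output of `dvc data status`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def toy_status(workspace: Path) -> dict:
    """Classify files as unchanged, modified, deleted, or untracked."""
    # Hashes recorded in the (hypothetical) metafiles.
    recorded = {}
    for meta in workspace.glob("*.toy"):
        entry = json.loads(meta.read_text())
        recorded[entry["path"]] = entry["md5"]

    status = {}
    for name, md5 in recorded.items():
        target = workspace / name
        if not target.exists():
            status[name] = "deleted"
        elif hashlib.md5(target.read_bytes()).hexdigest() != md5:
            status[name] = "modified"
        else:
            status[name] = "unchanged"

    # Anything without a metafile is unknown to the toy tracker.
    for path in workspace.iterdir():
        if path.is_file() and path.suffix != ".toy" and path.name not in recorded:
            status[path.name] = "untracked"
    return status


# Usage: one tracked file, one file the tracker has never seen.
workspace = Path(tempfile.mkdtemp())
tracked = workspace / "tracked.csv"
tracked.write_bytes(b"v1\n")
(workspace / "tracked.csv.toy").write_text(
    json.dumps({"md5": hashlib.md5(b"v1\n").hexdigest(), "path": "tracked.csv"})
)
(workspace / "notes.txt").write_bytes(b"new\n")

before = toy_status(workspace)
tracked.write_bytes(b"v2\n")  # edit the tracked file
after = toy_status(workspace)
```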

To capture changes to tracked data, run `dvc add` on it again, or use
`dvc commit`. Either command caches the latest data present in the workspace and
updates the `.dvc` files accordingly (changes visible to Git). If you need to
move or rename tracked data without content changes, use `dvc move`.
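
The effect of capturing a change can be sketched as follows: the new contents
are stored under a new hash, and the metafile is rewritten to point at it, while
the previous version stays in the cache. This is an illustrative toy, assuming a
hypothetical `toy_commit` helper, a flat `.toy_cache` directory, and JSON `.toy`
metafiles; none of these are DVC's own names or formats.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def toy_commit(path: Path, cache_dir: Path) -> str:
    """Cache the current contents of `path` and update its metafile."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():
        shutil.copy2(path, cached)  # a plain copy keeps the sketch simple

    meta = path.with_name(path.name + ".toy")
    meta.write_text(json.dumps({"md5": digest, "path": path.name}))
    return digest


# Usage: capture two successive versions of the same file.
workspace = Path(tempfile.mkdtemp())
cache = workspace / ".toy_cache"
data = workspace / "data.csv"

data.write_bytes(b"v1\n")
first = toy_commit(data, cache)

data.write_bytes(b"v2\n")         # the tracked data changes
second = toy_commit(data, cache)  # capture the change

meta = json.loads((workspace / "data.csv.toy").read_text())
cached_names = sorted(p.name for p in cache.iterdir())
```

Only the metafile changes from Git's point of view, so committing it is what
ties a data version to a point in the project history.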

Finally, to stop tracking data, use `dvc remove`. To also remove it from the
cache (either the latest or historic versions), use `dvc gc`. See [more
details].
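
The reason cleanup is a separate step is that the cache keeps every version it
has ever stored, whether or not anything still refers to it. A minimal sketch of
that kind of garbage collection, assuming a hypothetical `toy_gc` helper, a flat
cache directory, and JSON `.toy` metafiles (not DVC's real layout or logic):

```python
import json
import tempfile
from pathlib import Path


def toy_gc(workspace: Path, cache_dir: Path) -> list:
    """Delete cache entries that no metafile in the workspace references."""
    in_use = {
        json.loads(meta.read_text())["md5"] for meta in workspace.glob("*.toy")
    }
    removed = []
    for cached in cache_dir.iterdir():
        if cached.name not in in_use:
            cached.unlink()
            removed.append(cached.name)
    return sorted(removed)


# Usage: three cached objects, only one of them still referenced.
workspace = Path(tempfile.mkdtemp())
cache = workspace / ".toy_cache"
cache.mkdir()
for name in ("aaa", "bbb", "ccc"):  # hypothetical stand-ins for hashes
    (cache / name).write_bytes(b"data")

(workspace / "data.csv.toy").write_text(
    json.dumps({"md5": "bbb", "path": "data.csv"})
)

removed = toy_gc(workspace, cache)
remaining = [p.name for p in cache.iterdir()]
```

The key design choice is what counts as "in use". This sketch only looks at
metafiles in the current workspace, so versions referenced solely by older
commits would be deleted as well.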

<admon type="tip">

Other commands related to tracking data: `dvc unprotect`, `dvc import`, and
`dvc import-url`.

</admon>

[similar to git]:
https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository
[links it]: /doc/user-guide/data-management/large-dataset-optimization
[more details]: /doc/user-guide/how-to/stop-tracking-data

## Synchronizing data

...

<!--
remote add, modify, etc.
push
fetch
pull
-->

## Versioning data

...

<!--
dvc diff
-->
