guide: Basic Operations (Data Mgmt) #4053
Force-pushed: bfae17c to 3df841e
Link Check Report: All 32 links passed!
comments with the commands related to Sync'ing and Versioning
Force-pushed: 7baea5a to 6e5450e
Force-pushed: 9c93b5c to afb38ee
figure placeholder in Basic Ops page
[more details].

Putting it all together, we can get an overview of the data in a project with
`dvc data status`. This will list changes to DVC-tracked data as well as files
Not sure why `dvc data status` is not auto-linking (see in deployment). I do see it in https://github.com/iterative/gatsby-theme-iterative/blob/main/packages/gatsby-theme-iterative/config/prismjs/dvc-commands.js (Cc @iterative/websites)
Alright. So this one is done as originally intended (see OP). Questions:
I'll leave it up to you, @dberenbaum @shcheklein. Turned it into a draft for now. Thanks
* guide: draft structure of Data Mgmt and some updates around the topic in existing docs
* guide: full text for draft intro to DM
* guide: hide cloud versioning info per #4042 (review)
* guide: clarify Data Mgmt parts and add prospective figure titles
* guide: add figure drafts to Data Mgmt
* guide: SCM->VC (Data Mgmt)
* guide: update 2 figs and add 1 more (Data Mgmt)
* guide: roll back unrelated changes per #4042 (review)
* guide: mention clouds first (DM) and and update fig. 1 per #4042 (review)
* guide: flatten DM index per #4042 (review)
* guide: udpates to DM/ DV moved from #4053 (review)
* guide: add DM/ Data Versioning page per #4042 (comment)
* guide: update outdated link
* guide: revert more unrelatedly chaqnged files per #4042 (review)
* guide: remove unused ref link
* guide: DM/ Remote Storage (not just Setup) and and some links from cmd refs and avoid term "data remote" and some admons nearby...
* guide: remove a comment
* guide: draft for DM/ Remote Storage content
* ref: expand config.remote and link to/from Remotes guide
* ref: fix remote config file examples
* guide: complete Remote Config section and and add Project config section to DM/ DV guide
* ref: rewrite remote add and modify Descs
* guide: complete list of supported storage types
* ref: rewrite remote index page from extracted from #4053
* guide: clarify `remote modify` phrase in in the Remote config section of DM/ Remote Storage
* Update content/docs/user-guide/data-management/data-versioning.md
* guide: update versioning config per #4058 (review)
* guide: don't call remote storage "additional" here (in the DM/ Remote Storage guide) per #4058 (review)
  Co-authored-by: Dave Berenbaum
* guide: pull -> download (DM/ RS intro)
* guide: remove "optional" from Remote Storage nav & title per #4058 (review)
* guide: splits and notes around Data Mgmt index page rel. #4042 (comment)
* guide: Data Mgmt intro + note updates
* guide: draft of all contents + + remove comments
* guide: small impros to Data Mgmt in prep for #4042 (review)
* guide: rewrite Data Mgmt index in before/after form per #4042 (review)
* guide: add draft figure for Data Mgmt
* guide: simplify/refocus data mgmt index per #4042 (review)
* work around commented header bug
* guide: drop DM/ DV page
* guide: rewrite DM intro and - hide benefits (for now) - remove codification comment block
* guide: use DM table instead of figure for now
* guide: rewrite Data Mgmt story
* guide: add draft figures to Data Mgmt
* guide: simplify Data Mgmt story and benefits
* guide: remove unused images (DM)
* guide: update Data Mgmt figures (v1)
* guide: rewrite text of Data Mgmt index
* guide: update Data Mgmt figures
* guide: iterate on Data Mgmt again
* guide: update Data Mgmt figs
* guide: more supporting info about Data Mgmt
* guide: update figures (much more concrete) and and matching text updates
* guide: edits to How it works (Data Mgmt)
* guide: update Data Mgmt figures Rel. #4042 (comment)
* guide: emphaisze dataset versions in UG fig 1 Rel. #4042 (comment)
* guide: update Data Mgmt figures (with notes), expand img captions, and update text accordingly.
* guide: more updates to text and figure styles, esp. to the first half and comment some stuff out (temporary)
* guide: update figures and text (Data Mgmt) ... Using a tabs toggle for the 2nd fig.
* guide: Data Management text (section 1) finalized for this version of figures
* guide: Data Management (main text) finalized for this version of figures
* guide: Data Management (secondary text) pending diagram and code sample(s)
* guide: add DVC data mgmt technical diagram & dummy sample CLI blocks
* guide: update Data Mgmt text
* guide: udpate text and 2nd figure (Data Mgmt)
* guide: draft 2nd and 3rd figures
* guide: rewrite Data Mgmt/ How it works & and Benefits/ Tradeoffs Probably still unfinished... Missing more data versioning info? See HTML comments.
* guide: update drafts of Data Mgmt figures 2, 3
* guide: Data Mgmt improvements and hide the benefits list for now
* guide: separate from Data Mgmt work Rel. #4042
* Apply suggestions from code review
* Merge branch main +
* other: links to Remotes guide
* install: Remote Storage guide links
* start: Remote Storage guide links +
* guide: links to Remote Storage page
* Restyled by prettier (#4323)
  Co-authored-by: Restyled.io

---------
Co-authored-by: Dave Berenbaum
Co-authored-by: rogermparent
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io
Alright the PR looks good now. See the pending questions above though. Summary of changes thus far:
# Track and Sync Versioned Data & Models

The fundamental workflow of most <abbr>DVC projects</abbr> includes the
following **basic operations**. These can be performed directly (as we cover
This is the main file being contributed.
and [data sync operations], and provides data de-duplication at the file level.
However, this comes with the drawback of losing human-readable filenames without
the use of the DVC CLI (`dvc get --show-url`) or API (`dvc.api.get_url()`).

When using cloud versioning, DVC does not provide de-duplication, and certain
remote storage performance optimizations will be unavailable.

[content-addressable storage]:
  /doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
[data sync operations]:
  /doc/user-guide/data-management/track-sync-data#synchronizing-data
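
To make the trade-off in this hunk concrete, here is a minimal Python sketch of content-addressable storage. It is an illustration only, not DVC's implementation: the helper names (`cache_path`, `add_to_cache`, `demo`), the use of MD5, and the two-character directory prefix are assumptions chosen for the example. It shows why identical files are stored once (file-level de-duplication) and why the stored paths no longer carry the original filenames.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, content: bytes) -> Path:
    """Derive a storage path from the content hash (hypothetical layout)."""
    digest = hashlib.md5(content).hexdigest()
    # Assumed layout: first two hex characters as a directory, rest as the file name.
    return cache_dir / digest[:2] / digest[2:]


def add_to_cache(cache_dir: Path, src: Path) -> Path:
    """Copy src into the cache unless identical content is already stored."""
    dest = cache_path(cache_dir, src.read_bytes())
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
    return dest


def demo() -> tuple[Path, Path, int]:
    """Cache two files with identical content; return both cache-relative paths
    and the number of files actually stored."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        (root / "train.csv").write_bytes(b"a,b\n1,2\n")
        (root / "train_copy.csv").write_bytes(b"a,b\n1,2\n")
        first = add_to_cache(cache, root / "train.csv")
        second = add_to_cache(cache, root / "train_copy.csv")
        stored = sum(1 for p in cache.rglob("*") if p.is_file())
        return first.relative_to(cache), second.relative_to(cache), stored


if __name__ == "__main__":
    print(demo())
```

Both source files map to the same cache entry, so only one copy is kept, and the entry's path is derived from the hash rather than from `train.csv`. That is the reason a lookup step (in DVC, the CLI or API calls named in the hunk) is needed to get from a tracked filename to its stored location.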
Most other changes just add links to the new page (mainly to the Sync section).
- ## Basic workflow: store as peristent commits
+ ## Basic workflow: store as persistent commits
This is an unrelated fix, oops. Found it when looking for places to link from (but didn't end up linking in this file).
* ref: update links from API to Remotes guide
* guide: update links around Remote Storage and and other updates to nearby Markdown (e.g. proper admons)
* Roll back unrelated changes
* Restyled by prettier (#4261)
  Co-authored-by: Restyled.io
* ref: bring cloud versioning copy edits of import-url from https://github.com/iterative/dvc.org/pull/4260/files#diff-ef95e18c4bd039757695065a23946dc27e28b4727ce07c670cdc096e34dbe3b3
* ref: clarify import-url with cloud versioning per #4142 (review)
* ref: updates to import-url --version-aware and update --rev
* ref: add import-url --version aware to Synopsis per #4089 (comment)
* Restyled by prettier (#4266)
  Co-authored-by: Restyled.io
* Restyled by prettier (#4322)
  Co-authored-by: Restyled.io
* Update content/docs/command-reference/remote/modify.md
  Co-authored-by: Oded Messer
* Update content/docs/command-reference/remote/modify.md
  Co-authored-by: Oded Messer
* Update content/docs/command-reference/push.md
  Co-authored-by: Oded Messer
* yarn format-all

---------
Co-authored-by: Dave Berenbaum
Co-authored-by: rogermparent
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io
Co-authored-by: Oded Messer
My take on this one: I don't see anywhere else in the data management guide that we have the basic workflow explained, so I think this one is useful for that. Happy to try and get it polished if you agree @shcheklein. I would consider incorporating https://dvc.org/doc/user-guide/how-to/update-tracked-data and/or https://dvc.org/doc/user-guide/how-to/stop-tracking-data.
@dberenbaum sounds good to me!
@jorgeorpinel Do you want to finish this one, or would you rather I take it over?
Let me try to wrap it up first @dberenbaum 🙂 (Sorry for the delay)
@dberenbaum is this PR still relevant? If yes, maybe you could complete it as you suggested?
@tapadipti It is relevant but it doesn't seem like we are able to prioritize it right now, so I'll close.
To finish addressing the 2nd check box in #2856 (comment). Planned structure:
… list, gets, imports & update)
Main file to review: UG/data-management/track-sync-version.md
In review app: https://dvc-org-guide-data-mgmt-epdkkq.herokuapp.com/doc/user-guide/data-management/track-sync-data
UPDATE: Above done ✅