start: Data Access and Data Versioning to mention Model in titles (#2096) #2214
Changes from 20 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
--- | ||
title: 'Get Started: Data Versioning' | ||
description: 'Get started with data versioning in DVC. Learn how to use a | ||
title: 'Get Started: Data and Model Versioning' | ||
description: 'Get started with data and model versioning in DVC. Learn how to use a | ||
regular Git workflow for datasets and ML models, without storing large files in | ||
Git.' | ||
--- | ||
|
@@ -247,6 +247,16 @@ defines data file versions. Git itself provides the version control. DVC in turn | |
creates these `.dvc` files, updates them, and synchronizes DVC-tracked data in | ||
the <abbr>workspace</abbr> efficiently to match them. | ||
|
||
## Model versioning | ||
|
||
DVC helps you to handle model files as well. Models in a project usually change | ||
more frequently than data files and they need to be kept in sync with changes in | ||
other elements of a project. Model files are no different than data files when | ||
it comes to tracking their versions. DVC also provides means to track minor | ||
changes in model files without fully checking in to Git. In later sections of | ||
this series, you'll see how DVC enables to track changes to synchronize multiple | ||
model and data files. | ||
I'm still not convinced we need this new section. We already say "data and models" in every section (except in Retrieving, which we should fix), so if the gist here is that models are tracked like any other file, I think that's already implied in every other section. Also, it could probably be condensed a bit (see feedback below), and then it would be too short for a whole section anyway (it could be moved to right before Storing and sharing, if anything).
|
||
|
||
jorgeorpinel marked this conversation as resolved.
|
||
## Large datasets versioning | ||
|
||
In cases where you process very large datasets, you need an efficient mechanism | ||
|
This change seems to break the semantics of the paragraph. TBH I don't think it's necessary; we already state "data and models".
I do like the correction to the first sentence ("... , and how to commit their versions to Git"), but the next 2 sentences are out of context here.