Fix Data Access mentions tracking models without clarifying what kind #2096 #2188
Update:
of the project? How do I download a model to deploy it? How do I download a
specific version of a model? How do I reuse datasets across different projects?
Okay, now that we've learned how to _track_ data files in DVC and how to version
them with Git. _Models_ in a machine learning project are also files written and
I like the change, but it feels a bit misplaced: it splits a paragraph that was written as a single piece. After these two sentences it feels like the whole section will be about models, but then we continue with artifacts and with "How do I reuse datasets across different projects?".
Also, I'm not sure it addresses issue #2096. It feels like we should put your sentence somewhere earlier? wdyt?
I'm even wondering whether it's worth creating a separate section, Models Versioning, to be extra explicit and to optimize SEO. Otherwise it'll be a constant battle: people read Data Versioning and do not connect the dots (which is expected).
Btw, the section itself can be super simple at first: it only needs to explain that, for us, models are pretty much the same as data.
Another alternative is to change https://dvc.org/doc/start/data-versioning instead, to avoid the confusion in the first place. I'm thinking specifically around "few commands that you can run along with git to track large files, directories, or ML models".
I'm thinking specifically around "few commands that you can run along with git to track large files, directories, or ML models".
Unfortunately, I think it won't be enough. It's hard to beat the "Data Versioning" title.
I agree that another document about model versioning between Data Versioning and Data Access might be better. It can be an overview of changing runtime parameters for models through params.yaml or run-cache. These could be a bit complex for a Get Started document, but I think a simple introduction along the lines of "we put these parameters here, run this script, and create these different models" can be written.
But usually people use already existing models at the beginning. We may need to discuss access first and versioning later.
Maybe we can just remove models from these documents and focus on data. Data Versioning ➡️ Data Access ➡️ Model Versioning can be another path.
We can also discuss Model Versioning as a section at the end of Data Versioning for now.
I'm fine with any of these solutions. We can continue to discuss this in #2096
BTW mentioning just the issue number (without "fix", "fixes") in the title should be OK for automated closing of issues. (At least that's what VS Code did while sending the PR 😃 ) I'll put "fixes" for clarity though.
Sounds good to me, @iesahin @jorgeorpinel !
should we also rename Data Access?
"Access Data and Models"? Yes perhaps. Or "Access Artifacts" maybe
Artifacts: we'll have to define what an artifact is then, unfortunately. It feels like we should not over-optimize this and should do the simplest, most explicit thing.
…ntent to frontmatter (#2192)
Let me revise the Versioning doc and submit to this branch.
Question: Can we make URL redirects from
…com:iterative/dvc.org into iesahin/issue2096
Good question. Usually we change the URL with the title, but there are a few places where they don't match exactly (e.g. /start = Get Started). Maybe in this case we should keep /data-versioning, as the new title still sort of matches it and dealing with the redirects can be painful later.
@jorgeorpinel in this specific case, I think "model" should be part of the URL for SEO? ("Get Started" vs. /start is not that important.) We can probably remove some old redirects at this point if it gets too complicated.
@iesahin looks like some unrelated commits got there?
I think the title and content already make proper use of the term "model" so I doubt the URL matters much. But TBH Idk, if you prefer to be safe then sure, we can add a redirect (please see /redirects-list.json @iesahin).
I think I messed up with VS Code's GitHub extension and applied some unrelated changes to the branch while merging or whatever. 🤦🏼 Restarting this on a new PR seems to be the easiest.
- Changes title of "Data Access" to "Data and Model Access"
- Changes title of "Data Versioning" to "Data and Model Versioning"
- Renames path of Data Access and Data Versioning to `data-and-model-access.md` and `data-and-model-versioning.md` respectively.
- Adds redirects
  - `/doc/start/data-access` -> `/doc/start/data-and-model-access`
  - `/doc/start/data-versioning` -> `/doc/start/data-and-model-versioning`
- Replaces links in `/doc/start` with the new links.
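As a rough illustration of the link replacement described above, here is a minimal Python sketch that rewrites the two old paths to the new ones in a piece of text. The `REDIRECTS` table and the `rewrite_links` helper are hypothetical names made up for this example; they are not part of the dvc.org codebase and do not reflect the actual format of `redirects-list.json`.

```python
import re

# Hypothetical mapping that mirrors the two redirects listed in this PR.
REDIRECTS = {
    "/doc/start/data-access": "/doc/start/data-and-model-access",
    "/doc/start/data-versioning": "/doc/start/data-and-model-versioning",
}

# Match an old path only when it is not followed by another word character
# or hyphen, so a longer path that merely starts with it is left untouched.
_OLD_PATHS = re.compile(
    "(" + "|".join(re.escape(old) for old in REDIRECTS) + r")(?![\w-])"
)


def rewrite_links(text: str) -> str:
    """Replace old Get Started paths in `text` with their new locations."""
    return _OLD_PATHS.sub(lambda match: REDIRECTS[match.group(1)], text)


if __name__ == "__main__":
    sample = "See [Data Versioning](/doc/start/data-versioning#tracking)."
    print(rewrite_links(sample))
```

Because the new paths do not contain the old ones, running the helper twice over the same text leaves it unchanged, and fragments such as `#tracking` are carried over as they are.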
This PR is closed in favor of #2214. Will continue to discuss there.
) (#2214)
* guide: disclaim x data (impro #2104)
* Added changes from PR #2188 and modified paths & titles
  - Changes title of "Data Access" to "Data and Model Access"
  - Changes title of "Data Versioning" to "Data and Model Versioning"
  - Renames path of Data Access and Data Versioning to `data-and-model-access.md` and `data-and-model-versioning.md` respectively.
  - Adds redirects
    - `/doc/start/data-access` -> `/doc/start/data-and-model-access`
    - `/doc/start/data-versioning` -> `/doc/start/data-and-model-versioning`
  - Replaces links in `/doc/start` with the new links.
* Update redirects-list.json with fixed subsection redirects. Co-authored-by: Jorge Orpinel <[email protected]>
* Fixed incomplete looking sentence
* merged into a single paragraph
* Divided models sentence and added "large files" phrase.
* Adds new paths to sidebar
* Updated links to data-access and data-versioning cmd ref
* updated links to data-access and data-versioning in blog
* Updated links to data-access and data-versioning in UC
* Updated links to data-access and data-versioning in UG
* updated yarn.lock
* Update content/docs/start/data-and-model-versioning.md Co-authored-by: Jorge Orpinel <[email protected]>
* Restyled by prettier
* fixes hardcoded links to data-and-model-access in the blog
* minor fixes
* guide: revert Exp Outs guide rename per #2154 (review)
* start: emphasize models are files (assumption)
* start: roll back unnecessary changes unnecessary for #2214

Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Emre Sahin <iex@levinas>
Co-authored-by: Restyled.io <[email protected]>
Fixes #2096