Fix Data Access mentions tracking models without clarifying what kind #2096 #2188

Closed
wants to merge 5 commits into from

Contributor

@iesahin iesahin commented Feb 12, 2021

Fixes #2096

@shcheklein shcheklein temporarily deployed to dvc-landing-iesahin-iss-ukqqtk February 12, 2021 12:16 Inactive
@shcheklein
Member

Update: added Fixes #2096 to the description - GH will close it automatically.

of the project? How do I download a model to deploy it? How do I download a
specific version of a model? How do I reuse datasets across different projects?
Okay, now that we've learned how to _track_ data files in DVC and how to version
them with Git. _Models_ in a machine learning project are also files written and
Member

I like the change, but it feels a bit misplaced - it splits the previously written paragraph, which was written as a single piece. It feels like the whole section will be about models after these two sentences, but then we move on to artifacts and to "How do I reuse datasets across different projects?".

Also, I'm not sure it addresses issue #2096 - it feels like we should put your sentence somewhere earlier? wdyt?

I'm even wondering if it's worth creating a separate section, Models Versioning? To be extra explicit, and to optimize SEO. Otherwise it'll be a constant battle: people read Data Versioning and do not connect the dots (expected).

Member

Btw, the section itself can be super simple at first: one that explains that models are pretty much the same as data for us.

Contributor

@jorgeorpinel jorgeorpinel Feb 14, 2021

Another alternative is to change https://dvc.org/doc/start/data-versioning instead, to avoid the confusion in the first place. I'm thinking specifically around "few commands that you can run along with git to track large files, directories, or ML models".

Member

I'm thinking specifically around "few commands that you can run along with git to track large files, directories, or ML models".

Unfortunately, I think it won't be enough. It's hard to beat the "Data Versioning" title.

Contributor Author

I agree that another document about model versioning between Data Versioning and Data Access might be better. It can be an overview of changing runtime parameters for models through params.yaml or the run-cache. These could be a bit complex for a Get Started document, but I think a simple "we put these parameters here, run this script, and create these different models" kind of introduction can be written.

But usually people use already existing models at the beginning. We may need to discuss access first and versioning later.

Maybe we can just remove models from these documents and focus on data. Data Versioning ➡️ Data Access ➡️ Model Versioning can be another path.

We can also discuss Model Versioning as a section at the end of Data Versioning for now.

I'm fine with any of these solutions. We can continue to discuss this in #2096

BTW mentioning just the issue number (without "fix", "fixes") in the title should be OK for automated closing of issues. (At least that's what VS Code did while sending the PR 😃 ) I'll put "fixes" for clarity though.

Member

Sounds good to me, @iesahin @jorgeorpinel !

Member

should we also rename Data Access?

Contributor

@jorgeorpinel jorgeorpinel Feb 15, 2021

"Access Data and Models"? Yes perhaps. Or "Access Artifacts" maybe

Member

Artifacts - we'll have to define what an artifact is then, unfortunately. Feels like we should not be overoptimizing this and should do the simplest, most explicit thing.

Contributor Author

This should be done in 8dea963 (in the new PR #2214 )

Thanks.

…ommand flags (Fix #2179) (#2194)

* updated restyled bot config to be the same as prettierrc config via command flags

* fixed single-quote option
@iesahin
Contributor Author

iesahin commented Feb 16, 2021

Let me revise the Versioning doc and submit to this branch.

@iesahin
Contributor Author

iesahin commented Feb 16, 2021

Question: Can we make URL redirects from /doc/start/data-versioning to /doc/start/data-and-model-versioning or should we keep URLs intact? @jorgeorpinel @shcheklein

@shcheklein shcheklein temporarily deployed to dvc-landing-iesahin-iss-ukqqtk February 16, 2021 08:51 Inactive
@jorgeorpinel
Contributor

Can we make URL redirects from /doc/start/data-versioning to /doc/start/data-and-model-versioning

Good question. Usually we change the URL with the title but there are a few places where they don't match exactly (e.g. /start = Get Started). Maybe in this case we should keep /data-versioning as the new title still sort of matches that and dealing with the redirects can be painful later.

@shcheklein
Member

@jorgeorpinel in this specific case, I think "model" should be part of the URL for SEO? (get started vs start - is not that important). We can remove some old redirects probably at this point if it gets too complicated.

@shcheklein
Member

@iesahin looks like some unrelated commits got there?

@jorgeorpinel
Contributor

"model" should be part of the URL for SEO

I think the title and content already make proper use of the term "model" so I doubt the URL matters much. But TBH Idk, if you prefer to be safe then sure, we can add a redirect (please see /redirects-list.json @iesahin).

@iesahin
Contributor Author

iesahin commented Feb 18, 2021

I think I messed up with VS Code's GitHub extension and applied some unrelated changes to the branch while merging or whatever. 🤦🏼 Restarting this on a new PR seems to be the easiest.

iesahin added a commit that referenced this pull request Feb 18, 2021
- Changes title of "Data Access" to "Data and Model Access"
- Changes title of "Data Versioning" to "Data and Model Versioning"
- Renames path of Data Access and Data Versioning to
  `data-and-model-access.md` and `data-and-model-versioning.md`
  respectively.
- Adds redirects
-- `/doc/start/data-access` -> `/doc/start/data-and-model-access`
-- `/doc/start/data-versioning` ->
`/doc/start/data-and-model-versioning`
- Replaces links in `/doc/start` with the new links.
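
For readers unfamiliar with how such redirects behave, here is a minimal sketch of the idea using the two mappings listed in the commit message above. The `REDIRECTS` table and the `resolve` helper are hypothetical names introduced only for illustration; this is not the format of `redirects-list.json` nor how the dvc.org server implements redirects.

```python
# Hypothetical redirect table; only the two path pairs come from the commit message.
# This does NOT mirror the real redirects-list.json format.
REDIRECTS = {
    "/doc/start/data-access": "/doc/start/data-and-model-access",
    "/doc/start/data-versioning": "/doc/start/data-and-model-versioning",
}


def resolve(path: str) -> str:
    """Return the new location for an old path, keeping any sub-path or anchor."""
    for old, new in REDIRECTS.items():
        # Match the exact old path, or the old path followed by "/" or "#",
        # so unrelated paths that merely share a prefix are left alone.
        if path == old or path.startswith((old + "/", old + "#")):
            return new + path[len(old):]
    return path


if __name__ == "__main__":
    print(resolve("/doc/start/data-versioning"))
    print(resolve("/doc/start/data-access#some-section"))
```

Matching on the full path segment (rather than a bare prefix) is one way to keep anchors working after a rename, which is the concern behind the later "fixed subsection redirects" commit; the actual rules used by the site may differ.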
@iesahin
Contributor Author

iesahin commented Feb 18, 2021

This PR is closed in favor of #2214. Will continue to discuss there.

@iesahin iesahin closed this Feb 18, 2021
shcheklein pushed a commit that referenced this pull request Mar 29, 2021
) (#2214)

* guide: disclaim x data (impro #2104)

* Added changes from PR #2188 and modified paths & titles

- Changes title of "Data Access" to "Data and Model Access"
- Changes title of "Data Versioning" to "Data and Model Versioning"
- Renames path of Data Access and Data Versioning to
  `data-and-model-access.md` and `data-and-model-versioning.md`
  respectively.
- Adds redirects
-- `/doc/start/data-access` -> `/doc/start/data-and-model-access`
-- `/doc/start/data-versioning` ->
`/doc/start/data-and-model-versioning`
- Replaces links in `/doc/start` with the new links.

* Update redirects-list.json with fixed subsection redirects.

Co-authored-by: Jorge Orpinel <[email protected]>

* Fixed incomplete looking sentence

* merged into a single paragraph

* Divided models sentence and added "large files" phrase.

* Adds new paths to sidebar

* Updated links to data-access and data-versioning cmd ref

* updated links to data-access and data-versioning in blog

* Updated links to data-access and data-versioning in UC

* Updated links to data-access and data-versioning in UG

* updated yarn.lock

* Update content/docs/start/data-and-model-versioning.md

Co-authored-by: Jorge Orpinel <[email protected]>

* Restyled by prettier

* fixes hardcoded links to data-and-model-access in the blog

* minor fixes

* guide: revert Exp Outs guide rename
per #2154 (review)

* start: emphasize models are files (assumption)

* start: roll back unnecessary changes

unnecessary for #2214

Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Emre Sahin <iex@levinas>
Co-authored-by: Restyled.io <[email protected]>
@shcheklein shcheklein deleted the iesahin/issue2096 branch April 4, 2021 02:46
Successfully merging this pull request may close these issues.

start: Data Access mentions tracking models without clarifying what kind
3 participants