there's ambiguity between the terms "stage" and "stage file" #423

Closed
jorgeorpinel opened this issue Jun 7, 2019 · 5 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@jorgeorpinel
Contributor

jorgeorpinel commented Jun 7, 2019

(Emanating from #321)

  • Directly affects dvc add, dvc import, and dvc run command refs.
  • Indirectly affects many other docs where these terms are used. See for example the description of the dvc lock cmd ref.
  • Also double-check the usage of the term "step(s)" (which should be "stages"). Related: term: clarifications (Aug) #448

Definitions

From @MrOutis:

You can say that a stage is a node in a pipeline, and pipelines are the weakly connected components (subgraphs) of a DAG.

Internally, in the source code, a stage is whatever is related to a file ending with .dvc (DVC-file or "stage file"). It doesn't imply a "command" or an "action", so dvc add will create a .dvc file describing the path and checksum of a file, without any instructions to generate that output.
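
As a rough illustration of the definition above (a toy sketch, not DVC's actual implementation), stages can be modeled as nodes keyed by their DVC-file name, with pipelines found as the weakly connected components of the dependency graph. The `stages` dictionary, the file names, and the `pipelines` helper are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical toy model: each stage is named after its DVC-file and lists
# the stages it depends on. These names are illustrative, not DVC's API.
stages = {
    "data.xml.dvc": [],               # like `dvc add`: no command, no dependencies
    "prepare.dvc": ["data.xml.dvc"],
    "train.dvc": ["prepare.dvc"],
    "images.dvc": [],                 # unrelated data-only stage
}


def pipelines(stages):
    """Group stages into weakly connected components (edge direction ignored)."""
    neighbors = defaultdict(set)
    for stage, deps in stages.items():
        neighbors[stage]  # make sure isolated stages get an entry too
        for dep in deps:
            neighbors[stage].add(dep)
            neighbors[dep].add(stage)

    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk over the undirected view of the graph.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


print(pipelines(stages))
# [['data.xml.dvc', 'prepare.dvc', 'train.dvc'], ['images.dvc']]
```

Note that `images.dvc` ends up as a pipeline of one node even though it has no command at all, which is the case that makes "stage" confusing for people who only use DVC to manage data.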

Approach

From @shcheklein:

For the documentation we should be thinking from the user perspective first. For example, it means that the term "stage" can be confusing for people who are using DVC to manage data.

@jorgeorpinel
Contributor Author

jorgeorpinel commented Jun 12, 2019

My latest PR (#433) already standardizes all the docs on "DVC-file" as much as possible, and always links to the DVC-File Format doc at the first occurrence of the term in each doc.

The thing is that the tool output calls them "stage files", although I have a pending PR to alleviate that (see iterative/dvc#2112). Internally, though, classes etc. are still called StageFile... and the term "stage" is used interchangeably in some command outputs. That's why I think sticking with "stage file", even for a dvc add file with no dependencies, is the easiest way to avoid the confusion of having too many synonyms.

@shcheklein
Member

@jorgeorpinel I don't think it's a problem at all. It makes sense that at the code level we have a much deeper understanding and can use "advanced" terminology. From the user perspective it should be simple, it should resonate with their needs, etc. We are not optimizing the docs to be "compatible" with DVC code conventions.

@shcheklein shcheklein added A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions labels Jun 14, 2019
@jorgeorpinel
Contributor Author

I think this is resolved now. Both terms "stage" and "stage file" still exist when appropriate, especially in pipeline-related docs.
