Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stages command #5191

Closed
wants to merge 25 commits into from
Closed

Add stages command #5191

wants to merge 25 commits into from

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Jan 3, 2021

Resolves #3743.

  • ❗ I have followed the Contributing to DVC checklist.

  • πŸ“– If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
    Not looking into adding a docs yet, keeping this command as experimental for some time.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Screenshot from 2021-01-09 10-17-22

@skshetry skshetry self-assigned this Jan 3, 2021
@skshetry skshetry added enhancement Enhances DVC feature is a feature labels Jan 4, 2021
@skshetry skshetry requested review from pmrowla, pared and efiop January 4, 2021 05:06
@skshetry skshetry marked this pull request as ready for review January 4, 2021 05:06
@skshetry skshetry changed the title [WIP] Add stages command Add stages command Jan 4, 2021
dvc/command/repro.py Outdated Show resolved Hide resolved
@skshetry skshetry requested review from a team and isidentical and removed request for a team January 4, 2021 10:03
@pared
Copy link
Contributor

pared commented Jan 4, 2021

@skshetry do we intend to have tests for stages?

@skshetry
Copy link
Member Author

skshetry commented Jan 4, 2021

@pared, the autocompletion-related feature stages -sq is a very thin wrapper on dvc.repo.stage. The long-form list output is mostly presentational with very little logic and is experimental for now. I'm sure this will change. Till now, I was just toying with rich and its feature, so I haven't thought much about the information that we should display there.

The feedback on that and the design would be very much appreciated. Thanks.

@pared
Copy link
Contributor

pared commented Jan 4, 2021

@skshetry Thanks,
So far it seems very clear and understandable. I'll play around with it and see if I can come up with something useful.

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support for markdown in description looks great!

Copy link
Contributor

@pmrowla pmrowla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Only comment would be that having the entire tabled centered instead of left-aligned looks off to me in a wider terminal window, but that's a UI personal preference thing.

Screen Shot 2021-01-05 at 5 18 30 PM

dvc/command/stages.py Outdated Show resolved Hide resolved
dvc/command/stages.py Outdated Show resolved Hide resolved
@shcheklein
Copy link
Member

Looks great overall! I have a few concerns though to be honest:

  1. A separate command - it feels if overlaps with dvc dag to some extent. Even if not, do we actually need the whole new command for this? Should we merge dag into this one dvc stages --dag?
  2. Interface looks refreshing, but a bit too massive to my mind. Feels like -s can be default? And even if we add more details - let's add them as ls does (compact table, parsable with regular tools like awk - it's quite important to my mind)
  3. It's better to always provide --show-json, or something similar for machine readable interface. So that UI does not become an API (as much as possible). In case awk parsable table is possible, it might we okay to do both?

@skshetry
Copy link
Member Author

skshetry commented Jan 6, 2021

1. A separate command - it feels if overlaps with `dvc dag` to some extent. Even if not, do we actually need the whole new command for this? Should we merge `dag` into this one `dvc stages --dag`?

dag is more of a visual command whereas my intention for this command is to just list stages in the dvc.yaml in a quick and easy way (the dependent nodes are only shown if they are within the same file, and graph collections are skipped to make it faster). Anything other than that is optional, the command is only trying to be helpful.

There was a discussion on showing status here or move this to the dvc status, as status is limited, and this stages format provides an opportunity to provide better information (eg: with granularity and dag). But I am not sure about this command in the longer term or whether we should move them to status/dag or move them here.

2. Interface looks refreshing, but a bit too massive to my mind. Feels like `-s` can be default? 

Could that be the other way around? Encourage people to use --short/-s for parsing/grepping?

And even if we add more details - let's add them as ls does (compact table, parsable with regular tools like awk - it's quite important to my mind)

ls might not work as the columns might vary based on description/outputs/dependent nodes/dependencies, etc.
Unless we choose to not show them at all and just show the numbers of them, it will be difficult to parse anyway.

3. It's better to always provide `--show-json`, or something similar for machine readable interface. So that UI does not become an API (as much as possible). In case `awk` parsable table is possible, it might we okay to do both?

-s should be machine-readable. The current format that's required for autocompletion could be changed to a different option and we could just print out in the following format:

# <name>:<sp><desc>
train: Train *model*
evaluate: Evaluate *model*

And, we can easily provide all of the stage internals through --show-json. I have no issues with that.

Alternately, we could make repro/exp return a list of the stages, as a help command for the stages.

@shcheklein
Copy link
Member

dag is more of a visual command

but it's not like it has to be, right? also, we can make --dag option. It just feels that we have to much overlap, and it's hard to justify the whole top level command for this.

Could that be the other way around? Encourage people to use --short/-s for parsing/grepping?

I don't know. It's my personal bias I guess from regular CLI tools, like ls. It still feels we can make a bit more compact. Also if we can make it parsable by default it's better to my mind. We should have strong reasons to not do so.

ls might not work as the columns might vary based on description/outputs/dependent nodes/dependencies

skip values? not sure, would need to see the whole table schema I guess

-s should be machine-readable.

agreed, if we can make regular output grep/awk friendly - better do that. But --show-json I mention for consistency, an interface we guarantee consistenct?

And, we can easily provide all of the stage internals through --show-json. I have no issues with that.

πŸ‘

Alternately, we could make repro/exp return a list of the stages, as a help command for the stages.

make sense, might be an option as well

@jorgeorpinel
Copy link
Contributor

Woa, interesting!

  • I agree that the centering is unexpected (feel like a retro DOS app which is fun but probably too much?)
  • Agree about -s default and more compact details
  • Agree that dag is a separate thing (as long as it makes sense that this command doesn't validates DAGs).
  • Cool md support but maybe add a flag to enable that (otherwise truncate desc heavily).
  • βž•1 on all stage details via --show-json (if requested)
  • dvc repro list feels like something repro shouldn't worry about.

The documented feature of this is the autocompletion

What docs changes does that entail? Thanks

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 9, 2021

the list-layout is experimental

On this. I'm not sure about listing deps and outs (or even more details) at least initially (until someone asks for it?). You could cat dvc.yaml for that (mentioned in #3743 (comment)). We "sell" dvc.yaml as human-friendly so I think we expect users to be OK with eye-parsing simple YAML.

@skshetry
Copy link
Member Author

skshetry commented Jan 9, 2021

  • I agree that the centering is unexpected (feel like a retro DOS app which is fun but probably too much?)

UI is no longer centered. The demo is a bit old. Updated with the screenshot.

@skshetry
Copy link
Member Author

skshetry commented Jan 9, 2021

We "sell" dvc.yaml as human-friendly so I think we expect users to be OK with eye-parsing simple YAML.

Tell this to a user with 100 stages/~1300 lines of dvc.yaml.

It's human-friendly in terms of readability and writability and that too applies only to a certain extent; it does not enlighten the user as to their relationship/purpose just by looking at it.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 12, 2021

a user with 100 stages/~1300 lines of dvc.yaml

Is there a real life example of this out there, and most importantly has it been reported/requested by a user with such huge dvc.yaml files? In any case how does this command help with that? The output is even longer than the DVC files.

It's human-friendly in terms of readability

βž•

it does not enlighten the user as to their relationship/purpose just by looking at it

OK but again, how does this command help with that? It basically reformats the same info from dvc.yaml. It doesn't make any "enlightening" interpretations.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 12, 2021

As a reminder, I'm not arguing against the command in general. I'm questioning the value of showing all the details from dvc.yaml (except in --show-json, as a programmatic interface to out dvc.yaml format) until we have evidence of this need from users. But if that exists and I'm not aware of it, maybe you can help us by sharing the full context?

p.s. the purpose in #3743 doesn't seem to require full stage details IMO, including the dvc.yaml location seems enough.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jan 12, 2021

p.p.s. any details that can fit in a very compact manner (one line) I wouldn't see a problem with β€” if the output remains easily readable.

@skshetry skshetry closed this Jan 14, 2021
@skshetry skshetry deleted the list-stages branch January 14, 2021 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC feature is a feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Quick listing of stages
5 participants