Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consistency of desc fields in dvc.yaml and .dvc files #5025

Closed
jorgeorpinel opened this issue Dec 4, 2020 · 19 comments
Closed

Consistency of desc fields in dvc.yaml and .dvc files #5025

jorgeorpinel opened this issue Dec 4, 2020 · 19 comments
Labels
discussion requires active participation to reach a conclusion p3-nice-to-have It should be done this or next sprint ui user interface / interaction

Comments

@jorgeorpinel
Copy link
Contributor

Extracted from iterative/dvc.org#1951 (review)

The new desc field made me rethink the ability to have user data in DVC files. Expand below for the current options (this emoji 👉 points to unique characteristics of each one):

YAML comments

  • 👉 Part of the YAML format anyway, I guess
  • Only meaningful to the user
  • DVC should ignore (but preserve) these.

meta field

  • A valid field in dvc.yaml, at the stage level — part of its schema
  • A valid field in .dvc files, 👉 at the file level (these files were considered stages in 0.x)
  • 👉 Accepts any YAML structure.
  • Only meaningful to the user
  • DVC should ignore (but preserve) these.

desc field

  • A valid field in dvc.yaml, at the stage 👉 and/or output levels — part of its schema
  • A valid field in .dvc files, 👉 at the output level (I suppose these files are considered something closer to outputs in 1.0+ ?)
  • 👉 Only accepts strings.
  • Only meaningful to the user
  • 👉 dvc add/run --desc can be used to write this.
  • DVC ignores (but preserves) it otherwise?
  • 👉 DVC Viewer requires it? See https://github.com/iterative/viewer/issues/1371#issuecomment-738549604

TL;DR

I see significant overlap. What are we trying to solve and what's the ideal design to implement it?

At the same time these are probably the least relevant fields in the file formats but still, I suspect at least one of these options is redundant.

@jorgeorpinel jorgeorpinel added p3-nice-to-have It should be done this or next sprint discussion requires active participation to reach a conclusion enhancement Enhances DVC ui user interface / interaction labels Dec 4, 2020
@pmrowla
Copy link
Contributor

pmrowla commented Dec 4, 2020

I think the reasoning is that desc is not only relevant to the user, it's also relevant to the viewer. The field can specifically be used to add a text description to an output, that will be displayed in the viewer.

meta can contain literally anything and generally only means anything to the user that put something into the meta field. So it does not make sense for the viewer to try and figure out what is inside meta and then attempt to display whatever that might be.

@efiop efiop added awaiting response we are waiting for your reply, please respond! :) ui user interface / interaction and removed enhancement Enhances DVC ui user interface / interaction labels Dec 10, 2020
@efiop
Copy link
Contributor

efiop commented Dec 10, 2020

Doesn't look like there is anything left to say here. Those are just different by definition. Closing as stale, happy to reopen if anyone else has to say anything here.

@efiop efiop closed this as completed Dec 10, 2020
@jorgeorpinel
Copy link
Contributor Author

desc is not only relevant to the user, it's also relevant to the viewer
meta can contain literally anything and generally only means anything to the user

These seem like arbitrary distinctions to me. meta could mean something to the viewer if it was used that way.

But my question goes beyond that, why is desc at the stage level and at the output level for example? Just seems like an ad hoc implementation (I don't understand the design) but OK, I suppose it's not a big feature? I hope it doesn't cause confusion or other problems in the future...

@efiop
Copy link
Contributor

efiop commented Dec 12, 2020

These seem like arbitrary distinctions to me. meta could mean something to the viewer if it was used that way.

But the fact is that it doesn't use it. desc is meant for stage or output description, meta is just whatever you, as a user, want to shove in it, dvc or other tools won't care and won't interpret it or expect any particular format.

But my question goes beyond that, why is desc at the stage level and at the output level for example? Just seems like an ad hoc implementation (I don't understand the design) but OK, I suppose it's not a big feature? I hope it doesn't cause confusion or other problems in the future...

Because it can describe both stage and particular output right now.

@jorgeorpinel
Copy link
Contributor Author

the fact is that it doesn't use it

My impression is that was a decision by the same people here 😬 in fact the Viewer team didn't seem to realize meta even existed if I understood iterative/dvc.org#1951 (comment) correctly. Also in iterative/dvc.org#1951 (comment) it was requested that meta is displayed (again if I understand that conversation correctly) so... Overlap.

it can describe both stage and particular output right now

OK but for .dvc files desc only works at the "output" level, not at the file (stage) level. Seems inconsistent.

Also, meta only works at the stage level (or file level for .dvc files) so it's not as free as purported.

@efiop
Copy link
Contributor

efiop commented Dec 12, 2020

@jorgeorpinel Correct, that was my decision, I started using meta first but then people from viewer team said that a separate field makes sense so I've switched. And I now agree that desc should be separate, they have different purpose.

We spend too much time ranting about it here, that's counterproductive. I understand that you don't like it, but we need to disagree and commit to this.

@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Dec 12, 2020

I already agreed to disagree some time ago. Not expecting a change, just leaving feedback. The docs probably do need more clarification on all of the details mentioned above though, as it's confusing (not a priority).

p.s. I'm not ranting 😆, I explained my position/questions in detail above.

@efiop
Copy link
Contributor

efiop commented Dec 12, 2020

@jorgeorpinel Thanks for clarifying! Sorry for putting it that way and overreacting. I thought that we've discussed this enough in linked issues/prs, and the reasoning was obvious, but now I understand that I was obviously very wrong, because I didn't consider that you weren't present during all of the discussions about it. So let me try to show a bigger picture here.

In general, the desc is somewhat related to #1487 . So wanting to have a description for a particular dataset/artifact, that would be preserved during repro and other types of updates. Desc only serves as an aid to understanding particular artifacts, but there will be an additional label mechanism (separate from desc, just mentioning since it is related) that would also propagate labels through your pipeline, so that the end result would have all labels of input data that it was using. We've also added stage-level desc because spacy guys do that in their files and it just makes sense to have an aid to understanding what particular stage does inside.

In case of pure dvc, desc is only currently useful when viewing dvc.yaml's(yes, old *.dvc too for now), and you are right that regular comments would do the same job for that, same as when you use comments in your code/bash scripts. But stage desc might also be used in #3743 or something similar, for more concentrated (than viewing yaml) representation of your pipeline. There might be a similar thing for output descs too, e.g. dvc dag --list --outs, but that's just fantasies at this point. But for the most part, right now desc is as good as putting comments in yaml files by hand. But we've allocated a special field for it for people to start using it now, so we could figure out good ways to use them in dvc in the future. One more thing is that since it is an officially recognized field, it will be available in (to-be-official in the future) python API, so people could use it directly (much harder to do with comments). The main difference with meta is that desc has a particular purpose and schema/type officially is recognized by dvc standard, so it won't change.

Viewer is the first such user of desc, which will be using the descriptions for outputs. It won't use stage descriptions though, as there is no pipeline support there yet, but when/if it will be added - the stage desc will be there to use.

@efiop efiop reopened this Dec 12, 2020
@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Dec 19, 2020

Thank for the details! All interesting possibilities. I guess the answer to my question ("What are we trying to solve?") is that we're not sure but we see potential and will keep an eye on whether/how users employ this.

I'd just clarify that I don't think comments are an alternative actually, I see more overlap with meta.

To make my position more constructive, I tried to come up with a specific format for meta that would also cover the same needs and be officially recognized but in that attempt I realized it would be much more complicated than simple desc fields.


So the only concern left from my part is on the consistency i.e. where in the formats are desc and meta valid:

  • Should desc be allowed in deps? Why only outs? What about other things like cmd, metrics/plots (which are outpus), etc.? desc could be universal if our idea is to explore posibilities.
  • Should desc be allowed at the file level in .dvc files? Similar to having desc at the stage level in dvc.yaml files
  • Ideally, could we get meta out of the stage level and make it a file-level thing in both formats? That would make it clearly not desc, and I think it makes the files much cleaner when you can completely separate the official DVC-recognized structure from any ignored user metadata.

Example:

stages:
  training:
    desc: Training stage description
    cmd:
      - pip install -r requirements.txt
      - python train.py
          desc: Performs model training. Requires Python (and lib req's)
    deps:
      - train.py
      - features
          desc: We get these features from blah.
    outs:
      - model.pkl:
          desc: Linear/xyz model to predict something
    plots:
      - logs.csv:
          x: epoch
          desc: Logs of the training process

# Completely ignored but valid user info.
meta:
  name: Jonny Deep
  pipeline: 'For deployment'

@efiop
Copy link
Contributor

efiop commented Dec 30, 2020

Should desc be allowed in deps? Why only outs? What about other things like cmd, metrics/plots (which are outpus), etc.? desc could be universal if our idea is to explore posibilities.

Deps - maybe in the future. So far we didn't see any need for it. Same with cmd. Plots/metrics do support desc too.

Should desc be allowed at the file level in .dvc files? Similar to having desc at the stage level in dvc.yaml files

Maybe in the future, so far we didn't see any need for it.

Ideally, could we get meta out of the stage level and make it a file-level thing in both formats? That would make it clearly not desc, and I think it makes the files much cleaner when you can completely separate the official DVC-recognized structure from any ignored user metadata.

I don't think it should be instead, but maybe in addition as requested in #4960

In general, yes, both desc and meta could be allowed everywhere in the future, but so far there is no need.

@skshetry
Copy link
Member

skshetry commented Jan 3, 2021

@efiop, regarding the technical format of desc, what is it supposed to be? A plain text? Or, a markdown for example?

@skshetry skshetry removed the awaiting response we are waiting for your reply, please respond! :) label Jan 4, 2021
@skshetry skshetry mentioned this issue Jan 4, 2021
2 tasks
@efiop
Copy link
Contributor

efiop commented Jan 4, 2021

@skshetry so far we only said that it should be text, so it could be markdown or other formats.

@skshetry
Copy link
Member

skshetry commented Jan 4, 2021

Maybe there has been a discussion on how desc for the outputs will be displayed on the viewer side? Eg: as a tooltip or similar components? CC'ing @shcheklein.

@shcheklein
Copy link
Member

Yep, I think it'll displayed as a tooltip. We didn't think about it being Markdown yet. But as @efiop mentioned it doesn't change anything, right?

@skshetry
Copy link
Member

skshetry commented Jan 5, 2021

I think it'll displayed as a tooltip.

@shcheklein, that answers my question. I just wanted to know if the desc for stage/outs is targeted in the Viewer to be a long-form description (where markdown would be useful) or short ones like in tooltip.

Anyway, moving on with the markdown for now in #5191, as it looks great.

@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Jan 9, 2021

So far we didn't see any need for it / so far there is no need

@efiop I find that position a bit contradictory vs. "[desc is] for people to start using it now, so we could figure out good ways to use them in dvc in the future".

IMO if the latter is the aim, making desc universal (including at the file level in dvc.yaml`) would be ideal (and more consistent).

Plots/metrics do support desc too.

Documented in #2086.

Ideally, could we get meta out of the stage level and make it a file-level thing

I don't think it should be instead, but maybe in addition as requested in #4960

Thanks for the reference. I moved that part of the discussion there: #4960 (comment)

@jorgeorpinel jorgeorpinel changed the title Consistency of metadata (meta/desc fields) in dvc.yaml and .dvc files Consistency of desc fields in dvc.yaml and .dvc files Jan 9, 2021
@jorgeorpinel
Copy link
Contributor Author

making desc universal would be ideal

That said, you did also state that it "serves as an aid to understanding particular artifacts". So if that's the aim, just at .dvc file-level, in outs (inc. metrics, plots) and deps would be consistent too.

shcheklein pushed a commit to iterative/dvc.org that referenced this issue Jan 9, 2021
* cml: copy edit
rel #2065

* dag: copy edit
rel #2063

* install: update autocomplete sample
per iterative/dvc/issues/5149

* guide: rec WSL2 for Win

* giude: mention that desc can go in metrics and plots (outs)
per iterative/dvc#5025 (comment)
@efiop
Copy link
Contributor

efiop commented Jan 23, 2021

Seems like there are no action points for the nearest future here. CLosing for now.

@efiop efiop closed this as completed Jan 23, 2021
@jorgeorpinel
Copy link
Contributor Author

Hey sorry for the delay here. So to summarize my position (reading this after a while I see it's confusing). IMO to make desc consistent we would need one of the following action points:

  • Either not allow it at the stage level (so the aim is "understanding particular artifacts" — for outputs only, basically)
  • Or allow it at the .dvc file level (to match the stage level in dvc.yaml)

WDYT?

I also like the idea of allowing it for deps but no strong opinion on that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion requires active participation to reach a conclusion p3-nice-to-have It should be done this or next sprint ui user interface / interaction
Projects
None yet
Development

No branches or pull requests

5 participants