help: output from i18n-like string file #2108

Closed
fabiosantoscode opened this issue Jan 21, 2021 · 23 comments
Labels
p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately. type: discussion Requires active participation to reach a conclusion. type: feature-request DEPRECATED New feature or request

Comments

@fabiosantoscode
Contributor

fabiosantoscode commented Jan 21, 2021

UPDATE: Jump to #2108 (comment)


Hello there!

Updating the CLI help output for every command is a bummer, but it can be done automatically.

By adding special HTML comments which are read by a script, it's possible to inline the output of commands into the markdown.

For example, in content/docs/command-reference/params/index.md:

## Synopsis

<!-- DVC_HELP "dvc params --help" -->
(output of dvc params --help goes here)
<!-- DVC_HELP_END -->
...

By traversing all files and looking for <!-- DVC_HELP comments, it's possible to run the specified command and inline its output in the markdown.

This could be generated automatically with each build, or it could be a script that's run manually and updates everything (as long as dvc is in the PATH).
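A minimal Python sketch of such a script, assuming the marker syntax above and a `dvc` executable on the PATH. The function names (`run_command`, `inline_help`, `update_tree`) are hypothetical. The command runner is a parameter, so the rewriting logic can be tried without DVC installed.

```python
import re
import shlex
import subprocess
from pathlib import Path
from typing import Callable

# Opening marker with its command, the currently inlined text, and the end marker.
MARKER_RE = re.compile(
    r'(?P<open><!-- DVC_HELP "(?P<cmd>[^"]+)" -->\n)'
    r"(?P<body>.*?)"
    r"(?P<close><!-- DVC_HELP_END -->)",
    re.DOTALL,
)


def run_command(cmd: str) -> str:
    """Run a command line such as "dvc params --help" and return its stdout."""
    args = shlex.split(cmd)
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def inline_help(markdown: str, runner: Callable[[str], str] = run_command) -> str:
    """Replace the text between every DVC_HELP / DVC_HELP_END marker pair."""
    def replace(match: "re.Match[str]") -> str:
        output = runner(match.group("cmd")).rstrip("\n") + "\n"
        return match.group("open") + output + match.group("close")

    return MARKER_RE.sub(replace, markdown)


def update_tree(root: str, runner: Callable[[str], str] = run_command) -> None:
    """Rewrite, in place, every Markdown file under root that contains markers."""
    for path in Path(root).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        if "<!-- DVC_HELP " in text:
            path.write_text(inline_help(text, runner), encoding="utf-8")
```

Re-running it with unchanged help output leaves the files byte-for-byte identical, so only real changes would show up in a Git diff.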

@rogermparent thoughts?

@jorgeorpinel jorgeorpinel added website: eng-doc DEPRECATED JS engine for /doc type: feature-request DEPRECATED New feature or request labels Jan 21, 2021
@jorgeorpinel
Contributor

jorgeorpinel commented Jan 21, 2021

Some integration between core and docs would be cool for sure.

At the moment the place where these 2 repos come closest to meeting is in the intro and synopses of https://dvc.org/doc/command-reference. But keep in mind that even there they're not exactly the same: sometimes we (1) add/change text, and we also (2) reformat the usage block so that it fits well in the website. Example:

$ dvc add -h
usage: dvc add [-h] [-q | -v] [-R] [--no-commit] [--external] [--glob]
               [--file <filename>] [--desc <text>]
               targets [targets ...]

Track data files or directories with DVC.
Documentation: <https://man.dvc.org/add>

positional arguments:
  targets            Input files/directories to add.

vs. the synopsis as rendered on the website (screenshot not preserved):

https://dvc.org/doc/command-reference/add

@fabiosantoscode
Contributor Author

So maybe it's viable to use Python to grab the correct class and format its synopsis?

I know that Python has dynamic imports, because Django uses them extensively.
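A rough sketch of that idea, assuming DVC's CLI is built on `argparse` (as noted later in this thread). `load_object` uses `importlib` for the dynamic import; the spec string that would point at DVC's real parser isn't known here, so `demo_add_parser` is a hypothetical stand-in mirroring the `dvc add -h` usage quoted above. `format_synopsis` (also hypothetical) re-renders the usage block at a fixed width instead of the terminal's.

```python
import argparse
import functools
import importlib
from typing import Any


def load_object(spec: str) -> Any:
    """Import "package.module:attr" at runtime and return the attribute
    (or the module itself when no ":attr" part is given)."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr) if attr else module


def format_synopsis(parser: argparse.ArgumentParser, width: int = 60) -> str:
    """Render the parser's usage block wrapped at a fixed column width."""
    original = parser.formatter_class
    parser.formatter_class = functools.partial(argparse.HelpFormatter, width=width)
    try:
        return parser.format_usage()
    finally:
        parser.formatter_class = original  # keep normal --help output unchanged


def demo_add_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in mirroring the `dvc add -h` usage quoted above."""
    parser = argparse.ArgumentParser(prog="dvc add")
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("-q", "--quiet", action="store_true")
    verbosity.add_argument("-v", "--verbose", action="store_true")
    for flag in ("-R", "--no-commit", "--external", "--glob"):
        parser.add_argument(flag, action="store_true")
    parser.add_argument("--file", metavar="<filename>")
    parser.add_argument("--desc", metavar="<text>")
    parser.add_argument("targets", nargs="+")
    return parser
```

For example, `print(format_synopsis(demo_add_parser(), width=60))` prints the usage wrapped at 60 columns, which is one way to produce a block that fits the website rather than a terminal.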

@rogermparent
Contributor

rogermparent commented Jan 21, 2021

It's worth noting that adding a feature to dvc.org that reads from the DVC CLI during the build would add DVC and, by extension, Python as a build dependency of the site on all machines, local and cloud. Depending on a whole other language ecosystem would add a large amount of complexity, and potentially build time, to the site, which I'm not sure we're currently able to handle.

A possible alternative that avoids this drawback while keeping most advantages would be adding a "docs export" feature to the DVC CLI that dumps all help output to some parseable text format like MD, JSON, or any structured text, which would then be consumed in Gatsby as opposed to running the command during every build.

Using this output as an intermediate step allows us to avoid the issue of adding a hard dependency on DVC itself while still easing the burden of doc writers and allowing for a "source of truth" on the DVC docs. This exported docs API also has the advantage of allowing other methods of consumption like generating minimal HTML or traditional manpages.
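A sketch of what such a "docs export" could produce, assuming an `argparse`-based CLI. `export_parser` and `build_demo_cli` are hypothetical names, and the demo parser only stands in for DVC's real one. `argparse` has no public API for listing a parser's arguments, so this leans on its private `_actions` list and `_SubParsersAction` class.

```python
import argparse
import json
from typing import Any, Dict


def export_parser(parser: argparse.ArgumentParser) -> Dict[str, Any]:
    """Dump a parser and its subcommands into a JSON-serializable dict.

    Relies on argparse internals (`_actions`, `_SubParsersAction`), which
    may need adjusting between Python versions.
    """
    data: Dict[str, Any] = {
        "usage": parser.format_usage().strip(),
        "description": parser.description,
        "arguments": [],
        "subcommands": {},
    }
    for action in parser._actions:
        if isinstance(action, argparse._SubParsersAction):
            # `choices` maps each subcommand name (aliases included) to its parser.
            for name, sub in action.choices.items():
                data["subcommands"][name] = export_parser(sub)
        else:
            data["arguments"].append(
                {"flags": action.option_strings, "dest": action.dest, "help": action.help}
            )
    return data


def build_demo_cli() -> argparse.ArgumentParser:
    """Hypothetical two-level CLI standing in for DVC's real parser."""
    root = argparse.ArgumentParser(prog="dvc")
    subs = root.add_subparsers(dest="cmd")
    add = subs.add_parser("add", description="Track data files or directories with DVC.")
    add.add_argument("targets", nargs="+", help="Input files/directories to add.")
    return root
```

`json.dumps(export_parser(build_demo_cli()), indent=2)` would give a file that Gatsby (or a manpage generator) could consume without needing DVC or Python at site build time.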

As for the API's consumption on Gatsby, we could both read and inject the doc data in many ways.
The mentioned comment-based approach is doable, but involves adding some more advanced Remark transformers, in contrast to a more structured frontmatter-based implementation that could lean more on Gatsby's GraphQL querying to match doc outputs with the corresponding page and combine the two in the template.

@fabiosantoscode
Contributor Author

Using an output from DVC itself seems like a swell idea. I'm not aware of any per-release jobs we could piggyback on, but @efiop would know better.

a more structured frontmatter-based implementation that could better lean on Gatsby's GraphQL

I'm not sure how GraphQL would help in this context -- perhaps you mean a resolver that embeds the result from a DVC-produced JSON or YAML? If so, that sounds great.

@rogermparent
Contributor

I'm not sure how GraphQL would help in this context -- perhaps you mean a resolver that embeds the result from a DVC-produced JSON or YAML? If so, that sounds great.

Exactly! It's often easier in Gatsby to go from source data to React template than from source data to a Remark operation, especially in cases where the data is used the same way each time.

@jorgeorpinel jorgeorpinel added the type: discussion Requires active participation to reach a conclusion. label Jan 22, 2021
@efiop
Contributor

efiop commented Jan 23, 2021

This was the idea a long time back, but IIRC the conclusion was that auto-generated help sucks and docs needed some flexibility. CC @shcheklein

@rogermparent
Contributor

rogermparent commented Jan 23, 2021

This was the idea a long time back, but IIRC the conclusion was that auto-generated help sucks and docs needed some flexibility.

I agree for fully auto-generated docs, but the synopsis code blocks we're talking about here are effectively supposed to be a complete copy of the output of dvc help and used to complement a more detailed help article below that output, the article using the synopsis block as context.

A workflow involving exporting help from DVC and checking the text output into Git has the benefit of any changes in the automatic output being visible by diff in the same Git workflow as normal code, including mandating multiple devs inspect any changes via the PR review process.

@jorgeorpinel
Contributor

jorgeorpinel commented Jan 26, 2021

synopsis code blocks we're talking about here are effectively supposed to be a complete copy of the output of dvc help

They're not that exactly, see #2108 (comment) above. But probably not hard to filter into the desired format, which can be what we have now or something else.

Maybe that should be the first thing to decide: What do we want to see (ideal) in the cmd ref docs?

@fabiosantoscode
Contributor Author

Side note: if we can neatly separate the help output into a data structure, we might be able to show it neatly on mobile phones without horizontal scrolling.

If whatever library the DVC CLI uses for its command-line interface doesn't let us extract the individual parts separately, parsing the output string may be tricky but arguably worth it.
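If the output string does have to be parsed, a sketch like this could split argparse-style `--help` text into a one-line usage, a description, and (flag, help) pairs per section. It assumes argparse's default layout (entries indented two spaces, help aligned in a column); `parse_help` is a hypothetical name, and `SAMPLE` is the `dvc add -h` output quoted earlier in this thread.

```python
import re
from typing import Dict, List, Optional

# argparse section titles; Python 3.10+ prints "options:" where older
# versions printed "optional arguments:".
SECTION_RE = re.compile(r"^(positional arguments|optional arguments|options):\s*$")
# An entry starts two spaces in: the flag(s), then optionally 2+ spaces and help.
ENTRY_RE = re.compile(r"^  (\S.*?)(?:\s{2,}(\S.*))?$")


def parse_help(text: str) -> Dict[str, object]:
    """Split argparse-style --help output into usage, description and entries."""
    header_lines: List[str] = []
    sections: Dict[str, List[List[str]]] = {}
    current: Optional[List[List[str]]] = None  # entries of the section being read
    for line in text.splitlines():
        title = SECTION_RE.match(line)
        if title:
            current = sections.setdefault(title.group(1), [])
        elif current is None:
            header_lines.append(line)  # still in the usage/description part
        elif line.strip():
            entry = ENTRY_RE.match(line)
            if entry:
                current.append([entry.group(1), entry.group(2) or ""])
            elif current:
                # Deeper-indented line: help text continued from the entry above.
                current[-1][1] = (current[-1][1] + " " + line.strip()).strip()
    usage, _, description = "\n".join(header_lines).strip().partition("\n\n")
    return {
        # Collapse the terminal-width wrapping so the page can reflow the line.
        "usage": " ".join(usage.split()),
        "description": description.strip(),
        "sections": {k: [(n, h) for n, h in v] for k, v in sections.items()},
    }


SAMPLE = """\
usage: dvc add [-h] [-q | -v] [-R] [--no-commit] [--external] [--glob]
               [--file <filename>] [--desc <text>]
               targets [targets ...]

Track data files or directories with DVC.
Documentation: <https://man.dvc.org/add>

positional arguments:
  targets            Input files/directories to add.
"""
```

Collapsing the usage into a single line leaves wrapping to the page's CSS, which is what would avoid horizontal scrolling on phones.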

@iesahin
Copy link
Contributor

iesahin commented Feb 4, 2021

I wrote a zsh loop in content/docs/command-reference/ like

for f in *.md; do
  cmd=${f:r}                         # zsh modifier: the file name without .md
  echo $cmd
  dvc $cmd --help > ${cmd}-help.txt  # fails for index.md, which isn't a command
done

to get all the help text from DVC. (It gives an error on index, but whatever.) It's possible to use the output to replace parts of the cmd.md files from time to time and edit them manually. It's also possible to track interface changes and alert when something changes. This might be the simplest way, without much workflow tooling update.

@rogermparent
Copy link
Contributor

@iesahin A shell script is a good idea, especially in regards to being the least intrusive to both projects.

If we go this route, maybe we could have the functionality to update a single command's help text instead of having to re-run every command's help for every update.

As far as post-processing, we have the choice of either doing some processing in the shell script, or having a lean shell script that simply pipes DVC's output while we do any transformations in Gatsby at build-time. I'm in favor of the latter, since it all has to become Node data in the end regardless.
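The zsh loop above, plus the single-command update suggested here, might look like the following Python sketch. It assumes command pages are flat `<cmd>.md` files in `content/docs/command-reference` (as the loop does) and that `dvc` is on the PATH; `refresh_help`, `dvc_help` and `SKIP` are hypothetical names. Like the lean script proposed above, it only pipes DVC's raw output into text files and leaves any transformation to Gatsby.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# index.md is the section's landing page, not a DVC command.
SKIP = {"index"}


def dvc_help(cmd: str) -> str:
    """Return the output of `dvc <cmd> --help` (dvc must be on the PATH)."""
    result = subprocess.run(["dvc", cmd, "--help"], capture_output=True, text=True, check=True)
    return result.stdout


def refresh_help(
    docs_dir: str,
    out_dir: str,
    only: Optional[Iterable[str]] = None,
    runner: Callable[[str], str] = dvc_help,
) -> List[str]:
    """Write <cmd>-help.txt for every command page, or only for those in `only`."""
    commands = sorted(p.stem for p in Path(docs_dir).glob("*.md") if p.stem not in SKIP)
    if only is not None:
        wanted = set(only)
        commands = [c for c in commands if c in wanted]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        (out / f"{cmd}-help.txt").write_text(runner(cmd), encoding="utf-8")
    return commands
```

For example, `refresh_help("content/docs/command-reference", "content/docs/command-reference", only=["add"])` would rewrite just `add-help.txt`, in the same place the loop wrote its files, and leave the others untouched.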

@shcheklein shcheklein changed the title Copy CLI help into markdown cmd ref: copy CLI help into markdown Feb 5, 2021
@iesahin
Copy link
Contributor

iesahin commented Feb 5, 2021

@rogermparent What I had in my mind is actually a more manual process, does the DVC team update the interface that frequently? Someone (e.g. I) can run a tool and check the differences in output, update markdown files or create tickets. I think a manual edit process will be needed in any case. Otherwise we can simply include dvc cmd --help output in cmd.md in build time like you said.

Keeping a set of text files that contain the --help output of each dvc command, and alerting the docs team when something changes, may be enough for now. Something like a script running weekly on the DVC release branch and checking the diff output, so that when someone adds an option to a command, we are informed by the change in the help text. DVC seems to use the Python argparse module to parse command-line options, so most options should have help text associated with them.
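The weekly check could be as small as this sketch, which compares each stored `<cmd>-help.txt` with fresh output and returns unified diffs for the commands that changed. `help_changes` is a hypothetical name; `runner` is assumed to wrap `dvc <cmd> --help` run from the release branch, and scheduling it (cron, CI) is left out.

```python
import difflib
from pathlib import Path
from typing import Callable, Dict


def help_changes(stored_dir: str, runner: Callable[[str], str]) -> Dict[str, str]:
    """Diff each stored <cmd>-help.txt against fresh output from runner(cmd).

    Returns {command: unified diff} for the commands whose help changed.
    """
    changes: Dict[str, str] = {}
    for path in sorted(Path(stored_dir).glob("*-help.txt")):
        cmd = path.name[: -len("-help.txt")]
        old = path.read_text(encoding="utf-8").splitlines(keepends=True)
        new = runner(cmd).splitlines(keepends=True)
        diff = "".join(
            difflib.unified_diff(old, new, f"{cmd} (stored)", f"{cmd} (current)")
        )
        if diff:
            changes[cmd] = diff
    return changes
```

An empty result means no help text changed. If the text files are committed to Git instead, `git diff` gives the same comparison without a custom script.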

If you would like a fully automated process, I agree that running this within Gatsby (and not creating another script dependency) is better, but from the discussion above I doubt that @jorgeorpinel and @shcheklein are in favor of such an automated process.

@rogermparent
Copy link
Contributor

@rogermparent What I had in my mind is actually a more manual process, does the DVC team update the interface that frequently? Someone (e.g. I) can run a tool and check the differences in output, update markdown files or create tickets.

I suppose I would call what I was thinking "semi-automated", in that editors manually get dvc working on their PATH and invoke the script to update the text files with the help info. From there, Gatsby would do basic transforms like indents and formatting.

Keeping a set of text files that contain --help outputs of dvc commands and alert the docs team when something changes may be enough for now. Something like a script running weekly on DVC release branch and checking the diff output.

Instead of having the script do diffing, I think we could just check these files in with Git. That way, we see these help file diffs in the same place as all the other code diffs.

@jorgeorpinel
Copy link
Contributor

does the DVC team update the interface that frequently?

They barely ever review the help output of existing commands or options, but commands and options change often, which causes all sorts of UI text updates.

Also we tend to review the descriptions in the docs and send a PR in core to make corresponding updates to the help output strings there.

Keeping a set of text files that contain --help outputs of dvc commands

This would be useful for i18n too. We'd need to check in the core repo to see what they think though.

For now, if the proposed automation can reliably perform the transformation explained in #2108 (comment), it could definitely be interesting, but ideally it should also allow for updating the transformation rules easily, or applying certain exceptions (without overcomplicating the process for contributors).
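One way to keep the transformation rules easy to update is to store them as data: an ordered list of regex substitutions plus a table of per-command overrides for the exceptions. Everything here (`RULES`, `OVERRIDES`, `wrap_tokens`, `transform_usage`, the 72-column default) is hypothetical, since the exact target format for the website isn't spelled out in this thread.

```python
import re
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement), applied in order

# Hypothetical rules; the real list would encode the website's conventions.
RULES: List[Rule] = [
    (r"\s*\n\s+", " "),  # undo argparse's terminal-width line wrapping
    (r" {2,}", " "),     # collapse runs of spaces
]

# Per-command exceptions: hand-edited text that replaces the generated block.
OVERRIDES: Dict[str, str] = {}


def wrap_tokens(text: str, width: int, indent: str = "    ") -> str:
    """Greedy wrap that never splits a bracketed group like "[--file <filename>]"."""
    lines: List[str] = []
    line = ""
    for token in re.findall(r"\[[^\]]*\]|\S+", text):
        candidate = f"{line} {token}" if line else token
        if line and len(candidate) > width:
            lines.append(line)
            line = indent + token
        else:
            line = candidate
    if line:
        lines.append(line)
    return "\n".join(lines)


def transform_usage(cmd: str, usage: str, rules: List[Rule] = RULES,
                    overrides: Optional[Dict[str, str]] = None, width: int = 72) -> str:
    """Turn raw `dvc <cmd> --help` usage text into the docs' synopsis format."""
    overrides = OVERRIDES if overrides is None else overrides
    if cmd in overrides:
        return overrides[cmd]
    text = usage.strip()
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return wrap_tokens(text, width)
```

Adding a rule or an exception is then a one-line change, and the bracket-aware wrapping keeps groups like `[--file <filename>]` intact, as argparse's own wrapping does.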

@iesahin iesahin self-assigned this Feb 10, 2021
@iesahin
Copy link
Contributor

iesahin commented Feb 10, 2021

The VS Code GitHub interface is interesting 😄 It assigned this issue to me when I was just trying to open it.

For now if the proposed automation can reliably perform the transformation explained in #2108 (comment) then it could definitely be interesting, but ideally it should also allow for updating the transformation rules easily, or apply certain exceptions (and not overcomplicate the process for contributors).

It's possible to use diff/ed/sed/awk/perl/python/vim... scripts for text transformation, but IMHO this is over-engineering (e.g. "create a diff -e script between the dvc $cmd --help output and edited-cmd-help.txt, apply it to the new dvc $cmd --help output to get edited-new-cmd-help.txt, and if anything changed, replace the text between <some tag> ... </some tag> with edited-new-cmd-help.txt"). We can create an elaborate set of scripts, but in the end a human will be required to read and edit the document.

I think I can write a script that diffs the help output and creates issues in a few hours. It can automatically create an issue for each command whose help text changed in DVC master. There will undoubtedly be false positives, but we can close such issues right away. If you already use such a system, I can integrate this into it as well.
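PR #2207's script isn't reproduced here; as an independent sketch, each changed command's unified diff could be turned into an issue like this. The title wording and function names are hypothetical. `create_issue` calls GitHub's REST endpoint for creating issues and needs a token, so calling only `build_issue` and printing the result works as a dry run.

```python
import json
import urllib.request
from typing import Dict


def build_issue(cmd: str, diff: str) -> Dict[str, str]:
    """Compose an issue title/body for one command whose help output changed."""
    if not diff.endswith("\n"):
        diff += "\n"
    return {
        "title": f"cmd ref: `dvc {cmd}` help output changed",
        "body": "The `--help` output changed upstream; please check the docs.\n\n"
        + "~~~diff\n" + diff + "~~~\n",
    }


def create_issue(repo: str, token: str, issue: Dict[str, str]) -> int:
    """Open the issue via GitHub's REST API (POST /repos/{owner}/{repo}/issues).

    `repo` is "owner/name"; `token` needs permission to create issues there.
    Returns the new issue's number.
    """
    request = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(issue).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["number"]
```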

@jorgeorpinel
Copy link
Contributor

A quick script as POC would be great!

@shcheklein
Copy link
Member

My 2 cents: I think this is a very low-priority ticket, guys. I would not worry about it for now. I don't remember us having many problems keeping the docs in sync with the CLI in that specific part.

@iesahin
Copy link
Contributor

iesahin commented Feb 18, 2021

I wrote a script and sent a PR in #2207, but it was closed without merging. If the need arises, we can use the script to get the differences in dvc --help outputs and create issues.

@iesahin
Copy link
Contributor

iesahin commented Feb 21, 2021

Can we close this now? @jorgeorpinel @shcheklein

@shcheklein shcheklein added the p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately. label Feb 24, 2021
@iesahin iesahin removed their assignment Apr 10, 2021
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 7, 2021

Since we have #2770 and iterative/dvc#5392 (comment) I'd vote to close this ticket. Or change it just in order to decide on this specific idea:

Keeping a set of text files that contain --help outputs of dvc commands

This would be useful for i18n too.

Cc @efiop WDYT? My spin on it is that DVC would actually read the strings from that external text file to print in help output. In the future it could even use it for other terminal output like error messages, even logging, etc.
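A minimal sketch of that spin, assuming the strings are keyed by locale, command, and argument name, with English as the fallback. The catalog layout and the names `CATALOGS`, `lookup`, and `build_add_parser` are hypothetical; a real implementation would load the catalogs from external files (Python's standard `gettext` module is the conventional route for translations), and the same files could be published for the docs site.

```python
import argparse
from typing import Dict

# Hypothetical string catalogs: locale -> command -> key -> text.
# In practice these would be loaded from external files (JSON, YAML, .po).
CATALOGS: Dict[str, Dict[str, Dict[str, str]]] = {
    "en": {
        "add": {
            "description": "Track data files or directories with DVC.",
            "targets": "Input files/directories to add.",
        },
    },
    "es": {
        "add": {"description": "Rastrea archivos o directorios de datos con DVC."},
    },
}


def lookup(locale: str, cmd: str, key: str, catalogs=CATALOGS) -> str:
    """Return the string for cmd/key, falling back to English if untranslated."""
    for loc in (locale, "en"):
        text = catalogs.get(loc, {}).get(cmd, {}).get(key)
        if text is not None:
            return text
    raise KeyError(f"no string for {cmd}.{key}")


def build_add_parser(locale: str = "en") -> argparse.ArgumentParser:
    """Build a `dvc add`-like parser whose help text comes from the catalog."""
    parser = argparse.ArgumentParser(
        prog="dvc add", description=lookup(locale, "add", "description")
    )
    parser.add_argument("targets", nargs="+", help=lookup(locale, "add", "targets"))
    return parser
```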

@jorgeorpinel jorgeorpinel changed the title cmd ref: copy CLI help into markdown help: output from i18n-like string file Sep 7, 2021
@jorgeorpinel jorgeorpinel removed the website: eng-doc DEPRECATED JS engine for /doc label Sep 7, 2021
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 7, 2021

p.s. we'd need to transfer this over to the core dvc repo, or reopen there though.

@iesahin
Copy link
Contributor

iesahin commented Sep 27, 2021

I believe the question "should we merge help text output and cmd ref" is still open and will be decided by the core team. We can close this. @efiop

@jorgeorpinel
Copy link
Contributor

Closing in favor of iterative/dvc#5392 (comment).

6 participants