-
Notifications
You must be signed in to change notification settings - Fork 12
Semi-autogenerated docs #171
Comments
It has been discussed a few times iterative/dvc.org#2770 before. My take - this approach creates pretty bad docs (if you could run |
Even for python API (that makes more sense to me to generate), afair team decided to keep very simple in the code and longer descriptions / examples in docs. |
Yes, the idea is to force the simple part to be the same in docs and in code. Longer descriptions and examples will be only in docs with all those fancy md things. |
From iterative/dvc.org#2770 (comment) : my intention is exactly what @casperdcl wrote in the end: generate subset of (2) from (1) |
This can be solved by introducing a check. No need to generate or keep source code as a source for docs. I'm not sure how valuable everything else. Tbh from my experience it's still quite rare that we would even benefit from a checks like those.
this is minor. Usually major work is done by writing proper description of those options. Point here is - if you they are the same as
I'm not sure it's possible tbh, unless I'm missing something. Usually 2 looks quite different from 1. |
I think it could be helpful to have a basic CI check that e.g. |
The check you are talking about is almost the same as what I propose. To implement this check there are 2 ways: parse existing options section, find what options are there and compare with what |
Mmm probably you are. I'm not talking about something like |
@mike0sv how do you envision the workflow for this though? it should be a check that you run regularly anyway and then either you generate boilerplate automatically as a PR or fix it manually. If you don't automate this then who is responsible running this. Anyways, my point is that from my experience this takes time to automate, takes time to maintain, etc, etc and in case of DVC was not solving much. Most of the work goes into writing meaningful option descriptions (neither
I understand that it doesn't generate the whole md file, it generates some parts of it, right? (not sure if it keeps or not options that already exist). And that's exactly what I was talking about- It drives bad docs to my mind. For every PR in code that changes API / CLI we should be creating a proper PR with docs update. It should have examples, proper description (--help doesn't give it). This process guarantees that we have meaningful docs. Automation can help to check for discrepancies (e.g. run by cron) or bootstrap it the first time (similar to #172). |
@shcheklein I understand the concern about docstrings contents restricting the doc site content 🙏 we discussed this offline as well. But it doesn't have to be this way imo. So it's very possible to achieve some automation here without any "new" workflow that would reduce quality.
I think it doesn't have to. The generated docs are definitely better than nothing, even if they are just a skeleton for more examples / fleshed out content which requires time and attention. So it's not against that, but automating the repetitive content at least This is the way I see it at least. So I do suggest we give this a try, dvc docs are more stable and 99% goes to handcrafted content, but mlem is in a different stage and things are more dynamic, this can potentially help guard us from drift between docsite and tool. For this to be effective I also think we want to automate this somehow - run in a cronjob and generate a suggestion PR every week or so. would be a good reminder and even if not mergable, and we need a man-in-the-loop, it can provide the skeleton for the changes |
TL;DR: I'm fine to automate and try (but keep in mind we are spending time on this :) ).
yes. But this is about bootstrapping pretty much? After the project is more or less stable I found it's hard to justify this level of automation (I mean making more and more sophisticated scripts to merge / embed, etc, etc). Everything can be done, but it has its own cost. While 99% time in docs goes into writing content. Creating manually a PR that just copy-pastes things when you change a command is not painful at all unless you change something every day (I doubt that it will be happening). A bit of reflection on my approach / my thoughts.
|
I agree the automation might make more sense only for unique tools like CML where most pple don't download it to run I'd only automate checking that all |
As a potential new user, I find the Python API docs on mlem.ai difficult to work with, as they are not up to date, and could benefit from further typehinting. For example: mlem.api.save on mlem.ai While I understand this requires some additional dev work, it may be worth the prioritization. In my case, I am evaluating using mlem/dvc/gto for a model registry, after which I'd like to evaluate Interactive Studio, but I need to get through the docs first ;) |
There are a number inconsistencies between mlem docs and actual mlem code. Sometimes it's because of new features that we forgot to add docs for, sometimes it's fixes in docs that are not reflected in mlem code. To make everything as consistent as possible, I suggest to auto-generate everything we can.
Of course, a big chunk of docs will remain hand-crafted. I am talking about parts of reference pages for API, CLI and upcoming Objects.
Ideal process:
specification
is generated from mlem codebase in a form of json file. It contains all docs-related stuff from code (docstrings, help messages etc). It's generated from latest mlem version in CIFor now:
I will start with CLI for iterative/mlem#363 and create a PR shortly with examples
The text was updated successfully, but these errors were encountered: