Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document dvcfs API #3927

Closed
dberenbaum opened this issue Sep 7, 2022 · 25 comments
Closed

Document dvcfs API #3927

dberenbaum opened this issue Sep 7, 2022 · 25 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference

Comments

@dberenbaum
Copy link
Contributor

dberenbaum commented Sep 7, 2022

Report

Dvcfs can be a useful public api that is more flexible than the methods in the current dvc api. Some use cases for it:

  • It can support walking through a repo or directory and getting the url or opening the files inside.
  • It can also be used to pass arbitrary credentials or other configuration to connect to remote storage.
@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference labels Sep 8, 2022
@jorgeorpinel
Copy link
Contributor

Have we considered simplifying API refs and moving the details to auto-generated sites like https://docs.iterative.ai/dvc-task/reference/dvc_task/ ?

@skshetry
Copy link
Member

skshetry commented Sep 8, 2022

@jorgeorpinel, that’s a “Developer docs”, not a user facing one. It does not make sense to me to redirect to a different site for what is part of dvc’s offering.

But I am not against auto-generation if the engine can support.

@shcheklein
Copy link
Member

We had that discussion for dvc.api and decided against it (primarily bc team wanted to keep docstrings really simple - e.g. no full examples like we have, etc, etc). Do we plan in this case bring those into docstring?

@skshetry
Copy link
Member

skshetry commented Sep 8, 2022

We do have docstrings now in dvc.api thanks to @daavoo. Docstrings are useful when you are working in IDEs/Editors. Personally, I wouldn't want to leave my text-editor/REPL to docs as much as possible because it involves context switching from the task I have in hand. So I prefer docstrings.

But I also think the docs have a different responsibility compared to docstrings, such as introducing the dvcfs to new users, suggesting scenarios that it can solve, etc that @dberenbaum mentioned above. That said, we also need an API reference too in the docs (which could be generated from the docstrings).

@shcheklein
Copy link
Member

Yes, I was talking specifically about API reference. If you feel that docstring that we have (and plan to have for dvcfs) cover enough of a doc like this https://dvc.org/doc/api-reference/get_url , we should def auto generate. Since docstrings are 100% needed (I would never want to jump into browser also).

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 13, 2022

simplifying API refs and moving the details to auto-generated sites like https://docs.iterative.ai/dvc-task/reference/dvc_task/ ?

that’s a “Developer docs”, not a user facing one

Not sure it's a meaningful distinction @skshetry: The user is a "developer" here (API ref).

If the sites in docs.iterative.ai just repeat docstrings, do we even need them? Apparently unrelated, but the idea was: if we do need them, then we could leave all the dvc.api/fs formalities (signature, types, arguments, etc.) that can be easily auto-generated there, and only use dvc.org/docs/api-ref for the Description and Examples.

does not make sense to me to redirect to a different site

It's pretty common to see different apps for technical explanations (example) vs. specifications (example). That said if the engine could incorporate auto-generated content that would be nice indeed.

@skshetry skshetry added this to DVC Sep 13, 2022
@skshetry skshetry moved this to Backlog in DVC Sep 13, 2022
@skshetry skshetry moved this from Backlog to In Progress in DVC Sep 13, 2022
@dberenbaum
Copy link
Contributor Author

Remaining work here is to have all supported methods linked from the docs:

It makes sense to start small with examples for this PR, but can we hold off closing #3927 until have a reference of supported methods either in the docs or linked from the docs (to an auto-generated docs site)? Compare to the s3fs docs and gcsfs docs. Both of those document the full API of methods they support. Otherwise, we are leaving users to either look through the codebase (without pointing to where they should look) or go to the fsspec AbstractFileSystem docs and use trial and error to find which methods are implemented.

Originally posted by @dberenbaum in #3932 (review)

@skshetry skshetry moved this from In Progress to Backlog in DVC Sep 20, 2022
@skshetry skshetry removed their assignment Sep 20, 2022
@skshetry skshetry removed this from DVC Sep 20, 2022
@jorgeorpinel
Copy link
Contributor

Related discussion: iterative/mlem.ai#171
And initial proposal (with more discussion): iterative/mlem.ai#172

@dberenbaum
Copy link
Contributor Author

Docstrings were added in iterative/dvc#8332. Now how can we publish some docs from them?

@daavoo
Copy link
Contributor

daavoo commented Sep 30, 2022

Docstrings were added in iterative/dvc#8332. Now how can we publish some docs from them?

That's THE question 😅

@dberenbaum
Copy link
Contributor Author

Is there a plan with plugins to separate dvcfs from dvc?

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Oct 5, 2022

Do you mean to generate something like this? https://docs.iterative.ai/dvc-task/

That could work. We just need to link these sites from DVC docs more, I think.

@dberenbaum
Copy link
Contributor Author

Yup @jorgeorpinel! The problem here is that dvcfs is one of the few things we haven't made into a separate package, so there's no obvious way to generate that.

@jorgeorpinel
Copy link
Contributor

OK but if the plan is to document it outside of dvc.org maybe we should move this ticket to the core repo.

@dberenbaum
Copy link
Contributor Author

Added a ticket in iterative/dvc#8428. I'd vote to keep this one also to remember to link to whatever gets generated, but if you prefer to close it to reduce noise, no big deal @jorgeorpinel .

@shcheklein
Copy link
Member

@yathomasi @rogermparent technical question - can we use docstring from an external repo as a gatsby node?

@rogermparent
Copy link
Contributor

Yep! Anything that can be accessed with Node.js can be turned into a Gatsby node. I was thinking of making prism commands exported like this to get around our multi-repo command woes. We can also consume that node in any way we want, including having doc pages display the docstring content alongside some optional extra elaboration.

@shcheklein
Copy link
Member

@rogermparent that sounds cool! and I see that multiple repos need this. It can simplify the docs flow - for certain pages we'll use Python files as a source, not Markdown. Let's create a ticket and discuss during the next call.

@skshetry
Copy link
Member

The generated reference docs will have the same content as fsspec's docs. What is our goal with the reference docs?

Do we want to expand upon those docs with examples? If so, then example-driven docs might be better than having a reference. I prefer User Guides that help users get things done (and looking at fastapi, they don't seem to have reference docs).

The only issue that I think this will solve is regarding discoverability, and that won't be fixed if we use a separate site for just dvcfs.

@daavoo
Copy link
Contributor

daavoo commented Oct 17, 2022

and looking at fastapi, they don't seem to have reference docs).

Not a popular decision, though: fastapi/fastapi#804 😅

@skshetry
Copy link
Member

skshetry commented Oct 17, 2022

Not a popular decision, though

For a popular project like that, someone will be unhappy, and only few people have commented. And dvcfs is not in the scale of fastapi. I am not against reference docs, but just want to set goals and how we can improve compared to fsspec's existing reference docs (given the same content except for the initializer/constructor).

@daavoo
Copy link
Contributor

daavoo commented Oct 17, 2022

and only few people have commented.

Well . . .

Captura de Pantalla 2022-10-17 a las 13 15 36

I am not against reference docs, but just want to set goals and how we can improve compared to fsspec's existing reference docs (given the same content except for the initializer/constructor).

Yes, in this case, I agree that there is too much overlap with the existing fsspec API ref and that we already cover the differences with examples, so I am ok with not duplicating.

@dberenbaum
Copy link
Contributor Author

My main concern is that it's unclear which methods from fsspec are implemented by dvcfs.

@skshetry
Copy link
Member

skshetry commented Oct 17, 2022

My main concern is that it's unclear which methods from fsspec are implemented by dvcfs.

Maybe I am biased here, but I think "Read only filesystem" may be enough here. Most of the API will raise NotImplementedError or EROFS during runtime.

Also, most of the APIs are implemented in fsspec and only the write-operations are not available during runtime. So this may require us to extend some methods to add docs that they are unavailable.

Anyway, here is the current state of APIs that are unsupported:

  1. mkdir/makedirs/rmdir: are no-ops + optional to implement.

All of the following raises or should raise NotImplementedError/OSError(EROFS):

  1. cp_file/sign/created/modified/rm_file: not extended in dvcfs, default implementation is to raise NotImplementedError.
  2. write_text/pipe_file/put_file/touch -> because open does not support write operations.
  3. copy -> depends on cp_file.
  4. mv -> depends on copy/rm.
  5. rm -> depends on rm_file.
  6. put -> depends on put_file.
  7. pipe -> depends on pipe_file.

@dberenbaum
Copy link
Contributor Author

Good points @skshetry. It looks like auto suggestions and docstrings from within the IDE are working well, and between that, the existings fsspec reference, and the "read only filesystem" description, it should be enough. I'm fine to close this, but if someone else feels more is needed, feel free to reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference
Projects
None yet
Development

No branches or pull requests

6 participants