[WIP] api: traversable files
API
#6850
I guess the built-in |
dvc/api.py
@contextmanager
def files(path=os.curdir, repo=None, rev=None) -> Iterator[RepoPath]:
I don't have a better suggestion but IMO, the name `files` doesn't feel very intuitive.
@daavoo `walk_files`?
`files()`/`walk_files()` sound to me like what I would expect to be returned by `os.listdir(os.curdir)`/`os.walk(os.curdir)`.
However, it looks like what `files()` is returning is more similar to what `Path(os.curdir)` would return (unless I'm completely misunderstanding the example snippets in the description).
So, for me, it would be more intuitive to have a name that doesn't suggest that you are going to return an iterator of files. Instead, I would like a name that suggests that you are going to return a context manager with a Path-like instance pointing to some path (defaulting to `os.curdir`).
Does that make sense?
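To make the distinction above concrete, here is a minimal sketch using only the standard library. Nothing in it is DVC's API: `open_root` is a hypothetical stand-in with the same shape as the proposed `files()` (a context manager yielding one Path-like root), and `Path` stands in for `RepoPath`.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def open_root(path: str = os.curdir) -> Iterator[Path]:
    # Hypothetical stand-in for the proposed files(): it yields ONE
    # Path-like root, not an iterator of file names.
    yield Path(path)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.csv").write_text("a,b\n1,2\n")

    # What a name like files()/walk_files() suggests: a listing of entries.
    listing = sorted(os.listdir(tmp))

    # What the proposed API resembles: a root path to navigate from.
    with open_root(tmp) as root:
        text = (root / "data.csv").read_text()
        children = sorted(p.name for p in root.iterdir())
```

The listing is available only through an explicit `iterdir()` call on the yielded root, which is why a name implying "an iterator of files" may mislead.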
I agree that the usage wasn't immediately intuitive to me. How much value is the 2-line `files()` function adding? It might be clearer to use `Repo.open()` and `RepoPath` directly since they follow conventions of Python built-ins.
I am open to changing the name of the API and would love to hear alternative names. Regarding `Repo`, it's an undocumented API, and to replace `files()` it needs to interact with two different components.
`get_repo_path` / `repo_path`?
I tried:
|
Things I find confusing:
It would be more intuitive to me for users to directly use |
Not sure I understand this; it should be fairly straightforward to document. I think you are looking at it as being two different things,
We do need a context manager to manage resources for the We already have
Users don't need to use separate context managers for different paths, the
|
Thanks for the detailed response @skshetry. Backing up, is this related to any pressing issues on the roadmap? If not, we can take our time making decisions about the interface. To me, it's worth it here because this has potential to be widely useful and maybe even supersede a lot of the existing API while also being way more flexible 🚀 . Two ideas that would aid a lot of my confusion:
Other thoughts:
|
From discussions in #7379, it seems we need to also expose a way to copy a file/directory similar to
with files() as root:
    root.download(to="path")
What would be the benefit over exposing |
|
Not sure I understand. My point is that it might be easier for users to follow an API that mimics the CLI for simple use cases. Anyway, let me open a separate discussion since this is really about |
Closing for now, will need to revisit after all the filesystem changes.
WIP, I don't plan to merge this without more discussions/refinements. The implementation is quite straightforward (half of the changes are type-annotations). See: #6550 (comment) for the motivation.
This API allows users to traverse a Repo with a `pathlib.Path`-like API, e.g.:
Pathlib APIs supported: `.open` / `.read_text` / `.read_bytes` / `.iterdir` / `.exists` / `.is_dir` / `.isfile`.
Extended APIs: `.read` (equivalent to `dvc.api.read`) / `.url` (eq. to `dvc.api.get_url`).
Limited support for: `.glob` / `.rglob`
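For readers unfamiliar with these methods, the sketch below exercises the standard `pathlib.Path` equivalents on a temporary directory. It is only an analogy: `root` here is an ordinary `Path`, assumed to behave like the object `files()` yields, and the extended `.read` / `.url` methods are not shown because they have no standard-library counterpart.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for the Path-like root yielded by files()
    (root / "data").mkdir()
    (root / "data" / "train.txt").write_text("hello")

    target = root / "data" / "train.txt"

    exists = target.exists()
    is_dir = (root / "data").is_dir()
    as_text = target.read_text()
    as_bytes = target.read_bytes()

    # .open returns a regular file object.
    with target.open() as fobj:
        first_char = fobj.read(1)

    # .iterdir lists direct children; .rglob matches recursively.
    names = sorted(p.name for p in root.iterdir())
    matches = sorted(p.name for p in root.rglob("*.txt"))
```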
Also, all of the `dvc.api` functions have been migrated to use the `files()` API.
api.Given that we are migrating to fsspec, and the fact that we have
RepoFileSystem
, this does mean that we'll have two similar APIs (though this one is built on top of it). TheRepoFileSystem
may be more powerful, but the pathlib-like API will definitely be user-friendly. We can keep this API private or leave this PR unmerged till we decide on the API.