fsspec backend #5162
Comments
fsspec looks like an interesting project, thanks for the suggestion! We already use a similar abstraction internally in DVC, and I'm not sure how much we would actually gain by switching to an alternative implementation. Is there a particular remote FS type that is supported in fsspec that you are looking for in DVC? |
We have considered it previously while working on pyarrow (it uses fsspec too), but for now decided that it is not really worth refactoring so much stuff just for the sake of it. It will probably happen in the future, once we are ready for it and have time for it. |
For the record, we are indeed going in the fsspec direction in our internals. Current roadmap is
Also, somewhere after the first step, we can rename our Tree/tree to FileSystem/FS/fs/etc. so it is less confusing. |
Hey, do you have any idea which |
@BenLinnik There is no formal list yet, but you can see all required methods in https://github.com/iterative/dvc/blob/master/dvc/fs/azure.py , for example. (e.g. see all self.fs.* calls). |
Seems straightforward. Nothing too fancy. I could do that. When looking at the azure code, I wonder. Why don't you use a fs method in the |
@BenLinnik We do, through |
Ah sorry, you are absolutely right. The fs.open is in self.open. Moving to fsspec is a great idea: it will make it possible to integrate dvc with any fsspec-supported framework. That's a clever architectural choice, even though it makes my PR obsolete at the moment 👍 |
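For readers following the exchange above, here is a minimal sketch of the delegation pattern being discussed: a wrapper class whose methods forward to `self.fs.*`, with `fs.open` called inside the wrapper's own `open`. The names `InMemoryBackend` and `WrappedFileSystem` are hypothetical and are not taken from DVC or fsspec. The in-memory backend only stands in for an fsspec-style filesystem so that the example runs without dependencies.

```python
import io


class InMemoryBackend:
    """Hypothetical stand-in for an fsspec-style filesystem object."""

    def __init__(self):
        self._files = {}  # maps path -> bytes

    def pipe_file(self, path, data):
        # Store raw bytes under the given path.
        self._files[path] = bytes(data)

    def exists(self, path):
        return path in self._files

    def open(self, path, mode="rb"):
        # Assumption: the sketch supports binary reads only.
        if mode != "rb":
            raise ValueError("this sketch only supports mode='rb'")
        if path not in self._files:
            raise FileNotFoundError(path)
        return io.BytesIO(self._files[path])


class WrappedFileSystem:
    """Hypothetical wrapper: each public method delegates to self.fs."""

    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        return self.fs.exists(path)

    def open(self, path, mode="rb"):
        # The wrapper's open() is the place where fs.open() is called.
        return self.fs.open(path, mode)


backend = InMemoryBackend()
backend.pipe_file("data/file.txt", b"hello")

wrapped = WrappedFileSystem(backend)
with wrapped.open("data/file.txt") as f:
    content = f.read()
```

Because the wrapper touches the backend only through a small set of methods, any object offering those methods could be swapped in. That is the property that makes an fsspec-compatible backend attractive here.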
(fsspec admin here) I just noticed this issue, and am glad to see the energy, and the appearance of an fsspec backend in your API, alongside your own implementations. Please don't hesitate to help us improve every data user's experience! Did you know that fsspec is interested in exposing DVC as a filesystem backend? In https://github.com/intake/filesystem_spec/blob/master/fsspec/implementations/dvc.py there is some nascent code which is neither referenced in the official docs nor tested. It has been a while since I looked at that code, so it may not do anything useful by now. However, I do think it's generally useful to have a simple abstraction layer over DVC in fsspec, even if that ends up calling other fsspec backends internally when fetching files. |
Hi @martindurant ! Thanks a lot for creating and maintaining fsspec! 🙏 Was meaning to send a similar introductory message to you today 😉 We were inventing something similar ourselves with our legacy Tree classes, but then stumbled upon fsspec and were impressed by how thought through the spec is. Really wish fsspec existed in the early days of DVC 😃 Thanks to great work by @isidentical we were able to temporarily wrap and use
We've had |
Certainly! I might delete the existing module, then, since it's incomplete and not used. Let me know when you have something you feel can be exposed in fsspec, and I am happy to give feedback on it too. |
This is awesome! I think the more fundamental project is fsspec. dvc should be on top of fsspec. Then, when using Python, dvc should just present (versioned) files by passing fs objects to the user (from fsspec). The core idea of dvc is that it just adds a version 'argument' to paths: it should be usable on the fs as well as in Python. |
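To make the "version argument on paths" idea concrete, here is a toy sketch. It is not DVC's or fsspec's API: the class `VersionedMemoryFS`, its `write` method, and the `rev` parameter are all hypothetical names chosen for illustration, and the storage is a plain in-memory dictionary.

```python
import io


class VersionedMemoryFS:
    """Hypothetical toy filesystem whose files are keyed by (rev, path)."""

    def __init__(self, default_rev="main"):
        self.default_rev = default_rev
        self._store = {}  # maps (rev, path) -> bytes

    def write(self, path, data, rev=None):
        # Fall back to the default revision when none is given.
        self._store[(rev or self.default_rev, path)] = bytes(data)

    def open(self, path, rev=None):
        # The same path can resolve to different content per revision.
        key = (rev or self.default_rev, path)
        if key not in self._store:
            raise FileNotFoundError(f"{path} @ {key[0]}")
        return io.BytesIO(self._store[key])


fs = VersionedMemoryFS()
fs.write("model.pkl", b"v1 weights", rev="v1.0")
fs.write("model.pkl", b"latest weights")

with fs.open("model.pkl", rev="v1.0") as f:
    old = f.read()
with fs.open("model.pkl") as f:
    new = f.read()
```

The point of the sketch is only that the caller keeps working with ordinary paths and file-like objects, while the revision is one extra argument.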
@martindurant We are quickly moving towards it, I would say that DvcFileSystem should be ready in a month or so. We'll definitely reach out when we are ready. Thank you so much for the great work! 🙏 @majidaldo It will likely be similar to what we have right now. E.g. you could do |
March 3:
On a serious note, we've underestimated this initially, but thanks to tons of great work by @isidentical 🙏 we've migrated all of our currently supported clouds to fsspec and implemented a few fsspec-compatible filesystems ourselves from scratch:
Now we are working on removing some of the wrappers that we have (mainly related to PathInfo, but there are more) and will soon clean up dvcfs/repofs to make it shareable as well. Plugins will likely be introduced a bit later, maybe in Q1. |
Closing: we are already using fsspec-based filesystems, and our dvc-data and dvc-objects projects use them too, so this is no longer very relevant in dvc. We can consider this done. |
For the record: |
@shcheklein Once it is ready, yes. The dvcfs wrap-up/documentation is in our potential plans for Q3. |
Hopefully, dvcfs can address some of our pressing Python API needs like #3182. |
I really think dvc should have this backend. The fsspec filesystem abstraction has a long history, so dvc could leverage the backends that already exist for it.