[FEATURE] Support fuse for Fileset #4558
Comments
@jerryshao @shaofengshi @xunliu Can you share your thoughts? Thanks. cc @xloya @YxAc @zhoukangcn |
Thanks @coolderli for bringing this up. The HCFS API is not a POSIX-compliant API, so using the HCFS API with fuse has many limitations. I don't know how well fsspec supports POSIX; we need to investigate. My concern is the performance of implementing fuse in Python: fuse requires a lot of context switching between user space and kernel space, which hurts performance considerably, and a dynamic language will make it worse. I don't have a better solution right now, so we should probably investigate further. |
Hi Peidian, I don't have much knowledge about FUSE. Is Solution 2 only available for Python? That is, does fsspec mount remote storage as a local path only inside a Python application, so that the user can read and write it only from Python code? |
@shaofengshi Not exactly. Users need to run a piece of Python code to perform a mount operation first, and then they can use other applications to access it. |
Got it, thanks for the input, Peidian. I'm not sure how compatible and stable it is, for example in terms of OS support and Python versions. If it is not good enough, we may not be able to persuade a large group of users to adopt it. This is my concern. |
Hi @coolderli I heard that you encountered some issues with |
The first issue comes from fusepy. It is a TypeError: a datetime can't be converted to an int. I have fixed it, and the fsspec fuse works for now. I have submitted a draft PR for fsspec fuse; you can take a look and give it a try. It still needs more tests. |
@diqiu50 I'm not sure whether fsspec fuse is a good approach. In most cases we need to use it in a container environment (k8s), so we need to implement not only fuse but also a k8s CSI driver. https://juicefs.com/docs/zh/csi/introduction/ On the other hand, using CSI in a cloud environment may not be feasible because it requires some configuration in the k8s cluster, although this is not a problem in our private cluster. Using fsspec fuse can help us avoid this situation, as we do not need to make any modifications to the existing k8s cluster. You can take a look at this article: https://mp.weixin.qq.com/s/j6AlSqKxKInAKeBfADJdOA. The Juice team has also raised similar concerns about CSI. |
@diqiu50 I have tested the fsspec fuse again. I mounted an HDFS directory and found that writes did not succeed. Listing and reading work.
|
@coolderli Does the problem occur only when using the HDFS file system? |
We track the development process through issue #5504 |
Describe the feature
Implement fuse for gvfs to support mounting filesets to local directories. The instance defaults to mounting fileset://fileset/fileset_catalog/schema/fileset_name to /fileset/fileset_catalog/schema/fileset_name, so we can access it via the POSIX protocol. In addition, we can support mounting to user-defined directories, so that users do not need to modify any code.
Motivation
In AI scenarios, users often use the POSIX protocol to access data. The data is stored in systems such as JuiceFS, NAS, or CPFS, and then mounted to a local directory.
Using these storage systems directly has the following disadvantages:
Describe the solution
Solution 1: Use the underlying fuse directly.
This means that the fileset needs to manage a local directory. I don't think this is a good solution: users would bypass gvfs and get no benefit from it.
Solution 2: Use fsspec fuse to implement gvfs fuse.
fsspec provides a fuse feature that forwards fuse operations to fsspec filesystem operations: https://filesystem-spec.readthedocs.io/en/latest/features.html#mount-anything-with-fuse
We could build on fsspec fuse, with some optimizations, to support gvfs fuse.
Solution 3: Implement gvfs fuse using JNI to call GravitinoVirtualFileSystem.
At present, Solution 2 and Solution 3 are similar. Solution 2 is implemented by calling Python gvfs, and Solution 3 is implemented by calling Java gvfs.
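Whichever of these is chosen, the fuse layer has to translate between local paths under the mount point and fileset:// virtual paths before forwarding an operation to gvfs. The following is a minimal sketch of that translation, based only on the default mapping given in the feature description. The function names, the `DEFAULT_MOUNT_ROOT` constant, and the `mount_root` parameter are hypothetical and are not part of gvfs or fsspec.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical constant: the default mount root, taken from the example
# path /fileset/fileset_catalog/schema/fileset_name in the issue.
DEFAULT_MOUNT_ROOT = "/fileset"


def fileset_uri_to_mount_path(uri: str, mount_root: str = DEFAULT_MOUNT_ROOT) -> str:
    """Map fileset://fileset/<catalog>/<schema>/<name>[/sub/path] to a local path."""
    parsed = urlparse(uri)
    if parsed.scheme != "fileset" or parsed.netloc != "fileset":
        raise ValueError(f"not a fileset URI: {uri!r}")
    # parts[0] is the leading "/", so skip it.
    parts = PurePosixPath(parsed.path).parts[1:]
    if len(parts) < 3:
        raise ValueError(f"expected catalog/schema/fileset_name in {uri!r}")
    return str(PurePosixPath(mount_root, *parts))


def mount_path_to_fileset_uri(local_path: str, mount_root: str = DEFAULT_MOUNT_ROOT) -> str:
    """Inverse mapping: what a fuse handler would compute before calling gvfs."""
    # relative_to raises ValueError if local_path is outside mount_root.
    rel = PurePosixPath(local_path).relative_to(mount_root)
    if len(rel.parts) < 3:
        raise ValueError(f"{local_path!r} does not point inside a fileset")
    return "fileset://fileset/" + "/".join(rel.parts)


if __name__ == "__main__":
    uri = "fileset://fileset/fileset_catalog/schema/fileset_name/data/part-0.parquet"
    local = fileset_uri_to_mount_path(uri)
    print(local)
    print(mount_path_to_fileset_uri(local))
```

Passing a different `mount_root` is one possible way to model the user-defined mount directories mentioned in the feature description; whether a user-defined mount should keep the catalog/schema/fileset_name prefix is an open design choice that this sketch does not settle.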
Additional context
No response