Slow performance due to ops never replying ENOSYS #130
Comments
I've done a simple test: mounted the "Hello world" from FUSE's C package and the "Hello world" from this Go package. I then ran many access calls on the same file to count how many calls per second the FS could serve. I did the same later for stat calls, and later for open/read calls. The C version of the hello world FS consumes 0 CPU for the access and open/read test, and some CPU for the stat test. The Go version consumes significant CPU in all cases. For the open/read test, the C FS serves up to 230,000 calls per second while the Go FS reaches only 3,500. This is sad. The thing is that the kernel seems to not call into the C FS as frequently as it does for the Go FS (as observed by the CPU consumption of the FS itself). I wonder if the Go FUSE bindings have gotten some tiny detail wrong and are causing the kernel to issue many more round trips than necessary. |
Aha, so I understand what's happening. The original libfuse returns The Bazil FUSE implementation does not do this. Instead, it never returns Given the current design of these Go bindings, I'm not sure how this could be fixed. Maybe the file system module should explicitly whitelist the operations it implements in all of its node types upfront so that the code in |
What operations do you see called so much that they cause this overhead, while being safe to ignore? |
The one mentioned at the top: As somebody put it to me recently, "FUSE can grant more or less control to the userspace daemon depending on the use case", so the libraries that implement the bindings should give the user that choice. At the moment, these libraries make the assumption that all syscalls must be handled at the user level. |
I wonder why OSXFUSE calls
I'll try to find time to look at the big picture, hopefully soon. |
I guess I could make my nodes return I think it makes sense to keep the current behavior as a default for simplicity, but I'd like a way to explicitly override it. What about the following possibilities?
|
Perhaps something like
then you can write a type switch or whatever logic you want in there, and return I'd like to see an API that could serve debug & trace needs with the same call. Something along the lines of https://godoc.org/bazil.org/fuse/fs#Config |
Not sure. While a "pre-request" hook sounds like a thing that may be good to have in general, I think putting code in there to manually return What's wrong with the two alternatives I proposed in my last comment? Their benefit is that they are explicit and they are handled transparently. I.e. if the user has defined a node and the bazil API has deemed it as valid at startup time, then one can see that the node's functions will be called where appropriate and all syscalls that have no backing implementation will result in |
I don't see listing ops by number or such as a very nice API. I don't want to reflect a list of types at runtime to build whatever constructs are needed. Given such a monster, I'd rather defer until a better API arises. I also want to do some reproducible benchmarks to convince myself of the effect here (and that it cannot be achieved better via general improvements, e.g. #35), and that this problem really is about the FUSE protocol and not just lack of good caching in OSXFUSE. Significant changes require significant justification. Especially when they don't benefit the common case.
#143 might be the way out. |
Yes, I did mount the file system as read-only and also tried the I don't see how caching could be a root cause here. If you are going to handle the access call in user-space, then the kernel must call you to know what the correct privileges are; if the kernel cached the response and didn't call user-space again, the permissions check would be stale... and that sounds like a pretty bad consequence to me. Of course "caching" could be a root cause for read-only file systems if that's what you meant, but this should be generalizable to writable file systems as well. Note: I'm not actively developing my FS any longer nor using these bindings, so I have no rush in getting this resolved. I also do not have a deep enough understanding of FUSE nor Go at the moment so I cannot comment on whether #143 is the way out. |
I suspect some sort of caching to be at fault because I've only seen this storm of It's not the |
Nope, sorry, I do not have more specific test cases at the moment and I did not get to try this on Linux... I suspect you are right in that OSXFUSE might be unnecessarily calling |
macOS support has been removed. If somebody wants to pick up maintaining an open source macOS FUSE, or wants to fund supporting the proprietary continuation of OSXFUSE, please get in touch. |
I am writing a read-only caching FS to support a pretty intensive workload. The goal is to serve source files from this cache for a build, and the build runs via Bazel (which stresses the file system significantly). So far I've been very pleased with the coding experience using these bindings... but I'm now finding some performance issues that don't seem to arise from my own code, hence this bug.
The build issues a very large number of access calls on this FS. Using pprof on the FS, which does not implement the NodeAccesser interface (thus defaulting to the "null" response), I've found that a large chunk of the run time (30 seconds out of 2 minutes of sampling) goes to the Access calls. In turn, all those 30 seconds are spent waiting for the syscall.Write that backs writeToKernel. The FS process consumes a lot of CPU while this is happening, obviously. The rest of the profile is all over the place so it's not interesting for this bug report.
The funny thing is that, if I run this same build over sshfs (written in C using the reference fuse implementation), there is no such high CPU usage, yet the syscalls received by the sshfs process should be the same as the ones received by my custom FS that uses these bindings. And the build happens faster this way.
Therefore, I'm quite certain that there is something going on in this fuse implementation: the fact that there is such contention to write responses back to the kernel seems to imply that there is something wrong. The fuse reference implementation does not seem to suffer from this as the sshfs experiment highlights. I haven't been able to find anything obvious though: in Linux, there is some fanciness to clone the fd to talk to the kernel, but the OS X code doesn't seem to do that.
Passing options.DefaultPermissions() to the mount operation has made this specific problem go away because access is now never called, but this is a hack. Also, if we are slow to write back to the kernel, this would affect any other operation.
I'm at a loss now, so I'm filing this bug in case anyone has some other idea as to what to try or where the bug may be.
This is on OS X El Capitan with OSXFUSE 2.8.3.