Slow performance due to ops never replying ENOSYS #130

jmmv · 2016-03-15T20:18:56Z

I am writing a read-only caching FS to support a pretty intensive workload. The goal is to serve source files from this cache for a build, and the build runs via Bazel (which stresses the file system significantly). So far I've been very pleased with the coding experience using these bindings... but I'm now finding some performance issues that don't seem to arise from my own code, hence this bug.

The build issues a very large number of access calls on this FS. Using pprof on the FS, which does not implement the NodeAccesser interface (thus defaulting to the "null" response), I've found that a large chunk of the run time (30 seconds out of 2 minutes of sampling) goes to the Access calls. In turn, all those 30 seconds are spent waiting for the syscall.Write that backs writeToKernel. The FS process consumes a lot of CPU while this is happening, obviously. The rest of the profile is all over the place so it's not interesting for this bug report.

The funny thing is that, if I run this same build over sshfs (written in C using the reference fuse implementation), there is no such high CPU usage — yet the syscalls received by the sshfs process should be the same as the ones received by my custom FS that uses these bindings. And the build happens faster this way.

Therefore, I'm quite certain that there is something going on in this fuse implementation: the fact that there is such contention to write responses back to the kernel seems to imply that there is something wrong. The fuse reference implementation does not seem to suffer from this as the sshfs experiment highlights. I haven't been able to find anything obvious though: in Linux, there is some fanciness to clone the fd to talk to the kernel, but the OS X code doesn't seem to do that.

Passing options.DefaultPermissions() to the mount operation has made this specific problem go away because access is now never called, but this is a hack. Also, if we are slow to write back to the kernel, this would affect any other operation.

I'm at a loss now, so I'm filing this bug in case anyone has another idea as to what to try or where the bug may be.

This is on OS X El Capitan with OSXFUSE 2.8.3.

jmmv · 2016-03-16T14:50:15Z

I've done a simple test: mounted the "Hello world" from FUSE's C package and the "Hello world" from this Go package.

I then ran many access calls on the same file to count how many calls per second the FS could serve. I did the same later for stat calls, and later for open/read calls.

The C version of the hello world FS consumes 0 CPU for the access and open/read test, and some CPU for the stat test. The Go version consumes significant CPU in all cases.

For the open/read test, the C FS serves up to 230,000 calls per second while the Go FS reaches only 3,500. This is sad.

The thing is that the kernel seems to not call into the C FS as frequently as it does for the Go FS (as observed by the CPU consumption of the FS itself). I wonder if the Go FUSE bindings have gotten some tiny detail wrong and are causing the kernel to issue many more round trips than necessary.

jmmv · 2016-03-17T01:54:25Z

Aha, so I understand what's happening.

The original libfuse returns ENOSYS for any operation that is not implemented by the FUSE file system. Once the kernel sees a reply of this form, it understands that the file system does not implement this operation and therefore never sends it to userspace again.

The Bazil FUSE implementation does not do this. Instead, it never returns ENOSYS, which causes the kernel to call all FUSE operations over and over again on the userspace file system. This behavior is harmful because kernel->userspace roundtrips are expensive.

Given the current design of these Go bindings, I'm not sure how this could be fixed. Maybe the file system module should explicitly whitelist the operations it implements in all of its node types upfront so that the code in fuse.go can return ENOSYS for any non-whitelisted operations. It's not pretty to do it this way, but the performance differences warrant supporting this use-case.

tv42 · 2016-03-17T02:36:45Z

What operations do you see called so much that they cause this overhead, while being safe to ignore?

jmmv · 2016-03-17T13:46:01Z

The one mentioned at the top: Access. The same can probably be said about the xattr calls (and possibly others), though my workload doesn't invoke them.

As somebody put it to me recently, "FUSE can grant more or less control to the userspace daemon depending on the use case", so the libraries that implement the bindings should give the user that choice. At the moment, these libraries make the assumption that all syscalls must be handled at the user level.

tv42 · 2016-03-17T15:28:32Z

I wonder why OSXFUSE calls Access that much. It's pretty rare for applications to call access(2) (most of those calls are Time-of-check to time-of-use race conditions).

bazil.org/fuse can't know that none of the Nodes ever implement Access, so it's hard to return ENOSYS on that level. Why don't you write an Access method and return fuse.ENOSYS?

I'll try to find time to look at the big picture, hopefully soon.

jmmv · 2016-03-18T14:25:48Z

I guess I could make my nodes return ENOSYS themselves... but that'd be an ugly thing to do: the fact that the kernel should never call an operation is a property of the whole file system, not of an individual node. Having to make sure that all nodes are consistent regarding ENOSYS seems fragile and could lead to obscure issues. (E.g. if you have a node that is rarely instantiated and by mistake returns ENOSYS for an operation already implemented in the other nodes, as soon as that node's operation is invoked, the whole file system will silently misbehave.)

I think it makes sense to keep the current behavior as a default for simplicity, but I'd like a way to explicitly override it. What about the following possibilities?

1. Extend the Serve function to receive an optional list of node types. If the list is not provided, keep the current behavior. If the list is provided, extract the list of all FUSE operations the nodes implement and, for all operations that are never defined, make the serving path return ENOSYS. (Not sure how feasible this is in Go; I'm quite a newbie to the language.) This has the advantage that the Go library can "translate" higher-level operations like ReadAll (that do not exist in the FUSE layer) to the lower level operations that implement them.

2. Extend the Serve function to receive an optional list of syscalls that the FS is willing to receive (maybe as a bitmap or some other representation). If the list is not provided, keep the existing behavior. If the list is provided, change the serving path to return ENOSYS for any operation not in the list. This has the advantage that it is explicit and one gets full control of the kernel/userspace communication possibilities.

tv42 · 2016-06-06T21:06:47Z

Perhaps something like

type FSPreRequester interface {
    // PreRequest gets to inspect all requests before the corresponding Node or Handle
    // methods are called. If it returns a non-nil error, request processing is aborted and
    // the error is returned to the kernel.
    PreRequest(req fuse.Request) error
}

then you can write a type switch or whatever logic you want in there, and return ENOSYS as appropriate.

I'd like to see an API that could serve debug & trace needs with the same call. Something along the lines of https://godoc.org/bazil.org/fuse/fs#Config WithContext and serving #84 & #65

jmmv · 2016-06-21T02:04:24Z

Not sure. While a "pre-request" hook sounds like a thing that may be good to have in general, I think putting code in there to manually return ENOSYS for a bunch of operations is fragile. The connection of arbitrary code in the "pre-request" to how the node-specific hooks are later invoked would be non-obvious.

What's wrong with the two alternatives I proposed in my last comment? Their benefit is that they are explicit and they are handled transparently. I.e. if the user has defined a node and the bazil API has deemed it as valid at startup time, then one can see that the node's functions will be called where appropriate and all syscalls that have no backing implementation will result in ENOSYS.

tv42 · 2016-06-21T02:13:21Z

I don't see listing ops by number or such as a very nice API.

I don't want to reflect a list of types at runtime to build whatever constructs are needed. Given such a monster, I'd rather defer until a better API arises.

I also want to do some reproducible benchmarks to convince myself of the effect here (and that it cannot be achieved better via general improvements, e.g. #35), and that this problem really is about the FUSE protocol and not just lack of good caching in OSXFUSE.

Significant changes require significant justification. Especially when they don't benefit the common case.

~~Have you tried mounting your fs read-only?~~ (sorry, that was a brainfart)

#143 might be the way out.

jmmv · 2016-06-21T02:23:23Z

Yes, I did mount the file system as read-only and also tried the async flag. In the end, I found that enabling the "default permissions" option made the problem go away because that caused the access operation to never be called.

I don't see how caching could be a root cause here. If you are going to handle the access call in user-space, then the kernel must call you to know what the correct privileges are; if the kernel cached the response and didn't call user-space again, the permissions check would be stale... and that sounds like a pretty bad consequence to me. Of course "caching" could be a root cause for read-only file systems if that's what you meant, but this should be generalizable to writable file systems as well.

Note: I'm not actively developing my FS any longer nor using these bindings, so I have no rush in getting this resolved. I also do not have a deep enough understanding of FUSE or Go at the moment, so I cannot comment on whether #143 is the way out.

tv42 · 2016-06-21T02:30:44Z

I suspect some sort of caching to be at fault because I've only seen this storm of Access requests on OSXFUSE; do you have a Linux workload that triggers them?

It's not the Access itself that should be cached; it's that the whole call to Access seems unnecessary, if you trust the Linux logic.

jmmv · 2016-07-03T02:46:21Z

Nope, sorry, I do not have more specific test cases at the moment and I did not get to try this on Linux...

I suspect you are right in that OSXFUSE might be unnecessarily calling Access though.

tv42 · 2020-04-15T05:19:02Z

macOS support has been removed. If somebody wants to pick up maintaining an open source macOS FUSE, or wants to fund supporting the proprietary continuation of OSXFUSE, please get in touch.
#224

jmmv changed the title ~~Slow performance (on OS X?)~~ Slow performance due to ops never replying ENOSYS Mar 17, 2016

tv42 added the platform:osx label Jan 17, 2020

tv42 closed this as completed Apr 15, 2020

tv42 added the wontfix label Apr 15, 2020

chrislusf added a commit to seaweedfs/seaweedfs that referenced this issue Jun 17, 2020

mount: a fix to prevent possible repeated calls

b74eced

related to bazil/fuse#130
