
[Q]: splice operations with two pipes #192

Closed
Mic92 opened this issue Oct 18, 2017 · 18 comments
Mic92 commented Oct 18, 2017

Is there a reason you used two pipes to write the header, rather than a single pipe plus vmsplice? https://github.com/hanwen/go-fuse/blob/master/fuse/splice_linux.go#L22 I am currently working on my own implementation in Rust and would be interested in why you did it that way.

Owner

hanwen commented Oct 18, 2017

The reply header includes the response size, which you can only know after you have read the data out of the fd.

Author

Mic92 commented Oct 18, 2017

What libfuse does is fall back, on a short read from splice, to sending the data via an iovec instead. I expect the average case to have fewer context switches compared to your implementation, however. Have you considered doing this? I have not measured both variants.


hanwen commented Oct 19, 2017

oh, that is a good idea; I should have looked at libfuse when I implemented this.

btw, don't copy my API (returning a ReadResult), which is awkward. I somewhat regret that I didn't simply stick with Read(buf []byte), which is much more straightforward. The bazil.org FUSE API, which passes in the request so you can do req.Reply( .. ), is also more straightforward (but a little less composable).


hanwen commented Oct 19, 2017

see 42d2adc

It's not obvious to me that this is that much better. It would be useful to see some benchmarks; I think it depends on the size and frequency of the partial reads.

You could decrease the number of syscalls further by not clearing the pipe after successful reads, but that would make error handling more complicated.


Mic92 commented Oct 19, 2017

@Nikratio I would be very interested in your opinion as well.

@Nikratio

I'm afraid I can't contribute much. The exact rationale for adding splice support has been lost to the dust of history. For libfuse3, I would have liked to either always use splice or never use it (since I am pretty sure that the majority of filesystem developers and users have no real idea when to use it or not use it). Unfortunately, at the time I couldn't find any good benchmarks and didn't have the time to come up with something myself either (you'd first need to define what exactly a representative workload is).


Mic92 commented Oct 20, 2017

So only @szmi could know that.


hanwen commented Oct 20, 2017

some random measurements:

for go-fuse:

  • splice yields a 10% throughput improvement compared to no splice
  • opportunistic (what the question was originally about) yields another 10% improvement for small files. Curiously, it makes no difference for large files.

it's possible that the difference is larger for libfuse, since libfuse has less memory (de)allocation overhead. Let me test.


hanwen commented Oct 20, 2017

I tried testing with libfuse3, but example/passthrough and example/passthrough_fh seem to read through userspace. passthrough_ll looks as if it should be better (it mentions splice), but it is actually 2x slower for bulk reads.

go install github.com/hanwen/go-fuse/example/loopback; fusermount -u /tmp/x/ ; loopback /tmp/x /boot

with splice:

$ go install github.com/hanwen/go-fuse/example/benchmark-read-throughput && benchmark-read-throughput -bs 128 -limit 30000 /tmp/x/initramfs-0-rescue-12a4c82a414b4f18983362ce2122f69a.img
block size 128 kb: 30035.8 MB in 17.790569359s: 1688.30 MBs/s

without

$ go install github.com/hanwen/go-fuse/example/benchmark-read-throughput && benchmark-read-throughput -bs 128 -limit 30000 /tmp/x/initramfs-0-rescue-12a4c82a414b4f18983362ce2122f69a.img
block size 128 kb: 30035.8 MB in 19.25596025s: 1559.82 MBs/s

libfuse3

fusermount -u /tmp/z ; example/passthrough -f /tmp/z/
$ go install github.com/hanwen/go-fuse/example/benchmark-read-throughput && benchmark-read-throughput -bs 128 -limit 30000 /tmp/z/boot/initramfs-0-rescue-12a4c82a414b4f18983362ce2122f69a.img
block size 128 kb: 30035.8 MB in 20.437629672s: 1469.63 MBs/s

passthrough_fh
block size 128 kb: 30035.8 MB in 33.194200978s: 904.85 MBs/s

passthrough_ll
block size 128 kb: 30035.8 MB in 29.325975204s: 1024.21 MBs/s


Mic92 commented Oct 20, 2017

I noticed yesterday that passthrough_ll doesn't use splice for reads unless:

diff --git a/lib/fuse_lowlevel.c b/lib/fuse_lowlevel.c
index 031793a..d48539a 100644
--- a/lib/fuse_lowlevel.c
+++ b/lib/fuse_lowlevel.c
@@ -1913,6 +1913,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, const void *inarg)
        LL_SET_DEFAULT(1, FUSE_CAP_ASYNC_DIO);
        LL_SET_DEFAULT(1, FUSE_CAP_IOCTL_DIR);
        LL_SET_DEFAULT(1, FUSE_CAP_ATOMIC_O_TRUNC);
+       LL_SET_DEFAULT(1, FUSE_CAP_SPLICE_WRITE);
        LL_SET_DEFAULT(se->op.write_buf, FUSE_CAP_SPLICE_READ);
        LL_SET_DEFAULT(se->op.getlk && se->op.setlk,
                       FUSE_CAP_POSIX_LOCKS);

is applied.


hanwen commented Oct 20, 2017

I also tried cluefs which uses bazil.org/fuse. I removed the trace() calls, but

block size 128 kb: 145.8 MB in 6.003518478s: 24.29 MBs/s

(why do people like to use bazil.org/fuse? The mind boggles.)


hanwen commented Oct 20, 2017

with Mic's patch:

$ go install github.com/hanwen/go-fuse/example/benchmark-read-throughput && benchmark-read-throughput -bs 128 -limit 30000 /tmp/z/boot/initramfs-0-rescue-12a4c82a414b4f18983362ce2122f69a.img
block size 128 kb: 30035.8 MB in 17.019056007s: 1764.84 MBs/s

so, a little faster than go-fuse (which is expected), but only a little (4%, which is pretty good).


hanwen commented Oct 20, 2017

bazil.org/fuse uses a fresh buffer for each read,

https://github.com/bazil/fuse/blob/master/fs/serve.go#L1199

so large reads are dominated by allocation costs in the FUSE daemon.


hanwen commented Oct 20, 2017

also, you asked about vmsplice, but in case of splicing, that is only useful for writing the header, no? What is the advantage of vmsplice over write(2) ?


Mic92 commented Oct 20, 2017

Yes, this is just useful to write the header.


Mic92 commented Oct 24, 2017

I wonder if memfd_create or an ordinary tmpfs-backed file could be used instead of a pipe, since it allows seeking and writing at different offsets. Maybe it is slower because it requires more copies?

memfd would be way slower.


Mic92 commented Oct 24, 2017

A micro-benchmark from me comparing splice, memfd, full discard, and header discard for one memory page: https://gist.github.com/Mic92/c25ed7c331f6db927b246465420a55d7


hanwen commented Nov 5, 2017

I'm going to close this for now.

@hanwen closed this as completed Nov 5, 2017