Add FUSE Passthrough Support in Stargz-Snapshotter #1867 #1868

Merged 3 commits on Nov 21, 2024

@wswsmao wswsmao commented Nov 19, 2024

Here’s a proposed implementation plan for FUSE passthrough:

  1. During the Open phase, attempt to pre-read the entire file instead of reading it in chunks.
  2. Utilize the existing cache's Get method to retrieve the fd of the cached file that has been written to local storage.
  3. Implement the FilePassthroughFder interface in node.file, allowing the fd from step 2 to be registered with the kernel via go-fuse.

By following this approach, subsequent Read operations would not need to return to user space, and go-fuse would release the registered information when necessary.

For details, see #1867.

@ktock ktock left a comment

@@ -148,4 +148,7 @@ type FuseConfig struct {

// EntryTimeout defines TTL for directory, name lookup in seconds.
EntryTimeout int64 `toml:"entry_timeout"`

// PassThrough indicates whether to enable FUSE passthrough mode to improve local file read performance. Default is false.
PassThrough bool `toml:"passthrough"`

Can we test this in our CI?

b.Reset()
b.Grow(int(chunkSize))
ip := b.Bytes()[:chunkSize]
if _, err := sf.fr.ReadAt(ip, chunkOffset); err != nil && err != io.EOF {

Can we check if the data can be read from the cache?

@@ -82,6 +82,7 @@ type BlobCache interface {
type Reader interface {
io.ReaderAt
Close() error
GetReaderAt() io.ReaderAt

Can we add a comment like "If a blob is backed by a file, it should return *os.File so that it can be used for FUSE passthrough."?

fs/layer/node.go Outdated
Comment on lines 359 to 360
n.fs.s.report(fmt.Errorf("node.Open: %v", err))
return nil, 0, syscall.EIO

Can we continue opening this as a non-passthrough file, instead of returning EIO?

@@ -82,6 +82,7 @@ type BlobCache interface {
type Reader interface {
io.ReaderAt
Close() error
GetReaderAt() io.ReaderAt

When FUSE passthrough is enabled, we should always set directoryCache.direct to true so that *directoryCache.Get always returns *os.File (not a buffer).

@wswsmao wswsmao force-pushed the passthrough branch 9 times, most recently from 3fd3113 to 5dba826 Compare November 20, 2024 06:39

wswsmao commented Nov 20, 2024

all done @ktock

wswsmao commented Nov 20, 2024

This is my benchmark:
https://github.com/wswsmao/fuse-performance

I created three sets of files with sizes of 50, 75, and 100 MB, and used them to build a test image named abushwang/ocs9:fuseperf-orig.

I then converted it to an eStargz image, using the default value for the --estargz-chunk-size parameter.

$ nerdctl image convert --estargz --oci abushwang/ocs9:fuseperf-orig abushwang/ocs9:fuseperf-esgz

To ensure a clean environment for each test, I removed the local image and cache, and then restarted the containerd-stargz-grpc service.

$ rm -rf /var/lib/containerd-stargz-grpc/stargz/fscache/*
$ nerdctl images -q | xargs nerdctl rmi -f

This is the report:

  • no passthrough
$ nerdctl run --rm -t --snapshotter=stargz abushwang/ocs9:fuseperf-esgz
docker.io/abushwang/ocs9:fuseperf-esgz:                                           resolved       |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:13868ec859962d0ae07afe0db9f48208e54f38afbaa337e4237aa1e5831c24c0: done           |++++++++++++++++++++++++++++++++++++++| 
config-sha256:0abfbeeb608160b63cc9e9e3c351c639b1a53a0bf99a27109d96adc4a75e3c97:   done           |++++++++++++++++++++++++++++++++++++++| 
elapsed: 7.1 s                                                                    total:  5.1 Ki (740.0 B/s)                                       
Running file access tests...
Testing file: large_files/large_file_1
Testing file: large_files/large_file_2
Testing file: large_files/large_file_3
Performance Report
==================
File: large_file_1
  Sequential Read Time: 8329.96 ms
--------------------------
File: large_file_2
  Sequential Read Time: 11704.64 ms
--------------------------
File: large_file_3
  Sequential Read Time: 15271.63 ms
--------------------------
Tests completed. See performance_report.txt for details.
  • with passthrough
$ nerdctl run --rm -t --snapshotter=stargz abushwang/ocs9:fuseperf-esgz
docker.io/abushwang/ocs9:fuseperf-esgz:                                           resolved       |++++++++++++++++++++++++++++++++++++++| 
manifest-sha256:13868ec859962d0ae07afe0db9f48208e54f38afbaa337e4237aa1e5831c24c0: done           |++++++++++++++++++++++++++++++++++++++| 
config-sha256:0abfbeeb608160b63cc9e9e3c351c639b1a53a0bf99a27109d96adc4a75e3c97:   done           |++++++++++++++++++++++++++++++++++++++| 
elapsed: 7.1 s                                                                    total:  5.1 Ki (740.0 B/s)                                       
Running file access tests...
Testing file: large_files/large_file_1
Testing file: large_files/large_file_2
Testing file: large_files/large_file_3
Performance Report
==================
File: large_file_1
  Sequential Read Time: 344.62 ms
--------------------------
File: large_file_2
  Sequential Read Time: 518.38 ms
--------------------------
File: large_file_3
  Sequential Read Time: 667.25 ms
--------------------------
Tests completed. See performance_report.txt for details.

env:

$ uname -r
6.11.5-200.fc40.x86_64
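
To put the two reports side by side, the following sketch computes the speedup per file from the sequential read times listed above. The numbers are copied from the reports; `result` and `speedup` are names introduced only for this illustration.

```go
package main

import "fmt"

// result pairs the sequential read times (in ms) reported for one file.
type result struct {
	name      string
	withoutMs float64 // no passthrough
	withMs    float64 // with passthrough
}

// speedup returns how many times faster the passthrough run was.
func speedup(withoutMs, withMs float64) float64 {
	return withoutMs / withMs
}

func main() {
	results := []result{
		{"large_file_1", 8329.96, 344.62},
		{"large_file_2", 11704.64, 518.38},
		{"large_file_3", 15271.63, 667.25},
	}
	for _, r := range results {
		fmt.Printf("%s: %.1fx faster\n", r.name, speedup(r.withoutMs, r.withMs))
	}
}
```

On these figures, passthrough reads are roughly 22x to 24x faster. This is a single run per configuration, so treat the ratios as indicative.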

Comment on lines 200 to 202
// isFusePthEnable prevents users from enabling passthrough mode on unsupported kernel versions
func isFusePthEnable() (bool, error) {
cmd := exec.Command("sh", "-c", "grep 'CONFIG_FUSE_PASSTHROUGH=y' /boot/config-$(uname -r)")

nit: Let's try not to rely on shell commands for now. Instead of having this check, let's add documentation on how to check whether passthrough is supported on the node (maybe in a follow-up PR).

@ktock ktock left a comment

Thanks

@ktock ktock merged commit 2a280d6 into containerd:main Nov 21, 2024
26 checks passed

ujjwal commented Dec 2, 2024

Does the kernel version need to be updated to 6.9 to test out passthrough?

wswsmao commented Dec 2, 2024

Does the kernel version need to be updated to 6.9 to test out passthrough?

Yes, or backport the kernel commit that adds this feature:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6ce8b2ce0d7e3a621cdc9eb66d74436ca7d0e66e
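
As a rough aid, a kernel release string can be compared against 6.9 as sketched below. `kernelAtLeast` is a hypothetical helper, and a version comparison is only a heuristic: as the answer above notes, an older kernel with the commit backported would also work, and a new enough kernel still needs the feature enabled in its config.

```go
package main

import "fmt"

// kernelAtLeast reports whether a release string such as
// "6.11.5-200.fc40.x86_64" is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	var gotMajor, gotMinor int
	if _, err := fmt.Sscanf(release, "%d.%d", &gotMajor, &gotMinor); err != nil {
		return false, fmt.Errorf("parse kernel release %q: %w", release, err)
	}
	return gotMajor > major || (gotMajor == major && gotMinor >= minor), nil
}

func main() {
	// The release string from the benchmark environment in this thread.
	ok, err := kernelAtLeast("6.11.5-200.fc40.x86_64", 6, 9)
	fmt.Println(ok, err)
}
```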
