Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEXInterpreter: Punch through a /sys/fex/rootfs node #4228

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Sonicadvance1
Copy link
Member

Currently pressure-vessel does a bit of a bodge to get the x86 rootfs path when running inside of FEX. It does this by opening /. which works around our pseudo-overlayfs tracking. While this worked, it wasn't guaranteed to work forever. With #4225 working to fix more issues around how rootfs is laid out, it had to break this path (while adding a workaround for it to keep working).

Give pressure-vessel a blessed path from the EmulatedFiles code paths to get a real fd back to the x86 rootfs, that way if we break this code path it is entirely our problem to fix.

Still need to have a conversation with upstream pressure-vessel to see if they'll accept this path or if we need to do something different.

We can also use this same mechanism in the future if we want to expose more FEX specific data to the application through this.

Currently pressure-vessel does a bit of a bodge to get the x86 rootfs
path when running inside of FEX. It does this by opening `/.` which
works around our pseudo-overlayfs tracking. While this worked, it wasn't
guaranteed to work forever. With FEX-Emu#4225 working to fix more issues around
how rootfs is laid out, it had to break this path (while adding a
workaround for it to keep working).

Give pressure-vessel a blessed path from the EmulatedFiles code paths to
get a real fd back to the x86 rootfs, that way if we break this code
path it is entirely our problem to fix.

Still need to have a conversation with upstream pressure-vessel to see
if they'll accept this path or if we need to do something different.

We can also use this same mechanism in the future if we want to expose
more FEX specific data to the application through this.
@asahilina
Copy link
Contributor

asahilina commented Dec 20, 2024

So the problem here is... my optimization in #4225 of overlay file lookup breaks this, since it only works with files that do already exist on the host FS.

We could split the difference though, and do the lookup once on the raw path (symlinks / cwd-relative paths / funny mount business won't work), and once after opening the backing file normally. That means direct /proc/foo or /sys/foo lookups including this one would work (and not cause any extra opens/syscalls), and opens of overlaid files through weird mounts/symlinks/relative paths/funny paths with ../whatever would only work for existing files. So as long as we declare that /sys/fex/* can only be accessed as an absolute path directly opened, we're good.

Does that sound OK? I'll update #4225 if so.

@Sonicadvance1
Copy link
Member Author

So the problem here is... my optimization in #4225 of overlay file lookup breaks this, since it only works with files that do already exist on the host FS.

We could split the difference though, and do the lookup once on the raw path (symlinks / cwd-relative paths / funny mount business won't work), and once after opening the backing file normally. That means direct /proc/foo or /sys/foo lookups including this one would work (and not cause any extra opens/syscalls), and opens of overlaid files through weird mounts/symlinks/whatever would only work for existing files. So as long as we declare that /sys/fex/* can only be accessed as an absolute path directly opened, we're good.

Does that sound OK? I'll update #4225 if so.

The current expectation is that they will open the file with the absolute path like openat(AT_FDCWD, "/sys/fex/rootfs", O_RDONLY|O_DIRECTORY) or openat2. Even the current system will break if you put a trailing / on to that, since it's exact match check.

If we add more files in here, we expect the same behaviour, except maybe missing O_DIRECTORY in the case of a file instead of this directory redirect we're doing for this case.

@asahilina
Copy link
Contributor

Ah wait, this isn't going to work. You can't just let them open the RootFS, that's what opening / would do anyway in merged mode. What gets cleaned up is the readlink on the proc file, so it still looks like /. So what you want here is to just return the RootFS path as a string or something.

But I wonder... what's the point of this? The way pressure-vessel broke for me is just that without a RootFS detected, it assumed the host wasn't FEX. But I don't get what they need the RootFS path for, it's not like they can access the RootFS in merged-RootFS mode through open() family calls and it still worked?

@Sonicadvance1
Copy link
Member Author

Ah wait, this isn't going to work. You can't just let them open the RootFS, that's what opening / would do anyway in merged mode. What gets cleaned up is the readlink on the proc file, so it still looks like /. So what you want here is to just return the RootFS path as a string or something.

But I wonder... what's the point of this? The way pressure-vessel broke for me is just that without a RootFS detected, it assumed the host wasn't FEX. But I don't get what they need the RootFS path for, it's not like they can access the RootFS in merged-RootFS mode through open() family calls and it still worked?

I haven't looked too closely at how your merged rootfs mode works, but pressure-vessel bind-mounts a few paths. In FEX's case it needs the rootfs (even if that is just /) to know where to bind mount the x86 graphics libraries and glibc inside of its container. Since it pivots most everything over to the steam-linux-runtime, the only thing coming from the host environment is glibc and graphics drivers afaik.

@asahilina
Copy link
Contributor

I need to look more closely into what they're doing and whether there's a cleaner solution here. I know about the bind mounts but I'm not sure exactly how they behave with the rootfs. I do know that without the workaround, they just disable FEX mode entirely when they can't find the rootfs, and then end up trying to do a traditional pivot_root and that just breaks everything since FEX can't run with all of its depenencies missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants