Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filedesc: don't use ioctl(FIOCLEX) on Linux #62425

Merged
merged 1 commit into from
Jul 11, 2019

Conversation

cyphar
Copy link
Contributor

@cyphar cyphar commented Jul 5, 2019

All ioctl(2)s will fail on O_PATH file descriptors on Linux (because
they use &empty_fops as a security measure against O_PATH descriptors
affecting the backing file).

As a result, File::try_clone() and various other methods would always
fail with -EBADF on O_PATH file descriptors. The solution is to simply
use F_SETFD (as is used on other unices) which works on O_PATH
descriptors because it operates through the fnctl(2) layer and not
through ioctl(2)s.

Since this code is usually only used in strange error paths (a broken or
ancient kernel), the extra overhead of one syscall shouldn't cause any
dramas. Most other systems programming languages also use the fnctl(2)
so this brings us in line with them.

Fixes: #62314
Signed-off-by: Aleksa Sarai [email protected]

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @shepmaster (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 5, 2019
@cyphar cyphar changed the title filedesc: don't use FIOCLEX on Linux filedesc: don't use ioctl(FIOCLEX or FIONBIO) on Linux Jul 5, 2019
@shepmaster
Copy link
Member

Whew, way past my current level of knowledge! 😇

r? @alexcrichton

@alexcrichton
Copy link
Member

Thanks for the PR! I don't personally know what an O_PATH file descriptor is, but do you know if there's precedent for changes like this in other languages?

These are naively optimized to avoid syscalls into the kernel, although it's never been benchmarked one way or another I imagine. Is it possible though to look for other experience reports to guide which of these in theory should be used for these operations?

@cyphar
Copy link
Contributor Author

cyphar commented Jul 10, 2019

@alexcrichton

I don't personally know what an O_PATH file descriptor is

I can give you a quick run-down if it'll help. Traditionally within Unix there were only three modes you could open a file with (O_RDONLY, O_WRONLY, and O_RDWR -- plus O_APPEND if you want to count it separately).

On Linux (since 2.6.39) there has been an additional way of getting a file handle -- O_PATH. These descriptors are quite strange, in that you cannot read or write to them (or do basically any other operation such as ioctls). However, they are meant to be used as the dirfd argument to openat(2) and other *at(2) syscalls (also introduced in 2.6.x) -- allowing you to operate on a path relative to the dirfd. These *at(2) syscalls allow for programs to avoid a bunch of race conditions (if the dirfd directory gets moved the open will still reference paths underneath it).

However, it turns out that O_PATH descriptors are incredibly useful for a bunch of other dirty hacks clever tricks. We can use them to re-open files without having to trust a root filesystem as well as many other tricks (on Linux you can "re-open" a file by doing an open(2) on /proc/self/fd/$n). Container runtimes (which I work on) use this dirty hack clever trick all the time in order to improve the security of containers against malicious container guests.

The reason why this change is necessary (or rather, "would be nice") is that I'm working on a new library to hopefully improve the security of many projects by providing a safe path resolution implementation. The safety and usability of this path resolution library depends incredibly strongly on the properties of O_PATH descriptors -- and in order to make my library idiomatic Rust it makes sense to wrap these descriptors in a std::fs::File. There's also some kernel work associated with this path resolution work, but I won't get into that here.

I have worked around this issue in my own library (by doing F_DUPFD_CLOEXEC manually) but I'd prefer to have this fixed in Rust so that everyone can benefit from a std::fs::File implementation that works with O_PATH descriptors.

but do you know if there's precedent for changes like this in other languages?

To be honest, I wasn't even aware there was a way to set O_CLOEXEC other than through the fnctl(2) -- all other languages I've done these sorts of things in (Go, C, C++, Python, ...) all use the fnctl(2). But given that O_PATH descriptors are something that (some) Linux developers need to be able to reasonably use -- I think it's fair to say that a systems programming language (like Rust) should accommodate that.

Looking at my change again, I just realised that the FIONBIO (O_NOBLOCK) change doesn't make sense -- it doesn't make any sense to have a non-blocking O_PATH descriptor. I will remove that hunk.

These are naively optimized to avoid syscalls into the kernel, although it's never been benchmarked one way or another I imagine.

While the ioctl does save you one syscall, there are a few things I'd note:

  • set_cloexec is only used if O_CLOEXEC (or F_DUPFD_CLOEXEC) didn't set O_CLOEXEC but also didn't give an error. The comment in File.ensure_cloexec and Fd.duplicate both mention that this is due to a hideously outdated glibc (though I'm not sure that's correct) or a very out-of-date kernel or a broken kernel. So I would argue this case is far from common -- and so the extra overhead of one syscall isn't significant because most users wouldn't have ever hit the ioctl in the first place. The only reason I originally saw this was because of a bug I'd caused in a kernel patch I was testing (which resulted in O_CLOEXEC not working properly.

  • In addition, the F_SETFD code actually checks whether FD_CLOEXEC is already set -- which means that the overhead should be identical for properly-functioning kernels.

  • If you want, I can make it so that each Fd remembers whether the ioctl will fail, and then use fnctl(2) only if the ioctl(2) has already failed once with EBADF. That way, the fast-path is retained without breaking O_PATH descriptors.

  • It should be noted that right now, the only flag defined for F_SETFD is FD_CLOEXEC. So, in principle, we could just skip the F_GETFD. But this would be a horrible idea (since it would cause problems if new flags were ever defined for F_SETFD) -- I just thought I'd mention it.

  • I'm not personally convinced (though I don't have benchmarks to back it up) that one extra fcntl(2) syscall is bad enough to have to work around. If I had to guess, the time it took me to write this message probably dwarfs all of the CPU cycles that will be wasted over the course of a decade because of a change like this. 😉

Is it possible though to look for other experience reports to guide which of these in theory should be used for these operations?

I can look at what other languages do, but I'd like to say as someone who stares at strace output for a living -- this is the first time I've ever seen FIOCLEX. glibc and everyone else I've seen all use fcntl(F_SETFD) because it's the better-documented interface (FIOCLEX isn't in any of the man pages AFAICS) and because it works with O_PATH.

@alexcrichton
Copy link
Member

Thanks for the thorough description! That all sounds great to me. If you want to revert the part for set_nonblocking otherwise r=me

All ioctl(2)s will fail on O_PATH file descriptors on Linux (because
they use &empty_fops as a security measure against O_PATH descriptors
affecting the backing file).

As a result, File::try_clone() and various other methods would always
fail with -EBADF on O_PATH file descriptors. The solution is to simply
use F_SETFD (as is used on other unices) which works on O_PATH
descriptors because it operates through the fnctl(2) layer and not
through ioctl(2)s.

Since this code is usually only used in strange error paths (a broken or
ancient kernel), the extra overhead of one syscall shouldn't cause any
dramas. Most other systems programming languages also use the fnctl(2)
so this brings us in line with them.

Fixes: rust-lang#62314
Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar cyphar changed the title filedesc: don't use ioctl(FIOCLEX or FIONBIO) on Linux filedesc: don't use ioctl(FIOCLEX) on Linux Jul 10, 2019
@cyphar
Copy link
Contributor Author

cyphar commented Jul 10, 2019

Alright, I've dropped that hunk. Thanks!

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Jul 10, 2019

📌 Commit 6031a07 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 10, 2019
Centril added a commit to Centril/rust that referenced this pull request Jul 11, 2019
…lexcrichton

filedesc: don't use ioctl(FIOCLEX) on Linux

All `ioctl(2)`s will fail on `O_PATH` file descriptors on Linux (because
they use `&empty_fops` as a security measure against `O_PATH` descriptors
affecting the backing file).

As a result, `File::try_clone()` and various other methods would always
fail with `-EBADF` on `O_PATH` file descriptors. The solution is to simply
use `F_SETFD` (as is used on other unices) which works on `O_PATH`
descriptors because it operates through the `fnctl(2)` layer and not
through `ioctl(2)`s.

Since this code is usually only used in strange error paths (a broken or
ancient kernel), the extra overhead of one syscall shouldn't cause any
dramas. Most other systems programming languages also use the fnctl(2)
so this brings us in line with them.

Fixes: rust-lang#62314
Signed-off-by: Aleksa Sarai <[email protected]>
bors added a commit that referenced this pull request Jul 11, 2019
Rollup of 7 pull requests

Successful merges:

 - #61665 (core: check for pointer equality when comparing Eq slices)
 - #61923 (Prerequisites from dep graph refactoring #2)
 - #62270 (Move async-await tests from run-pass to ui)
 - #62425 (filedesc: don't use ioctl(FIOCLEX) on Linux)
 - #62476 (Continue refactoring macro expansion and resolution)
 - #62519 (Regression test for HRTB bug (issue 30786).)
 - #62557 (Fix typo in libcore/intrinsics.rs)

Failed merges:

r? @ghost
@bors bors merged commit 6031a07 into rust-lang:master Jul 11, 2019
@cyphar cyphar deleted the linux-cloexec-use-fcntl branch July 11, 2019 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stdlib incorrectly handles O_PATH
5 participants