-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match on repr(C) enum returned from C library with unknown value leads to UB #36927
Comments
TL;DR:
It is not a ABI breaking change, as adding a variant to a C enum (as long as the size of the actual integer which represents the enum in memory does not change) has no influence on the registers/memory locations where this enum gets placed when returned or passed as an argument. It is, however, an API-breaking change.
Rustc (its
Matching on |
Ok, so some research suggests that 1) C and C++ have similar, but slightly different rules about the valid values of an enumeration type. However, it seems that neither are limited to the values defined by the declaration. As far as I can tell, an enumeration type in C can take on any value of the underlying type. This means that adding an enum constant may-or-may-not be a breaking ABI change depending on the compiler and target (because of course it is). Fortunately, most platforms will default to With that in mind, we should probably do two things:
|
The fact that Rust enums cause UB when they hold a value that isn't a valid variant is the primary reason I don't use Rust enums in winapi and instead use simple integer constants (The other reason being that integer constants don't require any trait impls or methods and significantly reduce compile times). C/C++ will often even use enums as bitflags which definitely don't align with Rust enums.
Windows API uses enums as bitflags in both C and C++ which indicates that it is not undefined behavior in MSVC C/C++. |
Sure. We purged all Rust enums from rustllvm unless we control the other side. Maybe prohibit |
That's wrong. Creating a Rust enum with an out-of-range value is instant UB - we generate |
I think that there should be a lint against using an enum type in FFI, except for the case where the enum cannot have been created or modified by foreign code. It is way too easy to mess this up. A newtype of an integer is a much better solution. |
Perhaps we should allow |
/subscribe If a C API takes an enum by value, there's a decent chance it's actually bitflags, i.e. you're expected to be able to do |
The usage of C enums as bitflags can be handled in Rust with things like the bitflags crate (or something similar). So that shouldn't be a big deal. But this issue of undefined behavior for "out of range" enums is a real concern for me. bindgen even generates Rust enums from C enums. Describing this current state of affairs as a "foot gun" is a perfect description. Rust really needs an ergonomic solution to using C enums in a Rust FFI setting. Otherwise Rust's safety promises are moot. |
Wouldn't the simplest option be allowing the user to specify the default value to be selected in the event the enum is out-of-range? Something like: #[repr(C, default = Invalid)]
enum Thing {
Invalid,
... At least for the enums I happen to be working with in this FFI, they all support a good choice of a default option. |
Yes, but
I would first approach this with a
I do not agree with this, I do not think that C enumerations are part of Rust's safety measures, what happens is that this problem is unsolvable (in my humble opinion) within the Rust language itself, |
I don't think this provides the guarantees that you want! C enums can still have values outside of the explicitly stated variants. AIUI, basically anything within the same bit range is still legal. |
Note, also, that bindgen allows customizing the enums as needed (generating plain integers, for example). The possibilities would be much larger with rust-lang/rfcs#1758, though. |
Or you could use a regular declarative macro to define enums using normal syntax and have them turned into constants like winapi does. |
How does generating bindings at compile time help? As I mentioned in rust-lang/rust-bindgen#667 if the Rust program dynamically links to the C library then the C library could easily return an "out of range" enum (e.g. if the user's computer has a slightly different version of the C library; C programmers generally don't consider adding a value to an enum to be an API or ABI breaking change). Whether Rust's bindings are generated at compile time or not is irrelevant here. |
Before I take the time to switch over to using numerical constants instead of enums in my own project, may I ask if this is still a concern? I just tried running the playground example that was linked in the original post on every rust version from 1.12.0 up to 1.17.0:
On 1.17.0, however, everything seems to be fine:
It exhibits the same behavior when run on play.rust-lang.org; not one of stable, beta, or nightly segfaults. EDIT: It also doesn't segfault even if you don't have a |
The worst part of UB is that it might just do the "reasonable" thing you'd want... |
Have you tried it with optimization? In my case whether the undefined behavior manifested depended on surrounding code and if it was optimized. |
If by 'optimization' you mean release mode, then yeah. Other than that, no. Regardless, what cuviper said made sense; I see now that it would be foolhardy to rely on this behavior :) |
e.g. ``` cargo build --features bindgen cp target/**/bindings.rs src/bindings.rs ``` This might not be a good idea: rust-lang/rust-bindgen#758 rust-lang/rust#36927
So, it's been two years. re-reading this issue, it seems that this is working as intended, generally. As such, I'm going to give it a close. Anything that changes here seems big enough to require an RFC, and probably should be pursued through those channels. Thanks! |
bindgen option of rustified_enum or newtype_enum are NOT enabled: * newtype_enum: They can't be created from u32, so we can't make our fancy enums with rustic case. * rusitified_enum: They're dangerous if we try to match them: rust-lang/rust#36927 * default: Enum variants are defined as constants. To get strictly typed, we need to wrap them into an `enum`. FFI (fields and functions) will be weakly typed
That's nice!! bindgen option of rustified_enum or newtype_enum are NOT enabled: * newtype_enum: They can't be created from u32, so we can't make our fancy enums with rustic case. * rusitified_enum: They're dangerous if we try to match them: rust-lang/rust#36927 * default: Enum variants are defined as constants and FFI (fields and parameters) will be weakly typed. To get strictly typed, we need to wrap them into our `enum`(s).
Summary
It is not uncommon for C libraries to add more values to an enum in their API while considering it a non-breaking change. However, with a rust library linked to the C library it causes undefined behavior when the change is made if the rust enum definition is not updated.
Examples
I've reduced the issue down to this sample code where get_bad_kind() represents a call into a C library that returns a new enum value unknown to rust which leads to a crash at runtime (signal: 11, SIGSEGV: invalid memory reference or stack overflow): https://play.rust-lang.org/?gist=f066d8e489a6e220866958065891a812&version=stable&backtrace=0
The C version of this code does not result in this issue, instead it hits the default case and returns -1: https://gist.github.com/tcosprojects/49d0c809d40b8f008e25dacf49c508bf
Tested on
rustc 1.12.0 (3191fba 2016-09-23)
binary: rustc
commit-hash: 3191fba
commit-date: 2016-09-23
host: x86_64-unknown-linux-gnu
release: 1.12.0
and
rustc 1.12.0 (3191fba 2016-09-23)
binary: rustc
commit-hash: 3191fba
commit-date: 2016-09-23
host: i686-pc-windows-msvc
release: 1.12.0
Discussion
Discussing the issue on the #rust channel led to the following:
A binding library would need to match the rust definition of the enum with the C definition exactly at all times when linked to a C library to prevent either this undefined behavior or passing unsupported flags to older versions of the C library. This could be done in the cargo build script that checks the C library version and enables feature flags for these enum values. It would still not not guarantee safety though when using dynamically linked libraries. Or it could be resolved by not using repr(C) enums and instead translating them at runtime with a function that handles invalid values. If the latter should be done, it would be helpful to have it mentioned in the docs somewhere. It also makes me uncertain what the purpose of repr(C) enum is, if not to be compatible with and used with C FFI calls.
In any case, as a C programmer this bug was surprising and took some significant debugging. It required going into the assembly output of the match statement to see it jump based on the constant of the highest value from the rust enum definition and then corrupt the stack to pin down the cause of the crash.
A warning about this compatibility issue regarding repr(C) enum in the FFI docs would be helpful. I suspect it is not uncommon for rust bindings libraries to use repr(C) enum this way while expecting it to work.
If the match expression could jump to unreachable!() in debug builds when encountering an unknown value on a repr(C) enum type it would help catch this issue without so much debugging.
Questions
The text was updated successfully, but these errors were encountered: