[OTHER] Improve internal safety comments and architecture #100
Comments
I'll be looking comprehensively over this later today.
I've made some major progress on this on the safety branch, benchmarking every significant change I've made to avoid any regressions while slowly removing unsafety. I had high-level "unchecked" functions which I removed to allow more localized safety invariants.
Initial Litany of Safety Enhancements. This removes a lot of unsafe code, documents the cases where removing it would have significant performance impacts but the safety invariants can be easily guaranteed, and makes other enhancements to remove potentially unsafe behavior. This also redoes some of the architecture so that more code is wrapped in safe variants: rather than, say, checking if x.get(0) == b'0' and then doing an unchecked index, it now does the peek and step in a single function, where applicable. This also simplifies the code base a lot. Part of many commits to address #100.
I'll be also adding in documentation at the top of each module that uses unsafe code what the general purpose guarantees are and then better |
This has now been closed: ~90% of all unsafety has been removed and all that remains is trivial to verify with major documentation on the vulnerable parts. The crate documentation also now clearly describes the possible sources of unsoundness, why the code is sound, and any other requirements. Once #135 I'll remove this and publish this to crates.io. Finally, a security policy has been created. |
This introduces numerous layers of security enhancements:
1. Removal of most unsafe code (according to count-unsafe, the code went from 160 unsafe functions and 3088 unsafe exprs to 8 unsafe functions and 1248 unsafe exprs). However, all the remaining unsafe code has much more clearly documented safety guarantees and is isolated into safe abstractions.
2. Clear documentation of the locations where unsafe code is used, both at those locations and in the crate-level documentation, so it's clearly visible.
A security policy has also been added, with stricter requirements for soundness with PRs. Closes #100.
@Alexhuszagh looking briefly through the remaining unsafe, I still feel like a lot of it can still be reduced. E.g. there are safe parser patterns that avoid peek/unchecked semantics. There ought to be ways to encapsulate the unsafety better. 1000+ unsafe blocks, even if documented, is still a lot to review, so I don't suspect anyone will perform a new audit any time soon for this. (There are basically two ways you can have 1000 unsafe blocks: either they're all very different, in which case that's a huge task to review, or there are a lot of similarities, in which case there can probably be some safe abstractions built) |
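As a rough illustration of the kind of safe parser pattern mentioned above, here is a minimal sketch of a peek-and-consume helper that needs no `unsafe` at all. `Cursor` and all of its methods are hypothetical names invented for this example and are not part of lexical; whether this form matches the crate's performance would have to be benchmarked.

```rust
/// Hypothetical cursor over a byte slice (illustration only, not lexical's API).
pub struct Cursor<'a> {
    rest: &'a [u8],
}

impl<'a> Cursor<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Cursor { rest: bytes }
    }

    /// Look at the next byte without consuming it.
    pub fn peek(&self) -> Option<u8> {
        self.rest.first().copied()
    }

    /// Consume the next byte only if it satisfies `pred`.
    ///
    /// `split_first` hands back the head and the tail together, so there is
    /// no separate "step" that would need an unchecked index.
    pub fn read_if(&mut self, pred: impl FnOnce(u8) -> bool) -> Option<u8> {
        let (&head, tail) = self.rest.split_first()?;
        if pred(head) {
            self.rest = tail;
            Some(head)
        } else {
            None
        }
    }

    /// Number of bytes not yet consumed.
    pub fn remaining(&self) -> usize {
        self.rest.len()
    }
}
```

Because the cursor stores the remaining sub-slice rather than a slice plus an index, every advance is a re-slice that the compiler can verify.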
I tried this a few ways, but one of them was using
/// Peek the next value and consume it if the read value matches the
/// expected one.
#[inline(always)]
fn read_if<Pred: FnOnce(&u8) -> bool>(&mut self, pred: Pred) -> Option<Self::Item> {
    if let Some(peeked) = self.peek() {
        if pred(peeked) {
            // SAFETY: the slice cannot be empty because we peeked a value.
            unsafe { self.step_unchecked() };
            Some(peeked)
        } else {
            None
        }
    } else {
        None
    }
}
However, a lot of the remaining unsafety would be in the implementation of the stack vector and the iterator API: rust-lexical/lexical-util/src/iterator.rs Line 30 in aeab322
Both of which are almost entirely encapsulated. We do have a fair bit that could be removed from the stack vector in I'll see if there are locations I can still fit EDIT: Ok this really only affects match-cases where I do a |
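To try the `read_if` snippet above in isolation, the following is a self-contained sketch of the same encapsulation. Only `peek`, `step_unchecked`, and `read_if` appear in the comment; the `Bytes` type, its fields, and `consumed` are assumptions made for illustration, and the real iterator in lexical-util may differ.

```rust
/// Hypothetical byte iterator (illustration only).
pub struct Bytes<'a> {
    slc: &'a [u8],
    index: usize,
}

impl<'a> Bytes<'a> {
    pub fn new(slc: &'a [u8]) -> Self {
        Bytes { slc, index: 0 }
    }

    /// Peek the next value without consuming it.
    #[inline(always)]
    pub fn peek(&self) -> Option<&'a u8> {
        self.slc.get(self.index)
    }

    /// Advance by one byte without a bounds check.
    ///
    /// # Safety
    /// The caller must guarantee that `self.index < self.slc.len()`.
    #[inline(always)]
    unsafe fn step_unchecked(&mut self) {
        debug_assert!(self.index < self.slc.len());
        self.index += 1;
    }

    /// Peek the next value and consume it if it matches the predicate.
    /// This is the only caller of `step_unchecked`, so the invariant is
    /// established and used within a few lines of each other.
    #[inline(always)]
    pub fn read_if<Pred: FnOnce(&u8) -> bool>(&mut self, pred: Pred) -> Option<&'a u8> {
        let peeked = self.peek()?;
        if pred(peeked) {
            // SAFETY: `peek` returned `Some`, so `index < slc.len()`.
            unsafe { self.step_unchecked() };
            Some(peeked)
        } else {
            None
        }
    }

    /// Number of bytes consumed so far.
    pub fn consumed(&self) -> usize {
        self.index
    }
}
```

Keeping `step_unchecked` private means callers outside the type can only reach it through `read_if`, which is what keeps the safety argument local.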
A large percentage of it remaining was the I've created #138 to track this. |
Thanks. Yeah I find it incredibly unlikely that this amount of unsafe is necessary for performance, a lot of these encapsulations can be zero cost. 315 is still a lot but more manageable. |
We're down to less than 250 unsafe expressions and almost all of it is within 2 abstractions: We have <60 expressions of unsafe code outside of those 2 abstractions. A lot of the remaining unsafety is code like this:
// Compute xi and zi.
// SAFETY: safe, since value must be finite and therefore in the correct range.
// `-324 <= exponent <= 308`, so `x * log10(2) - log10(4 / 3)` must be in
// `-98 <= x <= 93`, so the final value must be in [-93, 98] (for f64). We have
// precomputed powers for [-292, 326] for f64 (same logic applies for f32) so
// this is **ALWAYS** safe.
let pow5 = unsafe { F::dragonbox_power(-minus_k) };
Overall, this introduces 8 unsafe expressions, which out of almost the entire remainder is in the integer writer, which is a well-known algorithm and has a major performance hit if it uses checked indexing. I think this is acceptable.
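The range argument in the safety comment above can also be written down in code, which may make a lookup like this easier to audit. The sketch below is a minimal, hypothetical example: the table, the constants `MIN_K` and `MAX_K`, and both functions are invented for illustration and are far smaller than the real Dragonbox tables.

```rust
/// Hypothetical, tiny stand-in for a precomputed power table.
/// Entry `i` holds `10^(MIN_K + i)`.
const MIN_K: i32 = -2;
const MAX_K: i32 = 3;
const POW10: [f64; 6] = [1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3];

/// Checked lookup: returns `None` when `k` is outside the table.
pub fn power(k: i32) -> Option<f64> {
    if k < MIN_K || k > MAX_K {
        return None;
    }
    Some(POW10[(k - MIN_K) as usize])
}

/// Unchecked lookup for hot paths.
///
/// # Safety
/// `k` must satisfy `MIN_K <= k <= MAX_K`.
pub unsafe fn power_unchecked(k: i32) -> f64 {
    // Debug builds still verify the documented range.
    debug_assert!((MIN_K..=MAX_K).contains(&k));
    // SAFETY: the caller guarantees `k` is in range, so the index is
    // within `0..POW10.len()`.
    unsafe { *POW10.get_unchecked((k - MIN_K) as usize) }
}
```

With the bounds named as constants, the safety comment at each call site only has to show that its argument lies in `[MIN_K, MAX_K]`, and debug builds check that claim.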
I was reviewing the lexical-write-integer crate and I discovered a bunch of things that could be improved wrt safety.
The main issue was the one already filed as #95. That's actual UB, this issue is more about the safety comments and how the invariants are upheld. The `safe` mode doesn't disable this code that has actual UB, either.
Besides the issue that already exists, there are a whole bunch of functions (e.g. `write_digits()`) that unsafely index from `table` and are documented with "This is safe as long as the buffer is large enough to hold `T::MAX` digits in radix `N`.", but they do not include the safety requirement on `table`.
Furthermore, some functions (also `write_digits()`) take a `mut index: usize` argument that is used to directly index. These safety requirements are also not mentioned.
Besides safety comments, I found a couple areas where it was hard to review the unsafe code:
A bunch of code relies on the radix being from amongst a valid set (e.g. `get_table()`). May benefit from being an enum; it's hard to tell if this invariant is being upheld.
Other code which is calling the `write_*()` functions (e.g. `unsigned()`) declare they are safe as long as the buffer can hold `FORMATTED_SIZE` elements, but that invariant is not really clear from the functions used, it's kinda hidden away.
Might be cool to have a `ValidBuffer<FORMAT>` wrapper that allows calling these write functions in an encapsulated way.
Anyway, hope this helps. I don't have time to devote to doing this work myself but I figured I'd share my notes in case someone else does.
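A minimal sketch of the two suggestions above, a radix enum and a validated buffer wrapper, might look like the following. Everything here is hypothetical: the `Radix` variants, `max_digits_u32`, and this runtime `ValidBuffer` (simplified from the `ValidBuffer<FORMAT>` idea, and limited to `u32`) are assumptions for illustration and not lexical's actual API.

```rust
/// Hypothetical radix enum: an invalid radix cannot be represented.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Radix {
    Binary = 2,
    Octal = 8,
    Decimal = 10,
    Hexadecimal = 16,
}

impl Radix {
    /// The only place where an arbitrary integer is validated.
    pub fn from_u32(radix: u32) -> Option<Radix> {
        match radix {
            2 => Some(Radix::Binary),
            8 => Some(Radix::Octal),
            10 => Some(Radix::Decimal),
            16 => Some(Radix::Hexadecimal),
            _ => None,
        }
    }

    /// Number of digits needed to format `u32::MAX` in this radix.
    pub const fn max_digits_u32(self) -> usize {
        match self {
            Radix::Binary => 32,
            Radix::Octal => 11,
            Radix::Decimal => 10,
            Radix::Hexadecimal => 8,
        }
    }
}

/// Hypothetical wrapper whose existence proves the buffer is large enough.
pub struct ValidBuffer<'a> {
    bytes: &'a mut [u8],
    radix: Radix,
}

impl<'a> ValidBuffer<'a> {
    /// The size check happens once, here.
    pub fn new(bytes: &'a mut [u8], radix: Radix) -> Option<Self> {
        if bytes.len() >= radix.max_digits_u32() {
            Some(ValidBuffer { bytes, radix })
        } else {
            None
        }
    }

    /// Write `value` right-aligned and return the written digits.
    /// This sketch keeps checked indexing; an optimized writer could rely
    /// on the invariant established in `new` instead.
    pub fn write_u32(self, mut value: u32) -> &'a [u8] {
        const DIGITS: &[u8; 16] = b"0123456789ABCDEF";
        let ValidBuffer { bytes, radix } = self;
        let base = radix as u32;
        let end = radix.max_digits_u32();
        let mut index = end;
        loop {
            index -= 1;
            bytes[index] = DIGITS[(value % base) as usize];
            value /= base;
            if value == 0 {
                break;
            }
        }
        &bytes[index..end]
    }
}
```

The design choice is that both invariants (valid radix, sufficiently large buffer) are checked at construction, so the write functions no longer need to restate them in every safety comment.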