[OTHER] Improve internal safety comments and architecture #100

Manishearth · 2023-07-10T14:17:39Z

I was reviewing the lexical-write-integer crate and I discovered a bunch of things that could be improved wrt safety.

The main issue was the one already filed as #95. That's actual UB, this issue is more about the safety comments and how the invariants are upheld. The safe mode doesn't disable this code that has actual UB, either.

Besides the issue that already exists, there are a whole bunch of functions (e.g. write_digits()) that unsafely index from table are documented with "This is safe as long as the buffer is large enough to hold T::MAX digits in radix N.",but they do not include the safety requirement on table`.

Furthermore, some functions (also write_digits()) take a mut index: usize argument that is used to directly index. These safety requirements are also not mentioned.

Besides safety comments, I found a couple areas where it was hard to review the unsafe code:

A bunch of code relies on the radix being from amongst a valid set (e.g. get_table()). May benefit from being an enum; it's hard to tell if this invariant is being upheld.

Other code which is calling the write_*() functions (e.g. unsigned()) declare they are safe as long as the buffer can hold FORMATTED_SIZE elements, but that invariant is not really clear from the functions used, it's kinda hidden away.

Might be cool to have a ValidBuffer<FORMAT> wrapper that allows calling these write functions in an encapsulated way.

Anyway, hope this helps. I don't have time to devote to doing this work myself but I figured I'd share my notes in case someone else does.

The text was updated successfully, but these errors were encountered:

Alexhuszagh · 2024-09-09T13:49:59Z

I'll be looking comprehensively over this later today.

Closes #104. Part of fixes to addrss #100.

Closes #104 Part of fixes to address #100

Alexhuszagh · 2024-09-10T05:59:18Z

I've made some major progress on this on the safety branch, benchmarking every significant change I've made to avoid any regressions while slowly removing unsafety. I had high-level "unchecked" functions which I removed to allow more localized safety invariants.

Initial Littany of Safety Enhancements. This removes a lot of unsafe code, documents the cases where removing it would have significant performance impacts but the safety invariants can be easily guaranteed, and likewise makes other enhancements to remove potentially unsafe behavior. This also redoes some architecture to make more code wrapped into safe variants, where rather than say if x.get(0) == b'0'. then do an unchecked index, instead it just has a peek and step in a single function, where applicable.. This also simplifies the code base a lot. Part of many commits to address #100.

Alexhuszagh · 2024-09-12T20:57:53Z

I'll be also adding in documentation at the top of each module that uses unsafe code what the general purpose guarantees are and then better # SAFETY invariant documentation internally. There's a few cases where I wasn't able to remove unsafety (this is performance related, often with a 20%+ hit in performance) despite relative easy of proving that the code was safe so I'm also commenting on why the unsafety remains.

Alexhuszagh · 2024-09-14T22:14:50Z

This has now been closed: ~90% of all unsafety has been removed and all that remains is trivial to verify with major documentation on the vulnerable parts. The crate documentation also now clearly describes the possible sources of unsoundness, why the code is sound, and any other requirements.

Once #135 I'll remove this and publish this to crates.io.

Finally, a security policy has been created.

This introduces numerous different layers of security enhancements: 1. Removal of most unsafe code (according to count-unsafe, the code went from 160 unsafe functions and 3088 unsafe exprs to 8 unsafe functions and 1248 unsafe exprs, However, all the remaining unsafe code has much clearly documented safety guarantees and is isolated into safe abstractions. 2. Clear documentation of the locations where unsafe code is used and at the crate-level documentation so it's clearly visible. A security policy has also been added, with stricter requirements for soundness with PRs. Closes #100.

Manishearth · 2024-09-15T07:33:13Z

@Alexhuszagh looking briefly through the remaining unsafe, I still feel like a lot of it can still be reduced. E.g. there are safe parser patterns that avoid peek/unchecked semantics. There ought to be ways to encapsulate the unsafety better.

1000+ unsafe blocks, even if documented, is still a lot to review, so I don't suspect anyone will perform a new audit any time soon for this.

(There are basically two ways you can have 1000 unsafe blocks: either they're all very different, in which case that's a huge task to review, or there are a lot of similarities, in which case there can probably be some safe abstractions built)

Alexhuszagh · 2024-09-15T09:37:13Z

(There are basically two ways you can have 1000 unsafe blocks: either they're all very different, in which case that's a huge task to review, or there are a lot of similarities, in which case there can probably be some safe abstractions built)

I tried this a few ways, but one of them was using read_if and encapsulating everything there, which unfortunately didn't have close to the performance I was looking for.

    /// Peek the next value and consume it if the read value matches the
    /// expected one.
    #[inline(always)]
    fn read_if<Pred: FnOnce(&u8) -> bool>(&mut self, pred: Pred) -> Option<Self::Item> {
        if let Some(peeked) = self.peek() {
            if pred(peeked) {
                // SAFETY: the slice cannot be empty because we peeked a value.
                unsafe { self.step_unchecked() };
                Some(peeked)
            } else {
                None
            }
        } else {
            None
        }
    }

However, a lot of the remaining unsafety would be in the implementation of the stack vector and the iterator API:

rust-lexical/lexical-util/src/iterator.rs

Line 30 in aeab322

pub unsafe trait BytesIter<'a>: Iterator<Item = &'a u8> + Buffer<'a> {

https://github.com/Alexhuszagh/rust-lexical/blob/aeab32205c18c4569402f83817eeb9fb983ee64b/lexical-parse-float/src/bigint.rs

Both of which are almost entirely encapsulated. We do have a fair bit that could be removed from the stack vector in Bigint since a lot of methods for that code could be removed.

I'll see if there are locations I can still fit read_if in where it doesn't have noticeable on performance.

EDIT: Ok this really only affects match-cases where I do a read_if for 2 cases to handle the steps. Which makes sense. Benchmarking the rest now.

These used to have safety considerations although none do anymore, but they require validation or there could be runtime panics. - Addresses #100 - Addresses #138

Alexhuszagh · 2024-09-15T11:05:35Z

A large percentage of it remaining was the Options API, which originally had only 2 functions in it that were actually unsafe (set_nan_string, set_inf_string), but the rest were reserved as unsafe since any use without validating our format specifications with is_valid was breaking our guarantees. However, none of those are unsafe and after removing them, we're at 315 total unsafe expressions in lexical-core, of which the majority are in 1 or 2 encapsulated classes (the 417 includes the test suite, which is used to test the validity of any unsafe logic for soundness, and therefore is a misleading number).

I've created #138 to track this.

Manishearth · 2024-09-15T17:48:44Z

Thanks. Yeah I find it incredibly unlikely that this amount of unsafe is necessary for performance, a lot of these encapsulations can be zero cost. 315 is still a lot but more manageable.

Alexhuszagh · 2024-09-15T23:17:04Z

We're down to less than 250 unsafe expressions and almost all of it is within 2 abstractions: Iter and Bigint. Since these abstractions are fully encapsulated except for a few invocations where the guarantees are clearly written, there is practically no non-encapsulated unsafety left, so I feel comfortable reclosing this.

We have <60 expressions of unsafe code outside of those 2 abstractions. A lot of the remaining unsafety is code like this:

    // Compute xi and zi.
    // SAFETY: safe, since value must be finite and therefore in the correct range.
    // `-324 <= exponent <= 308`, so `x * log10(2) - log10(4 / 3)` must be in
    // `-98 <= x <= 93`, so the final value must be in [-93, 98] (for f64). We have
    // precomputed powers for [-292, 326] for f64 (same logic applies for f32) so
    // this is **ALWAYS** safe.
    let pow5 = unsafe { F::dragonbox_power(-minus_k) };

Overall, this introduces 8 unsafe expressions, which out of almost the entire remainder is in the integer writer, which is a well-known algorithm and has a major performance hit if it uses checked indexing.

I think this is acceptable.

Manishearth assigned Alexhuszagh Jul 10, 2023

Alexhuszagh added enhancement New feature or request high priority High priority A-sec Issues with potential security implications. labels Sep 9, 2024

Alexhuszagh added this to the 1.0 milestone Sep 9, 2024

Alexhuszagh added a commit that referenced this issue Sep 9, 2024

Ensure BytesIter is an unsafe trait.

113252e

Closes #104. Part of fixes to addrss #100.

Alexhuszagh mentioned this issue Sep 9, 2024

Ensure BytesIter is an unsafe trait. #115

Merged

Alexhuszagh added a commit that referenced this issue Sep 9, 2024

Ensure BytesIter is an unsafe trait.

47fb4b8

Closes #104 Part of fixes to address #100

Alexhuszagh added a commit that referenced this issue Sep 9, 2024

Ensure BytesIter is an unsafe trait.

b07e866

Closes #104 Part of fixes to address #100

Alexhuszagh mentioned this issue Sep 11, 2024

Initial Littany of Safety Enhancements #122

Merged

This was referenced Sep 13, 2024

Document Internal Safety and Remove Most Unsafety #127

Merged

Refactor for Increased Safety without Performance Impacts #135

Merged

Alexhuszagh closed this as completed in #135 Sep 14, 2024

Alexhuszagh reopened this Sep 15, 2024

Alexhuszagh added a commit that referenced this issue Sep 15, 2024

Remove unsafe annotations for the options API.

dbe6c0e

These used to have safety considerations although none do anymore, but they require validation or there could be runtime panics. - Addresses #100 - Addresses #138

Alexhuszagh added a commit that referenced this issue Sep 15, 2024

Remove unsafe annotations for the options API.

c032641

These used to have safety considerations although none do anymore, but they require validation or there could be runtime panics. - Addresses #100 - Addresses #138

Alexhuszagh added a commit that referenced this issue Sep 15, 2024

Remove unsafe annotations for the options API.

b6a84b0

These used to have safety considerations although none do anymore, but they require validation or there could be runtime panics. - Addresses #100 - Addresses #138

Alexhuszagh mentioned this issue Sep 15, 2024

Remove unsafe annotations for the options API. #139

Merged

Alexhuszagh modified the milestones: 1.0, 1.0.1 Sep 15, 2024

Alexhuszagh mentioned this issue Sep 15, 2024

Update documentation and make method names more expressive. #142

Merged

Alexhuszagh closed this as completed in #142 Sep 15, 2024

Alexhuszagh closed this as completed in e3f01cb Sep 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[OTHER] Improve internal safety comments and architecture #100

[OTHER] Improve internal safety comments and architecture #100

Manishearth commented Jul 10, 2023

Alexhuszagh commented Sep 9, 2024

Alexhuszagh commented Sep 10, 2024

Alexhuszagh commented Sep 12, 2024

Alexhuszagh commented Sep 14, 2024 •

edited

Loading

Manishearth commented Sep 15, 2024

Alexhuszagh commented Sep 15, 2024 •

edited

Loading

Alexhuszagh commented Sep 15, 2024 •

edited

Loading

Manishearth commented Sep 15, 2024

Alexhuszagh commented Sep 15, 2024

[OTHER] Improve internal safety comments and architecture #100

[OTHER] Improve internal safety comments and architecture #100

Comments

Manishearth commented Jul 10, 2023

Alexhuszagh commented Sep 9, 2024

Alexhuszagh commented Sep 10, 2024

Alexhuszagh commented Sep 12, 2024

Alexhuszagh commented Sep 14, 2024 • edited Loading

Manishearth commented Sep 15, 2024

Alexhuszagh commented Sep 15, 2024 • edited Loading

Alexhuszagh commented Sep 15, 2024 • edited Loading

Manishearth commented Sep 15, 2024

Alexhuszagh commented Sep 15, 2024

Alexhuszagh commented Sep 14, 2024 •

edited

Loading

Alexhuszagh commented Sep 15, 2024 •

edited

Loading

Alexhuszagh commented Sep 15, 2024 •

edited

Loading