`OsStrExt3::from_bytes` should be unsafe #1594

KillTheMule · 2019-11-03T15:25:51Z

I'm opening this as a result from rust-secure-code/safety-dance#49. The mentioned method uses unsafe, but is not marked unsafe itself. I think that is not right, and while I've found no safety violations here, there's the possibility of a future bug in safe code if the necessary properties aren't upheld. @Shnatsel remarked

I think this might cause some actual memory unsafety down the line, potentially turning into a privilege escalation vulnerability.

and while I don't understand the details of this claim, it's probably worth looking at.

There's some prior art at #1054 and #1058, but the issue of from_bytes was left without action. @H2CO3 said

However, the implementation using mem::transmute() is identical to that of the stdlib, so this shouldn't be too bad, after all.

I'm uncertain about this, as far as I can see, this implementation is only valid on systems where OsStr is an arbitrary byte sequence, e.g. linux, but not on Windows, where OsStr is backed by a struct called Wtf8, where the corresponding method is marked unsafe indeed.

My takeaway is that this seems like an error we should fix. There are quite some usages which are basically trivial to fix by introducing a from_str method on OsStr, like here. There are other usages though which seem harder to fix since slicing an OsStr at arbitray indices doe not work on windows. As far as I can see those usages are bugs and will turn into problems when called with the "right" arguments. They are all in osstringext.rs, so maybe they can be fixed without any change outside that file.

I'm open to sending a PR, but I'd need some guidance on what seems acceptable to you. Also, I'm on linux, so I probably can't run proper benchmarks to see if the changes have a bad effect (note: That's just for osstringext.rs, a potential from_str method should be doable without any performance loss).

The text was updated successfully, but these errors were encountered:

kbknapp · 2019-11-03T19:37:06Z

Thank you for the detailed report!

Without having looking into each use, I think this stems from workarounds that no longer need to exist. I.e. much of what clap needs is just sequences of bytes and not UTF-8 encoded strings. Originally this meant shoehorning OsStr into places (which ultimately led to the issues you've pointed out). However, I think switching to use something like bstr would potentially alleviate some of these issues.

I'll start looking deeper and see if I can come up with a list, including the locations you mentioned, that we should look into fix/replacing.

As far as Windows is concerned, I think this will require some sort of cfg-able abstraction layer over the underlying bstr or whatever we end up using. Some of the coming internal changes I plan on landing over the coming months may make this much easier. Although not really related to this issue, I'd like to break out the common structs clap uses into components that handle their own logic, which will allow us to push much of the related code into smaller components instead of using god-objects. Since this is just internal representation, it shouldn't affect the public API at all.

KillTheMule · 2019-11-04T07:56:10Z

I'll happily provide you a list later on. Would you still be interested in a PR that cleans that up a bit short-term?

Theres one thing I want to point out which I don't fully understand, maybe you can shed some light (and maybe it's a point of note anyways): How is this safe? You are making arg out of args_os by lossy conversion, then you find a certain char to split it at, and then you use the byte index of that char to finally split arg. Is there some property of the conversion that makes this work? If arg is wt8, but not utf8 at the start, this all seems to go haywire.

That said, because of this everything will panic in the non-utf8 case on windows anyways, so maybe we should embrace this for now?

Shnatsel · 2019-11-06T15:20:42Z

We have quite a few levels of abstraction here, so I'll try to go through each one and explain why I think this issue might be a problem in practice.

clap/src/util/osstringext.rs

Lines 24 to 28 in 2a50c2e

    
           impl OsStrExt3 for OsStr { 
        
               fn from_bytes(b: &[u8]) -> &Self { 
        
                   use std::mem; 
        
                   unsafe { mem::transmute(b) } 
        
               }

allows creating an OsStr from an arbitrary byte sequence.

On Windows OsStr is actually backed by a WTF-8 string. Multiple places in WTF-8 implementation assume that WTF-8 is well-formed UTF-8 except for possibility of encountering surrogates, and call str::from_utf8_unchecked() on the byte slices between the surrogates (example)
Many functions implemented on &str rely on &str only ever containing valid UTF-8, but we have just constructed an &str via str::from_utf8_unchecked() without validating that they actually are valid UTF-8 (other than removing unpaired surrogates). From here on in it should be possible to trigger more or less arbitrary memory out-of-bounds access through invoking internally unsafe functions relying on &str validity invariant being upheld.

The above holds if truly arbitrary byte sequences are allowed to enter OsStrExt3::from_bytes. Windows might or might not restrict the incoming byte sequences sufficiently to mitigate this. Even if it's not exploitable now, it is a very fragile primitive and needs to be fixed.

Shnatsel · 2019-11-06T22:42:41Z

Upon further discussion, since this applies to &str and not String, the worst we can see from this is information disclosure, not arbitrary code execution, so this is nowhere near as critical as I initially thought.

dylni · 2020-04-18T23:50:39Z

Hi! I created OsStr Bytes a while ago, but today I removed all unsafe code. This issue came up on that thread, so I was wondering if you would be open to a PR that uses this crate to perform the conversions.

If so, there are some design questions I have about the implementation:

Conversions can no longer be zero-cost, so OsStrExt2 methods would be more expensive. I'm thinking of solving this by creating struct similar to OsStr/OsString inside clap that would only need to perform the conversion once. Does this seem reasonable, or do you prefer another way to approach this?
If I understand this line correctly, I think the + 1 should be + c.len_utf8(). The split_at call later uses this index, so I want to be sure my belief is correct that it's always valid to split there.
From what I've seen, I believe all methods of OsStrExt2 are being used correctly to work with any UTF-8-compatible encoding. However, there are no assertions on these methods to ensure that remains true. If I added these for the new struct, should they be assert! or debug_assert!? I would prefer the former, but I'm not sure how concerned you are about the performance of these methods. Of course, I would use debug_assert! for anything too expensive.

Would this be something you'd want to see added?

Dylan-DPC-zz · 2020-04-18T23:54:11Z

@dylni ya sure, you can go ahead and make a PR :) Safety comes first

dylni · 2020-04-19T00:02:40Z

Great :) I'm a bit busy recently, but I'll send a PR when I can find some time. Plus, that will give this time to gather opinions on implementation.

Dylan-DPC-zz · 2020-04-19T00:04:16Z

That works for us :) Thanks

CreepySkeleton · 2020-04-19T04:09:06Z

@dylni We are totally open to ~~push~~ entrust this job to someone else! 😃 You and your PR are very welcome here!

That would be a bit sad, but I'm seconding Dylan here: correctness is more important than performance, especially in a command line parsing library. Send your PR and, as long as the regression is even a bit reasonable (i.e it's not something crazy as + 10 seconds), we will accept it .

Congratulations! You've found a bug, but this bug needs the stars to be aligned in a very specific way, so virtually nobody triggers it in practice. Must be fixed anyway, of course.

Conditions:

The app has a short option with a non-ASCII name (it's byte length must be > 1). This is why nobody noticed it so far: those names are always ASCII.
There must be no space between the option and the value -фBOOM

Reproducer: (Playground)

use clap::{App, Arg};

fn main() {
    let opts = App::new("app")
        // 'ф' is a cyrillic character, its byte lenght is 2 in utf-8 
        // a Chinese hieroglyph would more fancy, but I don't have hieroglyphs
        // on my keyboard :)
        .arg(Arg::with_name("opt").short("ф").takes_value(true))
        // no space between the option and the value so the split will trigger
        .get_matches_from(&["test", "-фNOOOOOO"]);
    println!("{:?}", opts.value_of("opt").unwrap());
}

Clap 2.x righteously panics, master prints gibberish.

Changing this line to let i = p[0].as_bytes().len() + c.len_utf8(); will fix it.

Let's just go with assert!. I believe CPU's branch predictor will discard those branches as trivially non-taken ones.

dylni · 2020-04-19T16:04:34Z

@CreepySkeleton Thanks for the responses and interest! I don't expect the performance hit to be large, but I am interested to see how measurable it is. Some basic testing has shown the methods to be as fast as the standard library's implementation.

kbknapp added D: intermediate labels Nov 3, 2019

CreepySkeleton added this to the 3.0 milestone Feb 1, 2020

CreepySkeleton mentioned this issue Feb 1, 2020

OsStrExt3 transmutes from an &[u8] to a OsStr #1524

Closed

dylni mentioned this issue Apr 22, 2020

Remove the last unsafe block #1852

Merged

bors bot closed this as completed in #1852 Apr 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`OsStrExt3::from_bytes` should be unsafe #1594

`OsStrExt3::from_bytes` should be unsafe #1594

KillTheMule commented Nov 3, 2019 •

edited

Loading

kbknapp commented Nov 3, 2019

KillTheMule commented Nov 4, 2019

Shnatsel commented Nov 6, 2019

Shnatsel commented Nov 6, 2019

dylni commented Apr 18, 2020 •

edited

Loading

Dylan-DPC-zz commented Apr 18, 2020

dylni commented Apr 19, 2020 •

edited

Loading

Dylan-DPC-zz commented Apr 19, 2020

CreepySkeleton commented Apr 19, 2020

dylni commented Apr 19, 2020

OsStrExt3::from_bytes should be unsafe #1594

OsStrExt3::from_bytes should be unsafe #1594

Comments

KillTheMule commented Nov 3, 2019 • edited Loading

kbknapp commented Nov 3, 2019

KillTheMule commented Nov 4, 2019

Shnatsel commented Nov 6, 2019

Shnatsel commented Nov 6, 2019

dylni commented Apr 18, 2020 • edited Loading

Dylan-DPC-zz commented Apr 18, 2020

dylni commented Apr 19, 2020 • edited Loading

Dylan-DPC-zz commented Apr 19, 2020

CreepySkeleton commented Apr 19, 2020

dylni commented Apr 19, 2020

`OsStrExt3::from_bytes` should be unsafe #1594

`OsStrExt3::from_bytes` should be unsafe #1594

KillTheMule commented Nov 3, 2019 •

edited

Loading

dylni commented Apr 18, 2020 •

edited

Loading

dylni commented Apr 19, 2020 •

edited

Loading