Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading a CString safely without overhead from Read #59229

Closed
DevQps opened this issue Mar 16, 2019 · 12 comments
Closed

Reading a CString safely without overhead from Read #59229

DevQps opened this issue Mar 16, 2019 · 12 comments
Labels
A-FFI Area: Foreign function interface (FFI) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@DevQps
Copy link
Contributor

DevQps commented Mar 16, 2019

Hello everyone,

When I was inspecting libflate for unsafe code I found this piece of code:

fn read_cstring<R>(mut reader: R) -> io::Result<CString>
where
    R: io::Read,
{
    let mut buf = Vec::new();
    loop {
        let b = reader.read_u8()?;
        if b == 0 {
            return Ok(unsafe { CString::from_vec_unchecked(buf) });
        }
        buf.push(b);
    }
}

The Problem

If I am correct and didn't miss anything this function is completely safe. However because there is (for as far as I know) no functionality that can safely read a CString without performance overhead, the author probably felt forced to implement it himself.

I think reading CStrings from an object that implements Read is a pretty common operation when handling binary files, so it might be good to provide functionality in the standard library for doing so without sacrificing performance.

Currently two solutions have been presented.

Solution 1: Add a CString::from_reader method

One solution would be to add a CString::from_reader method as follows:

fn CString::from_reader(mut reader: impl Read) -> Result<CString, std::io::Error>
{
    let mut buffer = Vec::new();
    let mut character: u8 = 0;
    
    loop {
        let slice = std::slice::from_mut(&mut character);
        reader.read_exact(slice)?;
    
        // Check if a null character has been found, if so return the Vec as CString.
        if character == 0 {
            return Ok(unsafe { CString::from_vec_unchecked(buffer) });
        }
            
        // Push a new non-null character.
        buffer.push(character);
    }
}

Pro's and Con's:

  • Pro: CStrings can be read using a simple one liner if the source being read implements Read. &[u8] also implements Read so I think this is already quite flexible.
  • Con: It only works when reading from objects that impl Read. I am not sure if there are any scenario's on which this would not be sufficient?

-> Playground example

Solution 2: Add way to convert from Vec<NonZeroU8> to CString

Credits to @alex

Another solution would be to add a conversion method (by for example using the From trait) for Vec<NonZeroU8> to CString. Since we can be sure that no zero characters are included in the Vector we could perform a cheap conversion. I took an attempt to implementing From for CString but it might be improved.

impl From<Vec<NonZeroU8>> for CString
{
    fn from(vector: Vec<NonZeroU8>) -> Self
    {
        unsafe {
            let vector = std::mem::transmute::<Vec<NonZeroU8>, Vec<u8>>(vector);
            CString::from_vec_unchecked(vector)
        }
    }
}

Pro's and Con's:

  • Pro: Everything that can be converted to a Vec<NonZeroU8> can be safely converted into a CString.
  • Con: People would still have to read bytes from a Read````-able object manually and converting them into a Vec``` in order for this to work.

-> Playground example

I wonder what you all think of this proposal and whether or not it could be improved.

@jonas-schievink jonas-schievink added A-FFI Area: Foreign function interface (FFI) T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Mar 16, 2019
@Mark-Simulacrum
Copy link
Member

The second option seems very unlikely to be helpful; I can't think of any usecases for having a NonZeroU8 Vec beyond this API. Plus, converting to it would mean essentially implementing this same code - and then you could just use the from_vec_unchecked method.

I think the first proposal is quite reasonable though.

@Shnatsel
Copy link
Member

I think allowing to safely convert Vec<NonZeroU8> to CString is a good idea regardless. It's an obviously correct conversion and it would be a shame not to provide a safe interface for it.
I could see it being used for performance reasons in security-critical code where the authors would like to avoid rolling their own unsafe, since part of the appeal of NonZeroU8 is that you can make the check once and avoid any further checks down the road.

@Shnatsel
Copy link
Member

I wonder why read_until() is only implemented on BufRead, but not Read? There is probably a good reason for that - and if so, we should also implement CString::from_reader() only on BufRead. This should still serve most use cases because Cursor implements BufRead, and external I/O without buffering is unbearably slow so you'd wrap it in buffering anyway.

Here's what an implementation for BufRead could look like:

fn from_reader(mut reader: impl BufRead) -> Result<CString, std::io::Error>
{
    let mut buffer = Vec::new();
    reader.read_until(0, &mut buffer)?;
    if buffer.len() > 0 {
        // 0 has been read into the vector, pop it
        buffer.pop(); //TODO: consider in-place transmute
    } else {
        // no bytes have been read, nothing to do
    }
    return Ok(unsafe { CString::from_vec_unchecked(buffer) });
}

@DevQps
Copy link
Contributor Author

DevQps commented Mar 16, 2019

@Shnatsel I completely agree! I think in most cases (depending on how Read::read is implemented) an implementation for BufRead is probably much faster.

I do however wonder: Wouldn't it be better to do it like this?

fn from_reader(mut reader: impl BufRead) -> Result<CString, std::io::Error>
{
    let mut buffer = Vec::new();
    reader.read_until(0, &mut buffer)?;
    if buffer.len() > 0 {
        return Ok(CString { inner: buffer.into_boxed_slice() });
    } else {
        return Ok(unsafe { CString::from_vec_unchecked(buffer) });
    }
}

This way we can avoid doing a buffer.pop and pushing a null character again in Cstring::from_vec_unchecked. But maybe the compiler would be able to optimize your version such that the performance would be the same.

@DevQps
Copy link
Contributor Author

DevQps commented Mar 20, 2019

The implement PR is here: #59314

I could create one for BufRead as well however. Should I call it from_bufread?

DevQps pushed a commit to DevQps/rust that referenced this issue Mar 20, 2019
@DevQps
Copy link
Contributor Author

DevQps commented Apr 9, 2019

I had a discussion on the PR #59314 with @dtolnay and I decided to close the PR since it does not feel elegant enough yet in my opinion. I feel like there must be some smooth trait based solution. I hope to think of something better.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Jul 26, 2019

Using read_until is nice, although the suggested implementation is not sound, in @DevQps case, and may crop a non-null byte in @Shnatsel case; indeed, read_until may read contents without reaching the expected byte (when EOF is reached).

Here is a fixed (and performant) implementation:

/// # Safety
///
///   - `BufRead` implementor mut enforce its contract
unsafe
fn from_reader_unchecked (
    mut reader: impl BufRead,
) -> Result<CString, ::std::io::Error>
{
    let mut buffer = Vec::new();
    reader.read_until(b'\0', &mut buffer)?;
    if let Some(&b'\0') = buffer.last() {
        // # Safety
        //
        //   - no null bytes before `buffer.last()` (thanks to `.read_until()` contract)
        //
        //   - last byte has already been checked to be null
        CString::from_vec_unchecked(buffer)
    } else {
        buffer.reserve_exact(1);
        buffer.push(b'\0');
        // # Safety
        //
        //   - no null bytes before `buffer.last()` (thanks to `.read_until()` contract)
        //
        //   - terminating null byte has been appended
        CString::from_vec_unchecked(buffer)
    }
}

/// # Safety:
///
///   - `read_until` must enforce its contract.
unsafe trait TrustedBufRead : BufRead {}

#[inline]
fn from_reader (
    reader: impl TrustedBufRead,
) -> Result<CString, ::std::io::Error>
{
    unsafe { from_reader_unchecked(reader) }
}

@HeroicKatora
Copy link
Contributor

HeroicKatora commented Sep 3, 2019

@danielhenrymantilla @Shnatsel's implementation was correct to pop the null-byte, as CString::from_vec_unchecked will add it itself. The unsafety that from_vec_unchecked requires only that the Vec contains no null bytes at all, neither interior nor the final one. So it should be instead:

unsafe
fn from_reader_unchecked (
    mut reader: impl BufRead,
) -> Result<CString, ::std::io::Error>
{
    let mut buffer = Vec::new();
    reader.read_until(b'\0', &mut buffer)?;
    if let Some(&b'\0') = buffer.last() {
        buffer.pop()
        // # Safety
        // last was the first null byte encountered, it's been removed.
        CString::from_vec_unchecked(buffer)
    } else {
        // # Safety
        //  no null bytes before `buffer.last()` (thanks to `.read_until()` contract)
        CString::from_vec_unchecked(buffer)
    }
}

@adamreichold
Copy link
Contributor

It is besides the question of safety and performance, but shouldn't from_reader_unchecked rather raised UnexpectedEof if it does not encounter a null-byte at all?

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Feb 15, 2020
…_from_vec_of_nonzerou8, r=KodrAus

Added From<Vec<NonZeroU8>> for CString

Added a `From<Vec<NonZeroU8>>` `impl` for `CString`

# Rationale

  - `CString::from_vec_unchecked` is a subtle function, that makes `unsafe` code harder to audit when the generated `Vec`'s creation is non-trivial. This `impl` allows to write safer `unsafe` code thanks to the very explicit semantics of the `Vec<NonZeroU8>` type.

  - One such situation is when trying to `.read()` a `CString`, see issue rust-lang#59229.

      - this lead to a PR: rust-lang#59314, that was closed for being too specific / narrow (it only targetted being able to `.read()` a `CString`, when this pattern could have been generalized).

     - the issue suggested another route, based on `From<Vec<NonZeroU8>>`, which is indeed a less general and more concise code pattern.

  - quoting @Shnatsel:

      - >  For me the main thing about making this safe is simplifying auditing - people have spent like an hour looking at just this one unsafe block in libflate because it's not clear what exactly is unchecked, so you have to look it up when auditing anyway. This has distracted us from much more serious memory safety issues the library had.
Having this trivial impl in stdlib would turn this into safe code with compiler more or less guaranteeing that it's fine, and save anyone auditing the code a whole lot of time.
@Shnatsel
Copy link
Member

Shnatsel commented Feb 15, 2020

This is addressed by #64069 and should ship in 1.43

@Shnatsel
Copy link
Member

Shnatsel commented Jun 4, 2020

The new trait impl addressing this has shipped in 1.43, so this issue can be closed.

@DevQps DevQps closed this as completed Jun 5, 2020
@necauqua
Copy link

necauqua commented Oct 6, 2024

How does having a From<Vec<NonZeroU8>> impl solve the issue?
It can be useful and it's nice to have it, but it's not a solution.. like at all

You have a stream of bytes and want to interpret the non-nul prefix as a CString, and you don't want to have a non-stdlib unsafe{} block to to this without extra checks and unwraps.
You still have to check and transmute the vector to the non-zero-u8 variant - am I just missing something? Is there a way to get Vec<NonZero<T>> prefix of a Vec<T>?

It's strange that there's no safe CString::from_non_nul_prefix(&[u8]/Vec<u8>) method in the stdlib, or something like what was proposed here, a more generic from_reader method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-FFI Area: Foreign function interface (FFI) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants