Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hazard with ReadBytesExt methods and continuing after errors #208

Open
eric-seppanen opened this issue Aug 2, 2024 · 5 comments · May be fixed by #209
Open

Hazard with ReadBytesExt methods and continuing after errors #208

eric-seppanen opened this issue Aug 2, 2024 · 5 comments · May be fixed by #209

Comments

@eric-seppanen
Copy link

eric-seppanen commented Aug 2, 2024

I don't think this is a bug in byteorder. But I thought it demonstrates an interesting hazard that might be worth documenting for others.

Some byteorder::ReadBytesExt methods rely on std::io::Read::read_exact, which has a warning about the error case:

If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.

... If this function returns an error, it is unspecified how many bytes it has read, but it will never read more than would be necessary to completely fill the buffer.

This means that e.g. byteorder::read_u16 has the same hazard: if it encounters the end of the reader, the reader is now in an unspecified state.

So code like this is wrong:

pub fn parse_packet(packet: &[u8]) -> Vec<u16> {
    let mut result = Vec::new();
    let mut reader = Cursor::new(packet);

    // Read as many u16s as we can
    while let Ok(word) = reader.read_u16::<LittleEndian>() {
        result.push(word);
    }
    // If there was a byte left over, take that too.
    if let Ok(odd_end_val) = reader.read_u8() {
        result.push(odd_end_val.into())
    }
    result
}

I recently stumbled over code that did exactly this, because the standard library changed its (unspecified) behavior. In older versions, the function above behaved as expected (remainder bytes stayed in the reader after the error). In rust 1.80.0, the remainder bytes are consumed. The following unit test demonstrates (comment out the assert for your version):

#[test]
fn test_odd() {
    let input = [1, 2, 3];
    let output = parse_packet(&input);

    // Rust <= 1.79
    assert_eq!(output, [0x0201, 0x0003]);

    // Rust 1.80
    assert_eq!(output, [0x0201]);
}

The byteorder::ReadBytesExt methods are documented to track the read_exact error behavior:

Errors

This method returns the same errors as Read::read_exact.

But maybe more should be done? I'm not sure how often this pattern might exist in the wild. Maybe the docs should be extended to warn people that the input stream can't be relied on after an error is returned?

@BurntSushi
Copy link
Owner

Thanks for the write-up.

I'm fine adding a note to the docs. Patches welcome.

I'm not sure byteorder itself should be trying anything else here. I'm not sure there's anything else to be done. And in general, I consider it bad juju to keep doing reads while ignoring errors anyway.

@eric-seppanen
Copy link
Author

eric-seppanen commented Aug 2, 2024

I agree for io::Read impls in general.

I also have sympathy for anyone that missed this detail, because a Cursor over a slice looks infallible but still has this "unspecified state" constraint. A Cursor along with byteorder::ReadBytesExt is an temptingly easy way to slap together a decoder of a simple binary protocol.

I think that perhaps rust-lang/rust#125123 is the change that modified the example's behavior under 1.80.0. The change occurs between nightly-2024-05-21 and nightly-2024-05-22.

eric-seppanen added a commit to eric-seppanen/byteorder that referenced this issue Aug 2, 2024
std::io::Read::read_exact documentation says that if an error occurs,
the state of the reader is unspecified: it may have consumed some
number of bytes (between zero and the size of the buffer).

This makes it unwise to continue reading after an error, since it's not
possible to know where the read begins.

A caller may be surprised by this issue, for example by calling
read_u16() until it fails, then calling read_u8() to collect a remainder
byte. This is not guaranteed to work.

This was specifically observed to behave one way on rust 1.79.0, and
then a different way in rust 1.80.0 when using std::io::Cursor as the
reader.

Closes BurntSushi#208 (Documents the problem, which is the best we can do.)
@workingjubilee
Copy link
Contributor

workingjubilee commented Aug 3, 2024

If the new behavior of Cursor's implementation seems more inconvenient for actual usage, I am happy to review another PR to adjust such implementation details. Unspecified is unspecified, after all. But if we do so then we should at least internally document our apparent preference.

( It also seems likely to me that Read has been implemented "both ways" by various types in the ecosystem, so this definitely can't be expected to cut in a particular direction for Read in general. )

@eric-seppanen
Copy link
Author

@workingjubilee I don't think Cursor should change.

I ran my test against a std::io::File and the new Cursor behavior matches the File behavior: trying to read consumes some bytes. So it was the old Cursor behavior that was a bit weird, and relying on that behavior was wrong.

@workingjubilee
Copy link
Contributor

Thanks for the additional context! That's useful to know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants