Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make String#[] not read out-of-bounds if string ends in unicode char #5257

Merged
merged 2 commits into from
Nov 22, 2017

Conversation

Papierkorb
Copy link
Contributor

Hello,

While working on fancyline, I noticed that String#[] didn't do proper bound checking,
but relied on Char::Reader for this instead. Char::Reader didn't do bound-checking
either (At all, actually), creating an out-of-bounds memory read.

Sample code

"ß"[1] # => Char(0), expected an IndexError
"ß"[1]? # => Char(0), expected `nil`

# And this does an unchecked memory read:
Char::Reader.new("", pos: 100).current_char
# ^ now always gives `Char(0)`.  Changing it to just raise breaks a ton
# of assumptions made in String.

Both showcased behaviours are fixed by this PR for now.

Before this change, Char::Reader#decode_char_at() wouldn't detect that
an UTF-8 sequence was cut off prematurely.  Instead, it continued on
reading, leading to an out-of-bounds read.

This fix should be regarded as band-aid for Char::Reader.
As Char::Reader signals the end of a string by returning a NUL-char,
we need to check in String#[] if the read could go out of bounds.  This
only triggers if the first byte of the character would be OOB already,
which is fine.  If a later byte in the character sequence is missing,
Char::Reader would notice it.
@straight-shoota
Copy link
Member

straight-shoota commented Nov 8, 2017

Is this behaviour really unexpected? A string is delimited by a zero character: "ß" describes a string containing the characters 'ß' and '\0'. So, the character with index 1 is obviously Char::ZERO.

If you use an index that is actually out of bounds (like "b"[2]) this will raise an IndexError (or return nil). There is a out-of-bounds check, you're just off by one.

To me, everything seems to be just as it is supposed to and I don't think this needs a change.

@Papierkorb
Copy link
Contributor Author

Papierkorb commented Nov 8, 2017

  1. The trailing NUL-byte is an implementation detail.
  2. container[container.size] is out of bounds for any container type, why should String violate this?
  3. "a"[1] is OOB, as expected

This is a bug with unicode handling.

@straight-shoota
Copy link
Member

From your description it was not clear that multibyte characters behave differently than ASCII characters. This should obviously be fixed.

Btw. Char::Reader yields \0 and this is part of a public API.

@Papierkorb
Copy link
Contributor Author

Bump

I'd love to see this being merged, at least so that 0.24 will benefit. It can have security implications. This fix also fixes real-world code on my end.

@RX14 RX14 added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:crypto topic:stdlib labels Nov 22, 2017
@RX14 RX14 merged commit f6a1fa1 into crystal-lang:master Nov 22, 2017
@RX14 RX14 added this to the Next milestone Nov 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib:crypto topic:stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants