Make String#[] not read out-of-bounds if string ends in unicode char #5257

Papierkorb · 2017-11-07T22:21:36Z

Hello,

While working on fancyline, I noticed that String#[] didn't do proper bounds checking,
but relied on Char::Reader for this instead. Char::Reader didn't do any bounds checking
either, which resulted in an out-of-bounds memory read.

Sample code

"ß"[1] # => Char(0), expected an IndexError
"ß"[1]? # => Char(0), expected `nil`

# And this does an unchecked memory read:
Char::Reader.new("", pos: 100).current_char
# ^ now always gives `Char(0)`.  Changing it to just raise breaks a ton
# of assumptions made in String.

Both showcased behaviours are fixed by this PR for now.

Before this change, Char::Reader#decode_char_at() wouldn't detect that a UTF-8 sequence was cut off prematurely. Instead, it kept reading, leading to an out-of-bounds read. This fix should be regarded as a band-aid for Char::Reader.
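To illustrate the truncation case, here is a minimal sketch. It is not the actual Char::Reader code; the helper names `expected_length` and `truncated?` are hypothetical. It only compares the sequence length announced by the lead byte against the number of bytes that remain in the buffer.

```crystal
# Hypothetical helper: how many bytes a UTF-8 sequence should occupy,
# judged from its first byte. Returns 0 for a continuation byte or an
# invalid lead byte.
def expected_length(first : UInt8) : Int32
  if first < 0x80
    1
  elsif (first & 0xE0) == 0xC0
    2
  elsif (first & 0xF0) == 0xE0
    3
  elsif (first & 0xF8) == 0xF0
    4
  else
    0
  end
end

# Hypothetical helper: true if the sequence starting at `pos` would run
# past the end of `bytes`, or if `pos` itself is already out of bounds.
def truncated?(bytes : Bytes, pos : Int32) : Bool
  return true if pos < 0 || pos >= bytes.size
  pos + expected_length(bytes[pos]) > bytes.size
end

bytes = "ß".to_slice       # two bytes: 0xC3 0x9F
truncated?(bytes, 0)       # => false
truncated?(bytes[0, 1], 0) # => true, the second byte is missing
```

A decoder that runs a check like this before touching the follow-up bytes has no reason to read past the buffer.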

As Char::Reader signals the end of a string by returning a NUL-char, we need to check in String#[] if the read could go out of bounds. This only triggers if the first byte of the character would be OOB already, which is fine. If a later byte in the character sequence is missing, Char::Reader would notice it.
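The check described above can be sketched roughly as follows. This is an assumption-laden illustration, not the patch itself: `checked_char_at?` is a hypothetical wrapper, and it relies on `String#char_index_to_byte_index` returning the string's bytesize when given the character count and `nil` for anything further out.

```crystal
# Hypothetical wrapper: resolve the character index to a byte index, and
# only hand it to Char::Reader if the first byte lies inside the string.
def checked_char_at?(str : String, index : Int32) : Char?
  byte_index = str.char_index_to_byte_index(index)
  if byte_index && byte_index < str.bytesize
    Char::Reader.new(str, pos: byte_index).current_char
  else
    nil
  end
end

checked_char_at?("ß", 0) # => 'ß'
checked_char_at?("ß", 1) # => nil, the first byte would already be OOB
checked_char_at?("a", 1) # => nil
```

The point of the design is that the NUL char never has to double as an "out of bounds" signal for String#[]; the index is rejected before any read happens.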

straight-shoota · 2017-11-08T19:59:19Z

Is this behaviour really unexpected? A string is delimited by a zero character: "ß" describes a string containing the characters 'ß' and '\0'. So, the character with index 1 is obviously Char::ZERO.

If you use an index that is actually out of bounds (like "b"[2]), this will raise an IndexError (or return nil). There is an out-of-bounds check, you're just off by one.

To me, everything seems to be just as it is supposed to and I don't think this needs a change.

Papierkorb · 2017-11-08T20:01:10Z

The trailing NUL-byte is an implementation detail.
container[container.size] is out of bounds for any container type, why should String violate this?
"a"[1] is OOB, as expected

This is a bug with unicode handling.

straight-shoota · 2017-11-08T20:09:52Z

From your description it was not clear that multibyte characters behave differently than ASCII characters. This should obviously be fixed.

Btw. Char::Reader yields \0 and this is part of a public API.

Papierkorb · 2017-11-22T11:58:04Z

Bump

I'd love to see this being merged, at least so that 0.24 will benefit. It can have security implications. This fix also fixes real-world code on my end.

Papierkorb added 2 commits November 7, 2017 23:15

RX14 approved these changes Nov 8, 2017

ysbaddaden approved these changes Nov 22, 2017

RX14 added the kind:bug, topic:stdlib:crypto, and topic:stdlib labels Nov 22, 2017

RX14 merged commit f6a1fa1 into crystal-lang:master Nov 22, 2017

RX14 added this to the Next milestone Nov 22, 2017
