chars/bytes confusion in the error emitter #44080

est31 · 2017-08-25T10:17:11Z

src/librustc_errors/snippet.rs has a big comment saying that the column info is provided in characters, not in bytes. However, the error emitter doesn't care about that at all and uses these columns like byte offsets all over the place. This leads to bugs like #44023 and #44078.

As an example, look at how span printing varies with the characters used:

Correct case:

12 |       "B   "";
   |  ___________^

Now add an emoji character:

12 |       "😊   "";
   |  ___________^

Note how it's off by one char now. This can stack up:

12 |       "😊😊😊😊   "";
   |  ______________^

If I didn't use any spaces at all, I'd run into #44078.
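To make the mismatch concrete, here is a minimal sketch (not the emitter's actual code) that converts a column counted in chars into a byte offset for the two example lines above. The helper name `char_col_to_byte_offset` is made up for illustration. The last two checks show that a char column reused as a byte offset can land inside a multi-byte character, where slicing a `&str` would panic.

```rust
/// Hypothetical helper: convert a column counted in chars into a byte
/// offset into `line`. Columns at or past the end map to `line.len()`.
fn char_col_to_byte_offset(line: &str, char_col: usize) -> usize {
    line.char_indices()
        .nth(char_col)
        .map(|(byte_idx, _)| byte_idx)
        .unwrap_or(line.len())
}

fn main() {
    let ascii = "\"B   \"\";";
    let emoji = "\"😊   \"\";";

    // Column 6 (in chars) is the second closing quote in both lines.
    for line in [ascii, emoji] {
        let byte = char_col_to_byte_offset(line, 6);
        println!("{:?}: char column 6 is byte offset {}", line, byte);
    }

    // In the ASCII line chars and bytes coincide; in the emoji line they
    // do not, because the emoji occupies 4 bytes in UTF-8.
    assert_eq!(char_col_to_byte_offset(ascii, 6), 6);
    assert_eq!(char_col_to_byte_offset(emoji, 6), 9);

    // Treating char column 2 as a byte offset points into the middle of
    // the emoji: indexing `&emoji[..2]` would panic, `get` returns None.
    assert!(!emoji.is_char_boundary(2));
    assert!(emoji.get(..2).is_none());
}
```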

Now this can be fixed by going through the emitter code and looking for all places where the pos is used as a byte position. A more proper fix is to stop trusting that people read comments and to encode this in the type system. There is already a mechanism for that inside the compiler: it's libsyntax_pos::CharPos! Just convert the types of the start_col and end_col members of the MultilineAnnotation and Annotation structs to CharPos, or maybe to BytePos if that's preferred.
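A rough sketch of what the type-system approach could look like. The `CharPos`, `BytePos`, and `Annotation` types below are simplified local stand-ins defined just for this example, not the real definitions from libsyntax_pos or librustc_errors, and `char_to_byte` and `annotated_text` are hypothetical helpers. The point is only that once the columns are `CharPos`, using them to slice a string requires an explicit conversion, and mixing the two up becomes a compile error instead of a wrong caret or a panic.

```rust
// Simplified stand-ins for illustration only; the compiler's own types differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct CharPos(usize);

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct BytePos(usize);

// Columns are explicitly typed as char positions.
struct Annotation {
    start_col: CharPos,
    end_col: CharPos,
}

/// Hypothetical conversion: the only way to get from a char column
/// to a byte offset into `line`.
fn char_to_byte(line: &str, pos: CharPos) -> BytePos {
    let idx = line
        .char_indices()
        .nth(pos.0)
        .map(|(i, _)| i)
        .unwrap_or(line.len());
    BytePos(idx)
}

/// Returns the text covered by the annotation. Slicing needs byte
/// offsets, so the conversion cannot be forgotten: `&line[ann.start_col..]`
/// does not type-check.
fn annotated_text<'a>(line: &'a str, ann: &Annotation) -> &'a str {
    let BytePos(start) = char_to_byte(line, ann.start_col);
    let BytePos(end) = char_to_byte(line, ann.end_col);
    &line[start..end]
}

fn main() {
    let line = "let s = \"😊😊\";";
    // Chars 8..12 cover the string literal including both quotes.
    let ann = Annotation { start_col: CharPos(8), end_col: CharPos(12) };
    println!("{}", annotated_text(line, &ann));
}
```

Whether the fields end up as CharPos or BytePos matters less than having the unit in the type, so that every use site has to state which one it expects.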


euclio · 2017-08-25T14:45:57Z

cc #8706

Fix a byte/char confusion issue in the error emitter

Fixes #44078. Fixes #44023. The start_col member is given in chars, while the code previously assumed it was given in bytes. The more basic issue #44080 doesn't get fixed.

est31 mentioned this issue Aug 25, 2017

Compiler panic when string literal contains emoji followed by a quote #44078

Closed

est31 mentioned this issue Aug 25, 2017

Fix a byte/char confusion issue in the error emitter #44081

Merged

shepmaster added the A-diagnostics, C-bug, and T-compiler labels Aug 25, 2017

zackmdavis mentioned this issue Oct 11, 2017

Error span is in incorrect place due to Unicode fullwidth characters #45211

Closed

estebank closed this as completed Feb 10, 2018
