chars/bytes confusion in the error emitter #44080
Labels: A-diagnostics, C-bug, T-compiler
`src/librustc_errors/snippet.rs` has a big comment saying that the column info is provided in characters, not in bytes. However, the error emitter doesn't care about that at all and uses these like byte offsets all over the place. This leads to bugs like #44023 and #44078. As an example, look how span printing varies with varying characters used:
Correct case:
Now add an emoji character:
Note how it's off by one char now. This can stack up:
If I didn't use any spaces at all, I'd run into #44078.
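To make the mismatch concrete, here is a minimal sketch of how a character column and a byte offset diverge as soon as a line contains a multi-byte character. The helper names `char_col_of_byte` and `byte_of_char_col` are made up for illustration and are not part of the emitter:

```rust
/// Character column of a byte offset within one source line.
/// Returns `None` if the offset is past the end of the line or falls
/// inside a multi-byte character.
fn char_col_of_byte(line: &str, byte_pos: usize) -> Option<usize> {
    if !line.is_char_boundary(byte_pos) {
        return None;
    }
    Some(line[..byte_pos].chars().count())
}

/// Byte offset of a character column within one source line.
/// The column one past the last character maps to `line.len()`.
fn byte_of_char_col(line: &str, col: usize) -> Option<usize> {
    line.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(line.len()))
        .nth(col)
}
```

For the line `/* 😀 */ x`, the `x` sits at byte offset 11 but character column 8, because the emoji takes four bytes in UTF-8. Any code that treats one of these numbers as the other will point at the wrong place, or index into the middle of a character.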
Now this can be fixed by going through the emitter code and looking for all places where the pos is used in a byte position fashion. A much more proper fix instead is to stop trusting that people read comments and encode this via the type system. There is already a mechanism for that inside the compiler, it's `libsyntax_pos::CharPos`! Just convert the types of the `start_col`, `end_col` members of the `MultilineAnnotation` and `Annotation` structs to `CharPos`, or maybe to `BytePos` if that's preferred.
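A rough sketch of what the type-system approach could look like. The `CharPos` and `BytePos` types here are local stand-ins, not the real `libsyntax_pos` types, the `Annotation` struct is heavily simplified (the real one has more fields), and `to_byte_pos` and `annotated_text` are hypothetical helpers:

```rust
/// Stand-in for a character-based position (not the real `libsyntax_pos` type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct CharPos(usize);

/// Stand-in for a byte-based position (not the real `libsyntax_pos` type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct BytePos(usize);

/// Simplified annotation: the columns carry their unit in the type.
struct Annotation {
    start_col: CharPos,
    end_col: CharPos,
}

/// The only way to get from a character column to a byte offset is an
/// explicit conversion against the line's text.
fn to_byte_pos(line: &str, col: CharPos) -> Option<BytePos> {
    line.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(line.len()))
        .nth(col.0)
        .map(BytePos)
}

/// Text covered by an annotation, or `None` if the columns are out of range.
fn annotated_text<'a>(line: &'a str, ann: &Annotation) -> Option<&'a str> {
    let start = to_byte_pos(line, ann.start_col)?;
    let end = to_byte_pos(line, ann.end_col)?;
    line.get(start.0..end.0)
}
```

With this shape, slicing a line directly with `ann.start_col` no longer compiles, since a `CharPos` is not a `usize`. Every place that needs a byte offset has to go through the conversion, which is where the emitter currently goes wrong silently.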