Issue with multibyte chars in source_text() computation #410

bram209 · 2023-10-08T10:21:50Z

`source_text()` treats `lo` as a byte index, while it is actually a character index:

src/fallback.rs, line 367 in 7f5533d:

let trunc_lo = &self.source_text[lo..];

I expect this test to pass, but it does not:

#[cfg(span_locations)]
#[test]
fn source_text() {
    let input = "    𓀕 c    ";
    let mut tokens = input
        .parse::<proc_macro2::TokenStream>()
        .unwrap()
        .into_iter();

    let ident1 = tokens.next().unwrap();
    assert_eq!("𓀕", ident1.span().source_text().unwrap());

    // The second token is the identifier `c`, not the hieroglyph.
    let ident2 = tokens.next().unwrap();
    assert_eq!("c", ident2.span().source_text().unwrap());
}

It panics, because the character 𓀕 occupies bytes 4..8 and the character index 6 of `c`, used as a byte index, lands inside it:

---- source_text stdout ----
thread 'source_text' panicked at 'byte index 6 is not a char boundary; it is inside '𓀕' (bytes 4..8) of `    𓀕 c   `', src/fallback.rs:367:25
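
To illustrate the mismatch outside proc-macro2, here is a minimal std-only sketch. The helpers `char_to_byte_offset` and `slice_by_chars` are hypothetical names made up for this example. They are not part of proc-macro2 and are not necessarily how the eventual fix works. They only show one way to translate a character index into a byte offset before slicing.

```rust
/// Hypothetical helper: convert a character index into a byte offset into `s`.
/// Returns `s.len()` when `char_index` is at or past the end of the string.
fn char_to_byte_offset(s: &str, char_index: usize) -> usize {
    s.char_indices()
        .nth(char_index)
        .map_or(s.len(), |(byte_offset, _)| byte_offset)
}

/// Hypothetical helper: slice `s` by a half-open range of character indices.
/// Assumes `lo <= hi`; slicing panics otherwise.
fn slice_by_chars(s: &str, lo: usize, hi: usize) -> &str {
    let start = char_to_byte_offset(s, lo);
    let end = char_to_byte_offset(s, hi);
    &s[start..end]
}

fn main() {
    let input = "    𓀕 c    ";

    // 𓀕 is character 4 and occupies bytes 4..8, so character and byte
    // indices agree up to here and diverge afterwards.
    assert_eq!(char_to_byte_offset(input, 4), 4);

    // `c` is character 6 but starts at byte 9.
    assert_eq!(char_to_byte_offset(input, 6), 9);

    // Byte 6 is in the middle of 𓀕, which is why `&input[6..]` panics.
    assert!(!input.is_char_boundary(6));

    // Slicing by character indices yields the expected tokens.
    assert_eq!(slice_by_chars(input, 4, 5), "𓀕");
    assert_eq!(slice_by_chars(input, 6, 7), "c");
}
```

Walking `char_indices()` is linear in the length of the string, which seems acceptable for a sketch; code that does many lookups might prefer to cache the offsets.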


dtolnay · 2023-10-09T01:06:11Z

I have published a fix in proc-macro2 1.0.69.

This was referenced Oct 8, 2023

Leptosfmt is removing or replacing some characters bram209/leptosfmt#84

Closed

leptosfmt generates incorrect HTML bram209/leptosfmt#62

Closed

fix: workaround bug with proc_macro2 regarding multibyte chars bram209/leptosfmt#85

Merged

dtolnay mentioned this issue Oct 9, 2023

Fix source_text treating span.lo as byte offset not char index #411

Merged

dtolnay closed this as completed in #411 Oct 9, 2023
