Optimize `escape_ascii` #125340
Conversation
rustbot has assigned @Mark-Simulacrum. Use `r?` to explicitly pick a reviewer.
r? @Kobzol
I'm probably not the best person to review this, but I can try. I have the same question as here, though: do you have some (micro)benchmarks to show that this is an improvement? :)
(Force-pushed from 6bfb89d to 8b94af3.)
@Kobzol, what's the best way to do a benchmark for this? Just create a standalone crate with two versions of this function, or is there a recommended way to test against different commits in this repo?
Well, that depends. On the microbenchmark side, you could show, e.g. on Godbolt, that this produces "objectively" better assembly. On the macrobenchmark side, you would probably bring some program that is actually improved by this change. Usually people have some explicit motivation for doing these kinds of optimizations, demonstrated by a change either in codegen or an improvement for some real-world code.
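For reference, one low-ceremony option before reaching for Criterion is a standalone crate with both versions behind feature flags, timed with a minimal std-only harness. This is an illustrative sketch, not project tooling; the two functions here are hypothetical stand-ins that both delegate to the stable `u8::escape_ascii`:

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-ins for the two implementations under comparison;
// in a real harness these would be the "before" and "after" functions.
fn escape_len_before(data: &[u8]) -> usize {
    data.iter().flat_map(|b| b.escape_ascii()).count()
}

fn escape_len_after(data: &[u8]) -> usize {
    data.iter().flat_map(|b| b.escape_ascii()).count()
}

fn bench(name: &str, mut f: impl FnMut() -> usize) {
    // Warm up, then time a fixed number of iterations.
    for _ in 0..1_000 {
        black_box(f());
    }
    let start = Instant::now();
    let mut checksum = 0;
    for _ in 0..10_000 {
        checksum += black_box(f());
    }
    println!("{name}: {:?} (checksum {checksum})", start.elapsed());
}

fn main() {
    // 4 KiB covering every byte value, so all escape paths are exercised.
    let data: Vec<u8> = (0u8..=255).cycle().take(4096).collect();
    bench("before", || escape_len_before(&data));
    bench("after", || escape_len_after(&data));
}
```

The `checksum` accumulator keeps the optimizer from deleting the loop body entirely; `black_box` alone is not always enough for pure functions.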
I have updated the Godbolt link in the PR description to reflect the current changes, i.e. 3 fewer jumps and 7 fewer instructions. I have also done a micro benchmark using the following source:

```rust
#![feature(ascii_char)]
#![feature(ascii_char_variants)]
#![feature(let_chains)]
#![feature(inline_const)]
#![feature(const_option)]

use core::ascii;
use core::ops::Range;
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, PlotConfiguration};

const HEX_DIGITS: [ascii::Char; 16] = *b"0123456789abcdef".as_ascii().unwrap();

#[inline]
const fn backslash<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 2) };
    let mut output = [ascii::Char::Null; N];
    output[0] = ascii::Char::ReverseSolidus;
    output[1] = a;
    (output, 0..2)
}

#[inline]
const fn escape_ascii_before<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 4) };
    match byte {
        b'\t' => backslash(ascii::Char::SmallT),
        b'\r' => backslash(ascii::Char::SmallR),
        b'\n' => backslash(ascii::Char::SmallN),
        b'\\' => backslash(ascii::Char::ReverseSolidus),
        b'\'' => backslash(ascii::Char::Apostrophe),
        b'\"' => backslash(ascii::Char::QuotationMark),
        byte => {
            let mut output = [ascii::Char::Null; N];
            if let Some(c) = byte.as_ascii()
                && !byte.is_ascii_control()
            {
                output[0] = c;
                (output, 0..1)
            } else {
                let hi = HEX_DIGITS[(byte >> 4) as usize];
                let lo = HEX_DIGITS[(byte & 0xf) as usize];
                output[0] = ascii::Char::ReverseSolidus;
                output[1] = ascii::Char::SmallX;
                output[2] = hi;
                output[3] = lo;
                (output, 0..4)
            }
        }
    }
}

#[inline]
const fn escape_ascii_after<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 4) };
    let mut output = [ascii::Char::Null; N];
    // NOTE: This `match` is roughly ordered by the frequency of ASCII
    // characters for performance.
    match byte.as_ascii() {
        Some(
            c @ ascii::Char::QuotationMark
            | c @ ascii::Char::Apostrophe
            | c @ ascii::Char::ReverseSolidus,
        ) => backslash(c),
        Some(c) if !byte.is_ascii_control() => {
            output[0] = c;
            (output, 0..1)
        }
        Some(ascii::Char::LineFeed) => backslash(ascii::Char::SmallN),
        Some(ascii::Char::CarriageReturn) => backslash(ascii::Char::SmallR),
        Some(ascii::Char::CharacterTabulation) => backslash(ascii::Char::SmallT),
        _ => {
            let hi = HEX_DIGITS[(byte >> 4) as usize];
            let lo = HEX_DIGITS[(byte & 0xf) as usize];
            output[0] = ascii::Char::ReverseSolidus;
            output[1] = ascii::Char::SmallX;
            output[2] = hi;
            output[3] = lo;
            (output, 0..4)
        }
    }
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("escape_ascii");
    group.sample_size(1000);
    for i in [b'a', b'Z', b'\"', b'\t', b'\n', b'\xff'] {
        let i_s = if let Some(c) = i.as_ascii() {
            format!("{c:?}")
        } else {
            format!("'\\x{i:02x}'")
        };
        group.bench_with_input(BenchmarkId::new("before", &i_s), &i, |b, i| {
            b.iter(|| escape_ascii_before::<4>(*i));
        });
        group.bench_with_input(BenchmarkId::new("after", &i_s), &i, |b, i| {
            b.iter(|| escape_ascii_after::<4>(*i));
        });
    }
    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```

Output and graph attached (unfortunately the graph's Y-axis is not sorted by input).
Your benchmark was executed on a single-byte input? It would be good to also see how it behaves on something larger, e.g. a short/medium/long byte slice, to see the effects in practice.

Could you describe the motivation for this change? If I understand your comment correctly, "frequency of ASCII characters" means how often given characters appear in the input. It makes sense to me to optimize for the common case, which I would expect is that the input does not need to be escaped at all. So my intuition would be to start by first checking whether it's an alphabetic ASCII character, and then continue from there. This optimization seems reasonable, in general; I just wonder if you have some use case where this escaping is an actual bottleneck, so that we could actually see some wins in practice?

Btw, in general, fewer instructions don't necessarily mean that the code will be faster; microarchitecture simulation (e.g. with llvm-mca) gives a better estimate of throughput than instruction count alone.
Hmm. Omitting the non-ASCII case, perhaps this could be done with a lookup table? You could squeeze it down to just 127 bytes if you use the eighth bit to determine whether there should be a backslash, since the escaped character only needs 7 bits. This way, you don't need to worry about ordering things by prevalence. I have no idea what the current codegen looks like, so I don't know whether it'd be much faster, but that feels like the best route to me.
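As an illustrative sketch of this suggestion (my own stable-Rust port, not code from the PR): the table packs the replacement character into the low 7 bits and uses the eighth bit as the "escaped" flag, with `0x80` alone (an "escaped NUL", which never occurs) marking hex escapes:

```rust
// Low 7 bits: replacement character; 8th bit: "needs a backslash escape".
// 0x80 alone marks bytes that must be hex-escaped.
const LOOKUP: [u8; 256] = {
    let mut arr = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        arr[i] = match b {
            b'\t' => 0x80 | b't',
            b'\r' => 0x80 | b'r',
            b'\n' => 0x80 | b'n',
            b'\\' | b'\'' | b'"' => 0x80 | b,
            0x00..=0x1F | 0x7F..=0xFF => 0x80, // hex-escape marker
            _ => b,                            // printable ASCII, verbatim
        };
        i += 1;
    }
    arr
};

/// Escape one byte the way `u8::escape_ascii` does, driven by the table.
fn escape_byte(b: u8) -> String {
    let e = LOOKUP[b as usize];
    match (e & 0x80 != 0, e & 0x7F) {
        (false, c) => (c as char).to_string(),   // verbatim
        (true, 0) => format!("\\x{b:02x}"),      // hex escape
        (true, c) => format!("\\{}", c as char), // backslash escape
    }
}

fn main() {
    // Cross-check against the standard library for every byte value.
    for b in 0u8..=255 {
        let std_escaped: String = b.escape_ascii().map(char::from).collect();
        assert_eq!(escape_byte(b), std_escaped);
    }
    println!("lookup table matches u8::escape_ascii for all 256 bytes");
}
```

Branching on the flag bits is kept here for clarity; the point of the table is that classification itself becomes a single indexed load.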
I have made some further changes and updated the Godbolt link in the PR description. The instruction count is again slightly lower, and LLVM-MCA now also shows fewer instructions and better IPC and throughput.

I re-ran the previous benchmark with larger inputs (a 100 MB file with random data, and a 100 MB JSON file). The results show no difference between the two functions.

I also ran LLVM-MCA locally for Cortex-M4, and it shows ~25% fewer instructions with ~35% higher throughput, via `cargo asm --features before --lib --target thumbv7em-none-eabihf --att --mca --mca-arg=-mcpu=cortex-m4` for the old version and `cargo asm --features after --lib --target thumbv7em-none-eabihf --att --mca --mca-arg=-mcpu=cortex-m4` for the new one.
I suspect that in the grand scheme of things (escaping strings, rather than single chars), this might not have such a large effect (btw, https://lemire.me/blog/2024/05/31/quickly-checking-whether-a-string-needs-escaping/ might be interesting to you). The code looked a bit more readable before, but no strong opinion on my side. r? libs
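The linked post's core idea, a cheap whole-slice pre-check before any per-byte work, can be sketched on stable Rust like this (illustrative only, not part of the PR):

```rust
use std::borrow::Cow;

// Does this byte require escaping under `escape_ascii` rules?
fn needs_escaping(b: u8) -> bool {
    !(0x20..=0x7E).contains(&b) || matches!(b, b'\\' | b'\'' | b'"')
}

/// Borrow the input unchanged when nothing needs escaping (the common case);
/// only fall back to byte-by-byte escaping when the pre-check finds a hit.
fn escape_fast(bytes: &[u8]) -> Cow<'_, str> {
    if bytes.iter().copied().any(needs_escaping) {
        Cow::Owned(
            bytes
                .iter()
                .flat_map(|b| b.escape_ascii())
                .map(char::from)
                .collect(),
        )
    } else {
        // The pre-check guarantees printable ASCII, so this cannot fail.
        Cow::Borrowed(std::str::from_utf8(bytes).unwrap())
    }
}

fn main() {
    assert!(matches!(escape_fast(b"plain text"), Cow::Borrowed(_)));
    assert_eq!(&*escape_fast(b"a\nb"), "a\\nb");
    println!("fast path taken for fully printable input");
}
```

The scan itself is a tight, branch-predictable loop that autovectorizes well, which is why this tends to win on mostly-clean input even though escaping work is doubled in the worst case.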
The current version really isn't particularly readable, so I don't think I can accept it. However, I found an even better version (at least according to llvm-mca) that is even more readable than the old one: https://rust.godbolt.org/z/8bfWP9aP8 (the top one). Do you want to try that? @rustbot author
@reitermarkus, any updates on this? Thanks!
@reitermarkus I'm closing this due to inactivity, as the PR hasn't been touched by the author in a few months. @rustbot label: +S-inactive
Optimize `escape_ascii` using a lookup table

Based upon my suggestion here: rust-lang#125340 (comment)

Effectively, we can take advantage of the fact that ASCII only needs 7 bits to make the eighth bit store whether the value should be escaped or not. This adds a 256-byte lookup table, but 256 bytes *should* be small enough that very few people will mind, according to my probably not incontrovertible opinion.

The generated assembly isn't clearly better (although it has fewer branches), so I decided to benchmark on three inputs: first on 200 KiB of random data, then on `/bin/cat`, then on the `Cargo.toml` for this repo. In all cases, the generated code ran faster on my machine (an old i7-8700). But if you want to try my benchmarking code for yourself:

<details><summary>Criterion code below. Replace <code>/home/ltdk/rustsrc</code> with the appropriate directory.</summary>

```rust
#![feature(ascii_char)]
#![feature(ascii_char_variants)]
#![feature(const_option)]
#![feature(let_chains)]

use core::ascii;
use core::ops::Range;
use criterion::{criterion_group, criterion_main, Criterion};
use rand::{thread_rng, Rng};

const HEX_DIGITS: [ascii::Char; 16] = *b"0123456789abcdef".as_ascii().unwrap();

#[inline]
const fn backslash<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 2) };
    let mut output = [ascii::Char::Null; N];
    output[0] = ascii::Char::ReverseSolidus;
    output[1] = a;
    (output, 0..2)
}

#[inline]
const fn hex_escape<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 4) };
    let mut output = [ascii::Char::Null; N];
    let hi = HEX_DIGITS[(byte >> 4) as usize];
    let lo = HEX_DIGITS[(byte & 0xf) as usize];
    output[0] = ascii::Char::ReverseSolidus;
    output[1] = ascii::Char::SmallX;
    output[2] = hi;
    output[3] = lo;
    (output, 0..4)
}

#[inline]
const fn verbatim<const N: usize>(a: ascii::Char) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 1) };
    let mut output = [ascii::Char::Null; N];
    output[0] = a;
    (output, 0..1)
}

/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_old<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
    const { assert!(N >= 4) };
    match byte {
        b'\t' => backslash(ascii::Char::SmallT),
        b'\r' => backslash(ascii::Char::SmallR),
        b'\n' => backslash(ascii::Char::SmallN),
        b'\\' => backslash(ascii::Char::ReverseSolidus),
        b'\'' => backslash(ascii::Char::Apostrophe),
        b'\"' => backslash(ascii::Char::QuotationMark),
        0x00..=0x1F => hex_escape(byte),
        _ => match ascii::Char::from_u8(byte) {
            Some(a) => verbatim(a),
            None => hex_escape(byte),
        },
    }
}

/// Escapes an ASCII character.
///
/// Returns a buffer and the length of the escaped representation.
const fn escape_ascii_new<const N: usize>(byte: u8) -> ([ascii::Char; N], Range<u8>) {
    /// Lookup table helps us determine how to display character.
    ///
    /// Since ASCII characters will always be 7 bits, we can exploit this to store the 8th bit to
    /// indicate whether the result is escaped or unescaped.
    ///
    /// We additionally use 0x80 (escaped NUL character) to indicate hex-escaped bytes, since
    /// escaped NUL will not occur.
    const LOOKUP: [u8; 256] = {
        let mut arr = [0; 256];
        let mut idx = 0;
        loop {
            arr[idx as usize] = match idx {
                // use 8th bit to indicate escaped
                b'\t' => 0x80 | b't',
                b'\r' => 0x80 | b'r',
                b'\n' => 0x80 | b'n',
                b'\\' => 0x80 | b'\\',
                b'\'' => 0x80 | b'\'',
                b'"' => 0x80 | b'"',
                // use NUL to indicate hex-escaped
                0x00..=0x1F | 0x7F..=0xFF => 0x80 | b'\0',
                _ => idx,
            };
            if idx == 255 {
                break;
            }
            idx += 1;
        }
        arr
    };

    let lookup = LOOKUP[byte as usize];

    // 8th bit indicates escape
    let lookup_escaped = lookup & 0x80 != 0;

    // SAFETY: We explicitly mask out the eighth bit to get a 7-bit ASCII character.
    let lookup_ascii = unsafe { ascii::Char::from_u8_unchecked(lookup & 0x7F) };

    if lookup_escaped {
        // NUL indicates hex-escaped
        if matches!(lookup_ascii, ascii::Char::Null) {
            hex_escape(byte)
        } else {
            backslash(lookup_ascii)
        }
    } else {
        verbatim(lookup_ascii)
    }
}

fn escape_bytes(bytes: &[u8], f: impl Fn(u8) -> ([ascii::Char; 4], Range<u8>)) -> Vec<ascii::Char> {
    let mut vec = Vec::new();
    for b in bytes {
        let (buf, range) = f(*b);
        vec.extend_from_slice(&buf[range.start as usize..range.end as usize]);
    }
    vec
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("escape_ascii");
    group.sample_size(1000);

    let rand_200k = &mut [0; 200 * 1024];
    thread_rng().fill(&mut rand_200k[..]);
    let cat = include_bytes!("/bin/cat");
    let cargo_toml = include_bytes!("/home/ltdk/rustsrc/Cargo.toml");

    group.bench_function("old_rand", |b| {
        b.iter(|| escape_bytes(rand_200k, escape_ascii_old));
    });
    group.bench_function("new_rand", |b| {
        b.iter(|| escape_bytes(rand_200k, escape_ascii_new));
    });
    group.bench_function("old_bin", |b| {
        b.iter(|| escape_bytes(cat, escape_ascii_old));
    });
    group.bench_function("new_bin", |b| {
        b.iter(|| escape_bytes(cat, escape_ascii_new));
    });
    group.bench_function("old_cargo_toml", |b| {
        b.iter(|| escape_bytes(cargo_toml, escape_ascii_old));
    });
    group.bench_function("new_cargo_toml", |b| {
        b.iter(|| escape_bytes(cargo_toml, escape_ascii_new));
    });

    group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```

</details>

My benchmark results:

```
escape_ascii/old_rand        time: [1.6965 ms 1.7006 ms 1.7053 ms]
Found 22 outliers among 1000 measurements (2.20%)
  4 (0.40%) high mild
  18 (1.80%) high severe
escape_ascii/new_rand        time: [1.6749 ms 1.6953 ms 1.7158 ms]
Found 38 outliers among 1000 measurements (3.80%)
  38 (3.80%) high mild
escape_ascii/old_bin         time: [224.59 µs 225.40 µs 226.33 µs]
Found 39 outliers among 1000 measurements (3.90%)
  17 (1.70%) high mild
  22 (2.20%) high severe
escape_ascii/new_bin         time: [164.86 µs 165.63 µs 166.58 µs]
Found 107 outliers among 1000 measurements (10.70%)
  43 (4.30%) high mild
  64 (6.40%) high severe
escape_ascii/old_cargo_toml  time: [23.397 µs 23.699 µs 24.014 µs]
Found 204 outliers among 1000 measurements (20.40%)
  21 (2.10%) high mild
  183 (18.30%) high severe
escape_ascii/new_cargo_toml  time: [16.404 µs 16.438 µs 16.483 µs]
Found 88 outliers among 1000 measurements (8.80%)
  56 (5.60%) high mild
  32 (3.20%) high severe
```

Random: 1.7006 ms => 1.6953 ms (<1% speedup)
Binary: 225.40 µs => 165.63 µs (26% speedup)
Text: 23.699 µs => 16.438 µs (30% speedup)
Follow-up to #124307. CC @joboet
Alternative/addition to #125317.
Based on #124307 (comment), it doesn't look like this function is the cause for the regression, but this change produces even fewer instructions (https://rust.godbolt.org/z/nebzqoveG).
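For context, the routine being optimized backs the stable `u8::escape_ascii` and `<[u8]>::escape_ascii` APIs; a quick demonstration of the observable behavior (independent of this PR's internals):

```rust
fn main() {
    // `escape_ascii` leaves printable ASCII alone, backslash-escapes
    // \t, \r, \n, \\, ' and ", and hex-escapes everything else.
    let escaped: String = b"tab:\t byte:\xff"
        .escape_ascii()
        .map(char::from)
        .collect();
    assert_eq!(escaped, "tab:\\t byte:\\xff");
    println!("{escaped}"); // prints: tab:\t byte:\xff
}
```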