x86_64 SSE2 fast-path for str.contains(&str) and short needles #103779

Based on Wojciech Muła's "SIMD-friendly algorithms for substring searching"[0] The two-way algorithm is Big-O efficient but it needs to preprocess the needle to find a "criticla factorization" of it. This additional work is significant for short needles. Additionally it mostly advances needle.len() bytes at a time. The SIMD-based approach used here on the other hand can advance based on its vector width, which can exceed the needle length. Except for pathological cases, but due to being limited to small needles the worst case blowup is also small. benchmarks taken on a Zen2: ``` 16CGU, OLD: test str::bench_contains_short_short ... bench: 27 ns/iter (+/- 1) test str::bench_contains_short_long ... bench: 667 ns/iter (+/- 29) test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 2) test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 2) test str::bench_contains_equal ... bench: 148 ns/iter (+/- 4) 16CGU, NEW: test str::bench_contains_short_short ... bench: 8 ns/iter (+/- 0) test str::bench_contains_short_long ... bench: 135 ns/iter (+/- 4) test str::bench_contains_bad_naive ... bench: 130 ns/iter (+/- 2) test str::bench_contains_bad_simd ... bench: 292 ns/iter (+/- 1) test str::bench_contains_equal ... bench: 3 ns/iter (+/- 0) 1CGU, OLD: test str::bench_contains_short_short ... bench: 30 ns/iter (+/- 0) test str::bench_contains_short_long ... bench: 713 ns/iter (+/- 17) test str::bench_contains_bad_naive ... bench: 131 ns/iter (+/- 3) test str::bench_contains_bad_simd ... bench: 130 ns/iter (+/- 3) test str::bench_contains_equal ... bench: 148 ns/iter (+/- 6) 1CGU, NEW: test str::bench_contains_short_short ... bench: 10 ns/iter (+/- 0) test str::bench_contains_short_long ... bench: 111 ns/iter (+/- 0) test str::bench_contains_bad_naive ... bench: 135 ns/iter (+/- 3) test str::bench_contains_bad_simd ... bench: 274 ns/iter (+/- 2) test str::bench_contains_equal ... bench: 4 ns/iter (+/- 0) ``` [0] http://0x80.pl/articles/simd-strfind.html#sse-avx2

The Big-O is cubic, but this is only called with ~70 chars so it's still fast enough

- bump simd compare to 32bytes - import small slice compare code from memmem crate - try a few different probe bytes to avoid degenerate cases - but special-case 2-byte needles

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x86_64 SSE2 fast-path for str.contains(&str) and short needles #103779

x86_64 SSE2 fast-path for str.contains(&str) and short needles #103779

Commits on Nov 14, 2022

Commits on Nov 15, 2022