Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memmem::find performance regression since 2.6.0 #161

Open
ambyjkl opened this issue Sep 11, 2024 · 3 comments
Open

memmem::find performance regression since 2.6.0 #161

ambyjkl opened this issue Sep 11, 2024 · 3 comments

Comments

@ambyjkl
Copy link

ambyjkl commented Sep 11, 2024

I noticed when updating memchr and that memmem::find was considerably slower, and digging deeper found that memmem::Finder takes considerably longer to build in versions 2.6.0 and above. The good thing is this revealed some opportunities for caching and reusing existing Finder instances, but there are still cases where I have a fresh one-time needle to search in the haystack and hence I cannot cache. Comparing the commits between 2.5.0 and 2.6.0 revealed that the major suspects as being e49a1b8 00c6372 93662e7, with the first and the last ones being the biggest. Here are the benchmark results of simulating real-world usage in my application

Summary
  /tmp/m-2.5.0 ran
    1.19 ± 0.02 times faster than /tmp/m-54c893176860c15cb41be03412ab0bc30f20df25
    1.20 ± 0.02 times faster than /tmp/m-e49a1b89ea8c7617de2e3447c356f7813d0a88e9
    1.20 ± 0.02 times faster than /tmp/m-7af67ce25998d7e91c349f39eb3e38efcc5d5e8f
    1.21 ± 0.01 times faster than /tmp/m-00c6372daa25224a3a8698f0d0a3b222bb8b0795
    1.28 ± 0.01 times faster than /tmp/m-93662e796001b795283ec6badb59a17b4b5492cc

and here is a microbenchmark just running memmem::find in a loop to get the net perf regression

fn main() {
    for _ in 0..1000000 {
        memchr::memmem::find(b"123456789012374890172390417230941723401723490293084570293457890234589023478", b"testadfjgadfaopisdfjoipasdf");
    }
}
Summary
  target/release/2.5.0 ran
    3.31 ± 0.20 times faster than target/release/2.6.0

Which comes out to ~3.3x. Here are the flamegraphs

2.6.0:

2.6.0

2.5.0:

2.5.0

@BurntSushi
Copy link
Owner

93662e7 in particular was basically a re-organization of the entire crate. So unfortunately, we don't really have one specific and small change we can point to that is the cause of this regression.

In general, this crate generally prioritizes search speed over construction speed. Of course, there has to be a balance here. I'll need to do an investigation to figure out if there's anything we can improve. But this could come down to something like, "previously, we weren't building an SSE2 fallback for AVX2 searchers, and now we are, and that causes construction times to increase." I don't know that that is the issue, but I think it represents the possible flavor. In which case, it's probably a regression we have to eat. Maybe we can "make up" for it somewhere else, I'm not sure.

@ambyjkl
Copy link
Author

ambyjkl commented Oct 8, 2024

I see, that makes sense, and i wasn't aware the sse2 fallback is new. So if you're right, the trick might be to skip sse2 if avx2 is supported and this check fails, unless that's how it already is:

        if haystack.len() < self.avx2.min_haystack_len() {
            self.sse2.find(haystack, needle)
        } else {
            self.avx2.find(haystack, needle)
        }

and also have a heuristic for skipping everying and going with rabin-karp for small needles and haystacks

@BurntSushi
Copy link
Owner

and i wasn't aware the sse2 fallback is new

I don't think it is. I can't remember. It was just an example of what kinds of issues might appear here. :-)

So if you're right, the trick might be to skip sse2 if avx2 is supported and this check fails, unless that's how it already is:

That check is done at search time, past the point where SSE2 construction is done. In general, construction can't use facts about the haystack to change course. The higher level search APIs like memchr::memmem::find do, but only because that API is explicitly for one-shot usage.

and also have a heuristic for skipping everying and going with rabin-karp for small needles and haystacks

You can't for memmem::Finder because construction doesn't know anything about haystack length. Otherwise, memchr::memmem::find already does this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants