Grammar optimization: eliminate redundant grammar trees (~4x faster grammar sampling) #6616
Conversation
Force-pushed from 3bbceb5 to 80553e5
Was getting cross-platform build errors on the new benchmark reporting that I put into the integration tests. Regardless, taking out the changes to the integration tests makes this a much cleaner change, and hopefully easier to review. Also rebased on top of the latest master to take advantage of the speed improvements in #6609; the speed improvements from this PR are still there. Benching with 10 iterations showed a roughly 4x-5x speedup: `gbnf-optimize-ambiguity2` ran 5.41 ± 0.23 times faster than `master`
Edit: I was accidentally comparing against an old master, so I was mistakenly adding the improvements from #6609 to my numbers instead of correctly separating them. After updating to the latest head, the improvement dropped from ~9x to ~5x. Still good, but not that good. :) Overall I'm feeling optimistic about this PR, but would love your review whenever you have time, @ochafik
This is amazing (getting an 8.5x speedup on my M2 Max), and such a small change, great catch!
Looks great to me!!
tl;dr
~4x-5x speedup on processing complex grammars.
Previously discussed in #4218 (comment)
Motivation:
The grammar stacks tend to explode exponentially in the presence of redundant ambiguities (see #4218 (comment)). In these cases, the grammar engine duplicates effort by evaluating the feasibility of multiple redundant copies of the same grammar state, even though they are equivalent.
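As a hypothetical illustration (not the actual grammar from #4218), a GBNF rule with identical alternatives makes every match ambiguous, so the number of equivalent parse stacks doubles with each repetition:

```gbnf
# Each "a" can be matched via either of the two identical alternatives,
# so without deduplication the engine tracks 2^n equivalent stacks
# after n characters.
root ::= item*
item ::= ("a" | "a")
```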
Fix:
After each token, the grammar engine builds stacks representing the possible directions the parser could go. When building these stacks, this change takes care not to add a new grammar stack if an equal one is already present. Despite `std::find()` being an O(N) operation, the savings gained from not adding trees of redundant grammars vastly outweigh this minor check.

Results:
Integration Benchmark
I expanded the grammar integration tests to add some crude timing metrics.
Before:
After:
Note the significant improvement in the chained-ambiguity case, which is what this PR was targeting most directly. Most of the other speed differences seem to be hovering around the noise floor, and don't appear to be consistently faster or slower one way or the other.
"Full" benchmark
Thanks to @ochafik for teaching me how to use Hyperfine. Here are the results of the benchmark cribbed from #6609, showing a ~9x speedup vs. master:
Hyperfine Results
It's interesting to me that even though this test's grammar seeks to remove redundant ambiguities, this PR still offers enough improvement to be quite significant.
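For anyone wanting to reproduce a comparison like this, a hyperfine invocation along these lines should work. The binary paths, model, grammar file, and prompt below are placeholders, not the actual commands used in this PR:

```shell
# Build one binary per branch, then compare them with hyperfine.
# --warmup discards initial runs to stabilize caches; --runs sets the
# sample count; -n labels each command in the report.
hyperfine --warmup 1 --runs 10 \
  -n master       './main-master -m model.gguf --grammar-file test.gbnf -p "prompt" -n 128' \
  -n gbnf-optimize './main-pr    -m model.gguf --grammar-file test.gbnf -p "prompt" -n 128'
```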