fix: avoid collecting iterator during token resolving #112
Conversation
Very nice improvements.
Only have two remarks about dependencies and an unnecessary clone.
I skimmed over the resolver changes, because these will most likely become unnecessary with the new open-token approach I will introduce to the spec.
In short: once an open token is encountered, the element is valid either until an end token is reached, or until the end of the input/element range is reached.
This should make inline parsing significantly simpler, but also improve UX, because one directly sees the impact of a correct open token.
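A minimal sketch of that rule, using hypothetical token names (this is not the actual spec or parser API):

```rust
// Hypothetical sketch of the described open-token rule, not actual
// unimarkup code: once an open token is encountered, the element stays
// valid until an end token is reached or the input ends.
#[derive(PartialEq)]
enum Token {
    Open,
    Close,
    Plain,
}

/// Returns the index one past the element starting at `start`,
/// assuming `tokens[start]` is an open token.
fn element_end(tokens: &[Token], start: usize) -> usize {
    for i in (start + 1)..tokens.len() {
        if tokens[i] == Token::Close {
            return i + 1; // closed by an explicit end token
        }
    }
    tokens.len() // no end token: valid until end of input
}
```

Because a dangling open token would still be valid under this rule, the element simply extends to the end of the input instead of having to be re-resolved as plain.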
Looks good now.
Should we merge it now, or wait on PR #111?
Good question, I'm not really sure. Will there be conflicts? If not, then it does not matter.
I think there might be some minor conflicts in the scanner module. I will merge this PR.
What was the problem?
In order to correctly parse inline formats, there is an intermediate step between the inline lexer and the parser. This step resolves the tokens so that they get correctly marked as opening/closing/plain tokens. This step is isolated because it is very complex, and pulling it out keeps that complexity out of the lexer and parser.
Since `TokenResolver` needs to decide for each token whether it is an opening, closing, or plain token, it needs to see what tokens come after it. In particular, this is the case for potentially opening tokens: in `**bold**`, for example, the first `**` can only be marked as opening because a matching `**` follows, while in `**dangling` it must be resolved as plain. To make this easy, we collected the whole iterator coming from the lexer into a `Vec`. On large inputs, this caused a very large allocation, which is very slow.

The fix
The fix is to use the iterator directly, without collecting it into a `Vec`. Since we still need to look ahead, a data structure makes it possible to look forward by some dynamic number of tokens. We can now extend the look-ahead until the token can be resolved, and then stop allocating until it becomes necessary again (sketched below).
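A minimal sketch of such a look-ahead buffer, assuming hypothetical names (this is not the actual `TokenResolver` implementation):

```rust
// Hypothetical sketch: buffer only as many tokens as the current
// resolution step needs, instead of collecting the whole iterator.
use std::collections::VecDeque;

struct Lookahead<I: Iterator> {
    iter: I,
    buf: VecDeque<I::Item>,
}

impl<I: Iterator> Lookahead<I> {
    fn new(iter: I) -> Self {
        Self { iter, buf: VecDeque::new() }
    }

    /// Peek `n` items ahead, pulling from the inner iterator on demand.
    /// The buffer grows only while a token is being resolved.
    fn peek(&mut self, n: usize) -> Option<&I::Item> {
        while self.buf.len() <= n {
            self.buf.push_back(self.iter.next()?);
        }
        self.buf.get(n)
    }

    /// Consume the next item, preferring already-buffered ones.
    fn next(&mut self) -> Option<I::Item> {
        self.buf.pop_front().or_else(|| self.iter.next())
    }
}
```

The `itertools` crate's `multipeek` adaptor provides similar functionality, if pulling in a dependency is acceptable.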
Fixes #107

Update:
After performing multiple benchmarks, it turned out that the change described above did not really improve performance; it performed more or less the same as the previous implementation. It did, however, remove the allocation of a `Vec` holding all tokens when resolving inline tokens.
But to improve performance, the following additional changes were made (a sketch of the last two follows the list):

- `Symbol::flatten` no longer validates its inputs. This function should be used internally, and we should uphold this invariant ourselves; the function may panic if the inputs are not the same (this is now documented).
- Use the `fxhash` crate's `FxHashMap` and `FxHashSet` in `Substitutor`; the faster hashing improves performance.
- Share a single `Substitutor` instance to prevent multiple allocations of the same maps/sets.
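A sketch of the last two points, using hypothetical `Substitutor` fields and entries rather than the actual ones:

```rust
// Hypothetical sketch, not the actual Substitutor: FxHashMap/FxHashSet are
// drop-in replacements for std's HashMap/HashSet that trade DoS resistance
// for a faster hash function.
use fxhash::{FxHashMap, FxHashSet};
use once_cell::sync::Lazy;

struct Substitutor {
    substitutions: FxHashMap<String, String>,
    known_keys: FxHashSet<String>,
}

impl Substitutor {
    fn new() -> Self {
        let mut substitutions = FxHashMap::default();
        // Illustrative entry only; the real substitution table differs.
        substitutions.insert("--".to_string(), "\u{2013}".to_string());
        let known_keys = substitutions.keys().cloned().collect();
        Self { substitutions, known_keys }
    }
}

// Building the maps/sets once and sharing the instance avoids repeated
// allocations; a lazily initialized static is one way to achieve this.
static SUBSTITUTOR: Lazy<Substitutor> = Lazy::new(Substitutor::new);
```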
Quick Benchmark
Tested with file:
Comparison between `unimarkup-main` (on main branch, commit 6ff4562) and `unimarkup-optimized`: