refac: improve lexing inline elements step's performance #3146
Conversation
Such a small change, but if it passes the tests this looks like an awesome improvement, even if it is mainly visible only on larger documents.
I think we are generally open to any performance boosts even if they require some retooling. As long as we remain compliant with the specs and extensions remain functional, we can always make a major version bump if signatures are significantly changed.
The only issue I can see is that the queue array is no longer emptied once the loop finishes.
As far as I could tell, nothing reads the array after the end of the while loop. I don't see Lexer using it anywhere else, or it being accessed elsewhere, and the lexer just returns the array of tokens at the end. If there are cases I'm missing, it could be reset to an empty array after the for loop runs, or reversed and then popped instead of shifted.
I don't think it will break anything in marked if it isn't reset, but someone may use the lexer on its own. I think we can just reset it to an empty array after the for loop.
Oh, that makes sense. Thank you for explaining the case so clearly. I added resetting it to an empty array.
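To make the discussion above concrete, here is a minimal sketch of the shape of the change. The names (`MiniLexer`, `queueInline`, `drainWithShift`, `drainWithFor`) are illustrative, not marked's actual API; the assumption, based on this thread, is that the lexer queues `{src, tokens}` entries during the block pass and drains them afterwards, and that nothing reads the queue once the drain finishes, so it can simply be reset:

```javascript
// Illustrative sketch of the queue-draining change discussed in this PR.
// Names are hypothetical; marked's real Lexer is more involved.
class MiniLexer {
  constructor() {
    this.inlineQueue = [];
  }

  // Block pass queues inline sources instead of tokenizing them immediately.
  queueInline(src, tokens) {
    this.inlineQueue.push({ src, tokens });
  }

  // Before: O(n^2) overall — each shift() re-indexes the rest of the array.
  drainWithShift() {
    while (this.inlineQueue.length > 0) {
      const next = this.inlineQueue.shift();
      next.tokens.push({ type: 'text', raw: next.src });
    }
  }

  // After: O(n) — walk by index, then reset so later lex() calls start clean.
  drainWithFor() {
    for (let i = 0; i < this.inlineQueue.length; i++) {
      const next = this.inlineQueue[i];
      next.tokens.push({ type: 'text', raw: next.src });
    }
    this.inlineQueue = []; // the reset added in response to the review comment
  }
}
```

Both drain methods produce the same tokens; the only behavioral difference is the cost of removing items and the explicit reset at the end.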
## [11.1.1](v11.1.0...v11.1.1) (2023-12-31)

### Bug Fixes

* improve lexing inline elements step's performance ([#3146](#3146)) ([4f87b2a](4f87b2a))
Marked version: 11.1.0
Markdown flavor: n/a
Description
Improves the performance of the lexing step, specifically lexing inline elements. Prompted by issue #2355.
What was attempted
Create a benchmark file, bench-100.md: this file copy-pasted ×100: https://raw.githubusercontent.com/thephpleague/commonmark/2.2/tests/benchmark/sample.md (I saw that file linked in the issue and it looked good to me).

On master:

npm run build
node --cpu-prof bin/marked.js -i bench-100.md -o bench-100.html

Apply my changes, then:

npm run build
node --cpu-prof bin/marked.js -i bench-100.md -o bench-100.html

Load the two profiles in https://www.speedscope.app/
before: (speedscope profile screenshot)
The output file is identical, all the tests still pass, and as far as I can tell from reading the code, there shouldn't be any differences in behavior.
The reason this change speeds things up is that the while loop + shift() combination is an O(n²) operation: shift() shuffles every remaining item in the array down by one, an O(n) operation, and that happens once per item in the array (n × n). Running through the array with a for loop is just O(n). This optimization makes no difference in the automated benchmark. I haven't read all the benchmark test cases, but I think they're mostly tiny single-item parses, which don't accumulate a big queue to burn through. The benchmark file has lots of inline elements, so the optimization shows up strongly there.
I think a markdown file that had a good, statistically accurate representation of real world use patterns for all the different markdown features would be super helpful for benchmarking, although it seemed like overkill for this PR.
I'm not sure they exist, and I don't know if I'd have the time to pursue them, but I think there might be other architecture-level opportunities to improve performance, as opposed to micro-level optimizations on things like the regexes. Is there interest in changes like that, even if they meant significant changes to the existing classes?
Contributor
Committer
In most cases, this should be a different person than the contributor.