-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizations #19
Optimizations #19
Conversation
Renderer#escape was each time passing a new hash instance to gsub, making it allocate more memory than neccessary. Moving the hash to a constant improves performance a bit. For `Markd.to_html("hello world")` before 350.46k ( 2.85µs) (± 6.31%) 4.09kB/op after 431.05k ( 2.32µs) (± 1.43%) 3.07kB/op
For `Markd.to_html("hello world")` before 431.05k ( 2.32µs) (± 1.43%) 3.07kB/op after 506.43k ( 1.97µs) (± 2.50%) 2.72kB/op
For `Markd.to_html("hello world")` before 506.43k ( 1.97µs) (± 2.50%) 2.72kB/op after 554.18k ( 1.80µs) (± 1.10%) 2.41kB/op
For `Markd.to_html("hello world")`: before 554.18k ( 1.80µs) (± 1.10%) 2.41kB/op after 640.45k ( 1.56µs) (± 6.70%) 2.06kB/op
This avoids initializing a Hash when not needed. For `Markd.to_html("hello world")`: before 640.45k ( 1.56µs) (± 6.70%) 2.06kB/op after 727.39k ( 1.37µs) (± 2.28%) 1.59kB/op
Array literals in tight loops allocate a lot of memory! Parsing markd's README.md: before 3.52k (283.87µs) (± 0.59%) 259kB/op after 3.66k (273.38µs) (± 0.54%) 244kB/op
For `Markd.to_html("hello world")`: before 726.36k ( 1.38µs) (± 1.86%) 1.59kB/op after 800.81k ( 1.25µs) (± 0.58%) 1.58kB/op
Markdown is an ASCII format: all control characters are ASCII so we can avoid decoding unicode characters when not needed. The old code was indexing bytes by unicode index position which is very slow because the string must be decoded each time. Here we change it to index by byte, and change all operations to do byte slices instead of unicode slices. For markd's README.md: before 3.86k (258.86µs) (± 0.52%) 242kB/op after 4.11k (243.25µs) (± 1.37%) 242kB/op
For `Markd.to_html("hello world")`: before 800.81k ( 1.25µs) (± 0.58%) 1.58kB/op after 829.33k ( 1.21µs) (±25.35%) 1.56kB/op
We can use an inline comparison or a tuple. For markd's README.md: before 4.57k (218.68µs) (± 0.64%) 198kB/op after 4.63k (215.84µs) (± 2.20%) 196kB/op
`values.each` will create an intermediary array
We can use `each_line` instead of `lines` and then iterating. For `Markd.to_html("hello world")`: before 882.84k ( 1.13µs) (± 1.46%) 1.56kB/op after 929.11k ( 1.08µs) (± 2.75%) 1.46kB/op For markd's README.md: before 4.85k (206.37µs) (± 2.24%) 187kB/op after 4.98k (200.82µs) (± 0.47%) 183kB/op
`String::Builder` is slightly faster when used for a one-time shot.
Because all escape chars are ASCII it's fine if we go byte per byte. For `Markd.to_html("hello world")`: before 942.28k ( 1.06µs) (± 5.45%) 1.46kB/op after 959.11k ( 1.04µs) (± 0.72%) 1.46kB/op For markd's README.md: before 4.97k (201.41µs) (± 0.74%) 176kB/op after 5.05k (197.95µs) (± 1.51%) 176kB/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you had some fun =)
src/markd/renderers/html_renderer.cr
Outdated
if entering | ||
cr | ||
tag(tag_name, attrs(node)) | ||
# toc(node) if @options.toc | ||
else | ||
tag("/#{tag_name}") | ||
tag(end_tag_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be even simpler to let tag
add the slash when calling it with tag(tag_name, end_tag: true)
similar to self_closing
arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done, thanks!
For markd's README.md: before 5.14k (194.65µs) (± 1.25%) 174kB/op after 5.23k (191.14µs) (± 0.55%) 173kB/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome optimization that i learn from the PR, this MUST be merged ASAP.
And as my previous words, i will contribute Markd to the standard library, let's do it now 😆
Another fact: I tried benchmarking rendering markd's README.md with markd and crystal-cmark, which is just a wrapper around a C library, and these are the results:
So markd is pretty good, considering it wasn't written with optimizations in mind (I think there's still room for improvement). Just 2.42 times slower is actually pretty good. The memory consumption there isn't real because the benchmark just shows memory Crystal knows about, not memory consumed by C, so we can't compare that. |
This PR optimizes Markd code. Each commit has one optimization. The most controversial one is about not indexing strings with indices like
string[index]
but instead usingstring.byte_at(index)
. Because Markdown is an ASCII format (well, there's unicode but the control characters like*
,_
,[
, etc. are all ASCII) we can go byte per byte. But all optimizations here are good.I used two benchmarks:
The first one is the code above. The second one is using the uncommented line, where "README.md" is this repo's README.md (I actually wanted to use the "sample markdown file" mentioned in the README but it seems it doesn't exist anymore).
In the code above
require "markdown"
is just the std's (incomplete, buggy) markdown.Results
Hello world
Before this PR
After this PR
README.md
Before this PR
After this PR
Conclusion
It seems time and memory went down by 3 times.
I tried to optimize more but it's harder and harder without me trying to understand more the code. But I think this is good enough: 3~4 times slower than std but without bugs and feature complete is really good! And I'm tempted to ask you to maybe move markd to the standard library so we have a really good doc generate and a good Markdown library in the standard library, but we'll have to discuss that in the Crystal repo.