Pathological input: Deeply nested lists #255
Comments
Interesting. I confirm this on my machine.
```
N     time
1000   0.6s
2000   4.7s
3000  15.6s
4000  36.6s
5000  71.0s
```
Also tried with the new Haskell parser I'm working on,
and got this roughly similar curve:
```
N     time
1000   3.5s
2000  14.2s
3000  33.4s
4000  61.6s
```
Need to figure out why. How does md4c do?
Well, better than cmark: for N=10000 it takes 0.350s. But after some more thinking, there is quadratic behavior naturally, because the size of the input rises quadratically with N. So it is possible that the parabola in cmark is just steeper than in md4c, and the issue technically invalid. But even then, it should likely be addressed, even at the cost of limiting maximal nesting depth or something. A limit of a few hundred should not really hurt any author.
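The quadratic growth of the input itself is easy to check. Assuming two spaces of indentation per nesting level and a short item like `* a\n` (the original generator's exact format isn't shown in this thread, so this is illustrative), item i contributes 2i + 4 bytes, so the total is n² + 3n bytes — on the order of 100 MB for N = 10000, in line with the figure mentioned later in the thread:

```c
#include <stddef.h>

/* Bytes in a hypothetical document of n items, where item i (0-based)
   is indented by 2*i spaces followed by "* a\n" (4 bytes).
   Total = sum over i of (2*i + 4) = n*n + 3*n: quadratic in n. */
static size_t nested_list_bytes(int n) {
    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += 2 * (size_t)i + 4;
    return total;
}
```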
[Chart: parse time vs. input length for cmark and commonmark-hs]
Looks like commonmark-hs performance is pretty close to linear in the length of the input. cmark does worse, for reasons it would be good to understand. We could always limit nesting depth, it's true.
On the other hand, it's hard to get worried about DoS attacks that require someone to input several megabytes of text. One could always simply limit the size of the text that will be accepted, prior to parsing. The things to worry about are short snippets that blow up exponentially.
Yes and no. The input in this report is still < 100MB for N=10,000, and it takes minutes (or tens of minutes? or more?) to process.
Played with `gprof` a little. For N=3500, cmark spends 99+ % of time in `S_find_first_nonspace()`. This function roughly corresponds to md4c's `md_line_indentation()`.

Part of the report:

```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 99.18     77.46    77.46 12260499     0.01     0.01  S_find_first_nonspace
  0.24     77.65     0.19 12253499     0.00     0.00  S_last_child_is_open
  0.15     77.77     0.12     3500     0.03     0.03  S_can_contain
  0.12     77.86     0.09     3500     0.03    22.22  check_open_blocks
```

Md4c calls the function `md_line_indentation()` 7000 times, i.e. two times per input line. Cmark calls the function `S_find_first_nonspace()` 12260499 times; that's roughly a bit above 3500^2 == 12,250,000. So I guess the accumulated cost of the function calls is responsible for the slowdown.

Also, the function `S_find_first_nonspace()` could likely be optimized, e.g. by avoiding incrementing through `->` inside the loop; use a local temp variable for it. (But it would likely play a substantial role only once the above is solved and a single call works on longer buffer spans.)
Thanks! I figured it must have to do with indentation, since
the problem doesn't arise for block quotes.
I'll look into it.
Martin Mitáš <[email protected]> writes:
… Played with `gprof` a little.
For N=3500, cmark spends 99+ % of time in `S_find_first_nonspace()`. This function roughly corresponds to md4c's `md_line_indentation()`.
Part of the report:
```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 99.18     77.46    77.46 12260499     0.01     0.01  S_find_first_nonspace
  0.24     77.65     0.19 12253499     0.00     0.00  S_last_child_is_open
  0.15     77.77     0.12     3500     0.03     0.03  S_can_contain
  0.12     77.86     0.09     3500     0.03    22.22  check_open_blocks
```
Md4c calls the function `md_line_indentation()` 7000 times, i.e. two times per input line.
Cmark calls the function `S_find_first_nonspace()` 12260499 times. That's roughly a bit above 3500^2 == 12,250,000. So I guess the accumulated cost of the function calls is responsible for the slowdown.
Also, the function `S_find_first_nonspace()` could likely be optimized, e.g. by avoiding incrementing through `->` inside the loop; use a local temp variable for it. (But it would likely play a substantial role only once the above is solved and a single call works on longer buffer spans.)
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#255 (comment)
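The `->` micro-optimization suggested in the quoted comment — hoisting struct-field accesses into locals so the compiler can keep them in registers — could look roughly like this (the struct and field names here are made up for illustration, not the actual cmark types):

```c
#include <stddef.h>

/* A stand-in for the parser state; not the real cmark struct. */
struct parser_like {
    const char *buf;
    size_t len;
    size_t offset;
};

/* Before: increments a struct field through a pointer each iteration. */
static size_t scan_spaces_field(struct parser_like *p) {
    while (p->offset < p->len &&
           (p->buf[p->offset] == ' ' || p->buf[p->offset] == '\t'))
        p->offset++;
    return p->offset;
}

/* After: hoist the fields into locals; write the result back once. */
static size_t scan_spaces_local(struct parser_like *p) {
    const char *buf = p->buf;
    size_t len = p->len;
    size_t i = p->offset;
    while (i < len && (buf[i] == ' ' || buf[i] == '\t'))
        i++;
    p->offset = i;
    return i;
}
```

As the later discussion notes, whether this actually helps depends on the compiler and CPU, so it has to be measured rather than assumed.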
Think I got it. New timings:

Note that my change doesn't affect the number of calls to `S_find_first_nonspace()`. I tried using a local variable in the loop, but it didn't speed things up at all (in fact, it slowed it down a bit). Not sure why.
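A minimal sketch of this kind of caching fix — skip the rescan whenever the cached first-nonspace position is already at or past the current offset, since everything before it is known to be whitespace. Names are illustrative; this is not the actual cmark patch:

```c
#include <stddef.h>

/* A stand-in for the per-line parser state; not the real cmark struct. */
struct scanner {
    const char *line;
    size_t len;
    size_t offset;          /* current position in the line */
    size_t first_nonspace;  /* cached result of the last scan */
    int cached;             /* has first_nonspace been computed yet? */
};

/* Returns the position of the first non-whitespace character at or
   after offset. Rescans only when the cached value cannot answer the
   query: if first_nonspace >= offset, every byte in between is known
   to be whitespace, so the cached value is still correct. */
static size_t find_first_nonspace(struct scanner *s) {
    if (!s->cached || s->first_nonspace < s->offset) {
        size_t i = s->offset;
        while (i < s->len && (s->line[i] == ' ' || s->line[i] == '\t'))
            i++;
        s->first_nonspace = i;
        s->cached = 1;
    }
    return s->first_nonspace;
}
```

For a line opening K nested list items, the parser queries the indentation about K times at increasing offsets; with the cache, only the first query walks the whitespace, turning O(K²) work per line into O(K).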
Because a modern CPU is a work of black magic, it is hard to predict whether some optimization helps or not, especially in more complicated cases. So it should always be tried and measured. (And also measured whether it is worth it: optimizing a function, even by a huge factor, makes no sense if it already takes just 0.0001 % of your run time.)

And consider that your fix made things more complicated in this context: note that the function now does not have a single hot path. There are two semi-hot paths. For the K-th line of our input, it is called about K times. In the first of those calls, it really executes the loop with about K iterations; but in the other (K-1) calls it does not, and it executes the other branch. It is possible the two paths interact with each other, so that optimizing one may harm the other: e.g. higher register pressure with more local variables (especially in a 32-bit x86 build), longer code perhaps no longer fitting into a CPU cache line, whatever. Especially if you consider that the function is (in a release build) most likely inlined into its caller(s), because it is static, relatively compact, and has just two callers.

And if you build for another architecture, or even run the same binary blob on another CPU model, you might see different results...

Anyway, this would be just a cherry on top of the cake. The important thing is that the O(n^2) behavior has been fixed.
And maybe you might want to add something like this: mity/md4c@81e2a5c
Thanks, I'll add that.
… And maybe you might want to add something like this:
mity/md4c@81e2a5c
* Optimize S_find_first_nonspace. We were needlessly redoing things we'd already done. Now we skip the work if the first nonspace is greater than the current offset. This fixes pathological slowdown with deeply nested lists (#255). For N = 3000, the time goes from over 17s to about 0.7s. Thanks to @mity for diagnosing the problem.
* pathological_tests.py: added test for deeply nested lists.
* pathological_tests.py: make tests run faster.
  - commented out the (already ignored) "many references" test, which times out
  - reduced the iterations for a couple other tests
Markdown input consisting of deeply nested lists exhibits heavily non-linear (likely quadratic) parsing time with current cmark HEAD.
This C program can be used to generate such Markdown inputs:
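(The original program is not reproduced in this capture of the thread. As an illustration only, a generator producing input of the same general shape — N list items, each nested one level deeper, assuming two spaces per level — might look like the sketch below; the function name and exact item text are made up.)

```c
#include <stdio.h>

/* Emit n bullet items; item i is indented by 2*i spaces, so each
   item nests one level deeper than the previous one. The total
   output size grows quadratically with n. */
static void write_nested_list(FILE *out, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++)
            fputs("  ", out);
        fputs("* a\n", out);
    }
}
```

Wrapped in a `main` that reads N from `argv`, this can generate the test inputs, e.g. `./gen 3000 > input.md`.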
For various N, I get these times on my machine:
Ordered lists are broken the same way as unordered.
(Interestingly and to my surprise, nested blockquotes are fine.)