
Pathological input: Deeply nested lists #255

Closed
mity opened this issue Apr 12, 2018 · 11 comments

Comments

@mity (Contributor) commented Apr 12, 2018

Markdown input consisting of deeply nested lists exhibits heavily non-linear (likely quadratic) parsing time with current cmark HEAD.

This C program can be used to generate such Markdown inputs:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char** argv)
{
    int i, j;
    int N = 1000;

    if(argc > 1)
        N = atoi(argv[1]);

    /* Line i is a list item nested i levels deep:
       2*i spaces of indentation, then a bullet. */
    for(i = 0; i < N; i++) {
        for(j = 0; j < i; j++)
            printf("  ");
        printf("* foo\n");
    }

    return 0;
}
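
(To reproduce, something like `gcc -O2 nested.c -o nested && ./nested 2000 | cmark > /dev/null` can be run under `time`; the file name and flags here are just illustrative, and `cmark` reads stdin when no file is given.)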

For various N, I get these times on my machine:

  • N=1000: 0m0.728s
  • N=2000: 0m5.548s
  • N=3000: 0m18.540s
  • N=4000: 0m43.780s
  • N=10000: Way too much (I interrupted the execution after about 3.5 minutes.)

Ordered lists are broken in the same way as unordered ones.

(Interestingly and to my surprise, nested blockquotes are fine.)

@jgm (Member) commented Apr 12, 2018 via email

@mity (Contributor, Author) commented Apr 12, 2018

> How does md4c do?

Well, better than cmark. For N=10000 it takes 0.350s.
But then it slows down too: for N=20000 it takes 1.299s.

But after some more thinking, there is quadratic behavior naturally: Because here size of the input rises quadratically with N. So it is possible that the parabola in cmark is just steeper then in md4c and the issue technically invalid.

But even then, it should likely be addressed, even at the cost of limiting the maximal nesting depth or something similar. A limit of a few hundred levels should not really hurt any author.
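
A sketch of what such a cap could look like (the names and the exact limit here are mine, just to illustrate the idea, not cmark's code):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: refuse to open another nested container once a
   fixed depth is reached; a deeper-indented line would then be treated
   as content of the innermost open item. */
enum { MAX_NESTING = 500 };  /* "a few hundred" */

static bool can_open_child(int open_depth)
{
    return open_depth < MAX_NESTING;
}

int main(void)
{
    printf("%d %d\n", can_open_child(499), can_open_child(500));  /* prints: 1 0 */
    return 0;
}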

@jgm (Member) commented Apr 12, 2018

cmark:

| N | Time (s) | Length (MB) | Time / MB |
| --- | --- | --- | --- |
| 1000 | 0.6 | 1 | 0.6 |
| 2000 | 4.7 | 4 | 1.17 |
| 3000 | 15.6 | 9 | 1.73 |
| 4000 | 36.6 | 16 | 2.28 |
| 5000 | 71.0 | 25 | 2.84 |

commonmark-hs:

| N | Time (s) | Length (MB) | Time / MB |
| --- | --- | --- | --- |
| 1000 | 3.5 | 1 | 3.5 |
| 2000 | 14.2 | 4 | 3.5 |
| 3000 | 33.4 | 9 | 3.7 |
| 4000 | 61.6 | 16 | 3.8 |

Looks like commonmark-hs performance is pretty close to linear in the length of the input. cmark does worse, for reasons it would be good to understand.

We could always limit nesting depth, it's true.

@jgm (Member) commented Apr 12, 2018

On the other hand, it's hard to get worried about DoS attacks that require someone to input several megabytes of text. One could always simply limit the size of the text that will be accepted, prior to parsing. The things to worry about are short snippets that blow up exponentially.

@mity (Contributor, Author) commented Apr 12, 2018

Yes and no.

The bench/benchinput.md is ~107MB and cmark parses it in a few seconds.

The input in this report is still under 100MB for N=10,000, and it takes minutes (or tens of minutes? or more?) to process.

@mity (Contributor, Author) commented Apr 13, 2018

Played with gprof a little.

For N=3500, cmark spends 99+% of its time in S_find_first_nonspace(). This function roughly corresponds to md4c's md_line_indentation().

Part of the report:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 99.18     77.46    77.46 12260499     0.01     0.01  S_find_first_nonspace
  0.24     77.65     0.19 12253499     0.00     0.00  S_last_child_is_open
  0.15     77.77     0.12     3500     0.03     0.03  S_can_contain
  0.12     77.86     0.09     3500     0.03    22.22  check_open_blocks

md4c calls md_line_indentation() 7000 times, i.e. twice per input line.

cmark calls S_find_first_nonspace() 12,260,499 times. That's roughly a bit above 3500² = 12,250,000, so I guess the accumulated cost of those calls is responsible for the slowdown.

Also, the function S_find_first_nonspace() itself could likely be optimized, e.g. by avoiding the increment through -> inside the loop and using a local temp variable for it instead. (But that would likely play a substantial role only if the above is solved and a single call works on longer buffer spans.)
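
Roughly this kind of change (made-up names, not cmark's actual code): scan with a local index and store it back once, instead of loading and storing the field through the pointer on every iteration:

#include <stdio.h>
#include <string.h>

typedef struct {
    int offset;
    int first_nonspace;
} state;

/* Instead of p->first_nonspace++ in the loop (a load and a store
   through the pointer on each iteration), keep the index in a local
   variable and write it back once at the end. */
static void scan_nonspace(state *p, const char *line, int len)
{
    int i = p->offset;
    while (i < len && line[i] == ' ')
        i++;
    p->first_nonspace = i;
}

int main(void)
{
    state p = { 0, 0 };
    const char *line = "    * foo";
    scan_nonspace(&p, line, (int)strlen(line));
    printf("%d\n", p.first_nonspace);  /* prints: 4 */
    return 0;
}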

@jgm (Member) commented Apr 14, 2018 via email

jgm added a commit that referenced this issue Apr 15, 2018
We were needlessly redoing things we'd already done.
Now we skip the work if the first nonspace is greater
than the current offset.

This fixes pathological slowdown with deeply nested
lists (#255).  For N = 3000, the time goes from over
17s to about 0.7s.

Thanks to @mity for diagnosing the problem.
@jgm (Member) commented Apr 15, 2018

Think I got it. New timings:

| N | Time (s) |
| --- | --- |
| 1000 | 0.08 |
| 2000 | 0.36 |
| 3000 | 0.80 |
| 4000 | 1.47 |
| 5000 | 2.47 |

Note that my change doesn't affect the number of calls to S_find_first_nonspace, but it does make the function much smarter. Previously it went back to the current offset and scanned for the first nonspace from there, even when it had already identified a first nonspace ahead of the offset. That was the cause of the slowdown.
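
The shape of the change, in a simplified sketch (names trimmed down, not the literal cmark code):

#include <stdio.h>

typedef struct {
    int offset;          /* current position on the line */
    int first_nonspace;  /* cached index of the first nonspace */
} parser_state;

static void find_first_nonspace(parser_state *p, const char *line, int len)
{
    /* The early-out: if a first nonspace at or beyond the current
       offset has already been found, the cached value is still valid
       and the rescan is skipped.  Without this check, every call
       rescanned the indentation from scratch. */
    if (p->first_nonspace >= p->offset)
        return;

    p->first_nonspace = p->offset;
    while (p->first_nonspace < len && line[p->first_nonspace] == ' ')
        p->first_nonspace++;
}

int main(void)
{
    const char *line = "        * foo";  /* an item nested four levels deep */
    parser_state p = { 0, -1 };

    /* One call per open list block, as during block matching: only the
       first call actually scans; the rest hit the early-out. */
    for (int depth = 0; depth < 4; depth++) {
        p.offset = 2 * depth;
        find_first_nonspace(&p, line, 13);
        printf("offset=%d first_nonspace=%d\n", p.offset, p.first_nonspace);
    }
    return 0;
}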

I tried using a local variable in the loop, but it didn't speed things up at all (in fact, it slowed it down a bit). Not sure why.

@mity (Contributor, Author) commented Apr 15, 2018

> I tried using a local variable in the loop, but it didn't speed things up at all (in fact, it slowed it down a bit). Not sure why.

Because a modern CPU is a work of black magic, and it is hard to predict whether some optimization helps or not, especially in more complicated cases. So it should always be tried and measured. (And also measured whether it is worth it: optimizing a function, even by a huge factor, makes no sense if it already takes just 0.0001% of your run time.)

And consider that your fix made things more complicated in this respect: note the function no longer has a single hot path. There are two semi-hot paths. For the K-th line of our input, it is called about K times; in the first of those calls it really executes the loop, with about K iterations, but in the other (K-1) calls it does not, and executes the other branch instead.

It is possible the two paths interact with each other so that optimizing one may harm the other: e.g. higher register pressure with more local variables (especially in a 32-bit x86 build), longer code no longer fitting into the CPU cache as nicely, whatever. Especially if you consider that the function is (in a release build) most likely inlined into its callers, because it is static, relatively compact, and has just two callers.

And if you build for another architecture, or even run the same binary blob on a different CPU model, you might see different results...

Anyway, this would be just a cherry on top of the cake. The important thing is that the O(n²) behavior has been fixed.

@mity (Contributor, Author) commented Apr 15, 2018

And maybe you want to add something like this:
mity/md4c@81e2a5c

@jgm (Member) commented Apr 15, 2018 via email

kivikakk referenced this issue in github/cmark-gfm Jun 20, 2018
* Optimize S_find_first_nonspace.

* pathological_tests.py: added test for deeply nested lists.

* pathological_tests.py: make tests run faster.

- commented out the (already ignored) "many references" test, which
  times out
- reduced the iterations for a couple other tests
jgm closed this as completed Mar 17, 2019
talum referenced this issue in github/cmark-gfm Sep 14, 2021
phillmv added a commit to eli-schwartz/cmark that referenced this issue Jan 31, 2023
man: Switch --safe option for --unsafe in man page