
Pathological input: Deeply nested lists #255

Closed
mity opened this issue Apr 12, 2018 · 11 comments

Comments

@mity (Contributor) commented Apr 12, 2018

Markdown input consisting of deeply nested lists exhibits heavily non-linear (likely quadratic) parsing time with current cmark HEAD.

This C program can be used to generate such Markdown inputs:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char** argv)
{
    int i, j;
    int N = 1000;

    if(argc > 1)
        N = atoi(argv[1]);

    /* Line i is a list item nested i levels deep:
       2*i spaces of indentation, then a bullet. */
    for(i = 0; i < N; i++) {
        for(j = 0; j < i; j++)
            printf("  ");
        printf("* foo\n");
    }

    return 0;
}
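
(To reproduce, something like `gcc -O2 nested.c -o nested && ./nested 2000 | cmark > /dev/null` can be run under `time`; the file name and flags here are just illustrative, and `cmark` reads stdin when no file is given.)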

For various N, I get these times on my machine:

  • N=1000: 0m0.728s
  • N=2000: 0m5.548s
  • N=3000: 0m18.540s
  • N=4000: 0m43.780s
  • N=10000: Way too much (I interrupted the execution after about 3.5 minutes.)

Ordered lists are broken in the same way as unordered ones.

(Interestingly and to my surprise, nested blockquotes are fine.)

@jgm (Member) commented Apr 12, 2018 via email

@mity (Contributor, Author) commented Apr 12, 2018

> How does md4c do?

Well, better than cmark. For N=10000 it takes 0.350s.
But then it slows down too: for N=20000 it takes 1.299s.

But after some more thinking, there is quadratic behavior naturally: Because here size of the input rises quadratically with N. So it is possible that the parabola in cmark is just steeper then in md4c and the issue technically invalid.

But even then, it should likely be addressed, even at the cost of limiting the maximal nesting depth or something similar. A limit of a few hundred levels should not really hurt any author.
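
A sketch of what such a cap could look like (the names and the exact limit here are mine, just to illustrate the idea, not cmark's code):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: refuse to open another nested container once a
   fixed depth is reached; a deeper-indented line would then be treated
   as content of the innermost open item. */
enum { MAX_NESTING = 500 };  /* "a few hundred" */

static bool can_open_child(int open_depth)
{
    return open_depth < MAX_NESTING;
}

int main(void)
{
    printf("%d %d\n", can_open_child(499), can_open_child(500));  /* prints: 1 0 */
    return 0;
}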

@jgm (Member) commented Apr 12, 2018

cmark:

| N | Time (s) | Length (MB) | Time / MB |
| --- | --- | --- | --- |
| 1000 | 0.6 | 1 | 0.6 |
| 2000 | 4.7 | 4 | 1.17 |
| 3000 | 15.6 | 9 | 1.73 |
| 4000 | 36.6 | 16 | 2.28 |
| 5000 | 71.0 | 25 | 2.84 |

commonmark-hs:

| N | Time (s) | Length (MB) | Time / MB |
| --- | --- | --- | --- |
| 1000 | 3.5 | 1 | 3.5 |
| 2000 | 14.2 | 4 | 3.5 |
| 3000 | 33.4 | 9 | 3.7 |
| 4000 | 61.6 | 16 | 3.8 |

Looks like commonmark-hs performance is pretty close to linear in the length of the input. cmark does worse, for reasons it would be good to understand.

We could always limit nesting depth, it's true.

@jgm (Member) commented Apr 12, 2018

On the other hand, it's hard to get worried about DoS attacks that require someone to input several megabytes of text. One could always simply limit the size of the text that will be accepted, prior to parsing. The things to worry about are short snippets that blow up exponentially.

@mity (Contributor, Author) commented Apr 12, 2018

Yes and no.

The bench/benchinput.md is ~107MB and cmark parses it in a few seconds.

The input in this report is still under 100MB for N=10,000, and it takes minutes (or tens of minutes? or more?) to process.

@mity (Contributor, Author) commented Apr 13, 2018

Played with gprof a little.

For N=3500, cmark spends 99+% of its time in S_find_first_nonspace(). This function roughly corresponds to md4c's md_line_indentation().

Part of the report:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 99.18     77.46    77.46 12260499     0.01     0.01  S_find_first_nonspace
  0.24     77.65     0.19 12253499     0.00     0.00  S_last_child_is_open
  0.15     77.77     0.12     3500     0.03     0.03  S_can_contain
  0.12     77.86     0.09     3500     0.03    22.22  check_open_blocks

md4c calls md_line_indentation() 7000 times, i.e. twice per input line.

cmark calls S_find_first_nonspace() 12,260,499 times. That's roughly a bit above 3500² = 12,250,000, so I guess the accumulated cost of those calls is responsible for the slowdown.

Also, the function S_find_first_nonspace() itself could likely be optimized, e.g. by avoiding the increment through -> inside the loop and using a local temp variable for it instead. (But that would likely play a substantial role only if the above is solved and a single call works on longer buffer spans.)
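
Roughly this kind of change (made-up names, not cmark's actual code): scan with a local index and store it back once, instead of loading and storing the field through the pointer on every iteration:

#include <stdio.h>
#include <string.h>

typedef struct {
    int offset;
    int first_nonspace;
} state;

/* Instead of p->first_nonspace++ in the loop (a load and a store
   through the pointer on each iteration), keep the index in a local
   variable and write it back once at the end. */
static void scan_nonspace(state *p, const char *line, int len)
{
    int i = p->offset;
    while (i < len && line[i] == ' ')
        i++;
    p->first_nonspace = i;
}

int main(void)
{
    state p = { 0, 0 };
    const char *line = "    * foo";
    scan_nonspace(&p, line, (int)strlen(line));
    printf("%d\n", p.first_nonspace);  /* prints: 4 */
    return 0;
}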

@jgm (Member) commented Apr 14, 2018 via email

jgm added a commit that referenced this issue Apr 15, 2018
We were needlessly redoing things we'd already done.
Now we skip the work if the first nonspace is greater
than the current offset.

This fixes pathological slowdown with deeply nested
lists (#255).  For N = 3000, the time goes from over
17s to about 0.7s.

Thanks to @mity for diagnosing the problem.
@jgm (Member) commented Apr 15, 2018

Think I got it. New timings:

| N | Time (s) |
| --- | --- |
| 1000 | 0.08 |
| 2000 | 0.36 |
| 3000 | 0.80 |
| 4000 | 1.47 |
| 5000 | 2.47 |

Note that my change doesn't affect the number of calls to S_find_first_nonspace, but it does make the function much smarter. Previously it went back to the current offset and scanned for the first nonspace from there, even when it had already identified a first nonspace ahead of the offset. That was the cause of the slowdown.
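
The shape of the change, in a simplified sketch (names trimmed down, not the literal cmark code):

#include <stdio.h>

typedef struct {
    int offset;          /* current position on the line */
    int first_nonspace;  /* cached index of the first nonspace */
} parser_state;

static void find_first_nonspace(parser_state *p, const char *line, int len)
{
    /* The early-out: if a first nonspace at or beyond the current
       offset has already been found, the cached value is still valid
       and the rescan is skipped.  Without this check, every call
       rescanned the indentation from scratch. */
    if (p->first_nonspace >= p->offset)
        return;

    p->first_nonspace = p->offset;
    while (p->first_nonspace < len && line[p->first_nonspace] == ' ')
        p->first_nonspace++;
}

int main(void)
{
    const char *line = "        * foo";  /* an item nested four levels deep */
    parser_state p = { 0, -1 };

    /* One call per open list block, as during block matching: only the
       first call actually scans; the rest hit the early-out. */
    for (int depth = 0; depth < 4; depth++) {
        p.offset = 2 * depth;
        find_first_nonspace(&p, line, 13);
        printf("offset=%d first_nonspace=%d\n", p.offset, p.first_nonspace);
    }
    return 0;
}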

I tried using a local variable in the loop, but it didn't speed things up at all (in fact, it slowed it down a bit). Not sure why.

@mity (Contributor, Author) commented Apr 15, 2018

> I tried using a local variable in the loop, but it didn't speed things up at all (in fact, it slowed it down a bit). Not sure why.

Because a modern CPU is a work of black magic, and it is hard to predict whether some optimization helps or not, especially in more complicated cases. So it should always be tried and measured. (And also measured whether it is worth it: optimizing a function, even by a huge factor, makes no sense if it already takes just 0.0001% of your run time.)

And consider that your fix made things more complicated in this respect: note the function no longer has a single hot path. There are two semi-hot paths. For the K-th line of our input, it is called about K times; in the first of those calls it really executes the loop, with about K iterations, but in the other (K-1) calls it does not, and executes the other branch instead.

It is possible the two paths interact with each other so that optimizing one may harm the other: e.g. higher register pressure with more local variables (especially in a 32-bit x86 build), longer code no longer fitting into the CPU cache as nicely, whatever. Especially if you consider that the function is (in a release build) most likely inlined into its callers, because it is static, relatively compact, and has just two callers.

And if you build for another architecture, or even run the same binary blob on a different CPU model, you might see different results...

Anyway, this would be just a cherry on top of the cake. The important thing is that the O(n²) behavior has been fixed.

@mity (Contributor, Author) commented Apr 15, 2018

And maybe you want to add something like this:
mity/md4c@81e2a5c

@jgm (Member) commented Apr 15, 2018 via email

kivikakk referenced this issue in github/cmark-gfm Jun 20, 2018
* Optimize S_find_first_nonspace.

* pathological_tests.py: added test for deeply nested lists.

* pathological_tests.py: make tests run faster.

- commented out the (already ignored) "many references" test, which
  times out
- reduced the iterations for a couple other tests
jgm closed this as completed Mar 17, 2019
talum referenced this issue in github/cmark-gfm Sep 14, 2021
phillmv added a commit to eli-schwartz/cmark that referenced this issue Jan 31, 2023
man: Switch --safe option for --unsafe in man page