-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ReDoS with autolink #73
Conversation
Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.
This looks great; thanks for contributing it! I think the parsing change is fine. Yeah, I think anything superlinear is probably a problem. If you have examples I'd be happy to look into and fix them, or happy to accept your fixes! (I haven't done a lot of measuring times per document size, but I expect if we had everything linear most of those would be fine too.) |
Actually, I might have missed something: what's still quadratic about the fixed version here? The whole parse, or the individual autolink match? I think the whole parse in general will be quadratic as it may re-run any regex for each character in the worst case (like your example), but the matchers themselves should ideally all be linear. |
I'm referring to it being quadratic (formerly cubic) in the entire parse. All other superlinears I'm aware of mean quadratic documents as well, per my definition. I apologize for the confusion, I should've been clearer. I wrote this analysis of the quadratic documents before you submitted your second post, so I'll post it even though it's not necessarily interesting. <<<<<<<<<<<<<<<<<<<< - autolink tries to start at every one, fails because no :/ (and no >). Could be fixed by rejecting < in autolinks, but unless the other quadratic documents are fixed, fixing this doesn't really solve anything. There are most likely other ways to cause quadratic (or worse) document parsing; I didn't check all parsing rules. Some of them exceed my brain capacity. |
Thanks & sorry for the late reply! That was a thoughtful and helpful explanation! I think you're right in that right now the quadratic document cases would be pretty challenging to address without a major re-thinking of how this works. I believe most parsing algorithms are quadratic in the worst case, so while we could probably do better for markdown specifically, that might be generally okay.
I've generally tried to check these visually as well, but as in this example, I don't always see them. If you run into any more, I'd love to hear about them so we can fix them :). Thanks again for your comments and PR here and for making this project better for everyone! |
* Fix ariabuckles#68 escaping pipes in tables * v0.5.0 * Fix license copyright * v0.5.1: Fix .git folder in published archive * Tests: Add exponential backtracking test for inline code Addresses ariabuckles#71 Inline code has an exponential backtracking regex. This commit makes tests for those so that we can verify we fix them. Test plan: 1. Run `make test` * Verify the three exponential backtracking avoidance tests fail * inlineCode: Fix ReDoS & improve escape semantics Our inline code regex had overlapping parts in the case of spaces; spaces could be parsed as part of the `\s*`s or as part of the `[\S\s]*`, which leads to catastrophic backtracking in the case of a string with many spaces. The `\s*` parts were attempting to allow for escaping of backticks within the start/end of inline code blocks, so this commit removes the `\s*` parts of the regex, and then after parsing the code block, checks for the case where a single space is being used to escape a backtick, and removes that space. Test plan: 1. Added a test for the semantics change. 2. Added a test for the exponential backtracking to match the test case in issue ariabuckles#71 * Heading: Add test for exponential backtracking in headings * Heading: Fix exponential backtracking Test plan: 1. `make test` * Verify all tests now pass * Del: Add test for del/strikethrough exponential backtracking * Del: Fix del/strikethrough exponential backtracking * Fence: Add test for code fence exponential backtracking * Fence: fix fence exponential backtracks * v0.5.2: Fix exponential backtracking / ReDoS regexes * inlineCode: fix overzealous replacement $1 only refers to the parens in the first branch of the `` /^ ( *` *) $|^ ( *`)|(` *) $/g `` disjunction. If another branch matches, the replacement will be blank. * Add a globalPrevCapture that persists into nested parses This adds an additional `globalPrevCapture` that is "global" state for the parses and persist into nested invocations of `nestedParse`. This allows rules to determine if they are truly at the beginning of a line. The existing `prevCapture` resets at each nesting, so it only allows you to determine if you're at the beginning of the nested parse, but not necessarily the line. * Move `prevCapture` to the `state` and make it global for a given parse. * Remove stray comma * Bump mixin-deep from 1.3.1 to 1.3.2 Bumps [mixin-deep](https://github.com/jonschlinkert/mixin-deep) from 1.3.1 to 1.3.2. - [Release notes](https://github.com/jonschlinkert/mixin-deep/releases) - [Commits](jonschlinkert/mixin-deep@1.3.1...1.3.2) Signed-off-by: dependabot[bot] <[email protected]> * Make prevCapture comparisons `== null` for consistency Minor change; most of the other comparisons work this way, and this makes prevCapture harder to break * v0.6.0 Fix ariabuckles#72: backticks in code blocks bug Add ariabuckles#66: state.prevCapture that persists between parse calls ariabuckles#70: Bump devDependency versions for security * Minify for v0.6.0 * Fix ReDoS with autolink Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms. * Dev Dependencies: Run `npm audit fix` to fix vulnerabilities * Create index.d.ts typings file * Export everything ES6 import usage is: ```javascript import * as SimpleMarkdown from 'simple-markdown'; ``` so rather than export a default object, we should be exporting everything. * v0.6.1: ariabuckles#73 Fix ReDoS with autolink * Flow: Clean up some flow types to match typescript Just a small fixup of some flow types so that when adding the typescript types they're more of a straight port. Done before the typescript changes so this change is easier to verify on its own. Test plan: `make check` * Error handling: Clean up error handling This cleanup happens in two parts: 1. We remove some logic to disable certain error handling, which, as far as I can tell, was never actually used :/. Doing this makes both typescript and flow happier (and it's harder to work around in typescript than it was in flow). * In order to avoid introducing a potential perf regression for anyone who could have been using `state.disableErrorGuards`, we also remove the more expensive string comparison check, and instead only check for missing `^` if the result is a raw regex capture result (i.e., if it has a numeric index, we warn if that index is non-zero). 2. We add an error check if the Array joiner rule isn't defined for a non-html/react output type. * This will make it easier to convince typescript that the function we call isn't undefined, and is a better api/documentation as well. Test plan: As... noted in the comment, these parts aren't really tested in the tests, but I did run the tests to verify that the normal path API didn't seem to break. * React Keys: Fix sending null as a key See facebook/react#5519 * Typescript: Move index.d.ts to simple-markdown.d.ts * Typescript: Configure typescript checking for build process Install typescript and configure it to check simple-markdown.js & simple-markdown.d.ts during `make check` Test plan: `make check` * Typescript: Combine ts and flow types into up-to-date v0.6 ts types Updates type type definition file to be equivalent to the flow typings. Test plan: I didn't test this externally; will ask someone using typescript to check whether this is working for their usecase. The contents of the file are tested later with `make check` though. * Typescript: Add types to simple-markdown.js source Add typescript types to our source, so that we can use typescript in addition to flow to check our source, and verify that our types are correct. Test plan: tested in later commit that enables tsc in `make check` * Typescript: Add type checking of tests Adds types to simple-markdown-test.js and checks that file with typescript as well \o/ Test plan: `make test` * v0.7.0: Typescript types * Allow one level of balanced parens in link urls * Setup rollup * Build system: Integrate rollup with the rest of the system Adds some integration for rollup/the rollup changes with some other parts of the build system: * `make` targets * fixes typescript config * uses es6 module exports to switch to fully using rollup's umd * integrates rollup with npm prepublish so I can't forget to build * Remove unnecessary extra LIST_R.exec call As far as I can tell, this wasn't being used, and was creating an unused variable. Test plan: `make build test` * Flow: Upgrade flow to 0.111.1 and fix flow errors Test plan: `npm ci && make test` * DevDependencies: version bumps * Flow: Build files after flow changes * Tests: Remove test dependency on underscore Test plan: `make test` * flow-typed: update flow types Test plan: `make test` * v0.7.1 * Typescript dependencies: Remove @types/node dependency Now that we're bundling with rollup, we shouldn't need the @types/node dependency, which can make installation more challenging. Fixes ariabuckles#80 * DevDependencies: Update nyc/coverage for `npm audit` * v0.7.2 Co-authored-by: Aria Buckles <[email protected]> Co-authored-by: Alcaro <[email protected]> Co-authored-by: Danny Weinberg <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Danny Cochran <[email protected]> Co-authored-by: Sergey Slipchenko <[email protected]>
Patterns like <<<<<<<<<<:/:/:/:/:/:/:/:/:/:/ currently exhibit O(n^3) complexity, allowing a 5KB document to take 7174ms to parse. With this change, it drops to O(n^2) and 73ms.
Tested with
This PR will change parsing of weird things like
<foo:bar:/baz>
. If you'd prefer leaving behavior 100% unchanged, the regex can instead be changed to/^<(?=[^ >]+:\/)([^ >]+)>/
, with no meaningful performance difference.What is the project policy for what counts as a DoS? Quadratic complexity is fine, anything higher is a bug? More than 1ms per byte on a 10KB document is a bug? Anything superlinear is a bug? If the latter, I can name a few other inputs with quadratic runtime.