Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize hardening #1504

Merged
merged 9 commits into from
Jul 2, 2019
Merged

Sanitize hardening #1504

merged 9 commits into from
Jul 2, 2019

Conversation

koczkatamas
Copy link
Contributor

Marked version: master
Markdown flavor: all

Description

  • Fixes a security vulnerability discussed in email

Expectation

HTML sanitized properly if sanitize: true is used

Result

There is a known bypass of sanitization

What was attempted

Fixed the known bypass, but I expect other bypasses, so Marked now warns its users not to depend in sanitize: true, but use other means of sanitization (e.g. DOMPurify).

-->

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

  • Draft GitHub release notes have been updated.
  • CI is green (no forced merge required).
  • Merge PR

@UziTech
Copy link
Member

UziTech commented Jun 28, 2019

I think we should have a list of sanitizers in our docs and recommend using one of them instead of just recommending DOMPurify. something like this list

@koczkatamas
Copy link
Contributor Author

I think we should have a list of sanitizers in our docs and recommend using one of them instead of just recommending DOMPurify. something like this list

Works for me!

I personally recommended DOMPurify because its creators (Cure53) are one of (if not the) experts of the fields with a proven track record of finding critical bugs in even most well-known libraries / services.

Sanitize-html also looks good, but I am not sure about the other two.

Should we create a docs .MD file about these options and link to that file from the code and from other documentation?

@UziTech
Copy link
Member

UziTech commented Jun 28, 2019

Should we create a docs .MD file about these options and link to that file from the code and from other documentation?

I would just add the list under the options and link to that section in USING_ADVANCED.md

@UziTech UziTech added the L0 - security A security vulnerability within the Marked library is discovered label Jun 28, 2019
README.md Outdated Show resolved Hide resolved
@@ -39,7 +39,7 @@ Also read about:

## Usage

### Warning: 🚨 Marked does not [sanitize](https://marked.js.org/#/USING_ADVANCED.md#options) the output HTML by default 🚨
### Warning: 🚨 Marked does not [sanitize](https://marked.js.org/#/USING_ADVANCED.md#options) the output HTML. Please use a sanitize library, like [DOMPurify](https://github.com/cure53/DOMPurify) (recommended), [sanitize-html](https://github.com/apostrophecms/sanitize-html) or [insane](https://github.com/bevacqua/insane) on the output HTML! 🚨
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have this list in one place and just link to it. I feel like it will be hard to keep these lists in sync if we ever have to update them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestion where would you like to put that list exactly? It's your project afterall and I feel like this is the point where this change has nothing to do with the security fix.

Also there are already two README.mds (/README.md and /docs/README.md) with similar content, so you have to keep those in sync too...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koczkatamas: Thank you so much for this!

@styfle: Where would the best place be to store this list? Thinking it would be a nice list for users moving forward as we are planning to get out of the sanitizer business and working looking for things like this anyway. (I have a note to catch up on what I've missed with all the weirdness of life soon.)

@davisjam
Copy link
Contributor

davisjam commented Jun 29, 2019

OK, finally had a chance to review this.

If I understand correctly, I see 2-3 problems being discussed.

  1. Marked's default sanitizer is problematic insofar as it will not protect against all security-themed inputs.
  2. The sanitizer is not always applied properly.
  3. (my own addition) Marked does not clearly document when the sanitizer is applied and what the default behavior of the sanitizer is.

I agree that deprecating the sanitization feature seems like a good path forward.

  • Marked's job is converting markdown to HTML. The sanitization of this HTML (whether security-related or not) is an orthogonal problem and should be handled by the application.
  • I think having the references to HTML sanitizers in the docs is good advice.

@@ -434,7 +434,7 @@ Lexer.prototype.token = function(src, top) {
: 'html',
pre: !this.options.sanitizer
&& (cap[1] === 'pre' || cap[1] === 'script' || cap[1] === 'style'),
text: cap[0]
text: this.options.sanitize ? (this.options.sanitizer ? this.options.sanitizer(cap[0]) : escape(cap[0])) : cap[0]
Copy link
Contributor

@davisjam davisjam Jun 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a semantic change for anyone using this feature, right? Same comment on the similar line below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, for example if their code allowed XSS then now it won't. 😉

There are no test cases on how the sanitizer should work, so basically any previous change could change this feature also.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is, if anyone is currently invoking marked with the sanitize option, the output may change if we land this PR.

  1. Yes, they will be (less) XSS-able
  2. But since we didn't really document what "sanitize" did, they might have enjoyed its previous behavior (e.g. assuming that by sanitize we meant removing unclosed HTML tags or similar).

As a result I'm not sure how to semver this.

@joshbruce
Copy link
Member

joshbruce commented Jun 30, 2019

Looking for @davisjam to approve this one to the point of merge as I don't have the security knowledge to feel right in doing it. With that said, given the plan to get out of the sanitizer business altogether by 1.0 - I'm pretty okay here.

If the community can hit 100% CommonMark compliance and no known security issues, 1.0 should be released and we can begin the work of refactoring and re-engineering toward the single responsibility of parsing markdown and letting the inputs and outputs be handled by extension or somewhere else altogether.

EDIT: That might seem left-field. If this fixes the known security issue, great. If we are at 100% CommonMark compliance, great. We can release 1.0 and remove all the sanitizer logic along with all related corrections and workarounds, including this one.

@UziTech
Copy link
Member

UziTech commented Jun 30, 2019

Here is our current gfm and commonmark compliance chart:

--------------------------------------
|                 GFM                |
|                                    |
| Tables               8 of  8  100% |
| Task list items      2 of  2  100% |
| Strikethrough        2 of  2  100% |
| Autolinks           11 of 11  100% |
| Disallowed Raw HTML  0 of  1    0% |
--------------------------------------

------------------------------------------------------------
|                        CommonMark                        |
|                                                          |
| Tabs                                     10 of  11   91% |
| Precedence                                1 of   1  100% |
| Thematic breaks                          19 of  19  100% |
| ATX headings                             15 of  18   83% |
| Setext headings                          22 of  27   81% |
| Indented code blocks                     12 of  12  100% |
| Fenced code blocks                       28 of  29   97% |
| HTML blocks                              43 of  43  100% |
| Link reference definitions               25 of  28   89% |
| Paragraphs                                8 of   8  100% |
| Blank lines                               1 of   1  100% |
| Block quotes                             22 of  25   88% |
| List items                               35 of  48   73% |
| Lists                                    13 of  26   50% |
| Inlines                                   1 of   1  100% |
| Backslash escapes                        11 of  13   85% |
| Entity and numeric character references  13 of  17   76% |
| Code spans                               20 of  22   91% |
| Emphasis and strong emphasis             84 of 131   64% |
| Links                                    68 of  87   78% |
| Images                                   15 of  22   68% |
| Autolinks                                15 of  19   79% |
| Raw HTML                                 19 of  21   90% |
| Hard line breaks                         15 of  15  100% |
| Soft line breaks                          2 of   2  100% |
| Textual content                           3 of   3  100% |
------------------------------------------------------------

We have a long way to go on some of the sections

@davisjam davisjam self-requested a review July 2, 2019 03:53
Copy link
Contributor

@davisjam davisjam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK to land this, but as I've noted in my comments, we should be careful with the semver on the release.

Copy link
Member

@UziTech UziTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree this should be considered a breaking change

@UziTech UziTech merged commit 2a1bbed into markedjs:master Jul 2, 2019
@UziTech UziTech mentioned this pull request Jul 5, 2019
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L0 - security A security vulnerability within the Marked library is discovered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants