[LS-81] fix(markdown-utils): change sanitization process + add unescape #718

seaerchin · 2023-04-19T10:28:20Z

Problem

Previously, sanitization on the backend behaved inconsistently. This is caused by DOMPurify's sanitization config: when it detects an HTML tag in the input, it automatically HTML-encodes certain special characters.

This process affects not just content within the tag, but the string as a whole. For example,

const x = "& <b>&something</b>"

will have both ampersands encoded even though the first one is outside of the b-tag.

Closes LS-81

Solution

In order to make sure that sanitization takes place properly, this PR establishes an invariant, as follows:
frontmatter content is never html encoded.

This was chosen over HTML-encoding all content in our frontmatter because of the 3 special pages (homepage/nav/contact-us), where users are able to input HTML (:sadge:)

In order to preserve this property, a few rules have to be followed (these already hold at present):

devs never call sanitize directly for markdown files, but go through a given interface (<convert|retrieve>DataFromMarkdown).
the respective conversion/retrieval functions sanitize frontmatter and body separately.
we run unescape after sanitizing frontmatter.
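
A minimal sketch of the rules above, under stated assumptions. The names `unescapeHtml`, `sanitizeStub` and `sanitizeMarkdownParts` are hypothetical and are not taken from this PR. `unescapeHtml` stands in for lodash's `_.unescape` (which decodes `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&#39;`), and `sanitizeStub` only mimics the ampersand-encoding side effect described in the Problem section; the real code calls DOMPurify.

```javascript
// Stand-in for lodash's _.unescape: decodes the same five entities in one pass.
const ENTITIES = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'" };
const unescapeHtml = (value) =>
  value.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => ENTITIES[entity]);

// Hypothetical sanitizer stub: encodes bare ampersands and nothing else.
// It exists only to reproduce the side effect; it does NOT sanitize anything.
const sanitizeStub = (value) =>
  value.replace(/&(?!(?:amp|lt|gt|quot|#39);)/g, "&amp;");

// Sanitize frontmatter and body separately, then unescape frontmatter strings
// so that frontmatter content is never left HTML-encoded.
const sanitizeMarkdownParts = (frontMatter, body, sanitize = sanitizeStub) => {
  const cleanFrontMatter = Object.fromEntries(
    Object.entries(frontMatter).map(([key, value]) => [
      key,
      typeof value === "string" ? unescapeHtml(sanitize(value)) : value,
    ])
  );
  return { frontMatter: cleanFrontMatter, body: sanitize(body) };
};

const result = sanitizeMarkdownParts(
  { title: "Terms & Conditions", order: 3 },
  "Fish & chips"
);
console.log(result.frontMatter.title); // Terms & Conditions
console.log(result.body); // Fish &amp; chips
```

Note that a frontmatter value that was already stored encoded (for example `Q&amp;A`) also comes back decoded (`Q&A`), which is the potentially destructive case called out in the Notes.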

This allows us

Testing

navigate to the CMS and choose a testing site
give a page a special character in its frontmatter (this can be done by editing the page directly on GitHub if lazy)
rename a page so that its title contains a special character
check that clicking settings on the page does not show an HTML-encoded title

Notes
This change might be destructive - existing pages with HTML-encoded properties in their frontmatter will have them unescaped. I'm not entirely sure how this impacts things like permalink etc., but it's also possible to apply this guarantee only to the title property

seaerchin · 2023-04-19T10:31:53Z

@alexanderleegs tagging you separately in case there is anything that isn't back-compat

kishore03109

This change might be destructive - existing pages with HTML-encoded properties in their frontmatter will have them unescaped. I'm not entirely sure how this impacts things like permalink etc., but it's also possible to apply this guarantee only to the title property

hmm, if we are not sure of the possible implications of this, is there a reason why we don't make this change only for the title property, to reduce the surface area for bugs?

kishore03109 · 2023-04-20T02:06:19Z

src/utils/markdown-utils.js

+    // so this does not do anything destructive.
+    // Do note that frontmatter containing pre-existing html encoded characters (&amp;)
+    // will get transformed regardless.
+    (val) => _.unescape(val)


Nit: This solution unescapes all HTML-encoded characters, wdyt about only unescaping &amp;, since that is the most common case that we are encountering at the moment?

the common case isn't the only case - this means that the same bug can appear, just with a different character that's encoded. in the event that it happens, we'd have to expend eng resources to dig through + fix so i'd rather just unescape all

Do you think there are going to be additional security concerns that we might have with allowing all html for all frontmatter?
Considering we have CSP headers + this is already the case for some pages, it seems ok, but just checking in

this PR doesn't allow/disallow html, it just unescapes encoded html present in frontmatter. the sanitization invariant is still preserved (front matter still sanitised)
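
To make the trade-off in this thread concrete, here is a small sketch comparing the two options. Both helper names are hypothetical; `unescapeAll` is a stand-in that decodes the same five entities lodash's `_.unescape` handles, and `unescapeAmpersandOnly` is the narrower alternative suggested in the nit.

```javascript
// Stand-in for lodash's _.unescape (&amp; &lt; &gt; &quot; &#39;).
const ENTITY_MAP = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'" };
const unescapeAll = (value) =>
  value.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => ENTITY_MAP[entity]);

// Narrower alternative: decode ampersands only.
const unescapeAmpersandOnly = (value) => value.replace(/&amp;/g, "&");

const encodedTitle = "Q&amp;A: &quot;Fees &lt; $10&quot;";
console.log(unescapeAll(encodedTitle)); // Q&A: "Fees < $10"
console.log(unescapeAmpersandOnly(encodedTitle)); // Q&A: &quot;Fees &lt; $10&quot;
```

With the ampersand-only variant, a title containing quotes or angle brackets would still show up encoded, which is the "same bug, different character" case mentioned above.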

alexanderleegs

creating new pages creates this commented out thing at the moment!

alexanderleegs · 2023-04-20T03:33:06Z

@alexanderleegs tagging you separately in case there is anything that isn't back-compat

I think this is fine, we previously didn't do any html encoding for the special pages either as far as i know?

seaerchin · 2023-04-20T03:46:02Z

creating new pages creates this commented out thing at the moment!

does this actually impact editing experience? this is injected due to sanitize encountering an empty body and injecting it into a document. if it's concerning, we could always avoid sanitization if it's an empty string

kishore03109

nit:

does this actually impact editing experience? this is injected due to sanitize encountering an empty body and injecting it into a document. if it's concerning, we could always avoid sanitization if it's an empty string

this does seem confusing at first glance... could we add a test case to show this behaviour is expected?

alexanderleegs · 2023-04-20T05:02:46Z

creating new pages creates this commented out thing at the moment!

does this actually impact editing experience? this is injected due to sanitize encountering an empty body and injecting it into a document. if it's concerning, we could always avoid sanitization if it's an empty string

Could we put in the check for empty string then? As a user creating a new page, having something you didn't input immediately show up in your editing view probably isn't ideal
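
A possible shape for that check, as a sketch only. `sanitizeUnlessEmpty` and `sanitizeStub` are hypothetical names, and the stub's placeholder markup is purely illustrative (it is not DOMPurify's real output); it just reproduces the symptom of empty input coming back non-empty.

```javascript
// Hypothetical stub: returns placeholder markup for empty input to mimic the
// reported side effect, and passes any other content through unchanged.
const sanitizeStub = (content) =>
  content === "" ? "<!-- placeholder -->" : content;

// Skip sanitization entirely when there is nothing to sanitize.
const sanitizeUnlessEmpty = (content, sanitize) =>
  content === "" ? "" : sanitize(content);

console.log(JSON.stringify(sanitizeUnlessEmpty("", sanitizeStub))); // ""
console.log(sanitizeUnlessEmpty("Hello", sanitizeStub)); // Hello
```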

seaerchin added 5 commits April 19, 2023 15:32

chore(markdown): add spec for bug

ac1ded7

chore(markdown): add more test

62dbe11

fix(markdown-utils): change sanitization order

0dabe56

test(markdown): add more test cases

591ac78

fix(markdown-utils): change so that frontmatter is ALWAYS unescaped

cefaf7a

seaerchin requested a review from a team April 19, 2023 10:28

kishore03109 reviewed Apr 20, 2023

alexanderleegs reviewed Apr 20, 2023

seaerchin requested review from a team, kishore03109 and alexanderleegs April 20, 2023 03:46

kishore03109 approved these changes Apr 20, 2023

test(markdown): add new spec for empty strings

60bac00

chore(markdown-utils): skip sanitisation if content is empty

f0625d2

alexanderleegs approved these changes Apr 20, 2023

seaerchin merged commit 0f10b20 into develop Apr 20, 2023

seaerchin deleted the IS-81/fix/third-nav-title branch April 20, 2023 06:36

alexanderleegs mentioned this pull request Apr 20, 2023

Release/0.23.0 #721

Merged
