Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Your library encourages unescaped HTML by supporting it in many cases. #249

Closed
joshgoebel opened this issue Nov 21, 2020 · 46 comments
Closed

Comments

@joshgoebel
Copy link

joshgoebel commented Nov 21, 2020

Reference:
highlightjs/highlight.js#2884

Some users get entirely the wrong idea when you support this behavior (highlighting unescaped code). This allows all sorts of potential JS injection attacks for users who don't understand why proper escaping is necessary.

Are you aware of the security implications of encouraging this?

@Neohiro79
Copy link

While I really appreciate the way of trying to fix security vulnerabilities which might potentially coming up, at least this solution is a solution which works in an understandable way (for me).

If I am hosting this on my very own, nothing should happen as far as I am aware of.

Of course, if you fire up the console and put some additional code in it or use some weird ads on your website, you always "have potential security risks". But this is true for every website out there and has nothing to do with this "masking" of "<" ">" elements. This is not a form field which will get pushed via PHP to a server:

Info about Client Side Injection Attacks

I personally don't get the drama here right now - but I might have not the whole picture...

At least Craig's solution is by far the most reliable one when it comes to "highlighting" html code the way I expect it to be highlighted from a highlighter - and additionally he provided the best feedback so far from all "highlighers" out there.

@Neohiro79
Copy link

Neohiro79 commented Nov 21, 2020

So yes, Javascript works (in every modern browser nowadays, as far as I know):

https://jsfiddle.net/neohiro79/zbfdhxjy/11/

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

Yes, but this is potentially very bad:

<script>
alert("You've been hacked, OR WORSE");
// mine some bitcoin, steal some user data, etc.
</script>

And this does nothing (because it's just text, NOT Javascript).

&lt;script&gt;
alert(&quot;You've been hacked, OR WORSE&quot;);
&lt;/script&gt;

I didn't really open this issue for you more for the library author.

I personally don't get the drama here right now - but I might have not the whole picture...

You don't. The key: Not all users are you. This is very dangerous behavior for a highlighting library to be encouraging. There are very real risks when someone DOES use this library with PHP and a server and then highlight some code (just like you did)... and then their website is defaced (or much much worse).

I expect it to be highlighted from a highlighter

HTML is complicated... sometimes just because something is easy doesn't mean there aren't serious risks. :)

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

@Neohiro79 Just to be clear there are also technical issues with trying to highlighting HTML like that:

<pre><code class="language-html">
Am I &amp; HTML or what?
&quot; Is there a problem?
</code>
</pre>

For example you can't highlight this because &amp; and &quot; will have been converted into & and " by the browser... so it will fail to highlight ALL HTML properly - because it's impossible (without escaping).

Or:

<pre><code class="language-html">
<BOLD>really</BOLD>
</code>
</pre>

This will come out as <bold>really</bold> with all the capitalization being lost... because it's not text, it's HTML and HTML doesn't care about case (so most browsers use lowercase tags).

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

@Neohiro79 Don't feel bad though, this is a very common confusion. But it's the way it is for good reason. :) And lots of security nightmares have occurred on the web due to people not understanding this who perhaps should have. :)

You might say:

"I live in the boonies I never lock my car, or my house and I leave my huge stash of gold on my front porch - where it's perfectly safe - works for me."

And you might be speaking the truth. :-) But that's also terrible advice for a LOT of people in say urban cities. :-)

@Neohiro79
Copy link

Neohiro79 commented Nov 21, 2020

Josh, I am really thankful that you at least gave me a proper answer to my question and also provided me with a workaround.

But I don't get the point of your security concerns. Javascript is as far as I am aware of activated by default in all modern browsers nowadays. So basically EVERY script, even on CDN's might be "compromised" - hence "loaded" into the browser without ANY option to "stop the browser from doing so" - except "denying all javascripts" - which noone really does nowadays.

If I host all my scripts and html files on my server having only potentially some "input-fields" which might get submitted through php itself, I only run into troubles (as far as I am aware of) when you do not use htmlentities to proper ensure that no code which might beeing submitted is potentially beeing executed on the server itself. That would harm your files on the server, yes.

That said, since every client and browser on this planet actually executes javascript without any security measures other than maybe a sandboxed environment, I don't get your point of the implied security concerns in case someone "highlights" the code the browser anyway doesn't execute since it is in a <code> block to show the code in a proper way.


@edit: Which was proofed wrong thanx to joshs ongoing resitance, as can be seen here:

BUT IT DOES see: https://jsfiddle.net/neohiro79/zbfdhxjy/14/


I might be missing something here - if so, please give me a hint since I want to know what I might need to look out for when using rainbow.js.

If you're talking about a node.js environment which is executed on the server - well, yes, that might be a complete other topic, but who is using node.js when beeing concerned about security issues, right? ;-)

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

I'm talking about the HTML environment on the client. I think I covered it with my analogy earlier.

Just because your gold is safe on your front porch doesn't mean putting a huge stack of gold on front porches is a good idea for most people - and perhaps even you will get your gold stolen one day. :-) Same reason you don't run with knives... no one means to trip and impale themselves and die, yet every year it happens to someone. :-)

What if I want to highlight this code:

<script>
alert("This is ONLY to be highlighted, it should NOT actually show an alert in my browser.");
</script>

You can't UNLESS you properly escape the HTML. That is the correct way to do it.

@joshgoebel
Copy link
Author

Thankfully there are no examples in the README showing this risky behavior. Hopefully the author will come around eventually and dissuade you. :-)

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

Here the author is saying the same ting I'm telling you: #244 :-)

And more issues:

It's like half the issues are about people doing this wrong and the pain as a result.

@Neohiro79
Copy link

Neohiro79 commented Nov 21, 2020

What if I want to show someone this snippet:

I just use the "<code>" element, very simple:

<code>

<script> alert("This is ONLY to be highlighted, it should NOT actually show an alert in my browser."); </script>

</code>

and hightlight it properly with a "hightligher-script".


@edit:

BUT IT DOES see: https://jsfiddle.net/neohiro79/zbfdhxjy/14/


And since I've edited it in here in git, I needed to use those `backticks ` for the <code> element to proper show up, since I used a form-field to "send" it to the server to "show" it to you.

Then, just a personal guess, github is using the `backticks` to properly address those chars with htmlentities / php on the backend once the "comment" button has been pressed to properly "recode" those < > special chars into something which cannot be used in a bad manner.

That's how it should be done. And the "highlighter script" - in my humble opinion - has nothing to do with this.
Or should have. At least of what I am aware of.

@joshgoebel
Copy link
Author

That is not safe either:

<code>
<script> alert("This is ONLY to be highlighted, it should NOT actually show an alert in my browser."); </script>
</code>

@joshgoebel
Copy link
Author

github is using the backticks to properly address those chars with htmlentities / php on the backend once the "comment" button has been pressed to properly "recode" those < > special chars into something which cannot be used in a bad manner.

Yes because if they didn't they would have major security problems.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

I am still waiting for the example how rainbow.js is going to be able to "destroy" my website ...

The gold on your porch might be safe... for now... but good luck. :) Because a lot of peoples gold is getting stolen off porches... I'll keep my gold elsewhere.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 21, 2020

FYI: Unrelated You use three backticks for multi-line code blocks on GitHub.

```
This is a code block.
```

@Neohiro79
Copy link

I think I found out what you're talking about:

https://jsfiddle.net/neohiro79/zbfdhxjy/14/

This is the concern you have, right?

The browser executes Javascript-Code even when it is written in a code-element, that's the point?

@Neohiro79
Copy link

Yes, that's something I wasn't aware of yet, I really thought javascript code inside a codeblock would NOT be executed by default. What a fuckup.

But even then, if I "use" javascript-code to be highlighted, I only need to address this javascript "<script>" tag - no other tags, since Javascript won't execute outside of those script-tags.

Even more, why is it then, that close to all highlighters I tested, only HTML comments made "troubles" - not "CSS" and not "Javascript"?

I will look into this topic further, I hope we can keep up this conversation once I figured it out.

@joshgoebel
Copy link
Author

I only need to address this javascript "<script>" tag - no other tags

I've given 2-3 examples of where this breaks in very weird ways. See many of the open issues here for tons examples. The problem in there is NO way to get the ACTUAL "text" in-between tags (when there is HTML or comments in the middle), because the browser throws that text it away... and store a "representation" of it. Such as my example with the browser turning all your <BOLD> into <bold>... or sometimes adding and removing random pieces of your HTML that it doesn't like.

<code>
[stuff that includes HTML]
</code>

There is no way to get whatever [stuff that includes HTML] is verbatim. And that's a problem if you're actually trying to highlight exactly what is in-between the blocks. Which is kind of important if say you're wanting to highlight specific HTML to ask a question or show someone a problem.

, why is it then, that close to all highlighters I tested, only HTML comments made "troubles" - not "CSS" and not "Javascript"?

I do not know what you are asking. And I'd be carefully to make this a numbers game. It's possible for a lot of people to do something wrong and it still be a bad idea. :-)

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

What you want is something like:

<verbatim>
I can put anything here and the browser will give me EXACTLY what it is, HTML, CSS, JS, etc... but not "execute it"...
</verbatim>

But the problem is what if we wanted to show how to use it?

<verbatim>
This is how to use verbatim:
<verbatim>
I can put anything here and the browser will give me EXACTLY what it is, HTML, CSS, JS, etc... but not "execute it"...
</verbatim>
</verbatim>

What the heck is going on now? How do we know which </verbatim> ends the block? This is why you need escaping.

XML actually has this concept with "CDATA":

<![CDATA[
   anything can go here, HTML, JS, whatever, XML, and it's text not XML
  <html><this>is<text>not</xml>or</html>
]]>

And the rule is "anything goes" UNTIL ]]> -whenever it finds that, the magic CDATA block is over. So if your text needs to include that you'd out of luck - and you need to invent your own escaping.

@Neohiro79
Copy link

Neohiro79 commented Nov 22, 2020

, why is it then, that close to all highlighters I tested, only HTML comments made "troubles" - not "CSS" and not "Javascript"?

Original problem I found astounding was this:

http://jsfiddle.net/neohiro79/zbfdhxjy/24/

I thought basically, that most highlighters I've tested at that point with that code, that it just "eats" the html comments for whatever reason. What I haven't been aware of or seen at that time, were the missing html tags itself p tag, style tag and the script tag.

Once you made your statement about the missing encapsulation or encoding and got around my doubts, I went to html-entities, changed the code - and all of the sudden the magic happened, I saw everything as I would've expected.

See here:

http://jsfiddle.net/neohiro79/zbfdhxjy/30/

What might seem basically logic to you once knowing about this very fact of encoding practise, it wasn't for me.
Interestingly, if this very topic is basically so concerning and important, I ask myself, why no other highlighter (Rainbow.js, Mircolight.js, Highlight.js, Prism.js & Enlighter.js) is really making this an "important to know topic" on their website or helpfile.

I quickly implemented all of them in a jsfiddle as written in their helpfile just to see how they worked (it was originally about the topic of having more than one line comments which have been highlighted false) and found that close to none of them were showing up the html comments properly as expected - but all others (CSS, JS comments and content) showed up.


What the heck is going on now? How do we know which ends the block? This is why you need escaping.

I would say since we're all using proper html5 based on proper XHTML 1.0 which was basically used before my guess would be that's clear since the encapsulation is always inside another.


There is no way to get whatever [stuff that includes HTML] is verbatim. And that's a problem if you're actually trying to highlight exactly what is in-between the blocks.

So basically if you grab an ID or class in the DOM you will get to the "contents" of any given tag innerHTML or innerText, see here w3schools - innerText

I wasn't aware of the fact that the browser is mixing things up internally to much to avoid javascript execution within code blocks, but for me it seems like a huge fuckup, to allow code execution within code tags itself, since they are basically there to "show code" as I understand them, not to tell the browser, "hey execute that, it's code". How f* dumb is that.

It's like you're going to jump into a car, close the door, and during your drive the door will just suddenly open up because it wasn't locked in the first place. What happend? Well, while you might think the door was closed when you closed the door, fun-fact it wasn't, since the engeneers found that too primitive, hence they implemented a second nice looking but good hidden lock-button, which will lock the door once pressed.


but still ...

basically my main question, why the need to mask ALL < and > chars, when one could only mask the ones of containing the questionable chars of "script" (or in other words the script tag itself) - in case it is put in a code tag remains unsolved to me, because that seems to be the main problem, having some weired "<script>" tags inside the browser on places you don't wanna have them, cause they will execute.

I mean right now the practise seems more like a "just get rid of em all" approach (eventhough it is not neccessary IMHO).

@ccampbell
Copy link
Owner

ccampbell commented Nov 22, 2020

Just wanted to point out that rainbow does escape tags for you when it can.

rainbow/src/util.js

Lines 66 to 74 in d606601

/**
* Encodes < and > as html entities
*
* @param {string} code
* @return {string}
*/
export function htmlEntities(code) {
return code.replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/&(?![\w\#]+;)/g, '&amp;');
}

The issue is that when you put something like this the browser executes the tags as they were written on the page before the javascript from rainbow runs to rewrite them:

<pre><code>
<script>var whatever = true;</script>
</code></pre>

So it is specifically when you are highlighting html code in this manner that you need to use html entities &lt; and &gt; since there is nothing rainbow can do otherwise.

<pre><code>
&lt;script&gt;var whatever = true;&lt;/script&gt;
</code></pre>

Also, as an aside, you should set the language to html when you are highlighting JavaScript code wrapped in a <script> tag. If you are highlighting a JavaScript code block not wrapped in a <script> tag then leaving things unescaped should usually be fine. For example:

<pre><code class="lang-js">if (foo > 25) {
    console.log('good!');
}</code></pre>

Rainbow works from node.js too so if you are doing the rendering on the server side, I believe it will already output proper html entities.

@joshgoebel
Copy link
Author

that it just "eats" the html comments for whatever reason. What I haven't been aware of or seen at that time, were the missing html tags itself p tag, style tag and the script tag.

Well as you are learning this all depends on various behavior of innerText, innerHTML, textContent... HTML isn't content, so I personally think the "correct" behavior is to silently drop it all (which leads to a teaching moment when people go "wait where did my code go?????") - which is exactly what HLJS does (for the most part). Obviously Rainbow has made a different choice - though one I don't understand at all since it seems half of the support issues could be avoided with a small change to the library to help educate people if they are doing this wrong.

What might seem basically logic to you once knowing about this very fact of encoding practise, it wasn't for me.

Oh no that was clear. :-) I'm just a bit more used to "appeal to authority" working. :-) I maintain Highlight.js. I run a website where people can paste code (http://pastie.org/). Though I guess I didn't exactly present credentials. I've been doing this quite a long time. I'm happy to explain things, but when people seem not to want to listen it can get frustrating fast. I'm glad to see you're coming around. :-)

Interestingly, if this very topic is basically so concerning and important, I ask myself, why no other highlighter (Rainbow.js, Mircolight.js, Highlight.js, Prism.js & Enlighter.js) is really making this an "important to know topic" on their website or helpfile.

That's a good question. I'd say it's a bit of an "beginner" question (and our README assumes more than that level of knowledge), but I might even be wrong on that... but even if so there are lots of beginners I suppose... so it's probably worth some documentation. Highlight.js has a small problem though in that our handling of HTML is a bit more "complex" than I've led you to believe. We actually preserve HTML and pass it thru, which allows some neat tricks:

var x;
<span class="important">var y;</span>

We will highlighting this like as such (on the browser):

<span class="keyword">var</span> x;
<span class="important"><span class="keyword">var</span> y;</span>

And this is one advantage of MIXING code and html... Someone might define important as { background: gold} and now they have an easy way to highlight that line of code on their blog. The code inside MOST HTML tags can be more tags or text... and we allow for both. We highlight the text, we preserve the HTML.

Yes, it's complex and long-term I want to kill this (or move it to a plugin) since this is a feature not everyone needs. So in the future out of the box it's far more likely we'd silently drop the HTML and just render:

<span class="keyword">var</span> x;
<span class="keyword">var</span> y;

close to none of them were showing up the html comments properly as expected - but all others (CSS, JS comments and content) showed up.

I assume they are all using variants of innerHTML vs innerText vs textContent... all have slightly different behavior, but really hard to say without looking at the code of every engine.

I would say since we're all using proper html5 based on proper XHTML 1.0 which was basically used before my guess would be that's clear since the encapsulation is always inside another.

HTML5 is typically not valid XHTML nor is it required to be. You are not required to close or pair tags. The real question was how do you how do you know which verbatim is which. What if you add more:

<verbatim>
<verbatim><verbatim><verbatim><verbatim>
</verbatim></verbatim></verbatim></verbatim>
</verbatim>

Is that a bunch of nested verbatims or a single verbatim with a bunch of verbatim code examples? You could invent some complex rules to decide, but it's not clear at all. Hence, escaping (or something like CDATA). XML (with no ambiguity):

<verbatim>
<![CDATA[
<verbatim><verbatim><verbatim><verbatim>
</verbatim></verbatim></verbatim></verbatim>
]]>
</verbatim>

So basically if you grab an ID or class in the DOM you will get to the "contents" of any given tag innerHTML or innerText

innerHTML is never guaranteed to be the same as your actual HTML, only "very similar". The other attributes (innerText, textContent, etc) have different behavior - you'd have to read the docs. I don't keep all the behaviors on the top of my head.

I wasn't aware of the fact that the browser is mixing things up internally to much to avoid javascript execution within code blocks, but for me it seems like a huge fuckup, to allow code execution within code tags itself, since they are basically there to "show code" as I understand them, not to tell the browser, "hey execute that, it's code". How f* dumb is that.

I've been doing this too long to comment other than to say your expectation is just wrong. <code> tags aren't "magic" they can have HTML inside them just like ANY other tag. And again there are reasons for this, what if I want to highlight some code?

<code>
formatWholeComputer() <span style="color:red"> // never do this</span>
</code>

How do you think highlighters work? We put HTML inside code tags... HTML that colors the text different colors, just like the last example. We don't change the code tag, it's still a code tag - with HTML inside it.

basically my main question, why the need to mask ALL < and > chars, when one could only mask the ones of containing the questionable chars of "script" (or in other words the script tag itself) - in case it is put in a code tag remains unsolved to me, because that seems to be the main problem, having some weired "<script>" tags inside the browser on places you don't wanna have them, cause they will execute.

I already gave an example <BOLD> turning into <bold>... you're safer to just replace them all. That's my opinion. You definitely have to replace all the <... (to prevent starting a REAL HTML tag)... you can get into arguments about >... PrismJS/prism#2516 I think replacing them all is more consistent (and perhaps safer) so that's what Highlight.js does. Prism.js seems to only replace <, &, and non-breakable spaces (for some reason).

Highlight.js also encodes ", ' (in our output) as well just to be on the safe side.

If you are doing this escaping by hand, you should stop and use a blogging platform or something. This is a job the computer should be doing for you. :-)

@ccampbell
Copy link
Owner

I am also going to close this issue as I don’t encourage unescaped HTML code. It won’t work most of the time, and of course, can introduce xss vulnerabilities.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

Also even in XHTML land the point I was making is if "verbatim" magically means 'this is not code"... then the first "" isn't an opening tag.. it's just text, it could be booger... I'm asking "how do you put text on the page that looks like HTML"

So if I really want the TEXT/output to be:

boogers-boogers-boogers-boogers
</verbatim></verbatim></verbatim></verbatim>

You'd start with:

<verbatim>
boogers-boogers-boogers-boogers
</verbatim></verbatim></verbatim></verbatim>
</verbatim>

But this has a BIG problem because the browser is probably only going to see:

<verbatim>
boogers-boogers-boogers-boogers
</verbatim>

And who knows what it will do with the rest of my code probably hide it and not display it. So you escape it:

<verbatim>
boogers-boogers-boogers-boogers
&lt;/verbatim&gt;&lt;/verbatim&gt;&lt;/verbatim&gt;&lt;/verbatim&gt;
</verbatim>

It's ugly, but now the browser isn't confusde.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

It won’t work most of the time, and of course, can introduce xss vulnerabilities.

It actually works a lot of the time (other than all the weird side issues) - which led to this whole thread... :-) is there a reason you don't just use textContent or innerText to avoid this issue completely? I feel like you'd have a LOT less support issues.

@ccampbell FYI: I wasn't accusing but early on I got the impression that this person had spoken to the author of the library and the author had encouraged this or marketed it as a feature. :-) I'm glad I misunderstood.

@ccampbell
Copy link
Owner

is there a reason you don't just use textContent or innerText to avoid this issue completely?

I guess it depends on what you are talking about. Rainbow doesn’t do any dynamic markup injection or anything like that. It takes what is already rendered on the page. If someone wants to dynamically insert markup to be highlighted that is entirely outside the scope of this library. I see I have one examples in the README using innerHTML, so I suppose I can update that, but I was using that more for simplicity’s sake to point out how the API works.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

I'm not sure you are following. Your library current supports (perhaps encourages this). From looking at ALL your support issues its obvious people are very confused - because for simple cases raw HTML actually works (despite being a huge security risk)... but then for other cases it flops:

<pre><code>
<html>...</html> <!-- this will disappear of course since it's not valid HTML -->
<!-- is this HTML or code to be highlighted? -->
<span>Is this HTML or code to be highlighted</span>
<script>stealUserData()</script>
<STRONG>bold</STRONG> <!-- will turn into <strong> -->
</code></pre>

What is the purpose of allowing the unescaped HTML at all?

It encourages users in the sample cases to think this is how your library can and should be used. Since you'll actually highlight MOST of that.

So you could do one of two things:

  • switch to textContent, just silently dropping all the HTML - making it clear that HTML must be escaped
  • OR if there is no real reason to allow this, the don't allow it - show an error or warning instead.

See the proposal I linked to just above. highlightjs/highlight.js#2886 Your library itself could easily detect that the user (mistakenly) include RAW HTML and then show them a warning or error with a link to a section of the documentation (or an issue) that explains proper HTML escaping once and for all - vs you having to answer the question over and over.

And it would make the web a more secure place. If your library allows this (and people are doing it often judging from issues) then it seems very obvious that some of your users are potentially opening themselves up to a serious security vulnerability by using your library - one that you could so easily be saving them from.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

Your library current supports (perhaps encourages this).

To be clear I mean it's your library encouraging bad behavior, not your documentation or README.

Say someone uses your library on their blog:

<p>Here is an example of Bold:</p>

<pre><code class="language-html">
<strong>This will be bold</strong>
</code></pre>

And they go "Oh neat, that's so easy. I love this library!" I could give you a long list of support issues where your users have done exactly this... except you only see the users who got tripped up on minor glitches... you don't see all the happy users doing it wrong.

BUT then later they build a more complex project:

<pre><code class="language-html">
<?php echo $USER_INPUT ?>
</code></ore>

And boom, their website is exploited, user data is stolen, etc... this isn't directly your fault... but you did nothing to prevent them from getting the wrong idea about how to use your library... so then they get the wrong idea, and cut their hand wide open - with the knife you handed them - because you assumed they knew knives were dangerous... but they didn't...

It'd be so easy to say "Hey knives are sharp, be careful." :)

@Neohiro79
Copy link

@ccampbell FYI: I wasn't accusing but early on I got the impression that this person had spoken to the author of the library and the author had encouraged this or marketed it as a feature. :-) I'm glad I misunderstood.

I completely felt it was my fault that josh stepped over to you and was accusing you of something you didn't even know what he was talking about and why and I didn't even know that it existed before. I am always pissed off when someone tried to accuse someone out of the blue - regardless the matter.

Since my lack of knowledge in this case triggered this whole issue, I am deeply sorry if someone of you two got brain-troubles in the process. Josh potentially was just pissed of the fact that I mentioned that your library is much more intuitive and working from the startup - so I'm sorry for that - that might've triggered the whole anger of Josh here firsthand.

@josh As you guessed correctly, no, I didn't checked 'who you are' (who cares ;-)), I just thought some guy who is for whatever reason I don't get right now just hugely pissed off :-)

I really think that we all could achieve so much more when we would just keeping all our "proudness" away and just keep discussing topics as what they are (and doing so right now as far as I see) and trying to explain things to each other in a manner and language that we all can "enhance" ourselves.

And let me tell you, Craig is really a role model in this regard. I was happy to finally get a useful answer from "someone" of those "libraries" (who in most cases don't work as expected) ...

So I'm happy to see I finally can learn and ask something here which might be important to know.

@Neohiro79
Copy link

BUT then later they build a more complex project:

  <pre><code class="language-html">
  <?php echo $USER_INPUT ?>
  </code></pre>

And that was exactly my point from the beginning. The question, how someone might potentially "insert" a malicious script on your site will only "come up" when someone is able to "push" code through an input field to a site which shows this exact content in a "comment-field" or whatever WITHOUT PROPER ENCODING ON PHP/SERVER-LEVEL BEFOREHAND like php-htmlentities BEFORE putting that code "into place".

So basically that has been the whole issue - or am I wrong?

If someone is not aware of the fact of proper masking before putting input to a server this is or becomes a potential issue, since javascript code can be executed inside a code or pre code tag which then might occur.


Which basically leads me to this question: If this is a main concern - wouldn't it be more concerning not to be concerned about the broad usage of node.js and all it's javascript-dependencies getting javascript-dependencies getting javascript-dependencies of javascript-dependencies noone really can say what is loaded when and where from INTO or ONTO your computer (say VSC for example)?

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

Josh potentially was just pissed of the fact that I mentioned that your library is much more intuitive and working from the startup - so I'm sorry for that - that might've triggered the whole anger of Josh here firsthand.

I was not angry, maybe very slightly miffed. :-) The way you said it did read a certain way... :-) "your stupid library is doing it wrong and broken" Could be I was reading a lot into it. :-)

And I tend to write a lot, it doesn't mean I'm "pissed off". :-) I just over communicate.

Don't think I'm being prideful, just trying to make the internet a bit more secure place and save some people a lot of potential pain later.

Who is Craig again LOL? :-) It's also possible you misunderstood someone's answer to be suggesting something they weren't suggesting. :-) I don't think you'll find many highlighting library authors encouraging unescaped HTML (so my topic choice here was probably quite poor)... :-)

@joshgoebel
Copy link
Author

The question, how someone might potentially "insert" a malicious script on your site will only "come up" when someone is able to "push" code through an input field to a site

No. You can shoot yourself in the foot too. A lot of people have automated systems for publishing their sites... So one day you just add:

I will teach you how some JS functions work today!

<pre><code>
alert("Hi! This is an alert");
answer = confirm("Are you sure?");
</code></pre>

And run publish and go to lunch and now you broke your own website! Truly it's not the worst thing, but it's still egg on your face - and easily avoided by putting proper practices in place.

@joshgoebel joshgoebel changed the title Does your library encourage unescaped HTML code inside blocks? Your library encourages unescaped HTML by supporting it 90% of the time. Nov 22, 2020
@joshgoebel joshgoebel changed the title Your library encourages unescaped HTML by supporting it 90% of the time. Your library encourages unescaped HTML by supporting it in many cases. Nov 22, 2020
@joshgoebel
Copy link
Author

@Neohiro79 But again my point was always larger than you and your personal website. Teach one person to do this wrong and they teach other people and then you have a bunch of people doing it wrong. :-) You should do it right so that others learn from your good example instead of learning from your your bad behavior - because they MIGHT be dealing with random user input.

@Neohiro79
Copy link

He never did Josh ... :-)

See here:

#248

Be assured, after this whole conversation and all the in- and for sure upcoming useful comments here, that I would hug both of you :-)

@Neohiro79
Copy link

@Neohiro79 But again my point was always larger than you and your personal website. Teach one person to do this wrong and they teach other people and then you have a bunch of people doing it wrong. :-) You should do it right so that others learn from your good example instead of learning from your your bad behavior - because they MIGHT be dealing with random user input.

Yeah, you're right. That's why I really like this conversation, since 'some guys' who finally 'know something' can 'tell someone' something which might help others too in the future. That's how it should be!

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

If this is a main concern - wouldn't it be more concerning not to be concerned about the broad usage of node.js and all it's javascript-dependencies getting javascript-dependencies getting javascript-dependencies of javascript-dependencies noone really can say what is loaded when and where from INTO or ONTO your computer (say VSC for example)?

And that my friend is the beginning of wisdom. npm install ... 500 libraries installed! should scare people shitless. :-) VS Code has security walls in place so that the worst thing plugins can do is fuck up your VS Code - not your whole computer, but that's also a valid concern... any plugin you install definitely could be malicious and mess up your VS Code experience at the very least.

This is a VERY real attack vector - and has been used for attacks in the past. (with Ruby Gems off the top of my head) Compromise the library server, upload a "bad" library instead of a good one, infect everyone who installs it.

https://news.ycombinator.com/item?id=5139583

I'm not sure what npm might have in place to mitigate this kind of attack...

Related: PrismJS/prism#1680 (auto-loading 'infected'/bad languages from a CDN is a risk)

@joshgoebel
Copy link
Author

@ccampbell Sorry for turning your issue into a huge chat room. ;-)

@Neohiro79
Copy link

No. You can shoot yourself in the foot too. A lot of people have automated systems for publishing their sites...

Meaning that a lot of php tools or plugins out there do NOT use proper htmlentities encapsulation and hence this is or shall be a "second" security wall?

To sum it up for me:

  • if I use proper htmlentities encapsulation when uploading
    potential input or files BEFORE putting them into place
  • if I don't use any wp-plugins (and code all php-parts myself)
  • if I have 100% control over my implemented javascript libraries without any sideloads
    (meaning all is coded and hosted by myself or for 100% sure a trusted library residing on my server)
  • if I use such pre-encapsulated code snippets inside pre or pre - code blocks or my own ones

then and only then I shall be 'save, happy and super-secure' ?

@Neohiro79
Copy link

Neohiro79 commented Nov 22, 2020

I'm asking because I found a javascript library which basically can reverse the encoded parts into real html code again, so basically the only security measure which will work is htmlentities on unknown user input and no use of any CDN whatsoever.

see here:

https://mothereff.in/html-entities

I was likely to use this as a "convenience-tool" for me not always having to change my code I want to "present" online.

@joshgoebel
Copy link
Author

joshgoebel commented Nov 22, 2020

Let's keep this focused on HTML escaping. Proper website security is a HUGE issue and gets into many, many things. I haven't the time or desire to get into all that. :) The general idea: If not's not HTML code meant to be rendered as HTML code, then escape it - so it's not accidentally confused by the browser as HTML code.

htmlspecialchars is really all you need for "safe" escaping. htmlentities goes a lot further I think.

and no use of any CDN whatsoever.

Or make sure the CDN assets are what they claim to be. https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity

@joshgoebel
Copy link
Author

Most big CDNs go to great trouble to not be hacked, but yes, any code you don't control is always a risk. A lot of people use Highlight.js from several CDNs we trust, and that's reasonable for many people.

Security is about layers and sometimes knowing "how much do you need".

  • Do you keep your gold on the porch?
  • Do you tell people where your gold is?
  • Do you keep your gold under the mattress?
  • At the bank?
  • At your house, but with armed guards and attack dogs and a state of the art alarm system?

Perhaps you need a safe, but you may not need guard dogs and armed guards - but some people do.

@Neohiro79
Copy link

@josh Could you please do me a favour and get in contact with @Peter here, since he tries to get a VSC plugin to work (it's about the complete unusual usecase of someone wanting to finally print his code through a printer to control what he might have written or mark some things which might need to be rewritten).

It seems there is a can of worms once trying to find the reason for the problem of not having multi-line-comments syntax-highlighted as intended - and even more, your inside-knowledge might be really helpful for him I guess:

Issue about "Extension causes high cpu load"
PDConSec/vsc-print#69

Issue about "incorrect coloring for html comment across multiple lines"
PDConSec/vsc-print#63


@that issues with VSC - yes, that scares me, that's why I asked.

@joshgoebel
Copy link
Author

I have no idea why that issue is relevant to this discussion.

@Neohiro79
Copy link

This issue might not be relevant to this discussion, but he uses your highlight.js in his plugin.
I think the problem was exactly this topic we discussed here ("not masking the html parts") when triggering the printout.

But yes, I might ask for too much.

@joshgoebel
Copy link
Author

I see now, wasn't obvious at a glance. I commented.

@Neohiro79
Copy link

Yeah, sorry, I realised it looked completely like falling out of line ...

Too impulsive myself sometimes too ;-)

@Neohiro79
Copy link

Thanks so much josh! I will let all these comments sink in, since I guess you might've something else to do today and Craig will have no more follow-up emails from us in his mailbox, at least for today :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants