Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prism seems to not highlight HTML comments (at least when I tried it) #2651

Closed
Neohiro79 opened this issue Nov 20, 2020 · 11 comments
Closed

Comments

@Neohiro79
Copy link

Neohiro79 commented Nov 20, 2020

You can see the problem in this JSFiddle here:

http://jsfiddle.net/neohiro79/zbfdhxjy/

Is there something I've missed?

I even tried to give it a classname "comment" as proposed somewhere - but the HTML comments won't show up.

@RunDevelopment
Copy link
Member

Prism only highlights the text content of code elements. You have escape all < and & characters inside code elements or use our Unescaped markup plugin.

@Neohiro79
Copy link
Author

Neohiro79 commented Nov 21, 2020

Thanx a lot for the quick tip and answer! It finally worked once I've escaped all < and > characters:

http://jsfiddle.net/neohiro79/zbfdhxjy/1/

@Neohiro79
Copy link
Author

Neohiro79 commented Nov 21, 2020

Something what still bothers me, even if the Unescaped markup plugin is used:

What is the reason for naming the class language-markup if it is not 'markup'ing the markup language called HTML at all?

It won't till a so called plugin "unescaped markup" is used - BUT HERE COMES THE CAVEAT: you can't use script tags inside this plugin, since it is based on putting things inside the script tag itself - and if you cannot use the alternative of the <-- masking because you want exactly this: seeing the comments.

You can only use it in a weired combination like here:

http://jsfiddle.net/neohiro79/zbfdhxjy/2/ >> @edit that point is solved - I forgot to escape the html entities.

It also puts some sort of weired additional 'tabspace' before the 'style' tags for some reason.

see here: http://jsfiddle.net/neohiro79/zbfdhxjy/3/ >> @edit that point is solved, too. I used tabs instead of spaces.


Still, this question remains unsolved:

If this markup-plugin can realise the specific <-- and --> parts through some sort of RegEx magic algorithm, why is this not working as one might expect for the special comment class?

Am I the only one who wants to use and see comments in HTML code snippets?

@RunDevelopment
Copy link
Member

Markup inside of code elements isn't treated as text by the browser. It's treated as markup and markup has meaning, so please don't put unescaped script tags inside code elements (unless they have type="text/plain") because the scripts are going to be executed. Same for style and all other tags that have side effects. This why you're supposed to escape all < and & characters.

Unescaped Markup (UM) only works because it uses markup elements (comments and plain text script tags) that don't have side effects.

What is the reason for naming the class language-markup if it is not "markup'ing the markup language called HTML at all?

I don't know what you mean by that.

If this markup-plugin can realise the specific <-- and --> parts through some sort of RegEx magic algorithm, why is this not working as one might expect for the special comment class?

Because there is no magic. UM will only be active for code elements that contain no text and exactly one comment. I'll add that to the doc.

Am I the only one who wants to use and see comments in HTML code snippets?

Most certainly not. We get an issue like this every month or so and the answer is always the same: Escape your < and & characters or use UM. There are no other (safe) solutions.

@Neohiro79
Copy link
Author

Allright, I get it. Seems like Browser Vendors have f* up broadly. Why the f* hell javascript itself is beeing executed once inside a code tag ??? This should NOT be the case - I mean what is the reason for putting things in a code tag, for sure NOT code execution.

But anyway, since it is like that, I seem to have to accept it. Why bothering searching for a meaning ...

What is the reason for naming the class language-markup if it is not "markup'ing the markup language called HTML at all?

I meant exactly this - if I use class = "language-markup" it should 'markup' all the code inside as what it is: HTML, CSS and JS.
But you answered that already - meaning the reason why it won't work is because browsers won't behave.

Because there is no magic. UM will only be active for code elements that contain no text and exactly one comment.

I don't get that.

However, what I still don't get is why are "comments" seen as malicious either? There cannot be any 'malicious' part in comments since they for sure won't execute - or am I wrong even in this case?

Can't the 'comment' class accept at least what it implies? Accepting comments?

@RunDevelopment
Copy link
Member

Seems like Browser Vendors have f* up broadly.

I don't think so. HTML parsing rules are already complex enough, so parsing markup inside code tags differently just to enable this use case is very much unjustified, especially since HTML always had a solution for your problem: Escape all < and & characters. (I know, I sound like a broken record at this point).

Also, let's keep the discussion civil, please.

if I use class="language-markup" it should 'markup' all the code inside as what it is

That's not what class="language-markup" means. The language-xxxx class is simply a way to tell Prism the language of your code (e.g. HTML, C++, JS, Haskell). Prism can't automatically detect the language of code just by looking at it, so you have to help Prism. That's all that the language-xxxx does.

However, what I still don't get is why are "comments" seen as malicious either? There cannot be any 'malicious' part in comments since they for sure won't execute - or am I wrong even in this case?

Don't worry, comments aren't seen as malicious and won't be executing anything either.

In the case of comments, UM is only active for code elements that don't have any text. The reasons for that are simplicity and efficiency. Also, if UM "unescaped" all comments, then this would also bring compatibility issues with other plugins (e.g. Keep Markup). It would also unnecessarily slow down your website for all your clients since UM would have to check for comments inside the DOM subtrees of all code elements (- slow down not by much, but even that is too much for unnecessary).

Can't the 'comment' class accept at least what it implies? Accepting comments?

A comment class is just that, a class. There is no accepted web standard that attributes any special meaning to this class, so there is no reason for us to do so.

@Neohiro79
Copy link
Author

Neohiro79 commented Nov 21, 2020

That's not what class="language-markup" means. The language-xxxx class is simply a way to tell Prism the language of your code (e.g. HTML, C++, JS, Haskell). Prism can't automatically detect the language of code just by looking at it, so you have to help Prism. That's all that the language-xxxx does.

I knew that already, my point is if there is a "hinting" class for Prism to know which language to expect, and there is a generic name like "markup" instead of say "html". I understand it as a "container-class" for "html, css and javascript". At least that's what I would expect. I was irritated that it didn't seem to work in Prism.js as I would expect, since it worked properly in Rainbow.js, as shown here:

http://jsfiddle.net/neohiro79/zbfdhxjy/20/

It's class language-html handles ALL languages which I am used to expect properly, without even escaping.

And yes, I know, there have been "security-concerns" about this usecase about working without escaping, but personally that's what I expected at that point in time.

Now, Prism.js seems to work 'similar', but you need to know to PROPER ESCAPE and use the classname "language-markup", which also results in what I wanted to have:

http://jsfiddle.net/neohiro79/zbfdhxjy/21/

So basically, I am happy now - but as said, you need to know the PROPER ESCAPING PRACTISE, else you will see all kind of weird errors in the syntax highlighting, which is really irritating and annoying if you are not expecting this kind of behaviour, since it is not extra pointed out in the manual as far as I am aware of.

@RunDevelopment
Copy link
Member

since it worked properly in Rainbow.js, as shown here:

That's interesting that Rainbow.js uses innerHTML. It's the only syntax highlighter that does that AFAIK. As you pointed out, this is a security concern (you can see the script being executed in your example) but it also has practical consequences as well:

  • How will this work for nested code block? (Let's say I wanted to make a tutorial for Rainbow.js, will it be able to highlight code blocks that follow its own format?)
  • This means that you can't highlight arbitrary HTML code snippets with Rainbow.js, it has to be parsable HTML. E.g. unclosed comments, scripts, styles, or the code </code> can't be highlighted. Admittedly, this is a bit nitpicky, but it's an assumption Rainbow.js makes.

It's class language-html handles ALL languages which I am used to expect properly, without even escaping.

Prism is built on top of existing best practices. People already used pre elements and code elements to display code before Prism came to be and HTML5 recommended the language-xxxx class.

Escaping certain characters is how HTML always used to be and, given that changing is behaviour will break the internet, most likely always will be.

which is really irritating and annoying if you are not expecting this kind of behaviour, since it is not extra pointed out in the manual as far as I am aware of.

I'll add a note.

@Neohiro79
Copy link
Author

Since I've had a long and very interesting conversation with @joshgoebel (the maintainer of highlight.js) about this very topic, I also came to the conclusion, after understanding the whole problem, that it is not a good idea to use innerHTML - at least not firsthand and without proper "opt-in".

Since HTML code container also execute that javascript which is written in them - even before any highlighter or other script could potentially intervene and mask potentially harmful code, it would be better to "blend-in" a "clear warning" to the user, explaining why and how to proper mask his code to ensure it works as expected than to just allow to run the script.

see conclusion here:

highlightjs/highlight.js#2886

@RunDevelopment
Copy link
Member

Just read through your discussion over there :) I'm sorry that I couldn't explain the issue as well as Josh did but I'm glad that we all came to an understanding.

@Neohiro79
Copy link
Author

:-) yeah, he was very on point of the whole issue and finally convincing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants