Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some HTML entities are double-decoded. #2927

Closed
RichardNeill opened this issue Aug 19, 2018 · 4 comments · Fixed by #2932
Closed

Some HTML entities are double-decoded. #2927

RichardNeill opened this issue Aug 19, 2018 · 4 comments · Fixed by #2932
Labels
bug something broken

Comments

@RichardNeill
Copy link

Titles etc have a strange double-decoding bug for HTML entities.

For example, it is impossible to include a literal &lt;b&gt; within a title or tooltip - it is always converted to <b>. This has two consequences:

  1. If I actually want a raw angle-bracket, for example to say Threshold < 3, then this is brittle, because sometimes the bracket could be interpreted as beginning a tag.

  2. There is a possible security risk - it is impossible to enforce an "htmlspecialchars()" conversion to make user input safe, because entities are double-decoded.

Try for example:

layout = {
  title: "This &lt;b&gt;could break&lt;/b&gt; &lt;script&gt;alert(\"?\")&lt;/script&gt; protection",
};
  1. This wrongly shows the words "could break" in bold. This is a bug.

  2. However, there seems to be some special-case handling of "script" because the alert does not trigger, and the script tag is shown literally. This indicates a belt-and-braces fix in the special-case of "script", I think. So while the behaviour is safe, it's confusing, given (1).

Thanks for your time and your help.

@alexcjohnson
Copy link
Collaborator

Thanks @RichardNeill - you're right, we're prematurely decoding those entities. It's not really double conversion, since we're not creating HTML at all, but directly instantiating corresponding SVG elements.

I don't believe there are any security holes here (if you do find one please do let us know!), but that is the reason we're extra constrained - because plotly.js is used on our public-facing platform, we can't allow users to take responsibility for security of their own plots, we need the conversion to be safe no matter what string is supplied.

@alexcjohnson alexcjohnson added the bug something broken label Aug 20, 2018
@RichardNeill
Copy link
Author

Hi Alex, Thanks for your comment.

The immediate source of confusion for me is that both &gt; and < are converted in the title/y-axis/tooltip into the start of tag character <.

So for example, if I have

chartTitle = 'example<b><pre><i><s><script>alert(4);</script><a onblur="alert(3)">x&lt;&gt;&amp;&eacute. 3<4>5<a href="">b</a>';

Then what I see is:
example<pre><s><script>alert(4);</script>x<>&&eacute. 3<4>5

So, sometimes the tags are literal, and sometimes they are not. The handling is inconsistent, depending on the tag.

An end user can certainly inject raw html into a chart - though so far, I haven't been able to find a way to exploit it.

But it is definitely true that the system is inconsistent... a <b> tag is treated differently to a <pre> tag, and some things cannot be expressed.

There is no way, for example, to create a chart with the following, literal, title: Frequency of "<b>" and "<i>" tags in our website.' . Here are some things I tried...

chartTitle = 'Frequency of "<b>" and "<i>" tags in our website.';
chartTitle = 'Frequency of "&lt;b&gt;" and "&lt;i&gt;" tags in our website.';
chartTitle = 'Frequency of "&amp;lt;b&amp;gt;" and "&amp;lt;i&amp;gt;" tags in our website.';
chartTitle = 'Frequency of "&#x3C;b&#x3E;" and "&#60;i&#62;" tags in our website.';

Not one of them will allow me to write a literal open-angle,b,close-angle into the title of the chart!

I hope that makes it clearer.

@alexcjohnson
Copy link
Collaborator

The handling is inconsistent, depending on the tag.

Yes, we have an open item to document this better plotly/documentation#766
Currently there are a few references to this system, eg https://plot.ly/javascript/reference/#layout-annotations-text and https://help.plot.ly/adding-HTML-and-links-to-charts/ but they have not kept pace with the available options, and they're not referenced from most of the places you can use them (which is essentially all the svg-based text rendering in the library). As a general rule, style tags are supported, but layout and content tags are not. See also #1387 and #2515

An end user can certainly inject raw html into a chart

Certainly by using your own javascript on your own page you can add whatever new elements you want, but you cannot inject any raw HTML or javascript using just the figure data and layout objects, which are the portable parts. If I'm wrong about that statement, that would constitute a security hole and I'm eager to hear about it!

Anyway I'll have a fix shortly for the immediate issue of decoding entities improperly. Turns out there are also some easy performance and functionality gains to be had.

We are not likely to support the full set of named HTML entities, just the special characters (&, <, >) that are difficult to enter explicitly, and a few that we've supported for a long time (μ, nbsp, ×, ±, °) but I think we can support all of the numbered entities. So for any named entity not in that list, just use a unicode literal or the numbered entity. There is a way to get the browser to decode arbitrary named entities, eg https://gomakethings.com/decoding-html-entities-with-vanilla-javascript/, but it involves node.innerHTML and I don't want to take the chance of this introducing a security hole even though I think we could do it safely... just seems to be an unnecessary risk.

@RichardNeill
Copy link
Author

Thank you. I confirm that this is now fixed, at least for what I needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants