\u2028 and \u2029 break rendering #375

mariusandra · 2016-04-07T14:29:40Z

Hey,

I'm operating a website (https://www.apprentus.com) where we user react_on_rails with prerendering to server content to our users. Currently it's just the homepage, with other pages under development.

I started getting errors that our homepage wasn't loading for some users. Digging deeper I found it's because of the character "\u2028" that was being sent in the props under the "latest classes" section, that features user entered content.

Digging further I found this link:
https://bugs.chromium.org/p/v8/issues/detail?id=1907
... and learned these two are special characters in JS.

Try it out yourself by running this in your browser

eval('console.log("\u2028")')

If you change the number to 2027 or 2030 then it works fine.

This PR is a quick fix that solved the problem for me, although there might be a more elegant solution available.

Marius

This change is

justin808 · 2016-04-08T04:25:03Z

app/helpers/react_on_rails_helper.rb

@@ -299,7 +299,7 @@ def server_rendered_react_component_html(props, react_component_name, dom_id,
 (function() {
  var railsContext = #{rails_context(server_side: true).to_json};
 #{initialize_redux_stores}
-  var props = #{props_string(props)};
+  var props = #{props_string(props).gsub("\u2028", '').gsub("\u2029", '')};


Maybe we should do this in one gsub?

What are the 2028 and 2029 chars?

justin808 · 2016-04-08T04:27:26Z

@mariusandra If possible, please submit your project to https://github.com/shakacode/react_on_rails/blob/master/PROJECTS.md

justin808 · 2016-04-20T07:14:57Z

@alexfedoseev This looks reasonable.

@mariusandra Can you please sync up to master and make sure CI is passing.

mariusandra · 2016-04-20T07:42:50Z

Hey, sorry for my silence up to now. I wanted to still test if with this change the rendered DOM on the server and on the client side will be identical or not. ... but I spent all my free time in the last 2 weeks preparing a long presentation for my local Ruby User Group.

I'll sync and test this within 24h. Sorry for the wait :).

And I'll submit my project to the README as well!

coveralls · 2016-04-21T07:41:54Z

Coverage remained the same at 85.325% when pulling 4a85397 on mariusandra:master into bf834bc on shakacode:master.

mariusandra · 2016-04-21T07:45:51Z

Hey,

First, instead of the double .gsub(), it's much faster to use .delete in this case, as demonstrated by this gist: https://gist.github.com/dominikh/208915

So instead of

props_string(props).gsub("\u2028", '').gsub("\u2029", '')

to use

props_string(props).delete("\u2028\u2029")

Then, my suspicions were correct. With the change proposed by the original PR, I got this error using the "hello world" sample app, with a name of "basreree\u2028aereter\u2029earer":

... so just removing these naughty characters in the block that gets sent to the server side rendering part is not enough. They need to be stripped out everywhere.

I tried a few different approaches, but in the end all that had to be done was to strip them from the data-props attribute in the tag that the react_component method in the react_on_rails_helper.rb outputs. So in this thing:

<div class="js-react-on-rails-component" 
     data-component-name="HelloWorldApp" 
     data-dom-id="HelloWorldApp-react-component-0" 
     data-props="{&quot;_locale&quot;:&quot;en&quot;,&quot;name&quot;:&quot;basreree aereter earer&quot;}" 
     data-trace="true" 
     style="display:none">
</div>

The code block above does contain the actual \u2028 and \u2029 characters in data-props! Try selecting the text like so:

The final patch was to update ReactOnRails::ReactComponent::Options to have def props output a json string of the props, with the characters deleted.

Since this method was called 4 times, I also added a memoization.

      def props
        @props ||= options[:props] ? options[:props].to_json.delete("\u2028\u2029") : NO_PROPS
      end

Now, since this is not my codebase, I'm not sure if props outputting a json string will cause issues somewhere or not... but it seems to work so far!

mariusandra · 2016-04-21T07:55:12Z

This does break the tests.

One option is to do JSON.parse(options[:props].to_json.delete("\u2028\u2029")), but that seems a bit wasteful to me... even though a modern JSON parser should be quite fast.

justin808 · 2016-04-27T02:26:42Z

@mariusandra Why are these characters getting into the data? Can you strip this before passing to react-on-rails?

mariusandra · 2016-04-27T06:38:40Z

@justin808 I'm not sure where the characters come from. It was possibly some text the user copied from MS Word into a textarea... or a situation like that. In any case they were found in user entered text and that caused a 500 error instead of showing the page.

I could strip these characters before passing props to react-on-rails, but then I would have to implement this filtering at every interaction my app has with react-on-rails. It would be easier if this functionality was already in react-on-rails, possibly also saving other users hours of time tracking this bug down.

...

What the problem boils down to is best summarised with this article: http://timelessrepo.com/json-isnt-a-javascript-subset

JSON is not a subset of JavaScript. Almost, but not quite. The difference is in these two annoying characters, \u2028 and \u2029. They're OK in JSON, but not OK in a JS string.

In react-on-rails, the JSON made from props gets passed to execjs as a JS object:

react_on_rails/app/helpers/react_on_rails_helper.rb

Line 282 in 557cb3e

var props = #{props_string(props)};

and if it contains those characters, execjs crashes and a 500 error ensures.

However it seems it's OK to pass those characters in the "data-props" attribute for client side rendering. There they are parsed as part of a JSON object (where they are allowed), not as part of JS (where they are not).

Reading the link I gave above, they propose a solution:

Luckily, the solution is simple: If we look at the JSON specification we see that the only place where a U+2028 or U+2029 can occur is in a string. Therefore we can simply replace every U+2028 with \u2028 (the escape sequence) and U+2029 with \u2029 whenever we need to send out some JSONP.

It’s already been fixed in Rack::JSONP and I encourage all frameworks or libraries that send out JSONP to do the same. It’s a one-line patch in most languages and the result is still 100% valid JSON.

My original fix was to just remove these characters from the server rendered string.
My second fix was to remove these characters from props for the server and the client.

The proper fix™ is to replace them on the server side with their escape sequence:

var props = #{props_string(props).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')};

It's the same fix they use in RACK:
https://github.com/judofyr/rack-contrib/blob/24f3da12ea7f763826d0166591a45eba2d9bfce5/lib/rack/contrib/jsonp.rb#L79

coveralls · 2016-04-27T06:59:01Z

Coverage remained the same at 85.325% when pulling 59ce7d0 on mariusandra:master into 557cb3e on shakacode:master.

justin808 · 2016-04-27T07:41:23Z

One comment.

Reviewed 1 of 2 files at r3.
Review status: all files reviewed at latest revision, 1 unresolved discussion.

app/helpers/react_on_rails_helper.rb, line 302 [r1] (raw file):
Let's add some detailed comments such as shown here.

Also, are we sure that 2 gsubs are the fastest way?

    # Pads the response with the appropriate callback format according to the
    # JSON-P spec/requirements.
    #
    # The Rack response spec indicates that it should be enumerable. The
    # method of combining all of the data into a single string makes sense
    # since JSON is returned as a full string.
    #
    def pad(callback, response, body = "")
      response.each do |s|
        # U+2028 and U+2029 are allowed inside strings in JSON (as all literal
        # Unicode characters) but JavaScript defines them as newline
        # seperators. Because no literal newlines are allowed in a string, this
        # causes a ParseError in the browser. We work around this issue by
        # replacing them with the escaped version. This should be safe because
        # according to the JSON spec, these characters are *only* valid inside
        # a string and should therefore not be present any other places.
        body << s.to_s.gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')
      end

Comments from Reviewable

justin808 · 2016-05-06T08:39:02Z

@mariusandra Please see my last comment and rebase on top of master after squashing your commits.

justin808 · 2016-05-09T08:02:50Z

@mariusandra Let's get this one rebased along with a changelog entry, and I'll get this into our upcoming 6.0 release.

mariusandra · 2016-05-09T08:10:11Z

Hey, it'll be done before the end of the day. Sorry for the wait :)

coveralls · 2016-05-09T17:42:25Z

Coverage remained the same at 82.473% when pulling 837ca64 on mariusandra:master into 9491179 on shakacode:master.

coveralls · 2016-05-09T17:47:15Z

Coverage remained the same at 82.473% when pulling 837ca64 on mariusandra:master into 9491179 on shakacode:master.

coveralls · 2016-05-09T17:50:58Z

Coverage remained the same at 82.473% when pulling 837ca64 on mariusandra:master into 9491179 on shakacode:master.

mariusandra · 2016-05-09T17:52:21Z

Hi, I have added a comment in the source file and a changelog entry.

About the speed of 2 gsubs, I'm not sure, but I believe so. Since we're replacing one character with 6, we can't rely on the fast String.tr or String.delete this time. The only other way I can think of is one regexp gsub:

"he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')

When benchmarking, it's a bit slower for this simple string:

n = 1000000
Benchmark.bm do |x|
   x.report('double gsub') { n.times do ; "he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }
   x.report('regexp gsub') { n.times do ; "he\u2028llo \u2029world".gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }
end

Results:

       user     system      total        real
double gsub  5.010000   0.110000   5.120000 (  5.135415)
regexp gsub  5.350000   0.030000   5.380000 (  5.389713)

A lot slower with a long string with many occurrences:

n = 10000 # smaller n
Benchmark.bm do |x|
   x.report('double gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }
   x.report('regexp gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }
end

results:

       user     system      total        real
double gsub  1.740000   0.080000   1.820000 (  1.824301)
regexp gsub  2.810000   0.020000   2.830000 (  2.832139)

And a lot slower in this case:

n = 100000
Benchmark.bm do |x|
   x.report('double gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }
   x.report('regexp gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }
end

results:

       user     system      total        real
double gsub  0.970000   0.110000   1.080000 (  1.071568)
regexp gsub  3.390000   0.040000   3.430000 (  3.446440)

so I would stick with it, unless someone knows of a better and faster way.

Plus, knowing that it's good enough for RACK should be comforting.

justin808 · 2016-05-10T05:34:46Z

Thanks for all the hard work on this one, especially by doing the profiling.

Previously, mariusandra wrote…

Hi, I have added a comment in the source file and a changelog entry.

About the speed of 2 gsubs, I'm not sure, but I believe so. Since we're replacing one character with 6, we can't rely on the fast String.tr or String.delete this time. The only other way I can think of is one regexp gsub:

"he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')

When benchmarking, it's a bit slower for this simple string:

n = 1000000

Benchmark.bm do |x|

   x.report('double gsub') { n.times do ; "he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }

   x.report('regexp gsub') { n.times do ; "he\u2028llo \u2029world".gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }

end

Results:


       user     system      total        real

double gsub  5.010000   0.110000   5.120000 (  5.135415)

regexp gsub  5.350000   0.030000   5.380000 (  5.389713)

A lot slower with a long string with many occurrences:

n = 10000 # smaller n

Benchmark.bm do |x|

   x.report('double gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }

   x.report('regexp gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }

end

results:


       user     system      total        real

double gsub  1.740000   0.080000   1.820000 (  1.824301)

regexp gsub  2.810000   0.020000   2.830000 (  2.832139)

And a lot slower in this case:

n = 100000

Benchmark.bm do |x|

   x.report('double gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }

   x.report('regexp gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }

end

results:


       user     system      total        real

double gsub  0.970000   0.110000   1.080000 (  1.071568)

regexp gsub  3.390000   0.040000   3.430000 (  3.446440)

so I would stick with it, unless someone knows of a better and faster way.

Plus, knowing that it's good enough for RACK should be comforting.

Reviewed 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 1 unresolved discussion.

Comments from Reviewable

justin808 reviewed Apr 8, 2016
View reviewed changes

mariusandra force-pushed the master branch from 2199bc8 to 4a85397 Compare April 21, 2016 07:39

mariusandra force-pushed the master branch from 4a85397 to 59ce7d0 Compare April 27, 2016 06:36

justin808 added the ready to merge label May 6, 2016

mariusandra force-pushed the master branch from 6f80c78 to 1a879d8 Compare May 9, 2016 17:28

escape \u2028 and \u2029 to compensate for JSON and JS difference

99ab86f

mariusandra force-pushed the master branch from 1a879d8 to 99ab86f Compare May 9, 2016 17:29

changelog for \u2028 and \u2029 bug

837ca64

justin808 merged commit 56c0c8d into shakacode:master May 10, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

\u2028 and \u2029 break rendering #375

\u2028 and \u2029 break rendering #375

mariusandra commented Apr 7, 2016

justin808 Apr 8, 2016

justin808 commented Apr 8, 2016

justin808 commented Apr 20, 2016

mariusandra commented Apr 20, 2016

coveralls commented Apr 21, 2016

mariusandra commented Apr 21, 2016

mariusandra commented Apr 21, 2016

justin808 commented Apr 27, 2016

mariusandra commented Apr 27, 2016

coveralls commented Apr 27, 2016

justin808 commented Apr 27, 2016

justin808 commented May 6, 2016

justin808 commented May 9, 2016

mariusandra commented May 9, 2016

coveralls commented May 9, 2016

coveralls commented May 9, 2016

coveralls commented May 9, 2016

mariusandra commented May 9, 2016 •

edited

Loading

justin808 commented May 10, 2016

\u2028 and \u2029 break rendering #375

\u2028 and \u2029 break rendering #375

Conversation

mariusandra commented Apr 7, 2016

justin808 Apr 8, 2016

Choose a reason for hiding this comment

justin808 commented Apr 8, 2016

justin808 commented Apr 20, 2016

mariusandra commented Apr 20, 2016

coveralls commented Apr 21, 2016

mariusandra commented Apr 21, 2016

mariusandra commented Apr 21, 2016

justin808 commented Apr 27, 2016

mariusandra commented Apr 27, 2016

coveralls commented Apr 27, 2016

justin808 commented Apr 27, 2016

justin808 commented May 6, 2016

justin808 commented May 9, 2016

mariusandra commented May 9, 2016

coveralls commented May 9, 2016

coveralls commented May 9, 2016

coveralls commented May 9, 2016

mariusandra commented May 9, 2016 • edited Loading

justin808 commented May 10, 2016

mariusandra commented May 9, 2016 •

edited

Loading