Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

\u2028 and \u2029 break rendering #375

Merged
merged 2 commits into from
May 10, 2016
Merged

Conversation

mariusandra
Copy link
Contributor

Hey,

I'm operating a website (https://www.apprentus.com) where we user react_on_rails with prerendering to server content to our users. Currently it's just the homepage, with other pages under development.

I started getting errors that our homepage wasn't loading for some users. Digging deeper I found it's because of the character "\u2028" that was being sent in the props under the "latest classes" section, that features user entered content.

Digging further I found this link:
https://bugs.chromium.org/p/v8/issues/detail?id=1907
... and learned these two are special characters in JS.

Try it out yourself by running this in your browser

eval('console.log("\u2028")')

If you change the number to 2027 or 2030 then it works fine.

This PR is a quick fix that solved the problem for me, although there might be a more elegant solution available.

Marius


This change is Reviewable

@@ -299,7 +299,7 @@ def server_rendered_react_component_html(props, react_component_name, dom_id,
(function() {
var railsContext = #{rails_context(server_side: true).to_json};
#{initialize_redux_stores}
var props = #{props_string(props)};
var props = #{props_string(props).gsub("\u2028", '').gsub("\u2029", '')};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Maybe we should do this in one gsub?
  • What are the 2028 and 2029 chars?

@justin808
Copy link
Member

@mariusandra If possible, please submit your project to https://github.com/shakacode/react_on_rails/blob/master/PROJECTS.md

@justin808
Copy link
Member

@alexfedoseev This looks reasonable.

@mariusandra Can you please sync up to master and make sure CI is passing.

@mariusandra
Copy link
Contributor Author

Hey, sorry for my silence up to now. I wanted to still test if with this change the rendered DOM on the server and on the client side will be identical or not. ... but I spent all my free time in the last 2 weeks preparing a long presentation for my local Ruby User Group.

I'll sync and test this within 24h. Sorry for the wait :).

And I'll submit my project to the README as well!

@coveralls
Copy link

Coverage Status

Coverage remained the same at 85.325% when pulling 4a85397 on mariusandra:master into bf834bc on shakacode:master.

@mariusandra
Copy link
Contributor Author

Hey,

First, instead of the double .gsub(), it's much faster to use .delete in this case, as demonstrated by this gist: https://gist.github.com/dominikh/208915

So instead of

props_string(props).gsub("\u2028", '').gsub("\u2029", '')

to use

props_string(props).delete("\u2028\u2029")

Then, my suspicions were correct. With the change proposed by the original PR, I got this error using the "hello world" sample app, with a name of "basreree\u2028aereter\u2029earer":

screen shot 2016-04-21 at 09 11 58

... so just removing these naughty characters in the block that gets sent to the server side rendering part is not enough. They need to be stripped out everywhere.

I tried a few different approaches, but in the end all that had to be done was to strip them from the data-props attribute in the tag that the react_component method in the react_on_rails_helper.rb outputs. So in this thing:

<div class="js-react-on-rails-component" 
     data-component-name="HelloWorldApp" 
     data-dom-id="HelloWorldApp-react-component-0" 
     data-props="{&quot;_locale&quot;:&quot;en&quot;,&quot;name&quot;:&quot;basreree
aereter
earer&quot;}" 
     data-trace="true" 
     style="display:none">
</div>

The code block above does contain the actual \u2028 and \u2029 characters in data-props! Try selecting the text like so:

ezgif-14518587

The final patch was to update ReactOnRails::ReactComponent::Options to have def props output a json string of the props, with the characters deleted.

Since this method was called 4 times, I also added a memoization.

      def props
        @props ||= options[:props] ? options[:props].to_json.delete("\u2028\u2029") : NO_PROPS
      end

Now, since this is not my codebase, I'm not sure if props outputting a json string will cause issues somewhere or not... but it seems to work so far!

@mariusandra
Copy link
Contributor Author

This does break the tests.

One option is to do JSON.parse(options[:props].to_json.delete("\u2028\u2029")), but that seems a bit wasteful to me... even though a modern JSON parser should be quite fast.

@justin808
Copy link
Member

@mariusandra Why are these characters getting into the data? Can you strip this before passing to react-on-rails?

@mariusandra
Copy link
Contributor Author

@justin808 I'm not sure where the characters come from. It was possibly some text the user copied from MS Word into a textarea... or a situation like that. In any case they were found in user entered text and that caused a 500 error instead of showing the page.

I could strip these characters before passing props to react-on-rails, but then I would have to implement this filtering at every interaction my app has with react-on-rails. It would be easier if this functionality was already in react-on-rails, possibly also saving other users hours of time tracking this bug down.

...

What the problem boils down to is best summarised with this article: http://timelessrepo.com/json-isnt-a-javascript-subset

JSON is not a subset of JavaScript. Almost, but not quite. The difference is in these two annoying characters, \u2028 and \u2029. They're OK in JSON, but not OK in a JS string.

In react-on-rails, the JSON made from props gets passed to execjs as a JS object:

var props = #{props_string(props)};

and if it contains those characters, execjs crashes and a 500 error ensures.

However it seems it's OK to pass those characters in the "data-props" attribute for client side rendering. There they are parsed as part of a JSON object (where they are allowed), not as part of JS (where they are not).

Reading the link I gave above, they propose a solution:

Luckily, the solution is simple: If we look at the JSON specification we see that the only place where a U+2028 or U+2029 can occur is in a string. Therefore we can simply replace every U+2028 with \u2028 (the escape sequence) and U+2029 with \u2029 whenever we need to send out some JSONP.

It’s already been fixed in Rack::JSONP and I encourage all frameworks or libraries that send out JSONP to do the same. It’s a one-line patch in most languages and the result is still 100% valid JSON.

My original fix was to just remove these characters from the server rendered string.
My second fix was to remove these characters from props for the server and the client.

The proper fix™ is to replace them on the server side with their escape sequence:

var props = #{props_string(props).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')};

It's the same fix they use in RACK:
https://github.com/judofyr/rack-contrib/blob/24f3da12ea7f763826d0166591a45eba2d9bfce5/lib/rack/contrib/jsonp.rb#L79

@coveralls
Copy link

Coverage Status

Coverage remained the same at 85.325% when pulling 59ce7d0 on mariusandra:master into 557cb3e on shakacode:master.

@justin808
Copy link
Member

One comment.


Reviewed 1 of 2 files at r3.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


app/helpers/react_on_rails_helper.rb, line 302 [r1] (raw file):
Let's add some detailed comments such as shown here.

Also, are we sure that 2 gsubs are the fastest way?

    # Pads the response with the appropriate callback format according to the
    # JSON-P spec/requirements.
    #
    # The Rack response spec indicates that it should be enumerable. The
    # method of combining all of the data into a single string makes sense
    # since JSON is returned as a full string.
    #
    def pad(callback, response, body = "")
      response.each do |s|
        # U+2028 and U+2029 are allowed inside strings in JSON (as all literal
        # Unicode characters) but JavaScript defines them as newline
        # seperators. Because no literal newlines are allowed in a string, this
        # causes a ParseError in the browser. We work around this issue by
        # replacing them with the escaped version. This should be safe because
        # according to the JSON spec, these characters are *only* valid inside
        # a string and should therefore not be present any other places.
        body << s.to_s.gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')
      end

Comments from Reviewable

@justin808
Copy link
Member

@mariusandra Please see my last comment and rebase on top of master after squashing your commits.

@justin808
Copy link
Member

@mariusandra Let's get this one rebased along with a changelog entry, and I'll get this into our upcoming 6.0 release.

@mariusandra
Copy link
Contributor Author

Hey, it'll be done before the end of the day. Sorry for the wait :)

@coveralls
Copy link

Coverage Status

Coverage remained the same at 82.473% when pulling 837ca64 on mariusandra:master into 9491179 on shakacode:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 82.473% when pulling 837ca64 on mariusandra:master into 9491179 on shakacode:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 82.473% when pulling 837ca64 on mariusandra:master into 9491179 on shakacode:master.

@mariusandra
Copy link
Contributor Author

mariusandra commented May 9, 2016

Hi, I have added a comment in the source file and a changelog entry.

About the speed of 2 gsubs, I'm not sure, but I believe so. Since we're replacing one character with 6, we can't rely on the fast String.tr or String.delete this time. The only other way I can think of is one regexp gsub:

"he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')

When benchmarking, it's a bit slower for this simple string:

n = 1000000
Benchmark.bm do |x|
   x.report('double gsub') { n.times do ; "he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }
   x.report('regexp gsub') { n.times do ; "he\u2028llo \u2029world".gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }
end

Results:

       user     system      total        real
double gsub  5.010000   0.110000   5.120000 (  5.135415)
regexp gsub  5.350000   0.030000   5.380000 (  5.389713)

A lot slower with a long string with many occurrences:

n = 10000 # smaller n
Benchmark.bm do |x|
   x.report('double gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }
   x.report('regexp gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }
end

results:

       user     system      total        real
double gsub  1.740000   0.080000   1.820000 (  1.824301)
regexp gsub  2.810000   0.020000   2.830000 (  2.832139)

And a lot slower in this case:

n = 100000
Benchmark.bm do |x|
   x.report('double gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }
   x.report('regexp gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }
end

results:

       user     system      total        real
double gsub  0.970000   0.110000   1.080000 (  1.071568)
regexp gsub  3.390000   0.040000   3.430000 (  3.446440)

so I would stick with it, unless someone knows of a better and faster way.

Plus, knowing that it's good enough for RACK should be comforting.

@justin808
Copy link
Member

:lgtm_strong: Thanks for all the hard work on this one, especially by doing the profiling.

Previously, mariusandra wrote…

Hi, I have added a comment in the source file and a changelog entry.

About the speed of 2 gsubs, I'm not sure, but I believe so. Since we're replacing one character with 6, we can't rely on the fast String.tr or String.delete this time. The only other way I can think of is one regexp gsub:

"he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')

When benchmarking, it's a bit slower for this simple string:

n = 1000000

Benchmark.bm do |x|

   x.report('double gsub') { n.times do ; "he\u2028llo \u2029world".gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }

   x.report('regexp gsub') { n.times do ; "he\u2028llo \u2029world".gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }

end

Results:


       user     system      total        real

double gsub  5.010000   0.110000   5.120000 (  5.135415)

regexp gsub  5.350000   0.030000   5.380000 (  5.389713)

A lot slower with a long string with many occurrences:

n = 10000 # smaller n

Benchmark.bm do |x|

   x.report('double gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }

   x.report('regexp gsub') { n.times do ; ("he\u2028llo \u2029world" * 100).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }

end

results:


       user     system      total        real

double gsub  1.740000   0.080000   1.820000 (  1.824301)

regexp gsub  2.810000   0.020000   2.830000 (  2.832139)

And a lot slower in this case:

n = 100000

Benchmark.bm do |x|

   x.report('double gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub("\u2028", '\u2028').gsub("\u2029", '\u2029'); end }

   x.report('regexp gsub') { n.times do ; (("hello" * 100) + "\u2028 \u2029" + ("world" * 100)).gsub(/[\u2028\u2029]/) {|s| '\u' + s.ord.to_s(16) }; end }

end

results:


       user     system      total        real

double gsub  0.970000   0.110000   1.080000 (  1.071568)

regexp gsub  3.390000   0.040000   3.430000 (  3.446440)

so I would stick with it, unless someone knows of a better and faster way.

Plus, knowing that it's good enough for RACK should be comforting.


Reviewed 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@justin808 justin808 merged commit 56c0c8d into shakacode:master May 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants