AO3-5826 Update nokogiri to 1.10.5 (#3697)
* [Security] Bump nokogiri from 1.10.4 to 1.10.5

Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.4 to 1.10.5. **This update includes a security fix.**
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/master/CHANGELOG.md)
- [Commits](sparklemotion/nokogiri@v1.10.4...v1.10.5)

Signed-off-by: dependabot-preview[bot] <[email protected]>

* Remove code for closing unclosed tags that relied on an error message that no longer exists in libxml2

* Test that shows contents of third paragraph are not as expected

* Remove code for closing unclosed tags that relied on an error message that no longer exists in libxml2

* ACTUALLY the test that shows the contents of the third paragraph

* Third time is the charm showing the useful test failure

* Write a passing test and add a comment explaining it does not really resemble reality
sarken authored and redsummernight committed Nov 28, 2019
1 parent cccba97 commit 3eb3371
Showing 3 changed files with 13 additions and 12 deletions.
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -261,7 +261,7 @@ GEM
     netrc (0.11.0)
     newrelic_rpm (6.4.0.356)
     nio4r (2.3.1)
-    nokogiri (1.10.4)
+    nokogiri (1.10.5)
       mini_portile2 (~> 2.4.0)
     nokogumbo (1.4.9)
       nokogiri
7 changes: 0 additions & 7 deletions lib/html_cleaner.rb
@@ -354,15 +354,8 @@ def close_unclosed_tag(text, tag, line_number)
   end
 
   def add_paragraphs_to_text(text)
-    # By default, Nokogiri closes unclosed tags very late, often at
-    # the end of the document. We want runaway tags closed at the end
-    # of the line
     doc = Nokogiri::XML.parse("<myroot>#{text}</myroot>")
     doc.errors.each do |error|
-      match = error.message.match(/Premature end of data in tag (\w+) line (\d+)/)
-
-      text = close_unclosed_tag(text, match[1], match[2]) if match
-
       match = error.message.match(/Opening and ending tag mismatch: (\w+) line (\d+) and myroot/)
       text = close_unclosed_tag(text, match[1], match[2]) if match
     end
16 changes: 12 additions & 4 deletions spec/miscellaneous/lib/html_cleaner_spec.rb
@@ -845,7 +845,15 @@
       expect(doc.xpath("./p[contains(@class, 'bar')]").children.to_s.strip).to eq("foobar")
     end
 
-    it "should close unclosed inline tags before double linebreak" do
+    # When we call add_paragraphs_to_text, everything gets wrapped inside myroot
+    # tags, and the closing myroot tag is treated as a mismatch for strong, ergo
+    # strong is closed on the second paragraph while the em tag remains open.
+    # In real world use, however, this content would most likely be run through
+    # Sanitize.clean first, which would close both the em and strong tags at the
+    # very end, so we wouldn't have a mismatch and the strong tag would be
+    # reopened in every paragraph, just like the em tag is. More info at:
+    # https://github.com/otwcode/otwarchive/pull/3692#issuecomment-558740913
+    it "should close mismatched tags" do
       html = """Here is an unclosed <em>em tag.
 Here is an unclosed <strong>strong tag.
@@ -854,11 +862,11 @@
 
       doc = Nokogiri::HTML.fragment(add_paragraphs_to_text(html))
       expect(doc.xpath("./p[1]/em").children.to_s.strip).to eq("em tag.")
-      expect(doc.xpath("./p[2]/strong").children.to_s.strip).to eq("strong tag.")
-      expect(doc.xpath("./p[3]").children.to_s.strip).to eq("Stuff.")
+      expect(doc.xpath("./p[2]/em/strong").children.to_s.strip).to eq("strong tag.")
+      expect(doc.xpath("./p[3]/em").children.to_s.strip).to eq("Stuff.")
     end
 
-    it "should close unclosed tag withing other tag" do
+    it "should close unclosed tag within other tag" do
       pending "Opened bug report with Nokogiri"
       html = "<strong><em>unclosed</strong>"
       doc = Nokogiri::HTML.fragment(add_paragraphs_to_text(html))
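
Note on the lib/html_cleaner.rb change: add_paragraphs_to_text decides which tags to close by matching the text of parser error messages, so a branch stops firing as soon as the parser no longer emits that wording. The sketch below illustrates this with plain strings and does not call Nokogiri. The two patterns are copied from the diff; the constant names, the flagged_tags helper, and the sample messages are hypothetical, written to fit the patterns rather than captured from real parser output.

```ruby
# Hypothetical constant names; the regexes themselves come from lib/html_cleaner.rb.
PREMATURE_END = /Premature end of data in tag (\w+) line (\d+)/
TAG_MISMATCH  = /Opening and ending tag mismatch: (\w+) line (\d+) and myroot/

# Hypothetical helper: returns [tag, line_number] pairs for every
# message that matches the given pattern, skipping the rest.
def flagged_tags(messages, pattern)
  messages.map { |message| message.match(pattern) }
          .compact
          .map { |match| [match[1], match[2].to_i] }
end

# Made-up messages shaped to fit the patterns (assumption, not real output).
sample_messages = [
  "Opening and ending tag mismatch: strong line 2 and myroot",
  "Some other parser complaint"
]

p flagged_tags(sample_messages, TAG_MISMATCH)   # [["strong", 2]]
p flagged_tags(sample_messages, PREMATURE_END)  # []
```

With no message matching PREMATURE_END, that branch never calls close_unclosed_tag, which is consistent with the commit removing it as dead code while keeping the mismatch branch.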