Ensure html attachments without <tbody> elements duplicate successfully #250
Description
At the moment, there's a bug where some attachments are not successfully duplicated when a new edition is created by the Edition#create_draft method. We've received a couple (or more) Zendesk tickets regarding the issue.
After some digging, it's become clear that this occurs when a user inputs questionable HTML for tables, specifically when they don't use a <tbody> for the body of the table.
When a user saves an HTML attachment on Whitehall, Whitehall validates it via the Govspeak::HtmlValidator.
This calls the Document#to_html method. When we pass sanitize: true into the options, we then call the Sanitize#sanitize method from the Sanitize gem, which uses the fragment method. Here's the code: https://github.com/rgrove/sanitize/blob/main/lib/sanitize.rb#L66-L68
The Nokogiri::HTML5.fragment method has some cool, but slightly unexpected, behaviour that is outlined here: https://makandracards.com/makandra/481802-how-to-prevent-nokogiri-from-fixing-invalid-html
Essentially though it fixes up invalid HTML. One side effect is that it can add extra linebreaks that were not there before, which is happening in this case.
By the time the HTML is compared with and without sanitisation in the Validator, subtle whitespace changes have snuck in, which causes the == comparison to fail.
While we could definitely implement a bespoke fix for this issue, we've decided to go with removing linebreaks from the HTML before comparing it with and without sanitisation. The linebreaks are largely just noise and could definitely cause issues again down the line when further releases occur.
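To illustrate the approach, here is a minimal sketch of a linebreak-insensitive comparison. The helper names (normalise_html, same_ignoring_linebreaks?) and the sample strings are hypothetical and are not taken from the Govspeak source. The second string only stands in for sanitiser output that differs from the input by added linebreaks.

```ruby
# Hypothetical helpers for illustration; not the actual Govspeak::HtmlValidator code.
def normalise_html(html)
  # Drop carriage returns and newlines so they cannot affect equality.
  html.gsub(/[\r\n]+/, "")
end

def same_ignoring_linebreaks?(original_html, sanitised_html)
  normalise_html(original_html) == normalise_html(sanitised_html)
end

# Illustrative strings only: the second one simulates output that differs
# from the input purely by inserted linebreaks.
original  = "<table><tr><td>Cell</td></tr></table>"
sanitised = "<table>\n<tr><td>Cell</td></tr>\n</table>"

puts original == sanitised                           # strict comparison: false
puts same_ignoring_linebreaks?(original, sanitised)  # linebreaks ignored: true
```

Note that a comparison like this still fails when sanitisation changes anything other than linebreaks, so markup that is actually stripped by the sanitiser would continue to be flagged.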
Trello card
https://trello.com/c/HSlot63V/623-investigate-bug-html-attachments-sometimes-dont-copy-over-to-new-drafts