Stop adding whitespace in html that is not in the source document #165

Merged: 2 commits, Mar 25, 2018

Zegnat (Member) commented Mar 25, 2018

This seems to be an under-documented behaviour of DOMDocument::saveHTML: it sometimes adds a \n at the end of its output. So if you serialize nodes one at a time and concatenate the resulting strings, you may introduce line breaks that weren’t in the original source.

I think adding a trim of some sort is wrong, as you might then also be trimming Text nodes that actually should contain the line break.

Instead, what I have found to fix this is to move all the nodes into a DocumentFragment and retrieve the HTML of that fragment in one go.


Prior to this fix, the parser returned the following. Note the \n in ["properties"]["content"][0]["html"]:

<div class="h-entry"><div class="e-content"><p>1</p><p>2</p></div></div>
{
  "type": [ "h-entry" ],
  "properties": {
    "content": [ {
      "html": "<p>1</p>\n<p>2</p>",
      "value": "1 2"
    } ]
  }
}
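
To make the difference concrete, here is a minimal sketch of the two approaches side by side. The function names `innerHtmlByConcatenation` and `innerHtmlViaFragment` are hypothetical and not taken from the parser's source, and the sketch clones the child nodes into the fragment rather than moving them, purely so the example leaves the document untouched. Whether the stray \n actually shows up may depend on the libxml version in use.

```php
<?php
// Hypothetical helpers for illustration only; not the parser's actual code.

// Serialize each child separately and concatenate the strings.
// Each saveHTML() call may append "\n" after a block-level element.
function innerHtmlByConcatenation(DOMElement $el): string
{
    $html = '';
    foreach ($el->childNodes as $child) {
        $html .= $el->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Collect the children in a DocumentFragment and serialize it once.
// Assumption: cloning instead of moving, so $el keeps its children.
function innerHtmlViaFragment(DOMElement $el): string
{
    $doc = $el->ownerDocument;
    $fragment = $doc->createDocumentFragment();
    foreach ($el->childNodes as $child) {
        $fragment->appendChild($child->cloneNode(true));
    }
    return $doc->saveHTML($fragment);
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="h-entry"><div class="e-content"><p>1</p><p>2</p></div></div>');
$content = (new DOMXPath($doc))->query('//div[@class="e-content"]')->item(0);

var_dump(innerHtmlByConcatenation($content)); // may contain "\n" between the <p> elements
var_dump(innerHtmlViaFragment($content));
```

Comparing the two dumps on your own setup should show whether the concatenated version picks up the extra line break reported above.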

aaronpk added this to the 0.4.2 milestone Mar 25, 2018
aaronpk merged commit 5eeef8b into microformats:master Mar 25, 2018