Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CData overwriting bug #43

Merged
merged 3 commits into from
Dec 26, 2016
Merged

Conversation

calebmer
Copy link
Contributor

@calebmer calebmer commented Dec 24, 2016

I have some XML that looks like this:

      <content:encoded>
<![CDATA[<a name="How.to.meet.someone.new"></a>
<h1>How to meet someone new</h1>

<a name="L.and.make.them.like.you."></a>
<h2>(and make them like you)</h2>

<ul>
<li>Smile</li>
<li>Be inquisitive, ask thoughtful questions</li>
<li>body language, mirroring</li>
<li>Be open to peoples POV</li>
<li>What are your Goals when meeting someone new?</li>
</ul>

]]>
      </content:encoded>

Which is valid XML. However, when parsed by xmldoc the val property for the content:encoded element is: '\n '. This is because the white space (in this case '\n ') emits a text event in the SAX parser after a cdata event has already been emit. So the text event overrides the cdata event even though isValCdata is true. This behavior seems to be very fault intolerant for what is acceptable by most other XML parsers. As another note, this behavior also makes XML that looks like <a>b<c/>d</a> very difficult to work with.

This PR changes the behavior of this library so that comments, cdata, and text may all be treated as XML nodes similarly to the XMLElement object. The val property is still maintained for convenience by appending cdata and text values, but to get comments you now have to look at the children array.

The breaking changes that occur because of this change:

  • No more isValCData or isValComment. An element can have more then one child node type.
  • Comments are no longer added to val. You can find them in children instead.
  • The children, firstChild, and lastChild properties may be a text, cdata, comment, or element object and not just an element object as they were in the past. To help filter out the objects you don’t want, use the new type property.

If you only use the convenience methods, nothing should change for you.

There are simpler ways to fix the bug I expressed above, I just feel like this is an important change to make for the stability of this library given it may cause other bugs.

Thanks so much for your work, this is a great library 👍

[Edit by @nfarina]: Closes #23, #40

@nfarina
Copy link
Owner

nfarina commented Dec 25, 2016

At first glance this looks like a really great overhaul that will make a lot of folks very happy :) I will look more in depth this week!

@nfarina
Copy link
Owner

nfarina commented Dec 25, 2016

Actually just finished reviewing. I think it's perfect. Probably should bump the major version after this though, what do you think?

@calebmer
Copy link
Contributor Author

I'd agree about the major version bump, although if you're not comfortable moving to 1.x you're still allowed to make breaking changes in the 0.x line.

@nfarina
Copy link
Owner

nfarina commented Dec 26, 2016

I think it's finally time for 1.0 - been long enough! Thanks again for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants