CDATA scanning in XML not behaving properly #531
Comments
I have added a new feature to neko in version 4.6.0: 'http://cyberneko.org/html/features/scanner/cdata-early-closing'. You have to set this if you are parsing XHTML code, because there we do not have to do this strange early closing. Hopefully there is a way to do this from AntiSamy.
@spassarop - Can you research this? Neko-htmlunit v4.6.0 is included in the AntiSamy:1.7.7 we just released.
I was not able to reproduce such output entirely. I tried adding the custom tags to the default policy and scanning the whole XML, and I also tried scanning just the CDATA. I had to guess both the policy and the input string, as they were not explicitly stated with a code example.

What I do get is this kind of output regarding the CDATA section in every scan

If I add the feature @rbri mentioned, the output changes to

What we can do, if that matches the desired behavior, is add a new directive that allows setting that feature in the SAX and DOM parsers by policy. What I am not sure about is which default value to use; probably the best would be setting it to

@akshay-kr, if this description and analysis seem accurate for your needs, let us know.
Sorry for making things a bit more complicated, but I found another issue in neko regarding the validation of attribute names. The root cause is more or less the same as for this one: when parsing HTML, some things are really different (and more complicated) compared to parsing XHTML.
Actually, we are using the AntiSamy plugin to parse XML with content inside a CDATA section, which used to work before commit HtmlUnit/htmlunit-neko@49a31c0 was added in htmlunit-neko.
For example an XML like this,
Before this commit, the result for the CDATA scanning part was
<![CDATA[<div></div>]]>
but after this commit the result is
<![CDATA[<div]]>]]>
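To make the difference concrete, here is a toy model of the two termination rules. It is only a sketch and an assumption on my side: it is not neko's actual scanner code, and `cdataContent` and its `earlyClosing` flag are hypothetical names. It assumes that "early closing" means the section ends at the first `>`, which would be consistent with the `<div` seen in the output above, while the XML rule ends the section only at `]]>`.

```java
class CdataScanSketch {
    private static final String OPEN = "<![CDATA[";

    // Returns the text treated as the CDATA section's content, or null if
    // the input has no CDATA opener.
    // earlyClosing = true  : stop at the first '>' (assumed HTML-style handling)
    // earlyClosing = false : stop only at "]]>" (the XML rule)
    static String cdataContent(String input, boolean earlyClosing) {
        int start = input.indexOf(OPEN);
        if (start < 0) {
            return null;
        }
        int from = start + OPEN.length();
        int end = earlyClosing ? input.indexOf('>', from) : input.indexOf("]]>", from);
        if (end < 0) {
            end = input.length(); // unterminated section: take the rest
        }
        return input.substring(from, end);
    }

    public static void main(String[] args) {
        String input = "<![CDATA[<div></div>]]>";
        System.out.println(cdataContent(input, false)); // prints <div></div>
        System.out.println(cdataContent(input, true));  // prints <div
    }
}
```

Under this assumption, the leftover `</div>` and `]]>` would then be handled as ordinary markup and text, which may explain the trailing `]]>` in the new result.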
We are parsing this XML, specifically the content inside the CDATA section, and then storing it. Later, when viewing, we extract the content inside the CDATA section and render it on the web page.
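For reference, this is roughly the behaviour we expect from an XML point of view, sketched with the JDK's built-in DOM parser rather than AntiSamy or neko. The `<note>` wrapper element and the `rootText` helper are hypothetical, made up only for this illustration; our real XML and code differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataExtractSketch {
    // Parses an XML string and returns the text of the root element,
    // which includes the raw content of any CDATA section inside it.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Defensive setting for untrusted input: refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical wrapper element around the CDATA section.
        String xml = "<note><![CDATA[<div></div>]]></note>";
        System.out.println(rootText(xml)); // prints <div></div>
    }
}
```

With an XML parser the section ends only at `]]>`, so the markup inside it comes back unchanged and can be stored and rendered later.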
I also raised an issue for the same on the htmlunit-neko repo: HtmlUnit/htmlunit-neko#125
Is this the expected behaviour going forward? Is there a way we can bring back the previous behaviour for folks who may be using this for XML content parsing?