Whitespace in text tokenized as IGNORABLE_WHITESPACE in XmlReader #241
Comments
Note on the "by the way": Just tested the behavior of When |
never be recorded as ignorable whitespace, even when parsed as separate parts. This should fix #241.
Sorry about the delay. I've just fixed it in dev. The reader will still parse the content as separate events, but it will note that the whitespace is delimited by entities and thus will not attempt to detect whitespace there (and so will not generate ignorable whitespace events).
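To make the rule described in this comment concrete, here is a minimal sketch of one way such a check could work. It is not the library's actual fix: `Kind`, `Event`, `characterData` and `classifyWhitespace` are hypothetical names invented for this illustration, and the rule "whitespace is ignorable only when it does not touch other character data" is an assumption drawn from the comment.

```kotlin
// Hypothetical event model for illustration only; these are not the library's own types.
enum class Kind { START_ELEMENT, END_ELEMENT, TEXT, ENTITY_REF, CDSECT, IGNORABLE_WHITESPACE }

data class Event(val kind: Kind, val text: String = "")

// Kinds that contribute character data to the enclosing element.
val characterData = setOf(Kind.TEXT, Kind.ENTITY_REF, Kind.CDSECT)

/**
 * Re-labels whitespace-only TEXT events as IGNORABLE_WHITESPACE, unless the
 * event directly before or after them is character data (for example an
 * entity reference). In that case the whitespace is part of the text.
 */
fun classifyWhitespace(events: List<Event>): List<Event> =
    events.mapIndexed { i, e ->
        val blank = e.kind == Kind.TEXT &&
            e.text.isNotEmpty() &&
            e.text.all { it.isWhitespace() }
        val prev = events.getOrNull(i - 1)?.kind
        val next = events.getOrNull(i + 1)?.kind
        val touchesText =
            (prev != null && prev in characterData) ||
            (next != null && next in characterData)
        if (blank && !touchesText) e.copy(kind = Kind.IGNORABLE_WHITESPACE) else e
    }
```

Under this sketch, a `TEXT` event containing `" "` that sits between two `ENTITY_REF` events stays `TEXT`, while the same event between an `END_ELEMENT` and a `START_ELEMENT` becomes `IGNORABLE_WHITESPACE`.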
When I have this XML
I expect to get the following events when I iterate through it via the XmlReader:
1. START_DOCUMENT
2. START_ELEMENT localName="user"
3. TEXT text="dude "
4. ENTITY_REF text="&"
5. TEXT text=" "
6. ENTITY_REF text="<"
7. TEXT text="dudette"
8. ENTITY_REF text=">"
9. END_ELEMENT localName="user"
10. END_DOCUMENT
However, number 5 doesn't turn up as a TEXT but as an IGNORABLE_WHITESPACE.
I think this is a bug: this is not ignorable whitespace. Whitespace between XML elements, such as
<user>abc</user> <id>1234</id>
would be ignorable.
(By the way, the existence of CDSECT and ENTITY_REF was a pitfall (aka footgun) for me. I had assumed that the XmlReader would already deliver all text content in one piece, i.e. I expected there would be just TEXT text="dude & <dudette>" and then END_ELEMENT.)
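For readers who hit the same pitfall, one workaround is to merge adjacent text-like events on the consumer side. The sketch below is an illustration under stated assumptions, not part of the library: `Kind`, `Event`, `textLike` and `coalesceText` are hypothetical names, and it assumes that entity reference events already carry their resolved text (as in the listing above, where `ENTITY_REF` has `text="&"`).

```kotlin
// Hypothetical event model for illustration only; these are not the library's own types.
enum class Kind { START_ELEMENT, END_ELEMENT, TEXT, ENTITY_REF, CDSECT, IGNORABLE_WHITESPACE }

data class Event(val kind: Kind, val text: String = "")

// Event kinds treated as pieces of one logical text node.
val textLike = setOf(Kind.TEXT, Kind.ENTITY_REF, Kind.CDSECT)

/** Merges each run of adjacent text-like events into a single TEXT event. */
fun coalesceText(events: List<Event>): List<Event> {
    val result = mutableListOf<Event>()
    val pending = StringBuilder()

    // Emits the accumulated text (if any) as one TEXT event.
    fun flush() {
        if (pending.isNotEmpty()) {
            result += Event(Kind.TEXT, pending.toString())
            pending.clear()
        }
    }

    for (e in events) {
        if (e.kind in textLike) {
            pending.append(e.text)
        } else {
            flush()      // a non-text event ends the current run
            result += e
        }
    }
    flush()
    return result
}
```

Applied to events 2 through 9 of the listing above, this yields three events: START_ELEMENT, a single TEXT with text `dude & <dudette>`, and END_ELEMENT. Events of kind IGNORABLE_WHITESPACE are deliberately left alone, so whitespace between elements is not turned into text.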