Whitespace characters in attribute values are not normalized #24

Closed
bwrrp opened this issue Oct 11, 2019 · 3 comments
Labels
bug Something isn't working

Comments


bwrrp commented Oct 11, 2019

I have an XML document containing an attribute whose value includes carriage return characters (https://github.com/w3c/xsdtests/blob/master/msData/regex/RegexTest_63.xml).

If I read the XML spec correctly regarding attribute value normalization (https://www.w3.org/TR/xml/#AVNormalize), each of these characters should be normalized to a single space character. However, the following test case shows that this does not currently happen:

```javascript
const saxes = require("saxes");
const p = new saxes.SaxesParser();
p.onopentag = n => console.log(JSON.stringify(n.attributes['att']));
p.write('<doc att="a\rb"/>').close();
// logs "a\rb" (expected "a b")
```
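For reference, the behavior the report expects can be sketched as below. This is a simplified illustration of the spec, not saxes's implementation. It assumes the raw attribute value contains no entity or character references (in the spec, a `&#xD;` character reference survives as a carriage return instead of becoming a space), and it covers only the CDATA case, which is what applies when no DTD declares the attribute. The function name `normalizeCdataAttribute` is hypothetical.

```javascript
// Hypothetical helper: simplified XML 1.0 attribute-value normalization
// for a CDATA attribute (XML 1.0 sections 2.11 and 3.3.3).
// Assumption: `raw` contains no entity or character references.
function normalizeCdataAttribute(raw) {
  // End-of-line handling (section 2.11): "\r\n" and a lone "\r" become "\n".
  const eolNormalized = raw.replace(/\r\n?/g, "\n");
  // Attribute-value normalization (section 3.3.3): each whitespace
  // character (#x20, #xA, #x9; #xD is already gone) becomes one #x20.
  return eolNormalized.replace(/[\t\n ]/g, " ");
}

console.log(JSON.stringify(normalizeCdataAttribute("a\rb")));     // "a b"
console.log(JSON.stringify(normalizeCdataAttribute("a\r\nb")));   // "a b"
console.log(JSON.stringify(normalizeCdataAttribute("a\r\r\rb"))); // "a   b"
```

Doing end-of-line handling first matters: a `\r\n` pair yields one space, not two.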
@lddubeau lddubeau added the bug Something isn't working label Oct 11, 2019
lddubeau (Owner) commented

That's a bug. Thank you for the informative report. I've been working on 4.0.0 lately (rc2 is out for trial and rc3 is in progress), and it fixes a slew of problems with EOL handling. I'm going to squeeze this fix in, probably in rc3. The bug was inherited from sax, and I did not catch it when I forked saxes from sax.

I recall looking at that section of the spec before, but I believe that was while working on salve, as a consumer of parser output rather than a producer of it.

@lddubeau lddubeau self-assigned this Oct 11, 2019
lddubeau (Owner) commented Oct 11, 2019

I've just pushed out rc4, which contains a fix. It would be good if you could try it. (rc3 was a misfire... ☹️)

(You'll have to install saxes@next to get rc4.)

bwrrp (Author) commented Oct 11, 2019

That was fast! I can confirm that each \r has now been turned into a space. The XML file linked in the original issue has no doctype, so no attribute types are declared. That is probably why the three resulting spaces are not further collapsed into one, since that extra step only applies to attributes declared with a type other than CDATA. So this seems to be working as expected. Thanks!
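The extra collapsing step mentioned above can be sketched as follows. Per XML 1.0 section 3.3.3, when an attribute is declared with a type other than CDATA, the processor additionally drops leading and trailing spaces and replaces each run of spaces with one space; attributes with no declaration should be treated as CDATA, so the three spaces stay. The function `normalizeAttribute` and its `isCdata` flag are hypothetical, and the sketch again assumes the raw value has no entity or character references.

```javascript
// Hypothetical helper: XML 1.0 attribute-value normalization (section 3.3.3),
// including the extra step for attributes declared with a non-CDATA type.
// Assumption: `raw` contains no entity or character references.
function normalizeAttribute(raw, isCdata) {
  // End-of-line handling (section 2.11), then each whitespace char -> #x20.
  const spaced = raw.replace(/\r\n?/g, "\n").replace(/[\t\n ]/g, " ");
  if (isCdata) {
    return spaced;
  }
  // Non-CDATA (e.g. NMTOKENS, IDREFS): collapse runs of #x20, then drop a
  // leading or trailing #x20. A regex is used instead of String.prototype.trim()
  // because trim() would also strip other Unicode whitespace such as U+00A0.
  return spaced.replace(/ +/g, " ").replace(/^ | $/g, "");
}

console.log(JSON.stringify(normalizeAttribute("a\r\r\rb", true)));  // "a   b"
console.log(JSON.stringify(normalizeAttribute("a\r\r\rb", false))); // "a b"
```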
