Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Produce valid xml and preserve utf8 characters in attributes #588

Closed
wants to merge 3 commits into from

Conversation

mrpi
Copy link

@mrpi mrpi commented Feb 6, 2016

The XML escaping had three problems:

  • The escape sequence "&#xXX" always has to be terminated with a semicolon
  • Bit-values larger than SCHAR_MAX had been converted to "&#xFFFFFFXX"
  • The input was expected to have exactly one byte per char which is not valid for UTF8. As UTF8 is the default encoding on most modern platforms and the default in XML it seems reasonable to keep valid UTF8 encoded input untouched.

My patch is not perfect for platforms/source code working with other encodings than ASCII or UTF8. But at least the created XML should be accepted by a valid parser.
Everything else would require C++11 u8""-string literals all over the place or an xml encoder that performs character set conversion.


class XmlEncode {
public:
enum ForWhat { ForTextNodes, ForAttributes };
enum ForWhat { ForTextNodes, ForAttributes };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y u trailing whitespace

mrpi added 2 commits February 6, 2016 16:08
- Minor adjustments to mimic styling of surrounding code
  P.S.: a .clang-format file would be nice ;-)
@gatopeich
Copy link

When is this going to be in a Catch release? The issue renders JUnit output unusable in many cases...

razeh added a commit to razeh/Catch that referenced this pull request Aug 24, 2016
This commit fixes the following scenario:
  * You have a test that compares strings with embedded control
  characters.
  * The test fails.
  * You are using JUnit tests within TeamCity.

Before this commit, the JUnit report watcher fails on parsing the XML
for two reasons: the control characters are missing a semicolon at the
end, and the XML document doesn't specify that it is XML 1.1.

XML 1.0 --- what we get if we don't specify an XML version --- doesn't support embedding control characters --- see
http://stackoverflow.com/questions/404107/why-are-control-characters-illegal-in-xml
for all of the gory details.

This is based on PR catchorg#588 by @mrpi
@razeh razeh mentioned this pull request Aug 24, 2016
@philsquared
Copy link
Collaborator

There were a few issues here, and several iterations attempting to address them (thanks for all involved, btw). I'm now not entirely clear if any of it is still outstanding?
Could you please confirm one way or the other, and if anything is outstanding please restate?

@mrpi
Copy link
Author

mrpi commented Oct 14, 2016

Just tested with Catch v1.5.7.
This test:
TEST_CASE("xml compare failing") { REQUIRE(std::string(u8"普通話") == u8"北方方言"); }

produces this xml:
<TestCase name="xml compare failing"> <Expression success="false" type="REQUIRE" filename="/home/ludger/projects/liquidpp/test/render.cpp" line="23"> <Original> std::string(u8"&#xFFFFFFE6;&#xFFFFFF99;&#xFFFFFFAE;&#xFFFFFFE9;&#xFFFFFF80;&#xFFFFFF9A;&#xFFFFFFE8;&#xFFFFFFA9;&#xFFFFFFB1;") == u8"&#xFFFFFFE5;&#xFFFFFF8C;&#xFFFFFF97;&#xFFFFFFE6;&#xFFFFFF96;&#xFFFFFFB9;&#xFFFFFFE6;&#xFFFFFF96;&#xFFFFFFB9;&#xFFFFFFE8;&#xFFFFFFA8;&#xFFFFFF80;" </Original> <Expanded> "&#xFFFFFFE6;&#xFFFFFF99;&#xFFFFFFAE;&#xFFFFFFE9;&#xFFFFFF80;&#xFFFFFF9A;&#xFFFFFFE8;&#xFFFFFFA9;&#xFFFFFFB1;" == "&#xFFFFFFE5;&#xFFFFFF8C;&#xFFFFFF97;&#xFFFFFFE6;&#xFFFFFF96;&#xFFFFFFB9;&#xFFFFFFE6;&#xFFFFFF96;&#xFFFFFFB9;&#xFFFFFFE8;&#xFFFFFFA8;&#xFFFFFF80;" </Expanded> </Expression> <OverallResult success="false"/> </TestCase>

http://www.xmlvalidation.com says:

Errors in the XML document:
4: 38 Character reference "&#xFFFFFFE6" is an invalid XML character.

@horenmar horenmar added the Resolved - pending review Issue waiting for feedback from the original author label Jan 13, 2017
@horenmar
Copy link
Member

@mrpi If I run your test case against current master, it works. As long as its on Linux.

If I try it on Windows, I end up with "Warning C4566: character represented by universal-character-name '\u8A00' cannot be represented in the current code page (1252)" and invalid XML, but I blame that on wrong setting somewhere (which I am not able to find nor google properly).

@philsquared
Copy link
Collaborator

Works for me too on macOS (with Clang).

@philsquared
Copy link
Collaborator

Could anyone who originally had problems here (@mrpi?, @gatopeich?) confirm or deny whether there are still issues with this?

@mrpi
Copy link
Author

mrpi commented Feb 6, 2017

XML output looks good now!
Thanks!

@philsquared
Copy link
Collaborator

Cool - thanks!
I'm closing this now, then

@philsquared philsquared closed this Feb 6, 2017
@philsquared philsquared removed the Resolved - pending review Issue waiting for feedback from the original author label Feb 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants