Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gherkin: (C) Fix Windows related issues #193

Merged
merged 4 commits into from
May 1, 2017

Conversation

brasmusson
Copy link
Contributor

@brasmusson brasmusson commented May 1, 2017

Summary

Fix Windows related issues:

In addition make the code conform to C90.

Details

When looking into #168, a set of problem were found and fixed.

  • Use UTF-16 internally when the wchar_t is 2 bytes in size.
    Most Windows compilers use 2 bytes for wide characters (wchar_t), that mean the not all unicode code points can be represented by a wide character. To encode code points from the Supplementary Planes UTF-16 are used (using two wide characters for each code point). The decoding of UTF-8 (which is done directly in the code) need to use UTF-16 in the internal wide character representation when necessary.
  • Always output UTF-8 encoded data (fixes Gherkin C tests fail on Wine #168).
    Windows generally does not use UTF-8, even when using unicode it is most likely UTF-16 and not UTF-8. Therefore it seems best to always do the UTF-8 encoding directly in the code, instead of relying on print functions form the complier libraries.
  • Avoid swprintf
    Many (older) Windows compilers use a different signature for swprintf than the ISO standard (excluding the count argument in the signature), therefore it is better to avoid using it and instead calculate the string representation of the error location directly (which was the only use of swprintf).
    Another reason to avoid swprintf is that it did not exist in the ISO standard until C99.

In addition make the code conform to C90.
After the use of swprintf was removed, only the use of // for some commented out code was only thing that inhibit the compilation of the code with GCC (>v4.6) using -std=c90 (or -ansi), so the use of // was removed.

Interestingly enough using -std=c90` with GCC (>v4.6) did not complain on repeated typedefs even though they are not allowed until C11. GCC (<v4.6) on the other hand does not allow them at all, so they are removed.

Motivation and Context

Fixes #168.

How Has This Been Tested?

On OS name: "linux", version: "4.8.0-36-generic", arch: "amd64", family: "unix" (ubuntu 16.04) using:

  • gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  • clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
  • gcc (Alpine 6.2.1) 6.2.1 20160822 (docker)

On OS name: "linux", version: "4.4.0-72-generic", arch: "i386", family: "unix" (ubuntu 16.04) using:

  • gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  • clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)

On OS name: "windows 7", version: "6.1", arch: "amd64", family: "windows" using:

  • x86_64-w64-mingw32-gcc (GCC) 5.4.0
  • x86_64-pc-cygwin-gcc (GCC) 5.4.0

On OS name: "windows xp", version: "5.1", arch: "x86", family: "windows" using:

  • i686-pc-mingw32-gcc (GCC) 4.5.2
  • i686-pc-cygwin-gcc (GCC) 4.7.3

Types of changes

  • Bug fix (non-breaking change which fixes an issue).
  • New feature (non-breaking change which adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).

Checklist:

  • I've added tests for my code (tested by the acceptance tests).
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

On Windows wchar_t is 2 bytes large, and use UTF-16. This means that
for the case of code points > 0xFFFF (and wchar_t is only 2 bytes
large), the code point read from the UTF-8 source need to be converted
to two UTF-16 surrogates (wchar_t wide characters).
Windows does not use UTF-8 by default. To make sure that UTF-8 is used
in the output, do manual conversion from unicode/UTF-16 to an UTF-8
byte sequence for output. It seems safer that to try to add Windows
specific code to set UTF-8 output on that platform.
To support a wider range of compilers, remove the C90 breaches so that
the code compiles also with -ansi and/or -std=c90 (using gcc).
Another reason to not use swprintf is that on many Windows compilers
swprintf has a different signature than the ISO standard specification.
When cleaning up commented out code in gherkin_generate_tokens.c (to
remove the usage of // for comments), settle to use the
FileTokenScanner to get coverage of it in the acceptance tests.

The code does not compile with -std=c90 -pedantic (using gcc), the
conformance to C90 is not taken that far.
Technically C does not support repeated typedefs until C11, however
modern compilers do allow it (gcc also using -std=c90). But since
gcc <v4.6 does not support that, remove the usage of them.
@aslakhellesoy aslakhellesoy merged commit 6cdd13e into master May 1, 2017
@brasmusson brasmusson deleted the gherkin-c-2-byte-wchar-t branch May 1, 2017 18:17
aslakhellesoy added a commit that referenced this pull request May 1, 2017
@lock
Copy link

lock bot commented Oct 24, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Gherkin C tests fail on Wine
2 participants