Cabal config corrupted when using Unicode #2557
Comments
Looks like we just need to let GHC handle the text encoding for us, rather than going through ByteString.
Once this is fixed, we should also add a regression test.
@ttuegel you self-assigned this bug 2 years ago... did you make any progress? Do you mind if I try giving this one a shot myself? ;-)
@hvr Have at it! I think at the time I had some reason to believe this would be simple to fix, but I never got to it.
The config-state header is a human-readable line prepended to the binary serialisation, which looks like

Saved package config for pkgname-1.2.3 written by Cabal-2.5.0.0 using ghc-8.6

However, the functions generating and parsing this header didn't take into account that package names are not limited to the ASCII subset. They blindly used the ByteString `pack` function, which truncates away the high bits of the `Char` code point, resulting in a corrupted header with a nonsensical package name.

The fix is simply to serialise the package name with the UTF-8 encoding, which works nicely with the rest of the UTF-8-unaware string handling functions. Hence the fix is a lot shorter than this commit message.

Fixes haskell#2557
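The truncation described in the commit message can be reproduced with a small sketch. This is not Cabal's actual code: `encodeUtf8`, `encodeChar`, `headerTruncated` and `headerUtf8` are hypothetical names introduced here for illustration, and the UTF-8 encoder is hand-written only to keep the example dependent on `base` and `bytestring` alone.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode a String as UTF-8 bytes.
-- Sketch only: surrogate code points are not rejected.
encodeUtf8 :: String -> B.ByteString
encodeUtf8 = B.pack . concatMap encodeChar

encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- leading bits of the code point, for the first byte
    hi s = fromIntegral (n `shiftR` s)
    -- a continuation byte carrying six bits of the code point
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Char8.pack keeps only the low 8 bits of every Char,
-- so a non-ASCII package name is silently corrupted.
headerTruncated :: String -> B.ByteString
headerTruncated pkg = B8.pack ("Saved package config for " ++ pkg)

-- Encoding as UTF-8 first preserves the whole code point.
headerUtf8 :: String -> B.ByteString
headerUtf8 pkg = encodeUtf8 ("Saved package config for " ++ pkg)
```

For a package name containing `'\955'` (Greek lambda, U+03BB), `headerTruncated` stores the single byte `0xBB`, while `headerUtf8` stores the two bytes `0xCE 0xBB`. The ASCII part of the header is byte-identical in both versions, which is why byte-oriented parsing of the rest of the line keeps working.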
…encoding

This takes care of knock-on effects of haskell#2557.

Specifically, the `Paths_*.hs` and `cabal_macros.h` files would be encoded incorrectly by a `rewriteFileEx` which isn't UTF-8 capable.

Now the `cabal_macros.h` file is written out exactly like the `.h` file generated internally by `ghc`; note, however, that standard CPP doesn't support non-ASCII characters in CPP symbols, so such a file will not work with a standard CPP preprocessor.
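As a rough illustration of what "UTF-8 capable" file writing means here, the sketch below forces the handle encoding to UTF-8 instead of relying on the locale or on byte truncation. It does not reproduce Cabal's `rewriteFileEx`; `writeFileUtf8` and `roundTripBytes` are hypothetical helpers, and only `System.IO` and `bytestring` functions are assumed.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO (IOMode (WriteMode), hPutStr, hSetEncoding, utf8, withFile)

-- Hypothetical helper: write a String to a file as UTF-8,
-- independent of the locale encoding of the current process.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path contents =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h contents

-- Hypothetical helper: write the text, then read the raw bytes
-- back so the on-disk encoding can be inspected.
roundTripBytes :: FilePath -> String -> IO [Word8]
roundTripBytes path contents = do
  writeFileUtf8 path contents
  B.unpack <$> B.readFile path
```

Writing the single character `'\955'` (U+03BB) this way should leave the bytes `0xCE 0xBB` on disk, which is what a UTF-8-aware reader such as `ghc` expects.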