Stack MUST ignore and override user locale variables #4859

Closed
fare opened this issue Jun 12, 2019 · 7 comments

fare commented Jun 12, 2019

General summary/comments (optional)

While building source code, compilers and build tools should ALWAYS process each source file using the encoding in which it was written by its authors and released by its maintainers, and NEVER with the locale inherited from the end user whenever that introduces any discrepancy. The proper thing to do is thus to NEVER, EVER heed the user-inherited locale variables LANG and LC_*: the very idea flies in the face of the determinism that stack aims for. If some interactive flag allows the user to explicitly inherit those variables, any discrepancy in encoding should still lead to a prominent warning unless explicitly silenced.

The only defaults that make any sense for the locale are POSIX and en_US.UTF-8. The POSIX default would impose needless pain for no gain whatsoever at a time when UTF-8 is a widely accepted and supported standard, so the only sensible and useful default is en_US.UTF-8 (I would have proposed the more neutral C.UTF-8, but it doesn't work on Darwin).

I hit this bug while using stack to build a Haskell program that depends on language-javascript, and had a painful debugging session until I found how to configure a suitable shell.nix for stack.yaml. Drilling down to root causes led me to find that it's a fundamental bug in all of Nix, Cabal, Hackage and Stack. Remarkably, I fixed the very same issue in Common Lisp, where the build system ASDF assumes that all source code is UTF-8 by default, unless overridden by the library maintainers, and never heeds the user locale. The switch was slightly painful: it meant hounding the maintainers of tens of libraries, and actually flipping the switch only a year after warning everyone. The switch should be simpler for stack, as I suspect no one uses latin1, latin2, euc-jp or koi8-r anymore in any Haskell package.

See also:
https://www.snoyman.com/blog/2016/12/beware-of-readfile
agda/agda#2922
input-output-hk/cardano-sl@ed8c892

NB: I filed the same issue against nixpkgs and cabal:
NixOS/nixpkgs#63014
haskell/cabal#6076

Steps to reproduce

(unset LANG LC_ALL LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION;
stack build language-javascript )

Expected

It should compile successfully, as if I had built with LC_ALL=en_US.UTF-8

Actual

--  While building package language-javascript-0.6.0.12 using:
      /home/fare/.stack/setup-exe-cache/x86_64-linux-nix/Cabal-simple_mPHDZzAJ_2.4.0.1_ghc-8.6.5 --builddir=.stack-work/dist/x86_64-linux-nix/Cabal-2.4.0.1 build --ghc-options " -ddump-hi -ddump-to-file -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1
    Logs have been written to: /home/fare/.stack/global-project/.stack-work/logs/language-javascript-0.6.0.12.log

    Configuring language-javascript-0.6.0.12...
    Preprocessing library for language-javascript-0.6.0.12..
    happy: src/Language/JavaScript/Parser/Grammar7.y: hGetContents: invalid argument (invalid byte sequence)

Stack version

$ stack --version
1.9.3.1 x86_64 hpack-0.31.2

Method of installation

  • via nix-env from nixpkgs-unstable.
@dbaynard
Contributor

Thanks @fare for raising this issue! As I understand it, the specific bug report is that Cabal reads input with the encoding set in the locale; the general bug is that any such code will use the locale setting. Have I understood you correctly?

Author

fare commented Jun 12, 2019

The immediate breakage was from happy rather than Cabal. But if Cabal or any other program reads a file using System.IO primitives that heed the user locale, this only introduces opportunities for failures and discrepancies. I should probably file the very same bug against Cabal, though.

Even if Cabal fixes this bug, stack should probably fix it on its side, too. And so should nix: the same issue applies to any build tool that aims at any reproducibility.

Author

fare commented Jun 12, 2019

For reference, my workaround was to use a shell-file: shell.nix entry in my stack.yaml, where shell.nix has the following contents (the conditional is there for the sake of macOS):

{ghc}:
with (import <nixpkgs> {});

haskell.lib.buildStackProject {
  inherit ghc;
  name = "alacrity";
  buildInputs = [ z3 ];
  shellHook = lib.optionalString (glibcLocales != null) ''
    export LOCALE_ARCHIVE="${glibcLocales}/lib/locale/locale-archive"
  '' + ''
    export LC_ALL=C.UTF-8
  '';
}

Of course, the above only works when using Nix. A more general solution should work even without Nix, and should probably use en_US.UTF-8 instead.
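
Outside Nix, the same effect can be approximated from the calling shell. The following is only a sketch: `run_with_fixed_locale` is a hypothetical helper name, not something stack provides, and it assumes the en_US.UTF-8 locale is installed on the machine.

```shell
# Hypothetical helper (not part of stack): run a command with the user's
# locale variables cleared and a fixed UTF-8 locale in their place.
run_with_fixed_locale() {
  # Work in a subshell so the caller's own environment is left untouched.
  (
    unset LANG LC_ALL LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE \
          LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS \
          LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION
    # Assumption: en_US.UTF-8 exists on this system.
    exec env LC_ALL=en_US.UTF-8 "$@"
  )
}

# Usage (assumes stack is on PATH):
#   run_with_fixed_locale stack build language-javascript
```

This only pins the locale for one invocation; it does not change what the tools themselves do when they read files.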

fare changed the title from "Stack MUST override user locale variables" to "Stack MUST ignore and override user locale variables" Jun 12, 2019

Author

fare commented Jun 12, 2019

Indeed, C.UTF-8 doesn't work on darwin, whereas en_US.UTF-8 does.
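
Given that difference between platforms, a wrapper script could probe which UTF-8 locale is actually installed before exporting one. This is a rough sketch: `pick_utf8_locale` is a hypothetical helper, and preferring C.UTF-8 over en_US.UTF-8 when both exist is my own assumption, not something stack does.

```shell
# Hypothetical helper: read available locale names on stdin (one per line,
# as printed by `locale -a`) and print the first preferred UTF-8 locale
# that is present. Returns non-zero if neither candidate is found.
pick_utf8_locale() {
  available=$(cat)
  # Assumed preference order: the neutral C.UTF-8 first, then en_US.UTF-8.
  for wanted in C.UTF-8 en_US.UTF-8; do
    # Compare case-insensitively and ignore hyphens, since some systems
    # list "en_US.utf8" and others "en_US.UTF-8".
    match=$(printf '%s\n' "$available" | awk -v wanted="$wanted" '
      BEGIN { key = tolower(wanted); gsub(/-/, "", key) }
      { name = tolower($0); gsub(/-/, "", name) }
      name == key { print; exit }')
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  return 1
}
```

It would be used along the lines of `LC_ALL=$(locale -a | pick_utf8_locale) stack build language-javascript`. The name is printed exactly as the system lists it, so it can be exported as-is.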

Author

fare commented Jun 12, 2019

Also, the nix guys think everything is alright with their tool. It's on stack to make utf-8 available. NixOS/nixpkgs#63014

Contributor

qrilka commented Jun 17, 2019

@fare #4294 was merged quite some time ago, and we already have a release that includes it.

@snoyberg
Contributor

I agree that this locale-sensitive behavior for file reading is incorrect (you referenced my blog post on it), but disagree with this being something Stack should try to modify. It can lead to even more confusing behavior if Stack is circumventing the natural behavior of the programs it is calling out to. Closing as wontfix (though the specific case for Nix does appear to already be addressed).
