Power failure during stack build results in corruption to global Stack directory #3248

Open
rdnetto opened this issue Jul 5, 2017 · 9 comments

Comments

@rdnetto
Contributor

rdnetto commented Jul 5, 2017

General summary/comments (optional)

If a power failure occurs while stack build is compiling a library, the incomplete files are stored in ~/.stack but the package is marked as compiled. This means that any attempt to compile projects depending on it will fail.

Output when trying to compile a project using the library:

    /home/reuben/project_sigma/src/Main.hs:4:1: error:
        Bad interface file: /home/reuben/.stack/snapshots/x86_64-linux-tinfo6-nopie/lts-8.21/8.0.2/lib/x86_64-linux-ghc-8.0.2/conduit-combinators-1.1.1-BCs1IpMatSe9TKW4okLeA/Conduit.hi
            Data.Binary.getWord8: end of file

    /home/reuben/project_sigma/src/Main.hs:7:1: error:
        Failed to load interface for ‘Data.Conduit.CSV’
        Perhaps you meant
          Data.Conduit (from conduit-1.2.11)
          Data.Conduit.Lazy (from conduit-extra-1.1.16)
          Data.Conduit.Lift (from conduit-1.2.11)
        Use -v to see a list of the files searched for.

    --  While building package sigma-0.1.0.0 using:
          /home/reuben/.stack/setup-exe-cache/x86_64-linux-tinfo6-nopie/Cabal-simple_mPHDZzAJ_1.24.2.0_ghc-8.0.2 --builddir=.stack-work/dist/x86_64-linux-tinfo6-nopie/Cabal-1.24.2.0 build exe:sigma --ghc-options " -ddump-hi -ddump-to-file"
        Process exited with code: ExitFailure 1
  • stack build conduit-combinators is a no-op, because Stack thinks the library is already compiled
  • deleting the directory for conduit-combinators doesn't fix the problem, only changes the error.
  • stack exec ghc-pkg check does not detect the problem (unless you delete the directory)

Workaround

The simplest solution is to delete ~/.stack, but that requires an expensive rebuild.

A more efficient workaround is to run stack exec ghc-pkg unregister conduit-combinators-1.1.1, followed by stack build. This needs to be repeated for each package that exhibits the error.
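
When several packages are affected, the unregister commands can be derived from the build output instead of being typed one by one. The following is a minimal Python sketch, not part of Stack. The function names are hypothetical, and the pattern for the package directory (`name-version-hash`) is an assumption based only on the single path shown in the error output above.

```python
import re

# Matches the GHC error line that names the damaged interface file.
BAD_IFACE = re.compile(r"Bad interface file:\s*(\S+)")
# Assumed layout of a snapshot package directory: <name>-<version>-<hash>.
PKG_DIR = re.compile(r"^(?P<pkgid>.+-\d+(?:\.\d+)*)-[A-Za-z0-9]+$")


def corrupted_packages(build_output):
    """Return the package ids (name-version) named in 'Bad interface file' errors."""
    found = []
    for path in BAD_IFACE.findall(build_output):
        for part in path.split("/"):
            m = PKG_DIR.match(part)
            if m:
                if m.group("pkgid") not in found:
                    found.append(m.group("pkgid"))
                break  # first matching path component is taken as the package dir
    return found


def unregister_commands(build_output):
    """Build one 'ghc-pkg unregister' command line per corrupted package."""
    return ["stack exec ghc-pkg unregister " + p
            for p in corrupted_packages(build_output)]
```

Feeding it the captured output of a failed stack build yields the commands to run before the next stack build. Since only errors that have already surfaced are listed, a few rounds may still be needed.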

Expected behaviour

  • the packages should not actually be marked as having completed compilation until the output files have been synced to disk
  • it would help if stack had a command that could verify the integrity of compiled libraries (e.g. a checksum) and recompile those which failed the integrity check
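
To illustrate the first point, here is a rough Python sketch of "sync first, mark afterwards". It is not how Stack or GHC record a finished build. The names `fsync_path`, `mark_built`, `is_built` and the `BUILD_COMPLETE` marker file are made up for illustration, and it assumes a POSIX system where a directory can be opened and fsynced. Whether this ordering fully survives a power failure still depends on the filesystem.

```python
import os
import tempfile


def fsync_path(path):
    """Ask the OS to flush a file or directory to stable storage."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def mark_built(output_dir, marker_name="BUILD_COMPLETE"):
    """Sync every file under output_dir, then atomically create the marker."""
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            fsync_path(os.path.join(root, name))
        fsync_path(root)  # persist the directory entries themselves

    # Write the marker under a temporary name, sync it, then rename into place.
    fd, tmp = tempfile.mkstemp(dir=output_dir, prefix=".marker-")
    try:
        os.write(fd, b"ok\n")
        os.fsync(fd)
    finally:
        os.close(fd)
    marker = os.path.join(output_dir, marker_name)
    os.replace(tmp, marker)
    fsync_path(output_dir)  # persist the rename
    return marker


def is_built(output_dir, marker_name="BUILD_COMPLETE"):
    """A package counts as built only if its marker made it to disk."""
    return os.path.exists(os.path.join(output_dir, marker_name))
```

The marker is created last, so a crash at any earlier step leaves a directory that `is_built` rejects and that can simply be rebuilt.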

Stack version

$ stack --version
Version 1.4.0, Git revision e714f1dd3fade19496d91bd6a017e435a96a6bcd (4640 commits) x86_64 hpack-0.17.0

Running on Sabayon Linux amd64 on btrfs, using kernel 4.9.30.

Method of installation

  • compiled from source
@Blaisorblade
Collaborator

This is really unfortunate, I've had similar enough problems to understand the annoyance.

> If a power failure occurs while stack build is compiling a library, the incomplete files are stored in ~/.stack but the package is marked as compiled.

> the packages should not actually be marked as having completed compilation until the output files have been synced to disk

Agreed, to the extent possible. We should mark the package as compiled only after it's built—but without a transactional filesystem, it's not obvious the order of operations will be preserved.

There might be an atomic API which doesn't require syncing the whole disk, but I'm not sure which. I used to think rename was atomic on POSIX, but IIRC https://lwn.net/Articles/322823/ suggests things are more complicated. I haven't had time to read that again, and I'm not sure of btrfs's guarantees, let alone those of Windows, OS X, and other *nixes.

Also, building out of ~/.stack and moving outputs into place might help (assuming this is atomic enough), but it might not be easy, since the target installation path is an input to the compilation—I forget when and where you can move files.

> it would help if stack had a command that could verify the integrity of compiled libraries (e.g. a checksum) and recompile those which failed the integrity check

Details are tricky. What's an easy way to verify the integrity of some binary output, without having to write new code to parse binary files? If your analysis is correct, a checksum won't help here. It will only help if Stack fully compiles and checksums the library originally and then something else modifies it, but Stack never modifies output.

Also, to some extent this is probably GHC's fault. Press Ctrl-C while building the local project and you can get into a similar situation: if one of the .hi or .o files is missing, GHC won't try to rebuild it (and will give similar errors). Yes, GHC can't deal with that! But at least there you just have to clean the package you're building.

@garry-cairns

Just to add to this — I frequently get emergency shutdowns for overheating on my laptop when compiling things with stack. That also results in this behaviour. Some means of limiting the amount of power available to stack/ghc would be useful I think. I suppose one could do it by compiling from a docker container.

@mgsloan
Contributor

mgsloan commented Jun 15, 2018

I agree with @Blaisorblade's analysis that this is likely an issue with upstream tools that stack invokes. As far as I know, all of stack's own file IO is resistant to this sort of problem, since all it writes are caches that get discarded if they are unreadable.

@Blaisorblade
Collaborator

@garry-cairns That's unfortunate—you might want to disable parallel builds via -j1. Otherwise, stack can do nothing about it.

@garry-cairns

@Blaisorblade thanks I'll try that

@garry-cairns

@Blaisorblade Actually the best strategy seems to be to keep the parallel builds but manually cancel the build every 15-20 points in the progress tracker and let the system cool down.

@blipvert

> it would help if stack had a command that could verify the integrity of compiled libraries (e.g. a checksum) and recompile those which failed the integrity check

> Details are tricky. What's an easy way to verify the integrity of some binary output, without having to write new code to parse binary files? If your analysis is correct, a checksum won't help here. It will only help if Stack fully compiles and checksums the library originally and then something else modifies it

Not true at all. The critical window of failure occurs after the binary file has been closed, but not yet fully synced to disk. During this time, a coherent view of the file's entire content is nevertheless available to readers, and if a checksum were then to be calculated, it would represent what data is supposed to be there after a successful filesystem sync. In the case where some unsynced data did not survive a power failure, the checksum will either be missing or it will differ from that of the binary file on disk; in either case, it will provide a strong indication that the package should be rebuilt.
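
A minimal Python sketch of that idea, for illustration only: checksums are recorded right after the build, while the file contents are still fully readable, and checked again before the package is used. The function names and the `MANIFEST.json` file are hypothetical and do not correspond to anything Stack or GHC actually writes.

```python
import hashlib
import json
import os


def sha256_of(path):
    """Hash a file in chunks so large libraries need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(output_dir, manifest_name="MANIFEST.json"):
    """Record a checksum for every build output under output_dir."""
    sums = {}
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            if name == manifest_name:
                continue
            full = os.path.join(root, name)
            sums[os.path.relpath(full, output_dir)] = sha256_of(full)
    with open(os.path.join(output_dir, manifest_name), "w") as f:
        json.dump(sums, f, sort_keys=True)
    return sums


def needs_rebuild(output_dir, manifest_name="MANIFEST.json"):
    """True if the manifest is missing or unreadable, or any file differs."""
    try:
        with open(os.path.join(output_dir, manifest_name)) as f:
            expected = json.load(f)
    except (OSError, ValueError):
        return True  # manifest missing or truncated
    if not isinstance(expected, dict):
        return True
    for rel, digest in expected.items():
        try:
            if sha256_of(os.path.join(output_dir, rel)) != digest:
                return True
        except OSError:
            return True  # a recorded output file is gone
    return False
```

Both failure cases described above map onto `needs_rebuild` returning True: the manifest itself did not survive, or a file on disk no longer matches its recorded checksum.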

@Blaisorblade
Collaborator

@blipvert Fair enough, if that's indeed what's happening (not that I can imagine anything else).

@nh2
Collaborator

nh2 commented Apr 8, 2019

We've done some work in the direction of fixing this in GHC and stack, see #4559.
