Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the need for hi-file-parser #5134

Open
bgamari opened this issue Dec 28, 2019 · 43 comments
Open

Remove the need for hi-file-parser #5134

bgamari opened this issue Dec 28, 2019 · 43 comments

Comments

@bgamari
Copy link

bgamari commented Dec 28, 2019

After an extended debugging session with @simonmichael it came to my attention that Stack directly parses GHC's interface files via the hi-file-parser library. This is an unfortunate state of affairs. Interface files are interface to GHC and consequently come with no particular stability guarantees.

To make everyone's life easier, it would be better if we can find another way to expose the information that you need from interface files. Can you describe what information Stack needs that isn't currently available from other means? Also note that we have an open GHC proposal, the extended dependency information proposal, which may be helpful. We would love to have feedback on this.

I have opened GHC #17620 to track the GHC side of this.

@simonmichael
Copy link
Contributor

simonmichael commented Dec 28, 2019

The symptom that I was seeing is this: when building with a GHC 8.10 alpha, eg using a stack file like this, you see a lot of "Failed to decode" warnings, one per .hi file. Eg:

~/src/hledger$ stack --stack-yaml=stack-ghc8.10.yaml build --fast hledger-lib
Stack has not been tested with GHC versions above 8.6, and using 8.10.0.20191210, this may fail
Stack has not been tested with Cabal versions above 2.4, but version 3.1.0.0 was found, this may fail

Warning: Failed to decode module interface: /Users/simon/src/PLAINTEXTACCOUNTING/hledger/hledger-lib/.stack-work/dist/x86_64-osx/Cabal-3.1.0.0/build/Text/Megaparsec/Custom.hi Decoding failure:
         Invalid magic: e49ceb0f

Warning: Failed to decode module interface: /Users/simon/src/PLAINTEXTACCOUNTING/hledger/hledger-lib/.stack-work/dist/x86_64-osx/Cabal-3.1.0.0/build/Hledger.hi Decoding failure:
         Invalid magic: e49ceb0f
...
Warning: Failed to decode module interface: /Users/simon/src/PLAINTEXTACCOUNTING/hledger/hledger-lib/.stack-work/dist/x86_64-osx/Cabal-3.1.0.0/build/Text/Tabular/AsciiWide.hi Decoding failure:
         Invalid magic: e49ceb0f
WARNING: Ignoring hledger-lib's bounds on base (>=4.9 && <4.14); using base-4.14.0.0.
Reason: allow-newer enabled.
~/src/hledger$

Observed on mac catalina and on ubuntu 18.04.3, using stack 2.1.3. I wrongly assumed these were coming from the new element in the toolchain (GHC), so I went and bothered #ghc.

@qrilka
Copy link
Contributor

qrilka commented Dec 29, 2019

@haitlahcen probably you could add your 5 cents here as the author of #4545 ?

@hussein-aitlahcen
Copy link

hussein-aitlahcen commented Dec 29, 2019

@qrilka

The problem was that Stack was trying to extract modules dependencies from the dumped .txt hi file (which was obviously not a machine readable format, causing various issues). We reversed the interface format (not documented by GHC...) to directly read the binary file.

I believe that the proposal @bgamari linked is pretty much what was missing from GHC side to avoid doing what we have done here. Thanks for having noticed the link. I guess that we will be able to extract everything for the generated format as it seems to contain all the informations we need (@snoyberg confirmation would be great here).

Finally, removing the dependency to hi-file-parser is (probably) just a matter of updating this function to directly extract the dependencies from the GHC's generated machine readable file.

@bgamari
Copy link
Author

bgamari commented Dec 29, 2019

If all you need is intra-package module dependencies then it's also possible that GHC's -M mode will suffice. This produces Makefile-style output which encodes the dependency graph. The extended dependency proposal is intended to provide a more build-system agnostic format with information that the -M output currently lacks (e.g. inter-module dependencies, required language extensions, etc.)

@hussein-aitlahcen, could you describe precisely which information you extract from the interface file? I see mentions of dependency file names in the implementation but it's hard to say precisely what kind of dependencies we are talking about here (e.g. native CPP dependencies, Haskell modules within a package, inter-package Haskell dependencies, etc.).

@hussein-aitlahcen
Copy link

hussein-aitlahcen commented Dec 29, 2019

@bgamari I believe it is intra-package module deps with native TH/CPP dependencies, see the previous version of the parsing where we extract addDependentFile and the module dependencies. I've an example of the dumped hi file we were parsing here given this kind of source.

@bgamari
Copy link
Author

bgamari commented Dec 29, 2019

Alright, it would be appreciated if you could engage on the GHC proposal, raising these points so we have a record of precisely what we are designing towards.

@DestyNova
Copy link

Is there a workaround for this? I'm getting errors like this when trying build Liquid Haskell:

Warning: Failed to decode module interface:
         /home/omf/code/haskell/liquid/liquidhaskell/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0/build/Language/Haskell/Liquid/UX/Tidy.hi Decoding failure:
         Invalid magic: e49ceb0f

Warning: Failed to decode module interface:
         /home/omf/code/haskell/liquid/liquidhaskell/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0/build/Language/Haskell/Liquid/WiredIn.hi Decoding failure:
         Invalid magic: e49ceb0f

Warning: Failed to decode module interface: /home/omf/code/haskell/liquid/liquidhaskell/.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0/build/LiquidHaskell.hi
         Decoding failure: Invalid magic: e49ceb0f

--  While building package liquidhaskell-0.8.6.0 using:
      /home/omf/.stack/setup-exe-cache/x86_64-linux-tinfo6/Cabal-simple_mPHDZzAJ_3.2.0.0_ghc-8.10.1 --builddir=.stack-work/dist/x86_64-linux-tinfo6/Cabal-3.2.0.0 build lib:liquidhaskell exe:liquid --ghc-options " -fdiagnostics-color=always"
    Process exited with code: ExitFailure 1

I tried specifying a different resolver but it wasn't happy (I'm not sure which one to pick).

@snoyberg
Copy link
Contributor

There is no workaround right now. The best solution is to improve hi-file-parser to support the new binary format.

@hsyl20
Copy link
Contributor

hsyl20 commented May 18, 2020

The interface magic change was a mistake (see https://gitlab.haskell.org/ghc/ghc/issues/18180) but hi-file-parser needs to support S/ULEB128 decoding anyway.

@AndreasPK
Copy link
Contributor

I was pointed to this ticket recently. To summarize the current state of affairs from the GHC side:

Going forward the plan for GHC itself (starting with 8.10.2) is to:

  • Skip the dummy field in the future.
  • Encode the magic numbers as fixed width.

As a workaround for 8.10.2 hi-file-parser will need to:

  • Decode numbers as GHC does (using LEB128).
  • Account for the removal of the dummy field.

I regret having broken this for stack. But ghc can't reasonably guarantee that the iface format won't change again in the future. There has been discussion about trying to apply compression to them as well as encoding certain values as sub-byte sequences of bits. If any of those turn out to be beneficial the format might change again radically in 8.14.

@bgamari
Copy link
Author

bgamari commented May 18, 2020

I would reiterate that it would be helpful if stack could comment on the extended dependency generation proposal so we can confirm that it is indeed sufficient to solve the underlying issue here. It would be great to move stack off of hi-file-parser.

@snoyberg
Copy link
Contributor

One of the reasons we have hi-file-parser is because processing the text based hi-dump files caused massive slowdown both from GHC and Stack. I don't know if that problem would reemerge with JSON. We would also still need hi-file-parser for earlier versions of GHC. Finally, I don't know how Cabal would expose access to these files.

Most likely we'd be able to move over to parsing the new format, but I'm not certain.

@snoyberg
Copy link
Contributor

@AndreasPK no concerns from my side. Changes in the hi binary file format is absolutely expected, we just need to update hi-file-parser to handle it. My only request would be if you know the format is changing and you remember to give a ping on that repo. But as you can see, we'll notice pretty quickly due to the warnings.

We had these problems before with the hi-dump format too, and moved to explicit and scary-looking warnings just to make sure it didn't silently get ignored.

That library supports different parsing for different GHC versions, so updating as you mention ideally won't be too difficult.

@snoyberg
Copy link
Contributor

Looking at the code on this just a bit: it doesn't seem like it's really possible to continue using the hi-file-parser approach based on the changes you're planning. Ideally there would be a reliable way to tell which version of the format is being parsed immediately. However, if the magic number is changing back, we have no way of telling whether it's the new variable length encoding or fixed length encoding in the future.

@haitlahcen do you have any thoughts on all of this?

For context of the original change to use hi-file-parser, see #4545.

@minad
Copy link

minad commented Jun 18, 2020

What is the status on this? I just upgraded to stackage nightly with GHC 8.10 and seeing a lot of "invalid magic" warnings. Why has stackage moved to 8.10 if it cannot cope with the 8.10 hi files yet - or maybe this is not a real problem besides printing a warning?

@simonmichael
Copy link
Contributor

It's harmless, for now at least; but annoying/alarming for users. From snoyberg's last comment it sounds like the version of hi files format needs to be managed and exposed a bit more clearly on the GHC side.

@tfausak
Copy link
Contributor

tfausak commented Jun 19, 2020

Is there a way to ignore just this warning? Using GHC 8.10 with Stack outputs a lot of noise to my terminal. I want all of the warnings from Stack and GHC in general, but I'd like to disable this one because it's not actionable for me and everything seems to work regardless.

@etorreborre
Copy link

Same concern here :-)

@snoyberg
Copy link
Contributor

I think I have a good solution to this warning issue. We want the warnings to appear to Stack devs, but no one else. What I think I'm going to do is set up a new config flag for "developer-mode" or something like that. The default for the official binaries will be to set it to off. The default when you build from Git will be for it to be on. When the flag is on, warnings like this will be set to WARN level. When the flag is off, these messages will be DEBUG level. That will allow Stack developers to hear about these issues, and users to ignore them.

@bgamari I don't know what other comments I can provide to you on the GHC proposal. From what I can see, it's not addressing the concerns that originally led to hi-file-parser to be written. I'd also rather not have this separate library, but it doesn't seem like we have any choice given the lack of a consistent, multi-GHC-version interface for determining TH dependent files.

snoyberg added a commit that referenced this issue Jun 21, 2020
@tfausak
Copy link
Contributor

tfausak commented Jun 21, 2020

Sounds good to me @snoyberg! I'm not a Stack developer, but the code in #5325 looks good to me.

snoyberg added a commit that referenced this issue Jun 22, 2020
snoyberg added a commit that referenced this issue Jun 23, 2020
snoyberg added a commit that referenced this issue Jun 24, 2020
snoyberg added a commit that referenced this issue Jun 24, 2020
@tonyday567
Copy link

In the meantime, for those going blind with ghc-8.10 development:

stack build --test --file-watch 2>&1 | sed '/^Warning:/,/Invalid magic: e49ceb0f$/d'

@Dessix
Copy link

Dessix commented Jul 11, 2020

If you like a bit of color in your terminal, Tony's command can be done this way:
stack build --test --file-watch --color=always 2>&1 | sed -r '/^(\x1b\[[0-9;]*m)?Warning:(\x1b\[[0-9;]*m)? Failed to decode/,/e49ceb0f.*?$/d' 2>&1

dacto added a commit to dacto/alpine-haskell-stack that referenced this issue Jul 14, 2020
Note:
  Interface file structure changed with GHC 8.10.1.
  This causes Stack to emit warnings like the following after build:

	Warning: Failed to decode module interface:
		 <..>/.stack_work/dist/<tuple>-<hash>/Cabal-3.2.0.0/build/demo/demo-tmp/Main.hi
	         Decoding failure:
	         Invalid magic: e49ceb0f

   For more details, see: commercialhaskell/stack#5134
@ProofOfKeags
Copy link

@snoyberg With a few months of reflection, is this still an approach you consider good? I noticed that this is still an issue in stack 2.5.1. If the approach is one you'd still feel comfortable with, it sounds like it is a reasonably low touch change, that I could take a stab at. It sounds simple from your description, but this will be my first foray into the stack codebase, so I'm not sure what I'm in for exactly. I'm guessing the only reason it hasn't been done is due to not being prioritized given that 8.10.x isn't in the stackage LTS set.

@snoyberg
Copy link
Contributor

Sorry, a few different things have come up in this thread. Which approach are you talking about here?

Overall, I'd support any implementation for this that lets Stack continue to parse the .hi files. If that comes from an upstream GHC proposal, that's great, but I don't think the one mentioned above will help.

@ProofOfKeags
Copy link

ProofOfKeags commented Oct 28, 2020

For specificity, this one: #5134 (comment)

But your response just now indicates that this falls under the generalization you're amenable to.

@stackptr
Copy link

stackptr commented Feb 8, 2021

Is this issue still valid? I encountered this today when building a project on ghc 8.10 but upon upgrading stack (from 2.3.1 to 2.5.1) it seems like these warnings are no longer appearing.

@tfausak
Copy link
Contributor

tfausak commented Feb 8, 2021

The change made in #5325 hides this warning from typical users. The problem still exists behind the scenes. You should be able to set stack-developer-mode: true in your stack.yaml to see the warnings.

@tonyday567
Copy link

Can be closed from a users perspective.

@andreasabel
Copy link

I didn't observe the warnings with stack 2.5.1, but they appeared after a stack upgrade that got me to 2.5.1.1:

$ stack --version
Version 2.5.1.1 x86_64 hpack-0.33.0

@gergoerdi
Copy link

This now blocks Stack from being used with GHC 9.0.1: #5486 #5445

@konn
Copy link

konn commented Mar 23, 2021

I'd never encountered this issue in GHC <9, but today I encountered this with GHC 8.10.4 on macOS, in which stack silently dies and --verbose flag reveals it. Is there any workaround, or plan for the fix in the near future?

@snoyberg
Copy link
Contributor

The short term issue for GHC 9.0 should be addressed by #5508. I don't believe we have a long-term solution for getting the dirty file information out of GHC effectively yet, but with this workaround in place, it simply means that dirty file checking will not work correctly with addDependentFile (the current status quo).

@bgamari
Copy link
Author

bgamari commented Apr 8, 2021

@bgamari I don't know what other comments I can provide to you on the GHC proposal. From what I can see, it's not addressing the concerns that originally led to hi-file-parser to be written. I'd also rather not have this separate library, but it doesn't seem like we have any choice given the lack of a consistent, multi-GHC-version interface for determining TH dependent files.

@snoyberg, I'm afraid I don't follow. The proposal does provide information about TH dependent files (via the dependencies key). Can you clarify?

@snoyberg
Copy link
Contributor

snoyberg commented Apr 8, 2021

I think we're saying the same thing. Stack has a need here: learn about TH dependent files. That information is available in .hi files. Stack needs to get that information. GHC currently doesn't provide a method for doing that directly. Previously, we use .hi-dump files, but that caused issues because in some cases those files were massive. So someone contributed a binary parser, which is now broken due to changes in GHC's file format. You mentioned a GHC proposal, which I think we now both agree is irrelevant to the issue at hand. We're left with the original challenge: how do we efficiently determine dependent files inside of Stack?

For now, the "solution" is to simply fail on newer GHC versions, and hope that someone cares enough about this to add additional parsing capabilities to hi-file-parser.

@bgamari
Copy link
Author

bgamari commented Apr 8, 2021

You mentioned a GHC proposal, which I think we now both agree is irrelevant to the issue at hand. We're left with the original challenge: how do we efficiently determine dependent files inside of Stack?

I'm trying to suggest that the proposal is relevant: it provides precisely the information that you are currently parsing out of the .hi file.

@snoyberg
Copy link
Contributor

snoyberg commented Apr 8, 2021

You're right. I completely misread your previous line. Apologies. Yes, this should address the needs hi-file-parser is filling going forward. We'd still need hi-file-parser for existing GHC versions, but that shouldn't be an issue.

@hsyl20
Copy link
Contributor

hsyl20 commented Apr 8, 2021

I've fixed hi-file-parser (cf commercialhaskell/hi-file-parser#2) so that it supports 8.10 and 9.0

@snoyberg
Copy link
Contributor

snoyberg commented Apr 9, 2021

Thank you @hsyl20! I've merged and released your PR, and will open up a PR against Stack shortly to use it.

@fosskers
Copy link
Contributor

fosskers commented Apr 19, 2021

Seems to work like a charm, I no longer see Invalid magic messages! (stack 2.6.0 built from master via stack upgrade --git).

@ProofOfKeags
Copy link

Seems like this can be closed now.

@AndreasPK
Copy link
Contributor

Seems like this can be closed now.

Sadly the underlying issue remains so that seem premature.

@mpilgrem
Copy link
Member

@hsyl20, would you be able to do what you did for GHC 9.0, but for GHC 9.4? See commercialhaskell/hi-file-parser#5. I think the .hi format may have changed.

@hsyl20
Copy link
Contributor

hsyl20 commented Aug 11, 2022

@mpilgrem Sure. I've responded in the ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests