Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link error when modifying TH #830

Closed
wz1000 opened this issue Oct 7, 2020 · 23 comments · Fixed by haskell/ghcide#873
Closed

Link error when modifying TH #830

wz1000 opened this issue Oct 7, 2020 · 23 comments · Fixed by haskell/ghcide#873

Comments

@wz1000
Copy link
Collaborator

wz1000 commented Oct 7, 2020

When modifying TH, it is easy to run into linker errors:

Program error: 
ByteCodeLink.lookupCE
During interactive linking, GHCi couldn't find the following symbol:
  fakezuuid_THA_th_closure
This may be due to you not asking GHCi to load extra object files,
archives or DLLs needed by your current session.  Restart GHCi, specifying
the missing library using the -L/path/to/object/dir and -lmissinglibname
flags, or simply by naming the relevant files on the GHCi command line.
Alternatively, this link failure might indicate a bug in GHCi.
If you suspect the latter, please send a bug report to:
  [email protected]

Test here: haskell/ghcide#853

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 7, 2020

I can reproduce this before the object code PR too.

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 8, 2020

This, as well as haskell/ghcide#672 are caused by us not calling unload. This means that object code from the first successful compile is reused for all subsequent compiles.

This is further complicated by https://gitlab.haskell.org/ghc/ghc/-/issues/16525 , which means that unloading object code is disabled in GHC until https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3842 lands. Calling unload should be sufficient for bytecode on the other hand.

I have a work in progress fix/workaround here

I think merely calling unload would be good enough for bytecode. However, for object code, we have to call unloadObj manually, which means we might be risking running into #16525. For this reason, we might want to go back to using bytecode instead of object code. I haven't run into this though, and my branch seems to work so far.

I won't have any time to play around with this for about a week. If anyone wants to pick this up, please feel free to do so.

@wz1000 wz1000 pinned this issue Oct 8, 2020
@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 8, 2020

If anyone is going to pick this up, here are the things missing from that branch:

  1. Figuring out when object files are stable, so we don't need to unload them.
  2. More tests
  3. Trying to recreate the scenario from GHC issue #16525, verifying we don't run into that.

@bgamari
Copy link

bgamari commented Oct 8, 2020

Unfortunately, GHC #16525 can happen to pretty much anyone as it has to do with the structure of the heap. Currently the GC fails to trace CAF references properly when determining whether it can unload. For this reason we have disabled it. Forcing unloading is likely to eventually result in a crash. I have fixed this in !3842.

@pepeiborra
Copy link
Collaborator

We'll have to go back to bytecode at least in 8.8, 8.10 and possibly 9.0.
That means we lose the ability to persist the result of bytecode generation, forcing us to keep all the BCOs in memory which doesn't scale. This is because bytecode is not serializable. Is this really the case? The external interpreter sends byte codes over the wire, so there must be a way to serialise it...

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 8, 2020

One option is to use bytecode for files of interest and object code for everything else. However, someone has to verify that symbols defined in bytecode shadow symbols defined in object code for this to work (for when a non FOI becomes a FOI).

GHCi handles the scenario in the test case just fine with -fobject-code. I haven't been able to figure out how it does that. Replicating whatever magic is involved there is another option.

@pepeiborra
Copy link
Collaborator

What happens when a FOI that used to be a non FOI becomes a non FOI again?

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 8, 2020

I don't know, someone has to figure out the semantics of mixing bytecode and object code, either by reading the source or implementing it.

I tried writing unique object code filepaths for each version of the document and discovered that this won't work. Instead of one Linkable shadowing the other, we get GHC runtime linker: fatal error: I found a duplicate definition for symbol.

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 8, 2020

Another option I was discussing with @bgamari was distributing binaries of ghcide/hls built against GHC 8.*.* with !3842 applied. Since the patch doesn't affect code generation, it should be able to read package dbs of the original GHC just fine.

Cons are making ghcide/hls development/building from source harder (or maybe not, development binaries just won't work very well with TH).

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 8, 2020

Another discussion with @bgamari reveals that GHCi is probably using the dynamic system linker, not the RTS linker. When we load object files dynamically with the system linker, the likely behavior is that the newest object wins and shadows all the previous ones (which is what we want).

We can do the same in ghcide as a workaround.

@pepeiborra
Copy link
Collaborator

pepeiborra commented Oct 8, 2020

To make ghcide use the system linker we would need to build it with -dynamic, which is in fact recommended in #787 and #434

That's easy enough for local builds but I suspect it will complicate the distribution of binaries in HLS.

Also, I don't think the system linker supports any form of unloading. /cc @bgamari @simonmar

@bgamari
Copy link

bgamari commented Oct 8, 2020

That means we lose the ability to persist the result of bytecode generation, forcing us to keep all the BCOs in memory which doesn't scale. This is because bytecode is not serializable. Is this really the case? The external interpreter sends byte codes over the wire, so there must be a way to serialise it...

Bytecode itself can be serialised. ResolvedBCO has a Binary instance although that might be too late in the compilation pipeline for you. Specifically the compilation pipeline for bytecode is,

CoreProgram                             \
  ⇓  GHC.CoreToByteCode.schemeTopBind    |
ProtoBCO                                 | GHC.CoreToByteCode.byteCodeGen
  ⇓  GHC.ByteCode.Asm.assembleBCOs       |
CompiledByteCode                        /
  ⇓  GHC.ByteCode.Linker.linkBCO
ResolvedBCO
  ⇓  GHCi.CreateBCO.createBCOs
HValueRef

I haven't thought hard about where the right place in this pipeline would be to persist bytecode.

@bgamari
Copy link

bgamari commented Oct 8, 2020

Also, I don't think the system linker supports any form of unloading. /cc @bgamari @simonmar

dlclose is exposed to allow the user to indicate that a loaded dynamic object is no longer needed. Whether this actually unloads anything is of course up to the implementation. However, I introduce support for "unloading" of dynamic objects in ghc!3842 (based on @niteria's previous work).

@pepeiborra
Copy link
Collaborator

Also, I don't think the system linker supports any form of unloading. /cc @bgamari @simonmar

dlclose is exposed to allow the user to indicate that a loaded dynamic object is no longer needed. Whether this actually unloads anything is of course up to the implementation. However, I introduce support for "unloading" of dynamic objects in ghc!3842 (based on @niteria's previous work).

Does the note below not apply or is it out of date ?

The dynamic linker doesn't support unloading objects currently, so if you want that functionality then the static linker is the only option. The Haxl project at Facebook is using the static linker for this reason, and also due to the performance overhead of dynamic linking. It's not known whether the dynamic linker can support unloading.

https://gitlab.haskell.org/ghc/ghc/-/wikis/dynamic-linking-debate#do-we-need-the-static-linker-anyway

@bgamari
Copy link

bgamari commented Oct 8, 2020

Does the note below not apply or is it out of date ?

The dynamic linker doesn't support unloading objects currently, so if you want that functionality then the static linker is the only option. The Haxl project at Facebook is using the static linker for this reason, and also due to the performance overhead of dynamic linking. It's not known whether the dynamic linker can support unloading.

It will not apply after ghc!3842 is merged but this hasn't happened yet. In light of the fact that unloading being disabled seems to break ghcide I am somewhat inclined to expedite this process.

@pepeiborra
Copy link
Collaborator

Ok so ghc!3842 makes that note obsolete but won't be available until 9.2.

What about the backport in https://github.com/facebook/fbghc/commits/ghc-8.8.3-facebook ?

@bgamari
Copy link

bgamari commented Oct 8, 2020

Ok so ghc!3842 makes that note obsolete but won't be available until 9.2.

What about the backport in https://github.com/facebook/fbghc/commits/ghc-8.8.3-facebook ?

It's not so much a backport as another, slightly broken, version of the same thing. I started from that patch, refactored it, and fixed various GC bugs that it triggered.

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 9, 2020

What happens when a FOI that used to be a non FOI becomes a non FOI again?

Going over the source, as long as the ByteCode symbols are in the ClosureEnv in the PersistentLinkerState, they will shadow symbols defined in object code. Calling unload on the bytecode will remove it from there, so it should be possible to use bytecode for FOIs and object code for everything else if we manage the calls to unload properly.

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 9, 2020 via email

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 11, 2020

The error goes away if we use -fexternal-interpreter and WayDyn. However, plugins don't work anymore since they require an internal interpreter.

All the other tests seem to pass.

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 11, 2020

I can make the plugin tests pass by unsetting Opt_ExternalInterpreter and WayDyn before initializePlugins and resetting them after.

@bgamari can anything go wrong if we mix together the internal and external interpreters in the same session?

@pepeiborra
Copy link
Collaborator

@wz1000 Can we expect the external interpreter to put up with ghcide calling it concurrently and abruptly cancelling computations? I suspect we'd need one external interpreter per normalised filepath but I don't know if that's possible

@wz1000
Copy link
Collaborator Author

wz1000 commented Oct 11, 2020

@pepeiborra No idea, it seems to work fine when I'm using it in vscode. Perhaps we can mask exceptions when running TH using hscCompileCoreExprHook.

@wz1000 wz1000 unpinned this issue Nov 8, 2020
@pepeiborra pepeiborra transferred this issue from haskell/ghcide Jan 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants