Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better calculation of SourceMap/buildplans (improves cache as well) #3922

Closed
snoyberg opened this issue Mar 14, 2018 · 6 comments · Fixed by #4412
Closed

Better calculation of SourceMap/buildplans (improves cache as well) #3922

snoyberg opened this issue Mar 14, 2018 · 6 comments · Fixed by #4412
Assignees
Milestone

Comments

@snoyberg
Copy link
Contributor

Repro: create a directory with the following two files:

# package.yaml
name: foo

dependencies:
- base
- haskell-src-exts

library: {}
# stack.yaml
resolver: lts-11.0

extra-deps:
- happy-1.19.9@rev:2

NOTE: If someone knows of a package in a Stackage snapshot that uses either happy or alex, and compiles quicker than haskell-src-exts, please let me know, we can make this repro easier to trigger.

Next: observe the following session:

$ stack build
happy-1.19.9: configure
happy-1.19.9: build
happy-1.19.9: copy/register
haskell-src-exts-1.20.2: configure   
haskell-src-exts-1.20.2: build       
haskell-src-exts-1.20.2: copy/register
foo-0.0.0: configure (lib)           
Configuring foo-0.0.0...             
foo-0.0.0: build (lib) 
Preprocessing library for foo-0.0.0..
Building library for foo-0.0.0..
[1 of 1] Compiling Paths_foo        ( .stack-work/dist/x86_64-osx/Cabal-2.0.1.0/build/autogen/Paths_foo.hs, .stack-work/dist/x86_64-osx/Cabal-2.0.1.0/build/Paths_foo.o )
foo-0.0.0: copy/register
Installing library in /Users/michael/Desktop/stack-build-tool-local/.stack-work/install/x86_64-osx/lts-11.0/8.2.2/lib/x86_64-osx-ghc-8.2.2/foo-0.0.0-J3VouLg1fVTLZlRnB5NxZt
Registering library for foo-0.0.0..
Completed 3 action(s). 
Michaels-MacBook-Pro-3:stack-build-tool-local michael$ stack build --dry-run
Would unregister locally:
foo-0.0.0
haskell-src-exts-1.20.2

Would build:
foo-0.0.0: database=local, source=/Users/michael/Desktop/stack-build-tool-local/, after: haskell-src-exts-1.20.2
haskell-src-exts-1.20.2: database=local, source=package index

No executables to be installed.

Expected: the --dry-run call should report a no-op, since all of the packages have already been built.

Actual: it insists on rebuilding haskell-src-exts. The reason is clear when you look at the following --verbose bit of output:

[debug] Ignoring package haskell-src-exts, from (InstalledTo Local,"/Users/michael/Desktop/stack-build-tool-local/.stack-work/install/x86_64-osx/lts-11.0/8.2.2/pkgdb/"), due to wrong location: (Just (InstalledTo Local),Snap)

Stack believes that haskell-src-exts should appear in the snapshot database, since all of its dependencies are in the snapshot unchanged. However, this is a lie: one of its dependencies is actually happy as a build tool. But SourceMap construction does not consider build tools, and therefore the SourceMap and ConstructPlan steps are out of sync!

I think the most straightforward resolution is to move all of the build tool calculation to the SourceMap level.

@snoyberg snoyberg added this to the P1: Must milestone Mar 14, 2018
snoyberg added a commit that referenced this issue Mar 14, 2018
@snoyberg
Copy link
Contributor Author

One approach to solving this would be strategic: implement logic in the SourceMap phase to deal with build tools, the same way we do in ConstructPlan. This kind of strategic change will be like playing whack-a-mole again. The code will continue to get harder to follow, and it will be easier to introduce regressions. Integration tests will help, but it would be better to solve this more holistically. Instead, I want to lay out a plan for a fairly significant refactoring, which I believe will make the code easier to maintain, fix a few bugs (including some we haven't experienced yet), and improve caching. Some relevant related issues:

I'm guessing there are others too. I may edit this comment to add them to this list. I'm going to add two more comments now, one describing the status quo today, and one describing some ideas to a new, better system.

@snoyberg
Copy link
Contributor Author

At a bird's eye view, this is how build plans are constructed today in Stack:

  1. Construct a source map, which maps package name to information on how a package should be built. This includes the location of the package contents (Hackage with version and revision, Git repo, etc). This also includes information on whether the package should be installed into the snapshot or local database, based on whether the package and its dependencies are listed in a snapshot file or a stack.yaml file.
    • Note that this last bit is the source of this bug, since we're paying attention to library dependencies but not build tool dependencies.
  2. Construct an installed map, looking at all of the packages available in the global, snapshot, and local databases and comparing them to what's in the source map. If any packages mismatch, either due to mismatching versions, or incorrect database location, or so on, they are pruned from the installed map.
    • In this bug: a package exists in the local database, but the source map says it should be in the snapshot database, so it gets pruned and then reinstalled. But due to the construct plan step finding the build tools, it gets installed into the local database, kicking off the cycle again.
    • Due to limitations in the metadata available in the GHC package database, we're not really able to check all of the information in the source map. I haven't reproduced yet, but I think there may be a lurking bug where we've changed between two different package locations but kept the flags and version the same, and Stack doesn't know that it needs to reinstall.
  3. Combining these two pieces of information, we have a construct plan step which looks at all of the build targets requested on the command line, and discovers all dependencies, deciding whether to use the already installed version, install a new version, which database to put it in, etc. This is the step that considers build tools.

@snoyberg
Copy link
Contributor Author

I'd like to take a step back from all of this and use the new ideas in the "better cache" approach to try a new stab at this. Let's start it all off from the source map phase. Instead of separating snapshot vs local packages, we should consider immutable vs mutable. See #3860 for more information, but basically: if it comes from the local file system, or depends on something on the local filesystem, the package is mutable. Otherwise, it's immutable. In order to proper calculate this mutable vs immutable idea, we'll need to move the build tool discovery to the source map phase, away from the construct plan phase. It may end up being logical to move version bounds checking to this phase too, away from the construct plan phase.

Here's a new invariant: whenever we install an immutable package, we install it into a location like ~/.stack/immutable/ghc-X.Y.Z/package-version/hash, where hash gives a uniquely identifying key for the full set of information used to figure out how to build that package. This would include the package location, dependencies, flags, and options.

Also, a new piece of information (which @mgsloan has been pushing for a while): we bypass having a separate snapshot database, and only have a global database (by necessity of GHC) and a local database. When we load up the installed map, we will do a package pruning as before. For immutable packages in the source map, we'll be able to ensure that the directory the library is installed into matches the expectation we have. And since this includes the hash mentioned above, we're ensuring that it's exactly the same package. (A different approach would be adding this information the GHC package database registration file, but I think that would require deeper changes to Cabal.)

For mutable packages, the need to prune would be determined differently: based on whether the source files it depends on have changed on the filesystem.

Plan construction would then go basically the same as it had before. However, if as I mentioned above we move some logic to the source map phase, that would mean things like build tools and bounds checking are not done in the construct plan phase. I think this would make both @nh2 and @ezyang happy from previous comments.

Finally: when actually building a package, we'll have some slightly new rules in place, in line with #3860. When building an immutable package, we'll do something like:

  • Check if the package already exists in the directory described above
  • If not, perform an actual build and install it there
  • Copy the registration from that immutable directory into the local package database

Alright, brain dump over :)

@mgsloan
Copy link
Contributor

mgsloan commented Mar 19, 2018

That all sounds good to me 👍 . Definitely in favor of improving this part of stack.

What is the purpose of including ghc-X.Y.Z in the immutable package path? Couldn't think be included in the hash? One reason I can imagine for including it is so that you can easily blow away all the caches for a particular compiler. I believe that these broken package registrations would be noticed and removed.

@snoyberg
Copy link
Contributor Author

snoyberg commented Mar 20, 2018 via email

@snoyberg snoyberg changed the title Build tools not considered when constructing SourceMap Better calculation of SourceMap/buildplans (improves cache as well) Apr 18, 2018
snoyberg added a commit that referenced this issue Jul 2, 2018
This is nowhere near ready, it just paints a basic idea of a direction
to start in. This is based on a whiteboard drawing I did, which may or
may not be helpful.

https://imgur.com/a/1jv3WLp
bitemyapp pushed a commit that referenced this issue Jul 9, 2018
This is nowhere near ready, it just paints a basic idea of a direction
to start in. This is based on a whiteboard drawing I did, which may or
may not be helpful.

https://imgur.com/a/1jv3WLp
snoyberg added a commit that referenced this issue Aug 2, 2018
@qrilka
Copy link
Contributor

qrilka commented Nov 16, 2018

Using local version of new-build-plan branch (waiting #4402 to get merged) I see the following result of repro:

~/ws/h/stack-build-test $ stack build --dry-run
No packages would be unregistered.

Nothing to build.

No executables to be installed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants