From 7594e201ae9f6babad71bd79c3902240a6798db3 Mon Sep 17 00:00:00 2001
From: Michael Snoyman
Date: Sat, 8 Sep 2018 21:43:05 +0300
Subject: [PATCH 1/3] Document an overview of builds

With all of the various refactorings at play, I wanted to document what
the end goal of this whole process would be, both to guide my own work
and to get feedback from others (either to improve the process, or to
clarify it so others can better participate in its maintenance).
---
 doc/build-overview.md | 251 ++++++++++++++++++++++++++++++++++++++++++
 doc/terminology.md    |  22 ----
 2 files changed, 251 insertions(+), 22 deletions(-)
 create mode 100644 doc/build-overview.md
 delete mode 100644 doc/terminology.md

diff --git a/doc/build-overview.md b/doc/build-overview.md
new file mode 100644
index 0000000000..9855cf327a
--- /dev/null
+++ b/doc/build-overview.md
@@ -0,0 +1,251 @@
+
+

# Build Overview

__NOTE__ This document should *not be considered accurate* until this
note is removed.

This is a work-in-progress document covering the build process used by Stack.
Stack. It was started following the Pantry rewrite work in Stack (likely to
land as Stack 2.0), and contains some significant changes/simplifications from
how things used to work. This document will likely not fully be reflected in
the behavior of Stack itself until late in the Stack 2.0 development cycle.

## Terminology

* Project package: anything listed in `packages` in stack.yaml
* Dependency: anything listed in extra-deps or a snapshot
* Target: package and/or component listed on the command line to be built. Can
  be either project package or dependency. If none specified, automatically
  targets all project packages
* Immutable package: a package which comes from Hackage, an archive, or a
  repository. In contrast to...
* Mutable package: a package which comes from a local file path. The contents
  of such a package are assumed to mutate over time.
* Snapshot database: a package database and set of executables for a given set
  of _immutable_ packages. Only packages from immutable sources and which
  depend exclusively on other immutable packages can be in this database.
  *QUESTION* Would this better be called the _immutable database_ or _write only
  database_?
* Local database: a package database and set of executables for packages which
  are either mutable or depend on such mutable packages. Importantly, packages
  in this database can be unregistered, replaced, etc., depending on what happens
  with the source packages.
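The following minimal Haskell sketch makes the snapshot/local database split concrete. It is not Stack's code. The types `PackageName`, `PackageLocation`, and `Database`, and the function `assignDatabases`, are hypothetical. The sketch assumes an acyclic dependency graph. It also assumes that dependencies missing from the map are global (GHC-shipped) packages, which count as immutable.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical, simplified names; Stack's real types are richer.
newtype PackageName = PackageName String
  deriving (Eq, Ord, Show)

-- Where a package's source comes from.
data PackageLocation
  = Hackage String      -- immutable: package identifier, e.g. "text-1.2.3.0"
  | Archive String      -- immutable: URL of a tarball
  | Repo String String  -- immutable: repository URL and commit
  | LocalPath FilePath  -- mutable: contents may change between builds
  deriving (Eq, Show)

isImmutable :: PackageLocation -> Bool
isImmutable (LocalPath _) = False
isImmutable _             = True

data Database = SnapshotDatabase | LocalDatabase
  deriving (Eq, Show)

-- Map each package (with its direct dependencies) to the database it
-- belongs in: the snapshot database only if its source is immutable and
-- every dependency, transitively, also belongs in the snapshot database.
assignDatabases
  :: Map PackageName (PackageLocation, [PackageName])
  -> Map PackageName Database
assignDatabases pkgs = result
  where
    -- Data.Map is lazy in its values, so each entry can consult the
    -- results of its dependencies (assumes no dependency cycles).
    result = Map.map classify pkgs
    classify (loc, deps)
      | isImmutable loc && all depInSnapshot deps = SnapshotDatabase
      | otherwise                                 = LocalDatabase
    -- Assumption: dependencies absent from the map are global packages
    -- shipped with GHC, which are treated as immutable.
    depInSnapshot dep = Map.lookup dep result /= Just LocalDatabase
```

In this sketch, an immutable Hackage package that depends on a local package ends up in the local database. This matches the rule that the snapshot database may only contain packages that depend exclusively on immutable packages.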
+ +Outdated terminology to be purged: + +* Wanted +* Local +* Snapshot package + +## Inputs + +Stack pays attention to the following inputs: + +* Current working directory, used for finding the default `stack.yaml` file and + resolving relative paths +* The `STACK_YAML` environment variable +* Command line arguments (CLI args), as will be referenced below + +Given these inputs, Stack attempts the following process when performing a build. + +## Find the `stack.yaml` file + +* Check for a `--stack-yaml` CLI arg, and use that +* Check for a `STACK_YAML` env var +* Look for a `stack.yaml` in this directory or ancestor directories +* Fall back to the default global project + +This file is parsed to provide the following config values: + +* `resolver` +* `compiler` +* `packages` +* `extra-deps` +* `flags` +* `ghc-options` + +`flags` and `ghc-options` break down into both _by name_ (applied to a +specific package) and _general_. + +## Wanted compiler, dependencies, and project packages + +* If the `--resolver` CLI is present, ignore the `resolver` and + `compiler` config values +* Load up the snapshot indicated by the `resolver` (either config + value or CLI arg). This will provide: + * A map from package name to package location, flags, GHC options, + and if a package should be hidden. All package locations here + are immutable. + * A wanted compiler version, e.g. `ghc-8.4.3` +* If the `--compiler` CLI arg is set, or the `compiler` config value + is set (and `--resolver` CLI arg is not set), ignore the wanted + compiler from the snapshot and use the specified wanted compiler +* Parse `extra-deps` into a `Map PackageName PackageLocation`, + containing both mutable and immutable package locations. Parse + `packages` into a `Map PackageName ProjectPackage`. 
+* Ensure there are no duplicates between these two sets of packages +* Delete any packages from the snapshot packages that appear in + `packages` or `extra-deps` +* Perform a left biased union between the immutable `extra-deps` + values and the snapshot packages. Ignore any settings in the + snapshot packages that have been replaced. +* Apply the `flags` and `ghc-options` by name to these packages. If + any values are specified but no matching package is found, it's an + error. +* We are now left with the following: + * A wanted compiler version + * A map from package name to immutable packages with package config (flags, GHC options, hidden) + * A map from package name to mutable packages as dependencies with package config + * A map from package name to mutable packages as project packages with package config + +## Get actual compiler + +Use the wanted compiler and various other Stack config values (not all +listed here) to find the actual compiler, potentially installing it in +the process. + +## Global package sources + +With the actual compiler discovered, list out the packages available +in its database and create a map from package name to +version/GhcPkgId. Remove any packages from this map which are present +in one of the other three maps mentioned above. + +## Resolve targets + +Take the CLI args for targets as raw text values and turn them into +actual targets. 
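The following Haskell sketch illustrates the basic parse step. It classifies a raw target string into the four forms listed next. The `RawTarget` type and the `parseRawTarget` function are hypothetical, and the classification rules are simplifying assumptions:

* a leading `.` or any `/` means a directory;
* a `:` means a package plus component;
* a trailing `-<digits and dots>` means a package identifier.

Stack itself may resolve these differently. For example, it may check the file system for directories.

```haskell
import Data.Char (isDigit)
import Data.List (isPrefixOf)

-- Hypothetical model of the four raw target forms.
data RawTarget
  = TargetName String              -- e.g. "foo"
  | TargetIdent String String      -- name and version, e.g. "foo-1.2.3"
  | TargetComponent String String  -- package and component, e.g. "foo:test:bar"
  | TargetDirectory FilePath       -- e.g. "./subdir"
  deriving (Eq, Show)

-- Split a string on '.' characters.
splitDots :: String -> [String]
splitDots str = case break (== '.') str of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

-- A version is one or more dot-separated, non-empty runs of digits.
isVersion :: String -> Bool
isVersion v = all (\p -> not (null p) && all isDigit p) (splitDots v)

-- Split "name-1.2.3" into ("name", "1.2.3") at the last hyphen,
-- but only if the part after that hyphen looks like a version.
splitVersion :: String -> Maybe (String, String)
splitVersion s =
  case break (== '-') (reverse s) of
    (revVer, '-' : revName)
      | not (null revName) && isVersion (reverse revVer) ->
          Just (reverse revName, reverse revVer)
    _ -> Nothing

-- Classify a raw command-line target (simplified heuristics).
parseRawTarget :: String -> RawTarget
parseRawTarget s
  | "." `isPrefixOf` s || '/' `elem` s       = TargetDirectory s
  | (pkg, ':' : comp) <- break (== ':') s   = TargetComponent pkg comp
  | Just (name, ver) <- splitVersion s      = TargetIdent name ver
  | otherwise                               = TargetName s
```

With this sketch, `parseRawTarget "my-package"` stays a `TargetName`, because `package` is not a valid version. By contrast, `parseRawTarget "foo-1.2.3"` becomes `TargetIdent "foo" "1.2.3"`.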
+

* Do a basic parse of the values into one of the following:
  * Package name
  * Package identifier
  * Package name + component
  * Directory
* An empty target list is equivalent to listing the package names of
  all project packages
* For any directories specified, find all project packages in that
  directory or subdirectories thereof and convert to those package
  names
* For all package identifiers, ensure that either the package name
  does not exist in any of the three parsed maps from the "wanted
  compiler" step above, or that the package is present as an immutable
  dependency from Hackage. If so, create an immutable dependency entry
  with default flags, GHC options, and hidden status, and add this
  package to the set of immutable package dependencies.
* For all package names, ensure the package is in one of the four maps
  we have, and if so add to either the dependency or project package
  target set.
* For all package name + component, ensure that the package is a
  project package, and add that package + component to the set of
  project targets.
* Ensure that no target has been specified multiple times.

We now have four updated package maps, a new set of dependency
targets, and a new set of project package targets (potentially with
specific components).

## Apply named CLI flags

Named CLI flags are applied to specific packages by updating the
config in one of the four maps. If a flag is specified and no package
is found, it's an error.

## Apply CLI GHC options

Apply GHC options from the command line to all _project package
targets_. *FIXME* confirm that this is in fact the correct behavior.

## Apply general flags (CLI and config value)

*FIXME* figure out and document exactly which packages these will
apply to.

## Apply general GHC options

*FIXME* list out the various choices here and which packages they
apply to.
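The "Apply named CLI flags" step above can be sketched in Haskell as follows. This is a hypothetical simplification, not Stack's implementation:

* it merges the four package maps into a single `Map PackageName PackageConfig`;
* it models only flags;
* it follows the "added on top of previous settings" behavior described in the follow-up patch of this series.

The names `PackageConfig`, `pcFlags`, and `applyNamedFlags` are illustrative.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical simplifications; Stack's real types are richer.
type PackageName = String
type FlagName = String

-- Only the flags part of a package's config is modelled here.
newtype PackageConfig = PackageConfig { pcFlags :: Map FlagName Bool }
  deriving (Eq, Show)

-- Apply named CLI flags (package -> flag settings) to the known packages.
-- Naming a package that is not known at all is an error.
applyNamedFlags
  :: Map PackageName (Map FlagName Bool)
  -> Map PackageName PackageConfig
  -> Either String (Map PackageName PackageConfig)
applyNamedFlags cliFlags pkgs =
  case Map.keys (Map.difference cliFlags pkgs) of
    (missing : _) -> Left ("flag specified for unknown package: " ++ missing)
    []            -> Right (Map.mapWithKey addFlags pkgs)
  where
    addFlags name cfg = case Map.lookup name cliFlags of
      Nothing  -> cfg
      -- Map.union is left-biased: CLI values win on conflicts,
      -- while previously set flags with other names are kept.
      Just new -> cfg { pcFlags = Map.union new (pcFlags cfg) }
```

The left bias of `Map.union` makes a CLI value override an existing setting for the same flag. Flags that the CLI does not mention are left untouched.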
+ +## Determine snapshot hash + +Use some deterministic binary serialization and SHA256 thereof to get +a hash of the following information: + +* Actual compiler (GHC version, path, *FIXME* probably some other + unique info from GHC, I've heard that `ghc --info` gives you + something) +* Global database map +* Immutable dependency map + +Motivation: Any package built from the immutable dependency map and +installed in this database will never need to be rebuilt. + +*FIXME* Caveat: do we need to take profiling settings into account +here? How about Haddock status? + +## Determine actual target components + +* Dependencies: "default" components (all libraries and executables) +* Project packages: + * If specific components named: only those, plus any libraries present + * If no specific components, include the following: + * All libraries, always + * All executables, always + * All test suites, _if_ `--test` specified on command line + * All benchmarks, _if_ `--bench` specified on command line + +## Construct build plan + +* Applied to every target (project package or dependency) +* Apply flags, platform, and actual GHC version to resolve + dependencies in any package analyzed +* Include all library dependencies for all enabled components +* Include all build tool dependencies for all enabled components + (using the fun backwards compat logic for `build-tools`) +* Apply the logic recursively to come up with a full build plan +* If a task depends exclusively on immutable packages, mark it as + immutable. Otherwise, it's mutable. The former go into the snapshot + database, the latter into the local database. + +We now have a set of tasks of packages/components to build, with full +config information for each package, and dependencies that must be +built first. + +*FIXME* There's some logic to deal with cyclic dependencies between +test suites and benchmarks, where a task can be broken up into +individual components versus be kept as a single task. 
Need to +document this better. Currently it's the "all in one" logic. + +## Unregister local modified packages + +* For all mutable packages in the set of tasks, see if any files have + changed since last successful build and, if so, unregister + delete + their executables +* For anything which depends on them directly or transitively, + unregister + delete their executables + +## Perform the tasks + +* Topological sort, find things which have no dependencies remaining +* Check if already installed in the relevant database + * Check package database + * Check Stack specific "is installed" flags, necessary for + non-library packages + * For project packages, need to also check which components were + built, if tests were run, if we need to rerun tests, etc +* If all good: do nothing +* Otherwise, for immutable tasks: check the precompiled cache for an + identical package installation (same GHC, dependencies, etc). If + present: copy that over, and we're done. +* Otherwise, perform the build, register, write to the Stack specific + "is installed" stuff, and (for immutable tasks) register to the + precompiled cache + +"Perform the build" consists of: + +* Do a cabal configure, if needed +* Build the desired components +* For all test suites built, unless "no rerun tests" logic is on and + we already ran the test, _or_ "no run tests" is on, run the test +* For all benchmarks built, unless "no run benchmarks" is on, run the + benchmark diff --git a/doc/terminology.md b/doc/terminology.md deleted file mode 100644 index 336d561b72..0000000000 --- a/doc/terminology.md +++ /dev/null @@ -1,22 +0,0 @@ -
-# Terminology - -This is a work-in-progress document covering terminology used by -Stack. It was started following the Pantry rewrite work in Stack -(likely to land as Stack 2.0), and contains some significant -changes/simplifications from previous terms. - -__NOTE__ This document should *not be considered accurate* until this -note is removed. - -Correct, new terminology - -* Project package: anything listed in `packages` in stack.yaml -* Dependency: anything listed in extra-deps or a snapshot -* Target: package and/or component listed on the command line to be built. Can be either project package or dependency. If none specified, automatically targets all project packages - -Outdated terminology to be purged: - -* Wanted -* Local -* Snapshot package From 0ef138ce0dea5325695efdd358b0804d12ce802f Mon Sep 17 00:00:00 2001 From: Michael Snoyman Date: Sun, 9 Sep 2018 08:19:27 +0300 Subject: [PATCH 2/3] Clarifications and typo corrections --- doc/build-overview.md | 42 +++++++++++++++++++++++++----------------- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/doc/build-overview.md b/doc/build-overview.md index 9855cf327a..3b23e09180 100644 --- a/doc/build-overview.md +++ b/doc/build-overview.md @@ -6,7 +6,7 @@ __NOTE__ This document should *not be considered accurate* until this note is removed. This is a work-in-progress document covering the build process used by Stack. -Stack. It was started following the Pantry rewrite work in Stack (likely to +It was started following the Pantry rewrite work in Stack (likely to land as Stack 2.0), and contains some significant changes/simplifications from how things used to work. This document will likely not fully be reflected in the behavior of Stack itself until late in the Stack 2.0 development cycle. @@ -22,15 +22,14 @@ the behavior of Stack itself until late in the Stack 2.0 development cycle. repository. In contrast to... * Mutable package: a package which comes from a local file path. 
The contents of such a package are assumed to mutate over time. -* Snapshot database: a package database and set of executables for a given set +* Write only database: a package database and set of executables for a given set of _immutable_ packages. Only packages from immutable sources and which depend exclusively on other immutable packages can be in this database. - *QUESTION* Would this better be called the _immutable database_ or _write only - database_? -* Local database: a package database and set of executables for packages which + *NOTE* formerly this was the _snapshot database_. +* Mutable database: a package database and set of executables for packages which are either mutable or depend on such mutable packages. Importantly, packages in this database can be unregister, replaced, etc, depending on what happens - with the source packages. + with the source packages. *NOTE* formerly this was the *local database*. Outdated terminology to be purged: @@ -58,12 +57,12 @@ Given these inputs, Stack attempts the following process when performing a build This file is parsed to provide the following config values: -* `resolver` -* `compiler` -* `packages` -* `extra-deps` -* `flags` -* `ghc-options` +* `resolver` (required field) +* `compiler` (optional field) +* `packages` (optional field, defaults to `["."]`) +* `extra-deps` (optional field, defaults to `[]`) +* `flags` (optional field, defaults to `{}`) +* `ghc-options` (optional field, defaults to `{}`) `flags` and `ghc-options` break down into both _by name_ (applied to a specific package) and _general_. @@ -139,7 +138,12 @@ actual targets. * For all package name + component, ensure that the package is a project package, and add that package + component to the set of project targets. -* Ensure that no target has been specified multiple times. +* Ensure that no target has been specified multiple times. (*FIXME* + Mihai states: I think we will need an extra consistency step for + internal libraries. 
Sometimes stack needs to use the mangled name
+  (`z-package-internallibname-z..`), sometimes the
+  `package:internallibname` one. But I think this will become obvious
+  when doing the code changes.)
 
 We now have four updated package maps, a new set of dependency
 targets, and a new set of project package targets (potentially with
@@ -149,17 +153,21 @@ specific components).
 
 Named CLI flags are applied to specific packages by updating the
 config in one of the four maps. If a flag is specified and no package
-is found, it's an error.
+is found, it's an error. Note that flag settings are added _on top of_
+previous settings in this case, and do not replace them. That is, if
+previously we have `singleton (FlagName "foo") True` and now add
+`singleton (FlagName "bar") True`, both `foo` and `bar` will now be
+true.
 
 ## Apply CLI GHC options
 
 Apply GHC options from the command line to all _project package
 targets_. *FIXME* confirm that this is in fact the correct behavior.
 
-## Apply general flags (CLI and config value)
+## Apply general flags from CLI
 
-*FIXME* figure out and document exactly which packages these will
-apply to.
+Flags given as `--flag *:flagname[:bool]` on the CLI are applied to
+any project package which uses that flag name.
 
 ## Apply general GHC options
 

From 8f0298075ec8fbe82a9bad784757accc243a5e31 Mon Sep 17 00:00:00 2001
From: Michael Snoyman
Date: Sun, 9 Sep 2018 08:25:45 +0300
Subject: [PATCH 3/3] Build overview replaces architecture (fixes #4251)

---
 doc/architecture.md | 190 --------------------------------------------
 1 file changed, 190 deletions(-)
 delete mode 100644 doc/architecture.md

diff --git a/doc/architecture.md b/doc/architecture.md
deleted file mode 100644
index 05cb9d7cc5..0000000000
--- a/doc/architecture.md
+++ /dev/null
@@ -1,190 +0,0 @@
-
- -# Architecture - -__NOTE__ MSS 2018-08-22 This document is out of date, and will be made -more out of date by -[#3922](https://github.com/commercialhaskell/stack/issues/3922). I -intend to update it when implementing #3922. Tracked in -[#4251](https://github.com/commercialhaskell/stack/issues/4251). - -## Terminology - -* Package identifier: a package name and version, e.g. text-1.2.1.0 -* GhcPkgId: a package identifier plus the unique hash for the generated binary, - e.g. text-1.2.1.0-bb83023b42179dd898ebe815ada112c2 -* Package index: a collection of packages available for download. This is a - combination of an index containing all of the .cabal files (either a tarball - downloaded via HTTP(S) or a Git repository) and some way to download package - tarballs. - * By default, stack uses a single package index (the Github/S3 mirrors of - Hackage), but supports customization and adding more than one index -* Package database: a collection of metadata about built libraries -* Install root: a destination for installing packages into. Contains a bin path - (for generated executables), lib (for the compiled libraries), pkgdb (for the - package database), and a few other things -* Snapshot: an LTS Haskell or Stackage Nightly, which gives information on a - complete set of packages. This contains a lot of metadata, but importantly it - can be converted into a mini build plan... -* Mini build plan: a collection of package identifiers and their build flags - that are known to build together -* Resolver: the means by which stack resolves dependencies for your packages. - The two currently supported options are snapshot (using LTS or Nightly), and - GHC (which installs no extra dependencies). Others may be added in the future - (such as a SAT-based dependency solver). These packages are always taken from - a package index -* extra-deps: additional packages to be taken from the package index for - dependencies. 
This list will *shadow* packages provided by the resolver -* Local packages: source code actually present on your file system, and - referred to by the `packages` field in your stack.yaml file. Each local - package has exactly one .cabal file -* Project: a stack.yaml config file and all of the local packages it refers to. - -## Databases - -Every build uses three distinct install roots, which means three separate -package databases and bin paths. These are: - -* Global: the packages that ship with GHC. We never install anything into this - database -* Snapshot: a database shared by all projects using the same snapshot. Packages - installed in this database must use the exact same dependencies and build - flags as specified in the snapshot, and cannot be affected by user flags, - ensuring that one project cannot corrupt another. There are two caveats to - this: - * If different projects use different package indices, then their - definitions of what package foo-1.2.3 are may be different, in which case - they *can* corrupt each other's shared databases. This is warned about in - the FAQ - * Turning on profiling may cause a package to be recompiled, which will - result in a different GhcPkgId -* Local: extra-deps, local packages, and snapshot packages which depend on them - (more on that in shadowing) - -## Building - -### Shadowing - -Every project must have precisely one version of a package. If one of your -local packages or extra dependencies conflicts with a package in the snapshot, -the local/extradep *shadows* the snapshot version. The way this works is: - -* The package is removed from the list of packages in the snapshot -* Any package that depends on that package (directly or indirectly) is moved - from the snapshot to extra-deps, so that it is available to your packages as - dependencies. 
- * Note that there is no longer any guarantee that this package will build, - since you're using an untested dependency - -After shadowing, you end up with what is called internally a `SourceMap`, which -is `Map PackageName PackageSource`, where a `PackageSource` can be either a -local package, or a package taken from a package index (specified as a version -number and the build flags). - -### Installed packages - -Once you have a `SourceMap`, you can inspect your three available databases and -decide which of the installed packages you wish to use from them. We move from -the global, to snapshot, and finally local, with the following rules: - -* If we require profiling, and the library does not provide profiling, do not - use it -* If the package is in the `SourceMap`, but belongs to a difference database, - or has a different version, do not use it -* If after the above two steps, any of the dependencies are unavailable, do not - use it -* Otherwise: include the package in the list of installed packages - -We do something similar for executables, but maintain our own database of -installed executables, since GHC does not track them for us. - -### Plan construction - -When running a build, we know which packages we want installed (inventively -called "wanteds"), which packages are available to install, and which are -already installed. In plan construction, we put this information together to -decide which packages must be built. The code in Stack.Build.ConstructPlan is -authoritative on this and should be consulted. 
The basic idea though is: - -* If any of the dependencies have changed, reconfigure and rebuild -* If a local package has any files changed, rebuild (but don't bother - reconfiguring) -* If a local package is wanted and we're running tests or benchmarks, run the - test or benchmark even if the code and dependencies haven't changed - -### Plan execution - -Once we have the plan, execution is a relatively simple process of calling -`runghc Setup.hs` in the correct order with the correct parameters. See -Stack.Build.Execute for more information. - -## Configuration - -stack has two layers of configuration: project and non-project. All of these -are stored in stack.yaml files, but the former has extra fields (resolver, -packages, extra-deps, and flags). The latter can be monoidally combined so that -a system config file provides defaults, which a user can override with -`~/.stack/config.yaml`, and a project can further customize. In addition, -environment variables STACK\_ROOT and STACK\_YAML can be used to tweak where -stack gets its configuration from. - -stack follows a simple algorithm for finding your project configuration file: -start in the current directory, and keep going to the parent until it finds a -`stack.yaml`. When using `stack ghc` or `stack exec` as mentioned above, you'll -sometimes want to override that behavior and point to a specific project in -order to use its databases and bin directories. To do so, simply set the -`STACK_YAML` environment variable to point to the relevant `stack.yaml` file. - -## Snapshot auto-detection - -When you run `stack build` with no stack.yaml, it will create a basic -configuration with a single package (the current directory) and an -auto-detected snapshot. The algorithm it uses for selecting this snapshot is: - -* Try the latest two LTS major versions at their most recent minor version - release, and the most recent Stackage Nightly. 
For example, at the time of - writing, this would be lts-2.10, lts-1.15, and nightly-2015-05-26 -* For each of these, test the version bounds in the package's .cabal file to - see if they are compatible with the snapshot, choosing the first one that - matches -* If no snapshot matches, uses the most recent LTS snapshot, even though it - will not compile - -If you end up in the no compatible snapshot case, you typically have three -options to fix things: - -* Manually specify a different snapshot that you know to be compatible. If you - can do that, great, but typically if the auto-detection fails, it means that - there's no compatible snapshot -* Modify version bounds in your .cabal file to be compatible with the selected - snapshot -* Add `extra-deps` to your stack.yaml file to fix compatibility problems - -Remember that running `stack build` will give you information on why your build -cannot occur, which should help guide you through the steps necessary for the -second and third option above. Also, note that those options can be -mixed-and-matched, e.g. you may decide to relax some version bounds in your -.cabal file, while also adding some extra-deps. - -## Explicit breakage - -As mentioned above, updating your package indices will not cause stack to -invalidate any existing package databases. That's because stack is always -explicit about build plans, via: - -1. the selected snapshot -2. the extra-deps -3. local packages - -The only way to change a plan for packages to be installed is by modifying one -of the above. This means that breakage of a set of installed packages is an -*explicit* and *contained* activity. Specifically, you get the following -guarantees: - -* Since snapshots are immutable, the snapshot package database will not be - invalidated by any action. If you change the snapshot you're using, however, - you may need to build those packages from scratch. -* If you modify your extra-deps, stack may need to unregister and reinstall - them. 
-* Any changes to your local packages trigger a rebuild of that package and its - dependencies.