gps for Implementors
Hi! We're thrilled you're thinking about building a tool with `gps`. Feedback on these docs, or the library itself, is always welcome! 🎉
`gps` is a library for folks interested in fully solving the package management problem. It's designed to provide a simple API that does the hard work for you, letting you focus on your particular tool's desired workflows and UX instead of the writhing complexities of package management. Specifically, `gps` answers this question for a tool:

> Given a tree of Go code, the dependencies it transitively imports, and some constraints on which versions of those imported packages are acceptable, what versions of those dependencies should be used?

There's a lot there, and much of it turns on what "should" means. This documentation is written with the goal of explaining how to create tools using `gps` that let you control how this question gets answered.
Note that it might be worth skimming the general introduction, if you haven't already.
- Minimum Viable Implementation
- Solver Inputs: SolveParameters
- Preparing a Solver
- The SourceManager
- Solutions and Failures
The absolute minimum required to get `gps` running looks something like this:
```go
package main

// Derived from https://github.com/sdboyer/gps/blob/master/example.go (which compiles!)

import (
	"go/build"
	"os"
	"path/filepath"
	"strings"

	"github.com/Masterminds/semver"
	"github.com/sdboyer/gps"
)

func main() {
	// Operate on the current directory
	root, _ := os.Getwd()
	// Assume the current directory is correctly placed on a GOPATH, and derive
	// the ProjectRoot from it
	srcprefix := filepath.Join(build.Default.GOPATH, "src") + string(filepath.Separator)
	importroot := filepath.ToSlash(strings.TrimPrefix(root, srcprefix))

	params := gps.SolveParameters{
		RootDir:    root,
		ImportRoot: gps.ProjectRoot(importroot),
	}

	// Set up a SourceManager. This mediates access to actual code repositories.
	sourcemgr, _ := gps.NewSourceManager(NaiveAnalyzer{}, ".repocache", false)
	defer sourcemgr.Release()

	// Prepare and run the solver
	solver, _ := gps.Prepare(params, sourcemgr)
	solution, err := solver.Solve()
	if err == nil {
		// If no failure, blow away the vendor dir and write a new one out,
		// stripping nested vendor directories as we go.
		os.RemoveAll(filepath.Join(root, "vendor"))
		gps.CreateVendorTree(filepath.Join(root, "vendor"), solution, sourcemgr, true)
	}
}

type NaiveAnalyzer struct{}

// DeriveManifestAndLock gets called when the solver needs manifest/lock data
// for a particular project (the gps.ProjectRoot parameter) at a particular
// version. That version will be checked out in a directory rooted at path.
func (a NaiveAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return nil, nil, nil
}

// Info reports this analyzer's name and version.
func (a NaiveAnalyzer) Info() (name string, version *semver.Version) {
	v, _ := semver.NewVersion("v0.0.1")
	return "example-analyzer", v
}
```
35 LoC! Hardly a robust or featureful implementation, but not too shabby for a working dependency manager that does the dep-fetching parts of what `go get` does (modulo arguments).
To explain how to implement `gps` in a somewhat more realistic tool, we're going to break this example down, building up from the simpler parts into the more complex. First up: `SolveParameters`.
The `SolveParameters` struct holds all the arguments and options your tool provides to `gps`. Its properties determine both what data a solver will operate on and how some of its internal solving behaviors work. However, the only two required properties are the ones we see here:
```go
root, _ := os.Getwd()

params := gps.SolveParameters{
	RootDir:    root,
	ImportRoot: gps.ProjectRoot("github.com/sdboyer/example-project"),
}

solver, _ := gps.Prepare(params, sourcemgr)
```
`RootDir` is the root filesystem path of the project on which you want `gps` to operate, so `os.Getwd()` makes at least some sense for a naive case. It's probably wise, though not currently required, that your tool ensure that `RootDir` points to a directory under an active `$GOPATH`. It's also probably wise that the tail portion of `RootDir`, trimmed of `$GOPATH/src`, be the same as `ImportRoot`.
`ImportRoot` is a `ProjectRoot` - just a type alias for `string` - but a crucial concept in `gps`, as the docs indicate.

`gps` relies heavily on the idea of "projects" - trees of packages, all of which are to be covered by a single `vendor/` directory. `ProjectRoot` is the import path at the root of that tree. So, if we were to stop using `os.Getwd()` and explicitly set `RootDir`, our `SolveParameters` might look like this:
```go
params := gps.SolveParameters{
	RootDir:    "/home/sdboyer/go/src/github.com/sdboyer/example-project",
	ImportRoot: gps.ProjectRoot("github.com/sdboyer/example-project"),
}

solver, _ := gps.Prepare(params, sourcemgr)
```
Almost everywhere that `ProjectRoot` is used in `gps`, it must correspond not only to the root of a project, but also to the root of a repository. This restriction may be relaxed in the future, but for now, assuming that project root == repository root avoids a whole bunch of classes of problems.

The only exception to this rule is the situation we're looking at right now: when declaring the project for which we're solving, it's OK for it not to be at a repository root. The only catch is that if your users want to make their project consumable as a dependency, then the project root must be the repository root. So, this restriction is probably fine for start-of-the-import-graph projects or company-internal monorepos.
Now, the fun stuff!

`gps` aims to make as few assumptions as possible, but package management is hard, and some assumptions are unavoidable. One of `gps`' primary assumptions is that there are two types of metadata that describe a project/package tree:
- **Manifests** are primarily for expressing version constraints on dependencies. They do so by providing lists of `ProjectConstraint`s. Manifests are the single most important input to a `gps` solver.
- **RootManifests** compose `Manifest`, but provide some additional information - overrides and ignores. This reflects the special privileges afforded to the root project.
- **Locks** describe an exact, reproducible build. Locks are the `Solution` that a `gps` solver returns - its outputs (`Solution` composes `Lock`). However, locks can also act as supplemental inputs.
While `gps` does define interfaces that a tool must implement, it does not care where a tool gets the information from. It's probably a good idea to represent these as files contained within the project tree; for example, glide's manifest is `glide.yaml`, and its lock is `glide.lock`; its `Config` and `Lock` types, which handle those files, implement `gps`' `Manifest` and `Lock`, respectively. (At least, once gps is merged in.)
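To make that concrete, here's a minimal sketch of what a tool-defined manifest type could look like. The method name follows the `Manifest.GetDependencyConstraints()` reference used later in this doc, but the exact set of methods the `gps.Manifest` interface requires should be checked against the godoc - treat this as illustrative only:

```go
// MyManifest is a hypothetical tool-defined manifest type, e.g. backed by a
// YAML or JSON file in the project tree.
type MyManifest struct {
	deps []gps.ProjectConstraint
}

// GetDependencyConstraints reports the version constraints this project
// declares on its dependencies. (Method name assumed from this doc's later
// references; verify against the gps.Manifest interface.)
func (m MyManifest) GetDependencyConstraints() []gps.ProjectConstraint {
	return m.deps
}
```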
In the example, we included neither a `RootManifest` nor a `Lock`. This works - `gps` will simply treat every external import of the root project as having no version constraint - but for a real tool, not providing at least a `RootManifest` sorta misses the point of using `gps`.
Let's say the project we're operating on imports `github.com/foo/bar`, and we only want to accept the `master` branch for it. This manifest expresses that requirement:
```go
m := gps.SimpleManifest{
	Deps: []gps.ProjectConstraint{
		{
			Ident: gps.ProjectIdentifier{
				// The project root to which the constraint applies
				ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
			},
			// The constraint itself - a branch named master
			Constraint: gps.NewBranch("master"),
		},
	},
}
```
`Constraint`s themselves are mostly what meets the eye - they only allow versions of a certain type through. There's a little more to `ProjectIdentifier`s, though. They're another layer atop `ProjectRoot`s, which are in turn a layer on top of plain old import paths. `ProjectIdentifier`s allow you to designate an alternate network location from which a given root import path should be fulfilled:
```go
gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
	NetworkName: "github.com/sdboyer/bar",
}
```
This tells `gps` that we want to fulfill import requirements from the tree of `github.com/foo/bar` by sourcing it from (what is presumably) a fork. You can also specify the exact URL:
```go
gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
	NetworkName: "[email protected]:sdboyer/bar",
}
```
If `NetworkName` is not specified, then it is inferred from `ProjectRoot`. Thus, these are equivalent:
```go
gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
}

gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
	NetworkName: "github.com/foo/bar", // specifying this is unnecessary and redundant
}
```
Manifests also allow you to express test-only dependency constraints, as sketched below. The mechanics are essentially the same - build up a `[]ProjectConstraint` - but test imports and constraints are only incorporated if a flag explicitly indicates they should be. For now, that flag is not accessible to tools; `gps` hardcodes it to only be enabled for the root project's test dependencies. (That'll make more sense when we get to `ProjectAnalyzer` a bit later.)
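As a rough sketch, declaring test-only constraints alongside regular ones might look like this. Note that the `TestDeps` field name on `gps.SimpleManifest` is an assumption on our part - verify it against the type's actual definition:

```go
m := gps.SimpleManifest{
	Deps: []gps.ProjectConstraint{
		{
			Ident:      gps.ProjectIdentifier{ProjectRoot: gps.ProjectRoot("github.com/foo/bar")},
			Constraint: gps.NewBranch("master"),
		},
	},
	// TestDeps is assumed to mirror Deps, holding constraints that apply
	// only to test imports.
	TestDeps: []gps.ProjectConstraint{
		{
			Ident:      gps.ProjectIdentifier{ProjectRoot: gps.ProjectRoot("github.com/stretchr/testify")},
			Constraint: gps.NewVersion("v1.1.0"),
		},
	},
}
```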
One of `gps`'s chief goals is that solving should produce a solution that not only meets version constraints, but can actually compile. To that end, `gps` statically analyzes the source to determine which packages must be present for a build to succeed. In the future, `gps` will also perform type compatibility analysis between importers and importees, and reject solutions that fail that check.
For the most part, this works well out of the box. `gps` knows how to ignore stdlib import paths, the old unqualified `appengine` paths, and some other common ones. But if your user's project has, either locally or transitively, import paths that `gps` needs to ignore, then `RootManifest.IgnorePackages()` is how you tell `gps` to do so. They're reported through this method on the expectation that most tools will want to provide the user a facility for defining ignores as part of their manifest, so grouping them together makes sense.
Please do note that it's a method on `RootManifest`, not `Manifest`. Defining ignores is one of the special privileges given only to the root project.
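For illustration, a root manifest might surface user-defined ignores like this sketch. The map-of-bools return shape for `IgnorePackages()` is assumed here; confirm the signature against the `gps.RootManifest` interface:

```go
// MyRootManifest is a hypothetical root-manifest type that embeds a regular
// manifest and adds the root-only privileges.
type MyRootManifest struct {
	gps.SimpleManifest
	ignores map[string]bool
}

// IgnorePackages reports the set of import paths the user asked to ignore.
// (Return type assumed; check the gps.RootManifest interface.)
func (m MyRootManifest) IgnorePackages() map[string]bool {
	return m.ignores
}
```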
When you tell `gps` to ignore an import path, it's important to remember that `gps` not only ignores the package at that import path, but also any new import paths introduced by that package. For example, say we have the following import graph: package `Root` imports packages `A` and `B`; `A` imports `C`, and `B` imports `D`. Now suppose the solver is operating on this import graph and is told to ignore `B`.
The import link from `Root` to `B` is ignored when looking at `Root`'s import paths. `B` never makes it into the import graph, and as a result, `D` is never discovered at all: no version constraint checks are made against it, and the returned `Solution` (which we'll get to soon) will not include it. `C`, however, is still considered as normal, because it is reachable through `A`.
Now, `gps` could add a different type of 'ignore' mechanism that allows `D` to be discovered through `B`, but skips checks on `B` and omits it from the solution. A use case will have to present itself first, though, and that mechanism would be given a different name; ignores will continue to operate as-is.
Overrides, like ignores, are a special privilege of the root project, and are reported through `RootManifest.Overrides()`. They allow the root to enforce that a `ProjectRoot` will always have a particular `Constraint` and/or `NetworkName`, superseding any `ProjectConstraint` reported from any dependency's (or the root's own) `Manifest.GetDependencyConstraints()`.

Put another way, overrides make it impossible for certain types of conflicts to occur.
For example, imagine a simple depgraph: the root project depends on `A` and `B`, both of which depend on `C`.

Now, say that `A` wants to source `C` from `Cfork` instead. That's all fine for `A` when acting on its own, but for this project it's a problem, because `B` still wants `C` to come from `C`, not `Cfork`. This disagreement between `A` and `B` on where `C` should be sourced from results in a conflict. With overrides, though, the root project has the power to step in and decree that `Cfork` should be used.

Of course, the `RootManifest` could also mandate that regular `C` be used, or some entirely different source.
The same idea applies to version constraints - if the `RootManifest` specifies a `Constraint` override for a given `ProjectRoot`, then that will be swapped in for all dependencies on that `ProjectRoot`.
It's worth noting that all other version constraint interactions involve narrowing what's acceptable, by computing constraint intersection. Overrides are the only way a constraint can widen beyond what any individual dependency demands.
Overrides are a powerful feature, and are tremendously useful for asserting control over an unruly depgraph and ecosystem. But they must be used with care. Because they are a special privilege of the root project, any "fixes" made via a project's overrides will have no effect if that project is pulled in as something else's dependency. Overuse of overrides has the potential to create an "arms race" in the ecosystem: the more people use overrides, the more other people have to use overrides in order to achieve a sane build.
Note that overrides are expressed a bit differently from normal constraints (though normal constraints are likely to become like overrides). Instead of a `[]ProjectConstraint`, `RootManifest.Overrides()` must return a `ProjectConstraints`. They hold the same information, but are arranged in a map instead.
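As a sketch of the shape, assuming `ProjectConstraints` is keyed by `ProjectRoot` with a value type holding the constraint and network name (the value type's name and fields here are assumptions - check the godoc):

```go
// A hedged sketch of an override map: force github.com/foo/bar to come from
// a fork, on its master branch, no matter what any dependency's manifest says.
overrides := gps.ProjectConstraints{
	gps.ProjectRoot("github.com/foo/bar"): gps.ProjectProperties{
		NetworkName: "github.com/sdboyer/bar",
		Constraint:  gps.NewBranch("master"),
	},
}
```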
While `Lock`s are solver outputs, they also play some role as solver inputs. When a root lock is present, the solver will attempt to preserve the exact versions specified in the lock. This is the main strategy `gps` relies on to minimize changes in the depgraph.
Let's say that we'd already solved for `github.com/foo/bar`, and thus had some `Lock` information available. Real implementations would likely have their own type, but we can use `gps.SimpleLock` for now:
```go
// the root import path, again
rootimport := gps.ProjectRoot("github.com/foo/bar")
// the revision of it that we want
version := gps.Revision("af281525a8a371ca6929f63c88e569c1c62137ed")
// the network name (here, we're getting it from a fork)
url := "github.com/sdboyer/bar"
// a list of contained packages that are actually imported
pkgs := []string{"github.com/foo/bar"}

l := gps.SimpleLock{
	gps.NewLockedProject(rootimport, version, url, pkgs),
}
```
Note: `LockedProject` instances must be created by `NewLockedProject()`. This is done to ensure data consistency: passing a nil `Version` causes a panic.
We can specify any type of version for a `LockedProject`:
```go
// v1.0.0 is a valid semver version; this creates a semver version
semver := gps.NewVersion("v1.0.0")
// "some-ol-tag" is not valid semver; this creates a 'plain' version
plainver := gps.NewVersion("some-ol-tag")
// "master" branch version
branch := gps.NewBranch("master")
// revision
revision := gps.Revision("af281525a8a371ca6929f63c88e569c1c62137ed")
```
The preference here is strongly for `Revision`s - being immutable, they provide the greatest assurance of a reproducible build. For `gps` to really properly avoid changes, though, both revision and version are needed. Given our example input manifest, that would look like this:

```go
pair := gps.NewBranch("master").Is("af281525a8a371ca6929f63c88e569c1c62137ed")
```
Finally, wrapping the manifest and lock up together, we create our `SolveParameters`, then prepare and execute a solver run:
```go
manifest := gps.SimpleManifest{
	Deps: []gps.ProjectConstraint{
		{
			Ident: gps.ProjectIdentifier{
				ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
			},
			Constraint: gps.NewBranch("master"),
		},
	},
}

lock := gps.SimpleLock{
	gps.NewLockedProject(
		gps.ProjectRoot("github.com/foo/bar"),
		gps.NewBranch("master").Is("af281525a8a371ca6929f63c88e569c1c62137ed"),
		"github.com/sdboyer/bar",
		[]string{"github.com/foo/bar"},
	),
}

params := gps.SolveParameters{
	RootDir:    "/home/sdboyer/go/src/github.com/sdboyer/example-project",
	ImportRoot: gps.ProjectRoot("github.com/sdboyer/example-project"),
	Manifest:   manifest,
	Lock:       lock,
}

solver, _ := gps.Prepare(params, sourcemgr)
solution, fail := solver.Solve()
```
It turns out that, for this particular case, it is not possible for the solver to return a `Solution` in which the version of `github.com/foo/bar` changed. The exact reason why is complicated, but even if the `master` branch of `github.com/foo/bar` has new commits, and even if our `example-project`'s source was importing thirty new projects, either the solver would return a `Solution` with rev `af28152`, or solving would fail (or `gps` has a bug).
That's a pretty strong guarantee, and an important one for users of your tool: locked versions change only if there's absolutely no other choice.
Unless, of course, the user wants them to change. That's up next!
The `ToChange`, `ChangeAll`, and `Downgrade` properties work in tandem with root `Lock` data to determine how much, and what kind, of change should be allowed in solving.
As a general rule, if your tool has `Lock` data available for the root project, you should always include it in the `SolveParameters`. This is true even when your user wants to update - they've run something like `<yourtool> update github.com/foo/bar`. To fulfill that user intent, pass that information along via `SolveParameters.ToChange`:
```go
// Imagine we're building on the previous params
params.ToChange = []gps.ProjectRoot{
	"github.com/foo/bar",
}

solver, _ := gps.Prepare(params, sourcemgr)
solution, fail := solver.Solve()
```
By putting the `ProjectRoot` that we want to unlock into the `ToChange` slice, we bypass the information from the lock and allow `github.com/foo/bar` to update to whatever its `master` branch points to at the moment we happen to check.
The other option is the global setting:

```go
params.ChangeAll = true
```
Solving here would have the same effect on our version of `github.com/foo/bar`, but this setting is generally more appropriate when the user has issued a command equivalent to `<tool> update --all` or `<tool> update` (without any projects specified).
In most cases, setting `ChangeAll` to `true` has the same effect as passing a list of all the `ProjectRoot`s in the lock to `ToChange`. There are some subtle differences, however, and it's preferable - not to mention easier - to only use `ChangeAll` if the user has explicitly requested a global update.
One other note: `ToChange` and `ChangeAll` are named as they are, rather than `ToUpgrade` and `UpgradeAll`, because your tool can control the direction of change, up or down, with the `Downgrade` property. (It's `Downgrade`, rather than `Upgrade`, so that the zero value corresponds to the common case - upgrading.)
Keep in mind that this only applies when dependencies are tagged with valid semver tags and semver range constraints are applied - in such cases, `gps` will work from the bottom of the constrained range, rather than the top. The PHP community has found this capability useful in CI for ecosystem robustness; turning the flag on and quickly running tests, as sketched below, can help users ensure their constraint ranges are honest.
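A CI-oriented invocation might look something like this sketch, using only the `SolveParameters` fields discussed so far:

```go
// Verify that the bottom of all declared semver ranges still works: unlock
// everything and solve downward instead of upward.
params.ChangeAll = true
params.Downgrade = true

solver, _ := gps.Prepare(params, sourcemgr)
solution, err := solver.Solve()
// ...then write out a vendor tree from solution and run the test suite.
```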
`gps` provides a tracing facility, which generates explanatory output as the solver moves through each significant step in the solving process. You can enable it (and pass the output straight to `stdout`) like this:
```go
params.Trace = true
params.TraceLogger = log.New(os.Stdout, "", 0)
```
and `gps` will spew forth as it solves. `TraceLogger` takes a `*log.Logger`; you can redirect that output however you like.
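For example, to capture the trace in a buffer (say, to surface it only when solving fails) rather than spewing to `stdout`:

```go
// Collect trace output in memory instead of printing it immediately.
var buf bytes.Buffer
params.Trace = true
params.TraceLogger = log.New(&buf, "", 0)
// ...after a failed solve, show buf.String() to the user.
```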
As part of `gps`' goal of making dependency management as not-hellish as possible, we invest considerable effort in making this trace output informative. In the future, `gps` may provide a machine-friendly version of it, but for now, it's meant for human eyes, looking at a terminal. It looks something like this:
```
✓ select (root)
| ? attempt github.com/foo/bar with 1 pkgs; 1 versions to try
| | try github.com/foo/bar@v1.0.0
| ✓ select github.com/foo/bar@v1.0.0 w/1 pkgs
| | ? attempt bitbucket.org/hinkle/crinkle with 1 pkgs; 4 versions to try
| | | try bitbucket.org/hinkle/crinkle@v1.0.3
| | | ✗ bitbucket.org/hinkle/crinkle@v1.0.3 not allowed by constraint <=1.0.1:
| | | | <=1.0.2 from (root)
| | | | <=1.0.1 from github.com/foo/bar@v1.0.0
| | | try bitbucket.org/hinkle/crinkle@v1.0.2
| | | ✗ bitbucket.org/hinkle/crinkle@v1.0.2 not allowed by constraint <=1.0.1:
| | | | <=1.0.1 from github.com/foo/bar@v1.0.0
| | | try bitbucket.org/hinkle/crinkle@v1.0.1
| | ✓ select bitbucket.org/hinkle/crinkle@v1.0.1 w/1 pkgs
| | | ? attempt github.com/quark/quiggle with 1 pkgs; 1 versions to try
| | | | try github.com/quark/quiggle@v1.0.0
| | | ✓ select github.com/quark/quiggle@v1.0.0 w/1 pkgs
✓ found solution with 3 packages from 3 projects
```
In English: this trace describes a solve run that, after parsing the root project's code successfully, first looked for a version for `github.com/foo/bar` and found `1.0.0` acceptable; then it accepted `bitbucket.org/hinkle/crinkle` at `1.0.1` after rejecting `1.0.3` and `1.0.2` due to constraints from the root and `github.com/foo/bar`. Finally, `github.com/quark/quiggle` was attempted, and worked on the first version tried, `1.0.0`.
Looking at traces for known solver inputs is probably the fastest way to get an intuitive handle on how `gps` solving works. `gps` includes a large test suite of different inputs with expected outputs. Running them with `go test -v -short` will include the trace output for each fixture. You can additionally isolate a specific fixture by passing its name, e.g. `go test -v -short -gps.fix="simple dependency tree"`.
`Trace` and `TraceLogger` are the last of the `SolveParameters`, so we're ready to move on to the next major step.
With our `SolveParameters` in hand, we're ready to `Prepare()` a `Solver` for a run:

```go
solver, err := gps.Prepare(params, sm)
solution, err := solver.Solve()
```
`Prepare()` validates the provided `SolveParameters`. If it returns without error, then solving can proceed, initiated by calling `Solve()`. (`Solve()` takes no parameters, as the solver operates on the parameters you originally passed to `Prepare()`.)
`Solve()` is the main workhorse of `gps`, and where all the complexity lies, but its return values are pretty simple: either a `Solution` is returned, or a failure in the form of an `error`. But there are other things to explore first, so we'll come back to this later.
There's another step that your tool may want to include prior to solving, assuming you have a `Lock` produced by a previous `Solve()` run handy: hashing the solver parameters.
```go
// loadLock is a hypothetical helper: some procedure for loading a previous
// run's lock data, aka solution.
lock := loadLock()

solver, _ := gps.Prepare(params, sm)
digest, _ := solver.HashInputs()

if !bytes.Equal(digest, lock.InputHash()) {
	solution, err := solver.Solve()
}
```
Every successfully prepared `Solver` can hash its inputs. The hash incorporates certain of the inputs from your `SolveParameters`, combined with the set of external packages imported by the root project.

If the old `Lock`'s digest matches the one generated by the `Solver`, it guarantees that the solution already in the lock is a valid one for the solve run you're about to perform, possibly rendering the solve run unnecessary. Effectively, this is a long-term memoization technique, with the hash acting as the cache key.
This is very powerful, as it can allow a tool to avoid a lot of unnecessary work. If nothing else, all hash inputs can be computed locally - no network activity required. And, thanks to Go's fast static analysis, it can be done quite quickly; on a decent SSD, even a project the size of Kubernetes generally takes only a few seconds.
All that said, it's crucial to understand the limits of the guarantee.
Most importantly, matching hash digests absolutely do not guarantee that solving will produce the same solution as what's described in the lock. It merely guarantees that it's possible the solver would reach the same solution. That means there are cases (e.g., user requesting updates) where your tool might want to proceed to solve anyway, even if the digests match - or not even bother checking the digests at all.
Also... well, it's not actually a guarantee. There are failure modes outside of `gps`'s control that would make the lock's solution invalid. All of them have a decidedly left-pad-ish flavor.
- If the old lock contains a reference to an upstream project that was [re]moved, then it is necessarily not a valid solution. Realistically, solving is likely to fail, though it's possible it would find a solution without the missing project.
- If the old lock pins a project to a (tag) version that has been [re]moved, then we're in a grey area. Strictly speaking, the lock is invalid, because a solve run without the lock could not reproduce the lock. However, in the interest of depgraph stability, if `gps` detects a situation like this, it will still try to honor the lock's version of reality - but that's not guaranteed to work.
There are various techniques your users can employ to defend themselves against these possibilities, the simplest of which is committing the generated `vendor/` directory. Maintaining central mirrors of all dependencies is another possibility. `gps` may, in future, add a function to the `SourceManager` that checks a `Lock` against these kinds of upstream issues.
The other thing needed to prepare a `Solver` is a `SourceManager`, which is responsible for negotiating all interactions with actual source repositories, both over the network and on disk.
While `SourceManager` is an interface, it's not really intended that tools implement their own. Rather, they should generally use the `SourceMgr` type provided by `gps`, created via `NewSourceManager()`. While the solver never explicitly type-asserts that it is working with a `*SourceMgr`, it's still fairly tightly coupled to it in subtle ways. (The `SourceManager` interface really only exists to facilitate testing.)
The original example sets up a `SourceMgr` like so:

```go
cache := ".repocache"
sourcemgr, _ := gps.NewSourceManager(NaiveAnalyzer{}, cache, false)
defer sourcemgr.Release()
```
We'll deal with `NaiveAnalyzer` in the next section. First, the second parameter: `cache`.
`gps` needs a lot of information from source repositories to do its work. Some of that information is VCS-level metadata, but the rest comes from static analysis of code. Ordinarily, these are things we'd extract from repositories kept on `$GOPATH`. Unfortunately, that won't work for `gps`.
As it runs, the solver will request information from the `SourceManager` that requires checking out different versions of code to disk. It has to do this for a lot of versions, including many that ultimately won't appear in the solution. This is fine to do in an isolated area that is solely dedicated to this purpose; we needn't care about preserving existing disk state, and we can handle inevitable errors by blowing away repositories and starting again.
Such recourse is not available to us on the `$GOPATH`, where mutating repositories is a no-no, and where they may contain local-only state. Trying to avoid unintended consequences there is a bottomless snake pit. So instead, we use the `cache`.
Of course, all that repository mutation also makes it tricky for more than one `SourceMgr` to operate on a given cache dir at a time. To that end, `NewSourceManager()` takes a global lock on the entire cache tree, which is released by `SourceMgr.Release()`.
You can pass `true` as `NewSourceManager()`'s final `bool` parameter to forcibly override this lock (if one exists).
While the `SourceManager` interface, and the `SourceMgr` implementation in particular, is primarily designed to meet the solver's needs, any tool that needs a `Solver` may well have other uses for it. It's worth exploring the interface to see what work it might be able to do for you. In particular, `ListPackages()` and `ListVersions()` may be helpful.
If your tool does make its own use of `gps`'s `SourceMgr`, make sure to pass the same instance into `Prepare()` for some easy cache-based performance gains.
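For instance, a tool could use the same `SourceMgr` to show a user which versions are available for a project. This is a hedged sketch - the exact `ListVersions()` signature should be checked against the godoc mentioned above:

```go
id := gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
}

// Ask the SourceManager for all the versions it can find for the project.
versions, err := sourcemgr.ListVersions(id)
if err == nil {
	for _, v := range versions {
		fmt.Println(v)
	}
}
```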
As we covered earlier, the root project has its `Manifest` and `Lock` explicitly declared up front as a component of `SolveParameters`. However, we need the same information from dependency projects - primarily the `Manifest` - and it's not possible to pass that information in up front. Instead, we collect it on the fly, via a `ProjectAnalyzer`.
`SourceMgr` relies on a `ProjectAnalyzer` in order to fulfill its `GetManifestAndLock()` method. `SourceMgr` takes care of making sure the requested `Version` of the repository in question is checked out, then calls the injected `ProjectAnalyzer.DeriveManifestAndLock()` method with the appropriate `ProjectRoot` (root import path) and the path to the checkout's root directory.
```go
type NaiveAnalyzer struct{}

func (a NaiveAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return nil, nil, nil
}
```
Returning all nils from the method is basically saying, "no problems, but also, there are no constraints here - for whatever gps finds in this project's import graph, any version will do." Clearly not an optimal implementation, but it does work.
There are several things to keep in mind when implementing a `ProjectAnalyzer` (see the sketch after this list):

- In general, an error should only be returned if the tool detects an actual problem - e.g., an invalid manifest file. Simply not being able to find manifest or lock information should probably return `nil, nil, nil`. You could return an error, but doing so eliminates the version from consideration, and will likely cause solving to seize up and fail.
- gps caches the results your `ProjectAnalyzer` returns from `DeriveManifestAndLock()` with respect to the revision being analyzed. Thus, `ProjectAnalyzer` implementations should be stateless: given the same input code tree, `DeriveManifestAndLock()` should always return the same results.
- Tools should generally not be doing any static analysis of source code in `DeriveManifestAndLock()`; gps does all the necessary import graph analysis elsewhere.
- Pursuant to the previous, `$GOPATH` does not matter here. The last few elements of the root `path` and the `ProjectRoot` are unlikely to align in the way they would need to if `$GOPATH` were in play.
- If you're building a tool for public consumption, it should probably interoperate with the full range of existing tooling: `glide`, `gb`, `godep`, `gpm`, etc. That means learning how to read their manifests and/or locks. For tools like `godep` that really only have a `Lock`-like concept (`Godeps.json`), return it as that: `nil, lock, nil`. Please, take care here. When popular tools fail to do this, it fractures the Go ecosystem.
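Putting a few of those guidelines together, a slightly-less-naive analyzer might look like this sketch. The manifest filename and the `parseMyToolManifest()` helper are hypothetical:

```go
type FileAnalyzer struct{}

func (a FileAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	mpath := filepath.Join(path, "mytool.yaml") // hypothetical manifest file
	if _, err := os.Stat(mpath); os.IsNotExist(err) {
		// No manifest present: not an error, just nothing to report.
		return nil, nil, nil
	}

	m, err := parseMyToolManifest(mpath) // hypothetical parser returning a gps.Manifest
	if err != nil {
		// An actual problem - e.g., a malformed manifest file.
		return nil, nil, err
	}
	return m, nil, nil
}
```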
`glide`'s implementation is a good reference, particularly with respect to its interoperation with other tools. Now, there's just one last big thing to consider in your `ProjectAnalyzer`, and it relates to that `nil, lock, nil` return pattern.
`Lock`s, being solver outputs, mostly don't matter to solving - though as we covered earlier, an exception is made for the root project's lock. That small exception allows the solver to keep versions stable across solves if possible (unless updates are requested).

`gps` has a similar exception for `Lock`s that come from dependencies. Versions coming out of a dependency's lock are referred to as preferred versions. To explain how they work, we need an example.
Say that we're solving for `Root`, whose import graph includes a dependency `B`, which in turn depends on `D`. When it comes time for the solver to pick a version for `D`, then:
- IF the `SolveParameters` contain no lock data (from `Root`), OR the version for `D` fails to meet constraints
- AND IF the `ProjectAnalyzer`'s `DeriveManifestAndLock()` method reports a `Lock` for `B`
- AND IF that `Lock` contains a version for `D` (this is `B`'s preferred version of `D`)
- AND IF that preferred version of `D` is acceptable according to all other constraints
- THEN the solution will select `D` at `B`'s preferred version.
This fairly abstruse path is important for the Go ecosystem because it builds a bridge to the way most Go dependency management tools (such as `godep`) have historically worked: simply locking to a revision. If your tool reads in a `Godeps.json` file and returns it as the `Lock` from `DeriveManifestAndLock()`, then the versions given therein will probably end up in the `Solution`. That'll be the case unless/until:
- Some project, root or otherwise, expresses an incompatible version constraint
- Another dependency's `Lock` expresses a preferred version for the same dep
In practice, this means `gps`-based tools can transparently act like `godep` for transitive dependencies until the user says otherwise, or the depgraph makes it impossible. We sometimes call this property transitive version stability.
Now, nothing blows up if two different deps express a preferred version for a third, but obviously, only one can ultimately win. Again, an example: say that the solver is trying to pick a version for `C`, on which both `A` and `B` depend. Currently, `gps` allows only one preferred version to be expressed, so the solver would pick one of `A` or `B` based on complex (but deterministic) criteria. At that point, the same rules apply - the picked preferred version will be used only if it meets all constraints, and the root lock doesn't offer an [acceptable] version.
A tool can implement preferred versions by returning a `Lock` from `DeriveManifestAndLock()`:

```go
type PreferredVersionsAnalyzer struct{}

func (a PreferredVersionsAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return findAManifestElseNil(), findSomeLockDataElseNil(), nil
}
```
Now, preferred versions are definitely sorta magical. If that's not acceptable for your tool, you can ensure they're never used by always returning a nil `Lock`:

```go
type NoMagicAnalyzer struct{}

func (a NoMagicAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return findAManifestElseNil(), nil, nil
}
```
We've finally got everything prepped - it's time to solve!

```go
solver, _ := gps.Prepare(params, sourcemgr)
solution, err := solver.Solve()
if err == nil {
	os.RemoveAll(filepath.Join(root, "vendor"))
	gps.CreateVendorTree(filepath.Join(root, "vendor"), solution, sourcemgr, true)
}
```
If solving fails, the error returned will (or should - this is an area we're actively improving) describe what caused the failure. This could be something simple, or something complex. Either way, it's gps' goal that the error be digestible for users, so sending it back to them in some form is recommended.
If no error is returned, then the solver found a `Solution`. At this point, most real tools would probably want to persist the solution, typically in the form of a lock file.
Persistence or no, with the solution in hand, we can write out our dependency tree. The example implementation does so in-place after blowing away the existing `vendor/` directory. That's extremely unsafe, of course; real tools will probably want to write out a tree to a tmp dir first, then swap it in for the existing `vendor/` directory only if doing so caused no errors, as sketched below.
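A sketch of that safer approach might look like this (error handling elided; note that `os.Rename` can fail across filesystem boundaries, in which case a copy-based fallback is needed):

```go
vendor := filepath.Join(root, "vendor")

// Write the new tree to a scratch location first.
tmp, _ := ioutil.TempDir("", "vendor-staging")
if err := gps.CreateVendorTree(tmp, solution, sourcemgr, true); err == nil {
	// Only replace the real vendor/ once the new tree was written cleanly.
	os.RemoveAll(vendor)
	os.Rename(tmp, vendor)
}
```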
The final boolean parameter to `CreateVendorTree()` determines whether or not `vendor/` directories nested within dependencies should be removed. This should generally be an aggressive "yes!" - if any code is present in nested vendor directories, not stripping it out will mean the user isn't actually getting the selected solution.
The only time vendor stripping should create any kind of problem is if dependencies are storing either modified or just plain not-vendor code under their `vendor/` directory. There isn't a lot `gps` can do about this, really, which is why one of our foundational assumptions is that the `vendor/` directory is for... uh, upstream vendored code. Nevertheless, this option is available if the use case demands no stripping, or if the tool has its own vendor-stripping logic that it prefers to use.