This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

Proposal: package management #851

Closed
myitcv opened this issue Mar 23, 2021 · 43 comments

Comments

@myitcv
Contributor

myitcv commented Mar 23, 2021

With extensive inputs from @mpvl.

Proposal summary

We propose adding package management to CUE, using an approach analogous to Go using Minimum Version Selection and semantic versioning. The changes are limited to cmd/cue and the cue/load package, and hence have no bearing on CUE the language.

As an interim measure, we propose using proxy.golang.org as a module mirror, and the checksum database sum.golang.org for authentication, until such time as the CUE project can host such services itself. Use of these services will be enabled by default for cmd/cue and cue/load, but entirely configurable, just like cmd/go.

The proposed approach is broadly identical to the approach followed by Go modules. Indeed this proposal borrows heavily from Russ Cox's blog posts introducing vgo, the core Go modules documentation, and interactions with Bryan Mills and Jay Conrod from the Go team.

For those who might be less familiar with Go, and with Go modules in particular, the more significant parts of this proposal copy and adapt the relevant parts of the Go module reference for the CUE context, to save jumping between multiple documents. For less significant parts of the proposal, or those where the concept is truly identical between both approaches, we generally link to the relevant Go documentation instead.

I understand Go modules; what's the TL;DR?

Here is an abridged version for those who have a working knowledge of Go modules, presented from the user perspective with various implementation considerations thrown in where relevant.

What will be different?

  • We will rename the current cue get go to cue import go - see cmd/cue: rename 'get go' to 'import go' #646.
  • The root of a CUE module is identified by a cue.mod directory; the root of a Go module is identified by a go.mod file.
  • For a CUE module to be shareable (i.e. act as a dependency for another module), it will also need to include a go.mod file that mirrors the cue.mod/module.cue file.
  • In CUE the main module can be anonymous (a cue.mod directory without a module.cue file); in Go this is not possible.
  • In Go we speak of building a package, whereas in CUE we speak of evaluating a package, or more precisely an instance (because a package can have multiple instances).
  • CUE modules do not have to deal with a pre-module legacy (read: a GOPATH equivalent).
  • CUE modules declare the module path (optional in CUE), requirements, retractions etc in cue.mod/module.cue; this file is semantically equivalent to go.mod, with all directives supported albeit via a different syntax.
  • CUE modules will use cue.mod/sum.cue as its equivalent of go.sum, but in a CUE format (for consistency with cue.mod/module.cue).
  • The vendor directory will be cue.mod/pkg, as it is today.
  • CUE uses the cue/load package for loading CUE instances. Go has no equivalent, and instead relies on golang.org/x/tools/go/packages to wrap cmd/go. Working with CUE modules therefore does not require the user to have cmd/cue installed.
  • The cue/load.Config configuration type is extended with the necessary fields for controlling module resolution and loading behaviour; these are essentially the backends to the frontend flags and environment variables available in cmd/cue.
  • Vanity import paths will be supported through cue-import <meta> HTML tags with a fallback to go-import.
  • There is support for CUE and Go modules coexisting at the same module path within the same source code repository. See "Go modules and CUE modules coexisting" for more details.

What is the same?

  • The definition of a main module is identical in CUE: the main module is the module containing the directory where the loader is invoked.
  • Like Go modules, CUE modules build upon the concept of semantic import versioning and the import compatibility rule.
  • Like Go, minimal version selection (MVS) will be used as the algorithm to select a set of module versions to use when evaluating packages.
  • As an interim measure, the CUE project will use the proxy.golang.org module mirror and sum.golang.org checksum database by default, with configuration to turn off proxy-based loading.
  • CUE follows the Go approach for supporting private modules.
  • CUE will have a module cache like the Go module cache (it has no need for a build/test cache); the location will default to $HOME/cue/modcache, and be configurable via CUEMODCACHE.
  • cmd/cue will have a set of module-related commands analogous to those of cmd/go:
    • cue get
    • cue list (and cue list -m)
    • cue mod download
    • cue mod edit
    • cue mod graph
    • cue mod init
    • cue mod tidy
    • cue mod vendor
    • cue mod why
    • cue env
  • The various modes and options for loading CUE instances are broadly the same as cmd/go's module-aware command flags and environment variables:
    • GO* environment variables have equivalents of the form CUE*,
    • -mod has a CUE equivalent --mod,
    • -modfile has a CUE equivalent --modpath.
  • --mod=vendor remains an option. The existence of a vendor directory (cue.mod/pkg) implies --mod=vendor. This approach represents a backwards compatibility mode for how things work today.
  • Versions of CUE modules are created by tagging a repository using a semver version.
  • CUE modules also adopt the concept of a pseudo-version.
  • CUE will use and support the same version queries (@latest et al).

Background

The module and package concepts of CUE are directly inspired by their equivalent in Go. It is natural therefore to consider package versioning following the same model as Go.

One important and pleasant fact to note is that because CUE has already established the concept of a module, it does not have to manage a large legacy pre-modules world, unlike GOPATH in the Go project. This is reflected in some of the decisions in this proposal. As a result, we do not need to distinguish between module-aware mode and non-module-aware mode: every cmd/cue command is and will remain module-aware by definition, and the same is true for cue/load.

The issues

As noted above, CUE has already established the concept of a module. The root of a module is denoted by a directory that itself contains a cue.mod directory. The contents of this directory are mostly managed by the cue tool. In that sense, cue.mod is analogous to the .git directory marking the root directory of a repo, but where its contents are mostly managed by the git tool.

Here is a minimal example that declares a CUE module example.com/blah, a package blah within the root of that module, such that blah imports a third-party CUE package acme.com/quote that is vendored within cue.mod/pkg (written in the txtar format):

-- cue.mod/module.cue --
module: "example.com/blah"

-- cue.mod/pkg/acme.com/quote/quote.cue --
package quote

Hello: "hello"

-- blah.cue --
package blah

import "acme.com/quote"

x: quote.Hello

From the root of the module we can cue eval:

$ cue eval
x: "hello"

Because there are no arguments passed to cue eval, the implied argument is ., the package in the current directory. cmd/cue then hands off to cue/load to resolve and load ..

In this example, example.com/blah is referred to as the main module. As we can see from the package example.com/blah, it has a dependency on acme.com/quote, a package that is not part of the main module. cue/load therefore currently uses a simple rule: it searches for all dependencies outside of the main module within the cue.mod/{pkg,gen,usr} directories, unifying the result.

In this instance, the acme.com/quote package is not part of a module, but it could well be.

So whilst cmd/cue automatically takes care of resolving imports or package paths for us (via cue/load), everything else is left to the CUE developer. Vendoring of packages within cue.mod/pkg has to be done by hand, and there is no mechanism by which dependencies can be fetched from source code hosting sites or remote repositories.

For an initial implementation of cmd/cue this was more than sufficient. Indeed, users have adapted shell scripts to help with creating minimal vendors, and the hof tool has support for building such a vendor via hof mod vendor.

But such an approach will neither scale for a larger CUE user base, nor an ecosystem of tools built on top of/with CUE and the cuelang.org/go/... APIs. Specifically such an approach:

  • is incredibly fragile and susceptible to human error;
  • delegates responsibility for definition and implementation of package versioning, a core aspect of working with reusable CUE packages, to the user/a third party tool;
  • does not help us improve the discoverability of CUE modules/packages, not least because such delegation might well lead to a fragmented approach;
  • is not CUE developer friendly: other tools are required for management and fetching of dependencies, indeed CUE developers will not necessarily be speaking a common language of versions and dependencies;
  • provides no clear definition of how a module/import path maps to a version control system that hosts that code in the general case;
  • does not natively support reproducible or verifiable evaluations.

Reimagining our example above, we want to:

  • declare a dependency on a version of the module that contains acme.com/quote, without needing to worry about how and where we fetch that code;
  • ensure the evaluation of example.com/blah is reproducible;
  • use cmd/cue to control the dependencies of the main module, including automatically fetching CUE modules when required;
  • verify that CUE modules we have fetched are authentic and have not been modified by an attacker;
  • establish a common language for packages and versions that we can then use in other CUE projects and with other CUE developers/users.

Requirements

The following requirements have driven our thoughts on why and how to add package versioning to CUE.

Package versioning in CUE must:

  • enable reproducible evaluations - two people independently evaluating the same CUE configuration should get the same answer;
  • ensure that a CUE configuration evaluates exactly the same way tomorrow as it does today;
  • enable verified evaluations, and support verifiable evaluations;
  • not remove the best parts of cmd/cue or cue/load: simplicity, speed, and understandability. In general, version management work must fade to the background, not be a day-to-day concern;
  • develop a common language for both CUE developers/users and our tools, so that they can all be precise when talking to each other about exactly which configuration should be built, run, or analysed;
  • reflect and support how CUE developers communicate change to their users.

For the remainder of this document we borrow slightly adjusted definitions of the following terms from the vgo proposal:

  • A reproducible evaluation is one that, when repeated, produces the same result.
  • A verifiable evaluation is one that records enough information to be precise about exactly how to repeat it.
  • A verified evaluation is one that checks that it is using the expected source code.

One important difference from the Go modules implementation is that the CUE implementation should not live entirely in cmd/cue (as it does in cmd/go). Instead the bulk of the implementation will lie within cue/load, with the changes to cmd/cue effectively being the means by which the main module and its dependencies are controlled from the command line. That way, existing users of the cuelang.org/go/... APIs can continue to load and work with CUE instances, enjoying the same benefits that a seamless module experience will bring to cmd/cue. Tool authors who use the API but also require cmd/cue-esque control over CUE modules and dependencies can create and run cuelang.org/go/cmd/cue/cmd command instances without requiring the users of their tool to have installed CUE, as is the case today.

A key building block in CUE is the Go compatibility promise:

Packages intended for public use should try to maintain backwards compatibility as they evolve. The Go 1 compatibility guidelines are a good reference here: don't remove exported names, encourage tagged composite literals, and so on. If different functionality is required, add a new name instead of changing an old one. If a complete break is required, create a new package with a new import path.

Correspondingly, this proposal adopts exactly the same concept of semantic import versioning introduced with Go modules, and with it the import compatibility rule for CUE:

If an old package and a new package have the same import path, the new package must be backwards compatible with the old package.
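As a rough illustration of semantic import versioning, the sketch below checks that a module path carries the major version suffix implied by a version: no suffix for v0 and v1, and /vN for v2 and above. It follows the Go convention that this proposal adopts; the function names are hypothetical, and special cases handled by Go (such as gopkg.in paths) are deliberately ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionMajor extracts the major number from a version such as "v2.0.1".
func versionMajor(version string) (int, bool) {
	if !strings.HasPrefix(version, "v") {
		return 0, false
	}
	n, err := strconv.Atoi(strings.SplitN(version[1:], ".", 2)[0])
	return n, err == nil
}

// pathMajor returns N if modPath ends in a /vN element with N >= 2,
// and 0 otherwise (meaning the path is for major version 0 or 1).
func pathMajor(modPath string) int {
	last := modPath[strings.LastIndex(modPath, "/")+1:]
	if len(last) < 2 || last[0] != 'v' || last[1] < '1' || last[1] > '9' {
		return 0
	}
	n, err := strconv.Atoi(last[1:])
	if err != nil || n < 2 {
		return 0
	}
	return n
}

// pathMatchesVersion reports whether version may be used for modPath under
// semantic import versioning.
func pathMatchesVersion(modPath, version string) bool {
	major, ok := versionMajor(version)
	if !ok {
		return false
	}
	if major < 2 {
		return pathMajor(modPath) == 0
	}
	return pathMajor(modPath) == major
}

func main() {
	fmt.Println(pathMatchesVersion("example.com/blah/v2", "v2.0.1"))
	fmt.Println(pathMatchesVersion("example.com/blah", "v2.0.1"))
	fmt.Println(pathMatchesVersion("acme.com/quote", "v1.1.0"))
}
```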

Detailed Proposal

This section gives a brief overview of the proposal. Details are presented in the next section.

Throughout the rest of the proposal, the term "the loader" generally refers to the loading of modules and packages that happens through the use of cmd/cue or cue/load.

Rename cue get go to cue import go

In preparation for full module support in CUE, we will need to repurpose an existing cmd/cue command, renaming cue get go to cue import go. The detail of this is covered in #646.

This change fits nicely with a recent change to the current semantics of cue get go. Prior to a616925, cue get go attempted to resolve its package arguments that were not fully satisfied by go.{mod,sum} automatically, via the use of go/packages (which itself uses cmd/go). This would result in changes to go.{mod,sum}.

With a616925 we have instead shifted to a model of requiring that a Go dependency of cue get go can be fully resolved via go.{mod,sum} without requiring further changes to either. This aligns with the new Go 1.16 default of cmd/go build commands assuming a read-only default.

This also aligns well with the new name cue import go: unlike cue get go, the command does not imply any fetching or resolution. cue import go will therefore fail in case any of its arguments cannot be fully resolved via the current go.{mod,sum}, and the user will need to run go get -d (or equivalent) to ensure full resolution is possible.

As part of this rename, cue import go will be changed to generate into the cue.mod/imp hierarchy: cue get go incorrectly generates files within the cue.mod/pkg hierarchy.

Declaring module dependencies

At the core of this proposal is the ability for a CUE module to declare dependencies on other CUE modules through versions. This will be done by an extended schema of cue.mod/module.cue files. Expanding our example from earlier:

-- cue.mod/module.cue --
module: "example.com/blah/v2"

require: {
	"acme.com/quote": "v1.1.0"
}
-- cue.mod/sum.cue --
[
{ path: "acme.com/quote", what: "v1.1.0", sum: "h1:3LFP3629v+1aKXU5Q37mxmRxX/pIu1nijXydLShEq5I="},
{ path: "acme.com/quote", what: "v1.1.0/go.mod", sum: "h1:8Sl8LxpKi29FqWXR16WEFZRNSz3SoPzUzeMeY4+DwBQ=" },
{ path: "acme.com/quote", what: "v1.1.0/cue.mod/module.cue", sum: "h1:ed2f49a15b1743a9c1216ce5355698dc8a9f0b6aef44" },
]
-- go.mod --
module example.com/blah/v2

require (
	acme.com/quote v1.1.0
)
-- blah.cue --
package blah

import "acme.com/quote"

x: quote.Hello

Note:

  • Our main module is, in this example, defined with the path example.com/blah/v2. Like Go modules, CUE modules adopt the concept of semantic import versioning. In this case, our move to a new major version indicates that we have made breaking changes compared to example.com/blah, and this is reflected in the module path. A consumer of the blah package at the root of this module would therefore write import "example.com/blah/v2".
  • We have dropped the vendor of acme.com/quote from the cue.mod/pkg directory, and instead replaced it with a require field in our module definition. Cryptographic sums in cue.mod/sum.cue allow us to verify the contents of the acme.com/quote module whenever it needs to be downloaded.
  • A go.mod file that looks very similar to the cue.mod/module.cue file now exists at the root of the CUE module. At least initially it is planned to host CUE modules on top of the existing Go modules infrastructure, specifically the module mirror at proxy.golang.org, and the checksum database at sum.golang.org. Declaring such a file is necessary for us to satisfy the GOPROXY protocol and existing checksumming mechanisms. The loader will automatically manage this file, effectively keeping it in sync with cue.mod/module.cue. More on this below.

Using cmd/cue to control/list module dependencies

Whilst it would be possible to maintain cue.mod/module.cue, cue.mod/sum.cue and go.mod by hand, it would be an incredibly frustrating and error-prone process. Instead, cmd/cue will be modified to support controlling and listing module dependencies, and updating the files that declare those dependencies.

For example, in the course of developing example.com/blah/v2 we might well have run:

cue get acme.com/quote

This resolves acme.com/quote to the latest version of a module that provides the package acme.com/quote and modifies cue.mod/module.cue, cue.mod/sum.cue and go.mod accordingly to record the dependency. cue get is therefore directly analogous to go get in terms of its role in dependency management. cue get will, like cmd/go, support various version queries to control the dependencies being added, removed, upgraded or downgraded. For example @latest is the version query implied when, as above, no @$version is specified.

The cue list command, directly analogous to go list, will be added to provide information about CUE modules and packages. For example we could request information about the acme.com/quote package in JSON format as follows:

{
        "Dir": "/home/cueckoo/cue/modcache/acme.com/quote@v1.1.0",
        "ImportPath": "acme.com/quote",
        "Name": "quote",
        "Doc": "So fun CUE-related quotes",
        "Root": "/home/cueckoo/cue/modcache/acme.com/quote@v1.1.0",
        "Module": {
                "Path": "acme.com/quote",
                "Version": "v1.1.0",
                "Time": "2018-05-06T08:33:45Z",
                "Dir": "/home/cueckoo/cue/modcache/acme.com/quote@v1.1.0",
                "GoMod": "/home/cueckoo/cue/modcache/cache/download/acme.com/quote/@v/v1.1.0.mod",
                "CUEMod": "/home/cueckoo/cue/modcache/cache/download/acme.com/quote/@v/v1.1.0.cue"
        },
        "CUEFiles": [
                "quote.cue"
        ],
        "Imports": [
                "string"
        ],
        "Deps": [
                "string"
        ]
}

The cue mod command will be expanded with subcommands for more fine-grained control of cue.mod/module.cue, cue.mod/sum.cue (and go.mod indirectly). For example:

$ cue mod edit --replace=acme.com/quote=/path/to/acme.com/quote

would add a replace "directive" to cue.mod/module.cue, which tells the loader to load all versions of acme.com/quote from the directory /path/to/acme.com/quote. The resulting cue.mod/module.cue and go.mod would then look like this:

-- cue.mod/module.cue --
module: "example.com/blah/v2"

require: {
	"acme.com/quote": "v1.1.0"
}

replace: {
	{mod: path: "acme.com/quote", target: "/path/to/acme.com/quote"},
}
-- go.mod --
module example.com/blah/v2

require (
	acme.com/quote v1.1.0
)

replace (
	acme.com/quote => /path/to/acme.com/quote
)

The --mod flag will, for all cmd/cue commands that load/resolve package import paths, control the behaviour of resolution. For example cue list --mod=vendor $pkg will limit the resolution of the package pattern $pkg to the contents of the cue.mod/pkg directory, matching the behaviour of today. The default will be --mod=readonly, with cue get being the exception because its whole purpose is to modify the main module's dependencies.

Using cue/load and the cuelang.org/go/... API

The Go project entirely abstracts the loading of Go package and module information within cmd/go. go/packages exists as a wrapper for cmd/go to load Go packages for inspection and analysis.

CUE has a dedicated package for loading CUE instances, cue/load. This package is used by cmd/cue and users of the cuelang.org/go/... API. As such, there is no need for a go/packages equivalent, at least not equivalent functionality that wraps cmd/cue.

Users of the cue/load package will benefit from seamless module support, with the cue/load.Config type being expanded with options relevant to controlling module resolution. The following example demonstrates how a custom proxy serving private modules would be set during the load process, with a specification that the sumdb should not be consulted for those private import paths:

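package main

import (
	"fmt"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/load"
)

func main() {
	// Proxy and NoSumDB are the Config fields proposed here. The proxy list
	// is comma-separated, matching the CUEPROXY format.
	cfg := &load.Config{
		Proxy:   "https://mycompany.com,https://proxy.golang.org,direct",
		NoSumDB: "*.mycompany.com",
	}
	// Resolve and load the package, then build it into a cue.Instance.
	bps := load.Instances([]string{"mycompany.com/quote"}, cfg)
	is := cue.Build(bps)
	fmt.Printf("%v\n", is[0].Value())
}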

Use of a trusted module mirror and checksum database

In August 2019, the Go team at Google launched the Go module mirror and checksum database. As part of the Go 1.13 launch, cmd/go used both by default. There are some important points to note about this setup:

  • It is entirely possible to turn off use of the proxy and checksum database; this is covered in the Go module reference documentation.
  • It is entirely possible (and indeed encouraged) for others to host alternative implementations of the proxy and sumdb protocols. Athens, for example, provides a customisable server for self hosting a Go proxy.
  • There is full support for the concept of private modules, in various permutations of coexistence with public modules. This is covered in full detail in the Go module reference documentation.
  • There is clear documentation on the privacy aspects of using proxy.golang.org and sum.golang.org in the privacy statement.

The use of a checksum database is a key component of verified and verifiable builds, or evaluations in CUE terms, and hence trusted reproducible builds (evaluations).

As an interim measure, we propose using proxy.golang.org as a module mirror, and the checksum database sum.golang.org for authentication, until such time as the CUE project can host such services itself. Use of these services will be enabled by default for cmd/cue and cue/load, but entirely configurable, just like cmd/go. We have permission from the Go team for this proposed use of these services.

As covered above, this places one noticeable constraint on the CUE module implementation: that we must include a go.mod file at the root of a CUE module. This is required for our use of proxy.golang.org and sum.golang.org to satisfy the GOPROXY and sumdb protocols, but also crucially to indicate to the proxy (which runs cmd/go) that a repository is Go module-aware, as opposed to not (remember, we don't have such a consideration in CUE). This last point is significant in the case of multi-module repositories, which we intend to support in CUE.

Use of the proxy and checksum database will largely fade into the background for users of cmd/cue and cue/load.

For example, under a default configuration and assuming that acme.com/quote is public, the following command:

$ cue get acme.com/quote

would:

  • query proxy.golang.org for the latest version of acme.com/quote (which we know to be the module providing the package at the same import path);
  • verify the checksum of the resulting module against sum.golang.org if this is the first time our CUE module cache has seen the module;
  • repeat this process of querying and checking for the transitive dependencies of acme.com/quote as the MVS algorithm proceeds.
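To make the role of MVS in these steps concrete, here is a minimal sketch of the selection itself, assuming the requirement graph has already been fetched. It walks every module version reachable from the main module's requirements and keeps the highest version seen for each module path. The module names other than acme.com/quote are hypothetical, and the version comparison only understands plain vMAJOR.MINOR.PATCH versions (no pre-releases or pseudo-versions).

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

type modVersion struct{ path, version string }

// parse splits "vX.Y.Z" into its three numbers; malformed parts become 0.
func parse(v string) [3]int {
	var out [3]int
	for i, p := range strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3) {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// less reports whether version a is older than version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// buildList visits every module version reachable from mainReqs and selects,
// for each module path, the highest version that anything requires.
func buildList(mainReqs []modVersion, reqs map[modVersion][]modVersion) []modVersion {
	selected := map[string]string{}
	seen := map[modVersion]bool{}
	queue := append([]modVersion(nil), mainReqs...)
	for len(queue) > 0 {
		mv := queue[0]
		queue = queue[1:]
		if seen[mv] {
			continue
		}
		seen[mv] = true
		if cur, ok := selected[mv.path]; !ok || less(cur, mv.version) {
			selected[mv.path] = mv.version
		}
		queue = append(queue, reqs[mv]...)
	}
	out := make([]modVersion, 0, len(selected))
	for p, v := range selected {
		out = append(out, modVersion{p, v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].path < out[j].path })
	return out
}

func main() {
	mainReqs := []modVersion{{"acme.com/quote", "v1.1.0"}, {"example.com/a", "v1.0.0"}}
	reqs := map[modVersion][]modVersion{
		{"acme.com/quote", "v1.1.0"}: {{"example.com/text", "v1.2.0"}},
		{"example.com/a", "v1.0.0"}:  {{"example.com/text", "v1.3.0"}},
	}
	for _, mv := range buildList(mainReqs, reqs) {
		fmt.Println(mv.path, mv.version)
	}
}
```

Because the result depends only on the declared minimum versions, the same inputs always give the same build list, which is what makes the evaluation reproducible without a separate lock file.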

Using cue env to understand the current cmd/cue configuration

The cue env command will be added to show the current configuration to the user, for example:

CUEENV="/home/cueckoo/.config/go/env"
CUEFLAGS=""
CUEMODCACHE="/home/cueckoo/cue/modcache"
CUENOPROXY=""
CUENOSUMDB=""
CUEPRIVATE=""
CUEPROXY="https://proxy.golang.org,direct"
CUESUMDB="sum.golang.org"
CUETMPDIR=""
CUEVCS=""
CUEVERSION="v0.3.0-beta.6"
CUEMOD="/path/to/my/module"

Much like cmd/go, an explicitly set environment variable has the highest precedence. If a variable is not set, a configuration default is read from the file located at cue env CUEENV; such defaults are written using the cue env -w command. See the cmd/go environment variable documentation for more details.
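The precedence described here could be sketched as below. This is an assumption-laden illustration rather than the planned implementation: the function names are hypothetical, the file is assumed to hold simple KEY=VALUE lines as cmd/go's env file does, and "set" is taken to mean present in the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEnvFile reads KEY=VALUE lines, as might be written by cue env -w.
func parseEnvFile(contents string) map[string]string {
	vars := map[string]string{}
	for _, line := range strings.Split(contents, "\n") {
		line = strings.TrimSpace(line)
		if i := strings.Index(line, "="); i > 0 {
			vars[line[:i]] = line[i+1:]
		}
	}
	return vars
}

// resolveSetting applies the precedence: process environment first, then the
// CUEENV file, then the built-in default.
func resolveSetting(name string, lookupEnv func(string) (string, bool), file, defaults map[string]string) string {
	if v, ok := lookupEnv(name); ok {
		return v
	}
	if v, ok := file[name]; ok {
		return v
	}
	return defaults[name]
}

func main() {
	// The file contents and proxy URL below are made up for the example.
	file := parseEnvFile("CUEPROXY=https://proxy.mycompany.com,direct\n")
	defaults := map[string]string{
		"CUEPROXY": "https://proxy.golang.org,direct",
		"CUESUMDB": "sum.golang.org",
	}
	for _, name := range []string{"CUEPROXY", "CUESUMDB"} {
		fmt.Printf("%s=%q\n", name, resolveSetting(name, os.LookupEnv, file, defaults))
	}
}
```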

Detailed design

Finding a module for a module path

The details in this section largely follow the pattern established with Go module and package resolution.

When the loader needs to resolve an import path to a package and/or module (the concept of a module proxy is fully introduced below), it starts by locating the repository that contains the module.

If the module path has a VCS qualifier (one of .bzr, .fossil, .git, .hg, .svn) at the end of a path component, the loader will use everything up to that path qualifier as the repository URL. For example, for the module example.com/foo.git/bar, the loader will download the repository at example.com/foo.git using git, expecting to find the module in the bar subdirectory. The loader will guess the protocol to use based on the protocols supported by the version control tool.
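A minimal sketch of the qualifier rule, assuming the qualifier can only appear after the host name element. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

var vcsQualifiers = []string{"bzr", "fossil", "git", "hg", "svn"}

// splitVCSQualifier looks for the first path element ending in a VCS
// qualifier. It returns the repository path (everything up to and including
// that element), the VCS name, and the remaining subdirectory.
func splitVCSQualifier(modPath string) (repo, vcs, subdir string, ok bool) {
	elems := strings.Split(modPath, "/")
	for i := 1; i < len(elems); i++ { // elems[0] is the host name
		for _, q := range vcsQualifiers {
			if strings.HasSuffix(elems[i], "."+q) && len(elems[i]) > len(q)+1 {
				return strings.Join(elems[:i+1], "/"), q, strings.Join(elems[i+1:], "/"), true
			}
		}
	}
	return "", "", "", false
}

func main() {
	repo, vcs, subdir, ok := splitVCSQualifier("example.com/foo.git/bar")
	fmt.Println(repo, vcs, subdir, ok)
}
```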

If the module path does not have a qualifier, the loader sends an HTTP GET request to a URL derived from the module path with a ?cue-get=1 query string. For example, for the module acme.com/quote, the loader will send the following request:

https://acme.com/quote?cue-get=1

The loader follows redirects but otherwise ignores response status codes, so the server may respond with a 404 or any other error status. CUE will not support an equivalent of GOINSECURE.

The server must respond with an HTML document containing a <meta> tag in the document's <head>. The <meta> tag should appear early in the document to avoid confusing the loader's restricted parser. In particular, it should appear before any raw JavaScript or CSS. The <meta> tag must have the form:

<meta name="cue-import" content="root-path vcs repo-url">

root-path is the repository root path, the portion of the module path that corresponds to the repository's root directory. It must be a prefix or an exact match of the requested module path. If it's not an exact match, another request is made for the prefix to verify the <meta> tags match.

vcs is the version control system. It must be one of bzr, fossil, git, hg, svn, mod. The mod scheme instructs the loader to download the module from the given URL using the CUEPROXY protocol (see later). This allows developers to distribute modules without exposing source repositories.

repo-url is the repository's URL. If the URL does not include a scheme (either because the module path has a VCS qualifier or because the <meta> tag lacks a scheme), the loader will try each protocol supported by the version control system. For example, with Git, the loader will try https:// then git+ssh://. Insecure protocols (like http:// and git://) are not supported.

As an example, consider acme.com/quote again. The loader sends a request to https://acme.com/quote?cue-get=1. The server responds with an HTML document containing the tag:

<meta name="cue-import" content="acme.com/quote git https://github.com/acme.com/quote">

From this response, the loader will use the Git repository at the remote URL https://github.com/acme.com/quote.
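Once the content attribute of the <meta> tag has been extracted, validating it against the rules above might look like the following sketch. Extraction of the tag from the HTML document is left out, and the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// metaImport holds the three fields of a cue-import content attribute.
type metaImport struct {
	RootPath, VCS, RepoURL string
}

// parseCueImport splits content into "root-path vcs repo-url" and checks it
// against the module path that was requested.
func parseCueImport(content, modPath string) (metaImport, error) {
	f := strings.Fields(content)
	if len(f) != 3 {
		return metaImport{}, fmt.Errorf("want 3 fields in %q, got %d", content, len(f))
	}
	mi := metaImport{RootPath: f[0], VCS: f[1], RepoURL: f[2]}

	// root-path must equal the module path or be a prefix of it at a
	// path element boundary.
	if modPath != mi.RootPath && !strings.HasPrefix(modPath, mi.RootPath+"/") {
		return metaImport{}, fmt.Errorf("root path %q is not a prefix of %q", mi.RootPath, modPath)
	}
	switch mi.VCS {
	case "bzr", "fossil", "git", "hg", "svn", "mod":
	default:
		return metaImport{}, fmt.Errorf("unknown vcs %q", mi.VCS)
	}
	// Insecure protocols are not supported.
	for _, scheme := range []string{"http://", "git://"} {
		if strings.HasPrefix(mi.RepoURL, scheme) {
			return metaImport{}, fmt.Errorf("insecure scheme in %q", mi.RepoURL)
		}
	}
	return mi, nil
}

func main() {
	mi, err := parseCueImport("acme.com/quote git https://github.com/acme.com/quote", "acme.com/quote")
	fmt.Println(mi, err)
}
```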

As the CUE modules implementation is based heavily on the Go modules implementation, if a server fails to respond with an appropriate <meta> tag for the query ?cue-get=1, the loader will then fall back to a query with ?go-get=1. This means that any server that needs to distinguish the hosting information for Go and CUE modules can do so, whilst not placing an undue burden on existing infrastructure to add support for ?cue-get=1 queries from day one (the assumption being that hosting of CUE and Go modules will generally be aligned in terms of version control systems). For example, GitHub and other popular hosting services respond to ?go-get=1 queries for all repositories, so no server reconfiguration is necessary for CUE modules hosted at those sites. Over time it is envisaged that such code hosting sites would add support for ?cue-get=1 queries.
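The fallback order can be sketched as follows. The fetch function is a stand-in for the HTTP request and <meta> tag extraction, and all names are hypothetical.

```go
package main

import "fmt"

// discoveryURLs lists the URLs the loader would try, in order.
func discoveryURLs(modPath string) []string {
	return []string{
		"https://" + modPath + "?cue-get=1",
		"https://" + modPath + "?go-get=1",
	}
}

// discover returns the first tag content that fetch finds, along with the
// URL that produced it. fetch reports false when no suitable tag is present.
func discover(modPath string, fetch func(url string) (string, bool)) (content, usedURL string, ok bool) {
	for _, u := range discoveryURLs(modPath) {
		if c, found := fetch(u); found {
			return c, u, true
		}
	}
	return "", "", false
}

func main() {
	// A pretend server that only understands ?go-get=1 queries.
	responses := map[string]string{
		"https://acme.com/quote?go-get=1": "acme.com/quote git https://github.com/acme.com/quote",
	}
	fetch := func(u string) (string, bool) {
		c, ok := responses[u]
		return c, ok
	}
	fmt.Println(discover("acme.com/quote", fetch))
}
```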

After the repository URL is found, the loader will clone the repository into the module cache. In general, the loader tries to avoid fetching unneeded data from a repository. However, the actual commands used vary by version control system and may change over time. For Git, the loader can list most available versions without downloading commits. It will usually fetch commits without downloading ancestor commits, but doing so is sometimes necessary.

Versions of modules

Much like Go, a version identifies an immutable snapshot of a module, which may be either a release or a pre-release. Each version starts with the letter v, followed by a semantic version. See the Go module reference for more detail, and Semantic Versioning 2.0.0 for details on how versions are formatted, interpreted, and compared.

Therefore, as we have seen in cue.mod/module.cue examples above, a module path and version together form the basis of a declared dependency.

Module authors explicitly release new versions by defining a semantic version tag within the repository that hosts the module, a tag that indicates which revision should be checked out for that version. For example, as the author of the example.com/blah/v2 module, we would, in a local clone of the repository behind example.com/blah/v2, do something like:

$ pwd
/path/to/example.com/blah
$ git log --oneline -1
7c5d28e6 (HEAD) deps: upgrade to latest acme.com/quote version
$ cue list -m 
example.com/blah/v2
$ git tag v2.0.1
$ git push origin v2.0.1

Another module looking to depend on example.com/blah/v2 would then be able to run:

$ cue get example.com/blah/v2@latest

and that would resolve to v2.0.1, specifically the revision 7c5d28e.

CUE modules also adopt the concept of a pseudo-version. A pseudo-version is a specially formatted pre-release version that encodes information about a specific revision in a version control repository. For example, v0.0.0-20191109021931-daa7c04131f5 is a pseudo-version. Pseudo-versions are used when canonical semantic version tags are not available, for example when the user wants to depend on a recent fix pushed to the main branch of a project. Pseudo-versions follow exactly the same model as implemented for Go modules: see the explanation of pseudo-versions and details of how pseudo-versions map to commits for more specific information.

Declaring dependencies on other modules

The earlier example of the main module example.com/blah/v2 showed a sketch of the schema of cue.mod/module.cue. This is now presented more fully:

#ModuleDef: {
	module: string
	
	// The cue directive sets the expected CUE version for the module
	cue: string

	// A require directive declares a minimum required version of a given module dependency
	require: [...#Require]

	// An exclude directive prevents a module version from being loaded by the loader.
	exclude: [string]: string

	// A replace directive replaces the contents of a specific version of a module, or all 
	// versions of a module, with contents found elsewhere.
	replace: [...#Replace]
	
	// A retract directive indicates that a version or range of versions of the module
	// defined by cue.mod/module.cue should not be depended upon.
	retract: [...#Retract]
}

#Require: {
	path:     string
	version:  string
	indirect: bool	
}

#Replace: {
	old: #Module
	new: #Module
}

#Module: {
	path:    string
	version: string
}

#Retract: {
	low:       string
	high:      string
	rationale: string
}

With the obvious exception that the representation is different (in CUE a module is defined in CUE itself, whereas in Go go.mod files have their own syntax), each of the directives supported in go.mod files has a corresponding field and meaning in a cue.mod/module.cue file:

Building on the short descriptions in the schema above, the links above cover in more detail what each means and when they should be used.

To support the retract directive we will provide a builtin semver package that allows for the specification of ranges:

module: "example.com/blah"

import "semver"

retract: [
	"v1.1.0", 
	semver.GreaterThanEqual("v1.2.0") & semver.LessThan("v1.3.0"), 
]

Similarly, the cue.mod/sum.cue file would have the following schema:

#Sum: [...#SumEntry]

#SumEntry: {
	// path is the module path
	path: string 

	// what is the aspect of the module for which we have a cryptographic sum
	// e.g. "v1.1.0" means the sum represents the sum of the module itself, 
	// "v1.1.0/go.mod" means the sum refers to the go.mod file only
	what: string  
	
	// sum is the cryptographic sum. The format of the sum is described in 
	// https://golang.org/ref/mod#go
	sum: string
}

Question: do we really need/want to have cue.mod/sum.cue in CUE format? Or would the go.sum format suffice?

The format of the go.mod file is described in the Go module reference documentation.

Module proxy

As discussed above, as an interim measure we propose using proxy.golang.org as a module mirror, and the checksum database sum.golang.org for authentication, until such time as the CUE project can host such services itself. Use of these services will be enabled by default in the loader. The following CUE environment variables will control use of the proxy and checksum database (this follows almost identically from the Go module environment variables):

CUENOPROXY

Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes that should always be fetched directly from version control repositories, not from module proxies.

If CUENOPROXY is not set, it defaults to CUEPRIVATE.

CUENOSUMDB

Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes for which the loader should not verify checksums using the checksum database.

If CUENOSUMDB is not set, it defaults to CUEPRIVATE.

CUEPRIVATE

Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes that should be considered private. CUEPRIVATE is a default value for CUENOPROXY and CUENOSUMDB. CUEPRIVATE also determines whether a module is considered private for CUEVCS (see below).
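
The following is a minimal sketch of how a comma-separated list of path.Match globs could be matched against module path prefixes, modelled on the behaviour Go documents for GOPRIVATE. The function name matchPrefixPatterns and the example patterns are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether any leading path prefix of target
// matches one of the comma-separated glob patterns in globs.
func matchPrefixPatterns(globs, target string) bool {
	for _, glob := range strings.Split(globs, ",") {
		if glob == "" {
			continue
		}
		// A pattern containing n slashes is compared against the first
		// n+1 path elements of target.
		n := strings.Count(glob, "/")
		prefix := target
		for i := 0; i < len(target); i++ {
			if target[i] == '/' {
				if n == 0 {
					prefix = target[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			// target has fewer path elements than the pattern.
			continue
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical CUEPRIVATE value.
	private := "*.corp.example.com,example.com/private"
	for _, mod := range []string{
		"git.corp.example.com/team/config",
		"example.com/private/schemas",
		"example.com/public",
	} {
		fmt.Printf("%s private=%v\n", mod, matchPrefixPatterns(private, mod))
	}
}
```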

CUEPROXY

List of module proxy URLs, separated by commas (,) or pipes (|). When the loader looks up information about a module, it contacts each proxy in the list in sequence until it receives a successful response or a terminal error. A proxy may respond with a 404 (Not Found) or 410 (Gone) status to indicate the module is not available on that server.

The loader's error fallback behaviour is determined by the separator characters between URLs. If a proxy URL is followed by a comma, the loader falls back to the next URL after a 404 or 410 error; all other errors are considered terminal. If the proxy URL is followed by a pipe, the loader falls back to the next source after any error, including non-HTTP errors like timeouts.
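
The separator rules can be sketched as follows. This is an illustrative sketch only: the type proxyEntry and the functions parseProxyList and shouldFallBack are hypothetical names, and a status of 0 is used here to stand for a non-HTTP error such as a timeout.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyEntry is one element of a CUEPROXY list.
type proxyEntry struct {
	url                string
	fallbackOnAnyError bool // true if the entry is followed by '|'
}

// parseProxyList splits a CUEPROXY value on ',' and '|', remembering
// which separator follows each entry.
func parseProxyList(list string) []proxyEntry {
	var entries []proxyEntry
	for list != "" {
		i := strings.IndexAny(list, ",|")
		if i < 0 {
			entries = append(entries, proxyEntry{url: list})
			break
		}
		if url := list[:i]; url != "" {
			entries = append(entries, proxyEntry{url: url, fallbackOnAnyError: list[i] == '|'})
		}
		list = list[i+1:]
	}
	return entries
}

// shouldFallBack reports whether the next entry should be tried after the
// given entry failed with the given HTTP status (0 means a non-HTTP error).
func shouldFallBack(e proxyEntry, status int) bool {
	if e.fallbackOnAnyError {
		return true
	}
	return status == 404 || status == 410
}

func main() {
	// corp.example.com is a made-up proxy host.
	for _, e := range parseProxyList("https://proxy.golang.org,https://corp.example.com|direct") {
		fmt.Printf("%-28s 404=%v 500=%v timeout=%v\n", e.url,
			shouldFallBack(e, 404), shouldFallBack(e, 500), shouldFallBack(e, 0))
	}
}
```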

CUEPROXY URLs may have the schemes https or file. If a URL has no scheme, https is assumed. A module cache may be used directly as a file proxy:

CUEPROXY=file://$(cue env GOMODCACHE)/cache/download

Two keywords may be used in place of proxy URLs:

  • off: disallows downloading modules from any source.
  • direct: download directly from version control repositories instead of using a module proxy.

CUEPROXY defaults to https://proxy.golang.org,direct. Under that configuration, the loader first contacts the Go module mirror run by Google, then falls back to a direct connection if the mirror does not have the module. See https://proxy.golang.org/privacy for the mirror's privacy policy. The CUEPRIVATE and CUENOPROXY environment variables may be set to prevent specific modules from being downloaded using proxies.

CUESUMDB

Identifies the name of the checksum database to use and optionally its public key and URL. For example:

CUESUMDB="sum.golang.org"
CUESUMDB="sum.golang.org+<publickey>"
CUESUMDB="sum.golang.org+<publickey> https://sum.golang.org"

The loader knows the public key of sum.golang.org and also that the name sum.golang.google.cn (available inside mainland China) connects to the sum.golang.org database; use of any other database requires giving the public key explicitly. The URL defaults to https:// followed by the database name.

CUESUMDB defaults to sum.golang.org, the Go checksum database run by Google. See https://sum.golang.org/privacy for the service's privacy policy.

If CUESUMDB is set to off the checksum database is not consulted, and all unrecognised modules are accepted, at the cost of giving up the security guarantee of verified repeatable downloads for all modules. A better way to bypass the checksum database for specific modules is to use the CUEPRIVATE or CUENOSUMDB environment variables.

Module versioning

(This follows directly from the Go module reference.)

Starting with major version 2, module paths must have a major version suffix like /v2 that matches the major version. For example, if a module has the path example.com/mod at v1.0.0, it must have the path example.com/mod/v2 at version v2.0.0.

Major version suffixes are not allowed at major versions v0 or v1. There is no need to change the module path between v0 and v1 because v0 versions are unstable and have no compatibility guarantee. Additionally, for most modules, v1 is backwards compatible with the last v0 version; a v1 version acts as a commitment to compatibility, rather than an indication of incompatible changes compared with v0.
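
A minimal sketch of the suffix rule, assuming plain vMAJOR.MINOR.PATCH versions. The function names are hypothetical, and special cases that Go handles (such as gopkg.in paths or +incompatible versions) are deliberately ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pathMajor extracts N from a trailing "/vN" path element, if present.
func pathMajor(modPath string) (major int, ok bool) {
	i := strings.LastIndex(modPath, "/v")
	if i < 0 {
		return 0, false
	}
	n, err := strconv.Atoi(modPath[i+2:])
	if err != nil {
		return 0, false
	}
	return n, true
}

// versionMajor extracts MAJOR from a version of the form vMAJOR.MINOR.PATCH.
func versionMajor(version string) (int, error) {
	if !strings.HasPrefix(version, "v") {
		return 0, fmt.Errorf("version %q does not start with v", version)
	}
	majorStr, _, _ := strings.Cut(version[1:], ".")
	return strconv.Atoi(majorStr)
}

// checkPathMajor returns an error if the module path's major version
// suffix is inconsistent with the version.
func checkPathMajor(modPath, version string) error {
	vMajor, err := versionMajor(version)
	if err != nil {
		return err
	}
	pMajor, hasSuffix := pathMajor(modPath)
	switch {
	case hasSuffix && pMajor < 2:
		return fmt.Errorf("%s: suffix /v%d is not allowed", modPath, pMajor)
	case hasSuffix && pMajor != vMajor:
		return fmt.Errorf("%s: suffix /v%d does not match version %s", modPath, pMajor, version)
	case !hasSuffix && vMajor >= 2:
		return fmt.Errorf("%s: version %s requires a /v%d suffix", modPath, version, vMajor)
	}
	return nil
}

func main() {
	fmt.Println(checkPathMajor("example.com/mod", "v1.0.0"))
	fmt.Println(checkPathMajor("example.com/mod/v2", "v2.0.0"))
	fmt.Println(checkPathMajor("example.com/mod", "v2.0.0"))
}
```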

Minimal version selection

It is proposed that, like Go, minimal version selection (MVS) will be used as the algorithm to select a set of module versions to use when evaluating packages. MVS is described in detail in Minimal Version Selection by Russ Cox. The detail of the algorithm is not covered here.

MVS operates on a directed graph of modules, specified with go.mod files. Each vertex in the graph represents a module version. Each edge represents a minimum required version of a dependency, specified using a require directive. replace and exclude directives in the main module's go.mod file modify the graph.
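
To make the graph description concrete, here is a simplified sketch of the core of MVS: walk every module version reachable from the main module's requirements and keep the highest version seen for each module path. The types, function names, and the example graph are assumptions for illustration; replace and exclude handling is omitted, and versions are assumed to be plain vMAJOR.MINOR.PATCH.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// modVersion is one vertex in the module graph.
type modVersion struct{ path, version string }

// parseVersion splits vMAJOR.MINOR.PATCH into its numeric parts.
func parseVersion(v string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	for i, p := range parts {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// compareVersions returns -1, 0 or 1 as a is lower than, equal to or
// higher than b.
func compareVersions(a, b string) int {
	pa, pb := parseVersion(a), parseVersion(b)
	for i := range pa {
		if pa[i] != pb[i] {
			if pa[i] < pb[i] {
				return -1
			}
			return 1
		}
	}
	return 0
}

// buildList visits every module version reachable from mainReqs and
// selects, for each module path, the highest version required anywhere.
func buildList(mainReqs []modVersion, reqs map[modVersion][]modVersion) map[string]string {
	selected := map[string]string{}
	seen := map[modVersion]bool{}
	queue := append([]modVersion(nil), mainReqs...)
	for len(queue) > 0 {
		mv := queue[0]
		queue = queue[1:]
		if seen[mv] {
			continue
		}
		seen[mv] = true
		if cur, ok := selected[mv.path]; !ok || compareVersions(mv.version, cur) > 0 {
			selected[mv.path] = mv.version
		}
		queue = append(queue, reqs[mv]...)
	}
	return selected
}

func main() {
	// Hypothetical requirement graph.
	reqs := map[modVersion][]modVersion{
		{"example.com/a", "v1.1.0"}: {{"example.com/b", "v1.1.1"}, {"example.com/c", "v1.3.0"}},
		{"example.com/b", "v1.1.1"}: {{"example.com/c", "v1.1.0"}},
		{"example.com/b", "v1.2.0"}: {{"example.com/c", "v1.2.0"}},
	}
	mainReqs := []modVersion{{"example.com/a", "v1.1.0"}, {"example.com/b", "v1.2.0"}}
	fmt.Println(buildList(mainReqs, reqs))
}
```

In this made-up graph, example.com/c is selected at v1.3.0 because that is the highest minimum any reachable module asks for, even though newer versions of example.com/c might exist.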

Resolving a package to a module

When the loader loads a package using a package path, it needs to determine which module provides the package. CUE will follow exactly the same model for resolving a package to a module in this respect, with the obvious substitution of "the loader" for "the go command", and of CUE* for GO* module-related environment variables. Whilst CUE does not have the equivalent of GOOS or GOARCH, it will similarly ignore file level build constraints of the form @if during this resolution.

Changes to cmd/cue

This section introduces changes that will be made to cmd/cue. Most of these changes are effectively a front-end to changes that will be made to cue/load; those changes are discussed in the next section.

All cmd/cue commands that load information about packages will become module-aware:

  • cue cmd
  • cue def
  • cue eval
  • cue export
  • cue fix
  • cue fmt
  • cue get
  • cue import
  • cue trim
  • cue vet

The --mod flag is understood by module-aware commands and controls the resolution of packages in the following way (like cmd/go):

  • --mod=mod tells the loader to ignore the cue.mod/pkg vendor directory and to automatically update cue.mod/module.cue (and go.mod), for example, when an imported package is not provided by any known module.
  • --mod=readonly tells the loader to ignore the cue.mod/pkg vendor directory and to report an error if cue.mod/module.cue (or go.mod) needs to be updated.
  • --mod=vendor tells the loader to use the cue.mod/pkg vendor directory. In this mode, the loader will not use the network or the module cache.

By default, if a cue.mod/pkg vendor directory is present at the module root, the loader acts as if --mod=vendor were used. Otherwise, the loader acts as if --mod=readonly were used.
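
The default can be sketched as a simple directory check. The function names below are hypothetical, and the mode is returned as a plain string purely for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultModMode returns the --mod value the loader would assume when the
// flag is not given: "vendor" if <moduleRoot>/cue.mod/pkg exists and is a
// directory, "readonly" otherwise.
func defaultModMode(moduleRoot string) string {
	info, err := os.Stat(filepath.Join(moduleRoot, "cue.mod", "pkg"))
	if err == nil && info.IsDir() {
		return "vendor"
	}
	return "readonly"
}

// makeVendoredModule creates a temporary module root containing an empty
// cue.mod/pkg directory, for demonstration purposes only.
func makeVendoredModule() (string, error) {
	root, err := os.MkdirTemp("", "cue-mod-example")
	if err != nil {
		return "", err
	}
	if err := os.MkdirAll(filepath.Join(root, "cue.mod", "pkg"), 0o755); err != nil {
		return "", err
	}
	return root, nil
}

func main() {
	root, err := makeVendoredModule()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(root)
	fmt.Println(defaultModMode(root))
	fmt.Println(defaultModMode(filepath.Join(root, "no-such-module")))
}
```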

Module-aware commands will also understand --modpath as a means of specifying an alternative path at which a cue.mod directory can be found (and correspondingly read cue.mod/module.cue and associated files from).

Like cmd/go, the --modcacherw flag instructs the loader to create new directories in the module cache with read-write permissions instead of making them read-only.

We now move on to talk in more detail about changes to cmd/cue commands. For users of the cuelang.org/go/... APIs, programmatically creating and running command instances via cuelang.org/go/cmd/cue/cmd will remain possible and will be fully module-aware.

For any command that talks about modifying cue.mod/module.cue, it should be assumed an identical change will be made to the module go.mod file, unless specified otherwise. The same generally applies for any action of the loader that would modify any of these files.

Much of the proposal regarding cmd/cue commands is, unsurprisingly, heavily based on the cmd/go-equivalent commands.

cue get

cue get [-u] [packages]

cue get updates module dependencies in the cue.mod/module.cue file for the main module.

Unlike cmd/go we have no need for the -d flag because there is no concept of building/installing the module/package we have just fetched. Nor do we need -t, at least until a cue test command exists. Otherwise the command will behave like go get:

  • The first step is to determine which modules to update. cue get accepts a list of packages, package patterns, and module paths as arguments.
  • If a package argument is specified, cue get updates the module that provides the package.
  • If a package pattern is specified (for example, all or a path with a ... wildcard), cue get expands the pattern to a set of packages, then updates the modules that provide the packages.
  • If an argument names a module but not a package (for example, the module acme.com/nopkg has no package in its root directory), cue get will update the module.
  • Each argument may include a version query suffix indicating the desired version, as in cue get acme.com/[email protected]. A version query suffix consists of an @ symbol followed by a version query, which may indicate a specific version (v1.1.0), a version prefix (v1.1), a branch or tag name (main), a revision (1234abcd), or one of the special queries latest, upgrade, patch, or none. If no version is given, cue get uses the @upgrade query.
  • A module requirement may be removed using the version suffix @none. This is a special kind of downgrade. Modules that depend on the removed module will be downgraded or removed as needed. A module requirement may be removed even if one or more of its packages are imported by packages in the main module. In this case, the next build command may add a new module requirement.
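
The argument handling described in the list above can be sketched as follows. splitQuery is a hypothetical helper, and the arguments in main are made-up examples.

```go
package main

import (
	"fmt"
	"strings"
)

// splitQuery separates a cue get argument into its package or module path
// and its version query. When no @query suffix is present, the query
// defaults to "upgrade".
func splitQuery(arg string) (path, query string) {
	if p, q, ok := strings.Cut(arg, "@"); ok {
		return p, q
	}
	return arg, "upgrade"
}

func main() {
	for _, arg := range []string{
		"acme.com/quote@v1.1.0",
		"acme.com/quote",
		"acme.com/quote@none",
	} {
		path, query := splitQuery(arg)
		fmt.Printf("%s: path %q, query %q\n", arg, path, query)
	}
}
```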

Once cue get has resolved its arguments to specific modules and versions, cue get will add, change, or remove require directives in the main module's cue.mod/module.cue file to ensure the modules remain at the desired versions in the future. Note that required versions in cue.mod/module.cue files are minimum versions and may be increased automatically as new dependencies are added. See Minimal version selection (MVS) for details on how versions are selected and conflicts are resolved by module-aware commands.

cue get then proceeds along the lines described in the go get documentation.

cue list

cue list [-f format] [-json] [-m] [list flags] [packages]

cue list lists the named packages, one per line. The most commonly-used flags are -f and -json, which control the form of the output printed for each package. Other list flags, documented below, control more specific details.

The default output shows the package import path:

$ cue list acme.com/quote
acme.com/quote

The --f flag specifies an alternate format for the list, using the syntax of package text/template. The default output is equivalent to --f '{{.ImportPath}}'. The struct being passed to the template is:

type Package struct {
    Dir           string   // directory containing package sources
    ImportPath    string   // import path of package in dir
    ImportComment string   // path in import comment on package statement
    Name          string   // package name
    Doc           string   // package documentation string
    Target        string   // install path
    Builtin       bool     // is this package a builtin?
    Module        *Module  // info about package's containing module, if any (can be nil)

    // Source files
    CUEFiles        []string   // .cue source files
    IgnoredCUEFiles []string   // .cue source files ignored due to build constraints

    // Dependency information
    Imports      []string          // import paths used by this package
    ImportMap    map[string]string // map from source import to ImportPath (identity entries omitted)
    Deps         []string          // all (recursively) imported dependencies

    // Error information
    Incomplete bool            // this package or a dependency has an error
    Error      *PackageError   // error loading package
    DepsErrors []*PackageError // errors loading dependencies
}

With error information defined as:

type PackageError struct {
    ImportStack   []string // shortest path from package named on command line to this one
    Pos           string   // position of error (if present, file:line:col)
    Err           string   // the error itself
}

The Module struct type is defined as below for cue list -m.
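
As a sketch of how a format string would be applied, the following executes a text/template against a cut-down copy of the Package struct. Only three fields are reproduced, formatPackage is a hypothetical helper, and the package values are invented.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Package is a reduced version of the struct passed to the template.
type Package struct {
	ImportPath string
	Name       string
	Imports    []string
}

// formatPackage renders p using the given text/template format string.
func formatPackage(format string, p Package) (string, error) {
	tmpl, err := template.New("list").Parse(format)
	if err != nil {
		return "", err
	}
	var buf strings.Builder
	if err := tmpl.Execute(&buf, p); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	pkg := Package{
		ImportPath: "acme.com/quote",
		Name:       "quote",
		Imports:    []string{"strings", "list"},
	}
	for _, format := range []string{"{{.ImportPath}}", "{{.Name}} imports {{len .Imports}} packages"} {
		out, err := formatPackage(format, pkg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```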

cue list -m

cue list -m [-u] [-retracted] [-versions] [list flags] [modules]

cue list -m lists information about CUE modules. The --m flag lists information about modules and not packages.

The --json flag prints JSON-encoded output according to the struct type:

type Module struct {
    Path      string       // module path
    Version   string       // module version
    Versions  []string     // available module versions (with -versions)
    Replace   *Module      // replaced by this module
    Time      *time.Time   // time version was created
    Update    *Module      // available update, if any (with -u)
    Indirect  bool         // is this module only an indirect dependency of main module?
    Dir       string       // directory holding files for this module, if any
    GoMod     string       // path to go.mod file for this module
    CUEVersion string       // CUE version used in module
    Error     *ModuleError // error loading module
}

type ModuleError struct {
    Err string // the error itself
}

As an alternative to --json, the --f flag specifies an alternate output format using the syntax of package text/template, as for cue list without --m. The struct passed to the template is the Module struct above.

The --u flag adds information about available upgrades.

The --versions flag causes list to set the module's Versions field to a list of all known versions of that module, ordered according to semantic versioning, lowest to highest.

The --retracted flag instructs list to show retracted versions in the list printed with the --versions flag and to consider retracted versions when resolving version queries.

cue mod download

cue mod download [-json] [-x] [modules]

The cue mod download command downloads the named modules into the module cache. Arguments can be module paths or module patterns selecting dependencies of the main module or version queries of the form path@version. With no arguments, download applies to all dependencies of the main module.

The loader will automatically download modules as needed during ordinary execution. The cue mod download command is useful mainly for pre-filling the module cache or for loading data to be served by a module proxy.

By default, download writes nothing to standard output. It prints progress messages and errors to standard error.

The --json flag causes download to print a sequence of JSON objects to standard output, describing each downloaded module (or failure), corresponding to this Go struct:

type Module struct {
    Path     string // module path
    Version  string // module version
    Error    string // error loading module
    Info     string // absolute path to cached .info file
    GoMod    string // absolute path to cached .mod file
    Zip      string // absolute path to cached .zip file
    Dir      string // absolute path to cached source root directory
    Sum      string // checksum for path, version (as in go.sum)
    GoModSum string // checksum for go.mod (as in go.sum)
}
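
A tool consuming this output could decode the sequence of JSON objects one at a time. The sketch below assumes the output has been captured as a string; the struct reproduces only some of the fields, decodeModules is a hypothetical helper, and the sample data is invented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Module mirrors a subset of the fields printed by cue mod download --json.
type Module struct {
	Path    string
	Version string
	Error   string
	Dir     string
}

// decodeModules reads a stream of concatenated JSON objects, one per
// downloaded module (or failure).
func decodeModules(output string) ([]Module, error) {
	dec := json.NewDecoder(strings.NewReader(output))
	var mods []Module
	for {
		var m Module
		err := dec.Decode(&m)
		if err == io.EOF {
			return mods, nil
		}
		if err != nil {
			return nil, err
		}
		mods = append(mods, m)
	}
}

func main() {
	// Invented sample output.
	const sample = `{"Path":"example.com/a","Version":"v1.1.0","Dir":"/cache/example.com/a@v1.1.0"}
{"Path":"example.com/b","Version":"v1.2.0","Error":"not found"}`
	mods, err := decodeModules(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, m := range mods {
		if m.Error != "" {
			fmt.Printf("%s@%s failed: %s\n", m.Path, m.Version, m.Error)
			continue
		}
		fmt.Printf("%s@%s in %s\n", m.Path, m.Version, m.Dir)
	}
}
```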

The --x flag causes download to print the commands download executes to standard error.

cue mod edit

cue mod edit [editing flags] [-fmt|-print|-json] [go.mod]

Example:

# Add a replace directive.
$ cue mod edit -replace example.com/[email protected]=./a

# Remove a replace directive.
$ cue mod edit -dropreplace example.com/[email protected]

# Set the go version, add a requirement, and print the file
# instead of writing it to disk.
$ cue mod edit -go=1.14 -require=example.com/[email protected] -print

# Format the go.mod file.
$ cue mod edit -fmt

# Format and print a different .mod file.
$ cue mod edit -print tools.mod

# Print a JSON representation of the go.mod file.
$ cue mod edit -json

The cue mod edit command provides a command-line interface for editing and formatting cue.mod/module.cue files, for use primarily by tools and scripts. cue mod edit reads only one cue.mod/module.cue file; it does not look up information about other modules. By default, cue mod edit reads and writes the cue.mod/module.cue file of the main module, but a different target file can be specified after the editing flags. All changes to cue.mod/module.cue files made via cue mod edit will be reflected in a CUE module's go.mod file.

The editing flags specify a sequence of editing operations.

  • The --module flag changes the module's path (the cue.mod/module.cue file's module directive).
  • The --cue=version flag sets the expected CUE language version.
  • The --require=path@version and --droprequire=path flags add and drop a requirement on the given module path and version. Note that --require overrides any existing requirements on path. These flags are mainly for tools that understand the module graph. Users should prefer cue get path@version or cue get path@none, which make other cue.mod/module.cue adjustments as needed to satisfy constraints imposed by other modules.
  • The --exclude=path@version and --dropexclude=path@version flags add and drop an exclusion for the given module path and version. Note that --exclude=path@version is a no-op if that exclusion already exists.
  • The --replace=old[@v]=new[@v] flag adds a replacement of the given module path and version pair. If the @v in old@v is omitted, a replacement without a version on the left side is added, which applies to all versions of the old module path. If the @v in new@v is omitted, the new path should be a local module root directory, not a module path. Note that --replace overrides any redundant replacements for old[@v], so omitting @v will drop replacements for specific versions.
  • The --dropreplace=old[@v] flag drops a replacement of the given module path and version pair. If the @v is provided, a replacement with the given version is dropped. An existing replacement without a version on the left side may still replace the module. If the @v is omitted, a replacement without a version is dropped.
  • The --retract=version and --dropretract=version flags add and drop a retraction for the given version, which may be a single version (like v1.2.3) or an interval (like [v1.1.0,v1.2.0]). Note that the --retract flag cannot add a rationale comment for the retract directive. Rationale comments are recommended and may be shown by cue list -m -u and other commands.

The editing flags may be repeated. The changes are applied in the order given.
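
As an illustration of the old[@v]=new[@v] syntax accepted by --replace, here is a minimal parsing sketch. The type moduleRef and the function parseReplace are hypothetical names, and the arguments in main are invented.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// moduleRef is a module path with an optional version.
type moduleRef struct{ path, version string }

// parseReplace splits an argument of the form old[@v]=new[@v]. An empty
// version on the old side means "all versions"; an empty version on the
// new side means the new path is a local module root directory.
func parseReplace(arg string) (oldMod, newMod moduleRef, err error) {
	lhs, rhs, ok := strings.Cut(arg, "=")
	if !ok || lhs == "" || rhs == "" {
		return oldMod, newMod, errors.New("expected old[@v]=new[@v]")
	}
	oldMod.path, oldMod.version, _ = strings.Cut(lhs, "@")
	newMod.path, newMod.version, _ = strings.Cut(rhs, "@")
	return oldMod, newMod, nil
}

func main() {
	for _, arg := range []string{
		"example.com/a@v1.0.0=./a",
		"example.com/a=example.com/fork/a@v1.0.1",
	} {
		oldMod, newMod, err := parseReplace(arg)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("replace %+v with %+v\n", oldMod, newMod)
	}
}
```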

cue mod graph

cue mod graph

The cue mod graph command prints the module requirement graph (with replacements applied) in text form. For example:

example.com/main example.com/[email protected]
example.com/main example.com/[email protected]
example.com/[email protected] example.com/[email protected]
example.com/[email protected] example.com/[email protected]
example.com/[email protected] example.com/[email protected]
example.com/[email protected] example.com/[email protected]

Each vertex in the module graph represents a specific version of a module. Each edge in the graph represents a requirement on a minimum version of a dependency.

cue mod graph prints the edges of the graph, one per line. Each line has two space-separated fields: a module version and one of its dependencies. Each module version is identified as a string of the form path@version. The main module has no @version suffix, since it has no version.

See Minimal version selection (MVS) for more information on how versions are chosen. See also cue list -m for printing selected versions and cue mod why for understanding why a module is needed.

cue mod init

cue mod init [name]

The cue mod init command initialises and writes a new cue.mod/module.cue file in the current directory, in effect creating a new module rooted at the current directory. The cue.mod directory must not already exist. A go.mod file will also be written to the current directory.

Per the current module docs, the use of a module is optional, but required if one wants to import files. The module name is required if a package within the module needs to import another package within the main module.

cue mod tidy

cue mod tidy [-e] [-v]

cue mod tidy ensures that the cue.mod/module.cue file (and by extension the go.mod file) matches the source code in the module. It adds any missing module requirements necessary to build the current module's packages and dependencies, and it removes requirements on modules that don't provide any relevant packages. It also adds any missing entries to cue.mod/sum.cue and removes unnecessary entries.

The -e flag causes cue mod tidy to attempt to proceed despite errors encountered while loading packages.

The -v flag causes cue mod tidy to print information about removed modules to standard error.

cue mod tidy works by loading all of the packages in the main module and all of the packages they import, recursively. cue mod tidy acts as if all build constraints are enabled, so it will consider @if constrained files even if those source files wouldn't normally be evaluated.

TODO: do we need the equivalent of the ignore build constraint exception?

Like the ... package pattern, cue mod tidy will not consider packages in the main module in directories named testdata or with names that start with . or _ unless those packages are explicitly imported by other packages.

Once cue mod tidy has loaded this set of packages, it ensures that each module that provides one or more packages either has a require directive in the main module's cue.mod/module.cue file or is required by another required module. cue mod tidy will add a requirement on the @latest version on each missing module. cue mod tidy will remove require directives for modules that don't provide any packages in the set described above.
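
A much simplified sketch of the add/remove decision follows. It assumes the loader has already produced a map from each loaded package to the module that provides it, and it ignores the case where a module is required only transitively by another required module. The function name tidyDiff and all paths are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// tidyDiff compares the modules currently required by the main module with
// the modules that provide at least one loaded package. It returns the
// modules that need a new requirement and the requirements that can go.
func tidyDiff(required []string, providers map[string]string) (add, drop []string) {
	needed := map[string]bool{}
	for _, mod := range providers {
		needed[mod] = true
	}
	have := map[string]bool{}
	for _, mod := range required {
		have[mod] = true
		if !needed[mod] {
			drop = append(drop, mod)
		}
	}
	for mod := range needed {
		if !have[mod] {
			add = append(add, mod)
		}
	}
	sort.Strings(add)
	sort.Strings(drop)
	return add, drop
}

func main() {
	required := []string{"example.com/a", "example.com/old"}
	// package import path -> providing module (invented data)
	providers := map[string]string{
		"example.com/a/schema": "example.com/a",
		"example.com/b/types":  "example.com/b",
	}
	add, drop := tidyDiff(required, providers)
	fmt.Println("add:", add)
	fmt.Println("drop:", drop)
}
```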

cue mod tidy may also add or remove indirect fields on #Require directives. A #Require directive with indirect: true denotes a module that does not provide packages imported by packages in the main module. These requirements will be present if the module that imports packages in the indirect dependency has an incomplete cue.mod/module.cue file. They may also be present if the indirect dependency is required at a higher version than is implied by the module graph; this usually happens after running a command like cue get -u ./....

cue mod vendor

cue mod vendor [-e] [-v]

The cue mod vendor command constructs a directory named cue.mod/pkg that contains copies of all packages needed to support evaluations of packages in the main module. As with cue mod tidy and other module commands, build constraints (except for ignore ???) are not considered when constructing the cue.mod/pkg vendor directory.

When vendoring is enabled, the loader will load packages from the cue.mod/pkg vendor directory instead of downloading modules from their sources into the module cache and using packages from those downloaded copies.

cue mod vendor also creates the file vendor/modules.txt that contains a list of vendored packages and the module versions they were copied from. When vendoring is enabled, this manifest is used as a source of module version information. When the cue command reads vendor/modules.txt, it checks that the module versions are consistent with cue.mod/module.cue (and go.mod). If either cue.mod/module.cue or go.mod changed since vendor/modules.txt was generated, cue mod vendor should be run again.

Note that cue mod vendor removes the cue.mod/pkg vendor directory if it exists before re-constructing it. Local changes should not be made to vendored packages. The cue command does not check that packages in the cue.mod/pkg vendor directory have not been modified, but one can verify the integrity of the cue.mod/pkg vendor directory by running cue mod vendor and checking that no changes were made.

The --e flag causes cue mod vendor to attempt to proceed despite errors encountered while loading packages.

The --v flag causes cue mod vendor to print the names of vendored modules and packages to standard error.

cue mod verify

cue mod verify

cue mod verify checks that dependencies of the main module stored in the module cache have not been modified since they were downloaded. To perform this check, cue mod verify hashes each downloaded module .zip file and extracted directory, then compares those hashes with a hash recorded when the module was first downloaded. cue mod verify checks each module in the evaluation list (which may be printed with cue list -m all).

If all the modules are unmodified, cue mod verify prints "all modules verified". Otherwise, it reports which modules have been changed and exits with a non-zero status.

Note that all module-aware commands verify that hashes in the main module's cue.mod/sum.cue file match hashes recorded for modules downloaded into the module cache. If a hash is missing from cue.mod/sum.cue (for example, because the module is being used for the first time), the loader verifies its hash using the checksum database (unless the module path is matched by CUEPRIVATE or CUENOSUMDB).

In contrast, cue mod verify checks that module .zip files and their extracted directories have hashes that match hashes recorded in the module cache when they were first downloaded. This is useful for detecting changes to files in the module cache after a module has been downloaded and verified. cue mod verify does not download content for modules not in the cache, and it does not use cue.mod/sum.cue files to verify module content. However, cue mod verify may download go.mod files in order to perform minimal version selection. It will use cue.mod/sum.cue to verify those files, and it may add cue.mod/sum.cue entries for missing hashes.

cue mod why

cue mod why [-m] [-vendor] packages...

cue mod why shows a shortest path in the import graph from the main module to each of the listed packages.

The output is a sequence of stanzas, one for each package or module named on the command line, separated by blank lines. Each stanza begins with a comment line starting with # giving the target package or module. Subsequent lines give a path through the import graph, one package per line. If the package or module is not referenced from the main module, the stanza will display a single parenthesised note indicating that fact.
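
Finding a shortest path in the import graph is a breadth-first search. The sketch below is illustrative only: shortestImportPath is a hypothetical name, the graph is represented as a plain map from package to its imports, and the package paths are invented.

```go
package main

import "fmt"

// shortestImportPath returns a shortest chain of imports from any of the
// root packages to target, or nil if target is not reachable.
func shortestImportPath(imports map[string][]string, roots []string, target string) []string {
	prev := map[string]string{} // package -> package that first reached it
	seen := map[string]bool{}
	var queue []string
	for _, r := range roots {
		if !seen[r] {
			seen[r] = true
			queue = append(queue, r)
		}
	}
	for len(queue) > 0 {
		pkg := queue[0]
		queue = queue[1:]
		if pkg == target {
			// Walk back to a root, building the path front to back.
			var path []string
			for p := pkg; ; p = prev[p] {
				path = append([]string{p}, path...)
				if _, ok := prev[p]; !ok {
					break
				}
			}
			return path
		}
		for _, dep := range imports[pkg] {
			if !seen[dep] {
				seen[dep] = true
				prev[dep] = pkg
				queue = append(queue, dep)
			}
		}
	}
	return nil
}

func main() {
	imports := map[string][]string{
		"example.com/main": {"example.com/a", "example.com/b"},
		"example.com/a":    {"example.com/c"},
		"example.com/b":    {"example.com/d"},
		"example.com/d":    {"example.com/c"},
	}
	fmt.Println("# example.com/c")
	for _, p := range shortestImportPath(imports, []string{"example.com/main"}, "example.com/c") {
		fmt.Println(p)
	}
}
```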

The --m flag causes cue mod why to treat its arguments as a list of modules. cue mod why will print a path to any package in each of the modules. Note that even when --m is used, cue mod why queries the package graph, not the module graph printed by cue mod graph.

By default, cue mod why considers the graph of packages matched by the all pattern, which is the same set of packages matched by cue mod vendor.

cue clean -modcache

cue clean [-modcache]

The --modcache flag causes cue clean to remove the entire module cache, including unpacked source code of versioned dependencies.

This is usually the best way to remove the module cache. By default, most files and directories in the module cache are read-only to prevent tests and editors from unintentionally changing files after they've been authenticated. Unfortunately, this causes commands like rm -r to fail, since files can't be removed without first making their parent directories writable.

The --modcacherw flag (accepted by module-aware commands) causes new directories in the module cache to be writable. To pass --modcacherw to all module-aware commands, add it to the GOFLAGS variable. GOFLAGS may be set in the environment or with cue env -w.

--modcacherw should be used with caution; developers should be careful not to make changes to files in the module cache. cue mod verify may be used to check that files in the cache match hashes in the main module's cue.mod/sum.cue file.

cue env

This is covered in the "Proposal" section above.

Required changes to cue/load

The main changes required in cue/load are additions to the cue/load.Config type. These are backwards compatible assuming that users of this type are, as advised by go vet, using keyed struct literals. All field additions below have corresponding "front end" flags/environment variables in cmd/cue.

type Config struct {
	// ***************
	// Existing fields
	// ***************
	
	Context *build.Context
	ModuleRoot string
	Module string
	Package string
	Dir string
	Tags []string
	AllCUEFiles bool
	BuildTags []string
	Tests bool
	Tools bool
	DataFiles bool
	StdRoot string
	ParseFile func(name string, src interface{}) (*ast.File, error)
	Overlay map[string]Source
	Stdin io.Reader
	
	// *************************
	// New module-related fields
	// *************************

	// Mod defines the module resolution mode. ModModeReadonly will be the 
	// default. 
	//
	// Corresponds to the --mod flag understood by module-aware commands.
	Mod ModMode

	// ModPath specifies an alternative path at which a cue.mod directory 
	// can be found (and correspondingly read cue.mod/module.cue and 
	// associated files from).
	//
	// Corresponds to the --modpath flag understood by module-aware commands.
	ModPath string

	// ModCacheRW instructs the loader to create new directories in the 
	// module cache with read-write permissions instead of making them read-only.
	//
	// Corresponds to the --modcacherw flag understood by module-aware commands.
	ModCacheRW bool

	// Proxy defines a list of module proxy URLs, separated by commas (`,`) or 
	// pipes (`|`). 
	Proxy string

	// NoProxy is a comma-separated list of glob patterns (in the syntax of 
	// Go's path.Match) of module path prefixes that should always be fetched 
	// directly from version control repositories, not from module proxies.
	NoProxy string

	// NoSumDB is a comma-separated list of glob patterns (in the syntax of Go's 
	// path.Match) of module path prefixes for which the loader should not verify 
	// checksums using the checksum database.
	NoSumDB string
	
	// Private is a comma-separated list of glob patterns (in the syntax of Go's 
	// path.Match) of module path prefixes that should be considered private. 
	// Private is a default value for NoProxy and NoSumDB. Private also determines
	// whether a module is considered private for VCS.
	Private string

	// SumDB identifies the name of the checksum database to use and optionally 
	// its public key and URL
	SumDB string

	// VCS controls the set of version control tools the loader may use to 
	// download public and private modules (defined by whether their paths match a 
	// pattern in CUEPRIVATE) or other modules matching a glob pattern.
	VCS string
}

As is the case today, cmd/cue will use cue/load to load CUE instances, defaulting the values of modules-related cue/load.Config values from flags and environment variables (see "Environment variables").

For users of cue/load who want to mimic the behaviour of cmd/cue, a utility function that sets the modules-related fields of cue/load.Config to cmd/cue defaults (according to the values of environment variables and defaults) will be provided.
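
As a rough illustration of what such a utility function might do, here is a minimal sketch. The `ModuleSettings` type and `settingsFromEnv` function are hypothetical names, not part of cue/load, and the fallback values for CUEPROXY and CUESUMDB are assumptions borrowed from the defaults cmd/go uses. The sketch only shows the defaulting chain described in this proposal, where CUENOPROXY and CUENOSUMDB fall back to CUEPRIVATE.

```go
package main

import (
	"fmt"
	"os"
)

// ModuleSettings is a hypothetical stand-in for the module-related fields of
// cue/load.Config listed above. It is not a real cue/load type.
type ModuleSettings struct {
	Proxy   string
	NoProxy string
	NoSumDB string
	Private string
	SumDB   string
	VCS     string
}

// settingsFromEnv fills ModuleSettings using getenv (os.Getenv in normal use).
// The fallback values for CUEPROXY and CUESUMDB are assumptions.
func settingsFromEnv(getenv func(string) string) ModuleSettings {
	orDefault := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	s := ModuleSettings{
		Proxy:   orDefault("CUEPROXY", "https://proxy.golang.org,direct"),
		SumDB:   orDefault("CUESUMDB", "sum.golang.org"),
		Private: getenv("CUEPRIVATE"),
		VCS:     getenv("CUEVCS"),
	}
	// CUEPRIVATE acts as the default for both CUENOPROXY and CUENOSUMDB.
	s.NoProxy = orDefault("CUENOPROXY", s.Private)
	s.NoSumDB = orDefault("CUENOSUMDB", s.Private)
	return s
}

func main() {
	fmt.Printf("%+v\n", settingsFromEnv(os.Getenv))
}
```

Passing the lookup function as a parameter keeps the sketch easy to exercise without touching the real process environment.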

Version queries

We re-use the same concept of version queries as Go: https://golang.org/ref/mod#version-queries

A version query may be one of the following:

  • A fully-specified semantic version, such as v1.2.3, which selects a specific version. See Versions for syntax.
  • A semantic version prefix, such as v1 or v1.2, which selects the highest available version with that prefix.
  • A semantic version comparison, such as <v1.2.3 or >=v1.5.6, which selects the nearest available version to the comparison target (the lowest version for > and >=, and the highest version for < and <=).
  • A revision identifier for the underlying source repository, such as a commit hash prefix, revision tag, or branch name. If the revision is tagged with a semantic version, this query selects that version. Otherwise, this query selects a pseudo-version for the underlying commit. Note that branches and tags with names matched by other version queries cannot be selected this way. For example, the query v2 selects the latest version starting with v2, not the branch named v2.
  • The string latest, which selects the highest available release version. If there are no release versions, latest selects the highest pre-release version. If there are no tagged versions, latest selects a pseudo-version for the commit at the tip of the repository's default branch.
  • The string upgrade, which is like latest except that if the module is currently required at a higher version than the version latest would select (for example, a pre-release), upgrade will select the current version.
  • The string patch, which selects the latest available version with the same major and minor version numbers as the currently required version. If no version is currently required, patch is equivalent to latest. Since Go 1.16, go get requires a current version when using patch (but the -u=patch flag does not have this requirement).

Release versions are preferred over pre-release versions. For example, if versions v1.2.2 and v1.2.3-pre are available, the latest query will select v1.2.2, even though v1.2.3-pre is higher. The <v1.2.4 query would also select v1.2.2, even though v1.2.3-pre is closer to v1.2.4. If no release or pre-release version is available, the latest, upgrade, and patch queries will select a pseudo-version for the commit at the tip of the repository's default branch. Other queries will report an error.
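
The release-over-pre-release preference of the latest query can be sketched as follows. This is a simplified illustration, not the cmd/go implementation: the `version`, `parse`, `less` and `latest` names are hypothetical, only versions of the form vMAJOR.MINOR.PATCH[-PRERELEASE] are handled, pre-release strings are compared as plain strings rather than by the full semantic versioning rules, and the pseudo-version fallback is not modelled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified semantic version: vMAJOR.MINOR.PATCH[-PRERELEASE].
type version struct {
	raw                 string
	major, minor, patch int
	pre                 string
}

// parse accepts only the simplified form above; anything else is rejected.
func parse(s string) (version, bool) {
	if !strings.HasPrefix(s, "v") {
		return version{}, false
	}
	core, pre, _ := strings.Cut(strings.TrimPrefix(s, "v"), "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return version{}, false
	}
	var n [3]int
	for i, p := range parts {
		x, err := strconv.Atoi(p)
		if err != nil || x < 0 {
			return version{}, false
		}
		n[i] = x
	}
	return version{s, n[0], n[1], n[2], pre}, true
}

// less reports whether a sorts before b.
func less(a, b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	if a.patch != b.patch {
		return a.patch < b.patch
	}
	if (a.pre == "") != (b.pre == "") {
		return a.pre != "" // a pre-release sorts before its release
	}
	return a.pre < b.pre // simplification: plain string comparison
}

// latest returns the highest release version, falling back to the highest
// pre-release version, or "" if no version parses.
func latest(available []string) string {
	var bestRel, bestPre version
	var haveRel, havePre bool
	for _, s := range available {
		v, ok := parse(s)
		if !ok {
			continue
		}
		if v.pre == "" {
			if !haveRel || less(bestRel, v) {
				bestRel, haveRel = v, true
			}
		} else if !havePre || less(bestPre, v) {
			bestPre, havePre = v, true
		}
	}
	switch {
	case haveRel:
		return bestRel.raw
	case havePre:
		return bestPre.raw
	}
	return ""
}

func main() {
	fmt.Println(latest([]string{"v1.2.2", "v1.2.3-pre"})) // prints v1.2.2
}
```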

cmd/cue outside of a module context

Like today, cmd/cue will continue to operate outside of a module context, and only fail if its arguments require resolution of non-builtins.

GOPROXY protocol

CUE module proxies will implement the GOPROXY protocol. Anyone looking to provide a CUE module proxy alternative to proxy.golang.org should consult the GOPROXY reference.
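
For orientation, the sketch below builds the request URLs that a GOPROXY client issues for a module version, including the case-encoding of module paths (an uppercase letter becomes `!` followed by its lowercase form). The function names `escape`, `proxyURL` and `listURL` are hypothetical helpers for this illustration; the GOPROXY reference remains the authoritative description of the protocol.

```go
package main

import (
	"fmt"
	"strings"
)

// escape applies the case-encoding used in proxy URLs: each uppercase ASCII
// letter is replaced by '!' followed by the corresponding lowercase letter.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			r += 'a' - 'A'
		}
		b.WriteRune(r)
	}
	return b.String()
}

// proxyURL builds the URL of a per-version resource. kind is one of
// "info", "mod" or "zip".
func proxyURL(base, modPath, version, kind string) string {
	return fmt.Sprintf("%s/%s/@v/%s.%s",
		strings.TrimSuffix(base, "/"), escape(modPath), escape(version), kind)
}

// listURL builds the URL that lists the known versions of a module.
func listURL(base, modPath string) string {
	return fmt.Sprintf("%s/%s/@v/list", strings.TrimSuffix(base, "/"), escape(modPath))
}

func main() {
	base := "https://proxy.golang.org"
	fmt.Println(listURL(base, "acme.com/quote"))
	fmt.Println(proxyURL(base, "acme.com/quote", "v1.1.0", "zip"))
}
```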

Version control systems

The loader may download module source code and metadata directly from a version control repository. Downloading a module from a proxy is usually faster, but connecting directly to a repository is necessary if a proxy is not available or if a module's repository is not accessible to a proxy (frequently true for private repositories). Git, Subversion, Mercurial, Bazaar, and Fossil are supported. A version control tool must be installed in a directory in PATH in order for the loader to use it.

To download specific modules from source repositories instead of a proxy, set the CUEPRIVATE or CUENOPROXY environment variables (or equivalent options in cue/load.Config). To configure the loader to download all modules directly from source repositories, set CUEPROXY to direct. See Environment variables for more information.

See https://golang.org/ref/mod#vcs for more details on the specifics of:

  • Finding a repository for a module path via go-import <meta> tags
  • Mapping versions to commits
  • Mapping pseudo-versions to commits
  • Mapping branches and commits to versions
  • Module directories within a repository

Controlling version control tools with CUEVCS

The loader's ability to download modules with version control commands like git is critical to the decentralized package ecosystem, in which code can be imported from any server. It is also a potential security problem if a malicious server finds a way to cause the invoked version control command to run unintended code.

The CUE module implementation will follow the same model of GOVCS to change the allowed version control systems for specific modules, via the variable CUEVCS. For example:

CUEVCS=github.com:git,evil.com:off,*:git|hg

With this setting, code with a module or import path beginning with github.com/ can only use git; paths on evil.com cannot use any version control command, and all other paths (* matches everything) can use only git or hg.

See the GOVCS reference documentation for more details.
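
To make the matching rules concrete, here is a minimal sketch of how a CUEVCS value like the one above could be parsed and consulted. The `vcsRule`, `parseVCS`, `matchPrefix` and `allowed` names are hypothetical. The sketch assumes the same semantics as GOVCS (patterns match leading path elements, the first matching rule decides, `all` and `off` are special tool lists), it omits the `public` and `private` pattern keywords, and it denies anything no rule matches, which is an assumption rather than a documented default.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// vcsRule pairs a module path pattern with the tools allowed for it.
// An empty tools list means "off".
type vcsRule struct {
	pattern string
	tools   []string
}

// parseVCS parses a comma-separated list of pattern:tool|tool entries.
func parseVCS(s string) ([]vcsRule, error) {
	var rules []vcsRule
	for _, item := range strings.Split(s, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue
		}
		pattern, list, ok := strings.Cut(item, ":")
		if !ok || pattern == "" || list == "" {
			return nil, fmt.Errorf("malformed entry %q", item)
		}
		r := vcsRule{pattern: pattern}
		if list != "off" {
			r.tools = strings.Split(list, "|")
		}
		rules = append(rules, r)
	}
	return rules, nil
}

// matchPrefix reports whether pattern matches the leading path elements of
// target, comparing as many elements as the pattern itself contains.
func matchPrefix(pattern, target string) bool {
	n := strings.Count(pattern, "/") + 1
	elems := strings.Split(target, "/")
	if len(elems) < n {
		return false
	}
	ok, err := path.Match(pattern, strings.Join(elems[:n], "/"))
	return err == nil && ok
}

// allowed reports whether vcs may be used for modPath. The first matching
// rule decides; paths matched by no rule are denied (an assumption).
func allowed(rules []vcsRule, modPath, vcs string) bool {
	for _, r := range rules {
		if !matchPrefix(r.pattern, modPath) {
			continue
		}
		for _, t := range r.tools {
			if t == "all" || t == vcs {
				return true
			}
		}
		return false
	}
	return false
}

func main() {
	rules, err := parseVCS("github.com:git,evil.com:off,*:git|hg")
	if err != nil {
		panic(err)
	}
	for _, p := range []string{"github.com/foo/bar", "evil.com/x", "example.org/y"} {
		fmt.Printf("%s: git=%v hg=%v\n", p, allowed(rules, p, "git"), allowed(rules, p, "hg"))
	}
}
```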

Module zip files

Like Go, CUE module versions are distributed as .zip files. There is rarely any need to interact directly with these files, since the loader creates, downloads, and extracts them automatically from module proxies and version control repositories. However, it's still useful to know about these files to understand cross-platform compatibility constraints or when implementing a module proxy.

The cue mod download command downloads zip files for one or more modules, then extracts those files into the module cache. Depending on CUEPROXY and other environment variables, the loader may either download zip files from a proxy or clone source control repositories and create zip files from them. The --json flag may be used to find the location of downloaded zip files and their extracted contents in the module cache.

CUE modules will be subject to the same constraints as Go modules with respect to file paths and sizes. See https://golang.org/ref/mod#zip-files for more details.

Private modules

We establish the same support model for private modules as Go modules, with the following environment variables as substitutions:

  • CUEPROXY - list of module proxy URLs. The loader will attempt to download modules from each server in sequence. The keyword direct instructs the loader to download modules from version control repositories where they're developed instead of using a proxy.
  • CUEPRIVATE - list of glob patterns of module path prefixes that should be considered private. Acts as a default value for CUENOPROXY and CUENOSUMDB.
  • CUENOPROXY - list of glob patterns of module path prefixes that should not be downloaded from a proxy. The loader will download matching modules from version control repositories where they're developed, regardless of CUEPROXY.
  • CUENOSUMDB - list of glob patterns of module path prefixes that should not be checked using the public checksum database, sum.golang.org.
  • CUEINSECURE - list of glob patterns of module path prefixes that may be retrieved over HTTP and other insecure protocols.

The Go module documentation provides a complete set of scenarios covering the various permutations of public/private modules:

  • Private proxy serving all modules
  • Private proxy serving private modules
  • Direct access to private modules
  • Passing credentials to private proxies
  • Passing credentials to private repositories

It also explains how the loader (cmd/go in the case of the Go modules implementation) handles privacy concerns with respect to proxy requests. See the Go modules Privacy section for details.

Modules cache

Like Go, CUE will establish and use a user module cache, a directory where the loader stores downloaded module files. The default location of the module cache is $HOME/cue/modcache. To use a different location, set the CUEMODCACHE environment variable.

The cache may be shared by multiple CUE projects developed on the same machine. The loader will use the same cache regardless of the location of the main module. Multiple instances of the loader may safely access the same module cache at the same time.

For more detail on the module cache implementation, see the Go module cache reference.

Authenticating modules

The approach and implementation of authenticating modules follows exactly from the cmd/go implementation, and uses the checksum database sum.golang.org by default for public modules.

When the loader downloads a module zip file or go.mod file (which is why this file is necessary for compatibility with the Go proxy and checksum models) into the module cache, it computes a cryptographic hash and compares it with a known value to verify the file hasn't changed since it was first downloaded. The loader reports a security error if a downloaded file does not have the correct hash.

For go.mod files, the loader computes the hash from the file content. For module zip files, the loader computes the hash from the names and contents of files within the archive in a deterministic order. The hash is not affected by file order, compression, alignment, and other metadata. See golang.org/x/mod/sumdb/dirhash for hash implementation details.
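
The order-independent hashing of a module's files can be sketched as below. This follows the shape of the "h1:" scheme in golang.org/x/mod/sumdb/dirhash as we understand it: a SHA-256 over a sorted list of per-file SHA-256 lines. It operates on an in-memory map rather than a zip archive; the `hash1` function is a hypothetical stand-in and should not be relied on as a drop-in replacement for that package.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 hashes a set of files (name -> contents). Names are sorted first so
// the result does not depend on the order in which files are supplied.
func hash1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	for _, name := range names {
		// One summary line per file: hex digest, two spaces, file name.
		fmt.Fprintf(h, "%x  %s\n", sha256.Sum256(files[name]), name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(h.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"acme.com/quote@v1.1.0/cue.mod/module.cue": []byte("module: \"acme.com/quote\"\n"),
		"acme.com/quote@v1.1.0/quote.cue":          []byte("package quote\n\nHello: \"hello\"\n"),
	}
	fmt.Println(hash1(files))
}
```

Because only names and contents feed the hash, compression level, archive ordering and other zip metadata have no effect on the result.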

The loader compares each hash with the corresponding entry in the main module's cue.mod/sum.cue file. If the hash is different from the hash in cue.mod/sum.cue, the loader reports a security error and deletes the downloaded file without adding it into the module cache.

If the cue.mod/sum.cue file is not present, or if it doesn't contain a hash for the downloaded file, the loader may verify the hash using the checksum database, a global source of hashes for publicly available modules. Once the hash is verified, the loader adds it to cue.mod/sum.cue and adds the downloaded file in the module cache. If a module is private (matched by the CUEPRIVATE or CUENOSUMDB environment variables) or if the checksum database is disabled (by setting CUESUMDB=off), the loader accepts the hash and adds the file to the module cache without verifying it.
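
The decision flow in the two paragraphs above can be summarised in a small sketch. Everything here is hypothetical: `verify` models cue.mod/sum.cue as a map, and the `lookup` callback stands in for the checksum database, being nil when the module is private or when CUESUMDB=off. Recording the unverified hash in the map in that case mirrors what cmd/go does with go.sum and is an assumption for CUE.

```go
package main

import (
	"errors"
	"fmt"
)

var errMismatch = errors.New("security error: checksum mismatch")

// verify checks the hash of a downloaded file (got) against the entry for key
// in sums, which models cue.mod/sum.cue. If there is no entry, the checksum
// database is consulted via lookup, unless lookup is nil. On success the hash
// is recorded in sums.
func verify(sums map[string]string, key, got string, lookup func(key string) (string, error)) error {
	if want, ok := sums[key]; ok {
		if want != got {
			return errMismatch
		}
		return nil
	}
	if lookup != nil {
		want, err := lookup(key)
		if err != nil {
			return err
		}
		if want != got {
			return errMismatch
		}
	}
	sums[key] = got
	return nil
}

func main() {
	sums := map[string]string{"acme.com/quote v1.1.0": "h1:aaa"}
	fmt.Println(verify(sums, "acme.com/quote v1.1.0", "h1:aaa", nil))
	fmt.Println(verify(sums, "acme.com/quote v1.1.0", "h1:bbb", nil))
}
```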

The module cache is usually shared by all CUE projects on a system, and each module may have its own cue.mod/sum.cue file with potentially different hashes. To avoid the need to trust other modules, the loader verifies hashes using the main module's cue.mod/sum.cue whenever it accesses a file in the module cache. Zip file hashes are expensive to compute, so the loader checks pre-computed hashes stored alongside zip files instead of re-hashing the files. The cue mod verify command may be used to check that zip files and extracted directories have not been modified since they were added to the module cache.

The format of cue.mod/sum.cue files is described above, and follows directly from the go.sum structure (albeit a different format). For details on the go.sum format and checksum databases, see the corresponding sections in the Go modules reference.

The cue.mod directory

Currently (the CUE world prior to this proposal) the contents of the cue.mod directory have the following function/semantics:

.
└── cue.mod
    ├── module.cue   - the module declaration
    ├── gen          - search path for CUE generated from third-party packages
    ├── pkg          - search path for third-party imports
    └── usr          - search path for user-maintained code to complement third-party packages

Given a non-main module import path acme.com/quote, the loader unifies the contents of the package values cue.mod/{gen,pkg,usr}/acme.com/quote.

Under this proposal, the only change to these semantics is that cue.mod/pkg becomes the equivalent of Go modules' vendor directory:

.
└── cue.mod
    ├── module.cue   - the module declaration
    ├── gen          - search path for CUE generated from third-party packages via cue generate
    ├── imp          - search path for CUE imported from third-party packages via cue import
    ├── pkg          - vendor for third-party imports
    └── usr          - search path for user-maintained code to complement third-party packages

If cue.mod/pkg exists, then the loader will expect to be able to load all non-main module imports from there - this also retains compatibility for the existing loading mechanism, i.e. when no go.mod file exists in the CUE module root. Given a non-main module import path acme.com/quote, the loader will unify the contents of the package values cue.mod/{gen,pkg,usr}/acme.com/quote as it does today.

If cue.mod/pkg does not exist, then the loader will resolve and load package dependencies from the module cache. Given a non-main module import path acme.com/quote, the loader unifies the contents of the package values cue.mod/{gen,usr}/acme.com/quote with, for example, $(cue env CUEMODCACHE)/acme.com/[email protected].

Environment variables

Explanation of the flags and environment variables (and config options in the case of cue/load) that control the loader's behaviour are covered elsewhere in this proposal. A full list is provided here for reference:

  • --mod - controls whether the loader can automatically update cue.mod/module.cue and associated files, or use cue.mod/pkg
  • --modpath - an alternative path at which a cue.mod directory can be found (and correspondingly read cue.mod/module.cue and associated files from)
  • CUEMODCACHE - the directory where the loader will store downloaded modules and related files
  • CUENOPROXY - Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes that should always be fetched directly from version control repositories, not from module proxies. If CUENOPROXY is not set, it defaults to CUEPRIVATE.
  • CUENOSUMDB - Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes for which the loader should not verify checksums using the checksum database. If CUENOSUMDB is not set, it defaults to CUEPRIVATE.
  • CUEPRIVATE - Comma-separated list of glob patterns (in the syntax of Go's path.Match) of module path prefixes that should be considered private. CUEPRIVATE is a default value for CUENOPROXY and CUENOSUMDB. CUEPRIVATE also determines whether a module is considered private for CUEVCS.
  • CUEPROXY - List of module proxy URLs, separated by commas (,) or pipes (|).
  • CUESUMDB - Identifies the name of the checksum database to use and optionally its public key and URL
  • CUEVCS - Controls the set of version control tools the loader may use to download public and private modules (defined by whether their paths match a pattern in CUEPRIVATE) or other modules matching a glob pattern.

The mapping from these flags and environment variables to cue/load.Config options is covered in "Required changes to cue/load".

Go modules and CUE modules coexisting

It is not unreasonable to imagine a CUE module and Go modules co-existing at the same path, sharing the same VCS repository. Indeed, with the native support for exporting CUE to Go, and importing CUE from Go, it seems very likely that these situations will arise. This proposal supports such a setup.

The versioning of both modules would be intrinsically linked by virtue of each module system sharing the same tagging scheme in the same repository. However, assuming a Go module and CUE module exist at the same root, it would not be possible to version the two separately within the same repository. On the assumption that co-existence of CUE and Go code implies a strong relationship between the two, with breaking changes in one almost certainly corresponding to breaking changes in the other, we don't foresee this being a problem.

Versioning the two modules separately would still be possible, but only in separate VCS repositories. This would be achieved by having the go-import and cue-import meta tags return different repository locations. See "Finding a module for a module path" for more information.

Where a Go and CUE module coexist in the same repository, there would be some redundancy insofar as a CUE module would contain Go code, and vice versa. We don't foresee any specific problems beyond the limited inefficiency in CPU, memory and storage terms of this redundancy.

The post-go.mod future

As indicated in the summary of this proposal, use of proxy.golang.org and sum.golang.org is intended as an interim measure until such time as the CUE project can host such services itself. The requirement for a go.mod file in a CUE module is tied to our use of those services.

When the CUE project can host such services itself, we will need to develop a CUEPROXY protocol, similar to the GOPROXY protocol, and host a service that speaks that protocol. We would then look to create a number of CUE releases where cmd/cue knows how to speak this protocol, at which point projects would be able to start the process of removing go.mod files from their CUE modules. The only challenge here is that a project would only be able to remove a go.mod if it could be sure there are no consumers who rely on using a cmd/cue version that does not speak the new protocol. However, adopting something akin to the Go release policy would be sensible in this respect: we would only advise that go.mod files be removed when there are two major versions of CUE that support the new CUEPROXY protocol.

However, in the context of the previous section - "Go modules and CUE modules co-existing" - splitting out CUE dependencies from Go dependencies does appear to create an issue. Consider the following example, in a post-go.mod world, where CUE dependencies are not added to the go.mod file, and a go.mod file is not required as part of a CUE module:

-- cue.mod/module.cue --
module: "example.com/blah"

require: {
	"acme.com/quote": "v1.1.0"
}
-- go.mod --
module example.com/blah

require (
	acme.com/quote v1.2.0
	other.com/blah v1.5.0
)
-- blah.cue --
package blah

import "acme.com/quote"

x: quote.Hello
-- blah.go --
package main

import (
	"fmt"

	"acme.com/quote"
)

func main() {
	fmt.Println(quote.Hello)
}

(we elide go.sum and cue.mod/sum.cue files for simplicity).

Notes:

  • the go.mod file exists because we have also declared a Go module; hence we have two main modules, one Go, one CUE, and both coincide (have the same module directory root)
  • in this post-go.mod world, the go.mod file only reflects the dependencies of the Go module; the CUE dependencies are reflected in cue.mod/module.cue
  • acme.com/quote is both the path of a Go module and a CUE module
  • Because of the versioning constraint discussed above, both modules are versioned together: a tag of v1.2.0 in the source code repository that contains both creates versions v1.2.0 of both
  • The version of acme.com/quote required by the Go module (v1.2.0) is different from that required by the CUE module (v1.1.0)

The problem is that the version of acme.com/quote resolved in go.mod is not guaranteed to be the same version of acme.com/quote resolved in cue.mod/module.cue. We essentially have two different instances of MVS running with different constraints.

Much of the time this might be totally innocuous. But version skew like this will almost certainly lead to issues, for example in the case of additive changes to an API.
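
As an illustration of how a tool might surface this kind of skew, the sketch below compares the requirements declared in go.mod with those declared in cue.mod/module.cue and reports modules that appear in both at different versions. The `skew` function is hypothetical, and it compares declared requirements only. It does not run MVS, so it shows the idea rather than the versions each tool would finally select.

```go
package main

import (
	"fmt"
	"sort"
)

// skew reports modules required by both the Go module and the CUE module at
// different versions. Each map is module path -> required version.
func skew(goReqs, cueReqs map[string]string) []string {
	var out []string
	for path, gv := range goReqs {
		if cv, ok := cueReqs[path]; ok && cv != gv {
			out = append(out, fmt.Sprintf("%s: go.mod %s, cue.mod/module.cue %s", path, gv, cv))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	goReqs := map[string]string{"acme.com/quote": "v1.2.0", "other.com/blah": "v1.5.0"}
	cueReqs := map[string]string{"acme.com/quote": "v1.1.0"}
	for _, line := range skew(goReqs, cueReqs) {
		fmt.Println(line) // prints acme.com/quote: go.mod v1.2.0, cue.mod/module.cue v1.1.0
	}
}
```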

Nor is this problem unique to the combination of Go and CUE. It is envisaged that languages other than Go will be supported via cue import and cue export. Each language will have its own (or indeed many) package versioning system equivalent. This scenario of a coincident Language X and CUE main module (or equivalent in Language X's terms) depending on coincident Language X and CUE modules is therefore more widespread: the version resolution algorithm of Language X is not guaranteed to arrive at the same revision as the CUE MVS algorithm.

Notable details/exceptions

Here is a general list of points that don't naturally fit under any other heading:

  • There is currently no need to support any special source code repository resolution cases like gopkg.in. However, such a position does not preclude doing so in the future

CUE & A

Why have you chosen to use {proxy,sum}.golang.org?

The Go module specification and implementation do not depend upon either proxy.golang.org or sum.golang.org. Indeed the GOPROXY and checksum protocols provide the necessary abstraction. Therefore the next phase of CUE module support does not need to be tied to either the module mirror at proxy.golang.org or checksum database at sum.golang.org. However, there are some significant advantages to doing so:

  • Fast, highly available, immutable module mirror
  • Immediate support for verifiable and reproducible evaluations, a key aspect of CUE
  • We can re-use much of the code in cmd/go...
  • We can leverage much of the documentation, experience and expertise from the Go project

These seem to outweigh the disadvantages:

  • needing to maintain a go.mod file at the root of each CUE module (which will be done entirely automatically)
  • potentially confusing that CUE modules should be using Go modules infrastructure

The most crucial point however is that the default of using proxy.golang.org and sum.golang.org within the loader is just that: a default. It is entirely possible to turn off use of both via CUEPROXY=off and CUESUMDB=off.

What about cue test?

As is covered elsewhere in this proposal, we have not concluded a design for cue test (this falls under #209). However, this proposal is entirely compatible with being extended to support _test.cue files, again following the pattern of _test.go files in Go.

What about a //go:embed CUE equivalent?

As of Go 1.16, cmd/go now supports including static files and file trees as part of the final executable, using the new //go:embed directive. See the documentation for the new embed package for details.

We are working on a proposal for how to support the concept of embedding in CUE. Much like cue test, that proposal is entirely compatible with (and indeed depends upon) this proposal, the next phase of modules.

What about a gorelease equivalent?

gorelease is an experimental tool that helps module authors avoid common problems before releasing a new version of a module.

Examples:

# Compare with the latest version and suggest a new version.
gorelease

# Compare with a specific version and suggest a new version.
gorelease -base=v1.2.3

# Compare with the latest version and check a specific new version for compatibility.
gorelease -version=v1.3.0

# Compare with a specific version and check a specific new version for compatibility.
gorelease -base=v1.2.3 -version=v1.3.0

gorelease analyzes changes in the public API and dependencies of the main module. It compares a base version with the currently checked out revision. Given a proposed version to release, gorelease reports whether the changes are consistent with semantic versioning.

Given the very nature of the problems that CUE looks to solve, it will be entirely possible to provide such a command to help CUE module authors. Much like gorelease is intended to become go release, the equivalent in the CUE world would likely be spelled cue release.

Will CUE packages start to pollute pkg.go.dev?

Whilst we have permission to use proxy.golang.org and sum.golang.org from the Go team until such time as the CUE project starts to host instances of a module mirror and checksum database itself, we should look to limit any unintended side effects. One such side effect would be that pkg.go.dev (a Go module and package documentation and discovery site) uses index.golang.org (an index which serves a feed of new module versions that become available at proxy.golang.org). CUE modules would therefore start to "leak" into pkg.go.dev results. We will work with the pkg.go.dev team to ensure that the relevant heuristics for identifying CUE-only modules are in place.

How does this proposal relate to io/fs.FS?

Go 1.16 introduces a new io/fs package that defines the fs.FS interface, an abstraction for read-only trees of files. This package largely exists to support the new //go:embed feature, but does have other uses.

#607 raises the question of how the existing cue/load.Config.Overlay field might be used to supply an entire file system as input to cue/load. #607 (comment) clarifies that the current Overlay field exists to complement the operating system file system, rather than replace it. As outlined in that comment, however, adding an io/fs.FS field to cue/load.Config would allow the intended semantics. This modules proposal is fully compatible with that change, but necessarily orthogonal to it.

What about verifiable evaluations?

One of the main differences between Go and CUE from a "build" perspective is that Go has complete control over build artefacts. Specifically, binaries that represent the compilation result of a main package. cmd/go includes sufficient module-related information (module path, version and checksum) in those binary artefacts so as to enable verifiable builds. runtime/debug.ReadBuildInfo() gives runtime access to that information; go version -m /path/to/binary allows it to be inspected. See Russ Cox's blog post for more information.
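
For reference, this is roughly what reading that embedded module information looks like from within a Go program, using runtime/debug.ReadBuildInfo. The `moduleLines` helper is a hypothetical name for this illustration, and the output depends entirely on how the binary was built.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// moduleLines returns one line for the main module and one per dependency,
// using the module information embedded in the running binary.
func moduleLines() []string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return nil // the binary carries no module information
	}
	lines := []string{fmt.Sprintf("main %s %s", info.Main.Path, info.Main.Version)}
	for _, dep := range info.Deps {
		// Sum is the checksum that also appears in go.sum.
		lines = append(lines, fmt.Sprintf("dep %s %s %s", dep.Path, dep.Version, dep.Sum))
	}
	return lines
}

func main() {
	for _, line := range moduleLines() {
		fmt.Println(line)
	}
}
```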

CUE does not have such control over its many output formats (JSON, YAML, JSONSchema etc). An immediate consideration here would be that comments be used to encode similar module-related information. However, JSON for one does not support comments.

The Required fields and related issues proposal includes a section on cue export, and how that would be repurposed to be the inverse of cue import. The example presented there is as follows.

Given the CUE file:

a: 2 + 3
baz: {
    @export("baz.json")
    b: a
}
bar: {
    @export("jsonschema:/foo/bar/bar.json")
    string
}

cue export would then produce the following txtar output:

// File
import baz ":/foo/bar/baz.json"
import bar "jsonschema:/foo/bar/bar.json"

a: 5
"baz": { baz, @export("baz.json") }
"bar": { bar, @export("jsonschema:bar.yaml") }
-- baz.json --
b: 5
-- bar.yaml --
type: string

One option therefore would be to make a step towards verifiable evaluations by including sufficient module-related information (like that included in Go binaries) in the txtar output of cue export.

Are there any alternatives to requiring a go.mod?

If we want to utilise and leverage the existing module mirror and checksum database, we don't see a way around the requirement of declaring a go.mod file in the root of a CUE module. The GOPROXY protocol requires that a regular go.mod file (i.e. a symlink is not sufficient) denote the root of a Go module, and that that file declares the module's requirements, retractions etc.

As noted above however, this is an interim measure until such time as the CUE project can host such services itself.

What about anonymous modules? Will they need a go.mod?

Anonymous modules can be created today via:

cue mod init

This creates a cue.mod/module.cue file as follows:

module: ""

Anonymous modules are useful because, for an end user (i.e. in a situation where you know a package within the module will never be a dependency of another module), coming up with a module name is an annoying problem.

Go modules do not support anonymous modules: every module must have a path. Hence we simply could not maintain a parallel go.mod file, matching the requirements listed in cue.mod/module.cue.

Therefore, for anonymous modules, cmd/cue and cue/load will not create or maintain a go.mod file.

Why is the module cache in the user's home directory?

Russ Cox provided excellent motivation for this decision in a GitHub discussion about the GOMODCACHE environment variable:

The module cache ($GOPATH/pkg/mod, defaulting to $HOME/go/pkg/mod) is for storing downloaded source code, so that every build does not redownload the same code and does not require the network or the original code to be available. The module cache holds entries that are like "if you need to download [email protected], here are the files you'd get." If the answer is not in the cache, you have to go out to the network. Maybe you don't have a network right now. Maybe the code has been deleted. It's not anywhere near guaranteed that you can redownload the sources and also get the same result. Hopefully you can, but it's not an absolute certainty like for the build cache. (The go.sum file will detect if you get a different answer on re-download, but knowing you got the wrong bits doesn't help you make progress on actually building your code. Also these paths end up in file-line information in binaries, so they show up in stack traces, and the like and feed into tools like text editors or debuggers that don't necessarily know how to trigger the right cache refresh.)

I expect there are cron jobs or other tools that clean $HOME/.cache periodically. If part of the build cache got deleted, it would be no big deal, so it's fine to store the build cache there. But if downloaded source code got deleted unasked, I think that would potentially be quite surprising and problematic in various ways. That's why we store the source code in $GOPATH/pkg/mod, to keep it away from more expendable data.

What is the timeline for CUE modules?

Once the CUE community has had a chance to consider and respond to this proposal, and if there is broad agreement with the direction, implementation would start immediately. As mentioned elsewhere in this proposal, we hope to reuse much of the cmd/go/internal/... implementation, as well as learning from and leveraging the vast experience of the Go team.

Coming up with a rough timeline and priority ordered list of work will be the first thing we do when starting work on this next phase of CUE module support.

Are submodules and multi-module repositories supported?

Yes, although the same advice regarding both will apply in the CUE world. As a starting point the following advice from Russ Cox will likely hold true for the vast majority:

For all but power users, you probably want to adopt the usual convention that one repo = one module. It's important for long-term evolution of code storage options that a repo can contain multiple modules, but it's almost certainly not something you want to do by default.

For more details see https://github.com/golang/go/wiki/Modules#faqs--multi-module-repositories

What strategies exist for supporting multiple major versions of a module in parallel?

A corollary of the import compatibility rule:

If an old package and a new package have the same import path, the new package must be backwards compatible with the old package

is that any breaking changes in a module with major version number >=1 must be accompanied by an increase in major version number. This raises the question of how to support users of the now old major version - is it possible to support both at the same time?

Like with Go modules, developers will have two options when it comes to maintaining multiple major versions of a module in parallel:

  • major branch strategy
  • major subdirectory strategy

See the Go modules wiki entry on the topic as well as a mention in the Go modules reference.

Should the CUE version tags be namespaced?

Some of the early discussions about vgo (the Go modules prototype) questioned whether Go should distinguish the VCS tags used to indicate versions of Go modules. The thinking was that v1.1.0 says nothing about the fact that the tag corresponds to a version of a Go module. Alternatives included namespacing those tags, e.g. go:v1.1.0.

Link to those discussions

It is natural and appropriate to consider the same question in the context of designing CUE package versioning and modules.

However, if we choose to base our implementation on Go modules, using proxy.golang.org and sum.golang.org, then by definition we adopt the same approach to creating module versions: tagging with semantic versions, e.g. v1.1.0. Tags indicating Go versions and CUE versions will therefore be indistinguishable.

But as we cover under "Go modules and CUE modules coexisting", we don't see a problem with this "conflict" - indeed it very much aligns with the intentions of the author.

What about CUE that coexists with non-Go module aware Go code?

  • should be a fairly limited scenario
  • hence we will not support this initially
@verdverm
Contributor

verdverm commented Mar 23, 2021

Generally like that Go's module system is the inspiration and that much can be copied over. However I have some concerns about reusing the go.mod file. (Which primarily stems from the desire to reuse existing Go infra? Is there another reason for it?)

it will also need to include a go.mod file that mirrors the cue.mod/module.cue file.

This seems like it will prevent many use cases for CUE + Go dependencies in the same project.

If I have a repository which is both CUE and Go, and the set difference of their respective dependencies is non-empty, then wouldn't the respective tidy commands step on each other's toes when managing a "shared" go.mod (from only one tool's pov)?

Am I missing something here?

@myitcv
Contributor Author

myitcv commented Mar 23, 2021

Generally like that Go's module system is the inspiration and that much can be copied over. However I have some concerns about reusing the go.mod file. (Which primarily stems from the desire to reuse existing Go infra? Is there another reason for it?)

The reasons are exclusively attributable to re-using the proxy.golang.org and sum.golang.org infrastructure. The answer to "Why have you chosen to use {proxy,sum}.golang.org?" has the rationale.

it will also need to include a go.mod file that mirrors the cue.mod/module.cue file.

This seems like it will prevent many use cases for CUE + Go dependencies in the same project.

If I have a repository which is both CUE and Go, and the set difference of their respective dependencies is non-empty, then wouldn't the respective tidy commands step on each other's toes when managing a "shared" go.mod (from only one tool's pov)?

Am I missing something here?

At least as I envisage it, cue mod tidy will maintain a build-ignored .go file that captures the CUE imports. That way, a go mod tidy will not undo the work that a cue mod tidy has done. Indeed it will reach the same answer. If CUE code is modified in some way, a cue mod tidy will potentially be required, i.e. a go mod tidy by itself is insufficient (obviously) to resolve both CUE + Go dependencies, whereas a cue mod tidy can (and will) necessarily cover both.

Perhaps I should be more explicit about this above? I had hoped to stash this file out of the way in cue.mod.

@nyarly

nyarly commented Mar 23, 2021

I can't say that I receive the idea of reusing the Go modules scheme with delight.

In principle, MVS centralizes the control over the version of a package I'll use in my project; I've had a lot of success in lockfile-driven systems being able to exclude releases that were incompatible with my usage. MVS also assumes that package authors want to and are able to perfectly conform to semver, yet even the best-intentioned authors sometimes break interfaces with a release. Finally, MVS makes strong assumptions about the quality of releases, but then defaults to a source-repo-as-package-repo approach.

In practice, I haven't found that vgo has enabled large Go projects the way that it was intended to. More and more, I work with container orchestration, and both the Kubernetes and Docker projects are a challenge to incorporate via gomodules. Incorporating version resolution into every Go task leads to a lot of confusing and frustrating issues. My intuition is that only local changes should influence the outcome of a build, but post gomodules, I find that any of my dependencies can break things.

Personally, I was having great success with the "official experiment" of godeps. It was a familiar and well-worn approach with version satisfaction and lockfiles. The process by which it was overturned felt (as a daily user of Go) authoritarian and ill-considered. It's possible that CUE avoids a number of these issues because it doesn't have a large legacy base to support, but I wanted to give voice to my trepidation.

@verdverm
Contributor

In order for CUE to maintain both, will it be reading the Go code and resolving used imports? Is that even something CUE should be doing? Are there examples of other languages where they are crossing dependency management or repurposing dependency files?

Do we need a module proxy for a first module implementation? I'd argue that Go's ecosystem had been quite vibrant long before it had this. Having the sumdb for security & reproducibility is a lot easier. (hof mod for example uses the sum file w/o a remote at a minimum). A sumdb server would require far fewer compute resources as well.

w.r.t. the current sumdb format, it's a line oriented file & parser and super simple, though having it specified in CUE would not impede reusing the existing code
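
For readers unfamiliar with the format, here is a minimal sketch of parsing one line, assuming the format referred to is that of a go.sum file: three whitespace-separated fields (module path, version with an optional /go.mod suffix, and hash). The sumEntry type and parseSumLine function are hypothetical names, and the hash values are placeholders rather than real checksums.

```go
package main

import (
	"fmt"
	"strings"
)

// sumEntry is one parsed line of a go.sum-style file.
type sumEntry struct {
	Module    string
	Version   string
	GoModOnly bool   // true if the hash covers only the go.mod file
	Hash      string // e.g. "h1:..."
}

// parseSumLine splits a line into module, version and hash.
func parseSumLine(line string) (sumEntry, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return sumEntry{}, fmt.Errorf("malformed sum line: want 3 fields, got %d", len(fields))
	}
	version := strings.TrimSuffix(fields[1], "/go.mod")
	return sumEntry{
		Module:    fields[0],
		Version:   version,
		GoModOnly: version != fields[1],
		Hash:      fields[2],
	}, nil
}

func main() {
	lines := []string{
		"example.com/mod v1.2.3 h1:PLACEHOLDER1=",
		"example.com/mod v1.2.3/go.mod h1:PLACEHOLDER2=",
	}
	for _, l := range lines {
		e, err := parseSumLine(l)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", e)
	}
}
```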

@verdverm
Contributor

@nyarly I would say that MVS is a selection algorithm with deterministic results and that Go added some assumptions that align with the Go 1 compatibility guarantees. Go's module system does have exclude and retract clauses (https://golang.org/ref/mod#go-mod-file-exclude).
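
To illustrate the determinism mentioned above, here is a minimal sketch of the basic MVS idea: walk every requirement reachable from the main module and keep the highest version requested for each module path. The reqGraph type, the example.com paths and the simplified version comparison are assumptions for illustration. This is not the cmd/go implementation, and it ignores exclude, retract and replace.

```go
package main

import (
	"fmt"
	"sort"
)

// modVer identifies one version of one module.
type modVer struct {
	Path, Version string
}

// reqGraph maps a module version to its direct requirements.
type reqGraph map[modVer][]modVer

// parseVersion reads "vMAJOR.MINOR.PATCH". Pre-release and build
// metadata are ignored in this sketch; unparsable parts stay 0.
func parseVersion(v string) [3]int {
	var n [3]int
	_, _ = fmt.Sscanf(v, "v%d.%d.%d", &n[0], &n[1], &n[2])
	return n
}

// older reports whether version a sorts before version b.
func older(a, b string) bool {
	x, y := parseVersion(a), parseVersion(b)
	for i := range x {
		if x[i] != y[i] {
			return x[i] < y[i]
		}
	}
	return false
}

// buildList visits every module version reachable from root and keeps,
// for each module path, the highest version that anything required.
func buildList(g reqGraph, root modVer) []modVer {
	selected := map[string]string{}
	seen := map[modVer]bool{}
	var visit func(m modVer)
	visit = func(m modVer) {
		if seen[m] {
			return
		}
		seen[m] = true
		if cur, ok := selected[m.Path]; !ok || older(cur, m.Version) {
			selected[m.Path] = m.Version
		}
		for _, r := range g[m] {
			visit(r)
		}
	}
	visit(root)
	delete(selected, root.Path) // report dependencies only

	list := make([]modVer, 0, len(selected))
	for p, v := range selected {
		list = append(list, modVer{p, v})
	}
	sort.Slice(list, func(i, j int) bool { return list[i].Path < list[j].Path })
	return list
}

func main() {
	root := modVer{"example.com/app", ""}
	g := reqGraph{
		root:                        {{"example.com/a", "v1.1.0"}, {"example.com/b", "v1.2.0"}},
		{"example.com/a", "v1.1.0"}: {{"example.com/c", "v1.3.0"}},
		{"example.com/b", "v1.2.0"}: {{"example.com/c", "v1.4.0"}},
	}
	for _, m := range buildList(g, root) {
		fmt.Printf("%s@%s\n", m.Path, m.Version)
	}
}
```

In this graph example.com/c is selected at v1.4.0: the highest version that some module explicitly required, not the latest version that happens to exist. The same inputs always give the same list, with no lockfile involved.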

I think the semver topic is orthogonal to the MVS vs Lockfile, as npm / yarn dependencies are often specified in semver format. There are authors in all languages that will break semver rules. I don't think the dependency selection method helps here.

Anecdotally, as a daily Go user too, I welcomed the addition of the go mod command for not needing an additional tool for dependency management in Go, especially with the fractured ecosystem before go mod, as well as the MVS based system after frustrations with NPM/PIP and the drawbacks of lockfile systems.

@verdverm
Contributor

@myitcv in the section "Changes to cmd/cue" there are several references to a vendor directory. To be clear, are you referring to cue.mod/pkg as the "vendor" directory (as stated elsewhere) such that it does not overlap with Go's own vendor directory? (i.e. missed edit after copying from Go's docs?)

@myitcv
Contributor Author

myitcv commented Mar 23, 2021

@myitcv in the section "Changes to cmd/cue" there are several references to a vendor directory. To be clear, are you referring to cue.mod/pkg as the "vendor" directory (as stated elsewhere) such that it does not overlap with Go's own vendor directory? (i.e. missed edit after copying from Go's docs?)

Thanks for that catch. The proposal is that CUE's vendor directory will be cue.mod/pkg, yes. Unfortunately this slipped through the editing process: I worked on the draft in the wonderful https://stackedit.io... failed to update the Google Doc where we were reviewing this with this exact change. Updated.

@nyarly

nyarly commented Mar 23, 2021

@verdverm Tempted though I am, I don't mean to re-litigate the go modules design here.

With regards to CUE though, my impression is that adopting much of that design is settled. If that's so, I don't want to foster acrimony by questioning those decisions here.

@morlay

morlay commented Mar 25, 2021

A new ?cue-get=1 query may block us from implementing this. It needs private VCS support too, such as GitLab.

If we just want to get the repo root, I think we can reuse ?go-get=1, so that every VCS host is supported.

At least, we may need ?go-get=1 as a fallback when ?cue-get=1 returns nothing.

GitHub returns the go-import meta tag:

curl "https://github.com/cuelang/cue?cue-get=1"
<meta name="go-import" content="github.com/cuelang/cue git https://github.com/cuelang/cue.git">

GitLab only handles ?go-get=1; ?cue-get=1 returns the normal page HTML:

curl "https://gitlab.com/gitlab-org/gitlab-foss?go-get=1" 
<meta name="go-import" content="gitlab.com/gitlab-org/gitlab-foss git https://gitlab.com/gitlab-org/gitlab-foss.git" />

@morlay

morlay commented Mar 25, 2021

Another question

Could we add non-cue requirement in module.cue too ?

For example, for k8s.io/api/core/v1 the CUE code would be generated from Go source.

With this feature, our workflow could be simple - copy & parse example cue code, and cue eval

My hack https://github.com/octohelm/cuemod adds extractors at import time. Two things trigger generation:

Auto detect which extractor should use

When the import path is found to contain Go code, CUE code is generated automatically into cue.mod/gen
https://github.com/octohelm/cuemod/blob/main/pkg/extractor/golang/extractor.go#L36-L42

declare replace directive with attribute @gen("")

replace: "github.com/rancher/local-path-provisioner/deploy/chart": "" @gen("helm")

This way, we may also need to support registering custom extractors,
or just add a new directive, gen:

gen: go: "cue import go" // call `cue import go <source_dir> <gen_dir>` 

@verdverm
Contributor

I believe core/v1 is a package within the module k8s.io/api. Dependencies are limited to modules in Go and CUE would be the same I assume.

Auto-detect -> gen would break when I have repos which are both CUE and Go modules (I have several already and know that the "Go" files found would not parse; they are actually text/template files with a .go extension).

It does seem nice to have a way to cue import go but that would likely require a go get as well unless the CUE module had vendored the Go module already. This seems a bit much for a cue mod download command to be doing. I'm not sure that using CUE should require the Go binary to be available. cue import output could in theory depend on the CUE version used, so that could break the reproducibility. I would think for widely used libraries like k8s, that a dedicated repo for the generated module ought to be created. Since CUE is unable to generate 100%, there are additions that need to be made in the cue.mod/usr directory (notably port)

@myitcv
Contributor Author

myitcv commented Mar 25, 2021

@verdverm. Thanks for your comments/feedback.

In order for CUE to maintain both, will it be reading the Go code and resolving used imports? Is that even something CUE should be doing? Are there examples of other languages where they are crossing dependency management or repurposing dependency files?

Yes, in the case of a Go + CUE module, cue mod tidy would read both. I think it's really in our gift to do what we like here, subject of course to specific concerns/problems/drawbacks with the proposal or its implementation.

Do we need a module proxy for a first module implementation?

This is of course the fundamental question when it comes to this proposal. The arguments in favour and against are laid out in "Why have you chosen to use {proxy,sum}.golang.org?". If there are points you feel are missing, or need stressing more clearly please let me know.

I'd argue that Go's ecosystem had been quite vibrant long before it had this.

This is not disputed. However, there are real benefits to using a proxy and sumdb. As I mention above though, it's whether those benefits outweigh the costs.

Having the sumdb for security & reproducibility is a lot easier. (hof mod for example uses the sum file w/o a remote at a minimum). A sumdb server would require far fewer compute resources as well.

I'm not clear what point/position you are advancing here. sum.golang.org fulfils a specific role, namely to fill in the gaps if a module does not already have a go.sum entry for a module. Not using sum.golang.org is entirely possible (but has implications), and would fall under the option of "do not use proxy.golang.org or sum.golang.org." So I again refer you to the benefits and costs of the decision one way or the other: having those lists as complete as possible will help us reach the "best" answer.

w.r.t. the current sumdb format, it's a line oriented file & parser and super simple, though having it specified in CUE would not impede reusing the existing code

Again I'm not clear whether you're arguing in favour of the current proposal to use a sum.cue file (indeed package), or suggesting the current go.sum format is superior. If you have specific points for/against that will help us reach the "best" answer here too.

@myitcv
Contributor Author

myitcv commented Mar 25, 2021

@morlay - thanks for your comments and feedback.

A new ?cue-get=1 query may block us from implementing this. It needs private VCS support too, such as GitLab.

If we just want to get the repo root, I think we can reuse ?go-get=1, so that every VCS host is supported.

At least, we may need ?go-get=1 as a fallback when ?cue-get=1 returns nothing.

I think you might have missed the bit in the proposal that talks about exactly this fallback:

As the CUE modules implementation is based heavily on the Go modules implementation, if a server fails to respond with an appropriate <meta> tag using the query ?cue-get=1, the loader will then attempt to fallback via a query ?go-get=1.

So any existing code host that replies to ?go-get=1 would work.

The loader would also, like cmd/go, have special cases for well-known code hosts.

Hopefully that allays your concerns on this point?
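
For illustration only, the fallback described in the quoted proposal text could be sketched as below. This is not the loader's implementation: the cue-import meta name, the discover function and the injected fetch function are assumptions made so the sketch runs without network access, and a regular expression stands in for proper HTML parsing.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// fetchFunc returns the HTML body for a URL. It is injected so the
// sketch can run against canned responses.
type fetchFunc func(url string) (string, error)

// metaRE matches <meta name="NAME" content="PREFIX VCS REPO">.
// "cue-import" is a hypothetical meta name; "go-import" is the one
// used by Go.
var metaRE = regexp.MustCompile(
	`<meta\s+name="(cue-import|go-import)"\s+content="([^"\s]+)\s+([^"\s]+)\s+([^"\s]+)"`)

// repoRoot is the result of a successful discovery.
type repoRoot struct {
	Prefix, VCS, Repo string
	Via               string // the query that produced the answer
}

// discover tries ?cue-get=1 first and falls back to ?go-get=1 when the
// first response carries no matching <meta> tag.
func discover(importPath string, fetch fetchFunc) (repoRoot, error) {
	attempts := []struct{ query, meta string }{
		{"cue-get=1", "cue-import"},
		{"go-get=1", "go-import"},
	}
	for _, a := range attempts {
		body, err := fetch("https://" + importPath + "?" + a.query)
		if err != nil {
			continue // treat a failed request like a missing tag
		}
		for _, m := range metaRE.FindAllStringSubmatch(body, -1) {
			if m[1] == a.meta {
				return repoRoot{Prefix: m[2], VCS: m[3], Repo: m[4], Via: a.query}, nil
			}
		}
	}
	return repoRoot{}, errors.New("no cue-import or go-import meta tag found for " + importPath)
}

func main() {
	// A host that only understands ?go-get=1.
	fake := func(url string) (string, error) {
		if url == "https://example.com/mod?go-get=1" {
			return `<meta name="go-import" content="example.com/mod git https://example.com/mod.git">`, nil
		}
		return "<html><body>regular page</body></html>", nil
	}
	root, err := discover("example.com/mod", fake)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", root)
}
```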

Could we add non-cue requirement in module.cue too ?

I'm not sure this would help, because other package management systems would then have to learn to read cue.mod/module.cue. And it starts to get fairly hairy quite quickly: see some of the discussion under the "The post-go.mod future" heading. That said, I don't think we need/want to start declaring dependencies on packages/equivalent in other languages from CUE.

Instead, per #646, cue get go will be renamed to cue import go. Per the comments in #621 (comment), cue import go will assume that Go dependencies can be resolved via a go.mod. This means that the caller of cue import go must maintain a go.mod for this resolution. We broadly see the same approach for other languages/setups if/when they get added: let the "other" package management system control the dependencies, then use a loader, like the equivalent of go/packages, to resolve from an argument to cue import X to a directory on disk that contains the source code to import. This would of course require cmd/go to be available.

The alternative to generating from Go/other source code (and the tooling dependencies that implies) is for those projects to pre-generate CUE, and for the dependency to be on the CUE module. In which case no dependency on tooling etc exists.

add extractors when import

Apologies, I'm not entirely sure I follow what you are saying about extractors. I think (and please correct me if this is wrong) you are suggesting that steps could automatically be run when adding/loading a package dependency. If so, I don't think that's a direction we want to head in. The go generate proposal explicitly separates out the function of code generation from either go get or go build; that's a model I think we should emulate.

@myitcv
Contributor Author

myitcv commented Mar 25, 2021

@verdverm

I would think for widely used libraries like k8s, that a dedicated repo for the generated module ought to be created. Since CUE is unable to generate 100%, there are additions that need to be made in the cue.mod/usr directory (notably port)

But in such a scenario of there being a specific CUE module (thereby obviating the need to generate from Go source) then there would be no requirement of the user of that module to have a cue.mod/usr override, correct? Because the CUE module dependency would contain any "additions" (assuming it was largely code-generated from Go source).

@verdverm
Contributor

verdverm commented Mar 25, 2021

When it comes down to it, I do not want CUE touching my Go (go.mod, go.sum, an arbitrary file added where CUE wants to). This is encroachment into my Go code by another language. When I look at my go.mod, is the dependency there because of Go or CUE? As a Go user, I do not like this, I do not like the idea that Go's systems are being polluted by another language (even though I love CUE). In fact, my own dependencies could be polluted with non-Go modules because one of my dependencies adopted CUE and pulled in a bunch of non-Go (CUE only) modules. Why is my go mod download now fetching non-Go modules? I'd imagine many other Go users will have similar concerns. It should be brought up with the larger Go community and not just the Go team. I have already been talking to non-CUE users and am not getting good feedback on the idea.

That said, I don't think we need/want to start declaring dependencies on packages/equivalent in other languages from CUE.

I fail to see how this statement aligns with the proposal's idea of CUE modifying Go module files. Is this not what the go.mod idea is essentially doing?

In early conversations about CUE modules (iirc) @myitcv said that a first implementation would not have all of the bells and whistles. I think that any use of go.mod, GOPROXY, GOSUMDB should not be part of a first implementation and needs a much larger audience to comment on it, including the Go community.

@verdverm
Contributor

Some more questions on possible conflicts with CUE's use of go.mod

  • would a go get cause cue.mod/module.cue to become out of sync?

Given a dual module (both Go and CUE), I see some issues with supporting the following:

  • I want to replace a Go module with a different repository, but this replace does not have the CUE module included
  • I want to replace a Go module with a different version, but not the CUE version. This could more typically manifest in local replaces with multiple people working on the same module. We want to use the latest stable release for the portion (Go v CUE) that we are not working on

@verdverm
Contributor

Another possible edge case:

  1. My project has dependency D.
  2. D adopts CUE and uses pkg/embed to include their cue.mod/pkg dependencies (so they can ship schema with the library and not need to fetch over the network)
  3. My project updates D and now builds are failing

Essentially there is a dependency which needs a cue mod download even though I am not using CUE in my project. Maybe this is unavoidable if one uses D, or perhaps D has to vendor the directory at that point?

@verdverm
Contributor

verdverm commented Mar 25, 2021

Scenario:

  • I have a Go service that runs in Kubernetes and talks to the API. I depend on the k8s.io/{api,apimachinery} Go modules.
  • Ideally, common k8s.io/... CUE modules are maintained. (This is driven by the port issue we see because of the custom marshal in k8s Go code. We'd want to add the correct definition to cue.mod/usr in the CUE module so that when people import this, their ports do not go unvalidated by the generated port: _)
  • My kubernetes service is obviously deployed with kubernetes config, and I'd like to use the official CUE k8s schemas.

But I don't see how the go.mod could support specifying both.


I suppose any CUE official k8s could have a different module path, but not necessarily required given replaces are possible

I think there are more situations around replace than those I outlined above, outside of a "dual module", where both a CUE and Go module share the same path, and the Go module is only using CUE in CI, not for part of the code

@morlay

morlay commented Mar 26, 2021

@myitcv Sorry. I missed ?go-get=1 fallback.

I'm not sure this would help, because other package management systems would then have to learn to read cue.mod/module.cue. And it starts to get fairly hairy quite quickly: see some of the discussion under the "The post-go.mod future" heading. That said, I don't think we need/want to start declaring dependencies on packages/equivalent in other languages from CUE.

I understand this concern.

In my hack, i use new attr @vcs("release-1.9") to make it work well with go modules incompatible repos.

require: "github.com/istio/istio": "v0.0.0-20210205215922-b63e1966c245" @vcs("release-1.9")

But this does not work for other non-VCS-based package management, like npm.

I agree, we prefer to use cue import X.
Or consider pushing generated CUE code to some code host, as in your suggested alternative.

Instead, per #646, cue get go will be renamed to cue import go. Per the comments in #621 (comment), cue import go will assume that Go dependencies can be resolved via a go.mod. This means that the caller of cue import go must maintain a go.mod for this resolution. We broadly see the same approach for other languages/setups if/when they get added: let the "other" package management system control the dependencies, then use a loader, like the equivalent of go/packages, to resolve from an argument to cue import X to a directory on disk that contains the source code to import. This would of course require cmd/go to be available.

I implemented another extractor, https://github.com/octohelm/cuemod/tree/main/pkg/extractor/golang, to generate CUE definitions from Go code without using go/packages.

So go.mod is not required; using go/ast & go/types is enough to extract type definitions.

However, we need cue import X to support other languages too.
We may need to add another go.mod or package.json for imported packages, perhaps just under cue.mod/,
instead of polluting the root go.mod if the root is a Go project, as @verdverm mentioned. (go mod tidy will remove requirements which are not used in Go)

cue.mod/
   imp/
      k8s.io/api
   module.cue
   go.mod  // for imported pkg only, generate from local could use replace to locate the path
   package.json
   requirements.txt
pkg/api/
go.mod  // root project

I'm not entirely sure I follow what you are saying about extractors. I think (and please correct me if this is wrong) you are suggesting that steps could automatically be run when adding/loading a package dependency. If so, I don't think that's a direction we want to head in. The go generate proposal explicitly separates out the function of code generation from either go get or go build; that's a model I think we should emulate.

Yes, it is. The example flow in my hack tool:

cue/load "k8s.io/api/core/v1"
   ->  resolve &  download "k8s.io/api@<version>" to "${GOMODCACHE}/k8s.io/api@<version>"
       -> scan codes "${GOMODCACHE}/k8s.io/api@<version>/core/v1"
           -> detect go codes here
               -> extract go "${GOMODCACHE}/k8s.io/api@<version>/core/v1" to "cue.mod/gen/k8s.io/api/core/v1"
   -> final load cue codes

However, I think generating & pushing somewhere, and importing the generated code, is better.

@myitcv Thanks again.

@mpvl
Contributor

mpvl commented Mar 26, 2021

@verdverm

  1. D adopts CUE and uses pkg/embed to include their cue.mod/pkg dependencies (so they can ship schema with the library and not need to fetch over the network)

A potential embed functionality should not be allowed to embed cue.mod or any subdirectory that crosses a module boundary (like in Go). If I'm not mistaken, this edge case cannot occur with these restrictions.

@mpvl
Contributor

mpvl commented Mar 26, 2021

@verdverm
Co-sharing a go.mod also has a huge advantage: one reason to use a single repo is to get the usual mono-repo benefits. If Go and CUE were to have independent version control, there would be no good way for an external user of both to ensure the two are in sync (using MVS, at least).

If it is desirable for the CUE and Go to be versioned independently for external users, it is generally a good idea to put them in separate repos. Note that tagging would otherwise get a bit messy, if nothing else.

Also note that for importing only, that is, if the CUE in a module is not to be shared but one still wants to include other modules, it is possible to use an "anonymous" module. Anonymous modules do not require a top-level go.mod (IIRC).

@mpvl
Contributor

mpvl commented Mar 26, 2021

@nyarly
I see your point about backwards incompatibility slipping in and causing problems with MVS. In general, though, backwards incompatible APIs will cause trouble one way or the other. One advantage with CUE is that we could detect backwards incompatible APIs fairly easily, compared to programming languages, for instance.

I'm not wedded to semver at all, and neither was MVS IIRC: if there are no backwards incompatible changes, having one version is really enough (see also https://www.youtube.com/watch?v=oyLBGkS5ICk). The reason semver was chosen is that it was a de facto standard in the community and there needed to be something. With the special interpretation of major version 0, it sort of allows any kind of use case, although it may not be the prettiest.

@verdverm
Contributor

@mpvl My understanding is that go.mod was temporary and only to make use of PROXY and SUMDB, per the proposal and @myitcv follow on comments. So anything about versioning and advantages there seems to share the same temporary status.

Personally I don't see this as "co-sharing", because Go has no say in this, it is a unidirectional CUE choice. I am unaware of any other situations like this. It is unusual and feels like an anti-pattern or bad hack. This is the opposite of my experience with other CUE decisions, which have taken a thoughtful and paced approach. Go took 10 years to include PROXY / SUMDB. Why are these services required for a first implementation?

Generally, what I don't like as a language A user is some other language B messing with my lang A dependency files to merge in its own. Especially when it has its own already. Then to add an extra file to my project because it wants to do this, so that lang A toolchain doesn't undo lang B's abuse, this is not desirable. There are conditions where this will break and it forces lang B's choices for code organization. The proposal does not state where this extra file will live, but given that people have all sorts of ways they organize code, it's likely to make someone unhappy.

Question wearing a security hat: Could an unneeded lang A dep (think malicious package) be more secretly included via a A -> B -> A dependency fetching system that crosses language barriers repeatedly? SUMDB is like blockchain, it only guarantees something has not been changed, and nothing to do with the secureness of what is correctly fetched. We are seeing far more issues with dependency injection and confusion attacks than MITM style attacks on deps.

I find the proposed benefits / drawbacks incomplete and several complications have not been addressed. The claim that the benefits of this go.mod approach outweigh the drawbacks does not hold up for me, especially given that it's supposedly temporary and only for PROXY / SUMDB.

Missing tradeoff points:

  • the support effort to explain the go.mod hack (we know people don't read the docs before asking questions)
  • language hopping dependency management
  • security considerations related to the previous

@mpvl
Contributor

mpvl commented Mar 27, 2021

@mpvl My understanding is that go.mod was temporary and only to make use of PROXY and SUMDB, per the proposal
@verdverm
That was the original thought. But as was maybe not clear enough in my comments, not commingling the two can actually lead to some serious issues. This is so much the case, that I think we shouldn't even model the dependencies in modules.cue at all. This view should probably be updated in the doc, including a clear example with the problem that exists when not combining the two (@myitcv).

Of course if this is true, then the same issue exists with using any combination of repos. The thing is that for most package managers, reproducibility isn't really a feature anyway, so having a few more version skews seems like the lesser problem, or at least something that justifiably should be tracked manually.

With MVS, we have the possibility of addressing this automatically. It seems to argue even that Go's MVS system should be pluggable.

and @myitcv follow on comments. So anything about versioning and advantages there seems to share the same temporary status.

Personally I don't see this as "co-sharing", because Go has no say in this, it is a unidirectional CUE choice.

FWIW, we have been discussing this with the Go team.

I am unaware of any other situations like this. It is unusual and feels like an anti-pattern or bad hack.

We are looking at the problems at hand, rather than what is conventional. I can't say I'm thrilled with approaches taken by most other languages (there are some notable exceptions, like Nix). MVS seems to fit the constraints of configuration and CUE particularly quite well and piggybacking on Go not only is convenient, but also avoids some bad dependency skews that are otherwise impossible to solve.

Given this reality, though, I do think the design needs to change a bit.

This is the opposite of my experience with other CUE decisions, which have taken a thoughtful and paced approach. Go took 10 years to include PROXY / SUMDB. Why are these services required for a first implementation?

Those 10 years were not without battles and desperation of not having dependency management. There is no need to repeat that history if we have something good now.

Generally, what I don't like as a language A user is some other language B messing with my lang A dependency files to merge in its own.

There is no need for that. It can be accomplished by generating a Go file capturing cue dependencies hidden in the cue.mod directory. There is no "messing with", all is done with standard Go tooling.

Especially when it has its own already. Then to add an extra file to my project because it wants to do this, so that lang A toolchain doesn't undo lang B's abuse, this is not desirable. There are conditions where this will break and it forces lang B's choices for code organization. The proposal does not state where this extra file will live, but given that people have all sorts of ways they organize code, it's likely to make someone unhappy.

The cue.mod directory is the obvious place to put it. Its organization is dictated by CUE, not the user.

Question wearing a security hat: Could an unneeded lang A dep (think malicious package) be more secretly included via a A -> B -> A dependency fetching system that crosses language barriers repeatedly? SUMDB is like blockchain, it only guarantees something has not been changed, and nothing to do with the secureness of what is correctly fetched. We are seeing far more issues with dependency injection and confusion attacks than MITM style attacks on deps.

I really don't see how this setup can influence the outcome of existing Go code other than forcing a newer version.

@verdverm
Contributor

verdverm commented Mar 27, 2021

I have a cue.mod/go.mod file so that Go does not process files in there (i.e. cue mod vendor). This would not work for this setup as Go ignores the directory.

Those 10 years were not without battles and desperation of not having dependency management. There is no need to repeat that history if we have something good now.

Remote PROXY / SUMDB are not required for dependency management, they are optimizations for package cache and a "trusted" hash store. My understanding is that reusing these are the sole reason for using a go.mod

That was the original thought. But as was maybe not clear enough in my comments, not commingling the two can actually lead to some serious issues. This is so much the case, that I think we shouldn't even model the dependencies in modules.cue at all. This view should probably be updated in the doc, including a clear example with the problem that exists when not combining the two

Are you saying CUE is now intending to use go.mod permanently?

I have been using MVS dependency management with both CUE and Go in the same repos, successfully, without commingling. (hof has a reference implementation that uses MVS without commingling, that includes sumdb calculations, without the SUMDB remote, no PROXY needed) There are many organizations with polyglot repositories that get along just fine without mixing language dependencies and systems.

I don't really see how this setup can influence the outcome of existing Go code other than forcing a newer version.

Besides increasing download and build times, forcing newer versions is an important point and a source of malicious dependencies. This is a security concern. A dependency system that starts in Go can jump to processing CUE-only dependencies (via a go.mod) and then include a malicious Go library when one of these far-down CUE modules bumps a Go version. This is also a super obscure way to do this, and if / when it happens people will ask "Why is my Go-only program fetching modules from another language?" This question will come up even without a security incident; I have this question now and I anticipate others will as well.

We are looking at the problems at hand, rather than what is conventional. I can't say I'm thrilled with approaches taken by most other languages (there are some notable exceptions, like Nix). MVS seems to fit the constraints of configuration and CUE particularly quite well and piggybacking on Go not only is convenient, but also avoids some bad dependency skews that are otherwise impossible to solve.

I am also a fan of MVS and reusing this scheme in CUE. My disagreement is specific to CUE using go.mod and adding a Go file. A CUE module system can be MVS and be rebuilt almost completely from Go with copy/paste and sed, while not needing to touch go.mod. We can point to cases where using go.mod (and the secondary requirements it needs) breaks capabilities people expect from Go. (re: replaces and cue.mod/go.mod)

It seems to argue even that Go's MVS system should be pluggable.

This is in fact what hof mod is, without the code introspection, because it is difficult if not impossible (or unreasonable) for one program to inspect all languages for imports.

@verdverm
Contributor

verdverm commented Mar 31, 2021

RE: embeds

Go allows embeds within a module, and then a Go binary that imports that module will have them as well.

Trying to work through an example, my Go program has some dependency D. In the case that...

  • D has useful packages (which I import) and also a CLI (i.e. BuildKit, Hof, Cue)
  • D has adopted CUE for validating input, it does this in a utils helper package within the module
  • D embeds that CUE into their Go with embed so that it ships with the binary (also package imports)
  • D's CUE has external modules it imports, so cue mod download is required before a go build will work
  • D releases a new version and we update to said version

Now if the maintainer of D has not vendored their CUE code (including cue.mod), then after updating, my code will no longer build. Maybe this is not something CUE should worry about. I believe D would have to vendor their CUE dependencies so that cue.mod/pkg is available on a go mod download? There doesn't seem to be a way to cue mod download into a Go dependency as that would fail the sumdb check for modifications.

Does that sound correct?

@verdverm
Contributor

verdverm commented Apr 1, 2021

RE: language jumping via intermixed dependencies

Maybe a more succinct way to describe my concern here is that two independent Go sources (Module A does not have any dependency path through Go code to Module B) can nonetheless have a version bumped through a CUE module to B, due to a common Go dependency and MVS. I believe only one dual module is required to link the dependency DAGs and force A to include B in its MVS calc. That dual module does not need to use CUE and Go together (CUE for CI only).

If enough widely used modules in each language start connecting, this could have a significant impact on the number of dependencies accumulated through an intermixed MVS and on the versions subsequently selected.
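
To make the concern concrete, here is a minimal Go sketch of the reachability argument. The module names (example.com/a, example.com/dual, example.com/cuelib, example.com/b) and the edges between them are hypothetical, and the Reachable helper exists only for illustration; it is not how cmd/go or cmd/cue loads modules.

```go
package main

import "fmt"

// Deps maps a module path to the module paths it requires.
type Deps map[string][]string

// Reachable returns every module that can be reached from root by
// following one or more requirement edges.
func Reachable(d Deps, root string) map[string]bool {
	seen := map[string]bool{}
	stack := []string{root}
	for len(stack) > 0 {
		m := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, dep := range d[m] {
			if !seen[dep] {
				seen[dep] = true
				stack = append(stack, dep)
			}
		}
	}
	return seen
}

func main() {
	// Go-only view: module A imports Go code from the dual (Go+CUE) module.
	goOnly := Deps{
		"example.com/a": {"example.com/dual"},
	}
	// Commingled view (assumed edges): the dual module also requires a CUE
	// library, whose module file in turn requires module B.
	commingled := Deps{
		"example.com/a":      {"example.com/dual"},
		"example.com/dual":   {"example.com/cuelib"},
		"example.com/cuelib": {"example.com/b"},
	}

	fmt.Println("Go-only:   ", len(Reachable(goOnly, "example.com/a")))     // 1
	fmt.Println("commingled:", len(Reachable(commingled, "example.com/a"))) // 3
}
```

In this sketch a single dual module is enough to pull B into the set of modules that A's version selection has to consider.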

@mpvl
Contributor

mpvl commented Apr 1, 2021

We're working on a new proposal that separates concerns a bit more and explains some of our thinking.

Maybe a more succinct way to describe my concern here is that two independent Go sources (ProgA does not have any dependency path through Go code with ProgB) can nonetheless have version bumps through a CUE only module, due to a common Go dependency and MVS.

Overall I don't share that concern. In fact the opposite. If the bump occurs because Go and CUE are co-versioned, then there is, or at least may be, a dependency. But we're going in circles here.

I have been using MVS dependency management with both CUE and Go in the same repos, successfully, without commingling. (hof has a reference implementation that uses MVS without commingling, that includes sumdb calculations, without the SUMDB remote, no PROXY needed) There are many organizations with polyglot repositories that get along just fine without mixing language dependencies and systems.

I have been using MVS dependency management with both CUE and Go in the same repos, successfully, without commingling.

Not saying that is not possible, but there are certainly cases where not commingling will result in badness.

Do you have documentation of how MVS in hof works?

@verdverm
Contributor

verdverm commented Apr 1, 2021

I can't find my good docs at the moment... I think they are somewhere in a git history

The code is pretty much limited to this directory: https://github.com/hofstadter-io/hof/tree/_dev/lib/mod

This was the original, separate program before merging into hof: https://github.com/hofstadter-io/mvs

It should be possible to simulate the intermixed language MVS with a custom hof modder

@verdverm
Contributor

verdverm commented Apr 1, 2021

at least may be, a dependency

This is the case where there is no direct dependency, and thus no overlap, between two MVS DAGs in Go; they become intertwined only because CUE has bridged the gap by mixing the dependencies (creating an MVS overlap). A common dependency at two versions (a leaf in the MVS DAG) must now be the same for two DAGs that are otherwise independent (as far as MVS is concerned).

The extra download is one outcome. Breaking builds (or requiring significant code changes because of a security update in a downstream dep) is another, because not every project adheres to SemVer.

@shykes
Contributor

shykes commented Apr 1, 2021

I understand the practical reasons for wanting to use a go.mod directory, but that effectively makes Cue a satellite of Go, now and forever. It sends the message that a Cue developer is a specialization of a Go developer; and that unless you are a Go developer, you are not a first class citizen in the Cue ecosystem.

That is not necessarily a bad positioning - it can even be a smart way to piggy-back off the Go ecosystem to create a Go+Cue ecosystem. But it is quite different from the stated goal which is to make a universal data language.

In short, there are far-reaching implications in terms of the positioning of the project and the definition of its intended audience.

@seh

seh commented Apr 1, 2021

To echo @shykes's point, were I not already involved deeply with Go development, if I came to CUE freshly and saw this use of Go tooling, I would turn away. If, instead, I found CUE using, say, Node.js tooling, or Java tooling, I would turn away, thinking, "I came for the configuration and data manipulation, not for the integration with a single ecosystem that I don't want to use."

@theckman

theckman commented Apr 1, 2021

@myitcv it doesn't seem like the proposal as written extends Go Modules to avoid the social behavior we see emerging in the Go ecosystem, in response to the limitations of Semantic Import Versioning and the constraints they put on Module authors.

I'm talking specifically about the emerging behavior where more and more module authors are refusing to ever release a v1 of their module, due to challenges with tooling and not because they intend to always offer an unstable API. This pattern of behavior has emerged from those authors feeling like SIV provides too much of a barrier and support burden to use semantic versions, especially when moving past v1.

Is that a behavioral pattern, and subsequently a sociotechnical risk, that we're comfortable also adopting here?

@mpvl
Contributor

mpvl commented Apr 2, 2021

Everyone: please note that this proposal DOES NOT require Go tooling to be installed.

That is not to say there are no other valid points; we are addressing those in a new proposal.

@theckman: as I said earlier, I'm not wedded to semver at all, and even don't like it all that much. But one needs to have something, no? Any alternatives that do not suffer from these issues? @verdverm is suggesting another approach using MVS, which I reckon in your opinion would suffer from the same issues?

We need a hermetic, reproducible way to manage modules. Also, it is imperative to have unique package versions across imports. Neither SAT solving nor a Nix-style approach seems to solve this. MVS is a good solution for this. I'm not aware of any other algorithm that achieves this.

@mpvl
Contributor

mpvl commented Apr 2, 2021

So in advance of the new/revised proposal, here is a description of the commingling issue. The large groups/yellow blocks represent VCS repositories. At the top is a repo (Base) that has both Go and CUE files. The CUE files define, say, the set of options or a declarative detailed API description of the Go package in that module.

Repo G isn't technically necessary here, but the graph renders more nicely with it.

At the bottom is a repo (MyApp) that creates a Go app from the top packages, and also uses the CUE definition in that repo to define its own API, verify inputs, and what have you. It, however, also mixes in definitions from repo C.

Now C, in turn, requires v1.1.0 of the CUE package. Let's assume we are resolving Go and CUE independently. We now have a problem at hand: the CUE definitions resolve to a newer version than anticipated, and thus will indicate a more permissive API than can be expected from the Go app created in MyApp, and thus will not correctly represent the features and API.

This issue can be avoided by ensuring that there is only one version of repo Base. Commingling Go and CUE resolution, in this instance, would solve that.
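
As a rough illustration of the scenario above, the following Go sketch builds an MVS-style list by keeping, for every module, the highest version that any reachable requirement asks for. The module paths, the v1.0.0 versions, and the helper names (Graph, Req, BuildList, merge) are assumptions made for the example; only "C requires v1.1.0 of Base" comes from the description. It is a simplified sketch and not the resolver used by cmd/go or proposed for cmd/cue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Req is a requirement on a module at a minimum version.
type Req struct{ Path, Version string }

// Graph maps "path@version" to the requirements of that module version.
type Graph map[string][]Req

// parse splits "vMAJOR.MINOR.PATCH" into integers; malformed parts count as 0.
func parse(v string) [3]int {
	var out [3]int
	for i, p := range strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3) {
		out[i], _ = strconv.Atoi(p)
	}
	return out
}

// less reports whether version a is older than version b.
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// BuildList visits every requirement reachable from roots and selects,
// per module path, the highest version that was required anywhere.
func BuildList(g Graph, roots []Req) map[string]string {
	selected, seen := map[string]string{}, map[string]bool{}
	queue := append([]Req(nil), roots...)
	for len(queue) > 0 {
		r := queue[0]
		queue = queue[1:]
		key := r.Path + "@" + r.Version
		if seen[key] {
			continue
		}
		seen[key] = true
		if cur, ok := selected[r.Path]; !ok || less(cur, r.Version) {
			selected[r.Path] = r.Version
		}
		queue = append(queue, g[key]...)
	}
	return selected
}

// merge returns the union of several requirement graphs.
func merge(gs ...Graph) Graph {
	out := Graph{}
	for _, g := range gs {
		for k, v := range g {
			out[k] = append(out[k], v...)
		}
	}
	return out
}

func main() {
	const base, c = "example.com/base", "example.com/c"
	goGraph := Graph{}                                  // Go side: no transitive requirements here
	cueGraph := Graph{c + "@v1.0.0": {{base, "v1.1.0"}}} // CUE side: C requires Base v1.1.0
	goRoots := []Req{{base, "v1.0.0"}}
	cueRoots := []Req{{base, "v1.0.0"}, {c, "v1.0.0"}}

	fmt.Println("Go alone:   ", BuildList(goGraph, goRoots)[base])   // v1.0.0
	fmt.Println("CUE alone:  ", BuildList(cueGraph, cueRoots)[base]) // v1.1.0
	all := append(append([]Req(nil), goRoots...), cueRoots...)
	fmt.Println("co-resolved:", BuildList(merge(goGraph, cueGraph), all)[base]) // v1.1.0
}
```

Resolved independently, the Go code is built against Base v1.0.0 while the CUE definitions come from v1.1.0; resolving over the merged graph yields a single version of Base for both.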

To emphasize, this is not specifically a Go+CUE issue but really an issue for any language or system where CUE is included as a description of an API or other form of contract. The general solution seems to be ensuring that any resolution within one repo will select only one version of a repo across languages, satisfying all other constraints.

There may be other ways to ensure that repos are unique across different languages. But co-resolving versions seems like a possibility. We have some ideas how to make this more pluggable and even allow resolving multiple languages at once (within limits). But we would love to hear about other solutions that can solve this issue.

@verdverm: can you explain why you think this is not an issue and how you would solve this in your (independently resolved) MVS approach?

mermaid-diagram-20210402150645

@theckman

theckman commented Apr 2, 2021

@theckman: as I said earlier, I'm not wedded to semver at all, and even don't like it all that much. But one needs to have something, no? Any alternatives that do not suffer from these issues? @verdverm is suggesting another approach using MVS, which I reckon in your opinion would suffer from the same issues?

@mpvl So the first thing I think I should call out is that Go Modules, as it exists today, marries you to semantic versioning, and makes it extremely difficult to use any other versioning scheme (including calendar based). I think we can address that issue, and the one I raised earlier around the risks of people not releasing v1 modules, by either making Semantic Import Versioning (SIV) optional or removing it entirely. This issue, which is a proposal for optional SIV in Go, may help provide more context: golang/go#44550

The idea would be to not include the major version in the name of the Module (e.g., example.org/module/v2), and to instead make use of the declared version in the .mod file.
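
For readers less familiar with SIV, here is a small Go sketch of the rule being discussed: under SIV, a module at v2 or later must carry a matching /vN suffix in its path, while v0 and v1 modules must not. The CheckSIV, majorOf, and pathMajor helpers are hypothetical and simplified (they ignore gopkg.in paths and +incompatible versions); they are not the checks implemented in cmd/go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf returns the major number of a version such as "v2.3.1",
// or -1 if the version is malformed.
func majorOf(version string) int {
	s := strings.TrimPrefix(version, "v")
	if s == version {
		return -1 // missing the leading "v"
	}
	if i := strings.Index(s, "."); i >= 0 {
		s = s[:i]
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return -1
	}
	return n
}

// pathMajor returns N for a path ending in "/vN" with N >= 2,
// or 0 when the path has no such suffix.
func pathMajor(path string) int {
	i := strings.LastIndex(path, "/v")
	if i < 0 {
		return 0
	}
	n, err := strconv.Atoi(path[i+2:])
	if err != nil || n < 2 {
		return 0
	}
	return n
}

// CheckSIV reports an error when the module path and version disagree
// about the major version, following the simplified rule described above.
func CheckSIV(path, version string) error {
	vm, pm := majorOf(version), pathMajor(path)
	switch {
	case vm < 0:
		return fmt.Errorf("malformed version %q", version)
	case vm < 2 && pm != 0:
		return fmt.Errorf("%s at %s: v0/v1 modules must not have a /v%d suffix", path, version, pm)
	case vm >= 2 && pm != vm:
		return fmt.Errorf("%s at %s: path must end in /v%d", path, version, vm)
	}
	return nil
}

func main() {
	fmt.Println(CheckSIV("example.org/module/v2", "v2.0.0")) // <nil>
	fmt.Println(CheckSIV("example.org/module", "v1.4.0"))    // <nil>
	fmt.Println(CheckSIV("example.org/module", "v2.0.0"))    // error: path must end in /v2
}
```

With optional SIV as suggested above, the third case would be accepted, and the major version would be read from the declared version rather than from the path.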

@verdverm
Contributor

verdverm commented Apr 2, 2021

I'd agree with @theckman, I'm not a fan of the required /vX in the import path which does not match the underlying code base. I think that if a module wants to support importing 2 major versions, it could do this within its own codebase using folders and ensuring it works on their own.

MVS could certainly support using SemVer or Calendar versions, per module, such that both are supported and a module is free to choose either. A module would not be able to use both or switch once it is chosen (I believe).

@mpvl A picture is worth a 1000 words. This is definitely a problem, and in this case, commingling would force my upgrade to Go v1.1.0. Knowing that SemVer is not strictly adhered to, I could be forced into making changes to my own code. This would be more irritating when it happens in downstream dependencies and my Go only program is updating a Go only module, but some other dep breaks things, because its dep chain was forced to bump versions due to a CUE only module. That being said, these situations look to only happen when choosing versions initially or updating direct dependency versions, and the downstream impacts therein.

My issue is for Go only users (or other languages) who have no attachment to CUE. In this case my own app depends on a (series of) module(s) in some commingled MVS chain, and a Go version is now required to increase because of a CUE only module, far removed. Supporting N languages together seems like it would result in compounding dependencies, as one multi-language module can force me to version align with and download a huge number of modules in N other languages which are irrelevant to my own application. I will make a picture as well, so that this is clearer.

Generally I think mixing in more languages is going to make this problem even worse and that CUE should only worry about CUE and not N languages. Is it really in CUE's scope to solve dependency management issues across multiple languages? If there are compatibility issues, my opinion is the user should be left to deal with this. CUE forcibly injecting itself into other languages' dependency systems is seen as not playing nice with other ecosystems. I think this will hurt adoption (as @seh related). Personally I will have to think about how I would proceed, with the current choices being maintaining a fork to disable this behavior or replacing with something else. I can say I will not use a CUE that has this approach to commingled dependency management.

RE: Go tooling, I think we are talking more generally than in simply requiring the Go binary to be available. I consider using the Go libraries from a CUE binary to achieve the same thing to be using Go tooling (reading/writing go.mod, parsing Go code to infer used imports, talking to GOPROXY/GOSUMDB, adding a file to my Go project, and in general inspecting and processing Go for dependency reasons).

@nyarly

nyarly commented Apr 2, 2021

We need a hermetic, reproducible way to manage modules. Also, it is imperative to have unique package versions across imports. Neither SAT solving nor a Nix-style approach seems to solve this. MVS is a good solution for this. I'm not aware of any other algorithm that achieves this.

Leaving aside Flakes, a Nix style approach would require that every requirement of a package be made explicit. For instance, my current CUE-related workspace is built using an expression that includes:

    cue = { stdenv, buildGoModule, fetchFromGitHub, lib }:
    buildGoModule rec {
      pname = "cue";
      version = "v0.3.0-beta.6";

      src = fetchFromGitHub {
        owner  = "cuelang";
        repo   = "cue";
        rev    = version;

        sha256 = "02fafsl5krvkxc9p5acz38hb9zbp1gz5zvz1cvgr7g5dd9h2j8fv";
      };

      vendorSha256 = "0drnmf8gfj3k74z5zh0032g5kg3nzji5ji1m3qbxfzisp5cvba7m";

      #doCheck = false;

      CGO_ENABLED=0;
      buildFlagsArray = ["-ldflags='-s -w -X cuelang.org/go/cmd/cue/cmd.version=${version}'"];
    };

(oops! I need to update to 0.3.0! 😄 )

Specifically, the rev, sha256 and vendorSha256 fields specify a revision in Git, as well as digests of the source code and the go modules fetched to satisfy the go.sum there. How Nix interacts with different module ecosystems varies, but in the Go case, it delegates versioning to the Go system.

On the one hand, I think Nix does provide hermetic, reproducible module collection. If not, maybe I don't understand what you're envisioning as that requirement.

The irritation with Nix (and honestly, it's not a huge issue) is that collecting those digests is frustrating. The pragmatic approach, generally, is to put in a fake digest (nixpkgs includes a lib.fakeSha256 constant...) and then copy the real digests in from the build errors. It's a kind of poor man's TOFU system.

The other thing about Nix is that dependencies tend to be resolved distro-wide. In some ways that's reasonable: no OS distribution really ships multiple versions of glibc, say. Nix is nice in that different versions of the distribution can co-exist, and different applications can use different glibcs if needed.

For language-based distribution systems, the domain of version resolution is different. CUE, I think, wants an application domain, right? Everything is resolved together, and the way unification works puts me in mind of a hypothetical issue with old package.json style NPM resolution: if my app has an indirect dependency to another package via two paths, it's possible the code gets a value from one place (and therefore version X of my dep) and passes it to another that uses version Y. I think it's a subtle enough problem that it may have occurred and not been recognized.

My point is that a whole-application SAT solution (and here I think of Bundler from Ruby) can solve this issue well. People complain of "dependency hell" (and build NPM), but the upshot is that versions need to be resolved at the application domain in Ruby, and therefore you really can only have one version of any module. Bundler produces a lockfile with exact versions (which are held as source packages rather than VCS revisions), which also gets a lot of the way to reproducibility, and solves the fiddly and error prone process of resolving exact versions manually.

@nyarly

nyarly commented Apr 2, 2021

With regards to semver: I think that's a really interesting point for CUE. As defined, semver talks about changes in interface, which I understand as being related to the functions a module exports, basically. CUE is a different kind of language than the ones semver was developed to describe, and I wonder if it really is appropriate to describe a CUE module in terms of its interface.

@mpvl
Contributor

mpvl commented Apr 2, 2021

@nyarly yes, Nix is indeed hermetic, but it doesn't solve the other problem you mention of enforcing only a single version across different instances of the same import.

Semver is still somewhat relevant to CUE, but as long as the import paths distinguish major versions, one could argue that only minor versions are relevant. Then again, the same could be argued for programming languages.

I don't find SAT solvers give satisfactory results and dependency resolution at scale is a nightmare.

CUE being different also opens up possible solutions. It requires all imports of the same package to be the same version, but only barely. For instance, in the above problem, we could expect users to manually "downcast" (#454) versions if there is a possibility for discrepancies. But this doesn't obviate the need to align versions used in one language with an import in CUE (although in this case it would be a simpler exercise).

Also, the commingling issue is to some extent also something that is particular to CUE, or any language that is included alongside some system to describe its API for that matter. Some of the solutions may also come from guidelines and best practices. For instance, the above issue would more or less resolve itself if all CUE that depends on a commingled setup is itself managed within the same package management system. These could still depend on independent CUE repositories, but not vice versa. I'll try to write something up about this, but I believe with such restrictions it actually becomes doable to achieve consistency in a multi-lingual setup.

@verdverm we certainly don't plan to offer integrations for all languages. But I think a solution should be pluggable enough to make such integrations relatively straightforward.

@theckman Yes, I know that Go modules depend on semver. Although I'm not a fan of semver, I am a fan of standardization, and ultimately it doesn't matter that much. The point of people sticking with v0.x is an interesting one, though. It is hard to avoid if one allows for a standard way of specifying a ramp-up of backwards incompatible changes, but giving such a mechanism more of a wall-of-shame feel would probably help. I think in the end ecosystem incentives are a good way to go there, though: automatically detect whether the semver guidelines are followed and demote packages that are not at v1 or that break semver rules. I know this is not foolproof either.

@mpvl
Contributor

mpvl commented Apr 3, 2021

BTW, for those unfamiliar, Rich Hickey's Spec-ulate is an excellent talk about this topic: https://www.youtube.com/watch?v=oyLBGkS5ICk. An interesting thing to keep in mind when watching this is that CUE can actually be used as a "spec" definition, describing and validating backwards compatibility. This gives an idea how CUE could be different.

@verdverm
Contributor

verdverm commented Apr 6, 2021

This seems relevant and related: golang/go#36460 (cmd/go: lazy module loading, changes to how Go will process and manage transitive dependencies)

Similar complaint about large numbers of modules being downloaded through dependencies: golang/go#29935

After reading https://go.googlesource.com/proposal/+/master/design/36460-lazy-module-loading.md I can't help but think that trying to maintain parity with Go across versions (and future changes) from CUE would take valuable time away from maintainers who could otherwise make CUE good at what CUE does best. Thoughts?

Would having a command or function to check and warn (optionally error) about mismatched versions from the same repo be a sufficient solution to the problem described by @mpvl without the need to manage the go.mod file from CUE?
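
A minimal sketch of what such a check could look like, assuming Go and CUE are resolved independently and each side yields a map from module path to selected version. The Mismatches function, the use of the module path as a stand-in for "same repo", and the example data are all hypothetical; no such command exists in cmd/cue today.

```go
package main

import (
	"fmt"
	"sort"
)

// Mismatches compares two independently resolved build lists
// (module path -> selected version) and returns one sorted message
// for every module that both lists select at different versions.
func Mismatches(goList, cueList map[string]string) []string {
	var out []string
	for path, gv := range goList {
		if cv, ok := cueList[path]; ok && cv != gv {
			out = append(out, fmt.Sprintf("%s: Go selects %s, CUE selects %s", path, gv, cv))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical results of resolving each language on its own.
	goList := map[string]string{"example.com/base": "v1.0.0"}
	cueList := map[string]string{
		"example.com/base": "v1.1.0",
		"example.com/c":    "v1.0.0",
	}
	for _, w := range Mismatches(goList, cueList) {
		fmt.Println("warning:", w)
	}
	// Prints: warning: example.com/base: Go selects v1.0.0, CUE selects v1.1.0
}
```

Whether a mismatch is reported as a warning or as an error could then be left to the user, without CUE writing to go.mod.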

@cueckoo

cueckoo commented Jul 3, 2021

This issue has been migrated to cue-lang/cue#851.

For more details about CUE's migration to a new home, please see cue-lang/cue#1078.

@cueckoo cueckoo closed this as completed Jul 3, 2021

9 participants