Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Template for design docs #601

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
206 changes: 206 additions & 0 deletions docs/designs/0001-golang-1-21.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,206 @@
# Go 1.21 toolchains

## Overview
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Overview/Introduction/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed the change of this section to Introducion based on what I saw from the RFC examples Erik linked. But looking at it again, I think it is even worse to convey the meaning of TL;DR we are expecting from this section.

But point taken, this is currently inconsistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, why not Abstract then? Ultimately, I don't care whether it's called Abstract or Introduction, but I think the content should be different, something on the lines of
"Add support for automatic Go toolchain fetching which was introduced in Go 1.21 to address the issue of frequent micro version bumps by some projects using cachi2 to build their software as well as address Go's new 'minimum required version' policy during builds."

Copy link
Contributor Author

@brunoapimentel brunoapimentel Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract sounds very adequate, although when I look the word up in online dictionaries, they describe it as a Summary, so that could be another option.


Go 1.21 introduced a new feature called _toolchains_ which allow usage of different Go toolchains to be used for different modules simultaneously instead of the default bundled Go toolchain (i.e. the bundled `go` binary). The toolchain feature is enabled by the `toolchain` keyword which can either be in the `go.mod` file or the `go.work` (workspaces) file [[1]](#references).

This document describes how this feature works and how it impacts the current implementation in Cachi2, and proposes a few ways the implementation can be approached.

## Context

[What was the context for this doc? I can't think of anything important to mention here, and that's why I marked this section as optional in the template].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because I believe what you currently have in the previous section is the context. For the intro, I think the whole thing needs to be reworded in the sense of why this document even exists which the current intro doesn't do, it just says what Go did in 1.21. It's missing though how this project is impacted requiring some changes this doc will describe.


## How it works
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too free-form. I believe we need to standardize on this H2 heading. Also, I'm still fairly convinced that the paragraphs that follow this heading are the overview of the 1.21 changes that are relevant to the cause, it doesn't say anything about how we're going to solve it - that's good, because that will come later in Proposed solutions!


### Important things to keep in mind
- the `go` line in `go.mod` denotes the **minimum required** Go version to use to compile a module
- the `toolchain` line denotes the **suggested** go toolchain version to be used
- however, the suggested toolchain version **must not be older than the minimum required Go version** [[2]](#references)
- different modules and workspaces can denote different versions of toolchains
- Go always ignores the `toolchain` keyword if that version is less than the bundled Go's version

### Controlling toolchain behaviour
When it comes to toolchains Go's behaviour depends primarily on the `GOTOOLCHAIN` environment variable which can be set in a few different places. Go's lookup for environment variables goes as follows:

1. the running process' environment is checked for variables (e.g. `GOTOOLCHAIN=<selector>` go command)
2. `$GOENV` variable is checked for user's environment configuration file [[3]](#references) (e.g. it may point to `/home/user/.config/go/env`)
3. `$GOROOT/go.env` is checked; this is the global installation environment file and its location is platform-dependent
- distro repackaged Go releases may freely set variables differently from the official vendored releases [[4]](#references)
5. `GOTOOLCHAIN=local` is used as the default

### `GOTOOLCHAIN` selector values
The selector is what determines Go's behaviour when it comes to respecting the `toolchain` setting in `go.mod` or `go.work` files.
The value `GOTOOLCHAIN` accepts could be roughly described as `GOTOOLCHAIN=<selector>(+<selector>)?` where
`<selector>` can be one of:
- `local` - always run the default bundled toolchain; it's often paired with another selector, e.g. `local+path` which means that whenever possible Go should prefer the alternative toolchain
- `name` - always runs toolchain with that specific name; Go will look into its `PATH` and if such toolchain binary is missing it'll download it from the internet)
- `path` - same as `name` but DOES NOT download the toolchain if Go couldn't find it in `PATH`; in its bare form it's a shortcut for `local+path`
- `auto` - shortcut for `local+auto`, i.e download new toolchains as needed

**For our purposes though we should only ever be concerned with `GOTOOLCHAIN=auto` and pre-fetch whatever toolchain version Go encounters during processing.**

### Downloading toolchains
Luckily toolchains are nothing else than a gomod dependency which is cached like any other Go dependency. Unfortunately for us:
1. Go by default never looks into the mod cache for toolchains (not entirely true, read on...) and so **it always downloads it anew**
2. Go always wants to perform (not entirely true, read on...) checksum validation on the downloaded toolchains
- checksum validation is controlled by the `GOSUMDB` variable (see [Other Affecting Variables](#other-affecting-variables))

#### Go always performs checksum validation on toolchains
Starting with the checksums first as the downloads may be more clear later. So `GOSUMDB` was mentioned to control the checksum validation, so if `GOSUMDB=off` (Fedora default) what happens with either `go mod download` or `go build` while suggesting a `toolchain` to be used is that Go goes and fetches the toolchain from the internet, but then fails the checksum verification and so it refuses to proceed further with anything (e.g. downloading other dependencies or building anything) (see [Fetching toolchain with checksum disabled](#fetch-toolchain-no-checksum)). Therefore, to support Go 1.21 toolchains `GOSUMDB` needs to **be always set explicitly** by us and point to a valid checksum server as long as `GOTOOLCHAIN` is set to anything but `local` which forces Go to always use the bundled toolchain (see [GOTOOLCHAIN selectors](#selectors)).

_Note there's also the `GONOSUMDB` variable (see [Other Affecting Variables](#other-affecting-variables)) which might give the impression that if we specify a pattern for toolchains, we should be able to either download toolchains insecurely (not that we want to!). Turns out toolchains are exempt from any checksum verification settings [[5]](#references) and so we need to account for this._

#### Go always downloads the toolchain
With setting `GOSUMDB` explicitly we only takes care of a part of the puzzle, because it only ensures that we'll pre-fetch a toolchain correctly, but will not make sure in any way that such toolchain is actually going to be used during the project's container build or even that such a container build would succeed in general, because Go will ultimately try to download it again (except it won't, because builds may be hermetic).
To solve the fetch/cache/use issue, one needs to can set the `GOPROXY` variable as well so that it points to the cache on the local file system [[6]](#references). That way Go will continue using its original logic and will fetch the toolchain from the pre-fetched dependencies cache (into the same cache btw, fill in your favourite meme...) instead of pulling it from the internet. So far so good, back to checksums. So, we already know that Go does checksum verification on downloaded toolchains by pointing it to a valid checksum server, so how do we get them verified during hermetic builds? Turns out, Go only performs toolchain checksum verification when pulling from the internet, but not when using the `file://` URL scheme inside `GOPROXY` \o/. In other words, Go only tries to verify remote resources, but somehow trusts local resources (see [Fetching with GOMODCACHE and GOPROXY](#fetch-with-gomodcache-and-goproxy)).

And now the "Crème de la crème" of Go toolchain downloads and verification. Remember the section intro about Go seemingly never looking into `GOMODCACHE` for toolchains? It actually does, but guess what, it needs to verify the toolchains checksum (see [Use toolchain with an offline cache](#use-toolchain-with-an-offline-cache).

### Example invocations
This section serves just as a reference to various invocations of Go forcing the toolchain usage.

#### Fetching toolchain with checksum disabled
```
$ GOTOOLCHAIN=auto go mod download
go: downloading go1.21.2 (linux/amd64)
go: download go1.21.2: golang.org/[email protected]: verifying module: checksum database disabled by GOSUMDB=off

# try again to showcase Go will download the toolchain again (but let it pass the verification)
$ GOTOOLCHAIN=auto GOSUMDB=sum.golang.org go mod download
go: downloading go1.21.2 (linux/amd64)
...
go: downloading golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c
# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.zip
# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.zip: 200 OK (0.007s)
# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.mod
# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.mod: 200 OK (0.005s)
...
```

#### Use toolchain with an offline cache
```
$ GOTOOLCHAIN=auto GOSUMDB=off GOMODCACHE=/tmp/gomod/ go mod download --json
go: golang.org/[email protected]: verifying module: checksum database disabled by GOSUMDB=off
```

#### Fetching with GOMODCACHE and GOPROXY
```
$ GOTOOLCHAIN=auto GOSUMDB=off GOMODCACHE=/tmp/gomod/ GOPROXY=file:///tmp/gomod/cache/download go mod download --json
{
"Path": "golang.org/x/text",
"Version": "v0.0.0-20170915032832-14c0d48ead0c",
"Info": "/tmp/gomod/cache/download/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.info",
"GoMod": "/tmp/gomod/cache/download/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.mod",
"Zip": "/tmp/gomod/cache/download/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.zip",
"Dir": "/tmp/gomod/golang.org/x/[email protected]",
"Sum": "h1:qgOY6WgZOaTkIIMiVjBQcw93ERBE4m30iBm00nkL0i8=",
"GoModSum": "h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ="
}
...
```

## Proposed solutions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some standardization is needed also for ^this H2 heading, but I agree the rest of H3+ headings can be left free form and not limit the writer.


### 1. Ignore toolchains altogether
Since we only need to control Go's behaviour using environment variables, these are the necessary settings:

#### Dependency pre-fetch Go configuration
Set the following on top of our existing env settings:
`GOTOOLCHAIN=local`

#### User container build Go configuration
Set the following on top of our existing env settings:
`GOTOOLCHAIN=local`

#### Advantages:
- pretty much no real work needed on our side

#### Disadvantages:
- may not shield us properly from users wanting the toolchain feature in future, i.e. someone may come complaining (they will!)
- we'll have to keep up with frequent Go releases and update the container image promptly, in other words continue what we do now

### 2. Pre-fetch ALL affecting toolchains
Let Go download all toolchains it may need (for all modules). Since we only need to control Go's behaviour using environment variables, these are the necessary settings:

#### Dependency pre-fetch Go configuration
Set the following on top of our existing env settings:
`GOTOOLCHAIN=auto`
`GOSUMDB=sum.golang.org` (we already set this one...)

#### User container build Go configuration
Set the following on top of our existing env settings:
`GOTOOLCHAIN=auto`
`GOSUMDB=off`
`GOPROXY=file://${GOMODCACHE}/cache/download`

#### Advantages:
- we'll be properly supporting Go 1.21 features going in the future
- in the very unprobable theory we might not need to keep bumping Go versions in our container image **that often** if users make use of toolchains properly, because if they set `go 1.21` and then `toolchain 1.24.0` (provided 1.24 is out) then the 1.21 binary will be able to download a newer toolchain and use it for compiling; however, this whole idea crumbles like a house of cards when it comes to features that include new keywords

#### Disadvantages:
- potentially failing builds due to transitively incompatible (i.e. with indirect dependencies) `go` and `toolchain` versions set - but this is a user problem in general though, not cachi2's (mentioning it just in case)

## Decision
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of Decision!


Since the [second proposed solution](#2-pre-fetch-all-affecting-toolchains) allows us to both account for the proper support of the toolchains feature, and also to dimish the amount of work needed to support newer versions, the team has opted for it.

## References
[1] standard Go rules apply when it comes to evaluating modules vs workspaces
[2] there are some strict rules when it comes to versioning of suggested toolchains which applies transitively too across all dependencies, but luckily that's not our problem to figure out! As noted in https://go.dev/doc/toolchain:

> A module’s go line must declare a version greater than or equal to the go version declared by each of the modules listed in require statements. A workspace’s go line must declare a version greater than or equal to the go version declared by each of the modules listed in use statements.

[3] the `go.env` file has a simple `KEY=VALUE` structure
[4] Fedora repackaged _go.env_:
```
$ cat /usr/lib/golang/go.env

# This file contains the initial defaults for go command configuration.
# Values set by 'go env -w' and written to the user's go/env file override these.
# The environment overrides everything else.

# Use the Go module mirror and checksum database by default.
# See https://proxy.golang.org for details.
GOPROXY=direct
GOSUMDB=off

# Automatically download newer toolchains as directed by go.mod files.
# See https://go.dev/doc/toolchain for details.
GOTOOLCHAIN=local
```
Go's default vendored _go.env_ :
```
# This file contains the initial defaults for go command configuration.
# Values set by 'go env -w' and written to the user's go/env file override these.
# The environment overrides everything else.

# Use the Go module mirror and checksum database by default.
# See https://proxy.golang.org for details.
GOPROXY=https://proxy.golang.org,direct
GOSUMDB=sum.golang.org

# Automatically download newer toolchains as directed by go.mod files.
# See https://go.dev/doc/toolchain for details.
GOTOOLCHAIN=auto
```

[5] Snippet from the docs:
> toolchain downloads fail for lack of verification if GOSUMDB=off. GOPRIVATE and GONOSUMDB patterns do not apply to the toolchain downloads.

[6] Snippet from the docs:
> A module cache may be used directly as a file proxy: `GOPROXY=file://$(go env GOMODCACHE)/cache/download`

## Resources
- https://go.dev/doc/toolchain
- https://go.dev/ref/mod#environment-variables
Comment on lines +193 to +195
Copy link
Member

@eskultety eskultety Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I wrote the content, in retrospec ^this should have somehow been part of References, no need for another section, admittedly I was trying to be unnecessarily formal, just an FYI.


## Apendix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, this should have been part of the broader context sections, or overview or whatever we end up calling it.

### Other affecting variables
There are a couple of other variables that further affect module checksum verification and the download process:
- `GOSUMDB` - Identifies the name of the checksum database to use and optionally its public key and URL, normally defaults to `sum.golang.org`, <ins>on Fedora it's `"off"`</ins> [[5]](#references)
- `GONOSUMDB` - Comma-separated list of glob patterns of module path prefixes for which the go should not verify checksums using the checksum database
- `GOPROXY` - a (comma separated) list of URLs pointing to module proxies (instead of downloading modules directly from VCS); it supports the following URL schemes:
- `https/http`
- `file`
- `GOPRIVATE` - list of glob patterns of module path prefixes that should be considered private. Acts as a default value for `GONOPROXY` and `GONOSUMDB`. This is useful when no proxy serves private modules