From 46e89a81d206ee3221e7a6a1bf0926408a9dae4e Mon Sep 17 00:00:00 2001
From: Bruno Pimentel
Date: Thu, 15 Aug 2024 10:35:47 -0300
Subject: [PATCH 1/4] WIP: Initial attempt of a template, plus adapting Ruby design to it

Signed-off-by: Bruno Pimentel
---
 docs/designs/rubygems-001.md | 547 +++++++++++++++++++++++++++++++++++
 docs/designs/template-000.md |   9 +
 2 files changed, 556 insertions(+)
 create mode 100644 docs/designs/rubygems-001.md
 create mode 100644 docs/designs/template-000.md

diff --git a/docs/designs/rubygems-001.md b/docs/designs/rubygems-001.md
new file mode 100644
index 000000000..a2bfad134
--- /dev/null
+++ b/docs/designs/rubygems-001.md
@@ -0,0 +1,547 @@
+# Initial Design for the Rubygems Package Manager
+
+- [Overview](#overview)
+- [Context](#context)
+- [Description](#description)
+  - [Ruby ecosystem overview](#i-ruby-ecosystem-overview)
+  - [Current implementation overview (Cachito)](#ii-overview-of-the-current-implementation-in-cachito)
+  - [Design for the Cachi2 implementation](#iii-design-for-the-implementation-in-cachi2)
+- [Decision](#decision)
+
+## Overview
+
+Design that covers the initial implementation of the Rubygem/Bundler package manager in Cachi2. It takes inspiration from the original implementation done in [Cachito](https://github.com/containerbuildsystem/cachito).
+
+| | |
+|---------|---------|
+|Author |[Michal Šoltis](mailto:msoltis@redhat.com) |
+|Co-author |[Bruno Pimentel](mailto:bpimente@redhat.com) |
+|Proposed on |August 01, 2024 |
+
+## Context
+In an effort to evolve Cachi2 into a full solution for Cachito's prefetching feature, we have decided to implement support for the package managers it is currently missing: RubyGems and Yarn v1. This design covers only the implementation of RubyGems; the Yarn v1 design will follow.
+
+## Description
+
+### I. Ruby ecosystem overview
+
+#### Development prerequisites
+In order to execute the commands in the examples below, make sure you have the following packages installed in your
+environment:
+
+```bash
+sudo dnf install rubygems rubygems-bundler
+```
+
+Or use the official Ruby image from Docker Hub:
+```bash
+podman run --rm -it docker.io/library/ruby:3.3.3 bash
+```
+
+#### Project structure
+```bash
+bundle init # creates Gemfile in the current directory
+bundle lock # creates Gemfile.lock in the current directory
+```
+
+```bash
+├── .bundle
+│   └── config
+├── Gemfile
+├── Gemfile.lock
+├── vendor/cache
+```
+
+#### Glossary
+- **Gemfile**: A file that specifies the gems that your project depends on and their versions. Bundler uses this file
+to install the correct versions of gems for your project.
+
+  ```ruby
+  source "https://rubygems.org"
+
+  gem "rails", "= 6.1.7"
+  ```
+
+- **Gemfile.lock**: A file that locks the versions of gems that are installed for your project. Bundler uses this file
+to ensure that the correct versions of gems are installed consistently across different environments.
+
+- **RubyGems**: General package manager for Ruby. Manages installation, updating, and removal of gems globally on your
+system.
+
+  ```bash
+  gem --help
+  ```
+
+- **Bundler**: Dependency management tool for Ruby projects.
+Ensures that the correct versions of gems are installed for your project and maintains consistency with `Gemfile.lock`.
+
+  ```bash
+  bundler --help
+  ```
+
+- **Gem**: A package that can be installed and managed by RubyGems. A gem is a self-contained format that includes Ruby
+code, documentation, and a gemspec file that describes the gem's metadata.
+
+- **{gem}.gemspec**: A file that contains metadata about a gem, such as its name, version, description, authors, etc.
+RubyGems uses it to install, update, and uninstall gems.
+
+  ```ruby
+  Gem::Specification.new do |spec|
+    spec.name = "example"
+    spec.version = "0.1.0"
+    spec.authors = ["Nobody"]
+    spec.email = ["ruby@example.com"]
+    spec.summary = "Write a short summary, because RubyGems requires one."
+  end
+  ```
+
+#### Dependency types
+There are four types of
+[sources](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L48) for dependencies
+in the `Gemfile.lock` file:
+
+##### Gem dependencies
+Regular gem dependencies are located at the source URL, in our case, always . Each gem can be
+accessed by its name and version - rubygems.org/gems/``-``.gem
+
+Example of a gem dependency in the `Gemfile.lock` file:
+
+```bash
+GEM
+  remote: https://rubygems.org/
+  specs:
+    rails (6.1.4)
+      # transitive dependencies
+      actioncable (= 6.1.4)
+      actionmailbox (= 6.1.4)
+      actionmailer (= 6.1.4)
+      actionpack (= 6.1.4)
+      actiontext (= 6.1.4)
+      actionview (= 6.1.4)
+      activejob (= 6.1.4)
+      activemodel (= 6.1.4)
+      activerecord (= 6.1.4)
+      activestorage (= 6.1.4)
+      activesupport (= 6.1.4)
+      bundler (>= 1.15.0)
+      railties (= 6.1.4)
+      sprockets-rails (>= 2.0.0)
+```
+
+##### Git dependencies
+Example of a Git dependency in the `Gemfile.lock` file:
+
+```
+GIT
+  remote: https://github.com/porta.git
+  revision: 779beabd653afcd03c4468e0a69dc043f3bbb748
+  branch: main
+  specs:
+    porta (2.14.1)
+```
+
+##### Path dependencies
+Example of a path dependency in the `Gemfile.lock` file:
+
+```
+PATH
+  remote: some/pathgem
+  specs:
+    pathgem (0.1.0)
+```
+
+All path dependencies must be in the project directory. Bundler
+[does not copy](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/source/path.rb#L83) those
+dependencies that are already within the root directory of the project.
+
+##### Plugins
+Installing a plugin, even in a folder that is a Bundler project, doesn't seem to affect the `Gemfile.lock`. The
+plugin seems to be installed by default in `$PWD/.bundle/`. The `Gemfile.lock` does have a section for plugins,
+though, so further investigation would be needed. This initial investigation was done with the plugins listed under
+[Known Plugins](https://bundler.io/guides/plugins.html).
+
+*Don't confuse Bundler plugins with [RubyGems plugins](https://guides.rubygems.org/plugins/). The latter are meant to
+extend the functionality of `gem` itself, and don't seem to have any impact on Bundler directly.*
+
+
+#### Platforms
+Some gems may contain pre-compiled binaries that provide native extensions to the Ruby package. Any gem declared in the
+`Gemfile` can be limited to specific
+[platforms](https://bundler.io/v2.5/man/gemfile.5.html#PLATFORMS), making Bundler ignore it in case the project is
+being built on a non-matching platform:
+
+```ruby
+gem "nokogiri", platforms: [:windows_31, :jruby]
+```
+
+Here's an example of what the `PLATFORMS` section looks like in the `Gemfile.lock`:
+
+```
+PLATFORMS
+  arm64-darwin-20
+  arm64-darwin-21
+  arm64-darwin-22
+  ruby
+  x86_64-darwin-18
+  x86_64-darwin-20
+  x86_64-darwin-21
+  x86_64-darwin-22
+  x86_64-linux
+```
+
+In case a user wants to force all the binaries to be compiled from source, the `BUNDLE_FORCE_RUBY_PLATFORM` environment
+variable can be used.
+
+#### Dev dependencies
+When adding a Gem to a Gemfile, the user might opt to nest it under a specific
+[group](https://bundler.io/guides/groups.html). The name of the group can be any string, but the usual groups tend to
+be common labels such as `:test`, `:development` or `:production`.
+
+Here's what it looks like in a `Gemfile`:
+
+```ruby
+# :default group
+gem 'nokogiri'
+
+group :test do
+  gem 'faker'
+  gem 'rspec'
+end
+```
+
+Another way to declare a dependency in the `:development` group is to
+[add it](https://guides.rubygems.org/specification-reference/#add_development_dependency) to the `Gem::Specification`,
+which is usually declared in the `.gemspec` file. This means we can safely assume that all dependencies under
+`:development` are dev dependencies.
+
+#### Dependency checksums
+Support for checksums in the `Gemfile.lock` is still in development and is currently an
+[opt-in feature](https://github.com/rubygems/rubygems/pull/7217). To enable it, we need to manually add a `CHECKSUMS`
+section in the `Gemfile.lock`:
+
+```shell
+# manually add `CHECKSUMS` section somewhere in the Gemfile.lock
+vim Gemfile.lock
+# install any gem
+bundle add rails --version "6.1.7"
+# check the Gemfile.lock /o\
+cat Gemfile.lock
+```
+
+Example of a checksum section in the `Gemfile.lock`:
+
+```
+CHECKSUMS
+  actioncable (6.1.7) sha256=ee5345e1ac0a9ec24af8d21d46d6e8d85dd76b28b14ab60929c2da3e7d5bfe64
+  actionmailbox (6.1.7) sha256=c4364381e724b39eee3381e6eb3fdc80f121ac9a53dea3fd9ef687a9040b8a08
+  actionmailer (6.1.7) sha256=5561c298a13e6d43eb71098be366f59be51470358e6e6e49ebaaf43502906fa4
+  actionpack (6.1.7) sha256=3a8580e3721757371328906f953b332d5c95bd56a1e4f344b3fee5d55dc1cf37
+  actiontext (6.1.7) sha256=c5d3af4168619923d0ff661207215face3e03f7a04c083b5d347f190f639798e
+  actionview (6.1.7) sha256=c166e890d2933ffbb6eb2a2eac1b54f03890e33b8b7269503af848db88afc8d4
+```
+
+This feature is available since Bundler [v2.5.0](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L55),
+introduced by this [PR](https://github.com/rubygems/rubygems/pull/6374), merged on Oct 21, 2023.
+
+### II. Overview of the current implementation in Cachito
+
+[cachito/workers/pkg_managers/rubygems.py](https://github.com/containerbuildsystem/cachito/blob/master/cachito/workers/pkg_managers/rubygems.py)
+
+Most of the work is done by parsing the `Gemfile.lock` file, which pins all dependencies to exact versions. The only
+supported source for gem dependencies to be fetched from is . Git dependencies are specified
+using a repo URL and pinned to a commit hash. Path dependencies are specified using a local path.
+
+To avoid arbitrary code execution, Bundler **is not used** to download dependencies. Instead, as stated above, Cachito
+parses the `Gemfile.lock` file directly and downloads the gems from .
+
+**Note**: parsing `Gemfile.lock` is done via [gemlock-parser](https://github.com/containerbuildsystem/gemlock-parser),
+which is vendored from
+[scancode-toolkit](https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/gemfile_lock.py).
+
+Source code for "official" Bundler lockfile parsing in Ruby:
+
+
+### III. Design for the implementation in Cachi2
+
+#### Prefetching
+
+Running a bundler command to fetch the dependencies always executes the `Gemfile`, which is arbitrary Ruby code.
+Executing arbitrary code is a security risk and makes it impossible to assert that the resulting SBOM is accurate
+(since any random package can be fetched from the Internet during the prefetch). This means that we need to implement
+custom code to fetch the dependencies.
+
+In the `Gemfile.lock`, all Gems that come from the same remote URL are grouped under the same block:
+```
+GEM
+  remote: https://rubygems.org/
+  specs:
+    rails (6.1.4)
+    json-schema (1.2.1)
+```
+
+A Gem can be fetched from its original location by using the following template:
+```ruby
+"#{remote.chomp('/')}/gems/#{name}-#{version}.gem"
+```
+
+We should also leverage the existing code used to perform parallel downloads based on `asyncio` to download the necessary
+Gems from the internet.
+
+##### Output folder structure
+
+Bundler has a built-in feature to cache all dependencies locally. This is done with the `bundle cache --all` command or
+`bundle package --all` alias. In order to make bundler use the prefetched dependencies during the build, Cachi2 needs
+to recreate the exact same folder structure as bundler does.
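The download URL template, the platform-specific URL form listed under Pre-compiled binaries, and the Git folder naming rule shown below can be condensed into a few helpers. This is a hypothetical Python sketch, not Cachi2's code: the function names are made up, `remote` is taken verbatim from the `GEM` block (scheme and trailing slash included), `git_cache_dirname` only approximates Bundler's `base_name` by taking the last path component of the Git URL without `.git`, and `shortref_for_path` is assumed to keep the first 12 characters of the revision (which matches the `json-schema-26487618a684` folder in the example).

```python
import logging
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

log = logging.getLogger("rubygems-sketch")


def gem_cache_filename(name: str, version: str, platform: str = "ruby") -> str:
    """File name used in vendor/cache; platform-specific gems carry a suffix."""
    if platform == "ruby":
        return f"{name}-{version}.gem"
    return f"{name}-{version}-{platform}.gem"


def gem_download_url(remote: str, name: str, version: str, platform: str = "ruby") -> str:
    """Build <remote>/gems/<file>.gem, tolerating a trailing slash in the remote."""
    return f"{remote.rstrip('/')}/gems/{gem_cache_filename(name, version, platform)}"


def git_cache_dirname(git_url: str, revision: str) -> str:
    """Approximation of Bundler's "#{base_name}-#{shortref_for_path(revision)}"."""
    # Handle both https://host/org/repo(.git) and scp-like git@host:org/repo(.git).
    path = urlparse(git_url).path if "://" in git_url else git_url.rsplit(":", 1)[-1]
    base_name = path.rstrip("/").rsplit("/", 1)[-1]
    if base_name.endswith(".git"):
        base_name = base_name[: -len(".git")]
    return f"{base_name}-{revision[:12]}"


def keep_plain_ruby_gems(
    specs: Iterable[Tuple[str, str, str]],
) -> List[Tuple[str, str, str]]:
    """Keep (name, version, platform) specs for the 'ruby' platform; warn about the rest."""
    plain: List[Tuple[str, str, str]] = []
    skipped: List[Tuple[str, str, str]] = []
    for spec in specs:
        (plain if spec[2] == "ruby" else skipped).append(spec)
    if skipped:
        log.warning(
            "Skipping %d platform-specific gem(s): %s",
            len(skipped),
            ", ".join(gem_cache_filename(*spec) for spec in skipped),
        )
    return plain
```

Because several gems from the same repository and revision (see the `azure-storage-ruby` example below) map to the same folder name, keying clones by `(url, revision)` gives the single clone Bundler expects.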
+
+Here's an example of what the output folder should look like:
+
+```bash
+$ ls vendor/cache
+
+actioncable-6.1.7.gem
+date-3.3.4.gem
+json-schema-26487618a684
+nokogiri-1.16.6.gem
+tzinfo-2.0.6.gem
+```
+
+Notice that all the `.gem` dependencies are kept in their original format, and Git dependencies are just plain clones
+of the repository placed in a folder. For Git dependencies, the folder name must match this specific
+[format](https://github.com/rubygems/rubygems/blob/3da9b1dda0824d1d770780352bb1d3f287cb2df5/bundler/lib/bundler/source/git.rb#L130):
+
+```ruby
+"#{base_name}-#{shortref_for_path(revision)}"
+```
+
+The name of the directory **must come from the Git URL**, not the actual name of the gem, and the cloned folder must
+contain unpacked source code. Any other format will cause bundler to try to re-download the repository, causing the
+build to fail.
+
+###### Multiple Gems in a single repository
+
+A single repository can hold multiple Gems, and those can be imported as dependencies. When this happens, Bundler still
+expects a single clone to be made. Here's an example of what multiple gems imported from a single repository+revision
+look like in the `Gemfile.lock`:
+
+```
+GIT
+  remote: https://github.com/chatwoot/azure-storage-ruby
+  revision: 9957cf899d33a285b5dfe15bdb875292398e392b
+  branch: chatwoot
+  specs:
+    azure-storage-blob (2.0.3)
+      azure-storage-common (~> 2.0)
+      nokogiri (~> 1, >= 1.10.8)
+    azure-storage-common (2.0.4)
+      faraday (~> 2.0)
+      faraday-follow_redirects (~> 0.3.0)
+      faraday-net_http_persistent (~> 2.0)
+      net-http-persistent (~> 4.0)
+      nokogiri (~> 1, >= 1.10.8)
+```
+
+#### Out of scope
+
+##### Plugins
+Bundler has support for using [plugins](https://bundler.io/guides/bundler_plugins.html), which allows users to extend
+Bundler's functionality in any way they see fit. Since this can open the door to security issues, plugins
+will not be supported by Cachi2.
+
+Since we're not proposing the direct usage of Bundler to fetch the dependencies, no other actions are needed in the
+prefetch phase: existing plugin definitions will simply be ignored.
+
+##### Pre-compiled binaries
+For the initial implementation, we're aiming to provide support only for plain Ruby gems (which are identified as `ruby`
+in the `PLATFORMS` section of the `Gemfile.lock`). Platforms that relate to specific architectures will contain
+binaries that were pre-compiled for that architecture (see [Platforms](#platforms)).
+
+The URL scheme used by the default RubyGems registry seems to follow this format:
+
+```ruby
+# Plain Ruby Gem
+"https://rubygems.org/gems/#{name}-#{version}.gem"
+
+# Platform-specific Gem
+"https://rubygems.org/gems/#{name}-#{version}-#{platform}.gem"
+```
+
+This means that we can simply download the plain Ruby Gems and skip all platform-specific Gems altogether. From
+the testing I did, this did not seem to affect the ability to perform a hermetic build in any way: Bundler just
+compiled everything from source.
+
+To avoid confusing users if a corner case shows up, we can add a `WARNING` log in case extra platforms are detected,
+and also document it properly.
+
+Proper support for pre-compiled binaries should probably be left as a follow-up feature, similarly to what was done
+with [pip wheels](https://github.com/containerbuildsystem/cachi2/blob/main/docs/pip.md#distribution-formats).
+
+##### Checksum verification
+Since checksum support in the `Gemfile.lock` is still in development (see [checksums](#dependency-checksums)), we
+can postpone implementing support for it until the feature is delivered.
+
+We need to decide if we will report all dependencies as having missing checksums in the SBOM, or not.
+
+##### Dev dependencies
+Bundler declares all dev dependencies under the `:development`
+[group](#dev-dependencies). Unfortunately, groups declared in the `Gemfile`
+are not reflected in the `Gemfile.lock`.
+
+To implement proper reporting of dev dependencies, we'll very likely need to also parse the `Gemfile`. It can be done
+as a follow-up if the need arises.
+
+##### Prefetching Bundler
+When running `bundle install`, Bundler will always try to fetch the exact version that is pinned in the `Gemfile.lock`
+to perform the install. When doing an offline install from cache, a warning message is instead printed, but Bundler
+will usually perform the install as expected.
+
+To allow users to use the pinned version instead of only relying on the Bundler version present in the base image,
+Cachi2 could also prefetch the specific Bundler version needed for that project. This is easy to achieve, since Bundler
+is treated as an ordinary Gem: https://rubygems.org/gems/bundler.
+
+This feature, however, is out of scope for the initial implementation, and could be added if there's user demand for
+it.
+
+#### Providing the content for the hermetic build
+
+##### Setting the Bundler configuration
+
+The order of precedence for Bundler configuration options is as follows:
+
+1. Local config (`/.bundle/config or $BUNDLE_APP_CONFIG/config`)
+2. Environment variables (ENV)
+3. Global config (`~/.bundle/config`)
+4. Bundler default config
+
+Since the local configuration takes precedence over environment variables (except for `BUNDLE_APP_CONFIG`, which sets
+where the local configuration lives), the Bundler configuration options must be set in a config file for the build to work.
+
+In order to do this, we can either use `inject-files` to overwrite the `.bundle/config` file in the source folder,
+or use `BUNDLE_APP_CONFIG` to point Bundler to a config directory within the Cachi2 output directory. The latter has
+the benefit of not needing to dirty the cloned sources, but it wouldn't be able to support multiple Ruby projects per
+repository (since we would need to keep multiple configuration files).
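For the `BUNDLE_APP_CONFIG` option, the injected configuration could be written as a small YAML file, which is the format Bundler uses for `.bundle/config`. The sketch below is a hypothetical Python helper, not Cachi2's API; the keys and values are the ones proposed in the next subsection, and the values are emitted as JSON-quoted strings, which are also valid YAML double-quoted scalars.

```python
import json
from pathlib import Path
from typing import Dict


def write_bundler_config(config_dir: Path, cache_path: Path) -> Path:
    """Write <config_dir>/config so that BUNDLE_APP_CONFIG=<config_dir> picks it up."""
    settings: Dict[str, str] = {
        "BUNDLE_CACHE_PATH": str(cache_path),  # where the prefetched gems live
        "BUNDLE_DEPLOYMENT": "true",           # use the local cache, no Gemfile changes
        "BUNDLE_NO_PRUNE": "true",             # keep gems that belong to other projects
    }
    config_dir.mkdir(parents=True, exist_ok=True)
    lines = ["---"] + [f"{key}: {json.dumps(value)}" for key, value in settings.items()]
    config_file = config_dir / "config"
    config_file.write_text("\n".join(lines) + "\n")
    return config_file
```

The build would then export `BUNDLE_APP_CONFIG` pointing at `config_dir`. Since one directory holds exactly one `config` file, this also makes the single-project limitation mentioned above visible.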
+
+##### Relevant configuration for the build
+
+```
+BUNDLE_CACHE_PATH=${output_dir}/deps/rubygems
+BUNDLE_DEPLOYMENT=true
+BUNDLE_NO_PRUNE=true
+```
+
+- **BUNDLE_CACHE_PATH**: The directory that Bundler will place cached gems in when running `bundle package`, and that
+Bundler will look in when installing gems. Defaults to `vendor/cache`.
+
+- **BUNDLE_DEPLOYMENT**: Disallow changes to the Gemfile. This also has the side effect of forcing Bundler to use the
+local cache instead of trying to reach out to the Internet. This allows the hermetic build to work without forcing the
+users to add the `--local` flag to the `bundle install` command.
+
+- **BUNDLE_NO_PRUNE**: Whether Bundler should leave outdated gems unpruned when caching. Since we're potentially using
+a single cache folder for multiple Gems ("input packages" in Cachi2's terms), we need to make sure that the first
+install won't prune any cached dependencies that are unrelated to it.
+
+For more information, see Bundler's [documentation](https://bundler.io/v2.5/man/bundle-config.1.html).
+
+###### Other configuration that was considered
+
+- `BUNDLE_ALLOW_OFFLINE_INSTALL` does not seem to work with `bundle install`, even though it would probably be
+the most logical solution in this case.
+
+#### Generating the SBOM
+
+##### Main package metadata
+
+Ruby uses [Gem::Specification](https://guides.rubygems.org/specification-reference/) as a means of defining a Gem's
+metadata, and it is usually defined in a `{gem-name}.gemspec` file. This file is not mandatory, though, and when it
+exists, it needs to be explicitly imported in the `Gemfile`:
+
+```
+source "https://rubygems.org"
+
+gemspec
+```
+
+When the `.gemspec` file exists and is properly imported, it will be listed in the `Gemfile.lock` as a `PATH`
+dependency:
+
+```
+PATH
+  remote: .
+  specs:
+    tmp (0.1.2)
+```
+
+Since the `remote` will always point to `.` in the case of the main package [^main-package], we can safely use it to get
+the `name` and `version` for the SBOM component. In case this block is absent, we will need to fall back to the
+repository's remote `origin` to retrieve the main package's name, and leave the version empty, since it is not a
+mandatory field.
+
+[^main-package]: In Cachi2's terms, the **main package** is the path in the repository that is currently being
+    processed.
+
+##### PURLs
+
+Also check the Ruby PURL [specification](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#gem).
+
+###### Standard Gem
+```txt
+pkg:gem/my-gem-name@0.1.1
+```
+
+###### Git dependency
+
+```txt
+pkg:gem/my-git-dependency?vcs_url=git%2Bhttps://github.com/my-org/mygem.git%4026487618a68443e94d623bb585cb464b07d36702
+```
+
+The metadata for a Git dependency can be read from the `Gemfile.lock`:
+
+```
+GIT
+  remote: https://github.com/my-org/mygem.git
+  revision: 26487618a68443e94d623bb585cb464b07d36702
+  specs:
+    json-schema (3.0.0)
+      addressable (>= 2.4)
+```
+
+###### Path dependency
+
+```txt
+pkg:gem/my-path-dependency?vcs_url=git%2Bhttps://github.com/my-org/mygem.git%40b6f47bd07e669c8d2eced8015c4bfb06db49949#subpath
+```
+
+The PURL is formed by the current repository remote origin URL and ref, and the subpath that is specified in the
+`Gemfile.lock`:
+
+```
+PATH
+  remote: subpath
+  specs:
+    my-path-dependency (1.0.0)
+```
+
+## Decision
+
+Support for RubyGems will be implemented as described in this design document.
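Going back to the SBOM section, the main-package fallback and the three PURL shapes can be sketched in Python as well. The helper names are hypothetical and the sketch deliberately avoids a PURL library: it percent-encodes the `vcs_url` qualifier with `urllib.parse.quote`, keeping `:` and `/` readable so the output matches the examples above (`+` becomes `%2B`, `@` becomes `%40`), and it assumes the fallback name is simply the last path component of the `origin` URL.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import quote


def gem_purl(
    name: str,
    version: Optional[str] = None,
    vcs_url: Optional[str] = None,
    subpath: Optional[str] = None,
) -> str:
    """Build a pkg:gem PURL; vcs_url looks like 'git+https://host/org/repo.git@<commit>'."""
    purl = f"pkg:gem/{quote(name, safe='')}"
    if version:
        purl += f"@{quote(version, safe='')}"
    if vcs_url:
        purl += f"?vcs_url={quote(vcs_url, safe=':/')}"
    if subpath:
        purl += f"#{subpath.strip('/')}"
    return purl


def main_package(
    path_specs: Iterable[Tuple[str, str, str]], origin_url: str
) -> Tuple[str, Optional[str]]:
    """Pick (name, version) from the PATH spec whose remote is '.', else fall back to origin."""
    for remote, name, version in path_specs:
        if remote == ".":
            return name, version
    repo = origin_url.rstrip("/").rsplit("/", 1)[-1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return repo, None  # the version is not mandatory, so it is left empty
```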
+
+### Scoping of the initial implementation
+- design high-level code structure into multiple modules
+- create a test repository that contains all the relevant use cases
+- define models for RubyGems as the new package manager
+- parse all gems from `Gemfile.lock`
+- implement metadata parsing for the "main package"
+- download all gems from rubygems.org
+- download all gems from Git repositories
+- validate path dependencies are relative to the project root
+- inject the Bundler configuration needed for the offline install
+- generate PURLs for all dependencies
+- add integration and e2e tests
+- add documentation
+
+### Potential follow-up features
+- implement checksum parsing and validation when prefetching from the registry
+- downloading the Bundler version specified in the `Gemfile.lock`
+- support for pre-compiled binaries (platforms other than `ruby`)
+- Gemfile.lock checksum validation (blocked by pending official support)
+- reporting dev dependencies
+- proper support for plugins
\ No newline at end of file
diff --git a/docs/designs/template-000.md b/docs/designs/template-000.md
new file mode 100644
index 000000000..a3b4e63d3
--- /dev/null
+++ b/docs/designs/template-000.md
@@ -0,0 +1,9 @@
+# Title
+
+## Overview
+
+## Context
+
+## Description
+
+## Decision

From eea2b4dd60bac3cc9bca3ba2562f49ebc74466f7 Mon Sep 17 00:00:00 2001
From: Bruno Pimentel
Date: Thu, 15 Aug 2024 10:38:23 -0300
Subject: [PATCH 2/4] WIP: Getting rid of the Description section

Allowing the middle of the doc to be composed of free-form sections greatly helps with reducing the nesting of titles.

Signed-off-by: Bruno Pimentel
---
 docs/designs/rubygems-001.md | 75 +++++++++++++++++-------------------
 docs/designs/template-000.md |  2 +-
 2 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/docs/designs/rubygems-001.md b/docs/designs/rubygems-001.md
index a2bfad134..0fd677dbe 100644
--- a/docs/designs/rubygems-001.md
+++ b/docs/designs/rubygems-001.md
@@ -2,10 +2,9 @@
 
 - [Overview](#overview)
 - [Context](#context)
-- [Description](#description)
-  - [Ruby ecosystem overview](#i-ruby-ecosystem-overview)
-  - [Current implementation overview (Cachito)](#ii-overview-of-the-current-implementation-in-cachito)
-  - [Design for the Cachi2 implementation](#iii-design-for-the-implementation-in-cachi2)
+- [Ruby ecosystem overview](#i-ruby-ecosystem-overview)
+- [Current implementation overview (Cachito)](#ii-overview-of-the-current-implementation-in-cachito)
+- [Design for the Cachi2 implementation](#iii-design-for-the-implementation-in-cachi2)
 - [Decision](#decision)
 
 ## Overview
@@ -21,11 +20,9 @@ Design that covers the initial implementation of the Rubygem/Bundler package man
 ## Context
 In an effort to evolve Cachi2 into a full solution for Cachito's prefetching feature, we have decided to implement support for the package managers it is currently missing: RubyGems and Yarn v1. This design covers only the implementation of RubyGems; the Yarn v1 design will follow.
 
-## Description
+## I. Ruby ecosystem overview
 
-### I. Ruby ecosystem overview
-
-#### Development prerequisites
+### Development prerequisites
 In order to execute the commands in the examples below, make sure you have the following packages installed in your
 environment:
 
@@ -38,7 +35,7 @@ Or use the official Ruby image from Docker Hub:
 podman run --rm -it docker.io/library/ruby:3.3.3 bash
 ```
 
-#### Project structure
+### Project structure
 ```bash
 bundle init # creates Gemfile in the current directory
 bundle lock # creates Gemfile.lock in the current directory
@@ -52,7 +49,7 @@ bundle lock # creates Gemfile.lock in the current directory
 ├── vendor/cache
 ```
 
-#### Glossary
+### Glossary
 - **Gemfile**: A file that specifies the gems that your project depends on and their versions. Bundler uses this file
 to install the correct versions of gems for your project.
 
@@ -95,12 +92,12 @@ RubyGems uses it to install, update, and uninstall gems.
   end
   ```
 
-#### Dependency types
+### Dependency types
 There are four types of
 [sources](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L48) for dependencies
 in the `Gemfile.lock` file:
 
-##### Gem dependencies
+#### Gem dependencies
 Regular gem dependencies are located at the source URL, in our case, always . Each gem can be
 accessed by its name and version - rubygems.org/gems/``-``.gem
 
@@ -128,7 +125,7 @@ GEM
       sprockets-rails (>= 2.0.0)
 ```
 
-##### Git dependencies
+#### Git dependencies
 Example of a Git dependency in the `Gemfile.lock` file:
 
 ```
@@ -140,7 +137,7 @@ GIT
     porta (2.14.1)
 ```
 
-##### Path dependencies
+#### Path dependencies
 Example of a path dependency in the `Gemfile.lock` file:
 
 ```
@@ -154,7 +151,7 @@ All path dependencies must be in the project directory. Bundler
 [does not copy](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/source/path.rb#L83) those
 dependencies that are already within the root directory of the project.
 
-##### Plugins
+#### Plugins
 Installing a plugin, even in a folder that is a Bundler project, doesn't seem to affect the `Gemfile.lock`. The
 plugin seems to be installed by default in `$PWD/.bundle/`. The `Gemfile.lock` does have a section for plugins,
 though, so further investigation would be needed. This initial investigation was done with the plugins listed under
@@ -164,7 +161,7 @@ though, so further investigation would be needed. This initial investigation was
 extend the functionality of `gem` itself, and don't seem to have any impact on Bundler directly.*
 
 
-#### Platforms
+### Platforms
 Some gems may contain pre-compiled binaries that provide native extensions to the Ruby package. Any gem declared in the
 `Gemfile` can be limited to specific
 [platforms](https://bundler.io/v2.5/man/gemfile.5.html#PLATFORMS), making Bundler ignore it in case the project is
@@ -192,7 +189,7 @@ PLATFORMS
 In case a user wants to force all the binaries to be compiled from source, the `BUNDLE_FORCE_RUBY_PLATFORM` environment
 variable can be used.
 
-#### Dev dependencies
+### Dev dependencies
 When adding a Gem to a Gemfile, the user might opt to nest it under a specific
 [group](https://bundler.io/guides/groups.html). The name of the group can be any string, but the usual groups tend to
 be common labels such as `:test`, `:development` or `:production`.
@@ -214,7 +211,7 @@ Another way to declare a dependency in the `:development` group is to
 which is usually declared in the `.gemspec` file. This means we can safely assume that all dependencies under
 `:development` are dev dependencies.
 
-#### Dependency checksums
+### Dependency checksums
 Support for checksums in the `Gemfile.lock` is still in development and is currently an
 [opt-in feature](https://github.com/rubygems/rubygems/pull/7217). To enable it, we need to manually add a `CHECKSUMS`
 section in the `Gemfile.lock`:
@@ -243,7 +240,7 @@ CHECKSUMS
 This feature is available since Bundler [v2.5.0](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L55),
 introduced by this [PR](https://github.com/rubygems/rubygems/pull/6374), merged on Oct 21, 2023.
 
-### II. Overview of the current implementation in Cachito
+## II. Overview of the current implementation in Cachito
 
 [cachito/workers/pkg_managers/rubygems.py](https://github.com/containerbuildsystem/cachito/blob/master/cachito/workers/pkg_managers/rubygems.py)
 
@@ -261,9 +258,9 @@ which is vendored from
 Source code for "official" Bundler lockfile parsing in Ruby:
 
 
-### III. Design for the implementation in Cachi2
+## III. Design for the implementation in Cachi2
 
-#### Prefetching
+### Prefetching
 
 Running a bundler command to fetch the dependencies always executes the `Gemfile`, which is arbitrary Ruby code.
 Executing arbitrary code is a security risk and makes it impossible to assert that the resulting SBOM is accurate
@@ -287,7 +284,7 @@ A Gem can be fetched from its original location by using the following template:
 We should also leverage the existing code used to perform parallel downloads based on `asyncio` to download the necessary
 Gems from the internet.
 
-##### Output folder structure
+#### Output folder structure
 
 Bundler has a built-in feature to cache all dependencies locally. This is done with the `bundle cache --all` command or
 `bundle package --all` alias. In order to make bundler use the prefetched dependencies during the build, Cachi2 needs
@@ -317,7 +314,7 @@ The name of the directory **must come from the Git URL**, not the actual name of
 contain unpacked source code. Any other format will cause bundler to try to re-download the repository, causing the
 build to fail.
 
-###### Multiple Gems in a single repository
+##### Multiple Gems in a single repository
 
 A single repository can hold multiple Gems, and those can be imported as dependencies. When this happens, Bundler still
 expects a single clone to be made. Here's an example of what multiple gems imported from a single repository+revision
@@ -340,9 +337,9 @@ GIT
       nokogiri (~> 1, >= 1.10.8)
 ```
 
-#### Out of scope
+### Out of scope
 
-##### Plugins
+#### Plugins
 Bundler has support for using [plugins](https://bundler.io/guides/bundler_plugins.html), which allows users to extend
 Bundler's functionality in any way they see fit. Since this can open the door to security issues, plugins
 will not be supported by Cachi2.
@@ -350,7 +347,7 @@ will not be supported by Cachi2.
 Since we're not proposing the direct usage of Bundler to fetch the dependencies, no other actions are needed in the
 prefetch phase: existing plugin definitions will simply be ignored.
 
-##### Pre-compiled binaries
+#### Pre-compiled binaries
 For the initial implementation, we're aiming to provide support only for plain Ruby gems (which are identified as `ruby`
 in the `PLATFORMS` section of the `Gemfile.lock`). Platforms that relate to specific architectures will contain
 binaries that were pre-compiled for that architecture (see [Platforms](#platforms)).
@@ -375,13 +372,13 @@ and also document it properly.
 Proper support for pre-compiled binaries should probably be left as a follow-up feature, similarly to what was done
 with [pip wheels](https://github.com/containerbuildsystem/cachi2/blob/main/docs/pip.md#distribution-formats).
 
-##### Checksum verification
+#### Checksum verification
 Since checksum support in the `Gemfile.lock` is still in development (see [checksums](#dependency-checksums)), we
 can postpone implementing support for it until the feature is delivered.
 
 We need to decide if we will report all dependencies as having missing checksums in the SBOM, or not.
 
-##### Dev dependencies
+#### Dev dependencies
 Bundler declares all dev dependencies under the `:development`
 [group](#dev-dependencies). Unfortunately, groups declared in the `Gemfile`
 are not reflected in the `Gemfile.lock`.
@@ -389,7 +386,7 @@ are not reflected in the `Gemfile.lock`.
 To implement proper reporting of dev dependencies, we'll very likely need to also parse the `Gemfile`. It can be done
 as a follow-up if the need arises.
 
-##### Prefetching Bundler
+#### Prefetching Bundler
 When running `bundle install`, Bundler will always try to fetch the exact version that is pinned in the `Gemfile.lock`
 to perform the install. When doing an offline install from cache, a warning message is instead printed, but Bundler
 will usually perform the install as expected.
@@ -401,9 +398,9 @@ is treated as an ordinary Gem: https://rubygems.org/gems/bundler.
 This feature, however, is out of scope for the initial implementation, and could be added if there's user demand for
 it.
 
-#### Providing the content for the hermetic build
+### Providing the content for the hermetic build
 
-##### Setting the Bundler configuration
+#### Setting the Bundler configuration
 
 The order of precedence for Bundler configuration options is as follows:
 
@@ -420,7 +417,7 @@ or use `BUNDLE_APP_CONFIG` to point Bundler to a config directory within the Cac
 the benefit of not needing to dirty the cloned sources, but it wouldn't be able to support multiple Ruby projects per
 repository (since we would need to keep multiple configuration files).
 
-##### Relevant configuration for the build
+#### Relevant configuration for the build
 
 ```
 BUNDLE_CACHE_PATH=${output_dir}/deps/rubygems
@@ -441,14 +438,14 @@ install won't prune any cached dependencies that are unrelated to it.
 
 For more information, see Bundler's [documentation](https://bundler.io/v2.5/man/bundle-config.1.html).
 
-###### Other configuration that was considered +##### Other configuration that was considered - `BUNDLE_ALLOW_OFFLINE_INSTALL` is not working either with `bundle install` for some reason, which could be probably the most logical solution in this case. -#### Generating the SBOM +### Generating the SBOM -##### Main package metadata +#### Main package metadata Ruby uses [Gem::Specification](https://guides.rubygems.org/specification-reference/) as a means of defining a Gem's metadata, and it is usually defined in a `{gem-name}.gemspec` file. This file is not mandatory, though, and when it @@ -478,16 +475,16 @@ mandatory field. [^main-package]: In Cachi2's terms, the **main package** is the path in the repository that is currently being processed. -##### PURLs +#### PURLs Also check the Ruby PURL [specification](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#gem). -###### Standard Gem +##### Standard Gem ```txt pkg:gem/my-gem-name@0.1.1 ``` -###### Git dependency +##### Git dependency ```txt pkg:gem/my-git-dependency?vcs_url=git%2Bhttps://github.com/my-org/mygem.git%26487618a68443e94d623bb585cb464b07d36702 @@ -504,7 +501,7 @@ GIT addressable (>= 2.4) ``` -###### Path dependency +##### Path dependency ```txt pkg:gem/my-path-dependency?vcs_url=git%2Bhttps://github.com/my-org/mygem.git%40b6f47bd07e669c8d2eced8015c4bfb06db49949#subpath diff --git a/docs/designs/template-000.md b/docs/designs/template-000.md index a3b4e63d3..cbfe7aab9 100644 --- a/docs/designs/template-000.md +++ b/docs/designs/template-000.md @@ -4,6 +4,6 @@ ## Context -## Description +## [Free-form sections] ## Decision From ac3767254b592b6dddb93cdba96ffd060fea1349 Mon Sep 17 00:00:00 2001 From: Bruno Pimentel Date: Mon, 19 Aug 2024 21:24:43 -0300 Subject: [PATCH 3/4] WIP: Improve the template and adapt the Ruby design to match it - dropped the author/date table - added updated by/obsoleted by examples - renamed 'Overview' to 'Introduction' - marked 'Context' as optional - added a 
description to each section Signed-off-by: Bruno Pimentel --- docs/designs/0000-template.md | 23 ++++++++++++ ...001.md => 0002-rubygems-initial-design.md} | 35 +++++++++---------- docs/designs/template-000.md | 9 ----- 3 files changed, 39 insertions(+), 28 deletions(-) create mode 100644 docs/designs/0000-template.md rename docs/designs/{rubygems-001.md => 0002-rubygems-initial-design.md} (96%) delete mode 100644 docs/designs/template-000.md diff --git a/docs/designs/0000-template.md b/docs/designs/0000-template.md new file mode 100644 index 000000000..bcef18b8c --- /dev/null +++ b/docs/designs/0000-template.md @@ -0,0 +1,23 @@ +# Title + +Updated by: [0000-this-doc-does-not-exist]() + +Obsoleted by: [0000-this-doc-does-not-exist]() + +*(these sections should only be added when necessary, and should always include the links to the relevant documents)* + +## Introduction + +This section should give a short summary of what this document covers, the main points it describes, and why it is important. Ideally, it should be kept under two paragraphs. + +## Context [optional] + +Explains the context, beyond the purely technical aspects of the design, that existed when it was written. It must make clear what other influences were at play and how they affected the decisions that were taken. In case there is nothing relevant to mention about the context, this section can be skipped. + +## [Free-form sections] + +The actual content of the design should be placed under one or more sections that can be named freely. Note that they should be level 2 headings (H2), but each section can also be subdivided by nested headings (H3, H4, H5 or H6) if needed. Only a single level 1 heading (H1) should exist, and it is strictly reserved for the title. + +## Decision/Outcome [pick one] + +Document here what decision was made, or what outcome is expected out of the design that was created.
In case several options were presented in the design, this section must clearly state which one was chosen and the reasons for it. In the case of a new package manager design, or a completely new feature for an existing package manager, this section may include only a brief summary of what the implementation will cover, since the decision to implement it is already implied by the design. \ No newline at end of file diff --git a/docs/designs/rubygems-001.md b/docs/designs/0002-rubygems-initial-design.md similarity index 96% rename from docs/designs/rubygems-001.md rename to docs/designs/0002-rubygems-initial-design.md index 0fd677dbe..c60c888d2 100644 --- a/docs/designs/rubygems-001.md +++ b/docs/designs/0002-rubygems-initial-design.md @@ -1,26 +1,25 @@ # Initial Design for the Rubygems Package Manager -- [Overview](#overview) +- [Introduction](#introduction) - [Context](#context) -- [Ruby ecosystem overview](#i-ruby-ecosystem-overview) -- [Current implementation overview (Cachito)](#ii-overview-of-the-current-implementation-in-cachito) -- [Design for the Cachi2 implementation](#iii-design-for-the-implementation-in-cachi2) -- [Decision](#decision) +- [Ruby ecosystem overview](#ruby-ecosystem-overview) +- [Current implementation overview (Cachito)](#overview-of-the-current-implementation-in-cachito) +- [Design for the Cachi2 implementation](#design-for-the-implementation-in-cachi2) +- [Outcomes](#outcomes) -## Overview +## Introduction -Design that covers the initial implementation of the Rubygem/Bundler package manager in Cachi2. It takes inspiration from the original implementation done in [Cachito](https://github.com/containerbuildsystem/cachito). +Design that covers the initial implementation of the Rubygem/Bundler package manager in Cachi2. It takes inspiration from the original implementation done in [Cachito](https://github.com/containerbuildsystem/cachito).
-| | | -|---------|---------| -|Author |[Michal Šoltis](msoltis@redhat.com) | -|Co-author |[Bruno Pimentel](bpimente@redhat.com) | -|Proposed on |August 01, 2024 | +This document has three main parts: +- An overview of how the Ruby ecosystem works, touching only the parts that are relevant to the Cachi2 implementation +- A quick overview of how the implementation was done in Cachito +- The actual design for the implementation in Cachi2 ## Context In the effort to evolve Cachi2 to be a full solution for the prefetching feature of Cachito, we have decided to implement support to the currently missing package managers: Rubygems and Yarn v1. This design covers only the implementation of Rubygems, and the Yarn v1 design will follow. -## I. Ruby ecosystem overview +## Ruby ecosystem overview ### Development prerequisites In order to execute the commands in the examples below, make sure you have the following packages installed in your @@ -240,7 +239,7 @@ CHECKSUMS This feature is available since Bundler [v2.5.0](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L55), from this [PR](https://github.com/rubygems/rubygems/pull/6374) being merged on Oct 21, 2023. -## II. Overview of the current implementation in Cachito +## Overview of the current implementation in Cachito [cachito/workers/pkg_mangers/rubygems.py](https://github.com/containerbuildsystem/cachito/blob/master/cachito/workers/pkg_managers/rubygems.py) @@ -258,7 +257,7 @@ which is vendored from Source code for "official" Bundler lockfile parsing in Ruby: -## III. Design for the implementation in Cachi2 +## Design for the implementation in Cachi2 ### Prefetching @@ -517,9 +516,7 @@ PATH my-path-dependency (1.0.0) ``` -## Decision - -The support for Rubygems will be implemented as currently described in this design document.
+## Outcomes ### Scoping of the initial implementation - design high-level code structure into multiple modules @@ -535,7 +532,7 @@ The support for Rubygems will be implemented as currently described in this desi - add integration and e2e tests - add documentation -### Potential follow-up features +### Out of scope - implement checksum parsing and validation when prefetching from the registry - downloading the Bundler version specified in the `Gemfile.lock` - support for pre-compiled binaries (platforms other than `ruby`) diff --git a/docs/designs/template-000.md b/docs/designs/template-000.md deleted file mode 100644 index cbfe7aab9..000000000 --- a/docs/designs/template-000.md +++ /dev/null @@ -1,9 +0,0 @@ -# Title - -## Overview - -## Context - -## [Free-form sections] - -## Decision From 6d8d4a965893901925b35aaace99cd0199dbd8b9 Mon Sep 17 00:00:00 2001 From: Bruno Pimentel Date: Mon, 19 Aug 2024 21:26:14 -0300 Subject: [PATCH 4/4] WIP: Added the Go 1.21 toolchains design as an example Signed-off-by: Bruno Pimentel --- docs/designs/0001-golang-1-21.md | 206 +++++++++++++++++++++++++++++++ 1 file changed, 206 insertions(+) create mode 100644 docs/designs/0001-golang-1-21.md diff --git a/docs/designs/0001-golang-1-21.md b/docs/designs/0001-golang-1-21.md new file mode 100644 index 000000000..f59fd24fc --- /dev/null +++ b/docs/designs/0001-golang-1-21.md @@ -0,0 +1,206 @@ +# Go 1.21 toolchains + +## Overview + +Go 1.21 introduced a new feature called _toolchains_, which allows different Go toolchains to be used for different modules simultaneously, instead of only the default bundled Go toolchain (i.e. the bundled `go` binary). The toolchain feature is enabled by the `toolchain` keyword, which can appear either in the `go.mod` file or in the `go.work` (workspaces) file [[1]](#references). + +This document describes how this feature works and how it impacts the current implementation in Cachi2, and proposes a few ways the implementation can be approached.
+ +## Context + +[What was the context for this doc? I can't think of anything important to mention here, and that's why I marked this section as optional in the template]. + +## How it works + +### Important things to keep in mind +- the `go` line in `go.mod` denotes the **minimum required** Go version to use to compile a module +- the `toolchain` line denotes the **suggested** Go toolchain version to be used + - however, the suggested toolchain version **must not be older than the minimum required Go version** [[2]](#references) +- different modules and workspaces can denote different versions of toolchains +- Go always ignores the `toolchain` keyword if that version is older than the bundled Go version + +### Controlling toolchain behaviour +When it comes to toolchains, Go's behaviour depends primarily on the `GOTOOLCHAIN` environment variable, which can be set in a few different places. Go looks up environment variables in the following order: + +1. the running process' environment is checked for variables (e.g. `GOTOOLCHAIN=` go command) +2. the user's environment configuration file pointed to by `$GOENV` is checked [[3]](#references) (e.g. `$GOENV` may point to `/home/user/.config/go/env`) +3. `$GOROOT/go.env` is checked; this is the global installation environment file and its location is platform-dependent + - distro-repackaged Go releases may freely set variables differently from the official vendored releases [[4]](#references) +4. `GOTOOLCHAIN=local` is used as the default + +### `GOTOOLCHAIN` selector values +The selector is what determines Go's behaviour when it comes to respecting the `toolchain` setting in `go.mod` or `go.work` files. +The value `GOTOOLCHAIN` accepts can be roughly described as `GOTOOLCHAIN=(+)?` where +`` can be one of: +- `local` - always runs the default bundled toolchain; it's often paired with another selector, e.g.
`local+path`, which means that, whenever possible, Go should prefer the alternative toolchain +- `name` - always runs the toolchain with that specific name; Go will look for it in its `PATH` and, if that toolchain binary is missing, download it from the internet +- `path` - same as `name` but DOES NOT download the toolchain if Go can't find it in `PATH`; in its bare form it's a shortcut for `local+path` +- `auto` - shortcut for `local+auto`, i.e. download new toolchains as needed + +**For our purposes, though, we should only ever be concerned with `GOTOOLCHAIN=auto` and pre-fetch whatever toolchain version Go encounters during processing.** + +### Downloading toolchains +Luckily, toolchains are nothing more than a gomod dependency, which is cached like any other Go dependency. Unfortunately for us: +1. Go by default never looks into the mod cache for toolchains (not entirely true, read on...) and so **it always downloads it anew** +2. Go always wants to perform (not entirely true, read on...) checksum validation on the downloaded toolchains + - checksum validation is controlled by the `GOSUMDB` variable (see [Other Affecting Variables](#other-affecting-variables)) + +#### Go always performs checksum validation on toolchains +Let's start with the checksums, since the download behaviour will be clearer afterwards. As mentioned above, `GOSUMDB` controls checksum validation. If `GOSUMDB=off` (the Fedora default) and a `toolchain` is suggested, running either `go mod download` or `go build` makes Go fetch the toolchain from the internet, fail the checksum verification, and then refuse to proceed any further (e.g. downloading other dependencies or building anything) (see [Fetching toolchain with checksum disabled](#fetching-toolchain-with-checksum-disabled)).
Therefore, to support Go 1.21 toolchains, we must **always set `GOSUMDB` explicitly** to a valid checksum server whenever `GOTOOLCHAIN` is set to anything other than `local` (which forces Go to always use the bundled toolchain; see [GOTOOLCHAIN selectors](#gotoolchain-selector-values)). + +_Note that there's also the `GONOSUMDB` variable (see [Other Affecting Variables](#other-affecting-variables)), which might give the impression that, by specifying a pattern for toolchains, we could download toolchains without verification (not that we want to!). It turns out that `GOPRIVATE` and `GONOSUMDB` patterns do not apply to toolchain downloads [[5]](#references), so we need to account for this._ + +#### Go always downloads the toolchain +Setting `GOSUMDB` explicitly only solves part of the puzzle: it ensures that we pre-fetch a toolchain correctly, but it does nothing to ensure that the toolchain is actually used during the project's container build, or even that the build succeeds at all, because Go will ultimately try to download the toolchain again (and fail, because builds may be hermetic). +To solve the fetch/cache/use issue, we also need to set the `GOPROXY` variable so that it points to the cache on the local file system [[6]](#references). That way, Go keeps using its original logic but fetches the toolchain from the pre-fetched dependencies cache (into the same cache, btw, fill in your favourite meme...) instead of pulling it from the internet. So far so good; back to checksums. We already know that Go verifies downloaded toolchains when pointed to a valid checksum server, so how do we get them verified during hermetic builds? It turns out that Go only performs toolchain checksum verification when pulling from the internet, not when using the `file://` URL scheme inside `GOPROXY` \o/.
In other words, Go only tries to verify remote resources, but somehow trusts local resources (see [Fetching with GOMODCACHE and GOPROXY](#fetching-with-gomodcache-and-goproxy)). + +And now the "Crème de la crème" of Go toolchain downloads and verification. Remember the section intro about Go seemingly never looking into `GOMODCACHE` for toolchains? It actually does, but guess what: it still needs to verify the toolchain's checksum (see [Use toolchain with an offline cache](#use-toolchain-with-an-offline-cache)). + +### Example invocations +This section serves just as a reference for various invocations of Go that force toolchain usage. + +#### Fetching toolchain with checksum disabled +``` +$ GOTOOLCHAIN=auto go mod download +go: downloading go1.21.2 (linux/amd64) +go: download go1.21.2: golang.org/toolchain@v0.0.1-go1.21.2.linux-amd64: verifying module: checksum database disabled by GOSUMDB=off + +# try again to show that Go downloads the toolchain again (this time letting it pass the verification) +$ GOTOOLCHAIN=auto GOSUMDB=sum.golang.org go mod download +go: downloading go1.21.2 (linux/amd64) +... +go: downloading golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c +# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.zip +# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.zip: 200 OK (0.007s) +# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.mod +# get https://proxy.golang.org/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.mod: 200 OK (0.005s) +...
+``` + +#### Use toolchain with an offline cache +``` +$ GOTOOLCHAIN=auto GOSUMDB=off GOMODCACHE=/tmp/gomod/ go mod download --json +go: golang.org/toolchain@v0.0.1-go1.21.2.linux-amd64: verifying module: checksum database disabled by GOSUMDB=off +``` + +#### Fetching with GOMODCACHE and GOPROXY +``` +$ GOTOOLCHAIN=auto GOSUMDB=off GOMODCACHE=/tmp/gomod/ GOPROXY=file:///tmp/gomod/cache/download go mod download --json + { + "Path": "golang.org/x/text", + "Version": "v0.0.0-20170915032832-14c0d48ead0c", + "Info": "/tmp/gomod/cache/download/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.info", + "GoMod": "/tmp/gomod/cache/download/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.mod", + "Zip": "/tmp/gomod/cache/download/golang.org/x/text/@v/v0.0.0-20170915032832-14c0d48ead0c.zip", + "Dir": "/tmp/gomod/golang.org/x/text@v0.0.0-20170915032832-14c0d48ead0c", + "Sum": "h1:qgOY6WgZOaTkIIMiVjBQcw93ERBE4m30iBm00nkL0i8=", + "GoModSum": "h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=" +} +... +``` + +## Proposed solutions + +### 1. Ignore toolchains altogether +Since we only need to control Go's behaviour using environment variables, these are the necessary settings: + +#### Dependency pre-fetch Go configuration +Set the following on top of our existing env settings: +`GOTOOLCHAIN=local` + +#### User container build Go configuration +Set the following on top of our existing env settings: +`GOTOOLCHAIN=local` + +#### Advantages: +- pretty much no real work needed on our side + +#### Disadvantages: +- may not shield us properly from users wanting the toolchain feature in future, i.e. someone may come complaining (they will!) +- we'll have to keep up with frequent Go releases and update the container image promptly, in other words continue what we do now + +### 2. Pre-fetch ALL affecting toolchains +Let Go download all toolchains it may need (for all modules). 
Since we only need to control Go's behaviour using environment variables, these are the necessary settings: + +#### Dependency pre-fetch Go configuration +Set the following on top of our existing env settings: +`GOTOOLCHAIN=auto` +`GOSUMDB=sum.golang.org` (we already set this one...) + +#### User container build Go configuration +Set the following on top of our existing env settings: +`GOTOOLCHAIN=auto` +`GOSUMDB=off` +`GOPROXY=file://${GOMODCACHE}/cache/download` + +#### Advantages: +- we'll be properly supporting Go 1.21 features going forward +- in theory (though it's very improbable), we might not need to bump Go versions in our container image **that often** if users make use of toolchains properly: if they set `go 1.21` and then `toolchain go1.24.0` (provided 1.24 is out), the 1.21 binary will be able to download the newer toolchain and use it for compiling; however, this whole idea crumbles like a house of cards when it comes to features that include new keywords + +#### Disadvantages: +- potentially failing builds due to transitively incompatible (i.e. with indirect dependencies) `go` and `toolchain` versions - but this is generally a user problem, not Cachi2's (mentioning it just in case) + +## Decision + +Since the [second proposed solution](#2-pre-fetch-all-affecting-toolchains) allows us both to properly support the toolchains feature and to diminish the amount of work needed to support newer Go versions, the team has opted for it. + +## References +[1] standard Go rules apply when it comes to evaluating modules vs workspaces +[2] there are some strict rules when it comes to versioning of suggested toolchains, which apply transitively across all dependencies, but luckily that's not our problem to figure out! As noted in https://go.dev/doc/toolchain: + +> A module’s go line must declare a version greater than or equal to the go version declared by each of the modules listed in require statements.
A workspace’s go line must declare a version greater than or equal to the go version declared by each of the modules listed in use statements. + +[3] the `go.env` file has a simple `KEY=VALUE` structure +[4] Fedora-repackaged _go.env_: +``` +$ cat /usr/lib/golang/go.env + +# This file contains the initial defaults for go command configuration. +# Values set by 'go env -w' and written to the user's go/env file override these. +# The environment overrides everything else. + +# Use the Go module mirror and checksum database by default. +# See https://proxy.golang.org for details. +GOPROXY=direct +GOSUMDB=off + +# Automatically download newer toolchains as directed by go.mod files. +# See https://go.dev/doc/toolchain for details. +GOTOOLCHAIN=local +``` +Go's default vendored _go.env_: +``` +# This file contains the initial defaults for go command configuration. +# Values set by 'go env -w' and written to the user's go/env file override these. +# The environment overrides everything else. + +# Use the Go module mirror and checksum database by default. +# See https://proxy.golang.org for details. +GOPROXY=https://proxy.golang.org,direct +GOSUMDB=sum.golang.org + +# Automatically download newer toolchains as directed by go.mod files. +# See https://go.dev/doc/toolchain for details. +GOTOOLCHAIN=auto +``` + +[5] Snippet from the docs: +> toolchain downloads fail for lack of verification if GOSUMDB=off. GOPRIVATE and GONOSUMDB patterns do not apply to the toolchain downloads.
+ +[6] Snippet from the docs: +> A module cache may be used directly as a file proxy: `GOPROXY=file://$(go env GOMODCACHE)/cache/download` + +## Resources +- https://go.dev/doc/toolchain +- https://go.dev/ref/mod#environment-variables + +## Appendix +### Other affecting variables +There are a few other variables that further affect module checksum verification and the download process: +- `GOSUMDB` - Identifies the name of the checksum database to use and optionally its public key and URL; normally defaults to `sum.golang.org`, but on Fedora it's `"off"` [[4]](#references) +- `GONOSUMDB` - Comma-separated list of glob patterns of module path prefixes for which Go should not verify checksums using the checksum database +- `GOPROXY` - a comma-separated list of URLs pointing to module proxies (instead of downloading modules directly from VCS); it supports the following URL schemes: + - `https/http` + - `file` +- `GOPRIVATE` - list of glob patterns of module path prefixes that should be considered private. Acts as a default value for `GONOPROXY` and `GONOSUMDB`. This is useful when no proxy serves private modules +