cachi2 rubygems / bundler design document
Signed-off-by: Michal Šoltis <[email protected]>
slimreaper35 committed Jul 15, 2024
1 parent f57bb26 commit 516c402
390 changes: 390 additions & 0 deletions docs/design/rubygems.md
# Design document for RubyGems/Bundler package manager

## Development prerequisites

```bash
sudo dnf install rubygems rubygems-bundler
```

## Main files

```bash
bundle init # creates Gemfile in the current directory
bundle lock # creates Gemfile.lock in the current directory
```

```bash
├── .bundle
│   └── config
├── Gemfile
├── Gemfile.lock
└── vendor/cache
```

### Glossary

- **Gemfile**: A file that specifies the gems that your project depends on and their versions.
Bundler uses this file to install the correct versions of gems for your project.

```ruby
source "https://rubygems.org"

gem "rails", "= 6.1.7"
```

- **Gemfile.lock**: A file that locks the versions of gems that are installed for your project.
Bundler uses this file to ensure that the correct versions of gems are installed consistently across different environments.

```ruby
GIT
...
PATH
...
GEM
...
PLUGIN
...
PLATFORMS
...
DEPENDENCIES
...
CHECKSUMS
...
BUNDLED WITH
...
```

See dependencies [section](#dependencies) for specific types of dependencies.

- **RubyGems**: General package manager for Ruby. Manages installation, updating, and removal of gems globally on your system.

```bash
gem --help
```

- **Bundler**: Dependency management tool for Ruby projects.
Ensures that the correct versions of gems are installed for your project and maintains consistency with `Gemfile.lock`.

```bash
bundler --help
```

- **Gem**: A package that can be installed and managed by RubyGems.
A gem is a self-contained format that includes Ruby code, documentation, and a gemspec file that describes the gem's metadata.

- **{gem}.gemspec**: A file that contains metadata about a gem, such as its name, version, description,
authors, etc. It is used by RubyGems to install, update, and uninstall gems.

```ruby
Gem::Specification.new do |spec|
  spec.name = "example"
  spec.version = "0.1.0"
  spec.authors = ["Nobody"]
  spec.email = ["[email protected]"]
  spec.summary = "Write a short summary, because RubyGems requires one."
end
```

## cachito implementation

[cachito/workers/pkg_managers/rubygems.py](https://github.com/containerbuildsystem/cachito/blob/master/cachito/workers/pkg_managers/rubygems.py)

The majority of work is already done by parsing the `Gemfile.lock` file, which pins all dependencies to exact versions.
Gem dependencies are fetched from a single source, <https://rubygems.org>.
Git dependencies are specified using a repo URL and pinned to a commit hash.
Path dependencies are specified using a local path.

Bundler always executes the `Gemfile`, which is arbitrary Ruby code.
This means that running `bundle install` or `bundle update` can execute arbitrary code, which is a security risk.
That's why bundler **is not used** to download dependencies.
Instead, as stated above, cachito parses the `Gemfile.lock` file directly and downloads the gems from <https://rubygems.org>.

**Note**: parsing `Gemfile.lock` is done via [gemlock-parser](https://github.com/containerbuildsystem/gemlock-parser),
which is vendored from [scancode-toolkit](https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/gemfile_lock.py).

`Gemfile` example:

```ruby
source "https://rubygems.org"

gem "rails", "= 6.1.7"

system("echo 'Hello, world!'")
system("sudo rm -rf /")
```

Source code for "official" bundler lockfile parsing in Ruby:

<https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb>
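
To illustrate the kind of parsing involved, here is a minimal Python sketch that collects the pinned specs per section. It is not the gemlock-parser API: `parse_specs` and both regular expressions are hypothetical, and the sketch assumes that pinned specs are indented by exactly four spaces while their transitive dependencies are indented by six.

```python
import re
from collections import defaultdict

# Top-level section headers are unindented, upper-case lines such as "GEM" or "BUNDLED WITH".
SECTION_RE = re.compile(r"^[A-Z][A-Z ]*$")
# Assumption: pinned specs sit exactly four spaces deep, e.g. "    rails (6.1.7)".
SPEC_RE = re.compile(r"^ {4}(?P<name>\S+) \((?P<version>[^)]+)\)$")


def parse_specs(lockfile_text: str) -> dict[str, list[tuple[str, str]]]:
    """Return {section: [(name, version), ...]} for the GEM, GIT and PATH sections."""
    specs = defaultdict(list)
    section = None
    for line in lockfile_text.splitlines():
        if SECTION_RE.match(line):
            section = line
            continue
        match = SPEC_RE.match(line)
        if match and section in {"GEM", "GIT", "PATH"}:
            specs[section].append((match["name"], match["version"]))
    return dict(specs)


EXAMPLE = """\
GEM
  remote: https://rubygems.org/
  specs:
    rails (6.1.7)
      actioncable (= 6.1.7)
    rake (13.0.6)

PATH
  remote: some/pathgem
  specs:
    pathgem (0.1.0)

BUNDLED WITH
   2.5.14
"""

print(parse_specs(EXAMPLE))
```

Transitive dependency lines such as `actioncable (= 6.1.7)` are skipped on purpose, because every one of them also appears as its own pinned spec elsewhere in a complete lockfile.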

### Missing features

Bundler itself is not listed as a versioned dependency in the `Gemfile.lock` (even if it is pinned in the `Gemfile`).
Its version appears only in the `BUNDLED WITH` section of the `Gemfile.lock` file.
However, it is important that the same version of bundler can be installed and used for resolving dependencies.
The bundler shipped in the build image usually does not match that version.
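
A minimal Python sketch of how the bundler version could be read from the lockfile, so that it can be prefetched like any other gem. The function name `bundled_with` is hypothetical, and the sketch assumes the version is on the line directly after the `BUNDLED WITH` header.

```python
from typing import Optional


def bundled_with(lockfile_text: str) -> Optional[str]:
    """Return the version listed under BUNDLED WITH, or None if the section is absent."""
    lines = lockfile_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "BUNDLED WITH" and index + 1 < len(lines):
            # Assumption: the version sits on the line right after the header.
            version = lines[index + 1].strip()
            return version or None
    return None


print(bundled_with("GEM\n  specs:\n\nBUNDLED WITH\n   2.5.14\n"))
```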

## cachi2 implementation - TBD

_The old way of implementing new package managers as one big module is no longer preferred._
_New package managers should split the logic into more self-contained modules wrapped in a package._

### Vendoring solution

Bundler has a built-in feature to cache all dependencies locally. This is done with the `bundle cache` command or its `bundle package` alias.
The default cache directory is `vendor/cache`.
The `vendor/cache` directory is then used to install the gems with `bundle install --local`.
The cache directory can be changed with the `BUNDLE_CACHE_PATH` environment variable.

### Dependencies

There are four types of [sources](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L48) for dependencies in the `Gemfile.lock` file:

#### Gem dependencies

Regular gem dependencies are located at the source URL, in our case always <https://rubygems.org>.
Each gem can be accessed by its name and version: `rubygems.org/gems/<name>-<version>.gem`

Example of a gem dependency in the `Gemfile.lock` file:

```Gemfile.lock
...
GEM
  remote: https://rubygems.org/
  specs:
    ...
    rails (6.1.4)
      # transitive dependencies
      actioncable (= 6.1.4)
      actionmailbox (= 6.1.4)
      actionmailer (= 6.1.4)
      actionpack (= 6.1.4)
      actiontext (= 6.1.4)
      actionview (= 6.1.4)
      activejob (= 6.1.4)
      activemodel (= 6.1.4)
      activerecord (= 6.1.4)
      activestorage (= 6.1.4)
      activesupport (= 6.1.4)
      bundler (>= 1.15.0)
      railties (= 6.1.4)
      sprockets-rails (>= 2.0.0)
...
```
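
Given a name and a pinned version, the download location follows directly from the URL pattern described above. A minimal Python sketch, where `gem_download_url` and `RUBYGEMS_URL` are hypothetical names:

```python
RUBYGEMS_URL = "https://rubygems.org"


def gem_download_url(name: str, version: str, source: str = RUBYGEMS_URL) -> str:
    """Build the <source>/gems/<name>-<version>.gem URL for a pinned gem."""
    return f"{source.rstrip('/')}/gems/{name}-{version}.gem"


print(gem_download_url("rails", "6.1.4"))
```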

#### Git dependencies

Example of a git dependency in the `Gemfile.lock` file:

```Gemfile.lock
...
GIT
  remote: https://github.com/porta.git
  revision: 779beabd653afcd03c4468e0a69dc043f3bbb748
  branch: main
  specs:
    porta (2.14.1)
...
```

Bundler uses this [format](https://github.com/rubygems/rubygems/blob/3da9b1dda0824d1d770780352bb1d3f287cb2df5/bundler/lib/bundler/source/git.rb#L130) to cache git repositories:

```ruby
"#{base_name}-#{shortref_for_path(revision)}"
```

Any other format causes bundler to re-download the repository, which invalidates the cache and makes the build fail.
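
A Python sketch of the same naming scheme, useful when placing a prefetched clone into the cache. It rests on two assumptions that should be checked against the linked Bundler source: that `shortref_for_path` keeps the first 12 characters of the revision, and that `base_name` is the last path component of the remote without its `.git` suffix. The function name `git_cache_dir_name` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def git_cache_dir_name(remote: str, revision: str) -> str:
    """Approximate Bundler's "#{base_name}-#{shortref_for_path(revision)}"."""
    # Assumption: base_name is the last URL path component without ".git".
    base_name = PurePosixPath(urlparse(remote).path).name
    if base_name.endswith(".git"):
        base_name = base_name[: -len(".git")]
    # Assumption: the short ref is the first 12 characters of the commit hash.
    return f"{base_name}-{revision[:12]}"


print(git_cache_dir_name(
    "https://github.com/porta.git",
    "779beabd653afcd03c4468e0a69dc043f3bbb748",
))
```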

#### Path dependencies

Example of a path dependency in the `Gemfile.lock` file:

```Gemfile.lock
...
PATH
  remote: some/pathgem
  specs:
    pathgem (0.1.0)
...
```

All path dependencies must be located inside the project directory; any other location is considered invalid.
Bundler [does not copy](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/source/path.rb#L83)
those dependencies that are already within the root directory of the project.
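
One possible way to enforce this rule is to resolve the `remote:` value against the project root and reject anything that escapes it. A minimal Python sketch; `is_within_project` is a hypothetical helper, not existing cachi2 code.

```python
from pathlib import Path


def is_within_project(project_root: Path, dependency_path: str) -> bool:
    """Return True if a PATH `remote:` value resolves to a location inside project_root."""
    root = project_root.resolve()
    # Joining with an absolute dependency_path discards root, so "/etc" is rejected too.
    candidate = (root / dependency_path).resolve()
    return candidate == root or root in candidate.parents


root = Path("/tmp/project")
print(is_within_project(root, "some/pathgem"))
print(is_within_project(root, "../outside"))
```

Resolving both paths first means that `..` segments and symlinks are taken into account before the comparison.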

#### Plugins

Not supported by cachi2.

### Platforms

Some gems may contain pre-compiled binaries that provide native extensions to the Ruby package.
One of the goals of cachi2 is to enforce building from source as much as possible
([pip wheels](https://github.com/containerbuildsystem/cachi2/blob/main/docs/pip.md#distribution-formats) are an exception).

To satisfy this goal, we need some way of avoiding dependencies that contain binaries.
This can be achieved through the `BUNDLE_FORCE_RUBY_PLATFORM` environment variable.
See environment variables [section](#environment-variables).

For example, see all versions (platforms) of the nokogiri gem:

<https://rubygems.org/gems/nokogiri/versions/>

### Checksums

Checksum validation is enabled by default.
It can be disabled with the `BUNDLE_DISABLE_CHECKSUM_VALIDATION` environment variable.

There is also an option to generate checksums in `Gemfile.lock`, but only in a rather awkward way.
(At least, I have not found any other way to do it.)

```shell
# manually add `CHECKSUMS` section somewhere to the Gemfile.lock
vim Gemfile.lock
# install any gem
bundle add rails --version "6.1.7"
# check the Gemfile.lock /o\
cat Gemfile.lock
```

Example of a checksum section in the `Gemfile.lock` file from my custom [repository](https://github.com/slimreaper35/cachi2-rubygems.git):

```Gemfile.lock
...
DEPENDENCIES
  rails (= 6.1.7)
CHECKSUMS
  actioncable (6.1.7) sha256=ee5345e1ac0a9ec24af8d21d46d6e8d85dd76b28b14ab60929c2da3e7d5bfe64
  actionmailbox (6.1.7) sha256=c4364381e724b39eee3381e6eb3fdc80f121ac9a53dea3fd9ef687a9040b8a08
  actionmailer (6.1.7) sha256=5561c298a13e6d43eb71098be366f59be51470358e6e6e49ebaaf43502906fa4
  actionpack (6.1.7) sha256=3a8580e3721757371328906f953b332d5c95bd56a1e4f344b3fee5d55dc1cf37
  actiontext (6.1.7) sha256=c5d3af4168619923d0ff661207215face3e03f7a04c083b5d347f190f639798e
  actionview (6.1.7) sha256=c166e890d2933ffbb6eb2a2eac1b54f03890e33b8b7269503af848db88afc8d4
  ...
BUNDLED WITH
   2.5.14
```

I believe this feature has been available since bundler [v2.5.0](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L55),
introduced by this [PR](https://github.com/rubygems/rubygems/pull/6374), which was merged on Oct 21, 2023.
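
When a `CHECKSUMS` section is present, the prefetch step could compare each downloaded gem against it. The following Python sketch assumes the `name (version) sha256=<hex digest>` line format shown in the example above; `parse_checksums` and `verify_gem` are hypothetical names.

```python
import hashlib
import re

# Assumption: entries are indented and look like "name (version) sha256=<64 hex chars>".
CHECKSUM_RE = re.compile(
    r"^\s+(?P<name>\S+) \((?P<version>[^)]+)\) sha256=(?P<digest>[0-9a-f]{64})$"
)


def parse_checksums(lockfile_text: str) -> dict[tuple[str, str], str]:
    """Map (name, version) to the sha256 digest listed in the CHECKSUMS section."""
    checksums = {}
    in_section = False
    for line in lockfile_text.splitlines():
        if line and not line.startswith(" "):
            # An unindented line starts a new top-level section.
            in_section = line == "CHECKSUMS"
            continue
        match = CHECKSUM_RE.match(line)
        if in_section and match:
            checksums[(match["name"], match["version"])] = match["digest"]
    return checksums


def verify_gem(content: bytes, expected_sha256: str) -> bool:
    """Compare the sha256 digest of a downloaded .gem file with the expected value."""
    return hashlib.sha256(content).hexdigest() == expected_sha256
```

Entries without a `sha256=` digest are simply ignored by this sketch, so a missing checksum would have to be handled by the caller.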

### Environment variables

The order of precedence for bundler configuration options is as follows:

1. Local config (`<project_root>/.bundle/config` or `$BUNDLE_APP_CONFIG/config`)
2. Environment variables (ENV)
3. Global config (`~/.bundle/config`)
4. Bundler default config
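
The precedence order can be modelled as a lookup that returns the first source defining a key. This Python sketch only illustrates the ordering listed above and does not reflect how Bundler is implemented; `resolve_setting` and its parameters are hypothetical.

```python
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    local: Mapping[str, str],
    env: Mapping[str, str],
    global_config: Mapping[str, str],
    defaults: Mapping[str, str],
) -> Optional[str]:
    """Return the value from the highest-precedence source that defines `key`."""
    for source in (local, env, global_config, defaults):
        if key in source:
            return source[key]
    return None


# A value in the local config wins over the environment variable.
print(resolve_setting(
    "BUNDLE_CACHE_PATH",
    local={"BUNDLE_CACHE_PATH": "vendor/cache"},
    env={"BUNDLE_CACHE_PATH": "/output/deps/rubygems"},
    global_config={},
    defaults={"BUNDLE_CACHE_PATH": "vendor/cache"},
))
```

The usage example shows the case that matters for prefetching: a value in the user's local config shadows the same variable set in the environment.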

#### Relevant environment variables

```txt
BUNDLE_FORCE_RUBY_PLATFORM=true
BUNDLE_CACHE_ALL=true
BUNDLE_CACHE_PATH=${output_dir}/deps/rubygems
```

**BUNDLE_CACHE_ALL**: Cache all gems, including path and git gems.
This needs to be explicitly configured on bundler 1 and bundler 2, but will be the default on bundler 3.

**BUNDLE_CACHE_PATH**: The directory that bundler will place cached gems in when running `bundle package`,
and that bundler will look in when installing gems. Defaults to `vendor/cache`.

**BUNDLE_FORCE_RUBY_PLATFORM**: Ignore the current machine's platform and install only ruby platform gems.
As a result, gems with native extensions will be compiled from source.

See bundle config [documentation](https://bundler.io/v2.5/man/bundle-config.1.html).

Since the local configuration takes higher precedence than the environment variables (except `BUNDLE_APP_CONFIG`),
environment variables alone are not enough: the required options must be set in the bundler configuration itself to make the build work.
If the local configuration file does not exist, setting the environment variables is sufficient.

##### Copy

Copy the local configuration file from the user repository to `{output_dir}` and set `BUNDLE_APP_CONFIG` to the new location.
Then append all the required options to the new copy of the user configuration file.
When installing gems, bundler will override the previous values with the new ones.
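
A rough Python sketch of the copy approach. The target directory `{output_dir}/bundler`, the helper name `prepare_bundler_config`, and the `KEY: "value"` line format of the config file are assumptions made for illustration. The sketch also relies on the behaviour described above, namely that later entries in the file take effect over earlier ones.

```python
from pathlib import Path

# Options taken from the "Relevant environment variables" list.
OVERRIDES = {
    "BUNDLE_FORCE_RUBY_PLATFORM": "true",
    "BUNDLE_CACHE_ALL": "true",
}


def prepare_bundler_config(project_root: Path, output_dir: Path) -> dict[str, str]:
    """Copy the user's .bundle/config, append our options, and return the env to set."""
    source = project_root / ".bundle" / "config"
    target_dir = output_dir / "bundler"  # hypothetical location for the copy
    target_dir.mkdir(parents=True, exist_ok=True)

    existing = source.read_text() if source.exists() else "---\n"
    if not existing.endswith("\n"):
        existing += "\n"

    overrides = dict(OVERRIDES, BUNDLE_CACHE_PATH=str(output_dir / "deps" / "rubygems"))
    # Assumption: the config file uses one `KEY: "value"` entry per line.
    appended = "".join(f'{key}: "{value}"\n' for key, value in overrides.items())
    (target_dir / "config").write_text(existing + appended)

    # BUNDLE_APP_CONFIG points bundler at the directory holding the new copy.
    return {"BUNDLE_APP_CONFIG": str(target_dir)}
```

The user's original file is left untouched, which is the main advantage of copying over editing in place.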

##### Inject

The other solution would be to inject the options into the config file directly and overwrite the existing values.

### Metadata

#### git repository URL

- the git repository URL is used in other package managers as well
- no version information is available
- gems in the repository are basically path dependencies in the `Gemfile.lock` (open question)

#### `{gem}.gemspec` file

- the file is optional; gems in the repository are basically path dependencies in the `Gemfile.lock` (open question)
- contains complete metadata about the gem
- complete metadata about the gem

The `Gemfile` must contain a `gemspec` line and the `{gem}.gemspec` file must be present in the repository.
Bundler will then add the gem as a path dependency to the `Gemfile.lock` file.
This could be detected via gemlock-parser by checking the path.

```ruby
source "https://rubygems.org"

gemspec
...
```

```Gemfile.lock
...
PATH
  remote: .
  specs:
    tmp (0.1.2)
...
```
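
Checking the path could look like the following Python sketch: among the parsed PATH dependencies, the one whose `remote:` is the project root itself describes the main package. The `PathDependency` type and `find_main_package` are hypothetical and not part of gemlock-parser.

```python
from pathlib import PurePosixPath
from typing import Iterable, NamedTuple, Optional


class PathDependency(NamedTuple):
    name: str
    version: str
    remote: str  # value of the `remote:` key in the PATH section


def find_main_package(dependencies: Iterable[PathDependency]) -> Optional[PathDependency]:
    """Return the PATH dependency that points at the project root, if any."""
    for dependency in dependencies:
        # "." and "./" both compare equal to the current directory.
        if PurePosixPath(dependency.remote) == PurePosixPath("."):
            return dependency
    return None


deps = [
    PathDependency("pathgem", "0.1.0", "some/pathgem"),
    PathDependency("tmp", "0.1.2", "."),
]
print(find_main_package(deps))
```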

### PURL

Examples from [github.com/purl-spec](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#gem).
The `platform` qualifier key is used to specify an alternative platform, such as `java` for JRuby (not relevant for cachi2).

```txt
pkg:gem/ruby-advisory-db-check@0.12.4
pkg:gem/jruby-launcher@1.1.2?platform=java
```

- **name:** gem name
- **namespace:** N/A
- **qualifiers:** vcs_url (GIT dependencies), checksum
- **subpath:** subpath from the root (PATH dependencies)
- **type:** "gem"
- **version:** gem version
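
The components listed above could be assembled as in the following Python sketch. The percent-encoding is deliberately simplified (a dedicated PURL library would normally take care of it), and `gem_purl` as well as the exact `vcs_url` value layout are assumptions for illustration.

```python
from typing import Mapping, Optional
from urllib.parse import quote


def gem_purl(
    name: str,
    version: str,
    qualifiers: Optional[Mapping[str, str]] = None,
    subpath: Optional[str] = None,
) -> str:
    """Build a pkg:gem PURL from a name, a version and optional qualifiers/subpath."""
    purl = f"pkg:gem/{quote(name, safe='')}@{quote(version, safe='')}"
    if qualifiers:
        # Qualifiers are sorted by key; ":" and "/" are left unencoded in values.
        pairs = "&".join(
            f"{key}={quote(value, safe=':/')}" for key, value in sorted(qualifiers.items())
        )
        purl += f"?{pairs}"
    if subpath:
        purl += f"#{quote(subpath.strip('/'), safe='/')}"
    return purl


print(gem_purl("rails", "6.1.7"))
print(gem_purl("pathgem", "0.1.0", subpath="some/pathgem"))
```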

### Summary

- define models for rubygems as the new package manager
- design high level code structure into multiple modules
- parse all gems from `Gemfile.lock`
- implement metadata parsing either from git origin url or `Gemfile.lock`
- download all gems from rubygems.org including bundler
- download all gems from git repositories
- validate path dependencies are relative to the project root
- generate PURLs for all dependencies
- add integration and e2e tests
- add documentation
- implement checksums parsing and validation when prefetching

### Testing repositories

#### Integration tests

- [cachito-rubygems-without-deps](https://github.com/cachito-testing/cachito-rubygems-without-deps.git)
- [cachito-rubygems-with-dependencies](https://github.com/cachito-testing/cachito-rubygems-with-dependencies.git)
- [cachito-rubygems-multiple](https://github.com/cachito-testing/cachito-rubygems-multiple.git)
- [3scale/porta](https://github.com/3scale/porta.git)

#### E2E tests (custom repository)

- [slimreaper35/cachi2-rubygems](https://github.com/slimreaper35/cachi2-rubygems.git)
