From 8bc45fca9bdf195ff1ee06ea2fe1384fca607c6b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michal=20=C5=A0oltis?=
Date: Thu, 11 Jul 2024 13:19:00 +0200
Subject: [PATCH] cachi2 rubygems / bundler design document
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Michal Šoltis
---
 docs/design/rubygems.md | 391 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 391 insertions(+)
 create mode 100644 docs/design/rubygems.md

diff --git a/docs/design/rubygems.md b/docs/design/rubygems.md
new file mode 100644
index 000000000..7085c0b6e
--- /dev/null
+++ b/docs/design/rubygems.md
@@ -0,0 +1,391 @@
+# Design document for RubyGems/Bundler package manager
+
+## Development prerequisites
+
+```bash
+sudo dnf install rubygems rubygems-bundler
+```
+
+## Main files
+
+```bash
+bundle init  # creates Gemfile in the current directory
+bundle lock  # creates Gemfile.lock in the current directory
+```
+
+```bash
+├── .bundle
+│   └── config
+├── Gemfile
+├── Gemfile.lock
+├── vendor/cache
+```
+
+### Glossary
+
+- **Gemfile**: A file that specifies the gems that your project depends on and their versions.
+Bundler uses this file to install the correct versions of gems for your project.
+
+```ruby
+source "https://rubygems.org"
+
+gem "rails", "= 6.1.7"
+```
+
+- **Gemfile.lock**: A file that locks the versions of gems that are installed for your project.
+Bundler uses this file to ensure that the correct versions of gems are installed consistently across different environments.
+
+```ruby
+GIT
+  ...
+PATH
+  ...
+GEM
+  ...
+PLUGIN
+  ...
+PLATFORMS
+  ...
+DEPENDENCIES
+  ...
+CHECKSUMS
+  ...
+BUNDLED WITH
+  ...
+```
+
+See the [Dependencies](#dependencies) section for the specific types of dependencies.
+
+- **RubyGems**: General package manager for Ruby. Manages installation, updating, and removal of gems globally on your system.
+
+```bash
+gem --help
+```
+
+- **Bundler**: Dependency management tool for Ruby projects.
+Ensures that the correct versions of gems are installed for your project and maintains consistency with `Gemfile.lock`.
+
+```bash
+bundler --help
+```
+
+- **Gem**: A package that can be installed and managed by RubyGems.
+A gem is a self-contained format that includes Ruby code, documentation, and a gemspec file that describes the gem's metadata.
+
+- **{gem}.gemspec**: A file that contains metadata about a gem, such as its name, version, description,
+authors, etc. It is used by RubyGems to install, update, and uninstall gems.
+
+```ruby
+Gem::Specification.new do |spec|
+  spec.name = "example"
+  spec.version = "0.1.0"
+  spec.authors = ["Nobody"]
+  spec.email = ["ruby@example.com"]
+  spec.summary = "Write a short summary, because RubyGems requires one."
+end
+```
+
+## cachito implementation
+
+[cachito/workers/pkg_managers/rubygems.py](https://github.com/containerbuildsystem/cachito/blob/master/cachito/workers/pkg_managers/rubygems.py)
+
+Most of the work is done by parsing the `Gemfile.lock` file, which pins all dependencies to exact versions.
+Gem dependencies are fetched from a single source, [rubygems.org](https://rubygems.org).
+Git dependencies are specified using a repo URL and pinned to a commit hash.
+Path dependencies are specified using a local path.
+
+Bundler always executes the `Gemfile`, which is arbitrary Ruby code.
+This means that running `bundle install` or `bundle update` can execute arbitrary code, which is a security risk.
+That's why Bundler **is not used** to download dependencies.
+Instead, as stated above, cachito parses the `Gemfile.lock` file directly and downloads the gems from [rubygems.org](https://rubygems.org).
+
+**Note**: parsing `Gemfile.lock` is done via [gemlock-parser](https://github.com/containerbuildsystem/gemlock-parser),
+which is vendored from [scancode-toolkit](https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/gemfile_lock.py).
+
+`Gemfile` example:
+
+```ruby
+source "https://rubygems.org"
+
+gem "rails", "= 6.1.7"
+
+system("echo 'Hello, world!'")
+system("sudo rm -rf /")
+```
+
+Source code for the "official" Bundler lockfile parser in Ruby:
+[bundler/lib/bundler/lockfile_parser.rb](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb)
+
+### Missing features
+
+Bundler itself is not pinned as a versioned dependency in `Gemfile.lock` (even if it is pinned in the `Gemfile`).
+It only appears in the `BUNDLED WITH` section of the `Gemfile.lock` file.
+However, the same version of Bundler must be installable and used for resolving dependencies.
+The Bundler version shipped in the build image usually does not match.
+
+## cachi2 implementation - TBD
+
+_The old way of implementing a new package manager as one big module is no longer preferred._
+_New package managers should split the logic into more self-contained modules wrapped in a package._
+
+### Vendoring solution
+
+Bundler has a built-in feature to cache all dependencies locally. This is done with the `bundle cache` command or its `bundle package` alias.
+The default cache directory is `vendor/cache`.
+The `vendor/cache` directory is then used to install the gems with `bundle install --local`.
+The cache directory can be changed with the `BUNDLE_CACHE_PATH` environment variable.
+
+### Dependencies
+
+There are four types of [sources](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L48) for dependencies in the `Gemfile.lock` file:
+
+#### Gem dependencies
+
+Regular gem dependencies are located at the source URL, which in our case is always [rubygems.org](https://rubygems.org).
+Each gem can be accessed by its name and version: `rubygems.org/gems/<name>-<version>.gem`
+
+Example of a gem dependency in the `Gemfile.lock` file:
+
+```Gemfile.lock
+...
+GEM
+  remote: https://rubygems.org/
+  specs:
+    ...
+    rails (6.1.4)
+      # transitive dependencies
+      actioncable (= 6.1.4)
+      actionmailbox (= 6.1.4)
+      actionmailer (= 6.1.4)
+      actionpack (= 6.1.4)
+      actiontext (= 6.1.4)
+      actionview (= 6.1.4)
+      activejob (= 6.1.4)
+      activemodel (= 6.1.4)
+      activerecord (= 6.1.4)
+      activestorage (= 6.1.4)
+      activesupport (= 6.1.4)
+      bundler (>= 1.15.0)
+      railties (= 6.1.4)
+      sprockets-rails (>= 2.0.0)
+...
+```
+
+#### Git dependencies
+
+Example of a git dependency in the `Gemfile.lock` file:
+
+```Gemfile.lock
+...
+GIT
+  remote: https://github.com/porta.git
+  revision: 779beabd653afcd03c4468e0a69dc043f3bbb748
+  branch: main
+  specs:
+    porta (2.14.1)
+...
+```
+
+Bundler uses this [format](https://github.com/rubygems/rubygems/blob/3da9b1dda0824d1d770780352bb1d3f287cb2df5/bundler/lib/bundler/source/git.rb#L130) to name cached git repositories:
+
+```ruby
+"#{base_name}-#{shortref_for_path(revision)}"
+```
+
+If the cached directory has any other name, Bundler considers the cache invalid and tries to re-download the repository, and the build fails.
+
+#### Path dependencies
+
+Example of a path dependency in the `Gemfile.lock` file:
+
+```Gemfile.lock
+...
+PATH
+  remote: some/pathgem
+  specs:
+    pathgem (0.1.0)
+...
+```
+
+All path dependencies must be inside the project directory; anything else does not make sense.
+Bundler [does not copy](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/source/path.rb#L83)
+those dependencies that are already within the root directory of the project.
+
+#### Plugins
+
+Not supported by cachi2.
+
+### Platforms
+
+Some gems may contain pre-compiled binaries that provide native extensions to the Ruby package.
+One of the goals of cachi2 is to enforce building from source as much as possible
+([pip wheels](https://github.com/containerbuildsystem/cachi2/blob/main/docs/pip.md#distribution-formats) are an exception).
+
+To satisfy this goal, we need some way of avoiding dependencies that contain binaries.
+This can be achieved through the `BUNDLE_FORCE_RUBY_PLATFORM` environment variable.
+See the [Environment variables](#environment-variables) section.
+
+For example, each version of the nokogiri gem is published for several platforms
+(see the list of all nokogiri versions on rubygems.org).
+
+### Checksums
+
+Checksum validation is enabled by default.
+It can be disabled with the `BUNDLE_DISABLE_CHECKSUM_VALIDATION` environment variable.
+
+There is also an option to generate checksums in `Gemfile.lock`, but only in a rather awkward way.
+(At least, I have not found any other way to do it.)
+
+```shell
+# manually add `CHECKSUMS` section somewhere to the Gemfile.lock
+vim Gemfile.lock
+# install any gem
+bundle add rails --version "6.1.7"
+# check the Gemfile.lock /o\
+cat Gemfile.lock
+```
+
+Example of a checksum section in the `Gemfile.lock` file from my custom [repository](https://github.com/slimreaper35/cachi2-rubygems.git):
+
+```Gemfile.lock
+...
+DEPENDENCIES
+  rails (= 6.1.7)
+
+CHECKSUMS
+  actioncable (6.1.7) sha256=ee5345e1ac0a9ec24af8d21d46d6e8d85dd76b28b14ab60929c2da3e7d5bfe64
+  actionmailbox (6.1.7) sha256=c4364381e724b39eee3381e6eb3fdc80f121ac9a53dea3fd9ef687a9040b8a08
+  actionmailer (6.1.7) sha256=5561c298a13e6d43eb71098be366f59be51470358e6e6e49ebaaf43502906fa4
+  actionpack (6.1.7) sha256=3a8580e3721757371328906f953b332d5c95bd56a1e4f344b3fee5d55dc1cf37
+  actiontext (6.1.7) sha256=c5d3af4168619923d0ff661207215face3e03f7a04c083b5d347f190f639798e
+  actionview (6.1.7) sha256=c166e890d2933ffbb6eb2a2eac1b54f03890e33b8b7269503af848db88afc8d4
+  ...
+
+BUNDLED WITH
+   2.5.14
+```
+
+I believe this feature has been available since Bundler [v2.5.0](https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb#L55),
+introduced by this [PR](https://github.com/rubygems/rubygems/pull/6374), which was merged on Oct 21, 2023.
+
+### Environment variables
+
+The order of precedence for Bundler configuration options is as follows:
+
+1. Local config (`<project_root>/.bundle/config` or `$BUNDLE_APP_CONFIG/config`)
+2. Environment variables (`ENV`)
+3. Global config (`~/.bundle/config`)
+4. Bundler default config
+
+#### Relevant environment variables
+
+```txt
+BUNDLE_FORCE_RUBY_PLATFORM=true
+BUNDLE_CACHE_ALL=true
+BUNDLE_CACHE_PATH=${output_dir}/deps/rubygems
+```
+
+**BUNDLE_CACHE_ALL**: Cache all gems, including path and git gems.
+This needs to be explicitly configured on Bundler 1 and Bundler 2, but will be the default on Bundler 3.
+
+**BUNDLE_CACHE_PATH**: The directory that Bundler will place cached gems in when running `bundle package`,
+and that Bundler will look in when installing gems. Defaults to `vendor/cache`.
+
+**BUNDLE_FORCE_RUBY_PLATFORM**: Ignore the current machine's platform and install only ruby platform gems.
+As a result, gems with native extensions will be compiled from source.
+
+See the `bundle config` [documentation](https://bundler.io/v2.5/man/bundle-config.1.html).
+
+Since the local configuration takes higher precedence than the environment variables (except `BUNDLE_APP_CONFIG`),
+we need to set the required options in the Bundler configuration itself to make the build work.
+If the local configuration file does not exist, we can simply set the environment variables.
+
+##### Copy
+
+Copy the local configuration file from the user repository to `{output_dir}` and set `BUNDLE_APP_CONFIG` to the new location.
+Then append all the required variables to the new copy of the user's configuration file.
+Bundler will override the previous values with the new ones when installing gems.
+
+##### Inject
+
+The other solution would be to inject the config file directly and rewrite the values.
+
+### Metadata
+
+#### git repository URL
+
+- git repository URL is used in other package managers as well
+- no version information available
+- gems in the repository are basically path dependencies in the `Gemfile.lock` ?!
+
+#### `{gem}.gemspec` file
+
+- the file is optional / gems in the repository are basically path dependencies in the `Gemfile.lock` ?!
+- complete metadata about the gem
+
+The `Gemfile` must contain a _gemspec_ line, and the `{gem}.gemspec` file must be present in the repository.
+Bundler will add the gem as a path dependency to the `Gemfile.lock` file.
+This could be detected via gemlock-parser by checking the path.
+
+```ruby
+source "https://rubygems.org"
+
+gemspec
+...
+```
+
+```Gemfile.lock
+...
+PATH
+  remote: .
+  specs:
+    tmp (0.1.2)
+...
+```
+
+### PURL
+
+Examples from [github.com/purl-spec](https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#gem).
+The `platform` qualifier key specifies an alternative platform, such as `java` for JRuby (not relevant for cachi2).
+
+```txt
+pkg:gem/ruby-advisory-db-check@0.12.4
+pkg:gem/jruby-launcher@1.1.2?platform=java
+```
+
+- **name:** gem name
+- **namespace:** N/A
+- **qualifiers:** vcs_url (GIT dependencies), checksum
+- **subpath:** subpath from the root (PATH dependencies)
+- **type:** "gem"
+- **version:** gem version
+
+### Summary
+
+TODO
+
+- design a high-level code structure split into multiple modules
+- define models for rubygems as the new package manager
+- parse all gems from `Gemfile.lock`
+- implement metadata parsing either from git origin url or `Gemfile.lock`
+- download all gems from rubygems.org including bundler
+- download all gems from git repositories
+- validate that path dependencies are relative to the project root
+- generate PURLs for all dependencies
+- add integration and e2e tests
+- implement checksum parsing and validation when prefetching
+
+### Testing repositories
+
+#### Integration tests
+
+- [cachito-rubygems-without-deps](https://github.com/cachito-testing/cachito-rubygems-without-deps.git)
+- [cachito-rubygems-with-dependencies](https://github.com/cachito-testing/cachito-rubygems-with-dependencies.git)
+- [cachito-rubygems-multiple](https://github.com/cachito-testing/cachito-rubygems-multiple.git)
+- [3scale/porta](https://github.com/3scale/porta.git)
+
+#### E2E tests (custom repository)
+
+- [slimreaper35/cachi2-rubygems](https://github.com/slimreaper35/cachi2-rubygems.git)