Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundler design document #565

Merged
merged 1 commit into from
Aug 27, 2024
Merged

Conversation

slimreaper35
Copy link
Member

@slimreaper35 slimreaper35 commented Jul 2, 2024

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

cachi2/core/package_managers/rubygems.py Outdated Show resolved Hide resolved
cachi2/core/models/input.py Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
@slimreaper35 slimreaper35 force-pushed the ruby branch 3 times, most recently from 999eefb to b06b948 Compare July 4, 2024 11:56
Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review (made it to the Checksums section) 🙂

My general observation is that it's kind of hard to derive implementation guidance from the doc because it's not always clear which parts are just general commentary on RubyGems/Bundler and which parts describe the implementation decisions

docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
@slimreaper35 slimreaper35 force-pushed the ruby branch 2 times, most recently from 67d7eca to 9b22d17 Compare July 8, 2024 14:14
@slimreaper35 slimreaper35 changed the title cachi2 ruby PoC cachi2 bundler PoC Jul 8, 2024
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
@slimreaper35 slimreaper35 force-pushed the ruby branch 2 times, most recently from 7089777 to 9064ba5 Compare July 10, 2024 12:28
@slimreaper35 slimreaper35 changed the title cachi2 bundler PoC cachi2 bundler design Jul 10, 2024
@slimreaper35 slimreaper35 changed the title cachi2 bundler design cachi2 rubygems / bundler design Jul 10, 2024
docs/design/rubygems.md Outdated Show resolved Hide resolved
### Checksums

Checksum validation is enabled by default.
It can be disabled with the `BUNDLE_DISABLE_CHECKSUM_VALIDATION` environment variable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand it correctly, bundler does do checksum validation even if there are no CHECKSUMS in the lockfile, but it just validates that the downloaded archive matches the checksum reported by the server (similar to how PyPI reports the expected checksums for archives and we use that to filter out non-matching wheels)

I realize now that we lose even that small validation in Cachito, where we just don't validate any checksums at all. Which is not ideal, but we don't have to fix it right away in cachi2. A follow-up story to implement this checksum validation would be nice though.

docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
@brunoapimentel
Copy link
Contributor

I noticed that it is necessary to add --local to the bundle install in order to successfully perform a offline install. The problem is, this is something we can't really solve just by injecting project files or source env variables.

This would force the user to modify his Dockerfile, which is still something we don't require with Cachi2. I wonder if there's a way to work around that.

@brunoapimentel
Copy link
Contributor

brunoapimentel commented Jul 15, 2024

I did a small test trying to build Porta with this PoC here (while disabling the Yarn deps 🙈).

The prefetch was successful, but the build failed because it was trying to reach out to the Internet to grab the git dependencies. Were you able to successfully prefetch and build any project that has git deps? I think we should enhance our cachi2-rubygems repo to include at least one git dep.

@slimreaper35
Copy link
Member Author

slimreaper35 commented Jul 16, 2024

The prefetch was successful, but the build failed because it was trying to reach out to the Internet to grab the git dependencies.

I know; that was copied from cachito. Git dependencies must have this format (I did not implement it in the PoC):
https://github.com/containerbuildsystem/cachi2/pull/565/files#diff-cf42ff83f7bd882bd043479db1ad05961c5ef7218a816f2640347dd50c08fec8R193

EDIT:

I wonder if there's a way to work around that.

I tried to set export BUNDLE_ALLOW_OFFLINE_INSTALL=true and remove --local, but it ain't work.
I found this workaround => commenting out source "https://rubygems.org" in Gemfilem, but I see a deprecation warning.

@brunoapimentel
Copy link
Contributor

brunoapimentel commented Jul 17, 2024

I tried to set export BUNDLE_ALLOW_OFFLINE_INSTALL=true and remove --local, but it ain't work.
I found this workaround => commenting out source "https://rubygems.org" in Gemfilem, but I see a deprecation warning.

The deprecation warning worries me a little. Also, the official docs seem to indicate that this is a required setting. In any case, I think we can add it as an option in the design doc and leave it up for discussion.

I wonder if turning all dependencies into path dependencies would work? This is a much more invasive change, and I would not like to turn to it, though.

I also noticed that --prefer-local seems to always try to reach out for the internet, although the docs say otherwise. It could be a less invasive change to the Dockerfile, in case it worked.

@brunoapimentel
Copy link
Contributor

I know; that was copied from cachito. Git dependencies must have this format (I did not implement it in the PoC):
https://github.com/containerbuildsystem/cachi2/pull/565/files#diff-cf42ff83f7bd882bd043479db1ad05961c5ef7218a816f2640347dd50c08fec8R193

Oh, I also did a small test with a git dependency and I was able to build the image. The thing that was not clear to me immediatelly was that the repository files need to be unpacked in a folder following the naming format you mentinoned.

Since what the prefetch does is to download a tarball of the repo+ref, it took me some time to figure it out. Maybe we can add a note in the design doc about this?

@slimreaper35 slimreaper35 force-pushed the ruby branch 2 times, most recently from c6401e8 to 7c40dc7 Compare July 18, 2024 09:53
@slimreaper35
Copy link
Member Author

So I dropped PoC from this PR, and with @brunoapimentel, we figured out that using BUNDLE_DEPLOYMENT will solve the issue of relying on users to use the --local option when installing gems. PTAL

Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add steps for how to locally try out the proposed solution?

There have been some fairly significant changes since the last state and I would like to test them

@slimreaper35
Copy link
Member Author

Could you add steps for how to locally try out the proposed solution?

You can play around with my repo:
https://github.com/slimreaper35/cachi2-rubygems

docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
docs/design/rubygems.md Outdated Show resolved Hide resolved
@ben-alkov
Copy link
Member

@slimreaper35 & @brunoapimentel; Seems like we should also have a

#### Out of scope

- plugins
- pre-compiled binaries
- checksum verification
- dev dependencies
- prefetching Bundler

section at the end of the doc, to complete the summary.

docs/design/rubygems.md Outdated Show resolved Hide resolved
- **Gemfile.lock**: A file that locks the versions of gems that are installed for your project. Bundler uses this file
to ensure that the correct versions of gems are installed consistently across different environments.

- **RubyGems**: General package manager for Ruby. Manages installation, updating, and removal of gems globally on your
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: is it correct to call this RubyGem support if only intend to talk to bundler at best (with good chunk of logic done in our code directly)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can simply change this to Bundler. What we are proposing does not work with gem (the RubyGems CLI) standalone.

@slimreaper35 @a-ovchinnikov If you both agree, I can make the changes right away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will rename rubygems to bundler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then I'll update the design doc as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MartinBasti FYI for the cachi2 OSBS compatibility layer

Comment on lines 229 to 245
## II. Overview of the current implementation in Cachito

[cachito/workers/pkg_mangers/rubygems.py](https://github.com/containerbuildsystem/cachito/blob/master/cachito/workers/pkg_managers/rubygems.py)

Most work is already done by parsing the `Gemfile.lock` file, which pins all dependencies to exact versions. The only
supported source for gem dependencies to be fetched from is <https://rubygems.org>. Git dependencies are specified
using a repo URL and pinned to a commit hash. Path dependencies are specified using a local path.

To avoid arbitrary code execution, Bundler **is not used** to download dependencies. Instead, as stated above, Cachito
parses `Gemfile.lock` file directly and download the gems from <https://rubygems.org>.

**Note**: parsing `Gemfile.lock` is done via [gemlock-parser](https://github.com/containerbuildsystem/gemlock-parser),
which is vendored from
[scancode-toolkit](https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/gemfile_lock.py).

Source code for "official" Bundler lockfile parsing in Ruby:
<https://github.com/rubygems/rubygems/blob/master/bundler/lib/bundler/lockfile_parser.rb>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're adding this to cachi2 and yes, much of the logic will be ported, but we don't need to mention cachito here, we're trying hard to drop all references to it ultimately to just mentioning in the main README that "origin of this project was cachito, link here..." but I don't think this should be in the design doc since we probably do end up making different decisions and the implementation will also be different in many cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this specific section brings too much value besides linking the main reference we had, the implementation done in Cachito. So I'd be fine with droping it, but I still think there's value to link previous work, specially considering that there was much research done back then.

Why do you think it's bad to reference Cachito? I think it's ok to say we are taking a different path with Cachi2, and the work done before is merely a reference and a starting point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This,

there was much research done back then

and the fact that we often borrow starting code from Cachito, make it a good idea to at least provide links.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original project is to be sunset in the future with this one being its replacement. I don't see value in putting many references to the old project we don't really want to maintain live connections to, so while I'm okay stating that parts of the code were borrowed, linking at too many places doesn't IMO achieve what we're ultimately trying to do, because the code, yet borrowed, has to be often significantly adapted to this very code base, so providing links to code which will be 75% reworked is IMO misleading and confusing. It should only ever be nothing more than a kudos/credit to where we got inspired and that's it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess we disagree on this point. I do agree that this section as it was did not bring too much value, so I moved the content to a "References" section in the end of the document, and made it shorter to just convey the credits and links to the original project.

@brunoapimentel brunoapimentel force-pushed the ruby branch 2 times, most recently from ca13a30 to 7ad1e79 Compare August 22, 2024 15:06
@brunoapimentel
Copy link
Contributor

New push:

  • fixes typos
  • renames the ending section from "potential follow-ups" to "out of scope"
  • renames the package manager from "rubygems" to "bundler"

@slimreaper35
Copy link
Member Author

Drop the gitignore commit since it is already in a separate PR.

slimreaper35 added a commit to slimreaper35/cachi2 that referenced this pull request Aug 26, 2024
This is the first commit that starts implementation
of bundler package manager introduced in:
containerbuildsystem#565

By adding a package manager to dev section, we expose
it from the user for a period of time until the package
manager is fully supported.

Signed-off-by: Michal Šoltis <[email protected]>
@slimreaper35 slimreaper35 changed the title cachi2 rubygems / bundler design Bundler design document Aug 26, 2024
@brunoapimentel
Copy link
Contributor

/retest

docs/design/bundler.md Outdated Show resolved Hide resolved
Design document that covers the planned implementation of Bundler¹
support in Cachi2. Note that this design doc draws inspiration for the
implementation done in Cachi2, but takes a different approach when it
comes to providing the content for performing the hermetic builds and
for the SBOM data generation.

Since design documents are meant to help all contributors to understand
the implementation and logic behind any feature, we have decided to add
them as part of the repository's documentation, starting from this one.

¹ https://bundler.io/

Signed-off-by: Michal Šoltis <[email protected]>
Co-authored-by: Bruno Pimentel <[email protected]>
@brunoapimentel brunoapimentel added this pull request to the merge queue Aug 27, 2024
Merged via the queue into containerbuildsystem:main with commit 20eb5f0 Aug 27, 2024
14 checks passed
@slimreaper35 slimreaper35 deleted the ruby branch August 28, 2024 06:59
slimreaper35 added a commit to slimreaper35/cachi2 that referenced this pull request Aug 28, 2024
This is the first commit that starts implementation
of bundler package manager introduced in:
containerbuildsystem#565

By adding a package manager to dev section, we expose
it from the user for a period of time until the package
manager is fully supported.

Signed-off-by: Michal Šoltis <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Aug 28, 2024
This is the first commit that starts implementation
of bundler package manager introduced in:
#565

By adding a package manager to dev section, we expose
it from the user for a period of time until the package
manager is fully supported.

Signed-off-by: Michal Šoltis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants