feat: support flat repos #90


@jjmaestro jjmaestro commented Sep 19, 2024

Note

Stacked on top of #89

Fixes issue #56

Follow-up and credit to @alexconrey (PR #55), @ericlchen1 (PR #64) and @benmccown (PR #67) for their work on similar PRs, which I've reviewed and drawn some inspiration from to create "one 💍 PR to merge them all" 😅

Problem

Debian has two types of repos: "canonical" and "flat". Each has a different sources.list syntax:

"canonical": (see https://wiki.debian.org/DebianRepository/Format#Overview)

```
deb uri distribution [component1] [component2] [...]
```

flat: (see https://wiki.debian.org/DebianRepository/Format#Flat_Repository_Format)

```
deb uri directory/
```

Per the spec,

A flat repository does not use the dists hierarchy of directories, and instead places meta index and indices directly into the archive root (or some part below it)

Thus, the URL logic in _fetch_package_index() is incorrect for these repos and it always fails to fetch the Packages index.

Solution

Just use the Debian sources.list convention in the sources section of the manifest to add canonical and flat repos. Depending on whether the channel is a single directory that ends in '/' or has a (dist, component, ...) structure, _fetch_package_index() and the other internal logic will know whether the source is a canonical or a flat repo.

For example:

```yaml
version: 1

sources:
  # canonical repo
  - channel: bullseye main contrib
    url: https://snapshot-cloudflare.debian.org/archive/debian/20240210T223313Z
  # flat repos, note the trailing '/' and the lack of distribution or components
  - channel: bullseye-cran40/
    url: https://cloud.r-project.org/bin/linux/debian
  - channel: ubuntu2404/x86_64/
    url: https://developer.download.nvidia.com/compute/cuda/repos

archs:
  - amd64

packages:
  - bash
  - r-mathlib
  - nvidia-container-toolkit-base
```

Disregarding the "mixing" of Ubuntu and Debian repos, which is only there for the purpose of the example, this manifest shows that you can mix canonical and flat repos, and that you can mix multiarch and single-arch flat repos with canonical repos.

You will still have the same problems as before with packages that only exist for one architecture and/or repos that only support one architecture. In those cases, simply separate the repos and packages into their own manifests.
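
As a rough illustration of the channel convention described above, here is a minimal sketch in Python (the PR itself is Starlark). The names `parse_channel` and `index_urls` are hypothetical and not taken from the PR; the URL layouts follow the Debian repository format (`dists/<dist>/<component>/binary-<arch>/` for canonical repos, the directory itself for flat repos), and `Packages.xz` is just an assumed index filename.

```python
def parse_channel(channel):
    """Classify a manifest channel as a flat or a canonical source.

    Hypothetical helper: a single token ending in '/' is treated as flat,
    'dist component [component ...]' is treated as canonical.
    """
    parts = channel.split()
    if len(parts) == 1 and parts[0].endswith("/"):
        return {"kind": "flat", "directory": parts[0].strip("/")}
    if len(parts) >= 2:
        return {"kind": "canonical", "dist": parts[0], "components": parts[1:]}
    raise ValueError("unsupported channel: %r" % channel)


def index_urls(url, channel, arch, filename="Packages.xz"):
    """Build the candidate Packages index URLs for one source and one arch."""
    source = parse_channel(channel)
    base = url.rstrip("/")
    if source["kind"] == "flat":
        # Flat repos have no dists/ hierarchy: the index sits in directory/.
        pieces = [p for p in (base, source["directory"], filename) if p]
        return ["/".join(pieces)]
    # Canonical repos have one index per (component, arch) under dists/.
    return [
        "%s/dists/%s/%s/binary-%s/%s" % (base, source["dist"], comp, arch, filename)
        for comp in source["components"]
    ]
```

For the manifest above, `index_urls("https://cloud.r-project.org/bin/linux/debian", "bullseye-cran40/", "amd64")` would yield a single URL directly under `bullseye-cran40/`, while the `bullseye main contrib` channel yields one URL per component.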


Note

This PR also fixes an issue with NVIDIA CUDA flat repos that don't follow the Debian repo spec and have invalid 'Filename' paths.

The Debian repo spec for 'Filename' says:

The mandatory Filename field shall list the path of the package archive relative to the base directory of the repository. The path should be in canonical form, that is, without any components denoting the current or parent directory ("." or ".."). It also should not make use of any protocol-specific components, such as URL-encoded parameters.

However, there are cases where this is not honored. In those cases we try to work around this by assuming 'Filename' is relative to the sources.list directory/ so we combine them and normalize the new 'Filename' path.

Note that, so far, only the NVIDIA CUDA repos have needed this workaround, so this heuristic may break for other repos that don't conform to the Debian repo spec.
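
The workaround can be sketched roughly as follows, in Python rather than Starlark. `fix_filename` is a hypothetical name, and the condition used to detect a non-canonical 'Filename' (any `.` or `..` path component) is an assumption for illustration; the PR's actual check may differ.

```python
import posixpath


def fix_filename(filename, directory=None):
    """Return a repo-relative path for a package 'Filename'.

    Assumption: a 'Filename' with '.' or '..' components is treated as
    relative to the flat repo's directory/ instead of the repo base.
    """
    parts = filename.split("/")
    is_canonical = "." not in parts and ".." not in parts
    if directory is None or is_canonical:
        # Spec-conforming (or canonical-repo) entries are left untouched.
        return filename
    # Combine with directory/ and normalize away the '.' / '..' components.
    return posixpath.normpath(posixpath.join(directory, filename))
```

With this sketch, `fix_filename("./libfoo_1.0_amd64.deb", "ubuntu2404/x86_64/")` gives `ubuntu2404/x86_64/libfoo_1.0_amd64.deb` (the package name here is made up), which can then be appended to the source URL.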

See bazelbuild/bazel#20369

While working on refactoring, I kept hitting rebase conflicts due to
issues with the MODULE lock. I guess it's because the Bazel version in
e2e tests is much lower than the current one with the fix (7.2) but
still, I don't think it adds much to have the lock in e2e testing.

PR GoogleContainerTools#73 added the `_resolve` and this breaks the buildozer fix / autofix.

It seemed like 75afff9 in GoogleContainerTools#47 added the new locks but as new files, that
is, the old ones were left behind.

Add support for MODULE.bazel to the lock script and avoid printing an
unnecessary (and annoying 😅) error when building in a "modern repo".

Add a `nolock` attribute to avoid getting annoying DEBUG messages for
repos that we explicitly want to run without a lock.

* make apt/tests more readable by factoring out the parameters

* add a "test suite macro" in each test file that groups all of the unit
  tests in the file and prepends a "test suite prefix". IMHO this is
  better than using `unittest.suite` because we get better names
  than the automated `_test<NUMBER>`, and these better names are actual
  targets that can be executed one-by-one by name.
Add other nolock tests to exercise the package repos (the templates,
etc).
* separate the script into a template file so it's easier to shellcheck
  and syntax highlight in editors.

* shellcheck the script and remove all SC2086 warnings
  ("Double quote to prevent globbing and word splitting")

* improve the buildozer help messages in the copy.sh script:
  * reduce duplication of buildozer command
  * add a clearer autofix bazel run command that can be easily
    copy-pasted

* change some of the variable names in the copy.sh template for longer,
  easier to understand names (repo_name >> name, lock_label >> label)

* move repo_name and workspace_relative_path into variables to reduce
  line length and improve readability
While working with some flaky mirrors and trying to figure out why they
were failing I found the _fetch_package_index code a bit hard to follow
so here's my attempt at streamlining it a bit:

* Change the general flow of the for-loop so that we can directly set
  the reasons for failure in failed_attempts

* Remove both integrity as an argument and as a return value since
  neither is ever used.

* return the content of the Packages index instead of the path, since we
  already have the repository_context. This way we don't need rctx
  anywhere else.

* Reword failure messages adding more context and debug information

* Shorter lines and templated strings, trying to make the code easier to
  read and follow.
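
A minimal sketch of the streamlined flow, written in Python for illustration: every failure reason goes straight into `failed_attempts`, and the function returns the index content rather than a path. The `download` callable and its `(ok, content_or_reason)` return shape are assumptions standing in for the repository context, not the real Bazel API.

```python
def fetch_package_index(urls, download):
    """Try each candidate URL in order and return the first index content.

    `download` is a hypothetical callable(url) -> (ok, content_or_reason).
    """
    failed_attempts = []
    for url in urls:
        ok, result = download(url)
        if ok:
            # Return the content itself so callers don't need the context.
            return result
        failed_attempts.append("  %s: %s" % (url, result))
    raise RuntimeError(
        "Failed to fetch the Packages index. Attempted:\n"
        + "\n".join(failed_attempts)
    )
```

Injecting `download` also makes the function easy to exercise with a fake in unit tests, without touching the network.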
* remove the `state` "intermediary `struct`" in `package_resolution.bzl`
  since it wasn't used / needed.

* refactor and move _set_dict from `util.bzl` to a `_package_set` method

* use `dict.get()` with default values instead of the "nested `if`s"

* rename `package()` to `package_get` and make it return all package
  versions when the version is not specified so we can remove
  `_package_versions`

* reorder `(name, version, arch)` args to match the order of the index
  keys `(arch, name, version)`
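
To illustrate the shape of these changes, here is a small Python sketch of a nested index keyed by `(arch, name, version)`. The function bodies are assumptions written for this example, not the code in the PR; only the names `package_get` and `_package_set` come from the commit message.

```python
def package_set(index, arch, name, version, package):
    """Store a package under index[arch][name][version]."""
    index.setdefault(arch, {}).setdefault(name, {})[version] = package


def package_get(index, arch, name, version=None):
    """Look up one version, or all versions when `version` is omitted."""
    # dict.get() with defaults replaces the nested "if key in dict" checks.
    versions = index.get(arch, {}).get(name, {})
    if version is None:
        return versions
    return versions.get(version)
```

Because a missing arch or name simply falls through to an empty dict, a lookup for an unknown package returns `{}` (all versions) or `None` (one version) instead of raising.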
Add testing for package_index mocking the external / side effects
(downloads, decompression, etc).
The package resolution debugging that e.g. checks the package
dependencies should all be within package_resolution.bzl _resolve_all()
and not "leak out" returning the information that's only needed for
debugging / logging.

Also:

* reduce the verbosity of the optional dependencies warning by just
  printing one message per root package instead of one per package.

* break up the long lines to build the error messages and remove the
  "# buildifier: disable=print".
* move version constraint parsing from package_resolution to its own
  _parse_version_and_constraint method in version.bzl

* refactor _version_relop into a compare method in version.bzl plus a
  VERSION_OPERATORS dict so that (1) we use the operator strings
  everywhere and (2) we can use the keys to validate the operators.
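
A rough Python sketch of the idea: one dict of operator strings serves both for validation and for evaluating a constraint. The operator set is the Debian relation syntax (`<<`, `<=`, `=`, `>=`, `>>`); the function bodies and `naive_cmp` are assumptions for illustration. In particular, `naive_cmp` only handles dotted integers and is NOT the Debian version ordering.

```python
# Each operator maps to a predicate over a three-way comparison result.
VERSION_OPERATORS = {
    "<<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "=": lambda c: c == 0,
    ">=": lambda c: c >= 0,
    ">>": lambda c: c > 0,
}


def parse_version_and_constraint(text):
    """Split a constraint such as '(>= 2.34)' into (operator, version)."""
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1].strip()
    # Check two-character operators first so '=' cannot shadow them.
    for op in sorted(VERSION_OPERATORS, key=len, reverse=True):
        if inner.startswith(op):
            return op, inner[len(op):].strip()
    raise ValueError("invalid version constraint: %r" % text)


def naive_cmp(a, b):
    """Placeholder three-way comparison for dotted integers only."""
    ta = [int(p) for p in a.split(".")]
    tb = [int(p) for p in b.split(".")]
    return (ta > tb) - (ta < tb)


def compare(va, op, vb, cmp=naive_cmp):
    """Evaluate 'va op vb'; the dict keys double as operator validation."""
    if op not in VERSION_OPERATORS:
        raise ValueError("invalid operator: %r" % op)
    return VERSION_OPERATORS[op](cmp(va, vb))
```

For example, `parse_version_and_constraint("(>= 2.34)")` returns `(">=", "2.34")`, and `compare("2.36", ">=", "2.34")` evaluates to `True`.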
Previously we had:
```starlark
    pkgindex = package_index.new(rctx, sources = sources, archs = manifest["archs"])
    pkgresolution = package_resolution.new(index = pkgindex)
```
And none of the code of package_resolution was used anywhere but in
resolve.bzl and after initializing the `pkgindex`.

Also, it makes sense since we are building the index from the manifest
and once we have the index we use it to resolve the packages and
populate the lock.
Clean up resolve.bzl and package_index.bzl by moving all of the manifest
functionality to a separate manifest.bzl file where we now do all of the
work to generate the lock: manifest parsing, validation and the package
index and resolution. IMHO this is how it should be because the lock is
the "frozen state" of the manifest.

* _parse() parses the YAML

* _from_dict validates the manifest dict and does the rest of the
  changes that we need to produce a manifest struct

* add extra validation for e.g. duplicated architectures

* _lock is the only method that's exposed to the outside and it
  encapsulates all of the other parts, calling _from_dict and all of the
  package index and resolution, to produce the lock file.

* move get_dupes to util.bzl

* refactor the "source" struct into the new manifest where we can now
  centralize a lot of the structure and logic spread across multiple
  parts of the code.

* remove yq_toolchain_prefix since it's always "yq" and, looking at GH
  code search, this seems to be a copy-paste leftover from rules_js (or
  the other way around)... the code is always the same and it never
  receives a string different from "yq".
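
The validation step could look roughly like this Python sketch. `get_dupes` is named in the commit message, but its body here is an assumption, and `manifest_from_dict` with its set of required keys (taken from the example manifest: `version`, `sources`, `archs`, `packages`) is hypothetical.

```python
def get_dupes(items):
    """Return the items that appear more than once, in first-seen order."""
    seen = set()
    dupes = []
    for item in items:
        if item in seen and item not in dupes:
            dupes.append(item)
        seen.add(item)
    return dupes


def manifest_from_dict(manifest):
    """Validate a parsed manifest dict and return it unchanged if valid."""
    for key in ("version", "sources", "archs", "packages"):
        if key not in manifest:
            raise ValueError("manifest is missing %r" % key)
    dupes = get_dupes(manifest["archs"])
    if dupes:
        raise ValueError("duplicated architectures: %s" % ", ".join(dupes))
    return manifest
```

Keeping parsing, validation and locking behind one entry point means callers only ever see a manifest that has already passed these checks.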
* move all of the "package logic" to pkg.bzl

* The v2 lockfile format:

  * doesn't need the fast_package_lookup dict because it's already using
    a dict to store the packages.

  * has the dependencies sorted so the lockfile now has stable
    serialization and the diffs of the lock are actually usable and
    useful to compare with the changes to the manifest.

  * removes the package and dependency key from the lockfile, now it's
    done via an external function (make_deb_import_key in
    deb_import.bzl)

* Remove add_package_dependency from the lockfile API. Now, the package
  dependencies are passed as an argument to add_package. This way, the
  lockfile functionality is fully contained in lockfile.bzl and e.g. we
  can remove the "consistency checks" that were only needed because
  users could forget to add the dependency as a package to the lockfile.

* Ensure backwards-compatibility by internally converting lock v1 to v2.
  Also, when a lock is set and it's in v1 format, there's a reminder
  that encourages the users to run @repo//:lock to update the lockfile
  format.
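
To show why a dict-based lock with sorted dependencies gives stable diffs, here is a small Python sketch. The lock layout (`packages[name][arch]`) and the field names are hypothetical and are not the actual v2 format; only the idea of passing dependencies to `add_package` and sorting them comes from the commit message.

```python
import json


def add_package(lock, name, arch, version, dependencies):
    """Add a package and its dependencies to the lock in a single call."""
    packages = lock.setdefault("packages", {})
    packages.setdefault(name, {})[arch] = {
        "version": version,
        # Sorting makes the output independent of resolution order.
        "dependencies": sorted(dependencies),
    }


def serialize(lock):
    """Serialize with sorted keys so equal locks produce identical text."""
    return json.dumps(lock, indent=2, sort_keys=True) + "\n"
```

Two locks built from the same packages in a different order then serialize to the same bytes, so a diff of the lockfile only reflects real changes to the manifest.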
By separating the migration from the previous commit we get to:

1. in the previous commit, run all tests with the new code while locks
   are still v1

2. update the locks in this commit to v2 so we can then re-run all tests
   in the final state.
…t.bzl

Refactor the package repo templates into their own methods and massively
clean up the `for`-loop in `_deb_package_index_impl`.

IMHO overall there's now a much better and clearer separation of concerns
between the "index repo" (`apt/private/index.bzl`) and the "package
repos" (`apt/private/deb_import.bzl`).
@jjmaestro jjmaestro force-pushed the refactor-mega-refactor-plus-feat-support-flat-repos branch from d678224 to c920433 Compare September 19, 2024 16:24
NOTE:
The NVIDIA CUDA repos don't follow Debian specs and have issues with the
package filenames. This is addressed in a separate commit.
Although the Debian repo spec for 'Filename' (see
https://wiki.debian.org/DebianRepository/Format#Filename) clearly says
that 'Filename' should be relative to the base directory of the repo and
should be in canonical form (i.e. without '.' or '..') there are cases
where this is not honored.

In those cases we try to work around this by assuming 'Filename' is
relative to the sources.list directory/ so we combine them and normalize
the new 'Filename' path.

Note that, so far, only the NVIDIA CUDA repos have needed this workaround,
so this heuristic may break for other repos that don't conform to
the Debian repo spec.
@jjmaestro jjmaestro force-pushed the refactor-mega-refactor-plus-feat-support-flat-repos branch from c920433 to bbf5d62 Compare September 19, 2024 16:29
@jjmaestro jjmaestro mentioned this pull request Sep 19, 2024
@jjmaestro jjmaestro closed this Sep 19, 2024
@jjmaestro jjmaestro deleted the refactor-mega-refactor-plus-feat-support-flat-repos branch September 19, 2024 17:48