Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Git in x-opam-monorepo-opam-repositories #317

Merged
merged 10 commits into from
Jul 5, 2022

Conversation

Leonidas-from-XIV
Copy link
Member

The main issue with this code is that it is difficult to test, since we probably don't want our tests to do IO with the outside world, since it creates issues with sandboxing and dependencies on other services being up and reachable.

I tested it by adding https://opam.ocaml.org to the x-opam-monorepo-repositories in vendored.opam and it indeed outputs a lock file that has e.g. a newer dune than what we expect in the test.

Fixes: #284

@NathanReb
Copy link
Contributor

Thanks for looking into this! The main problem with this PR at the moment is that by using pull_tree directly, it will use the opam cache and will polute it with repositories.

I think here we need an alternate version of pull_tree that does not use the opam source cache to prevent it from growing unreasonnably due to the size of repositories.

I think we should instead have a thin and simple cache layer in /tmp. Something similar to what you did except that if the directory already exists we use the repo in there instead of pulling it.

For this to work properly, I think it would be reasonable to only accept remote VCS urls (e.g. no straight tarballs) and to resolve all VCS URLs to a commit hash, similarly to what we do for packages sources already.

What do you think?

@NathanReb
Copy link
Contributor

Also we could eventually store archives in that temporary cache instead of the full source tree, similarly to what opam does but that can be done later!

@Leonidas-from-XIV
Copy link
Member Author

I looked at the OPAM API and it supports specifying a dedicated cache_dir so we should be able to redirect where OPAM downloads things to.

The issue with not supporting archives is that to support #315 by reading the list of repos from the switch we need to support archives since this is the most common way to retrieve the opam-repository for most users. So it would need to be done either as part of this PR or a PR for that and I'd argue that adding this support in this PR makes more sense.

That said, I'm currently trying to figure out how OPAM actually uses the cache_dir because at least with the index.tar.gz OPAM doesn't seem to put the archive into the cache folder which has me worried whether it even does what I think it does.

@NathanReb
Copy link
Contributor

The issue with not supporting archives is that to support #315 by reading the list of repos from the switch we need to support archives since this is the most common way to retrieve the opam-repository for most users. So it would need to be done either as part of this PR or a PR for that and I'd argue that adding this support in this PR makes more sense.

We had a discussion about this exact problem when you first suggested to move to a single solver and I think we agreed that having a special rule to turn https://opam.ocaml.org into git+https://github.com/ocaml/opam-repository (which we would then resolve to a commit). I think this would be good as we'd ensure the good properties of using git only. That would ensure that each lock file can be reproduced (which we can't with tarballs, especially the likes of ocaml.org which change over time), allow us to easily cache thing using the commit hashes and overall I think is a better approach to dealing with opam repositories.

I even remember hearing @dra27 mentioning they were considering dropping the use of tarballs for repos in opam.

What do you think?

@Leonidas-from-XIV Leonidas-from-XIV changed the title Support using repositories from HTTP sources Support Git in x-opam-monorepo-opam-repositories Jun 17, 2022
@Leonidas-from-XIV
Copy link
Member Author

@NathanReb Sorry, took a while to get back to this but I finally managed to continue on this (mostly testing that things work as desired). I've implemented it to support Git repos specifically.

What it does is the following:

  1. Creates a folder, $TMP_DIR/opam-monorepo. In my case that was /tmp/opam-monorepo but it respects the TMP_DIR variable so when tests run this is overridden into a dedicated folder (by, I assume, dune)
  2. If it encounters a `http repo that points to opam.ocaml.org it rewrites it to the git repository.
  3. In this folder it creates cache and repos folders: the cache folder is used by the Opam Git implementation to create a bare repo with the revisions. The repos folder is used to check out the repos by MD5 of their URL (to avoid sad surprises with valid characters in file systems)
  4. This list of now-file:// paths is passed to the remaining code which then runs the solver. It also rewrites the URLs of the repos in Source_config to include the commit-hash of the checked out repo for reproducability, so the lock file contains specific snapshots of the git URLs. I think that's a pretty neat feature.

There is a small test where it asserts that the checkout works and the lockfile URLs are updated. I didn't add tests for the cache because that is supposed to be functionality of OpamGit.B. In my manual testing it did work, the initial lock took a while (we might consider adding a log output) while it was cloning the repository, while a subsequent lock was fast because it used the repo. Testing this in a blackbox is somewhat difficult and I didn't want to depend too much on how OpamGit.B implements the cache.

I also tested this on opam-monorepo itself, adding these lines to the .opam(.template) file lead it to use the repos and successfully lock:

x-opam-monorepo-opam-repositories: [
  "https://opam.ocaml.org"
  "git+https://github.com/dune-universe/opam-overlays.git"
]

Leaving out the overlays caused the usual solver failures with missing dune-versions, exactly how it works at the moment when the overlays repo is not added to the switch.

Copy link
Contributor

@NathanReb NathanReb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for your hard work on this!

I'm happy that we can reuse opam's git cache implementation, saves us a lot of troubles.

I have no comments about the code and the tests, it all looks great.

My only concern is how we handle resolution of repo URLs to an exact revision. It seems that we pull the sources from the user provided URL and then look at what revision we got.
I can see two problems with this approach:

  1. we might cache based on an ambiguous URL, e.g. git+https://github.com/ocaml/opam-repository#master is what gets hashed and therefore I'm not entirely sure whether the current implementation will ever try to fetch it again as long as it is cached, meaning a user might wonder why they're not getting the latest version
  2. It mix the two concerns of pulling the sources and making the source config unambiguous.

I'm not exactly sure about 1 as I don't fully understand how the cache works yet so please correct me if I misinterpreted how things were run here. I think either way, this should be fairly easy to add to the test by pushing a new version of a package on the default branch of the local git repo in the cram test.

@Leonidas-from-XIV
Copy link
Member Author

  1. This is not a problem, because there is a split between the cache folder (always the same) and the repo folder (this gehts hashed). The cache in this case contains a single git repo with a rather specific setup - essentially a bag of commits from all kind of repos it is asked to cache, which then get checked out into the repo folders. The whole setup is a bit non-standard, but seems to work pretty well. Your concern about the cache possibly not being updated with new revisions is however a valid one - the test never verified that updates in the repo are noticed. I added the test and it worked out of the box. Definitely useful to have utilized the OPAM Git cache, looks like the code is pretty solid.
  2. I am not sure I follow. To figure out which packages exist I need to pull the repo and likewise, to determine which revision I have I need to have the repository available.

@NathanReb
Copy link
Contributor

I am not sure I follow. To figure out which packages exist I need to pull the repo and likewise, to determine which revision I have I need to have the repository available.

Yeah sorry this was poorly explained.

For inidividual packages which have a git source URL (a pinned package for instance), we first turn this potentially ambiguous URL into an "exact" one, i.e. one that ends with #<commit hash> via calls to git ls-remote, that is, without pulling.

What I meant in my second point was that we could do something similar here and separate things in two steps:

  1. rewrite the source_config so that repositories git URLs are rewritten to exact URLs
  2. pull the repositories described in source_config locally

If the opam cache helps us with this I'm also happy with the current implementation. It's slightly harder to figure out this resolution happens with the current code but as long as it works I'm happy with this. Maybe adding a comment or a bit of documentation here would help maintain this code in the future?

@Leonidas-from-XIV Leonidas-from-XIV merged commit 8e1bca6 into tarides:main Jul 5, 2022
@Leonidas-from-XIV Leonidas-from-XIV deleted the remote-url-support branch July 5, 2022 13:18
Leonidas-from-XIV added a commit to Leonidas-from-XIV/opam-repository that referenced this pull request Nov 21, 2022
CHANGES:

### Added

- Add support for specifying remote URLs in `x-opam-monorepo-repositories`
  (tarides/opam-monorepo#284, tarides/opam-monorepo#317, @Leonidas-from-XIV)

### Fixed

- Enable locking of packages with depexts even with uninitialized system
  package manager state (tarides/opam-monorepo#322, @Leonidas-from-XIV)
- Fix a bug where `pull` would crash if the lock file contained no package to
  vendor (tarides/opam-monorepo#321, @NathanReb)
- Display a better error message when the depext command fails when getting the
  status of the packages (tarides/opam-monorepo#258, tarides/opam-monorepo#323, @RyanGibb, @Julow)
- Take `archive-mirrors` from the global opam configuration into account to
  allow more local caches (tarides/opam-monorepo#337, @hannesm)
- Log at WARN level when opam-monorepo chooses a source for a package that
  doesn't match the package's version (tarides/opam-monorepo#352, @reynir)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support remote URLs in x-opam-monorepo-repositories
2 participants