Allow to use a "compressed" (one file) repository format for performance and sustainability purpose #5648
Comments
Some data: as of today, the whole repo concatenated gives a 39M file with 1 235 506 lines.
Some more data: an xz-compressed repository would be only 2M and takes 0.2s to uncompress on my local machine.
One mechanism would be to use normal tar.xz files and use OCaml libraries to parse them directly, without unpacking. That has the benefit of making them easy to create, and there are performance improvements from not having a lot of small files.
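To make the "parse directly, without unpacking" idea concrete, here is a minimal sketch. It is written in Python with the standard library only, purely as an illustration; it is not opam code, and the helper names `build_archive` and `read_archive` are made up for this example. It assumes the archive simply holds the usual `packages/<name>/<name>.<version>/opam` files.

```python
import io
import tarfile


def build_archive(files):
    """Pack a {path: text} mapping into an in-memory tar.xz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for path, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_archive(blob):
    """Return {path: text} by streaming members; nothing is written to disk."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as tar:
        for member in tar:
            if member.isfile():
                result[member.name] = tar.extractfile(member).read().decode("utf-8")
    return result


if __name__ == "__main__":
    # Hypothetical one-package repository, for illustration only.
    repo = {"packages/foo/foo.1.0/opam": 'opam-version: "2.0"\n'}
    print(read_archive(build_archive(repo)))
```

The point of the sketch is only that a single compressed file can be read member by member, so no per-package file (and no inode) is ever created on the client.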
From the dev meeting. We can have several formats in the opam repository itself:
The repo file can mention what the format of the repo is, but the formats can coexist in a single repo. Opam can understand all those formats (backward compatibility); for API users the change is imperceptible, as the functions that fetch a repo remain the same. Opam can also try to retrieve the compressed format, then fall back on the aggregated format, then fall back to the plain directory one. These new formats can be served via a webserver, for example by having opam2web generate them. For the main opam repository on GitHub, one solution is to have an alternate branch that serves the aggregated file. It would be automatically updated on each merge.
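The fallback order described above (compressed, then aggregated, then plain directory) could look roughly like the following. This is a sketch in Python, not opam's API: `fetch_repository`, the `fetchers` list, and the use of `LookupError` to signal "this format is not served" are all assumptions made for the example.

```python
def fetch_repository(fetchers):
    """Try each (format_name, fetch) pair in order and return the first success.

    `fetchers` is a hypothetical list of pairs. Each `fetch` callable returns
    the repository contents, or raises LookupError when the remote does not
    serve that format.
    """
    errors = []
    for format_name, fetch in fetchers:
        try:
            return format_name, fetch()
        except LookupError as exc:
            # Remember why this format was skipped, then try the next one.
            errors.append(f"{format_name}: {exc}")
    raise LookupError("no supported repository format found: " + "; ".join(errors))


if __name__ == "__main__":
    def not_served():
        raise LookupError("not served by this remote")

    # Stand-in fetchers: only the plain directory format is available here.
    order = [
        ("compressed", not_served),
        ("aggregated", not_served),
        ("directory", lambda: {"packages/foo/foo.1.0/opam": "..."}),
    ]
    print(fetch_repository(order)[0])
```

Because callers only see the returned contents, the format actually used stays invisible to them, which matches the "imperceptible for API users" goal.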
I have another suggestion: a |
xz is also randomly addressable. I use this feature in https://github.com/kit-ty-kate/opam-health-check-ng using the
Oh I didn't know that!! Zip has the upside of having very mature bindings (camlzip) but xz does compress a lot better. In any case I think it'd accelerate some things a lot. Another performance issue I've seen is that opam tends to check the state of various switches many, many times in a row. |
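For reference, this is what random access into a zip looks like: the central directory lets a reader go straight to one member without decompressing the others. A small Python sketch using the standard `zipfile` module (not camlzip, and not opam code; `build_zip` and `read_one` are hypothetical helper names):

```python
import io
import zipfile


def build_zip(files):
    """Pack a {path: text} mapping into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, text in files.items():
            zf.writestr(path, text)
    return buf.getvalue()


def read_one(blob, path):
    """Read a single member; other members are located but not decompressed."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(path).decode("utf-8")


if __name__ == "__main__":
    blob = build_zip({
        "packages/foo/foo.1.0/opam": 'opam-version: "2.0"\n',
        "packages/bar/bar.2.1/opam": 'opam-version: "2.0"\n',
    })
    print(read_one(blob, "packages/bar/bar.2.1/opam"))
```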
I'd like to point out that we currently have an opam-mirror implementation as a unikernel that uses tar (as well as zlib/decompress) and allows randomly addressable contents. In your proposition, it would be difficult for us to support I know this probably implies a regression in the compression ratio but, as @c-cube points out, zip (or even tar) has the advantage of a mature existence in the OCaml ecosystem (in contrast to xz).
I had a deeper look and I think we can keep the current I implemented a proof-of-concept reader of the opam-repository's index.tar.gz using The major pain point I could see in the opam code when switching to that is that we currently diff the previous state of the repository against the new one, so if we want to keep doing that we'd need to reimplement diffing between two archives manually. Following every use of There is a chance this issue is required to fix #5741, which is currently slotted for 2.2.0~rc1, so I might bite the bullet and take the time to implement it if no one does it beforehand (if you do, please ping me so we can synchronize).
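To give an idea of what "diffing between two archives manually" involves, here is a rough sketch in Python (standard library only). It is not the proof of concept mentioned above and not opam's diffing code; it just compares per-file SHA-256 digests of two tar archives, and the names `make_tar`, `archive_digests` and `diff_archives` are invented for the example.

```python
import hashlib
import io
import tarfile


def make_tar(files):
    """Helper for the example: pack {path: text} into an in-memory tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, text in sorted(files.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def archive_digests(blob):
    """Map each regular file in a tar archive to the SHA-256 of its contents."""
    digests = {}
    # "r:*" lets tarfile detect the compression (none, gz, bz2 or xz).
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def diff_archives(old_blob, new_blob):
    """Return the paths added, removed and changed between two archives."""
    old, new = archive_digests(old_blob), archive_digests(new_blob)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


if __name__ == "__main__":
    old = make_tar({"packages/a/a.1/opam": "v1", "packages/b/b.1/opam": "v1"})
    new = make_tar({"packages/a/a.1/opam": "v1", "packages/b/b.1/opam": "v2"})
    print(diff_archives(old, new))
```

This only reports which paths differ; producing an actual patch between the two states would need more work on top of it.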
opam repositories currently have a "one file per package/version" layout, but as the number of packages grows it creates a sustainability problem for people with a low number of inodes on their filesystems (e.g. see #5484) and a performance problem (you have to open each file on every opam update).
I'm not set on a particular format for that file, but it could be the format that opam switch export already uses. @mseri also suggested using SQLite.
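As an illustration of the SQLite idea, a one-file index could be as simple as one row per package version. This is a sketch in Python with the standard `sqlite3` module; the table layout and the helper names `build_index` and `lookup` are assumptions made for the example, not a proposed schema.

```python
import sqlite3


def build_index(conn, packages):
    """Store (name, version, opam_text) triples in a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        "name TEXT NOT NULL, version TEXT NOT NULL, opam TEXT NOT NULL, "
        "PRIMARY KEY (name, version))"
    )
    with conn:  # commit all inserts as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO packages VALUES (?, ?, ?)", packages
        )


def lookup(conn, name, version):
    """Return the opam text for one package version, or None if absent."""
    row = conn.execute(
        "SELECT opam FROM packages WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    # An in-memory database here; a real index would be one file on disk.
    conn = sqlite3.connect(":memory:")
    build_index(conn, [("foo", "1.0", 'opam-version: "2.0"\n')])
    print(lookup(conn, "foo", "1.0"))
```

Whatever the format, the repository becomes a single file on disk, so the inode count no longer grows with the number of packages.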