Continually blocked on upstream packages #45

Closed
AlistairB opened this issue Aug 22, 2021 · 27 comments
Comments

@AlistairB
Contributor

AlistairB commented Aug 22, 2021

To update to a new ghc / cabal version they need to be available in Debian first. Unfortunately there is a lot of lag from when the new versions are released to when they appear as debian packages. This seems to be due to @hvr being the sole person responsible for the packaging, but often not being available to do this work.

This is not a criticism of @hvr who has given a tremendous amount to the Haskell ecosystem ❤️ , but it is just a reality that there is consistent lag here.

Options

Try to improve the situation around debian package updates

We could chase this up with https://github.com/haskell-CI/haskell-ci which is the home of the debian/ubuntu packages. We could also pull in the Haskell foundation who may have ideas.

Switch to ghcup

Ghcup does not depend on these same debian packages and does promptly add new versions. I've been playing with what this might look like.

Pros

  • New versions of cabal, ghc and stack are promptly added.
  • One mechanism for installing cabal, ghc and stack.
  • Removes some complexity of the installation from these images.
  • Ghcup is further consolidated as the one true way (excluding stack) of installing ghc / cabal. Haskell github actions are also using ghcup. Actually, it seems it uses the hvr ubuntu PPA where possible, then falls back to ghcup?
  • Ghcup has (preliminary?) support for windows, so it provides a path to a windows docker haskell image.
  • I believe ghcup has ARM support, so could help support arm.
  • We could support more linux distros that ghcup supports if we want ie. alpine

Cons

  • Ghcup is not a great fit for docker. Ghcup is a tool to manage many versions of tools, when in docker we only want to provide one exact version. Thus it is additional bloat + complexity when it shouldn't in theory be required. For this reason I believe no other official programming language images are using an equivalent tool.
  • Ghcup installs for the current user, whereas in docker we want to install for all users. For example a common pattern in docker is to switch to a new user with fewer privileges. However, this new user cannot use ghc / cabal / stack that were installed for a different user by ghcup. There may be ways to work around this, potentially requiring ghcup changes.

Conclusion

???

Not sure, ghcup is a bit of a poor fit philosophically, however may be a great pragmatic choice resulting in a net win. If we do not go with ghcup, I think we should attempt to improve the lag problem with the current debian packages.

@psftw @hasufell any thoughts?

@bgamari

bgamari commented Aug 22, 2021

For what it's worth, ghcup is just a (very good!) tool for installing and managing standard GHC release artifacts. I have considered adding Debian packaging jobs to GHC's release pipeline. If it would be helpful for you I would be happy to give it a try; I don't think it should be difficult.

@AlistairB
Contributor Author

For what it's worth, ghcup is just a (very good!) tool for installing and managing standard GHC release artifacts. I have considered adding Debian packaging jobs to GHC's release pipeline. If it would be helpful for you I would be happy to give it a try; I don't think it should be difficult.

Yes that would be super useful and would eliminate the lag problem in regards to ghc 😄

Docker haskell also includes cabal, but perhaps that can be solved as part of the cabal release process.

The other consideration might be ubuntu packaging, which the haskell github action uses. The action also uses GHC + cabal installed on the default GHA agents.. not sure what they depend on. @jared-w may know more.

@hasufell
Member

hasufell commented Aug 22, 2021

Ghcup is not a great fit for docker.

I believe it is. Many already use it in docker and you only install what you need. I don't believe it's bloat. And it makes a lot of sense for users who want to add additional GHC versions to an existing image.

Ghcup installs for the current user, whereas in docker we want to install for all users.

Since the latest version, GHCup can install into any directory via https://gitlab.haskell.org/haskell/ghcup-hs#isolated-installs

This still requires (for GHC) that the target directory is empty, but we may lift that restriction, which means you'll be able to do ghcup install ghc latest --isolate /usr, which will put ghc binaries in /usr/bin/ghc.

So for now you could install via ghcup install ghc latest --isolate /opt/ghc and put /opt/ghc/bin in PATH.
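As a minimal sketch of that workaround: the install command is the one quoted above, left commented out here because it needs ghcup and network access, and `add_to_path` is a hypothetical helper name, not something ghcup provides.

```shell
# Install step from the comment above (commented out: needs ghcup + network).
# ghcup install ghc latest --isolate /opt/ghc

# Hypothetical helper: prepend a directory to PATH unless it is already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;     # otherwise put it in front
  esac
}

add_to_path /opt/ghc/bin
export PATH
```

In a Dockerfile the same effect would normally come from an `ENV PATH=...` line rather than a shell helper.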

@AlistairB
Contributor Author

AlistairB commented Aug 22, 2021

I believe it is. Many already use it in docker and you only install what you need. I don't believe it's bloat.

I guess to be more specific it is not part of the core offering of the image. I think even if we use ghcup to install ghc etc. we still ideally want to strip it out from the final image. This is because docker provides a mechanism to switch ghc versions with docker tags, so ghcup shouldn't be needed by the end user.

Having said that I think a haskell:ghcup image would be useful if people want to use that as a base to get an exact combination of ghc, cabal + stack versions.

So for now you could install via ghcup install ghc latest --isolate /opt/ghc and put /opt/ghc/bin in PATH.

Thank you! Your example image does exactly what I was failing to do 😅

I have adapted your example further to get closer to what I was trying to do: ghcup is used to install ghc, cabal + stack, but a multi-stage docker build then produces a final image containing only ghc, cabal and stack, without ghcup or any cache left over from the build.

What do you think?

EDIT: I think with these improvements I am more strongly in favour of the ghcup based solution.

@hasufell
Member

so ghcup shouldn't be needed by the end user

I'd recommend to install it by default. I'd argue it's a common use case to derive images and augment them, e.g. when you have multi-GHC builds or simply need to do local experiments.

The binary is, compared to the 2GB GHC install, very small.

What do you think?

Yeah

@AlistairB
Contributor Author

AlistairB commented Aug 22, 2021

I'd recommend to install it by default. I'd argue it's a common use case to derive images and augment them, e.g. when you have multi-GHC builds or simply need to do local experiments.

The binary is, compared to the 2GB GHC install, very small.

It's not a size issue, it is more that it doesn't really fit with how docker works and could be a foot gun. Using ghcup in docker and mutating the container can have unusual effects.

I think for the specific use case of wanting multiple ghc versions in an image I would recommend a haskell:ghcup image (which I think can be a follow up issue if we go with ghcup). Or in many cases you would just use the haskell docker image for each version you needed to build that step.


One other concern I have noticed is the ghcup based image is a bit larger. Not a deal breaker but not ideal. I stripped out /opt/ghc/share/ which helped a bit (this seems to be just docs). After that the size diff is:

Ghcup based 8.10.4 - 2.34GB
Current 8.10.4 - 1.52GB

I wonder if there is an easy way to strip out something unnecessary from /opt/ghc/lib?

@hasufell
Member

It's not a size issue, it is more that it doesn't really fit with how docker works and could be a foot gun. Using ghcup in docker and mutating the container can have unusual effects.

Unusual effects?

This is about:

  1. deriving docker images based on an existing image, so you only have to install one additional GHC version instead of two from scratch. This is for Dockerfile developers, they know what they do.
  2. being able to shell into the container and quickly experiment with other GHC versions

@AlistairB
Contributor Author

AlistairB commented Aug 22, 2021

Unusual effects?

I would prefer not to get bogged down on this finer point. I think this can be discussed further if ghcup is chosen as the best way forward. Cheers

@hazelweakly

The other consideration might be ubuntu packaging, which the haskell github action uses. The action also uses GHC + cabal installed on the default GHA agents.. not sure what they depend on. @jared-w may know more.

To first address the Ubuntu packaging; the Ubuntu packaging does not seem to be updated anymore because it was also done by @hvr. As such, the upstream GitHub preinstalled software switched from the PPA to using ghcup by default and the GitHub actions followed suit. Any pre-installed version of GHC or cabal on a GitHub runner is currently installed by ghcup (or chocolatey on windows)

I think, currently, the ideal situation to install GHC in a docker container would be to use ghcup with the isolate flag and manually copy over all the binaries individually in a subsequent stage. This lets the Dockerfile be self-documenting in which Haskell components are installed as well as minimizing the size and following standard docker conventions
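A rough sketch of what such a two-stage build might look like, written as a shell function that prints the Dockerfile. Everything specific here is an assumption for illustration: the `emit_dockerfile` name, the `debian:buster` base, the apt package list, and copying the whole `/opt/ghc` tree rather than individual binaries. The final stage would also still need GHC's runtime dependencies (a C toolchain, libgmp and so on).

```shell
# Hypothetical helper: print a two-stage Dockerfile for the given GHC version.
emit_dockerfile() {
  ghc_version="$1"
  cat <<EOF
# Stage 1: install GHC with ghcup into an isolated prefix.
FROM debian:buster AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \\
      build-essential ca-certificates curl libffi-dev libgmp-dev libncurses-dev xz-utils
RUN curl -fsSL https://downloads.haskell.org/~ghcup/x86_64-linux-ghcup -o /usr/local/bin/ghcup \\
 && chmod +x /usr/local/bin/ghcup
RUN ghcup install ghc ${ghc_version} --isolate /opt/ghc

# Stage 2: copy only the installed tree; ghcup and its caches stay behind.
FROM debian:buster
COPY --from=builder /opt/ghc /opt/ghc
ENV PATH=/opt/ghc/bin:\$PATH
EOF
}
```

Something like `emit_dockerfile 8.10.4 > Dockerfile` followed by `docker build .` would exercise it; the explicit `COPY --from=builder` lines are what make the installed components visible in the Dockerfile itself.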

@AlistairB
Contributor Author

Thanks for the info @jared-w .

I think, currently, the ideal situation to install GHC in a docker container would be to use ghcup with the isolate flag and manually copy over all the binaries individually in a subsequent stage. This lets the Dockerfile be self-documenting in which Haskell components are installed as well as minimizing the size and following standard docker conventions

Agreed. This is pretty much what is sketched in #44


I'll see if I can understand why the ghcup installed ghc takes up more space compared to the current debian packaging.

Also @psftw would need to chime in before we could consider proceeding.

@AlistairB
Contributor Author

AlistairB commented Aug 23, 2021

So it seems that ghcup is installing the base libraries with profiling enabled/supported if I understand correctly. The current debian packaging splits out the profiling versions into a separate package ie. ghc-8.10.4-prof so we don't get those in the existing images.

I stripped out the profiling related files and now the sizes are comparable.
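For reference, the stripping step can be sketched roughly as below. It assumes the profiling artifacts follow the usual `*_p.a` (static profiling libraries) and `*.p_hi` (profiling interface files) naming; `strip_profiling` is a hypothetical helper name.

```shell
# Hypothetical helper: delete profiling artifacts under a GHC install root.
# Assumes profiling files are named *_p.a and *.p_hi.
strip_profiling() {
  find "$1" -type f \( -name '*_p.a' -o -name '*.p_hi' \) -exec rm -f {} +
}
```

It would be called as `strip_profiling /opt/ghc/lib` in the same build step that installs GHC, so the files never end up in an image layer.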

@hasufell would a flag to install / not install profiling support in the base libraries be something ghcup might look to support?

Another question, is there currently a way to subscribe to ghcup releases? This would be useful around knowing to bump ghcup in the images.

@hasufell
Member

So it seems that ghcup is installing the base libraries with profiling enabled/supported if I understand correctly. The current debian packaging splits out the profiling versions into a separate package ie. ghc-8.10.4-prof so we don't get those currently.

I stripped out the profiling related files and now the sizes are comparable.

@hasufell would a flag to install / not install profiling support in the base libraries be something ghcup might look to support?

I'm not sure there's a clean UX for this, since ghcup after all isn't a real package manager, but an installer. Re-installing those profiling libs would require a full reinstall of the GHC version. If the build system (./configure) had a flag to omit installing profiling libs, however, that could easily be supported ad-hoc.

But if the only use case is docker, manually removing files may be the way to go.

Another question, is there currently a way to subscribe to ghcup releases? This would be useful around knowing to bump ghcup in the images.

There's an atom feed https://gitlab.haskell.org/haskell/ghcup-hs/-/tags

But new tags don't always mean new releases. ghcup list is the source of truth.

@AlistairB
Contributor Author

AlistairB commented Aug 28, 2021

I've spent some time polishing up the changes for this in #44 . More feedback is welcome :)

But if the only use case is docker, manually removing files may be the way to go.

No worries, happy to go with the stripping approach for now.

But new tags don't always mean new releases. ghcup list is the source of truth.

No problem.


@hasufell now that the PR has progressed well and hopefully can be merged soon once @psftw gives his thoughts, perhaps we can discuss the question of whether to leave ghcup in the final image. So I'm pretty strongly against this. Reasons..

Additional bloat

  • There are additional packages that ghcup requires that aren't otherwise required in the final image.
  • The pretty negligible 23MB size of the ghcup executable.

I'm not suggesting this is significant, but it is a downside.

Additional releases

If we don't support ghcup, I would only bump ghcup when it might have a substantial impact to ghc / cabal / stack in the images. (Or when we bump something else in an image)

If we officially support ghcup, then we should bump ghcup every time it gets a new release IMO. This means the images need to be updated more often (ghcup releases seem to average once a month).

  • More effort required by docker-haskell maintainers.
  • More updating of the images by users who I would suggest rarely want to use ghcup in the images.

Mutating a running image

So mutating a running container will not be persisted by default. So if your goal is to play around with ghcup, you will need to re setup everything if you exit the container. There are then interesting things like docker commit which can persist changes made into a new image.

All of this is finicky and non-obvious. I think there are many people who are new to docker who could get fooled by this. In other words I consider this a foot gun.

We actually currently disable stack from installing new GHC versions, so it is something of a current assumption around the images.

Upsides of having ghcup in the final image?

So my suggestion is that having ghcup is almost always not actually useful in the image. The core reason is that docker itself is providing a mechanism to access different versions of ghc.

deriving docker images based on an existing image, so you only have to install one additional GHC version instead of two from scratch. This is for Dockerfile developers, they know what they do.

I would suggest that in almost all cases if you need multiple ghc versions, you would use those versions in different docker images. For example perhaps you want to test your library builds on different ghc versions. In that case you want to build based on multiple docker images for each version in parallel. You do not have any need to have multiple ghc versions loaded in a single image.

being able to shell into the container and quickly experiment with other GHC versions

Again, I think it is much simpler to use docker for this. Most developers are familiar with switching versions in docker, as opposed to learning a new tool. As mentioned above, any mutation of the running container is not easy to persist, so you would need to repeat it.

Conclusion

To sum up, I can see multiple downsides and I cannot think of a case where using ghcup in the image is useful.

I think the key requirement the current images don't really cover is when you want specific GHC(s) with specific cabal and specific stack. I see a haskell:ghcup image that doesn't include GHC / cabal / stack by default as the solution to this. (I would have this be a separate issue / future change.)

@hasufell
Member

There are additional packages that ghcup requires that aren't otherwise required in the final image.

That's not true. ghcup is a static executable that requires only curl at runtime and so does cabal.

So mutating a running container will not be persisted by default. So if your goal is to play around with ghcup, you will need to re setup everything if you exit the container. There are then interesting things like docker commit which can persist changes made into a new image.

Sorry, I don't understand what you're trying to say. If people don't understand docker, they should read the documentation.

Again, I think it is much simpler to use docker for this. Most developers are familiar with switching versions in docker, as opposed to learning a new tool.

Can you tell me then how to augment an existing image, that I just derived from haskell/docker, with a new additional GHC version in one line?

Either you have to replicate all the ghcup foo in your Dockerfile or figure out what files you need to copy from an existing image layer. The latter is also inflexible, because it doesn't work in a running container.


At any rate, I'm not an active user of these images. I prefer to download alpine docker images of minimal size and then perform ghcup installation inside of them. That saves me from downloading several GBs of unpacked GHC.

The time it takes to unpack/install GHC usually is much less than downloading a huge image.

@AlistairB
Contributor Author

That's not true. ghcup is a static executable that requires only curl at runtime and so does cabal.

So once you have installed ghcup you only need curl? Point taken.

Can you tell me then how to augment an existing image, that I just derived from haskell/docker, with a new additional GHC version in one line?

I'm suggesting it would be very rare to want to do this. ie. If I want to test my library builds with 8.10 and 9.0 I would run:

docker run haskell:8.10 stack build
docker run haskell:9 stack build

which are completely independent and can be run in parallel easily. Wanting both ghc 8.10 and 9 in a single container, I don't know why you would want this. Or at least, I would not optimize the default usage around such a thing.
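A small sketch of running those builds side by side. The `build_all` helper, the `DOCKER` override variable and the `/src` mount point are assumptions added for illustration; only the `haskell:<tag>` images and `stack build` come from the commands above.

```shell
# Set DOCKER=echo to preview the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: build the current directory against each image tag in parallel.
build_all() {
  pids=""
  for tag in "$@"; do
    $DOCKER run --rm -v "$PWD":/src -w /src "haskell:$tag" stack build &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    wait "$pid" || status=1   # remember whether any build failed
  done
  return "$status"
}
```

`build_all 8.10 9` then starts one container per tag and returns non-zero if any of the builds failed.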

Happy for @psftw to make the call who is the long time maintainer of these images.

@hasufell
Member

I'm suggesting it would be very rare to want to do this

I don't think so. I frequently need multiple GHCs in a docker container.

  1. Building GHC from source (I use docker to create bindists for e.g. alpine) and I don't wanna bother switching images. Here I spawn a container and do mutable stuff there. That's a major use case of docker.
  2. Testing a project on a new distro: again, I just spin up a container and then do the work there manually. I don't want to deal with multiple images.

@AlistairB
Contributor Author

I think that is rare. People building applications typically have one exact version they are using and only want that version and nothing else. People building libraries likely want to test on many versions, but as mentioned it is best to build / test on independent containers in parallel.

Doing mutable stuff in docker is quite unusual, because unless you use docker commit or are using docker volumes somehow, your changes in the container are lost when you shut it down.

I think haskell:ghcup is the right catchall for the edge cases. Perhaps we will have to agree to disagree here 😅

@hasufell
Member

Doing mutable stuff in docker is quite unusual

Very hard disagree. I have used docker for years and this is one of the primary use cases.

@AlistairB
Contributor Author

FYI, I've been exploring another option of installing ghc + cabal directly from downloads from haskell.org #46

I think at this stage it is my current preference, although there are a couple of issues to iron out.

@AlistairB
Contributor Author

AlistairB commented Oct 1, 2021

👋 So I was hoping to get this cabal issue resolved first, but I am very mindful that these images are ~4 months out of date on ghc 8.10.

I was originally pushing for the ghcup based approach but @psftw called out some limitations and asked: why not just do what ghcup does directly? To that end I have explored direct installation as an alternative approach. These are my thoughts on the 2 options.

Options

ghcup

There are two variants of this solution:

Pros

  • Remove some logic from the image build which ghcup can handle.
    • Download
    • Sha256 check
    • untar
    • ./configure + make install for ghc
  • More consolidation in the Haskell ecosystem around a single ghc installation tool.

Cons

  • ghcup can gpg verify its metadata file which it downloads, but it does not gpg (with PGP keys) verify ghc, cabal and stack downloads. The docker official images recommend gpg verification for artifacts downloaded during the build phase.
  • We switch out the debian packages for ghcup, but can still be blocked waiting on upstream. ghcup is very prompt with new release for now, but this may not always be the case.
  • Whilst we get some logic handled for us, we have less control over the download + install process.
  • Using a download / install tool, it becomes awkward to avoid multi-stage builds if we don't want to leave ghcup in the final image + keep the image layers clean. I personally do not think leaving ghcup in the image is a good idea.

direct installation

Direct GHC installation

Um, this is essentially the opposite pros / cons of the ghcup approach. It is applying how we currently install stack, but to ghc + cabal.

(I also added sha256 verification in this PR as PGP + sha verification is actually the full recommended approach)

(If we go with this solution I would wait for this cabal issue before directly installing cabal-install and updating it to 3.6)

Conclusion

So for me I prefer the direct installation method. It boils down to the fact that ghcup doesn't actually remove much logic for us. The only special ghc install logic it removes is ./configure --prefix /opt/ghc/$GHC + make install which is not a big deal to call directly. In the case of cabal / stack it is just downloading tarred binaries. The other sort of download / untar logic it does is very stock standard in docker images and not a huge win to remove IMO.
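To make the comparison concrete, here is a hedged sketch of the direct steps (download, sha256 check, untar, configure + make install). The helper names, the `ghc.tar.xz` / `ghc-src` scratch paths and the idea of passing the URL and hash as arguments are illustrative assumptions; PGP verification is left out for brevity.

```shell
# Hypothetical helper: succeed only if FILE ($1) has SHA-256 EXPECTED ($2).
verify_sha256() {
  printf '%s  %s\n' "$2" "$1" | sha256sum -c - >/dev/null 2>&1
}

# Hypothetical helper: install a GHC bindist under /opt/ghc/<version>.
# $1 = GHC version, $2 = expected SHA-256, $3 = bindist URL (supplied by the caller).
install_ghc() {
  ghc="$1"; sha="$2"; url="$3"
  curl -fsSL "$url" -o ghc.tar.xz || return 1
  verify_sha256 ghc.tar.xz "$sha" || return 1
  mkdir -p ghc-src && tar -xJf ghc.tar.xz -C ghc-src --strip-components=1 || return 1
  (cd ghc-src && ./configure --prefix="/opt/ghc/$ghc" && make install)
}
```

Nothing runs until `install_ghc` is called, so the checksum helper can be tried on its own against any local file.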

Alongside this minimal benefit, ghcup brings a bunch of downsides as noted. This is not any criticism of ghcup which is excellent for its core use cases, but I do not think it is a good fit here.

My conclusion here only applies to the current debian images, I would re-assess for Windows and ARM64 support as in those cases ghcup may be doing a lot more heavy lifting for us.

@psftw and others let me know what you think!

@hasufell
Member

hasufell commented Oct 1, 2021

@AlistairB I'm sorry but this is probably the 5th time I have to correct your misinformation:

We cannot currently gpg (using PGP keys) verify the ghcup release tar

Yes, you absolutely can, via the signed SHA256SUMS file, see https://github.com/haskell/docker-haskell/pull/47/files#diff-aed68aef536a4e912f837227c2884259958c90ff018f5933acb8d246212d4bc0R31

And even directly since 0.1.17.2:

ghcup can gpg verify its metadata file which it downloads, but it does not gpg verify ghc, cabal and stack.

This suggests you don't understand how cryptography works and how distributions use GPG signatures, look at this:

http://archive.ubuntu.com/ubuntu/pool/main/c/curl/curl_7.74.0-1ubuntu2.3.dsc

That's the signed metadata, which contains cryptographic hashes of the upstream tarballs. The hashes are verified by the person signing the metadata and hence there's no point in verifying the tarballs separately, because of the cryptographic hash.

The combination of GPG signing hashes instead of the actual data is as old as cryptography and is standard practice. Please read up on the topic.
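For readers following along, the signed-hashes pattern described here can be sketched roughly as follows: verify the signature on the checksum list once, then compare each download against the hash recorded in it. The helper names and the `SHA256SUMS`-style file layout (`<hash>  <name>` per line) are assumptions for illustration, not ghcup's actual implementation.

```shell
# Hypothetical helper: print the hash recorded for NAME ($2) in checksum list SUMS ($1).
expected_hash() {
  awk -v f="$2" '{ n = $2; sub(/^[*]/, "", n); sub(/^\.\//, "", n) }
                 n == f { print $1; exit }' "$1"
}

# Hypothetical helper: check the detached signature on the list, then the file's hash.
# $1 = checksum list, $2 = detached signature, $3 = downloaded file.
verify_signed() {
  sums="$1"; sig="$2"; file="$3"
  gpg --batch --verify "$sig" "$sums" || return 1
  want=$(expected_hash "$sums" "$(basename "$file")")
  have=$(sha256sum "$file" | cut -d' ' -f1)
  [ -n "$want" ] && [ "$want" = "$have" ]
}
```

The signature covers only the small checksum list; the tarball is trusted because its hash matches an entry in that signed list.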

@AlistairB
Contributor Author

And even directly since 0.1.17.2

Ok apologies, I did not know that. I have removed that as a con.

The combination of GPG signing hashes instead of the actual data is as old as cryptography and is standard practice. Please read up on the topic.

I will not claim to be a security expert, but I do not believe this conforms with the "Preferred" solution from the official images docs. The direct GHC installation solution almost exactly matches this pattern. The key difference being that ghcup is not verifying pgp keys for ghc / cabal / stack (and of course the current images do this as well).

I am not saying this is "bad" security or anything, but whilst they are the recommendations I think we should follow them. As I have mentioned before there is a review process where they will call out stuff like this.

@hasufell
Member

hasufell commented Oct 2, 2021

will not claim to be a security expert, but I do not believe this conforms with the "Preferred" solution from the official images docs.

You're wrong (again), because that would mean it's insecure to install via apt

@psftw
Contributor

psftw commented Oct 6, 2021

Sorry again for neglecting the discussion here. I was more hopeful about ghcup in a previous thread, but I'm now a solid No, at least in the short term. The difference to install GHC from binaries is a much smaller step than refactoring the image based on ghcup, which will enable us to get caught back up on the 8.10 branch. In order to support a refactor to ghcup, we would need to build some trust and have a high expectation on the ghcup project to coordinate. Some of the impolite communication I've seen from @hasufell in particular has reduced my confidence in this route, irrespective of the technical merits. I'm still very open to ideas, but in the short term I think what @AlistairB has proposed in #46 is the most pragmatic step to buy us more time to get it right.

@kamek-pf

kamek-pf commented Oct 7, 2021

Semi-related, since #46 was merged recently, does it mean 8.10.7 images will be available on DockerHub shortly ?

@AlistairB
Contributor Author

AlistairB commented Oct 7, 2021

@kamek-pf yes! Sorry for the extended delay. Once docker-library/official-images#11050 is merged 8.10.7 and other missing versions should be available shortly (and going forward we can immediately update to new versions 🎉).

@AlistairB
Contributor Author

Should be all released and updated now. Thanks all!
