Difficulty onboarding #431

Closed
awf opened this issue Oct 24, 2020 · 16 comments

@awf
Contributor

awf commented Oct 24, 2020

Following the instructions on https://github.com/microsoft/knossos-ksc/blob/master/README-ksc.md yields a failure to find gmp, which is installed. See attached console log.

@awf
Contributor Author

awf commented Oct 24, 2020

log.txt

@awf
Contributor Author

awf commented Oct 24, 2020

Using apt-get and this ppa worked... https://launchpad.net/~hvr/+archive/ubuntu/ghc

sudo add-apt-repository ppa:hvr/ghc
sudo apt-get update
sudo apt-get install ghc-8.6.5
sudo apt-get install cabal-install-3.4
PATH="/opt/ghc/bin:$PATH"
cabal v2-build

@toelli-msft
Contributor

Thanks for the report. It seems that the install is using an Anaconda-supplied toolchain. The linker that is used in the log is

/anaconda/envs/py37_default/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld

and, if I am reading things correctly, gcc is

x86_64-conda_cos6-linux-gnu-cc

I don't know whether or not this would cause the GHC installation to fail to find libgmp. I will try to reproduce this locally on a fresh WSL.

@toelli-msft
Contributor

I've just successfully re-run the instructions for a newly-created user in my Ubuntu 18.04 WSL system. Everything worked fine. Could you try disabling your Anaconda environment and rerunning the instructions? If that succeeds then Anaconda must be interfering and I'll dig into how best to deal with that.
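One quick way to check whether a conda toolchain is shadowing the system one is to look at where gcc and ld resolve. The following is a minimal sketch, not part of the ksc instructions. It assumes that a conda-supplied tool has "conda" somewhere in its path (as in the log above) and that conda's compiler packages may export CC and LD; the helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given path looks conda-supplied.
# Assumption: conda-supplied tools have "conda" somewhere in their path.
looks_like_conda() {
    case "$1" in
        *conda*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Report where gcc and ld resolve on the current PATH, plus the values of
# CC and LD if an environment activation script has exported them.
report_toolchain() {
    for tool in gcc ld; do
        path=$(command -v "$tool" 2>/dev/null) || path=""
        if [ -z "$path" ]; then
            echo "$tool: not found on PATH"
        elif looks_like_conda "$path"; then
            echo "$tool: $path (looks conda-supplied)"
        else
            echo "$tool: $path"
        fi
    done
    for value in "${CC:-}" "${LD:-}"; do
        if [ -n "$value" ] && looks_like_conda "$value"; then
            echo "environment variable points at a conda tool: $value"
        fi
    done
}

report_toolchain
```

If either tool is reported as conda-supplied, that is a hint (not proof) that the linker in use cannot see the system libgmp.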

@toelli-msft
Contributor

toelli-msft commented Oct 27, 2020

A similar problem has been noted with miniconda. And also.

@toelli-msft
Contributor

If and when it complains about libgmp, a good diagnostic to try would be

echo -e '#include <gmp.h>\nint main() { mpz_t integ; mpz_init(integ); return 0; }' > /tmp/foo.c
gcc /tmp/foo.c -lgmp
ldd ./a.out

@awf
Contributor Author

awf commented Oct 27, 2020

Thanks. I think it's important to work well with conda, since it's what a lot of people use. This was a relatively clean install on a new machine.

It's also good to work with the conventional package managers.

And I note ghcup now has a deprecation notice, and suggests that Ubuntu users might prefer the PPA solution.

And finally, curl | sh has a generally unpleasant whiff. Of course trusting a random PPA is similar, but the rollback options are better understood.

@toelli-msft
Contributor

toelli-msft commented Oct 28, 2020

I gave considerable thought, about a year ago, to how we install the Haskell toolchain simply, reliably and reproducibly. I'll explain as much of the rationale as I can.

What was the overall goal for the toolchain installation instructions?

To provide a small set of simple instructions that reproducibly get a developer's system into a known-good state.

Why don't we use the default Ubuntu packages of GHC and cabal-install?

The ghc and cabal-install packages in Ubuntu 18.04 are years out of date, so until Ubuntu 20.04 was released we couldn't use the default Ubuntu packages. Ubuntu 20.04 was released six months ago. Its version of cabal-install is 2.4.0.0 which is two years old. It will probably work but I'm not certain. Therefore we could probably require everyone working on ksc to upgrade to Ubuntu 20.04 and then simply use the system packages.

Why do we use ghcup rather than the Ubuntu PPA?

We used HVR's Ubuntu PPA to install ghc and cabal-install until 328526b in January 2020. We switched because the package cabal-install-3.0 available through that PPA was not the stable released version, rather it was regularly updated from the upstream git repository. You can see this in the version numbers of the cabal-install packages. All sorts of bugs crept into it and it regularly broke in subtle ways. I remember sitting with Anna for an hour trying to understand why the build command she was using did not work when the same build command worked on my system. The reason turned out to be that she was using a package from the PPA that just so happened to be broken at the time she installed it. On the other hand ghcup installs the released version of cabal-install and will be the same on all systems where it is installed (modulo OS and architecture).

Therefore, for reproducibility, it seemed sensible to use ghcup to install cabal-install. Once we were doing that it also seemed sensible to get ghc from the same source.

Why do we use a deprecated version of ghcup?

Our install instructions specify installing a fixed version of cabal-install (3.0.0.0) and ghc (8.6.5). The version of ghcup that we use is only "deprecated" in the sense that it doesn't support the latest versions of ghc and cabal-install. It still works perfectly well (and will continue to work perfectly well indefinitely) to install the versions of ghc and cabal-install that we use.

Updating the version of ghcup that is referred to in the instructions is a fine thing to do but will provide no benefit for our current instructions.
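Whichever installer is used, a quick check that the pinned versions are the ones actually on PATH could look like the sketch below. The version numbers are the ones quoted above; the helper name check_version is hypothetical.

```shell
#!/bin/sh
# Versions taken from the install instructions discussed above.
EXPECTED_GHC="8.6.5"
EXPECTED_CABAL="3.0.0.0"

# Hypothetical helper: compare an installed version against the expected one.
check_version() {
    name=$1
    expected=$2
    actual=$3
    if [ "$actual" = "$expected" ]; then
        echo "$name $actual: OK"
    else
        echo "$name: expected $expected, found ${actual:-nothing}"
        return 1
    fi
}

# Both tools print a bare version number with --numeric-version.
ghc_version=$(ghc --numeric-version 2>/dev/null || true)
cabal_version=$(cabal --numeric-version 2>/dev/null || true)

check_version ghc "$EXPECTED_GHC" "$ghc_version" || true
check_version cabal-install "$EXPECTED_CABAL" "$cabal_version" || true
```

This only confirms the version numbers. It would not catch a PPA package that was rebuilt from a newer upstream commit under the same version.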

Is it really OK to download a shell script from a URL and run it?

The install instructions indeed say to download ghcup, a shell script, and then run it. It's correct to be cautious of the security implications of this installation method. To be safer we could perhaps check the hash of the downloaded script to confirm that it is what we are expecting. That said, we do download it from a specific git commit, so unless GitHub is lying to us we are indeed getting the same version each time.

On the other hand, installing a package from a PPA requires root access, not just user access. A badly or maliciously written package could hose your entire system! Furthermore, as I understand it, the standard way of installing Anaconda is to download and run a shell script. To a first approximation ghcup is to GHC as Anaconda is to Python.

Taking these observations into account, our particular usage of "curl | sh" doesn't seem so unreasonable.

How do I roll back my system to the state it was in before I used ghcup?

rm -r ~/.ghcup

Using the PPA avoided a conflict with Anaconda. Doesn't that imply we should use the PPA?

EDIT: The PPA does avoid the conflict with Anaconda. See later comment. On the other hand the Ubuntu 20.04 default packages also solve the conflict with Anaconda and there is a workaround to make ghcup work with Anaconda.

I strongly suspect that it is a red herring that installing ghc and cabal-install via the PPA fixed the Anaconda conflict. It's hard to be sure without being able to reproduce locally, but my best guess is that the Anaconda environment which exhibited the bug will break all ghcs, regardless of how they were installed (and conversely, Anaconda environments that don't exhibit the bug will not break any ghcs). After all, the breakage was to do with a version of ld in the PATH that does not have access to the system libraries, as you can see from the error message

/anaconda/envs/py37_default/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lgmp

I suspect that after the PPA was installed it was used in a non-Anaconda environment, or an Anaconda environment that happened to be in a good state, so ghc worked. If the PPA ghc had been used in the bad Anaconda environment it would have broken too.

I would like to be able to reproduce this in order to be certain, but unfortunately I have not been able to. I'm finding the root cause extremely difficult to track down. Without being able to sit down at a broken system it may be impossible to reproduce.

(I installed Anaconda3-2020.07-Linux-x86_64.sh from https://www.anaconda.com/products/individual and I haven't been able to reproduce the problem.)

The ghcup README mentions the PPA. Doesn't that mean we should use the PPA?

ghcup is just trying to provide helpful information. It doesn't imply that the PPA is better somehow.

What are our options?

On the unproven but likely assumption that the Anaconda conflict will be suffered equally by any means of installing ghc and cabal-install, our options are to use

  • Ubuntu 20.04 standard packages

    • Upsides: completely standard

    • Downsides: requires everyone to upgrade their Ubuntu to 20.04 (this may also be interpreted as an upside), not certain that cabal-install-2.4.0.0 will actually work for us (but can check)

  • HVR's PPA

    • Upsides: uses standard distribution tooling (apt) and contains latest versions

    • Downsides: the packages sometimes break without warning and are often unreleased versions (for example the PPA contains cabal-install-3.4.0.0 which has not yet even been released)

  • ghcup

    • Upsides: completely reproducible

    • Downsides: not standard distribution tooling so uses "curl | sh"

Thoughts?

@awf
Contributor Author

awf commented Oct 29, 2020

Tom, huge thanks for this detailed analysis. I shall try to start a second WSL and see if it duplicates. I note that as part of getting cabal to work, I hit haskell/cabal#6551, and of course I'm now on WSL2, so I wonder if that will make a difference.

@awf
Contributor Author

awf commented Oct 29, 2020

Some notes: The ppa https://launchpad.net/~hvr/+archive/ubuntu/ghc contains numerous specific cabal/ghc versions. In the instructions I pasted above, I explicitly selected ghc-8.6.5 and cabal-install-3.4, just because they seemed reasonable. If there's a fear that cabal 3.4 is risky, we can replace that instruction. I understand that you had some historical problems with the apt install of the tools, but (a) that was then, this is now, and (b) it actually reinforces the argument that people want and expect to use apt to install packages on Ubuntu. If we don't offer that, people will do it anyway, so we should understand how to help them.
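To make the version choice explicit, the PPA commands can be generated from two parameters and reviewed before running. This is only a sketch: the helper name is hypothetical, the package naming (ghc-X, cabal-install-Y) follows the commands pasted earlier in this thread, and 3.0 is used in the example because cabal-install-3.0 is mentioned above as a PPA package.

```shell
#!/bin/sh
# Hypothetical helper: print the PPA install commands for the chosen versions
# instead of running them, so they can be reviewed first.
ppa_install_commands() {
    ghc_version=$1
    cabal_version=$2
    cat <<EOF
sudo add-apt-repository ppa:hvr/ghc
sudo apt-get update
sudo apt-get install ghc-$ghc_version cabal-install-$cabal_version
export PATH="/opt/ghc/bin:\$PATH"
EOF
}

# Same GHC as before, with an older cabal-install in place of 3.4.
ppa_install_commands 8.6.5 3.0
```

Printing rather than executing keeps the sudo steps under the user's control.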

I see some merit in the argument that "ghcup is to Haskell as conda is to Python", but:

  1. conda now manages a lot more than just Python - it is useful to switch C++ compilers, tensorflow installs, etc.
  2. conda's primary merit is quick switching between environments (I can of course do that with ghcup, but it's not so simple: mv ~/.ghcup ~/.ghcup-env1; mv ... etc.)
  3. The conda userbase is (I am guessing) rather larger than ghcup's, so the trust base of the "curl | sh" install is larger. If I am paranoid, I will need to inspect the conda script anyway, so adding ghcup gives me another one to check.

So, I believe we should at the very least add to the README a line saying "if you want to use apt-get instead of ghcup, see #431".

@awf
Contributor Author

awf commented Oct 29, 2020

If and when it complains about libgmp, a good diagnostic to try would be

echo -e '#include <gmp.h>\nint main() { mpz_t integ; mpz_init(integ); return 0; }' > /tmp/foo.c
gcc /tmp/foo.c -lgmp
ldd ./a.out

I forgot to reply to this -- I did successfully run such a test at the time (i.e. I didn't run ldd, but ran the a.out).

@toelli-msft
Contributor

as part of getting cabal to work, I hit haskell/cabal#6551, and of course I'm now on WSL2, so I wonder if that will make a difference.

Cabal has had numerous similar problems on WSL over the years [1] [2]. There are builds of cabal-install that work perfectly well on WSL but others will break randomly. This is another reason that having reproducible installation instructions is so important.

WSL2 is actually running Linux, as opposed to WSL which runs a reimplementation of Linux syscalls. Therefore this problem should not exist on WSL2. If you do come across it on WSL2 please report it as soon as possible (at least to me and I can forward it upstream as necessary)!

conda's primary merit is quick switching between environments (I can of course do that with ghcup, but it's not so simple

The claim "ghcup is to Haskell as conda is to Python" was very much "to first order" only and not to be taken too literally. That said, I'm not sure what you mean about switching environments using ghcup. The way to "switch environments" with ghcup is ghcup set <version-number>. You wouldn't do it by moving directories.

adding ghcup gives me another one to check

Agreed, but

  1. ghcup comes from haskell.org so installing ghcup this way seems no more risky than installing ghc and cabal-install that are ultimately from the same source. (Granted it is still an addition to the trusted code base. Granted too that the README says to install from GitHub rather than haskell.org, but we could change the instruction to point to haskell.org if we deem that more secure.)

  2. I don't understand why auditing ghcup imposes more overhead than auditing the packages from PPA. The PPA requires you to install packages as root. A badly or maliciously written PPA package is more harmful than a badly or maliciously written ghcup!

"Download a script recommended on haskell.org, from a specific git commit from GitHub and run it as non-root user" seems at least as safe as "add a private individual's PPA and install his packages as root, oh and by the way he may have switched out the package you audited for a new one". Both of these are a drop in the ocean compared to "trust all ksc's transitive Haskell dependencies as well as all of TensorFlow's and Anaconda's". Security engineering is very, very hard. We have to draw the line at a sensible place. To paraphrase XKCD:

  • "I want our infrastructure to be reasonably secure"
  • Sure, give me a few hours
  • "I want our infrastructure to be really secure"
  • I'll need a research team and five years

Do you have a particular threat model in mind? If so let's dive deeper into the details and determine which of the PPA and ghcup would be better from a security standpoint.

What to do?

There are two tangentially related but distinct questions: what to do about security and what to do about obtaining a reproducible installation that doesn't interfere with other commonly-used software.

What to do about security?

On reflection it would be somewhat safer to

  • add --proto '=https' --tlsv1.2 to curl
  • suggest that users run sha256sum on the downloaded script to check it is indeed the exact one that we are expecting
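The two suggestions above could be combined roughly as follows. This is a sketch under stated assumptions: GHCUP_URL and GHCUP_SHA256 are placeholders that the instructions would have to supply (the pinned script URL and its known-good digest are deliberately not filled in here), and verify_sha256 is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE has the expected SHA-256 digest.
verify_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Placeholders: the pinned script URL and its known-good digest.
GHCUP_URL=${GHCUP_URL:-}
GHCUP_SHA256=${GHCUP_SHA256:-}

if [ -n "$GHCUP_URL" ] && [ -n "$GHCUP_SHA256" ]; then
    script=$(mktemp)
    # Refuse anything but HTTPS, and require TLS 1.2 or later.
    curl --proto '=https' --tlsv1.2 -sSf "$GHCUP_URL" -o "$script"
    if verify_sha256 "$script" "$GHCUP_SHA256"; then
        echo "checksum OK: $script"
    else
        echo "checksum mismatch, not running the script" >&2
        rm -f "$script"
    fi
fi
```

Downloading to a file first, rather than piping straight into sh, is what makes the checksum step possible.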

Using the Ubuntu 20.04 standard packages would be even safer but has the drawbacks listed above. Using the binary distribution from haskell.org might be safer still but is unreasonably complicated.

Beyond that, let's dive more into a specific threat model.

What to do about reproducible installation?

I'm not sure what all the desiderata are here. My desiderata are that the instructions should be simple, almost always succeed, and after following the instructions the user's system should be in a known good state. ghcup achieves this, Ubuntu 20.04 standard packages would achieve this (but may have other drawbacks), the PPA does not achieve this. "Conflicting with Anaconda" is not a known good state, but as yet we have not been able to determine the root cause, let alone determine how to remedy it. Determining the root cause may require me (remotely) "sitting down at" the terminal of someone experiencing the issue.

Perhaps another desideratum is "it should be easy for users to install the Haskell toolchain using apt". Using the PPA for satisfying this desideratum conflicts with "after following the instructions the user's system should be in a known good state". It conflicts not because of historical problems I had with the PPA, but because the PPA by its very nature does not provide stable packages. The package names themselves indicate that the versions are not stable. The cabal-install-3.0 I install today could be different from the cabal-install-3.0 that Anna installs tomorrow. Now, cabal-install-3.2 hasn't been updated since February, so we may be lucky and get stable packages, but it would indeed be more luck than judgement.

Using Ubuntu 20.04, on the other hand, does satisfy all desiderata that I can see (with the wart that cabal-install-2.4 is very old, but this can be worked around).

Proposal

  • Add to the instructions an explanation of how to obtain the toolchain from standard Ubuntu 20.04 packages
  • Once everyone is on 20.04 anyway (in several months) remove the instructions to use ghcup, assuming the wart removal (possibly involving compiling cabal-install from source) is not too onerous

What do you think of this plan?

@toelli-msft
Contributor

Update: I can reproduce the issue

I have managed to reproduce the issue, thanks to Henry Flux-Jackson sharing his Anaconda expertise.

The problem is a bad interaction between Anaconda and the installation method that ghcup uses. The same problem would not be observed if ghc were installed via Ubuntu's standard package repository or HVR's PPA (it would be observed if ghc were installed via stack or from the haskell.org source or binary distribution, I believe). I explain the cause in more detail at haskell/cabal#5280 (comment).

The workaround is easy: make sure you are not in an Anaconda environment when you run ghcup. Nonetheless you can subsequently activate an Anaconda environment and use the ghc you installed via ghcup. There is no incompatibility between using the two environments. They are only incompatible at the time of ghc install.
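A guard along these lines could be run before the ghcup step. It is a sketch, not the project's install script: it assumes that "conda activate" exports CONDA_PREFIX (including for the base environment) and that deactivating the last environment unsets it. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeeds if a conda environment appears to be active.
# Assumption: "conda activate" exports CONDA_PREFIX, and "conda deactivate"
# from the last active environment unsets it.
conda_env_active() {
    [ -n "${CONDA_PREFIX:-}" ]
}

if conda_env_active; then
    echo "A conda environment is active ($CONDA_PREFIX)." >&2
    echo "Run 'conda deactivate' before installing ghc with ghcup." >&2
else
    echo "No conda environment detected; proceed with the ghcup install step."
fi
```

Once ghc is installed, the environment can be reactivated, since the conflict described above only occurs at install time.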

Sadly this implies that ghcup is not as reproducible as I hoped.

@awf
Contributor Author

awf commented Nov 24, 2020

I note that install_linux.sh uses the ppa option. It would make sense for user instructions to be consistent with CI.

@toelli-msft
Contributor

toelli-msft commented Nov 24, 2020

Yes, agreed that we should be consistent. Let me update our list of choices based on what I learned about the Conda/ghcup conflict.

What are our options?

  • Ubuntu 20.04 standard packages

    • Upsides: completely standard

    • Downsides

      • Requires everyone to upgrade their Ubuntu to 20.04
      • Provides cabal-install-2.4.0.0 which is very old. I'm not sure it will actually work for us (but can check).
      • Provides GHC 8.8.1. This version of GHC is buggy on Windows. It's probably not advisable to use a different version of GHC on Linux from the version we use on Windows.
  • HVR's PPA

    • Upsides: uses standard distribution tooling (apt) and contains latest versions

    • Downsides: the packages sometimes break without warning and are often unreleased versions (for example the PPA contains cabal-install-3.4.0.0 which has not yet even been released). We will probably get away with using cabal-install-3.0.0.0 and assuming it won't break because it is sufficiently old to not change again.

  • ghcup

    • Upsides: completely reproducible [EDIT: even though it is harder than initially believed to achieve this -- one would have to explicitly check for and exclude Conda]

    • Downsides

      • Not standard distribution tooling so requires downloading a binary from haskell.org and running it
      • Will require a warning "Will not install correctly if you are in an active conda environment. Please install in a shell without a conda environment enabled".

What's your preference?

@awf
Contributor Author

awf commented Nov 24, 2020

I'm not sure your upsides for ghcup match your previous observation "ghcup is not as reproducible as I hoped."

I prefer PPA. The conda workaround for ghcup would require some wordsmithing, and anyway our CI uses the PPA.

toelli-msft pushed a commit that referenced this issue Nov 24, 2020
@awf awf mentioned this issue Dec 1, 2020
toelli-msft pushed a commit that referenced this issue Dec 1, 2020
2 participants