improve dist/setup-config format mismatch detection #2251

Closed
hvr opened this issue Dec 2, 2014 · 54 comments

@hvr
Member

hvr commented Dec 2, 2014

@dcoutts asked me to file this:

I was having a weird issue with the devel version of cabal where some cabal invocations would fail with the following rather non-obvious error message:

cabal: Prelude.chr: bad argument: 2041223

Even with -v3 this is the very first output you get from cabal. Only strace provided the hint that setup-config was the last file read before that error message was emitted.

(The underlying problem was most likely that cabal had been compiled against an older snapshot of the Cabal lib than the snapshot registered in the pkg db, and those two snapshots probably disagreed on the setup-config schema.)

@23Skidoo
Member

23Skidoo commented Dec 2, 2014

/cc @ttuegel

@dcoutts
Contributor

dcoutts commented Dec 2, 2014

In the previous text based format we included info about the version of Cabal that wrote it and reported an appropriate message if the version did not match (and iirc, that was one of the cases where we'd automatically reconfigure).

@hvr
Member Author

hvr commented Dec 2, 2014

fwiw, @svenpanne experienced this issue some time ago too: https://www.haskell.org/pipermail/ghc-devs/2014-October/006962.html

@23Skidoo
Member

23Skidoo commented Dec 2, 2014

@dcoutts
We still do this, but the new code is probably less robust when the Cabal snapshot changes while the version number doesn't. Maybe the header should also include the Git revision ID.

@23Skidoo
Member

23Skidoo commented Dec 2, 2014

For those who come here by googling the error message: to fix this issue, just delete dist/setup-config.

@ttuegel
Member

ttuegel commented Dec 2, 2014

The new code should be more robust to format changes, in the sense that binary returns errors explicitly rather than throwing exceptions. I think the problem here is that the header format changed; now we use a binary version header. I suspect this happens when a pre-binary-setup-config Cabal tries to read the binary header. I think we should switch back to the old header format; the new format still won't be backward compatible, but we will produce meaningful error messages. If we keep the old header format, we can be forward-compatible with the text setup-config format (eliminating the need to cabal configure explicitly when upgrading from text to binary; see #2214). The cost is breaking the format for anyone using a git checkout of Cabal (they will have to reconfigure manually; a small cost for an unreleased mistake).
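
To illustrate the idea of a plain-text header in front of an opaque payload, here is a minimal sketch. The header prefix, the writer-id strings and the names `mkHeader` and `checkHeader` are all made up for this example and are not Cabal's actual format or API. The point is only that the writer's identity (possibly including a Git revision) is checked before anything tries to decode the payload, so a mismatch yields a readable error.

```haskell
import Data.List (stripPrefix)

-- | Why a saved config cannot be used as-is.
data ConfigError
  = MissingHeader                  -- ^ no recognisable header line
  | VersionMismatch String String  -- ^ expected writer id, actual writer id
  deriving (Eq, Show)

-- | Hypothetical header prefix; the real setup-config header differs.
headerPrefix :: String
headerPrefix = "config-written-by: "

-- | Build the header line for a given writer id, e.g. "Cabal-1.22.0.0+6c395bb".
mkHeader :: String -> String
mkHeader writer = headerPrefix ++ writer

-- | Check the first line against the expected writer id and, only if it
-- matches, hand back the remaining payload for decoding.
checkHeader :: String -> String -> Either ConfigError String
checkHeader expected contents =
  case break (== '\n') contents of
    (firstLine, rest) ->
      case stripPrefix headerPrefix firstLine of
        Nothing -> Left MissingHeader
        Just writer
          | writer == expected -> Right (drop 1 rest)  -- drop the newline
          | otherwise          -> Left (VersionMismatch expected writer)
```

A caller could map `VersionMismatch` to an automatic reconfigure and `MissingHeader` to a "setup-config is corrupted" message.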

@ttuegel ttuegel self-assigned this Dec 2, 2014
@ttuegel ttuegel added this to the Cabal-1.22 milestone Dec 2, 2014
@23Skidoo
Member

23Skidoo commented Dec 2, 2014

@ttuegel

I think we should switch back to the old header format;

+1 on this.

I suspect this happens when a pre-binary-setup-config Cabal tries to read the binary header.

I think I had this happen to me with a new Cabal, but I don't know how to reproduce (the problem went away after a reconfigure). It'd help if someone could attach a setup-config file that exhibits this behaviour.

@hvr

Which version of cabal-install are you using?

@hvr
Member Author

hvr commented Dec 2, 2014

@23Skidoo

The cabal-install executable was generated from

https://launchpad.net/~hvr/+archive/ubuntu/ghc/+sourcepub/4539223/+listing-archive-extra

(i.e. 7ece59b) whereas the Cabal library gitlinked in GHC HEAD (and thus registered in the pkg-db) is currently at 6c395bb

@23Skidoo
Member

23Skidoo commented Dec 2, 2014

@hvr
Thanks. So this is a version that includes the binary setup-config patches.

@luite
Member

luite commented Dec 3, 2014

+1 on better handling of version mismatches. I like the idea of sticking with the old header format and automatically reconfiguring.

I've also been seeing this type of error a lot when working with recent cabal versions. I know how to fix it, but it's probably going to bite lots of users, in particular in mixed cabal version setups (and I'll have to keep asking ghc 7.8 users to upgrade their cabal library to use ghcjs).

@svenpanne

I just triggered a rebuild for one of my projects, and things seem to work now: https://travis-ci.org/haskell-opengl/StateVar/jobs/40772431 @hvr: Have the Ubuntu packages been updated?

The only funny thing is the warning https://travis-ci.org/haskell-opengl/StateVar/jobs/40772431#L96 ("Warning: /tmp/pkgConf-StateVar-1.0.12205.0: Unrecognized field data-dir on line 19"). Not a show-stopper, but I really try to keep all builds warning-free, so what's going on there?

@luite
Member

luite commented Dec 3, 2014

data-dir is a new field in InstalledPackageInfo:
7fa8f88

I think the Cabal library shipping with GHC is still too old to include this change.

@svenpanne

Hmmm, will GHC 7.8.4 contain the right Cabal library then? Having to live with the warning would be extremely annoying. Usual scenario: Install Haskell Platform, cabal update, cabal install cabal-install, cd MyCoolProject, cabal install => a warning you can't avoid (at least that's how I understand the current situation). 😕

@luite
Member

luite commented Dec 3, 2014

It should go away when you install an updated Cabal library. In the current situation you probably have a Cabal library that's older than the one your cabal-install has been built with, but it does have the same version number, since that hasn't been bumped in the meantime.

This means that setup executables in your setup-exe-cache or those from custom setup packages will have some incompatibilities not reflected in the version number.

@ttuegel
Member

ttuegel commented Dec 9, 2014

Fixed in 9ece664. If you are using an older git version, you may get a message about setup-config being corrupted; you simply need to reconfigure. Users of older, released versions will get the same old message about their Cabal version changing. I have made some necessary, but not sufficient, changes to address #2214, so you still have to reconfigure for now.

@svenpanne

I don't think this is really fixed, the same problem showed up again recently: https://travis-ci.org/haskell-opengl/OpenGLRaw/builds/49116074

This time not only for cabal-HEAD/ghc-HEAD, but for cabal-1.22/ghc-7.10.1, too, which is a bit scary given the fact that 7.10 should be released soon.

@nikita-volkov

Same issue here: https://travis-ci.org/nikita-volkov/record/jobs/48711367.

Also I randomly experience this on other projects locally. I run a freshly installed OS X 10.10.2 with the following setup:

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.10.0.20150123
$ cabal --version
cabal-install version 1.22.0.0
using version 1.22.0.0 of the Cabal library 

@svenpanne

Can we please re-open this bug?

@23Skidoo
Member

23Skidoo commented Feb 6, 2015

Reopened.

@23Skidoo 23Skidoo reopened this Feb 6, 2015
Blaisorblade added a commit to Blaisorblade/pts that referenced this issue Feb 7, 2015
This should be more robust (the Cabal format is subject to changes,
haskell/cabal#2251 discusses issues with ongoing changes).

Moreover, the tool currently does not work on Travis! We currently get:

  package-info.hs: Prelude.read: no parse

This looks like a version mismatch problem, but let's start from easy things.
Blaisorblade added a commit to Toxaris/pts that referenced this issue Feb 8, 2015
This should be more robust (the Cabal format is subject to changes,
haskell/cabal#2251 discusses issues with ongoing changes).

Moreover, the tool currently does not work on Travis! We currently get:

  package-info.hs: Prelude.read: no parse

This looks like a version mismatch problem, but let's start from easy things.
@dcoutts
Contributor

dcoutts commented Feb 19, 2015

The fundamental problem is that cabal-install tries to read the dist/setup-config file, but when the build-type is Custom it really has no right whatsoever to do so, because it cannot expect the file to exist or, if it does exist, expect to understand its format.

I see that we try and read the file with some degree of error handling, but that will fail for older binary lib versions that just throw exceptions.

We should look again in cabal-install/Main.hs at the uses of tryGetPersistBuildConfig and getPersistBuildConfig and for each one ask "but what about build-type: Custom?", remembering that for custom we cannot expect the file to be meaningful (a custom Setup.hs doesn't even have to use the Cabal lib).

One of my colleagues is actually getting this bug with building Cabal head itself (using cabal-install version 1.22.0.0 using version 1.22.0.0 of the Cabal library). I cannot reproduce it. We're using the same version of the binary package.
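
A rough sketch of the "but what about build-type: Custom?" question, under stated assumptions: `BuildType` here is a local stand-in rather than Cabal's own type, and `SavedConfig` and `readSavedConfig` are hypothetical names, not functions from cabal-install. The loader argument stands for whatever actually reads and decodes the file.

```haskell
-- | Local stand-in for a package's declared build type (not Cabal's type).
data BuildType = Simple | Configure | Make | Custom
  deriving (Eq, Show)

-- | Outcome of looking for a saved configuration.
data SavedConfig cfg
  = NotInspected       -- ^ build-type: Custom, so the file is opaque to us
  | Unreadable String  -- ^ we tried and failed; the caller should reconfigure
  | Loaded cfg         -- ^ the file was read and understood
  deriving (Eq, Show)

-- | Only run the loader when the front end is entitled to interpret the file.
-- For Custom packages the loader action is never executed.
readSavedConfig :: BuildType -> IO (Either String cfg) -> IO (SavedConfig cfg)
readSavedConfig Custom _      = return NotInspected
readSavedConfig _      loader = either Unreadable Loaded <$> loader
```

Making `NotInspected` a separate constructor forces each call site to decide what to do when the file cannot be relied on, instead of treating it as a parse failure.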

@dcoutts
Contributor

dcoutts commented Feb 19, 2015

One option of course is to go back to the text format and use the old code that I wrote carefully to avoid these problems :-) until we can sort things out to use a binary format properly.

@ttuegel
Member

ttuegel commented Feb 20, 2015

@cocreature What version of cabal-install are you running that with?

@ttuegel
Member

ttuegel commented Feb 20, 2015

@edsko Well, GHC stack traces are never as helpful as you hope, but that gives me an idea where to look. Thanks!

@cocreature
Collaborator

@ttuegel 1.22.0.0 from the nix haskellngPackages.

@ttuegel
Member

ttuegel commented Feb 20, 2015

@cocreature Thank you! I can finally reproduce this.

@ttuegel
Member

ttuegel commented Feb 20, 2015

This seems to be unrelated to the version of binary. This issue is present in the 1.22 branch, but not on master.

@ttuegel
Member

ttuegel commented Feb 20, 2015

Actually, I think it does depend on the version of binary. I think it only happens with new binary. We use decodeOrFail when it's available, but it turns out decodeOrFail can throw exceptions from get and friends. The Binary instance for Char calls chr, which is partial.

I will switch over to using our wrapper around decode to catch exceptions. I'll also upstream a patch to binary that makes that instance total.
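
For readers who want to see the failure mode in isolation, here is a minimal sketch; it is not the patch from #2428. The `Colour` type and its deliberately partial `get` are invented to stand in for the `Char` instance described above, and `safeDecode` is a hypothetical name. It shows `decodeOrFail` letting an `ErrorCall` escape from `get`, and a wrapper in `IO` that turns it into a `Left`.

```haskell
import Control.Exception (ErrorCall (..), catch, evaluate)
import Data.Binary (Binary (..), decodeOrFail, encode)
import Data.Binary.Get (getWord8)
import Data.Binary.Put (putWord8)
import qualified Data.ByteString.Lazy as BL

-- | Toy type whose 'get' is partial, standing in for the Char instance.
data Colour = Red | Green | Blue
  deriving (Eq, Show, Enum)

instance Binary Colour where
  put = putWord8 . fromIntegral . fromEnum
  get = do
    w <- getWord8
    -- The derived 'toEnum' calls 'error' for tags above 2, so a corrupt
    -- byte makes the decoder throw instead of failing cleanly.
    return $! toEnum (fromIntegral w)

-- | Decode in IO, reporting both ordinary parse failures and 'ErrorCall'
-- exceptions raised while decoding as 'Left'.
safeDecode :: Binary a => BL.ByteString -> IO (Either String a)
safeDecode bytes = attempt `catch` handler
  where
    attempt = do
      result <- evaluate (decodeOrFail bytes)
      case result of
        Left (_, _, msg)    -> return (Left msg)
        Right (_, _, value) -> Right <$> evaluate value  -- force to WHNF
    handler (ErrorCall msg) = return (Left msg)
```

Note that `evaluate` only forces to weak head normal form, so errors hidden in lazy fields of a larger structure could still surface later; forcing the whole value would need something like `deepseq`.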

@ttuegel
Member

ttuegel commented Feb 20, 2015

Fixed in #2428.

@ttuegel ttuegel closed this as completed Feb 20, 2015
@svenpanne

Cool, thanks. @hvr: Could you build an updated Ubuntu package and put that into your ppa? This would unbreak lots of Travis CI projects...

@23Skidoo
Member

@ttuegel

it turns out decodeOrFail can throw exceptions from get and friends.

Is this a bug in binary or expected behaviour? Perhaps we should report it.

@ttuegel
Member

ttuegel commented Feb 20, 2015

It is unexpected and avoidable, but technically correct (impossible to catch exceptions in pure code). I have submitted a patch.

@23Skidoo
Member

One of my colleagues is actually getting this bug with building Cabal head itself

Interestingly enough, the problem reported by @edsko recently started happening on Travis with 7.10. What I don't get is why build fails to read back the setup-config file - both build and configure (which generated it) use the internal setup method, so the versions of binary should match. @ttuegel, any ideas?

#2428 should help, but why is it failing at all?

@svenpanne

I know that I'm probably repeating myself, but I have been having problems with GHC 7.10 and Cabal 1.22 for months now, too, see e.g.

https://travis-ci.org/haskell-opengl/GLURaw/jobs/53960254
https://travis-ci.org/haskell-opengl/StateVar/jobs/53657899
https://travis-ci.org/haskell-opengl/OpenGL/jobs/53853598

to name just a few. https://launchpad.net/~hvr/+archive/ubuntu/ghc contains only relatively old cabal-install 1.22 versions, so could somebody please clarify what the current state of affairs is? To be honest, I don't understand things anymore. 😕 Which versions are expected to work together? How can I test this on Travis CI?

The current situation is quite frustrating: I can either ignore failures with 7.10 and 1.22 via allow_failures (bad, because I might not catch problems, and both versions should be out soon) or I have relatively random failures in my build matrix (bad, too, for the obvious reasons).

@ttuegel
Member

ttuegel commented Mar 16, 2015

Cabal and cabal-install must always have the same major version, in this case 1.22.y.z. Releases with the same major version must be compatible, i.e. Cabal-1.22.u.v is compatible with cabal-install-1.22.y.z for all u, v, y, z. It is always best to use the latest minor release, as that will have the most recent bug fixes. Right now that is Cabal-1.22.1.1 and cabal-install-1.22.0.1. That means you should be updating cabal-install every time Cabal is updated. Travis does not do this, and so it is broken right now.
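
The rule can be written down as a small check. This is only a sketch: it assumes "major version" means the first two version components (the 1.22 in 1.22.y.z), and `compatible` is a hypothetical helper, not something cabal-install exposes.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- | Two versions count as compatible when their first two components agree,
-- e.g. Cabal-1.22.1.1 and cabal-install-1.22.0.1.
compatible :: Version -> Version -> Bool
compatible lib tool = major lib == major tool
  where
    major = take 2 . versionBranch
```

For example, `compatible (makeVersion [1,22,1,1]) (makeVersion [1,22,0,1])` holds, while pairing a 1.22 library with a 1.20 tool does not.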

@ttuegel
Member

ttuegel commented Mar 16, 2015

@23Skidoo The proximate cause of that error is Travis not using cabal-install built with the latest Cabal. Not sure yet why it insists on reconfiguring, but we would get a more meaningful explanation if Travis used the latest Cabal.

@23Skidoo
Member

What's interesting is that the same version of cabal-install is used in all cases (7.4.2, 7.6.3, 7.8.3, ...), but the exception only happens with HEAD and 7.10. I wonder whether this has something to do with the 7.10 snapshot reporting its version as 7.10.0.20150314 (HEAD uses 7.11.20150314). Maybe this triggers some bug in the Data.Version Binary instance?

@23Skidoo
Member

@ttuegel

The proximate cause of that error is Travis not using cabal-install built with the latest Cabal.

I understand that it doesn't include your fix (#2428) for the uncaught exception problem, which is why we get Prelude.chr: bad argument: instead of Reconfiguring with default arguments.... But I don't understand why we get an exception in the first place: setup-config was generated by the same cabal-install executable that tries to read it back, versions of Cabal and binary should match.

@svenpanne

Hmmm, even after all those explanations my question remains: What shall I (and lots of other people) do to get our Travis builds green again? I don't fully understand the details yet, but having to do seemingly unrelated things in lockstep is a horrible user experience, and this should be mitigated somehow. In the C/C++ world: Imagine if gcc/make/rpm/... had to be kept tightly in sync somehow... 😱

@ttuegel
Member

ttuegel commented Mar 16, 2015

@svenpanne I'm not sure what you mean by keeping things in lockstep here. The bug is fixed in Cabal-1.22.1.1. Travis does not use this version, so you do not get the bug fix. That's the extent of the problem blocking builds.

There is an unrelated issue that cabal-install sometimes reconfigures when it doesn't need to. In Cabal < 1.22.1.1, this also triggers the Prelude.chr bug, but if Travis updated their Cabal, your builds would still succeed. (Albeit after reconfiguring unnecessarily.) The bug fix will also be available in GHC 7.10.1-rc3, so if/when Travis updates to that, you'll be good to go.

@23Skidoo
Member

@ttuegel

There is an unrelated issue that cabal-install sometimes reconfigures when it doesn't need to. In Cabal < 1.22.1.1, this also triggers the Prelude.chr bug

I don't think that the reconfigure logic is at fault. The Main.reconfigure function reconfigures only if necessary. buildAction unconditionally calls reconfigure, which tries to load setup-config with tryGetPersistBuildConfig, which raises the Prelude.chr exception.

@svenpanne

Just to clarify: The only missing thing to get Travis CI builds with 7.10 green again is a new version (1.22.1.1) of cabal-install in https://launchpad.net/~hvr/+archive/ubuntu/ghc, correct? GHC seems to be updated quite regularly there, but not cabal-install. Can we somehow arrange to get that soon? 7.10 is already at release candidate 3, and I haven't been able to get consistently green builds for that.

Given the heavy reliance of lots of packages on Travis CI (not only mine), we should probably have some automated PPA where the relevant packages (GHC, cabal-install, etc.) live.

@23Skidoo
Member

@svenpanne

The only missing thing to get Travis CI builds with 7.10 green again is a new version (1.22.1.1) of cabal-install in https://launchpad.net/~hvr/+archive/ubuntu/ghc, correct? GHC seems to be updated quite regularly there, but not cabal-install. Can we somehow arrange to get that soon?

That's up to @hvr, who maintains that PPA.

@svenpanne

@23Skidoo Well, I know that, and I think Herbert is really doing a great job, but he's probably overloaded with work, so he couldn't update the PPA. My proposal was to replace that "single point of failure" for all those Haskell projects using Travis CI with something automatic, i.e. have a PPA somewhere with GHC, cabal-install etc. on it.

@ttuegel
Member

ttuegel commented Mar 18, 2015

@23Skidoo

I don't think that the reconfigure logic is at fault. The Main.reconfigure function reconfigures only if necessary. buildAction unconditionally calls reconfigure, which tries to load setup-config with tryGetPersistBuildConfig, which raises the Prelude.chr exception.

I think you're right. I just checked, and the setup-config files created during the configure and build runs are byte-for-byte identical. I suspect a bug in binary or one of our instances.

@23Skidoo
Member

Looks like @hvr has updated his PPA or something; this problem no longer happens for cabal-install on Travis.
