Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work around GHC bug #12020 #192

Merged
merged 2 commits into from
Jun 14, 2016
Merged

Work around GHC bug #12020 #192

merged 2 commits into from
Jun 14, 2016

Conversation

enolan
Copy link
Contributor

@enolan enolan commented Jun 2, 2016

This PR tries to work around GHC bug #12010, where calls to readRawBufferPtr and writeRawBufferPtr fail to throw exceptions on Windows 64 bit. I wrote tests, and patched recvBuf and sendBuf to make them pass. Unfortunately, I can still get a segfault on my outside test with artificial network corruption some of the time - somewhere less than 1.5%. The test program downloads a small package from FP Complete's Hackage mirror over TLS. If anyone can explain it, I'd be grateful. If not, this PR makes the crash much more rare anyway.

To reproduce the crash:

  • be on a Windows 64 bit machine with Stack
  • git clone -b network-pr https://github.com/enolan/http-conduit-crash-hang.git
  • git clone https://github.com/enolan/findcrash.git
  • stack build both of those
  • get Clumsy and set it to tamper at 20%
  • run findcrash with the location of bad-memcpy-crash from the http-conduit-crash-hang repo as its argument and wait a bit. It'll run the program until it crashes or 200 times, whichever comes first.

You should get the Windows dialog box for a segfault: "bad-memcpy-crash.exe has stopped working".

Here's what I know:

  • Unlike the original bug, it doesn't follow a failed call to recv or send. It's either a new bug I introduced when I fixed the first one, or was always there.
  • It doesn't happen inside of a Windows library function
  • Turning on RTS sanity checking doesn't help
  • It happens regardless of whether the threaded RTS is on or not
  • It's rarer when the threaded RTS is off
  • It seems to happen between calls to recv, not during them. I've only seen it happen after TLS handshaking was done, but that may be chance.

I hope a fresh set of eyes sees something. Maybe in the refactoring.

Best,
Echo

@hreinhardt
Copy link
Contributor

I was able to reproduce the crash on GHC 7.10.3 but not on GHC 8.0.1. Would be interesting to see if you can reproduce it on GHC 8.0.1

Anyway, I did some debugging by compiling all libraries with debug information, using the following command-line:

cabal install --disable-library-stripping --ghc-options="-g -rtsopts" --only-dependencies

I then ran the crashing application inside gdb which gave me the following info:

Program received signal SIGSEGV, Segmentation fault.
0x000000000064152a in s3xB_info () at .DataASN1Get.hs:101

Which seems to point to this source-location: https://github.com/vincenthz/hs-asn1/blob/master/data/Data/ASN1/Get.hs#L101

I also looked up s3xB_info in the output of objdump -Wi but it didn't lead to further insight.

I'd guess that the bug is in the file I linked to, but the line number is probably wrong. I'm not sure if it's really worth investigating further if the bug doesn't occur in GHC 8.0.1.

@hreinhardt
Copy link
Contributor

I tracked down the bug and reported it: vincenthz/hs-asn1#17

@eborden eborden added the Windows label Jun 3, 2016
@eborden
Copy link
Collaborator

eborden commented Jun 3, 2016

So just to confirm: This PR does not provide a complete workaround, but prevents the common case?

@enolan
Copy link
Contributor Author

enolan commented Jun 3, 2016

It should work around #12010 completely. If Holger is right, then the crash
I get after applying it is unrelated. I'll investigate the hs-asn1 bug and
see if I can confirm.

On Fri, Jun 3, 2016, 8:15 AM Evan Borden [email protected] wrote:

So just to confirm: This PR does not provide a complete workaround, but
prevents the common case?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#192 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AADwTVM5_OFBoV1mD_WlR-ScMCqy-cSuks5qIEUZgaJpZM4ItEqg
.

@eborden
Copy link
Collaborator

eborden commented Jun 5, 2016

@enolan My only concern with this PR is that there is a fair amount of re-factoring that is intermixed with your work-around. I'm in favour of the de-duplication and clean up that you've done, but I'd like to see it migrated to another PR that we can merge before this one. That way your work-around is isolated in a single commit without ambiguity.

@enolan
Copy link
Contributor Author

enolan commented Jun 6, 2016

@eborden I created a separate PR for the refactoring bits - #200. I didn't rebase this branch to not include them though. Since I only added the second check to sendBuf and recvBuf, the new tests won't pass without the refactoring.

@eborden eborden mentioned this pull request Jun 6, 2016
@enolan
Copy link
Contributor Author

enolan commented Jun 7, 2016

Pulling off DRAFT and requesting merge. With the bug in hs-asn1 fixed I don't get crashes anymore. Thanks to @hreinhardt and @vincenthz for that.

@enolan enolan changed the title Work around GHC bug #12020 - DRAFT Work around GHC bug #12020 Jun 7, 2016
@eborden
Copy link
Collaborator

eborden commented Jun 8, 2016

@enolan Can you bump this PR to get the new windows CI to run?

@kazu-yamamoto Do you have any strong opinions here?

enolan added 2 commits June 7, 2016 21:55
Due to GHC bug #12010, on x86_64 Windows, in the threaded RTS, the return values
of recv and send aren't checked properly in readRawBufferPtr or
writeRawBufferPtr. Calls that return -1 are interpreted as returning
4,294,967,295 which is then converted back to a 32 bit int equal to -1. This
occurs on all release versions of GHC.

This patch adds tests that run obviously wrong operations and checks that they
throw exceptions.
Makes the tests in the previous commit pass.
@enolan
Copy link
Contributor Author

enolan commented Jun 8, 2016

Passing :)

@kazu-yamamoto
Copy link
Collaborator

Sorry but I have no opinions about Windows.

@eborden
Copy link
Collaborator

eborden commented Jun 14, 2016

Okay, with no objections I'm merging this. Thanks @enolan.

@eborden eborden merged commit a34b47f into haskell:master Jun 14, 2016
@enolan
Copy link
Contributor Author

enolan commented Jun 15, 2016

Thanks @eborden !

@enolan enolan mentioned this pull request Jul 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants