canonicalizePath hangs when used on a file in an sshfs share (threaded RTS only) #35

Closed
hamishmack opened this issue Sep 1, 2015 · 9 comments
Labels
type: x-duplicate-upstream This issue is caused entirely by an upstream issue.

Comments

@hamishmack

Tested on GHC 7.10.2 and directory 1.2.2.0 on OS X.

To reproduce this issue, first mount an sshfs share (I used a server on a fairly high-latency link).

sshfs server:haskell/Haxl haskell/Haxl

Check that the share is working, then compile something like this with -threaded and run it:

module Main (main) where

import System.Directory (canonicalizePath)

main :: IO ()
main = canonicalizePath "/Users/hamish/haskell/Haxl/haxl.cabal" >>= print

(where the file haxl.cabal is a file that exists on the server.)

The process hangs.

Changing

foreign import ccall unsafe "realpath" c_realpath

to

foreign import ccall safe "realpath" c_realpath

seems to make the problem go away.
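
For reference, a minimal sketch of a standalone wrapper around the safe import described above. The names `c_realpath_safe` and `realpathSafe` and the 4096-byte buffer size are assumptions made for illustration; this is not the code in the directory package.

```haskell
import Foreign.C.Error (throwErrnoIfNull)
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.Marshal.Alloc (allocaBytes)

-- Same C function as in the report, but imported as a safe call.
foreign import ccall safe "realpath"
  c_realpath_safe :: CString -> CString -> IO CString

-- | Resolve a path with realpath(3) through the safe import.
-- Hypothetical helper; the buffer size is an assumed upper bound on PATH_MAX.
realpathSafe :: FilePath -> IO FilePath
realpathSafe path =
  withCString path $ \pIn ->
    allocaBytes 4096 $ \pOut -> do
      -- realpath returns NULL on failure and sets errno.
      _ <- throwErrnoIfNull "realpathSafe" (c_realpath_safe pIn pOut)
      peekCString pOut
```

A safe call costs a little more per invocation than an unsafe one, but it lets other Haskell threads keep running while the C function blocks.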

@Rufflewind
Member

I was not able to reproduce the problem on Linux with GHC 7.10.1. So it's a problem with SSHFS on Mac, with GHC 7.10.2, or with that specific SSHFS mount.

  • How reproducible is the bug? Does it happen 100% of the time? If you change it to safe does it always prevent the bug?
  • Does the bug affect other SSHFS mounts? Is it related to latency somehow?
  • Can you run it through dtruss?

I'm somewhat baffled as to how making the call safe seems to fix the problem, as I normally would expect it to be the other way around.

Here is what the code boils down to essentially:

import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (getFileSystemEncoding)
import System.Posix.Error (throwErrnoPathIfNull)
import Foreign (allocaBytes)
import Foreign.C (CString)
import System.FilePath (normalise)

-- Direct binding to realpath(3), using the unsafe import from the report.
foreign import ccall unsafe "realpath"
  c_realpath :: CString -> CString -> IO CString

main :: IO ()
main = do
  let fpath = "/Users/hamish/haskell/Haxl/haxl.cabal"
  enc <- getFileSystemEncoding
  GHC.withCString enc fpath $ \pInPath ->
    -- Output buffer for the resolved path.
    allocaBytes 1024 $ \pOutPath -> do
      _ <- throwErrnoPathIfNull "asdf" fpath (c_realpath pInPath pOutPath)
      path <- GHC.peekCString enc pOutPath
      print (normalise path)

@hamishmack
Author

Seems to happen all the time on OS X. I am using FUSE on OS X.

How reproducible is the bug? Does it happen 100% of the time? If you change it to safe does it always prevent the bug?

Yes. It is happening all the time with unsafe and never with safe.

Does the bug affect other SSHFS mounts? Is it related to latency somehow?

I think it might be the latency that is triggering it. The server is in Germany and I am in Vancouver (so probably about 200ms or something).

Can you run it through dtruss?

Log below.

I'm somewhat baffled as to how making the call safe seems to fix the problem, as I normally would expect it to be the other way around.

My understanding is that safe means call this the safe way and unsafe means call it an unsafe (but faster) way. That seems to be what it describes here.

getattrlist("/Users\0", 0x7FFF858451A4, 0x7FFF5F6854F0)      = 0 0
getattrlist("/Users/hamish\0", 0x7FFF858451A4, 0x7FFF5F6854F0)       = 0 0
getattrlist("/Users/hamish/haskell\0", 0x7FFF858451A4, 0x7FFF5F6854F0)       = 0 0
getattrlist("/Users/hamish/haskell/Haxl\0", 0x7FFF858451A4, 0x7FFF5F6854F0)      = 0 0
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x7FFF5F6854F0)      = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x7FFF5F6854F0)        = -1 Err#-1
write(0xB, "\377\0", 0x1)        = 1 0
sigreturn(0x7FFF5F683AD0, 0x1E, 0x1)         = 0 Err#-2
poll(0x100906D50, 0x2, 0xFFFFFFFFFFFFFFFF)       = 1 0
statfs64(0x10090E010, 0x7FFF5F684078, 0x1)       = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x1)         = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x1)       = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x1)         = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x1)       = -1 Err#-1
sigreturn(0x7FFF5F683AD0, 0x1E, 0x1)         = 0 Err#-2
statfs64(0x10090E010, 0x7FFF5F684078, 0x1)       = -1 Err#-1

@Rufflewind
Member

If you run it without -threaded, then I presume it does not hang?

I think it might be the latency that is triggering it.

Can you check by mounting a closer server?

@Rufflewind
Member

Thanks for the dtruss log! It was very informative.

It appears to be caused by GHC's periodic timer signals, which repeatedly interrupt the statfs64 call before it has a chance to finish. It's likely stuck in a retry loop in the realpath implementation. I suspect using safe turns the timer signals off, though I'd have to double-check.
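
To make the suspected mechanism concrete, here is a small pure model of a retry-if-interrupted loop. It is only a sketch: the names `attempt` and `retryLoop`, the millisecond figures, and the cap on attempts are all hypothetical and are not taken from the realpath implementation or from the GHC runtime.

```haskell
-- Pure model of a retry-if-interrupted loop; all numbers are illustrative.
data Outcome = Completed | Interrupted
  deriving (Eq, Show)

-- | One attempt at a call that needs latencyMs to finish while a timer
-- signal fires every tickMs. In this model the call completes only if it
-- is faster than the tick interval.
attempt :: Int -> Int -> Outcome
attempt tickMs latencyMs
  | latencyMs < tickMs = Completed
  | otherwise          = Interrupted

-- | Retry until the call completes. The cap (maxTries) exists only so the
-- model terminates; a real retry loop without a cap would spin forever.
-- Returns the number of attempts used, or Nothing if the cap was reached.
retryLoop :: Int -> Int -> Int -> Maybe Int
retryLoop maxTries tickMs latencyMs = go 1
  where
    go n
      | n > maxTries                          = Nothing
      | attempt tickMs latencyMs == Completed = Just n
      | otherwise                             = go (n + 1)
```

With a call that is faster than the tick, the loop finishes on the first attempt; with a call slower than the tick (as a high-latency mount might be), every attempt is interrupted and the loop never finishes on its own.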

Rufflewind added a commit to Rufflewind/directory that referenced this issue Sep 2, 2015
Although these functions never return EINTR, signals may still affect
these calls.  In particular, realpath on Mac OS X contains a
retry-if-interrupted loop on statfs64.  If the file system is slow
(e.g. SSHFS), the call will hang due to the barrage of periodic alarm
signals from the GHC runtime.  Marking the foreign import as safe
appears to silence the alarm signals.

We also do this for utimensat as a precautionary measure.  Given that
these are system calls that manipulate the file system, the performance
overhead should be negligible.

Fixes haskell#35.
@hamishmack
Author

Thanks.

I have another issue that might be related. After patching Leksah to use a fixed version of canonicalizePath, it now fails when trying to open the file. This time the non-threaded RTS also fails.

Small test case is:

import Control.Monad (void)
import qualified GHC.IO.FD as FD (openFile)
import GHC.IO.IOMode (IOMode(..))

main = void $ FD.openFile "/Users/hamish/haskell/Haxl/haxl.cabal" ReadMode True

Non-threaded RTS winds up like this...

51121/0x85059:  sigreturn(0x7FFF5F3055E0, 0x1E, 0x1B6)       = 0 Err#-2
51121/0x85059:  open("/Users/hamish/haskell/Haxl/haxl.cabal\0", 0x20004, 0x1B6)      = -1 Err#4
51121/0x85059:  sigreturn(0x7FFF5F3055E0, 0x1E, 0x1B6)       = 0 Err#-2
51121/0x85059:  open("/Users/hamish/haskell/Haxl/haxl.cabal\0", 0x20004, 0x1B6)      = -1 Err#4

Threaded RTS like this...

52296/0x87711:  open("/Users/hamish/haskell/Haxl/haxl.cabal\0", 0x20004, 0x1B6)      = -1 Err#4
52296/0x87711:  sigreturn(0x7FFF54C4C5C0, 0x1E, 0x1B6)       = 0 Err#-2
52296/0x87711:  __pthread_canceled(0x0, 0x1E, 0x1B6)         = -1 Err#22
52296/0x87711:  open("/Users/hamish/haskell/Haxl/haxl.cabal\0", 0x20004, 0x1B6)      = -1 Err#4
52296/0x87711:  sigreturn(0x7FFF54C4C5C0, 0x1E, 0x1B6)       = 0 Err#-2
52296/0x87711:  __pthread_canceled(0x0, 0x1E, 0x1B6)         = -1 Err#22

@hamishmack
Author

main = void $ FD.openFile "/Users/hamish/haskell/Haxl/haxl.cabal" ReadMode False

Seems to work for both threaded and non-threaded RTS.

@Rufflewind
Member

It would be useful to bring this up with haskell-libraries and/or ghc-devs. The heart of FD.openFile looks like this:

throwErrnoIfMinus1Retry "openFile"
                (if non_blocking then c_open      f oflags 0o666
                                 else c_safe_open f oflags 0o666)

So if non_blocking is True, you'd run into the same problem as you did for canonicalizePath.
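
The open calls in the dtruss output above keep failing with Err#4, which is EINTR. A hand-written sketch of the retry-on-EINTR pattern that a wrapper like throwErrnoIfMinus1Retry follows might look like this; `retryOnEINTR` is a hypothetical name and the code is not copied from base.

```haskell
import Foreign.C.Error (eINTR, getErrno, throwErrno)

-- | Run an action that signals failure by returning -1. If errno is EINTR,
-- run it again; for any other errno, throw an IOError tagged with loc.
-- Hypothetical helper written for illustration.
retryOnEINTR :: (Eq a, Num a) => String -> IO a -> IO a
retryOnEINTR loc act = do
  r <- act
  if r == -1
    then do
      errno <- getErrno
      if errno == eINTR
        then retryOnEINTR loc act  -- interrupted by a signal: try again
        else throwErrno loc        -- genuine failure: report it
    else return r
```

Nothing bounds the number of retries, so if every attempt is interrupted before the underlying call can finish, the loop does not terminate.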

@Rufflewind Rufflewind added blocked type: x-duplicate-upstream This issue is caused entirely by an upstream issue. labels Sep 5, 2015
@Rufflewind
Member

The issue was discussed on the mailing lists and it looks like using safe isn't guaranteed to always work. The defect has been reported upstream, so I'll leave this open until that gets fixed.

Rufflewind added a commit to Rufflewind/directory that referenced this issue Sep 5, 2015
Although these functions never return EINTR, signals may still affect
these calls.  In particular, realpath on Mac OS X contains a
retry-if-interrupted loop on statfs64.  If the file system is slow
(e.g. SSHFS), the call will hang due to the barrage of periodic alarm
signals from the GHC runtime.  Marking the foreign import as safe
appears to silence the alarm signals, although it is not guaranteed [1].
This is the best we can do until [2] gets fixed.

We also do this for utimensat as a precautionary measure.  Given that
these are system calls that manipulate the file system, the performance
overhead ought to be negligible.

This should alleviate haskell#35 for now.

[1] https://mail.haskell.org/pipermail/ghc-devs/2015-September/009770.html
[2] https://ghc.haskell.org/trac/ghc/ticket/10840
@Rufflewind Rufflewind added the type: a-bug The described behavior is not working as intended. label Sep 5, 2015
@Rufflewind Rufflewind removed the blocked label Dec 8, 2015
bgamari pushed a commit to bgamari/directory that referenced this issue Jul 29, 2016
Refactor and fix test for splitExtension(s)
@Rufflewind
Member

Fixed upstream: https://phabricator.haskell.org/D2796

@Rufflewind Rufflewind removed the type: a-bug The described behavior is not working as intended. label Dec 25, 2019