break failing rewrite to breakByte and failing to eliminate boxing/unboxing #70
Comments
I've spent some time digging into this and it seems the issue may be that the worker/wrapper transformation doesn't manage to unbox a value that occurs inside an unboxed tuple. Specifically, in this case, the boxing and unboxing associated with the offset computed by findIndexOrEnd. To see if this was possibly right I wrote a version of findIndexOrEnd in terms of the unboxed primitives:

    findIndexOrEnd :: (Word8 -> Bool) -> ByteString -> Int
    findIndexOrEnd k (PS (ForeignPtr a# c) s (I# l#)) =
        case go a# l# 0# realWorld# of
          (# s#, r #) -> case touch# c s# of
            s# -> I# r
      where
        go :: Addr# -> Int# -> Int# -> State# RealWorld -> (# State# RealWorld, Int# #)
        go a# l# n# s# = case tagToEnum# (n# >=# l#) :: Bool of
          True -> (# s#, l# #)
          False -> case readWord8OffAddr# a# n# s# of
            (# s#, w #) ->
              case k (W8# w) of
                True -> (# s#, n# #)
                False -> go a# l# (n# +# 1#) s#

Compiling with [...] where optimized has

    go :: (Word8 -> Bool) -> Addr# -> Int# -> State# RealWorld -> (# State# RealWorld, Int# #)
    go k a# n# s# = case tagToEnum# (n# <=# 0#) :: Bool of
      True -> (# s#, n# #)
      False -> case readWord8OffAddr# a# 0# s# of
        (# s#, w #) ->
          case k (W8# w) of
            True -> (# s#, n# #)
            False -> go k (plusAddr# a# 1#) (n# -# 1#) s#

Assuming I'm correct that the underlying issue here is that the compiler won't eliminate boxing and unboxing under the unboxed tuple, I wonder if it would be possible to extend the worker/wrapper system to do so. Possibly there is more code like this that could benefit from this sort of optimization?
Detailed writeup, thanks! I don't have a whole lot of time at the moment to dig into this. I'd really appreciate it if anyone else can uncover anything and suggest specific solutions (better, PRs).
I filed a bug in GHC Trac: https://ghc.haskell.org/trac/ghc/ticket/11688
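I suspect the issue here is that [...]

    import qualified Data.ByteString as DB
    import Data.ByteString.Internal (c2w)
    import System.IO (stdin, stdout)

    main :: IO ()
    main = do
      chunk <- DB.hGetSome stdin 32768
      let newline = c2w '\n'
      let (prefix,suffix) = DB.break (eq newline) chunk
      DB.hPut stdout prefix
      DB.hPut stdout suffix

    -- Wrapper around (==) whose inlining is delayed until phase 1,
    -- so the rule below can still see it.
    eq :: Eq a => a -> a -> Bool
    eq = (==)
    {-# INLINE [1] eq #-}

    {-# RULES
    "ByteString specialise break (eq x)" forall x.
        DB.break (eq x) = DB.breakByte x
      #-}

Actually, even if one doesn't specify an explicit phase on [...]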
Previously these were matching on (==), which was rewritten by the class op rule before the breakByte rule had an opportunity to fire (haskell#70). Unfortunately fixing this requires that we change the Eq instances provided by GHC. This has been done in GHC 8.0.1 (base-4.9.0).
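
The interaction described above can be tried in isolation. The following is only a minimal sketch over lists, not the bytestring code: breakList and breakByteList are hypothetical stand-ins for break and breakByte, and the rule name is made up. The rule matches on a wrapper eq whose inlining is delayed, rather than on the class method (==) directly.

```haskell
import Data.Word (Word8)

-- Hypothetical stand-in for break. NOINLINE keeps the call visible
-- to the simplifier so the rule below has something to match on.
breakList :: (Word8 -> Bool) -> [Word8] -> ([Word8], [Word8])
breakList = break
{-# NOINLINE breakList #-}

-- Hypothetical stand-in for breakByte: split at the first occurrence of w.
breakByteList :: Word8 -> [Word8] -> ([Word8], [Word8])
breakByteList w = span (/= w)
{-# NOINLINE breakByteList #-}

-- Wrapper around (==). It is not inlined before phase 1, so a rule
-- that is active in the earlier phases can still match on it.
eq :: Eq a => a -> a -> Bool
eq = (==)
{-# INLINE [1] eq #-}

{-# RULES
"breakList/eq" forall x.
    breakList (eq x) = breakByteList x
  #-}

-- With -O the rule is expected (not guaranteed) to rewrite this call.
splitOnNewline :: [Word8] -> ([Word8], [Word8])
splitOnNewline = breakList (eq 10)
```

Both sides of the rule compute the same result, so the program behaves identically whether or not the rule fires. Whether it fired can be checked with -ddump-rule-firings.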
Thanks @alexbiehl for opening the GHC ticket and @bgamari for figuring out the rewrite issue. This is just a quick note that, after writing this up and continuing to play with it, I eventually discovered GHC Trac ticket 2289 and friends (e.g., 1600, 2387, etc.). It seems the inability of the compiler to unbox multiple return values is a well-known issue going back many years. It's particularly annoying in this case though, as there is really only one return value. The other is just the IO state token, which we know the compiler will eventually throw away anyway. Until someone comes up with a compiler fix I don't think there is anything that can be done other than maybe using continuation passing to turn returning into calling when possible.
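
To illustrate the continuation-passing idea, here is a rough sketch. It is a pure version written against the public Data.ByteString API, not the library's internal IO-based loop, and the names findIndexOrEndK and breakK are hypothetical. Instead of returning the offset, the loop calls a continuation with it. Whether GHC actually avoids the box depends on the continuation being inlined, so this shows the shape of the workaround rather than a verified fix.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)

-- Hypothetical CPS variant: rather than returning the index of the
-- first byte satisfying p (or the length if there is none), pass it
-- to the continuation k.
findIndexOrEndK :: (Word8 -> Bool) -> B.ByteString -> (Int -> r) -> r
findIndexOrEndK p bs k = go 0
  where
    !len = B.length bs
    go !n
      | n >= len                = k len
      | p (BU.unsafeIndex bs n) = k n      -- n < len, so the index is in bounds
      | otherwise               = go (n + 1)
{-# INLINE findIndexOrEndK #-}

-- A break built on top of it: the split happens inside the continuation,
-- so the offset never has to be returned from the loop.
breakK :: (Word8 -> Bool) -> B.ByteString -> (B.ByteString, B.ByteString)
breakK p bs =
  findIndexOrEndK p bs (\n -> (BU.unsafeTake n bs, BU.unsafeDrop n bs))
```

The result should agree with B.break for any predicate; only the way the offset is handed over differs.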
Was this issue fixed by #71 or is there more to do here?
There were two issues here: break not being rewritten to breakByte, and the boxing/unboxing not being eliminated.
I think we can close this ticket though, as the first looks like it was addressed by #71 and the second is a well-known compiler issue that has its own ticket open.
The following simple program (sucks in stdin in chunks of 32K and splits on newlines) demonstrates (I believe) some issues with the break routine on GHC 7.10.3, 7.8.4, 7.6.3, 7.4.2, 7.2.2, and 7.0.4.

1. It fails to rewrite break (== c2w '\n') into breakByte (c2w '\n'). This is confirmed by -ddump-simpl or by replacing break with breakByte and observing the performance difference.
2. It fails to eliminate the boxing/unboxing around the findIndexOrEnd call in the non-breakByte implementation. This is confirmed by -ddump-simpl or by viewing the memory allocation stats on a run over a large amount of input.

Note that the allocations from this boxing/unboxing can be quite significant. As an example, I have around 10.7GB of input data that my original code runs on. When I split it on '\n', the system indicates the non-breakByte version does an additional 3GB of allocations over the 10.8GB used by the breakByte version (which doesn't have the boxing/unboxing issues). Splitting on the more frequently occurring ':' pushes it up to 34GB of additional allocations (yes, 34GB, which is why I noticed it).

For completeness, here is the -O2 -ddump-simpl -dsuppress-all -dno-suppress-idinfo output of the key loop for GHC 7.10.3 (ByteString 0.10.6.0). You can see it has failed to use the breakByte implementation. It has also failed to eliminate the boxing of the offset that findIndexOrEnd ($wa1_s3lI) returns (I# ww5_s3lG or l_a1r7 = I# ww3_s3lR) in the non-breakByte implementation.

Thanks! -Tyson
PS: I don't see how this boxing/unboxing survived. Both I# ww5_s3lG and l_a1r7 = I# ww3_s3lR clearly cannot be bottom, and the $wa1_s3lI call site immediately turns around and unboxes the returned value. I also don't see why all the DmdType values are <L,U> as the functions clearly force many of their arguments to at least WHNF regardless of the path taken through them.
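
For anyone wanting to reproduce the measurements, a harness along these lines may help. It is only a sketch in the spirit of the program described above, not the original: countPieces and countHandle are hypothetical names, and it counts the newline-delimited pieces instead of writing them out. It reads 32K chunks and splits each one repeatedly with break.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import System.IO (Handle)

-- Count the newline-delimited pieces in one chunk by repeatedly
-- calling B.break, which exercises the code path under discussion.
countPieces :: B.ByteString -> Int
countPieces = go 0
  where
    go !acc chunk
      | B.null chunk = acc
      | otherwise =
          let (_, rest) = B.break (== 10) chunk   -- 10 is '\n'
          in go (acc + 1) (B.drop 1 rest)         -- skip the newline itself

-- Read the handle in 32K chunks until EOF and sum the per-chunk counts.
-- Pieces that straddle a chunk boundary are counted once per chunk.
countHandle :: Handle -> IO Int
countHandle h = loop 0
  where
    loop !total = do
      chunk <- B.hGetSome h 32768
      if B.null chunk
        then pure total
        else loop (total + countPieces chunk)
```

A driver such as main = countHandle stdin >>= print (with stdin imported from System.IO), built with -O2 -rtsopts and run with +RTS -s, reports total allocation, and adding -ddump-simpl -dsuppress-all at compile time shows whether the offset is still boxed in the inner loop.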