Straighten folds and scans. #364
Conversation
I am going to keep this pull request updated so that fellow developers may be apprised of my progress. Feel free to comment!
(Windows test is flaky, do not pay attention to it; I'll rerun.)
@Bodigrim I noticed something suspicious. To get a feel for the ballpark my benchmarks should be in, I added a benchmark for strict and lazy […] Do you by chance have an explanation?
Yes, it is a known quirk of benchmarks. See #345, #329, #23. Note that the benchmarks have the form

[snippet: lines 501 to 502 at d52d42d]

bytestring/Data/ByteString/Lazy.hs
[snippet: lines 447 to 448 at d52d42d]

The application of the strict […] I think that the real-world consequences of this are not as severe as one may expect from looking at the benchmarks: I expect that in the majority of client code […]
(force-pushed from 364fb3c to 280b2ea)
Curious, I would never have thought some inlining could give a tenfold performance boost. @Bodigrim I added some strictness checks, and it looks to me as though the functions I defined behave the way their names suggest. (It is all quite intricate, so a second look would be good.) However, I found out that I do not really know whether it is appropriate to call them right folds. A lazy byte string is essentially a list of lists, and there is not really a textbook definition of a left or right fold over that sort of thing… Does this look like a right fold to you?
I would like to make sure this piece of code is up to our quality standards before moving on to other functions, so your review would be appreciated!
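For reference, one plausible reading of "a right fold over a list of lists" is to fold the chunk spine from the right and, inside each chunk, fold the bytes from the right as well, threading the accumulator through. A minimal sketch under that reading (the name foldrNested is made up here; it is not the PR's actual definition):

import qualified Data.ByteString as S
import Data.ByteString.Lazy.Internal (ByteString (..))
import Data.Word (Word8)

-- Fold the chunk spine from the right; each strict chunk is folded from the
-- right with the strict foldr, receiving the fold of the remaining chunks.
foldrNested :: (Word8 -> a -> a) -> a -> ByteString -> a
foldrNested f z = go
  where
    go Empty        = z
    go (Chunk c cs) = S.foldr f (go cs) c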
(force-pushed from e3a08d8 to 955990f)
We have comments like this:

[…]

I suppose it would be good if I could claim something similar for the functions I add, like […] How do we know that a given function will fuse? What does this even mean? What are the laws? Where is it documented and checked?
I think this is a good place to take a break and get things merged. Property checks tell us that the behaviour of the new functions is identical to that of their strict analogues.
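For a flavour of what such a property check looks like, here is a sketch (illustrative only; the property name is made up and the actual test suite phrases these differently): the lazy function applied to packed input must agree with its strict analogue.

import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)
import Test.Tasty.QuickCheck (Property, (===))

-- Hypothetical property: the lazy scanl agrees with the strict scanl.
prop_scanlAgreesWithStrict :: Word8 -> [Word8] -> Property
prop_scanlAgreesWithStrict z ws =
  L.scanl (+) z (L.pack ws) === L.fromStrict (S.scanl (+) z (S.pack ws))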
Thanks for noticing this. The thing is, the comment is 15 years old. 13 years ago, the fusion framework in […]
I opened an issue #374 to track this.
(force-pushed from d7097a4 to b3ac740)
Data/ByteString/Lazy.hs (outdated)

       -- ^ input of length n
    -> ByteString
       -- ^ output of length n+1
scanr f z = pack . fmap (foldr f z) . tails
This is a bit unfortunate, because there is no sharing between `tails`, so you end up folding each tail with `f` independently, O(n^2) operations in total.
Yes, I would like to do something about it. I need to meditate on this. Can I maybe reverse the thing and then fold it from the start? It is forced all along anyway.
I am not sure how to think about this sort of thing.
I have not given it much thought, but maybe we can start from a high-level picture? Let's write lazy

mapAccumLChunks :: (acc -> Strict.ByteString -> (acc, Strict.ByteString)) -> acc -> Lazy.ByteString -> (acc, ByteString)
mapAccumRChunks :: (acc -> Strict.ByteString -> (acc, Strict.ByteString)) -> acc -> Lazy.ByteString -> (acc, ByteString)

Then reuse them in the definitions of `mapAccum{L,R}` and `scan{l,r}` as well (scans are just a special case of `mapAccum`).
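A minimal sketch of what the left-to-right chunk-level helper could look like, assuming the Empty/Chunk constructors and the chunk smart constructor from Data.ByteString.Lazy.Internal (illustrative only, not the merged code):

import qualified Data.ByteString as S
import Data.ByteString.Lazy.Internal (ByteString (..), chunk)

mapAccumLChunks :: (acc -> S.ByteString -> (acc, S.ByteString)) -> acc -> ByteString -> (acc, ByteString)
mapAccumLChunks f = go
  where
    go acc Empty        = (acc, Empty)
    go acc (Chunk c cs) =
      let (acc',  c')  = f acc c     -- transform the current chunk
          (acc'', cs') = go acc' cs  -- the rest is only forced on demand
      in  (acc'', chunk c' cs')

The byte-level mapAccumL and scanl could then feed the strict Data.ByteString.mapAccumL through such a helper, one chunk at a time.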
My understanding is that, since we output a byte string about as long as the input, time and space complexity are bounded below by O(n). This means, to my mind, that reversing and then using `scanl` is as efficient as it gets. Is there anything I am getting wrong? What is the mark I am aiming at?
Also, should I write some benchmarks?
I think we can do a bit better than reversing the entire input: one pass to reverse the order of the chunks and determine the total length, then a second pass over the reversed sequence of chunks to write the output.
Benchmarks would be useful to check that we're roughly in the right ballpark.
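For concreteness, the first of those two passes could look roughly like this (a sketch only; the helper name is invented and BangPatterns is assumed):

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as S
import Data.ByteString.Lazy.Internal (ByteString (..))

-- Pass 1: reverse the chunk order and total up the length in one traversal.
-- Pass 2 (not shown) would allocate the output and fill it while walking
-- the reversed chunk list.
reverseChunksWithLength :: ByteString -> (Int, [S.ByteString])
reverseChunksWithLength = go 0 []
  where
    go !n acc Empty        = (n, acc)
    go !n acc (Chunk c cs) = go (n + S.length c) (c : acc) cs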
Alright, I get it like this:
- We agree that linear time and space are lower bounds.
- We still want to avoid some expensive operations, like reversing all chunks.
Sounds about right?
Yep, sounds right to me! :)
I ran some benchmarks and performance is orders of magnitude behind expectations. I am now angling to follow the advice above and write everything in terms of […]
(force-pushed from 812eb96 to ea54f76)
How about this? It turns out we can use standard recursion schemes. I am not sure if there would be some performance drop on older compilers, but I did not notice any in my usual setting.
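Assuming "standard recursion schemes" refers to the chunk folds that Data.ByteString.Lazy.Internal already exports (foldrChunks and foldlChunks), the lazy folds collapse to one-liners, roughly as below (a sketch; the top-level names are illustrative):

import qualified Data.ByteString as S
import Data.ByteString.Lazy.Internal (ByteString, foldrChunks, foldlChunks)
import Data.Word (Word8)

-- Right fold: fold the chunks from the right, each chunk with the strict foldr.
foldrViaChunks :: (Word8 -> a -> a) -> a -> ByteString -> a
foldrViaChunks f = foldrChunks (\c rest -> S.foldr f rest c)

-- Left fold: fold the chunks from the left, each chunk with the strict foldl.
foldlViaChunks :: (a -> Word8 -> a) -> a -> ByteString -> a
foldlViaChunks f = foldlChunks (S.foldl f)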
(force-pushed from 6369ef9 to 890df8c)
Sorry I disappeared. I had been trying to add checks for other functions and it was hard, so I took a month-long creative break. It was fun, and now I am back with new ideas. I am going to do this:

[…]

So, our checks are going to check themselves, and also verify that our definitions of functions with different strictness are truly different.
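To illustrate the second point: one way to witness that two definitions genuinely differ in strictness is to feed them input with a diverging tail. A rough sketch (the PR's automated checks were more systematic than this):

import qualified Data.ByteString.Lazy as L

-- The lazy right fold may stop before the diverging tail is ever forced,
-- so this evaluates to True...
lazyFoldrSurvives :: Bool
lazyFoldrSurvives = L.foldr (\_ _ -> True) False (L.pack [1] <> error "never forced")
-- ...whereas the strict variant foldr' (one of the functions this PR adds)
-- forces the whole spine on the same input and therefore throws.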
@kindaro I'm kind of wary of large, long-running PRs and shifting goal-posts. PRs like this tend to attract merge conflicts and ultimately consume a disproportionate amount of developers' and reviewers' resources. Are there parts in this PR that could be merged already? Even if some parts are not "fully" ready, we could merge them and possibly consider not exposing them in the next release if we think they might not be safe for users yet.
Yes, we can throw out the automated strictness checks and trust our own judgement as to whether the functions in question have the right strictness properties. Then we can merge tomorrow. However, the way I see it, the whole point of automated strictness checks is to reduce the involvement of reviewers, so optimizing for merging sooner and optimizing for spending fewer reviewer resources are actually in conflict. (I am fine with spending more developer resources, since I am the only developer and I optimize for quality.) We can also merge the library code tomorrow, then merge the checks when they are ready and quickly fix the library code if any faults turn up. Whatever the maintainers say, I will do.
I'm happy to spend additional reviewer resources here, whenever you mark this PR as ready for review. While I deeply appreciate your efforts on automatic strictness checks, it feels like their volume and dependency footprint warrant a separate PR (and potentially a separate package).
(force-pushed from 890df8c to d0d708d)
Yo, I removed the strictness checks and responded to the pending comments from previous reviews.
Thanks! A couple of minor suggestions, but it looks good overall.
(force-pushed from 20292f4 to 5fa8aed)
    L.take (L.length xs + 1) (L.scanl (+) 0 (explosiveTail (xs <> L.singleton 1))) === (L.pack . fmap (L.foldr (+) 0) . L.inits) xs
, testProperty "scanl1 is lazy" $ \ xs -> L.length xs > 0 ==>
    L.take (L.length xs) (L.scanl1 (+) (explosiveTail (xs <> L.singleton 1))) === (L.pack . fmap (L.foldr1 (+)) . tail . L.inits) xs
]
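The helper explosiveTail is not shown in this excerpt; presumably it appends a diverging tail, so the property only holds if the scan never demands more input than it needs. A sketch of such a helper:

import qualified Data.ByteString.Lazy as L

-- Everything past the given prefix blows up when forced, so a sufficiently
-- lazy scan must never touch it.
explosiveTail :: L.ByteString -> L.ByteString
explosiveTail = (<> error "tail of this byte string must not be forced")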
I was wondering why `scanr` is less lazy than `scanl`. The thing is that its output starts from an accumulator, and `Data.ByteString.mapAccumR` is too strict in this respect.

Lines 734 to 743 in 05a09c3
    go src dst = mapAccumR_ acc (len-1)
      where
        mapAccumR_ !s (-1) = return s
        mapAccumR_ !s !n = do
            x <- peekByteOff src n
            let (s', y) = f s x
            pokeByteOff dst n y
            mapAccumR_ s' (n-1)
acc' <- unsafeWithForeignPtr gp (go a)
return (acc', BS gp len)
I think this is fine: there are no particular expectations about the strictness of `scanr` (there is no `scanr'` in the Prelude).
Not sure I follow. Is there a specific proposition you are reasoning towards? Or a question I may answer? For example:

Proposition. `Data.ByteString.Lazy.scanr` cannot be lazy.

Proof. As you noted, the output of a `scanr` starts from the end, so this is the sort of laziness we can have:
λ take 2 . reverse $ Prelude.scanr (+) 0 [undefined, 1]
[0,1]
So, first the spine of the input list is evaluated to the end, then the elements are evaluated from the end backwards. (Whether the accumulator is evaluated before or after the first element depends on the order of evaluation of `+`.) Similarly, the byte stream's spine would have to be evaluated first. But the spine of the byte stream is strict in the leaf:

bytestring/Data/ByteString/Lazy/Internal.hs
Lines 74 to 85 in 05a09c3
-- | A space-efficient representation of a 'Word8' vector, supporting many
-- efficient operations.
--
-- A lazy 'ByteString' contains 8-bit bytes, or by using the operations
-- from "Data.ByteString.Lazy.Char8" it can be interpreted as containing
-- 8-bit characters.
--
data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
  deriving (Typeable, TH.Lift)
-- See 'invariant' function later in this module for internal invariants.
The leaf itself is a byte array and therefore also strict throughout. So, once we force the spine, every byte is also forced. There is no lazy `scanr` for byte streams. ∎

Something like this?
This was just a remark for myself, @sjakobi, and anyone else who is puzzled why we have laziness properties for `scanl` but not for `scanr`.
It's not like you cannot make `Data.ByteString.Lazy.scanr` a bit lazier. E.g., for the proposed implementation:

> Data.ByteString.Lazy.head $ Data.ByteString.Lazy.scanr const 42 ("foo" <> undefined)
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
undefined, called at <interactive>:11:75 in interactive:Ghci1
However, if we are ready to sacrifice performance, one can define

scanr f z bs = cons hd tl
  where
    (_, tl) = mapAccumR (\x y -> (f y x, x)) z bs
    (hd, _) = List.mapAccumR (\x y -> (f y x, x)) z (unpack bs)
for which
> Data.ByteString.Lazy.head $ Data.ByteString.Lazy.scanr const 42 ("foo" <> undefined)
102
You can define an even lazier (and slower) version, capable of returning the first few chunks of the byte string, as long as `f` is very lazy (e.g., `f = const`).

My point is that this is a rare use case, which does not justify performance sacrifices, especially given that there is no general expectation of how lazy `scanr` should be. I'm fine with your implementation; no action required.
I'm happy to see that this PR is close to being merged now! :)
Turns out we do not really need it. We thought we need it to implement `scan[lr]`, but actually `mapAccum[LR]` is enough.
(force-pushed from 5e785a3 to 93df278)
Thank you, @kindaro! :)
Great stuff, @kindaro! Thanks!
* Add strict right folds.
* Add property checks.
* Add benchmarks.
* Inline strictness checks.
* Straighten scans.
* Fix whitespace.
* Use `===` for equality.
* Use infix operator for brevity.
* Add bench marks for lazy scans.
* Use standard recursion schemes.
* Dodge import conflicts on older GHC versions.
* Final considerations according to the last review.
* Final considerations according to one more last review.
* Add bench mark for lazy accumulating maps.
* Throw away `mapAccum[LR]Chunks`. Turns out we do not really need it. We thought we need it to implement `scan[lr]`, but actually `mapAccum[LR]` is enough.
Resolve #373.
Resolve #372.

* foldr'
* foldr1'
* scanl1
* scanr
* scanr1

Systematic strictness checks.