-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lazy dropEnd and friends #395
Conversation
Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I have a couple of suggestions.
Thanks for the help! I used Would explicitly implementing the folds help? eg; dropEnd i cs0 = cs
where
(_,cs) = go i cs0
go :: Int64 -> ByteString -> (Int64, ByteString)
-- n keeps track of dropped bytes
go n cs = ... Hope this makes sense 😄 |
Sorry, @3kyro, I'm on the go and won't have time to review until two weeks from now.
Good question. Could you possibly check it in REPL with something like |
No problem. I'll check lazy behavior and add some checks as well as fixing ghc-8,2 CI failures. |
Hi @Bodigrim, I'm not really sure I can create a version of these end manipulation functions that is sufficiently lazy. Based on the discussion in #306, we evaluate all chunks when doing Regarding style, I'm not sure if you'd prefer a more explicit implementation, similar to other functions in the module ( dropEnd :: Int64 -> ByteString -> ByteString
dropEnd i p | i <= 0 = p
dropEnd i cs0 = cs0'
where (_ , cs0') = dropEnd' (fromIntegral i) cs0
dropEnd' n cs | n <= 0 = (0, cs)
dropEnd' n Empty = (n, Empty)
dropEnd' n (Chunk c Empty) =
(n - S.length c, S.fromStrict $ S.dropEnd n c)
dropEnd' n (Chunk c cs) =
let (n',cs') = dropEnd' n cs
in (n' - S.length c, S.fromStrict (S.dropEnd n' c) `append` cs') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, @3kyro, I got distracted by other things for longer than planned. Hope it was not too long.
- Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test
@3kyro I'm afraid this is not quite correct, as you can see from CI. Try
|
Hi @Bodigrim, sorry for the delay but I was away for some days. I have corrected the behavior of |
AFAIK adding a dependency on |
I think this PR does not justify an additional dependency, even |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@3kyro, nice! I think we are almost there, please take a look at suggestions.
Data/ByteString/Lazy.hs
Outdated
dropEnd :: Int64 -> ByteString -> ByteString | ||
dropEnd i p | i <= 0 = p | ||
dropEnd i p = go [] [] 0 0 p | ||
where go hss tss acc h (Chunk c cs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment, explaining meaning of arguments? AFAIU hss
and tss
model a bidirectional queue (aka deque), acc
tracks total length of hss
+ tss
, and h
tracks the length of head hss
.
Computing length of head hss
is O(1) operation, so I think we can simplify a bit by not passing h
argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the need for the head
of hss
as its length is already included in the accumulated length. Furthermore, in a Deque, calculating the length of itshead
might trigger a re-balancing which I think would not be ideal (e.g. it would always re balance on the second chunk). This seems a bit cleaner, although we might trigger getOutput
a bit more often than before.
Data/ByteString/Lazy.hs
Outdated
len c = fromIntegral (S.length c) | ||
hLen cs = maybe 0 len (listToMaybe cs) | ||
|
||
getOutput out [] [] acc = (out, [], [], acc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, it would be nice to document meaning of arguments and return values.
What I can suggest is creating data Deque = Deque { hss :: [S.ByteString], tss :: [S.ByteString], acc :: Int64 }
(bonus points for better fields names). Then go
becomes go :: Deque -> ByteString -> ByteString
, and getOutput :: [S.ByteString] -> Deque -> ([S.ByteString], Deque)
. This reduces cognitive load for a reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe now with the use of the Deque, arguments are quite easy to follow. I'd happily add comments though if necessary
Data/ByteString/Lazy.hs
Outdated
getOutput out [] [] acc = (out, [], [], acc) | ||
getOutput out [] bss acc = getOutput out (L.reverse bss) [] acc | ||
getOutput out (x:xs) bss acc = | ||
if len x <= acc - i - len x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks suspicious, there are len x
on both sides. Wouldn't len x <= acc - i
be enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its redundant
Hi @Bodigrim, I've added an unexposed |
`dropEnd` now uses `Deque` for handling the accumulated bytestrigs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
-- O(1) , occasionally O(n) | ||
popFront :: Deque -> (Maybe S.ByteString, Deque) | ||
popFront (Deque [] [] _) = (Nothing, empty) | ||
popFront (Deque [] rs acc) = popFront (Deque (reverse rs) [] acc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such pattern is appealing, but GHC is likely to conclude that there is an unbounded general recursion going on and will resist to inline and optimize and such. I'd rewrite it as
popFront (Deque [] rs acc) = case reverse rs of
[] -> ...
x : xs -> ...
-- Pop a `S.ByteString` from the front of the `Deque` | ||
-- Returns the bytestring, or Nothing if the Deque is empty, and the updated Deque | ||
-- O(1) , occasionally O(n) | ||
popFront :: Deque -> (Maybe S.ByteString, Deque) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
popFront :: Deque -> (Maybe S.ByteString, Deque) | |
popFront :: Deque -> Maybe (S.ByteString, Deque) |
This is a bit more precise: if it's Nothing
, there is little point to return a "new" Deque
.
Data/ByteString/Lazy.hs
Outdated
else (out, deque) | ||
|
||
-- drop n elements from the rear of the accumulating `deque` | ||
dropElements deque n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add type signatures for go
, getOutput
, dropElements
and fromDeque
?
Data/ByteString/Lazy.hs
Outdated
case D.popFront deque of | ||
(Nothing, deque') -> (out, deque') | ||
(Just x, deque') -> | ||
if len x <= D.elemLength deque' - i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deque'
is deque
without x
, so I'd expect just if D.elemLength deque' >= i then ...
Data/ByteString/Lazy.hs
Outdated
|
||
-- drop n elements from the rear of the accumulating `deque` | ||
dropElements deque n | ||
| D.null deque = Empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you do not need to handle this case specially: if deque
is empty, then D.popRear
returns Nothing
and Empty
naturally arises below.
Plus all other review suggestions
Latest review changes added |
Data/ByteString/Lazy.hs
Outdated
|
||
-- get all `S.ByteString` from the front of the accumulating deque | ||
-- for which we know they won't be dropped | ||
getOutput :: [S.ByteString] -> D.Deque -> ([S.ByteString], D.Deque) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of operating on lists of strict bytestrings, it seems convenient to simply use the lazy bytestring type. Is there a reason not to do that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, seems nicer. The only problem is that there is no lazy left fold for chunks, and so we need to first reverse chunks and fold from the right. Might be even better to pay traveral than have the space leaked by the foldl?
Data/ByteString/Lazy.hs
Outdated
| D.elemLength deque < i = go (D.snoc c deque) cs | ||
| otherwise = | ||
let (output, deque') = getOutput [] (D.snoc c deque) | ||
in L.foldl (flip chunk) (go deque' cs) output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for using a lazy foldl
here instead of foldl'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lazy fold here is necessary so that we don't force go deque
cs`. However this has now been changed (see above review comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress. I'll probably be AFK until Saturday. I can review more then.
Hi, I'm sorry, I started replying before pushing the PR. Thanks for the review edit: I've rebased as my previous commits were not clean. There is an additional change included, I'm returning a Sorry for the confusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
Since the implementation has turned out to be so complex, I'm relying on the tests to catch any bugs.
-- | Returns the longest (possibly empty) suffix of elements | ||
-- satisfying the predicate. | ||
-- | ||
-- @'takeWhileEnd' p@ is equivalent to @'reverse' . 'takeWhile' p . 'reverse'@. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example would be nice to have here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used Chunk (pack [1,2]) (Chunk (pack [3,4,6])) Empty
as an example of a lazy bytestring. Hope it's not too verbose, but textual representation of a bytestring is always tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would be helpful for users who will mostly not be aware of ByteString
's internal constructors. If you want to represent the bytes as numbers, you can use the OverloadedLists
syntax, e.g. [1,2,3,4,6]
.
Co-authored-by: Simon Jakobi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more wibble – LGTM apart from that. :)
-- | Returns the longest (possibly empty) suffix of elements | ||
-- satisfying the predicate. | ||
-- | ||
-- @'takeWhileEnd' p@ is equivalent to @'reverse' . 'takeWhile' p . 'reverse'@. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that would be helpful for users who will mostly not be aware of ByteString
's internal constructors. If you want to represent the bytes as numbers, you can use the OverloadedLists
syntax, e.g. [1,2,3,4,6]
.
@3kyro please check CI failure. |
Hi @sjakobi, I've updated the examples using Thanks again for the review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job!
Thanks @3kyro! |
* Add lazy dropEnd and friends Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <[email protected]> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <[email protected]>
* Add lazy dropEnd and friends Fixes haskell#306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <[email protected]> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <[email protected]>
Fixes #306.
Utilizes an eager strategy as described in the original issue. Chunks
are eagerly evaluated but the overall structure and pointers are kept
intact.