Add lazy dropEnd and friends #395

3kyro · 2021-05-26T09:10:05Z

Fixes #306.
Utilizes an eager strategy as described in the original issue. Chunks
are eagerly evaluated but the overall structure and pointers are kept
intact.

Bodigrim

Thanks! I have a couple of suggestions.

Data/ByteString/Lazy.hs

3kyro · 2021-05-27T14:29:05Z

Thanks for the help!

I used foldrChunks instead of length based checks, but I still have some concerns/questions: Will the tuple based folds work when used in a streaming manner, as discussed in a comment in the original issue? e.g. would take x . dropEnd y operate in constant space?

Would explicitly implementing the folds help? E.g.:

dropEnd i cs0 = cs 
  where 
    (_,cs) = go i cs0
    go :: Int64 -> ByteString -> (Int64, ByteString)
    -- n keeps track of dropped bytes    
    go n cs = ...

Hope this makes sense 😄

Bodigrim · 2021-05-27T19:49:52Z

Sorry, @3kyro, I'm on the go and won't have time to review until two weeks from now.

Will the tuple based folds work when used in a streaming manner, as discussed in a comment in the original issue?

Good question. Could you possibly check it in REPL with something like Chunk "foo" (Chunk "bar" undefined)? Bonus points for adding tests, you can find some inspiration here.
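A minimal sketch of the kind of REPL check suggested here, written as a small module. The names `partial` and `probe` are hypothetical helpers, not part of the library; the idea is that a lazy bytestring with an undefined tail only blows up if the function under test forces more chunks than it needs.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Lazy.Internal (ByteString (Chunk, Empty))

-- Two defined chunks followed by an undefined tail (hypothetical test value).
partial :: L.ByteString
partial = Chunk "foo" (Chunk "bar" undefined)

-- Apply a transformation and read only the first four bytes of its result.
-- If the transformation is lazy enough, the undefined tail is never forced.
probe :: (L.ByteString -> L.ByteString) -> L.ByteString
probe f = L.take 4 (f partial)

-- The identity transformation is trivially lazy: this evaluates to "foob".
sanityCheck :: L.ByteString
sanityCheck = probe id
```

Passing a function such as `L.dropEnd 1` to `probe` (on a bytestring version that provides it) would then show whether that function can stream: the result is either a value or an exception from the undefined tail.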

3kyro · 2021-05-28T06:44:05Z

No problem. I'll check the lazy behavior and add some checks, as well as fixing the ghc-8.2 CI failures.

3kyro · 2021-06-07T08:51:21Z

Hi @Bodigrim, I'm not really sure I can create a version of these end manipulation functions that is sufficiently lazy. Based on the discussion in #306, we evaluate all chunks when doing (take x . dropEnd y) cs. This seems necessary as cs needs to be fully evaluated to know if there's anything left to take.

Regarding style, I'm not sure if you'd prefer a more explicit implementation, similar to other functions in the module (drop, take etc). dropEnd would be something like:

dropEnd :: Int64 -> ByteString -> ByteString
dropEnd i p | i <= 0 = p
dropEnd i cs0 = cs0' 
  where (_ , cs0') = dropEnd' (fromIntegral i) cs0
        dropEnd' n cs | n <= 0 = (0, cs)
        dropEnd' n Empty = (n, Empty)
        dropEnd' n (Chunk c Empty) = 
            (n - S.length c, S.fromStrict $ S.dropEnd n c)
        dropEnd' n (Chunk c cs) = 
            let (n',cs') = dropEnd' n cs
              in (n' - S.length c, S.fromStrict (S.dropEnd n' c) `append` cs')

Bodigrim

Sorry, @3kyro, I got distracted by other things for longer than planned. Hope it was not too long.

Data/ByteString/Lazy.hs

- Pass Int64 to accumulators
- Clean up `breakEnd`
- Implement lazy version of `dropWhileEnd`
- Add dropWhileEnd lazy test

Data/ByteString/Lazy.hs

Bodigrim · 2021-07-01T18:07:53Z

@3kyro I'm afraid this is not quite correct, as you can see from CI. Try

cabal test --test-options '-p dropEnd --quickcheck-tests 1000'

3kyro · 2021-07-14T14:02:51Z

Hi @Bodigrim, sorry for the delay but I was away for some days.

I have corrected the behavior of dropEnd, trying not to reverse chunk lists too much. I believe I'm still within O(n) but I would really appreciate your feedback. Having a FIFO container would make the code much simpler, but I guess importing containers is out of the question.

sjakobi · 2021-07-14T18:44:43Z

Having a FIFO container would make the code much simpler, but I guess importing containers is out of the question.

AFAIK adding a dependency on containers (or maybe alternatively array) would probably be okay. We'd have to check how it affects GHC build times though.

Bodigrim · 2021-07-14T18:49:34Z

I think this PR does not justify an additional dependency, even containers. We do not really need Data.Sequence here, only a simple stack, which can be implemented in a small utility module (as two lists). But I guess reversing is not prohibitively expensive.
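For illustration, a two-list structure of the kind mentioned here could look roughly like the following. This is a generic sketch with hypothetical names (`Queue`, `push`, `pop`), not the code that ended up in the PR: items are pushed onto one list and popped from the other, and the pushed list is reversed only when the popping side runs dry, which gives amortized O(1) operations.

```haskell
-- Hypothetical two-list FIFO queue.
-- The first list is popped from; the second holds pushed items, newest first.
data Queue a = Queue [a] [a]

emptyQueue :: Queue a
emptyQueue = Queue [] []

-- O(1): add an item at the back.
push :: a -> Queue a -> Queue a
push x (Queue front rear) = Queue front (x : rear)

-- O(1) amortized: remove the oldest item, reversing the rear list
-- only when the front list is exhausted.
pop :: Queue a -> Maybe (a, Queue a)
pop (Queue (x : front) rear) = Just (x, Queue front rear)
pop (Queue [] rear) = case reverse rear of
  [] -> Nothing
  x : front -> Just (x, Queue front [])
```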

Bodigrim

@3kyro, nice! I think we are almost there, please take a look at suggestions.

Bodigrim · 2021-07-17T15:50:38Z

Data/ByteString/Lazy.hs

+dropEnd :: Int64 -> ByteString -> ByteString
+dropEnd i p | i <= 0 = p
+dropEnd i p = go [] [] 0 0 p
+  where go hss tss acc h (Chunk c cs)


Could you please add a comment explaining the meaning of the arguments? AFAIU hss and tss model a bidirectional queue (aka deque), acc tracks the total length of hss + tss, and h tracks the length of head hss.

Computing the length of head hss is an O(1) operation, so I think we can simplify a bit by not passing the h argument.

I have removed the need for the head of hss as its length is already included in the accumulated length. Furthermore, in a Deque, calculating the length of its head might trigger a rebalancing, which I think would not be ideal (e.g. it would always rebalance on the second chunk). This seems a bit cleaner, although we might trigger getOutput a bit more often than before.

Bodigrim · 2021-07-17T15:55:12Z

Data/ByteString/Lazy.hs

+        len c = fromIntegral (S.length c)
+        hLen cs = maybe 0 len (listToMaybe cs)
+
+        getOutput out [] [] acc = (out, [], [], acc)


Again, it would be nice to document the meaning of the arguments and return values.

What I can suggest is creating data Deque = Deque { hss :: [S.ByteString], tss :: [S.ByteString], acc :: Int64 } (bonus points for better field names). Then go becomes go :: Deque -> ByteString -> ByteString, and getOutput :: [S.ByteString] -> Deque -> ([S.ByteString], Deque). This reduces cognitive load for a reader.

I believe that now, with the use of the Deque, the arguments are quite easy to follow. I'd happily add comments though if necessary.

Bodigrim · 2021-07-17T16:00:09Z

Data/ByteString/Lazy.hs

+        getOutput out [] [] acc = (out, [], [], acc)
+        getOutput out [] bss acc = getOutput out (L.reverse bss) [] acc
+        getOutput out (x:xs) bss acc =
+            if len x <= acc - i - len x


Looks suspicious, there are len x on both sides. Wouldn't len x <= acc - i be enough?

Yes, it's redundant.

3kyro · 2021-07-19T13:35:59Z

Hi @Bodigrim, I've added an unexposed Data.ByteString.Lazy.Internal.Deque module and modified dropEnd accordingly. I have not prepared tests for Deque's functions as the module only tests the exposed API.

`dropEnd` now uses `Deque` for handling the accumulated bytestrigs

Bodigrim

Almost there!

Bodigrim · 2021-07-19T20:31:41Z

Data/ByteString/Lazy/Internal/Deque.hs

+-- O(1) , occasionally O(n)
+popFront :: Deque -> (Maybe S.ByteString, Deque)
+popFront (Deque [] [] _) = (Nothing, empty)
+popFront (Deque [] rs acc) = popFront (Deque (reverse rs) [] acc)


Such a pattern is appealing, but GHC is likely to conclude that there is unbounded general recursion going on and will be reluctant to inline and optimize it. I'd rewrite it as

popFront (Deque [] rs acc) = case reverse rs of
  [] -> ...
  x : xs -> ...

Bodigrim · 2021-07-19T20:48:31Z

Data/ByteString/Lazy/Internal/Deque.hs

+-- Pop a `S.ByteString` from the front of the `Deque`
+-- Returns the bytestring, or Nothing if the Deque is empty, and the updated Deque
+-- O(1) , occasionally O(n)
+popFront :: Deque -> (Maybe S.ByteString, Deque)


Suggested change

-popFront :: Deque -> (Maybe S.ByteString, Deque)
+popFront :: Deque -> Maybe (S.ByteString, Deque)

This is a bit more precise: if it's Nothing, there is little point in returning a "new" Deque.
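Putting the two suggestions for popFront together, a deque of strict chunks with a running byte count might be sketched as follows. The field names `front` and `rear`, and the helpers `emptyDeque` and `len`, are assumptions made for this example; `snoc`, `popFront` and `byteLength` follow names mentioned in the review, but the bodies here are illustrative rather than the merged code.

```haskell
import qualified Data.ByteString as S
import Data.Int (Int64)

-- Hypothetical deque of strict chunks with a running total of their bytes.
data Deque = Deque
  { front :: [S.ByteString]   -- popped from the head
  , rear :: [S.ByteString]    -- pushed onto the head, i.e. newest first
  , byteLength :: !Int64      -- combined length of all stored chunks
  }

emptyDeque :: Deque
emptyDeque = Deque [] [] 0

len :: S.ByteString -> Int64
len = fromIntegral . S.length

-- O(1): append a chunk at the rear.
snoc :: S.ByteString -> Deque -> Deque
snoc bs (Deque fs rs acc) = Deque fs (bs : rs) (acc + len bs)

-- O(1), occasionally O(n). Not self-recursive: the empty-front case
-- inspects `reverse rs` directly instead of calling popFront again.
-- Returns Nothing for an empty deque, so no "new" deque is produced then.
popFront :: Deque -> Maybe (S.ByteString, Deque)
popFront (Deque (x : xs) rs acc) = Just (x, Deque xs rs (acc - len x))
popFront (Deque [] rs acc) = case reverse rs of
  [] -> Nothing
  x : xs -> Just (x, Deque xs [] (acc - len x))
```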

Bodigrim · 2021-07-19T20:50:02Z

Data/ByteString/Lazy.hs

+                       else (out, deque)
+
+        -- drop n elements from the rear of the accumulating `deque`
+        dropElements deque n


Could you please add type signatures for go, getOutput, dropElements and fromDeque?

Bodigrim · 2021-07-19T20:52:33Z

Data/ByteString/Lazy.hs

+            case D.popFront deque of
+                 (Nothing, deque') -> (out, deque')
+                 (Just x, deque')  ->
+                    if len x <= D.elemLength deque' - i


deque' is deque without x, so I'd expect just if D.elemLength deque' >= i then ...
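The release condition discussed here (a buffered chunk can be emitted once the bytes buffered after it number at least `i`) can be sketched end to end as follows. This is a simplified illustration, not the PR's implementation: `dropEndSketch` and its helpers are hypothetical names, and the buffer is a plain list appended with `++` for brevity, where the PR uses a deque to avoid that cost.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- Drop the last n bytes, emitting each chunk as soon as it is known to survive.
dropEndSketch :: Int64 -> L.ByteString -> L.ByteString
dropEndSketch n bs
  | n <= 0 = bs
  | otherwise = L.fromChunks (go [] 0 (L.toChunks bs))
  where
    len :: S.ByteString -> Int64
    len = fromIntegral . S.length

    -- buffered: pending chunks, oldest first; total: their combined length.
    go :: [S.ByteString] -> Int64 -> [S.ByteString] -> [S.ByteString]
    go buffered total []
      -- After release, at most a prefix of the oldest chunk can survive.
      | total > n, (x : _) <- buffered = [S.take (fromIntegral (total - n)) x]
      | otherwise = []
    go buffered total (c : cs) =
      let (out, buffered', total') = release (buffered ++ [c]) (total + len c)
       in out ++ go buffered' total' cs

    -- Emit the oldest chunk x while the bytes buffered after it are >= n.
    release :: [S.ByteString] -> Int64 -> ([S.ByteString], [S.ByteString], Int64)
    release (x : xs) total
      | total - len x >= n =
          let (out, rest, total') = release xs (total - len x)
           in (x : out, rest, total')
    release xs total = ([], xs, total)

-- Dropping 2 bytes from the chunks "abc", "de", "f" leaves the bytes "abcd".
example :: L.ByteString
example = dropEndSketch 2 (L.fromChunks (map S8.pack ["abc", "de", "f"]))
```

Because a chunk is released as soon as enough later bytes have been seen, a consumer that reads only a prefix of the result does not force the rest of the input.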

Bodigrim · 2021-07-19T20:53:43Z

Data/ByteString/Lazy.hs

+
+        -- drop n elements from the rear of the accumulating `deque`
+        dropElements deque n
+            | D.null deque = Empty


I think you do not need to handle this case specially: if deque is empty, then D.popRear returns Nothing and Empty naturally arises below.

Plus all other review suggestions

3kyro · 2021-07-20T07:52:28Z

Latest review changes added

bytestring.cabal

Data/ByteString/Lazy.hs

Data/ByteString/Lazy/Internal/Deque.hs

sjakobi · 2021-07-20T11:42:24Z

Data/ByteString/Lazy.hs

+
+        -- get all `S.ByteString` from the front of the accumulating deque
+        -- for which we know they won't be dropped
+        getOutput :: [S.ByteString] -> D.Deque -> ([S.ByteString], D.Deque)


Instead of operating on lists of strict bytestrings, it seems convenient to simply use the lazy bytestring type. Is there a reason not to do that here?

Yes, that seems nicer. The only problem is that there is no lazy left fold for chunks, so we first need to reverse the chunks and fold from the right. Might it be even better to pay for the traversal than to have space leaked by the foldl?

sjakobi · 2021-07-20T11:45:25Z

Data/ByteString/Lazy.hs

+            | D.elemLength deque < i = go (D.snoc c deque) cs
+            | otherwise              =
+                  let (output, deque') = getOutput [] (D.snoc c deque)
+                    in L.foldl (flip chunk) (go deque' cs) output


What's the reason for using a lazy foldl here instead of foldl'?

The lazy fold here is necessary so that we don't force `go deque cs`. However, this has now been changed (see the review comment above).

sjakobi

Good progress. I'll probably be AFK until Saturday. I can review more then.

Data/ByteString/Lazy.hs

3kyro · 2021-07-21T08:06:18Z

Good progress. I'll probably be AFK until Saturday. I can review more then.

Hi, I'm sorry, I started replying before pushing the PR. Thanks for the review

edit: I've rebased as my previous commits were not clean. There is an additional change included: I'm returning a Deque from dropEndBytes instead of directly returning a ByteString, as I believe it makes the intent a bit clearer.

Sorry for the confusion

sjakobi

A few more comments.

Since the implementation has turned out to be so complex, I'm relying on the tests to catch any bugs.

Data/ByteString/Lazy.hs

sjakobi · 2021-07-26T10:42:30Z

Data/ByteString/Lazy.hs

+-- | Returns the longest (possibly empty) suffix of elements
+-- satisfying the predicate.
+--
+-- @'takeWhileEnd' p@ is equivalent to @'reverse' . 'takeWhile' p . 'reverse'@.


An example would be nice to have here.

I've used Chunk (pack [1,2]) (Chunk (pack [3,4,6]) Empty) as an example of a lazy bytestring. Hope it's not too verbose, but textual representation of a bytestring is always tricky.

I don't think that would be helpful for users who will mostly not be aware of ByteString's internal constructors. If you want to represent the bytes as numbers, you can use the OverloadedLists syntax, e.g. [1,2,3,4,6].
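As a small illustration of the OverloadedLists style, the example below writes a lazy bytestring as a list of byte values. It assumes a bytestring version that provides an `IsList` instance for lazy bytestrings. The function `takeWhileEndRef` is a hypothetical reference definition built from the equivalence stated in the doc comment; it traverses the whole input and is not the PR's implementation.

```haskell
{-# LANGUAGE OverloadedLists #-}
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Reference definition taken from the documented equivalence:
-- reverse . takeWhile p . reverse
takeWhileEndRef :: (Word8 -> Bool) -> L.ByteString -> L.ByteString
takeWhileEndRef p = L.reverse . L.takeWhile p . L.reverse

-- With OverloadedLists, a list literal of numbers denotes the bytes directly.
-- The longest suffix of even bytes of [1,2,3,4,6] is [4,6].
example :: L.ByteString
example = takeWhileEndRef even [1, 2, 3, 4, 6]
```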

Data/ByteString/Lazy.hs

Data/ByteString/Lazy/Internal/Deque.hs

Co-authored-by: Simon Jakobi <[email protected]>

sjakobi

One more wibble – LGTM apart from that. :)

sjakobi · 2021-07-27T13:49:31Z

Data/ByteString/Lazy.hs

+-- | Returns the longest (possibly empty) suffix of elements
+-- satisfying the predicate.
+--
+-- @'takeWhileEnd' p@ is equivalent to @'reverse' . 'takeWhile' p . 'reverse'@.


I don't think that would be helpful for users who will mostly not be aware of ByteString's internal constructors. If you want to represent the bytes as numbers, you can use the OverloadedLists syntax, e.g. [1,2,3,4,6].

Bodigrim · 2021-07-27T18:04:29Z

@3kyro please check CI failure.

3kyro · 2021-07-27T19:59:56Z

One more wibble – LGTM apart from that. :)

Hi @sjakobi, I've updated the examples using OverloadedLists. However, I still believe that we could have mentioned Chunk and Empty; for reference, "Learn You a Haskell for Great Good!", which is a pretty well-known introduction to Haskell, explicitly mentions the lazy bytestring constructors, e.g. in the Input and Output chapter.

Thanks again for the review :)

sjakobi

Good job!

Bodigrim · 2021-07-28T18:28:59Z

Thanks @3kyro!

* Add lazy dropEnd and friends

  Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact.

* Fix `since` version number
* Use `foldrChunks` instead of `length` based checks
* Add review changes
  - Pass Int64 to accumulators
  - Clean up `breakEnd`
  - Implement lazy version of `dropWhileEnd`
  - Add dropWhileEnd lazy test
* Fix `breakEnd` and `spanEnd` lazyness
* Make `dropEnd` lazier
* Formatting
* Fix lazy `dropEnd`
* Add `Deque` module

  `dropEnd` now uses `Deque` for handling the accumulated bytestrigs

* Normalize function names
* Return `Maybe (S.ByteString, Deque)` from pops

  Plus all other review suggestions

* Add review changes
* Rename `dropElements` to `dropEndBytes`
* Add examples + style fixes
* Update Data/ByteString/Lazy/Internal/Deque.hs

  Co-authored-by: Simon Jakobi <[email protected]>

* Replace `elemLength` with `byteLength`
* Upadate examples

Co-authored-by: Simon Jakobi <[email protected]>

Add lazy dropEnd and friends

d2c8c32

Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact.

Bodigrim reviewed May 26, 2021

Fix since version number

9421dcd

Use foldrChunks instead of length based checks

891f82f

Bodigrim reviewed Jun 14, 2021

3kyro added 2 commits June 16, 2021 11:01

Add review changes

c0607a3

- Pass Int64 to accumulators
- Clean up `breakEnd`
- Implement lazy version of `dropWhileEnd`
- Add dropWhileEnd lazy test

Fix breakEnd and spanEnd lazyness

ad00a22

Bodigrim reviewed Jun 17, 2021

3kyro added 2 commits June 21, 2021 10:34

Make dropEnd lazier

a198e80

Formatting

cd822ae

Fix lazy dropEnd

1ae4ab7

Bodigrim reviewed Jul 17, 2021

3kyro added 2 commits July 19, 2021 16:37

Add Deque module

b688a25

`dropEnd` now uses `Deque` for handling the accumulated bytestrigs

Normalize function names

413d3f1

Bodigrim reviewed Jul 19, 2021

Bodigrim requested a review from sjakobi July 19, 2021 20:55

Return Maybe (S.ByteString, Deque) from pops

c48d764

Plus all other review suggestions

sjakobi reviewed Jul 20, 2021

sjakobi reviewed Jul 21, 2021

3kyro added 2 commits July 21, 2021 11:15

Add review changes

909e02b

Rename dropElements to dropEndBytes

4240db9

sjakobi reviewed Jul 26, 2021

3kyro and others added 2 commits July 27, 2021 15:32

Add examples + style fixes

eedf0a2

Update Data/ByteString/Lazy/Internal/Deque.hs

7dc6214

Co-authored-by: Simon Jakobi <[email protected]>

sjakobi reviewed Jul 27, 2021

3kyro added 2 commits July 27, 2021 21:53

Replace elemLength with byteLength

6911faf

Upadate examples

af2f32e

Bodigrim approved these changes Jul 27, 2021

sjakobi approved these changes Jul 28, 2021

Bodigrim added this to the 0.11.2.0 milestone Jul 28, 2021

Bodigrim merged commit bb7540a into haskell:master Jul 28, 2021
