Add Data.List.unsnoc :: [a] -> Maybe ([a], a) #165

Bodigrim · 2023-05-10T23:00:41Z

Currently base does not offer any ergonomic total replacement for Data.List.init :: [a] -> [a] and Data.List.last :: [a] -> a. There is also no efficient way to compute init and last simultaneously, which is a very common task. Similar issues with head and tail are resolved by uncons :: [a] -> Maybe (a, [a]), so let's introduce its dual.

unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing

> unsnoc []
Nothing
> unsnoc [1]
Just ([],1)
> unsnoc [1,2,3]
Just ([1,2],3)
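
As a rough illustration of the ergonomics (not part of the proposal itself), pattern matching on unsnoc can replace the usual null check followed by the partial init / last. The helper name stripTrailingNewline is hypothetical, and unsnoc is defined locally so that the sketch does not depend on a base version that already exports it.

```haskell
-- Local copy of the proposed definition, so this sketch does not
-- depend on a base version that already exports Data.List.unsnoc.
unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing

-- Hypothetical helper: drop a single trailing newline, if there is one.
-- No call to the partial functions init or last is needed.
stripTrailingNewline :: String -> String
stripTrailingNewline s = case unsnoc s of
  Just (body, '\n') -> body
  _                 -> s
```

The empty string falls through to the wildcard case, so the helper is total.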

The complete MR is available at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10442/diffs

Implementation

It's important to use a lazy pattern match ~(a, b) in the definition of unsnoc above. Otherwise the function becomes non-productive on infinite lists and (probably more importantly) is prone to stack overflow. Compare against a hypothetical stricter version, without ~:

unsnoc' :: [a] -> Maybe ([a], a)
unsnoc' = foldr (\x -> Just . maybe ([], x) (\(a, b) -> (x : a, b))) Nothing

It's easier to spot the difference when running ghci with an artificially limited stack size:

$ ghci +RTS -K64K
> unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing
> unsnoc' = foldr (\x -> Just . maybe ([], x) (\(a, b) -> (x : a, b))) Nothing
> snd <$> unsnoc [1..10000]
Just 10000
> snd <$> unsnoc' [1..10000]
Just *** Exception: stack overflow
> head . fst <$> unsnoc [1..10000]
Just 1
> head . fst <$> unsnoc' [1..10000]
Just *** Exception: stack overflow

Here is another test to distinguish unsnoc and unsnoc':

> head . fst <$> unsnoc (1 : 2 : undefined)
Just 1
> head . fst <$> unsnoc' (1 : 2 : undefined)
Just *** Exception: Prelude.undefined

The difference between unsnoc and unsnoc' is similar to foldr vs. foldr'. See also #133, comparing break to break'.
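
The productivity on infinite lists mentioned above can be sketched as follows. The helper initPrefix is a hypothetical name introduced only for this illustration; unsnoc is the lazy definition from the top post, repeated so the snippet stands alone.

```haskell
-- Lazy definition from the top post, repeated so the snippet is self-contained.
unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing

-- Hypothetical helper: the first n elements of the init part.
-- Thanks to the lazy pattern, this only inspects a finite prefix of the input.
initPrefix :: Int -> [a] -> Maybe [a]
initPrefix n = fmap (take n . fst) . unsnoc
```

For example, initPrefix 3 [1 ..] evaluates to Just [1,2,3]. With the stricter unsnoc', the same call would never deliver a result, because each step would have to evaluate the rest of the (infinite) fold before producing its pair.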

The laziness of the proposed unsnoc precisely matches the behaviour of a naive, inefficient implementation via init / last:

naiveUnsnoc :: [a] -> Maybe ([a], a)
naiveUnsnoc [] = Nothing 
naiveUnsnoc xs = Just (init xs, last xs)
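
A small sanity check of that claim might look like the sketch below. The function agreesWithNaive is a hypothetical name, and comparing on a handful of inputs is only a spot check, not a proof of equal laziness.

```haskell
-- Proposed definition, repeated so the snippet is self-contained.
unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing

-- Naive reference implementation via init / last.
naiveUnsnoc :: [a] -> Maybe ([a], a)
naiveUnsnoc [] = Nothing
naiveUnsnoc xs = Just (init xs, last xs)

-- Hypothetical check: both implementations agree on a fully defined list.
agreesWithNaive :: Eq a => [a] -> Bool
agreesWithNaive xs = unsnoc xs == naiveUnsnoc xs
```

On a partially defined list such as 1 : 2 : undefined, both versions still yield the first element of the init part, since neither needs to look past the second cons cell for that.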

Impact assessment

Adding a new entity is not a breaking change according to PVP. GHC emits a -Wcompat warning for unqualified import of Data.List. Still, some packages might dismiss the warning (or never enable it), so name clashes are possible. I've built clc-stackage against GHC 9.4 with the proposed patch and prepared PRs to all affected packages:

Prior art

22 packages, including Cabal-syntax, filepath, unix, maintain their own implementations of unsnoc. I believe it is time to offer it from Data.List.


mixphix · 2023-05-11T02:04:45Z

Heartily in favour.

chshersh · 2023-05-11T18:23:34Z

Sounds like an excellent function to be added 👍🏻

If this function is so popular, it makes sense to standardize it and provide a properly-lazy implementation (not sure that all 22 packages implement it correctly).

I trust you to write great documentation with examples and clear complexity explanations, so I don't worry about that 🙂

treeowl · 2023-05-11T18:42:49Z

I don't think this function is very practical, because it's rather slow and is only lazy on one side. But I think it may still have a place for "roughing out" code before choosing a sequence type, and for introducing that as a general sequence operation. FWIW, I think we should consider adding nubOn for the same reason—efficient nub variants in various libraries tend to have On forms and not By forms.

mixphix · 2023-05-11T18:44:20Z

A search for the more general ^unsnoc :: \[ yields 28 packages. Quickly perusing the various implementations, some were identical, and some seemed quite outlandish. I think this indicates that a standardized version with optimal laziness properties would be a good addition to base.

hasufell · 2023-05-12T02:50:41Z

+1

tomjaguarpaw · 2023-05-12T11:56:16Z

> I don't think this function is very practical

I agree with this. However, it is more practical (due to being total) than init or last, which it is designed to be a replacement for.

> because it's rather slow and is only lazy on one side.

I'm not sure what this means.

treeowl · 2023-05-12T14:30:25Z

>> I don't think this function is very practical

> I agree with this. However, it is more practical (due to being total) than init or last, which it is designed to be a replacement for.

>> because it's rather slow and is only lazy on one side.

> I'm not sure what this means.

Suppose I write

oof xs = case unsnoc xs of
  Nothing -> 12
  Just (ys, y)
    | y == 0 = sum ys
    | otherwise = 15

This will build some awful structure in memory and ultimately be even worse for performance than using init and last.

More subtly, how will this perform?

goof xs = case unsnoc xs of
  Nothing -> 17
  Just (ys, y) -> sum ys * y

That's up to whether the compiler decides to force sum ys first (which will be okay) or y (which will be pretty wretched).

tomjaguarpaw · 2023-05-12T15:37:52Z

Yes, I think the function ought to come with the caveat "make sure you finish using the first component of the result before you start using the second component". Accessing the init or last of a list really isn't the sort of thing one ought to be doing anyway, so I think this caveat is natural.

On the other hand, I still don't understand what the "awful structure" is. Can you elaborate? I don't actually fully understand Bodigrim's implementation. I can never really grok foldrs where the accumulator is a function. However, this simple version seems fine:

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc (x:xs) = Just (unsnoc' x xs)

unsnoc' :: a -> [a] -> ([a], a)
unsnoc' x [] = ([], x)
unsnoc' x (x':xs) =
  let u = unsnoc' x' xs
  in (x : fst u, snd u)

If we access the second component before the first then the first will end up looking like

x1 : fst (x2 : fst (x3 : ..., xn), xn), xn)

which is unpleasant, and I'm open to persuasion that it's awful, but I don't see why yet. On the other hand, the alternative of

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc (x:xs) = Just (unsnoc' x xs)

unsnoc' :: a -> [a] -> ([a], a)
unsnoc' x [] = ([], x)
unsnoc' x (x':xs) =
  let u = unsnoc' x' xs
      xs' = fst u
  in (x : xs', xs' `seq` snd u)

yields a simple list (i.e. a sequence of actually evaluated (:) cells) for the first component, which seems fine. EDIT: to be precise:

x1 : x2 : ... : xn-1 : []
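
To compare the two direct-recursive variants side by side, they can be given distinct names. The names unsnocPlain and unsnocSeq, and the local helper go, are hypothetical renamings made only so that both definitions can live in one module; the logic follows the two versions in this comment.

```haskell
-- First variant: plain lazy recursion.
unsnocPlain :: [a] -> Maybe ([a], a)
unsnocPlain []       = Nothing
unsnocPlain (x : xs) = Just (go x xs)
  where
    go y []       = ([], y)
    go y (z : zs) =
      let u = go z zs
      in (y : fst u, snd u)

-- Second variant: demanding the last element also forces the spine
-- of the init part, one cons cell per recursive step.
unsnocSeq :: [a] -> Maybe ([a], a)
unsnocSeq []       = Nothing
unsnocSeq (x : xs) = Just (go x xs)
  where
    go y []       = ([], y)
    go y (z : zs) =
      let u   = go z zs
          ys' = fst u
      in (y : ys', ys' `seq` snd u)
```

Both return the same values on finite lists; they differ only in what has already been evaluated once the second component is demanded.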

treeowl · 2023-05-12T16:03:20Z

I was wrong; @Bodigrim's implementation won't produce any weird structure. I can't make head or tail of what yours is trying to do.

tomjaguarpaw · 2023-05-12T16:08:52Z

> I can't make head or tail of what yours is trying to do.

I'd appreciate if you would give it another go. It's a direct recursive implementation and I didn't mean anything complicated by it.

treeowl · 2023-05-12T16:12:04Z

>> I can't make head or tail of what yours is trying to do.

> I'd appreciate if you would give it another go. It's a direct recursive implementation and I didn't mean anything complicated by it.

The purpose of the seq is what I find mysterious. What's that trying to do?

hasufell · 2023-05-12T16:19:24Z

> I was wrong; @Bodigrim's implementation won't produce any weird structure. I can't make head or tail of what yours is trying to do.

Can we get concrete evidence for such claims? I find it hard to follow this otherwise:

weird structure -> let's look at core
worse performance -> benchmarks

tomjaguarpaw · 2023-05-12T16:37:13Z

The first component returned by the first version of unsnoc looks like

x1 : fst (x2 : fst (x3 : ..., xn), xn), xn)

The first component of the heap returned by the second version looks like

x1 : x2 : ... : xn-1 : []

Since you were worried about the heap object constructed in memory I wondered if you'd be happier with the second version.

treeowl · 2023-05-12T16:58:09Z

No, I just made a mistake about the first. @Bodigrim's has the advantage of participating in list fusion.

stephen-smith · 2023-05-13T01:25:28Z

I don't see a way to sneak a GHC.Exts.build in there so we get the other half of list fusion without a local Mu definition or impredicative polymorphism. But, if anyone else can, that would be an improvement.

LGTM already.

treeowl · 2023-05-13T01:45:58Z

@stephen-smith , no, but the foldr is good on the other side.

Bodigrim · 2023-05-15T18:34:00Z

I've updated the top post with a link to the MR (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10442/diffs) and impact assessment (only trivial name clashes, all PRs are ready).
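
For readers who maintain a package with its own unsnoc, one common way to sidestep this kind of clash is an explicit import list, sketched below. This assumes the clash comes from an open, unqualified import of Data.List; the function largest is a hypothetical example, and the local unsnoc stands in for whatever definition the package already has.

```haskell
-- An explicit import list does not pick up names that are added
-- to Data.List later, so the local definition below cannot clash.
import Data.List (sort)

-- The package's own definition (here: the naive version, as a stand-in).
unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc xs = Just (init xs, last xs)

-- Hypothetical use site inside the package.
largest :: Ord a => [a] -> Maybe a
largest = fmap snd . unsnoc . sort
```

A qualified import of Data.List would serve the same purpose.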

chshersh · 2023-05-15T19:05:34Z

-- * If the list is non-empty, returns @'Just' (xs, x)@,
-- where @xs@ is the 'init'ial part of the list and @x@ is its last element.

After 8 years with Haskell, I finally learned why init is called init 😅

In any case, left a documentation suggestion in the PR.

Bodigrim · 2023-05-19T17:47:43Z

If there are no further questions / comments / suggestions over the weekend, I'll trigger a vote next week.

Bodigrim · 2023-05-22T20:46:40Z

(changing hats)

Dear CLC members, let's vote on the proposal to add Data.List.unsnoc :: [a] -> Maybe ([a], a) to mirror Data.List.uncons and provide a total alternative to init and last. The change is detailed in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10442/diffs. This is not a breaking change according to PVP: all potential breakage is limited to name clashes, and the proposer has prepared PRs for all 4 affected Stackage packages (3 of which are already merged, and 1 awaits the outcome of this proposal).

@tomjaguarpaw @chshersh @hasufell @mixphix @angerman @parsonsmatt

+1 from me, unsurprisingly.

chshersh · 2023-05-22T21:20:46Z

+1

hasufell · 2023-05-23T02:27:07Z

+1

tomjaguarpaw · 2023-05-23T09:13:38Z

+1

parsonsmatt · 2023-05-23T12:45:29Z

+1

mixphix · 2023-05-23T14:00:30Z

+1

Bodigrim · 2023-05-24T21:27:27Z

Thanks all, 6 votes in favour are enough to approve.

See haskell/core-libraries-committee#165 for discussion

mauke · 2023-06-09T13:56:52Z

For anyone reading up on the discussion after the fact (like me), I want to leave a few remarks.

@tomjaguarpaw wrote:

> I don't actually fully understand Bodigrim's implementation. I can never really grok foldrs where the accumulator is a function.

The implementation in question:

unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing

It's not easy to visually disentangle this definition, but the accumulator here is not a function, but just Nothing (of type Maybe ([a], a), matching the result type of unsnoc).

I'm going to try to simplify the code through a few (hopefully obvious) steps.

Un-inline (outline?) the step function:

unsnoc = foldr step Nothing
    where
    step x = Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))

Eta-expand:

unsnoc list = foldr step Nothing list
    where
    step x z = (Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) z

Inline/expand .:

unsnoc list = foldr step Nothing list
    where
    step x z = Just (maybe ([], x) (\(~(a, b)) -> (x : a, b)) z)

Inline/expand maybe, turning it into a case with explicit pattern matching:

unsnoc list = foldr step Nothing list
    where
    step x z = Just (case z of
            Nothing -> ([], x)
            Just (~(a, b)) -> (x : a, b))

In prose:
If list is empty, we immediately return the (initial) accumulator, Nothing.
If list is non-empty, we use the step function, which always returns a Just.
In step, if z (the result of folding the rest of the list) is Nothing, then the rest of the list must have been empty, in which case x (the current element of the list) is the last element, so we return ([], x).
Otherwise the rest of the list was non-empty, so we just prepend the current element x to a (= the init part of the rest) and pass through b (the last element of the list).
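
The prose above can be checked against a compilable form of the final step, together with a hand evaluation on a small input. The name unsnocExpanded is hypothetical, chosen only to keep it apart from the original definition; the trace in the comments lists the calls innermost first for readability, whereas lazy evaluation actually proceeds from the outside in.

```haskell
unsnocExpanded :: [a] -> Maybe ([a], a)
unsnocExpanded = foldr step Nothing
  where
    -- z is the result of folding the rest of the list.
    step x z = Just $ case z of
      Nothing      -> ([], x)     -- rest was empty: x is the last element
      Just ~(a, b) -> (x : a, b)  -- prepend x to the init, pass the last through

-- Hand evaluation on [1, 2, 3]:
--   step 3 Nothing          = Just ([], 3)
--   step 2 (Just ([], 3))   = Just ([2], 3)
--   step 1 (Just ([2], 3))  = Just ([1, 2], 3)
```

The lazy pattern ~(a, b) is what lets each step return its pair before the rest of the fold has been evaluated.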

tomjaguarpaw · 2023-06-09T13:59:22Z

Thanks! Perhaps you're missing a Just around ~(a, b) in the final version?

tomjaguarpaw · 2023-06-10T10:38:44Z

Thanks @mauke, I find your version much more readable and it's clear what's going on.

Bodigrim closed this as completed May 24, 2023

Bodigrim added the approved Approved by CLC vote label May 24, 2023

sthagen pushed a commit to sthagen/ghc-ghc that referenced this issue May 27, 2023

Add Data.List.unsnoc

36d5944

See haskell/core-libraries-committee#165 for discussion

mauke mentioned this issue Jun 9, 2023

make unsnoc handle infinite lists protolude/protolude#144

Open

tbidne mentioned this issue Aug 2, 2023

Is clc-stackage up to date? haskell/clc-stackage#14

Closed

Bodigrim added the base-4.19 Implemented in base-4.19 (GHC 9.8) label Jun 20, 2024

mpilgrem mentioned this issue Nov 11, 2024

Add a suitable {-# WARNING in "x-partial" ... #-} to Data.List.{init,last} #292

Open

Bodigrim mentioned this issue Dec 14, 2024

Improve the time performance of Data.List.unsnoc #307

Closed
