Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Data.List.unsnoc :: [a] -> Maybe ([a], a) #165

Closed
Bodigrim opened this issue May 10, 2023 · 29 comments
Closed

Add Data.List.unsnoc :: [a] -> Maybe ([a], a) #165

Bodigrim opened this issue May 10, 2023 · 29 comments
Labels
approved Approved by CLC vote base-4.19 Implemented in base-4.19 (GHC 9.8)

Comments

@Bodigrim
Copy link
Collaborator

Bodigrim commented May 10, 2023

Currently base does not offer any ergonomic total replacement for Data.List.init :: [a] -> [a] and Data.List.last :: [a] -> a. There is also no efficient way to compute init and tail simultaneously, which is a very common task. Similar issues with head and tail are resolved by uncons :: [a] -> Maybe (a, [a]), so let's introduce its dual.

unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing
> unsnoc []
Nothing
> unsnoc [1]
Just ([],1)
> unsnoc [1,2,3]
Just ([1,2],3)

The complete MR is available at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10442/diffs

Implementation

It's important to use a lazy pattern match ~(a, b) in the definition of unsnoc above. Otherwise the function becomes non-productive on infinite lists and (probably more importantly) is prone to stack overflow. Compare against a hypothetical stricter version, without ~:

unsnoc' :: [a] -> Maybe ([a], a)
unsnoc' = foldr (\x -> Just . maybe ([], x) (\(a, b) -> (x : a, b))) Nothing

It's easier to spot the difference running ghci with artificially limited stack size:

$ ghci +RTS -K64K
> unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing
> unsnoc' = foldr (\x -> Just . maybe ([], x) (\(a, b) -> (x : a, b))) Nothing
> snd <$> unsnoc [1..10000]
Just 10000
> snd <$> unsnoc' [1..10000]
Just *** Exception: stack overflow
> head . fst <$> unsnoc [1..10000]
Just 1
> head . fst <$> unsnoc' [1..10000]
Just *** Exception: stack overflow

Here is another test to distinguish unsnoc and unsnoc':

> head . fst <$> unsnoc (1 : 2 : undefined)
Just 1
> head . fst <$> unsnoc' (1 : 2 : undefined)
Just *** Exception: Prelude.undefined

The difference between unsnoc and unsnoc' is similar to foldr vs. foldr'. See also #133, comparingbreak to break'.

The laziness of the proposed unsnoc precisely matches the behaviour of a naive, inefficient implementation via init / last:

naiveUnsnoc :: [a] -> Maybe ([a], a)
naiveUnsnoc [] = Nothing 
naiveUnsnoc xs = Just (init xs, last xs)

Impact assessment

Adding a new entity is not a breaking change according to PVP. GHC emits a -Wcompat warning for unqualified import of Data.List. Still some packages might dismiss the warning (or never enable it), thus there is a possibility of name clashes. I've built clc-stackage against GHC 9.4 with the proposed patch and prepared PRs to all affected packages:

Prior art

22 packages, including Cabal-syntax, filepath, unix, maintain their own implementations of unsnoc. I believe it is time to offer it from Data.List.

@mixphix
Copy link
Collaborator

mixphix commented May 11, 2023

Heartily in favour.

@chshersh
Copy link
Member

Sounds like an excellent function to be added 👍🏻

If this function is so popular, it makes sense to standardize it and provide a properly-lazy implementation (not sure that all 22 packages implement it correctly).

I trust you to write great documentation with examples and clear complexity explanations, so I don't worry about that 🙂

@treeowl
Copy link

treeowl commented May 11, 2023

I don't think this function is very practical, because it's rather slow and is only lazy on one side. But I think it may still have a place for "roughing out" code before choosing a sequence type, and for introducing that as a general sequence operation. FWIW, I think we should consider adding nubOn for the same reason—efficient nub variants in various libraries tend to have On forms and not By forms.

@mixphix
Copy link
Collaborator

mixphix commented May 11, 2023

A search for the more general ^unsnoc :: \[ yields 28 packages. Quickly perusing the various implementations, some were identical, and some seemed quite outlandish. I think this indicates that a standardized version with optimal laziness properties would be a good addition to base.

@hasufell
Copy link
Member

+1

@tomjaguarpaw
Copy link
Member

I don't think this function is very practical

I agree with this. However, it is more practical (due to being total) than init or last, which it is designed to be a replacement for.

because it's rather slow and is only lazy on one side.

I'm not sure what this means.

@treeowl
Copy link

treeowl commented May 12, 2023

I don't think this function is very practical

I agree with this. However, it is more practical (due to being total) than init or last, which it is designed to be a replacement for.

because it's rather slow and is only lazy on one side.

I'm not sure what this means.

Suppose I write

oof xs = case unsnoc xs of
  Nothing -> 12
  Just (ys, y)
    | y == 0 = sum ys
    | otherwise = 15

This will build some awful structure in memory and ultimately be even worse for performance than using init and last.

More subtly, how will this perform?

goof xs = case unsnoc xs of
  Nothing -> 17
  Just (ys, y) -> sum ys * y

That's up to whether the compiler decides to force sum ys first (which will be okay) or y (which will be pretty wretched).

@tomjaguarpaw
Copy link
Member

tomjaguarpaw commented May 12, 2023

Yes, I think the function ought to come with the caveat "make sure you finish using the first component of the result before you start using the second component". Accessing the init or last of a list really isn't the sort of thing one ought to be doing anyway, so I think this caveat is natural.

On the other hand I still don't understand what the "awful structure" is. Can you elaborate? I don't actually fully understand Bodigrim's implementation. I can never really grok foldrs where the accumulator is a function. However this simple version seems fine

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc (x:xs) = Just (unsnoc' x xs)

unsnoc' :: a -> [a] -> ([a], a)
unsnoc' x [] = ([], x)
unsnoc' x (x':xs) =
  let u = unsnoc' x' xs
  in (x : fst u, snd u)

If we access the second component before the first then the first will end up looking like

x1 : fst (x2 : fst (x3 : ..., xn), xn), xn)

which is unpleasant, and I'm open to persuasion that it's awful, but I don't see why yet. On the other hand, the alternative of

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc (x:xs) = Just (unsnoc' x xs)

unsnoc' :: a -> [a] -> ([a], a)
unsnoc' x [] = ([], x)
unsnoc' x (x':xs) =
  let u = unsnoc' x' xs
      xs' = fst u
  in (x : xs', xs' `seq` snd u)

yields a simple list (i.e. a sequence of actually evaluated (:) cells) for the first component, which seems fine. EDIT: to be precise:

x1 : x2 : ... : xn-1 : []

@treeowl
Copy link

treeowl commented May 12, 2023

I was wrong; @Bodigrim's implementation won't produce any weird structure. I can't make head or tail of what yours is trying to do.

@tomjaguarpaw
Copy link
Member

I can't make head or tail of what yours is trying to do.

I'd appreciate if you would give it another go. It's a direct recursive implementation and I didn't mean anything complicated by it.

@treeowl
Copy link

treeowl commented May 12, 2023

I can't make head or tail of what yours is trying to do.

I'd appreciate if you would give it another go. It's a direct recursive implementation and I didn't mean anything complicated by it.

The purpose of the seq is what I find mysterious. What's that trying to do?

@hasufell
Copy link
Member

I was wrong; @Bodigrim's implementation won't produce any weird structure. I can't make head or tail of what yours is trying to do.

Can we get concrete evidence for such claims? I find it hard to follow this otherwise:

  • weird structure -> let's look at core
  • worse performance -> benchmarks

@tomjaguarpaw
Copy link
Member

The first component returned by the first version of unsnoc looks like

x1 : fst (x2 : fst (x3 : ..., xn), xn), xn)

The first component of the heap returned by the second version looks like

x1 : x2 : ... : xn-1 : []

Since you were worried about the heap object constructed in memory I wondered if you'd be happier with the second version.

@treeowl
Copy link

treeowl commented May 12, 2023

No, I just made a mistake about the first. @Bodigrim's has the advantage of participating in list fusion.

@stephen-smith
Copy link

I don't see a way to sneak a GHC.Exts.build in there so we get the other half of list fusion without a local Mu definition or impredicative polymorphism. But, if anyone else can, that would be an improvement.

LGTM already.

@treeowl
Copy link

treeowl commented May 13, 2023

@stephen-smith , no, but the foldr is good on the other side.

@Bodigrim
Copy link
Collaborator Author

I've updated the top post with a link to the MR (https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10442/diffs) and impact assessment (only trivial name clashes, all PRs are ready).

@chshersh
Copy link
Member

-- * If the list is non-empty, returns @'Just' (xs, x)@,
-- where @xs@ is the 'init'ial part of the list and @x@ is its last element.

After 8 years with Haskell, I finally learned while init is called init 😅

In any case, left a documentation suggestion in the PR.

@Bodigrim
Copy link
Collaborator Author

If there are no further questions / comments / suggestions over the weekend, I'll trigger a vote next week.

@Bodigrim
Copy link
Collaborator Author

(changing hats)

Dear CLC members, let's vote on the proposal to add Data.List.unsnoc :: [a] -> Maybe ([a], a) to mirror Data.List.uncons and provide a total alternative to init and last. The change is detailed in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10442/diffs. This is not a breaking change according to PVP, all potential breakage is limited to name clashes and the proposer prepared PRs for all 4 affected Stackage packages (3 of which are already merged, and 1 awaits whether this proposal gets approved).

@tomjaguarpaw @chshersh @hasufell @mixphix @angerman @parsonsmatt


+1 from me, unsurprisingly.

@chshersh
Copy link
Member

+1

4 similar comments
@hasufell
Copy link
Member

+1

@tomjaguarpaw
Copy link
Member

+1

@parsonsmatt
Copy link

+1

@mixphix
Copy link
Collaborator

mixphix commented May 23, 2023

+1

@Bodigrim
Copy link
Collaborator Author

Thanks all, 6 votes in favour are enough to approve.

@Bodigrim Bodigrim added the approved Approved by CLC vote label May 24, 2023
sthagen pushed a commit to sthagen/ghc-ghc that referenced this issue May 27, 2023
@mauke
Copy link

mauke commented Jun 9, 2023

For anyone reading up on the discussion after the fact (like me), I want to leave a few remarks.

@tomjaguarpaw wrote:

I don't actually fully understand Bodigrim's implementation. I can never really grok foldrs where the accumulator is a function.

The implementation in question:

unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing

It's not easy to visually disentangle this definition, but the accumulator here is not a function, but just Nothing (of type Maybe ([a], a), matching the result type of unsnoc).

I'm going to try to simplify the code through a few (hopefully obvious) steps.

Un-inline (outline?) the step function:

unsnoc = foldr step Nothing
    where
    step x = Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))

Eta-expand:

unsnoc list = foldr step Nothing list
    where
    step x z = (Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) z

Inline/expand .:

unsnoc list = foldr step Nothing list
    where
    step x z = Just (maybe ([], x) (\(~(a, b)) -> (x : a, b))) z)

Inline/expand maybe, turning it into a case with explicit pattern matching:

unsnoc list = foldr step Nothing list
    where
    step x z = Just (case z of
            Nothing -> ([], x)
            Just (~(a, b)) -> (x : a, b))

In prose:
If list is empty, we immediately return the (initial) accumulator, Nothing.
If list is non-empty, we use the step function, which always returns a Just.
In step, if z (the result of folding the rest of the list) is Nothing, then the rest of the list must have been empty, in which case x (the current element of the list) is the last element, so we return ([], x).
Otherwise the rest of the list was non-empty, so we just prepend the current element x to a (= the init part of the rest) and pass through b (the last element of the list).

@tomjaguarpaw
Copy link
Member

Thanks! Perhaps you're missing a Just around ~(a, b) in the final version?

@tomjaguarpaw
Copy link
Member

Thanks @mauke, I find your version much more readable and it's clear what's going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved by CLC vote base-4.19 Implemented in base-4.19 (GHC 9.8)
Projects
None yet
Development

No branches or pull requests

9 participants