Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is unions "little" wrong ? #139

Open
ashishnegi opened this issue Nov 17, 2016 · 8 comments
Open

Is unions "little" wrong ? #139

ashishnegi opened this issue Nov 17, 2016 · 8 comments

Comments

@ashishnegi
Copy link

Looking at unions and union

union says that first argument should be smaller set.
foldl' used in unions would give bigger sets in each iteration's accumulator,
which becomes first argument to union ; leading to slower performance.

Should not we use flip union in unions ?
Or i am missing something.

@sjakobi
Copy link
Member

sjakobi commented Jun 1, 2020

Thanks for the report, @ashishnegi. Indeed it does seem like we're missing out on some easy wins!

union says that first argument should be smaller set.

This is the relevant code section:

-- | /O(n+m)/ Construct a set containing all elements from both sets.
--
-- To obtain good performance, the smaller set must be presented as
-- the first argument.
--
-- ==== __Examples__
--
-- >>> union (fromList [1,2]) (fromList [2,3])
-- fromList [1,2,3]
union :: (Eq a, Hashable a) => HashSet a -> HashSet a -> HashSet a
union s1 s2 = HashSet $ H.union (asMap s1) (asMap s2)
{-# INLINE union #-}

Since HashSet.union is just such a tiny wrapper around HashMap.union, which in turn wraps HashMap.unionWithKey via HashMap.unionWith, I think it's a bit curious that the latter two don't have a similar note!

union :: (Eq k, Hashable k) => HashMap k v -> HashMap k v -> HashMap k v
union = unionWith const
{-# INLINABLE union #-}
-- | /O(n+m)/ The union of two maps. If a key occurs in both maps,
-- the provided function (first argument) will be used to compute the
-- result.
unionWith :: (Eq k, Hashable k) => (v -> v -> v) -> HashMap k v -> HashMap k v
-> HashMap k v
unionWith f = unionWithKey (const f)
{-# INLINE unionWith #-}
-- | /O(n+m)/ The union of two maps. If a key occurs in both maps,
-- the provided function (first argument) will be used to compute the
-- result.
unionWithKey :: (Eq k, Hashable k) => (k -> v -> v -> v) -> HashMap k v -> HashMap k v
-> HashMap k v

I think we should first check whether this advice is still true. @tibbe, do you have any insight here?

@tibbe
Copy link
Collaborator

tibbe commented Jun 1, 2020

I don't remember the details here but likely HashMap should have the same note as HashSet.

@sjakobi
Copy link
Member

sjakobi commented Jun 1, 2020

Thanks, @tibbe! :)

It would be good to investigate the issue a bit! :)

I think the first thing to do would be some simple benchmarks in order to confirm that this "performance bias" exists.

It would also be interesting to know where the bias comes from – somewhere in unionWithKey, I assume! Although a fix wouldn't necessarily have to wait for that.

@ashishnegi Are you still interested in this issue?

@ashishnegi
Copy link
Author

it has been some time since i used this repo.. and haskell :(
but i think their should be many customers of this repo.. who will benefit from perf improvements.

@sjakobi
Copy link
Member

sjakobi commented Jun 1, 2020

Thanks for the fast response, @ashishnegi! :)

I understand that you're currently not interested in working on this. No wonder – it's been 3 and a half years!

@treeowl
Copy link
Collaborator

treeowl commented Dec 19, 2021

If we flip the union, I believe that'll change semantics. We might want to write an opposite-biased version for this purpose.

@sjakobi
Copy link
Member

sjakobi commented Dec 19, 2021

If this performance bias actually exists, improving unions might be as simple changing

-unions = L.foldl' union empty
+unions = L.foldl' (flip (unionWith (flip const))) empty

(just check that the flips are optimized away)

In fact, if this implementation gives significantly better benchmark results, I'd take that as sufficient evidence that the claimed performance bias exists.

@treeowl
Copy link
Collaborator

treeowl commented Dec 19, 2021

Good point!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants