-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast inclusion operation on hashmaps and hashsets #282
Fast inclusion operation on hashmaps and hashsets #282
Conversation
Thanks for all this work! Can you add an explanation of what you mean by a "subkey"? |
The not-fun part: how does this change affect benchmarks? Also also, do you have ideas for implementing #226? That would presumably subsume this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I have a few comments regarding the consistency with the containers
API.
You mean the performance of
You mean something like Furthermore, I don't know how to implement this. I presume an implementation of
Thanks, I will change these function names. |
One thing that is missing is a proper benchmark for this operation. Any ideas for what would be a good benchmark? |
@treeowl, I compared the performance of The speedup is within 0.97 and 1.04, which I would attribute to measuring inaccuracies. |
@sjakobi, I renamed the functions for compatibility with the containers API in https://github.com/svenkeidel/unordered-containers/commit/ffa499e7c8a087a44442992af9d266c412e962c0. Github did not show this commit here because of an outage. |
Could you try pushing it again? The outage is supposed to be over: https://www.githubstatus.com/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments in isSubmapOfBy
are very helpful! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress!
I think mathematical symbols are fine, but I like them presented as MathJax, not Unicode in the source. |
@sjakobi, @treeowl thanks for the feedback. Except for this last issue (#282 (comment)), I addressed all feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments
I really like the examples BTW! :)
Thanks again for your feedback. I addressed all further comments. Unfortunately, this is about all time I can and want to spend on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite happy with the code at this stage.
I'm curious how much faster this is than a naive implementation though. The complexity seems to be the same I think.
Could you give us an indication of that @svenkeidel?
I don't feel like requiring that proper benchmarks be included in this PR. I think that task could be left for someone who wants to make the code even faster than it is now.
Data/HashMap/Base.hs
Outdated
-- | /O(min n m))/ Checks if a bitmap indexed node is a submap of another. | ||
submapBitmapIndexed :: (HashMap k v1 -> HashMap k v2 -> Bool) -> Bitmap -> A.Array (HashMap k v1) -> Bitmap -> A.Array (HashMap k v2) -> Bool | ||
submapBitmapIndexed comp b1 ary1 b2 ary2 = subsetBitmaps && go 0 0 (b1Orb2 .&. negate b1Orb2) | ||
where | ||
go :: Int -> Int -> Bitmap -> Bool | ||
go i j m | ||
| m > b1Orb2 = True | ||
|
||
-- In case a key is both in ary1 and ary2, check ary1[i] <= ary2[j] and | ||
-- increment the indices i and j. | ||
| b1Andb2 .&. m /= 0 = comp (A.index ary1 i) (A.index ary2 j) && | ||
go (i+1) (j+1) (m `unsafeShiftL` 1) | ||
|
||
-- In case a key occurs in ary1, but not ary2, only increment index j. | ||
| b2 .&. m /= 0 = go i (j+1) (m `unsafeShiftL` 1) | ||
|
||
-- In case a key neither occurs in ary1 nor ary2, continue. | ||
| otherwise = go i j (m `unsafeShiftL` 1) | ||
|
||
b1Andb2 = b1 .&. b2 | ||
b1Orb2 = b1 .|. b2 | ||
subsetBitmaps = b1Orb2 == b2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit goes way over my head. Can I rely on you, @treeowl, to review this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps, I copied some of this code from unionArrayBy
:
unordered-containers/Data/HashMap/Base.hs
Lines 1642 to 1657 in 7e8fc9e
let ba = b1 .&. b2 | |
go !i !i1 !i2 !m | |
| m > b' = return () | |
| b' .&. m == 0 = go i i1 i2 (m `unsafeShiftL` 1) | |
| ba .&. m /= 0 = do | |
x1 <- A.indexM ary1 i1 | |
x2 <- A.indexM ary2 i2 | |
A.write mary i $! f x1 x2 | |
go (i+1) (i1+1) (i2+1) (m `unsafeShiftL` 1) | |
| b1 .&. m /= 0 = do | |
A.write mary i =<< A.indexM ary1 i1 | |
go (i+1) (i1+1) (i2 ) (m `unsafeShiftL` 1) | |
| otherwise = do | |
A.write mary i =<< A.indexM ary2 i2 | |
go (i+1) (i1 ) (i2+1) (m `unsafeShiftL` 1) | |
go 0 0 0 (b' .&. negate b') -- XXX: b' must be non-zero |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit goes way over my head. Can I rely on you, @treeowl, to review this?
Mine eyes glazeth over when it comes to bit fiddling. I can try. Mostly, I rely on testing.
Data/HashMap/Base.hs
Outdated
-- | /O(n*m)/ Check if the first array is a subset of the second array. | ||
subsetArray :: Eq k => (v1 -> v2 -> Bool) -> A.Array (Leaf k v1) -> A.Array (Leaf k v2) -> Bool | ||
subsetArray cmpV ary1 ary2 = A.length ary1 <= A.length ary2 && A.all inAry2 ary1 | ||
where | ||
inAry2 (L k1 v1) = lookupInArrayCont (\_ -> False) (\v2 _ -> cmpV v1 v2) k1 ary2 | ||
{-# INLINE inAry2 #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@svenkeidel, @treeowl What did you think about my idea to record the indices of any found elements in ary2
and to skip these when checking the remaining elements of ary1
?
I'm mostly wondering whether that's an idea worth recording – I wouldn't insist that it be included in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to say if this is going to make a big difference performance wise. For each key you still would need to iterate over ary2 anyways. The only thing you would save are a few equality checks of keys. It could be worth working on this optimization, but before we need some benchmarks to verify that we actually improve the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't record indices. But we could thaw
one of the arrays and play mutation games. Say we have
1 2 3 4
and we search for 3
. Then we can mutate the array to
undefined 2 1 4
That is, swap the 3
to the front and replace it with undefined
. Then start the next scan in the second slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@treeowl That's a neat idea! :)
If we don't do this in this PR, I'll record it in an issue.
@sjakobi, about benchmarks. My collaborator Tobias Hombücher is currently working on a static analysis for Scheme that uses this hashmaps and hashsets extensively (https://gitlab.rlp.net/plmz/sturdy/-/tree/PowersetT/scheme). We have a benchmark suit with a few scheme programs (https://gitlab.rlp.net/plmz/sturdy/-/blob/PowersetT/scheme/bench/FixpointBench.hs). Once Tobias gets the analysis working, this would be a great real-world benchmark for the unordered-containers library, because it uses the That being said, a micro benchmark for the |
I added a few benchmarks. There is still some room for improvement.
|
Data/HashMap/Base.hs
Outdated
-- False | ||
isSubmapOf :: (Eq k, Hashable k, Eq v) => HashMap k v -> HashMap k v -> Bool | ||
isSubmapOf = isSubmapOfBy (==) | ||
{-# INLINE isSubmapOf #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be INLINABLE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats wrong with and INLINE
pragma on isSubmapOf
? The function is nice and small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never need this to be inlined at a call site. But it is very valuable to specialize to particular key and value types. Once those types are known, I believe we want GHC to create a specialization that inlines the comparison function into isSubmapOfBy
.
Data/HashMap/Base.hs
Outdated
-- Collision and Full nodes always contain at least two entries. Hence it | ||
-- cannot be a map of a leaf. | ||
go _ (Collision {}) (Leaf {}) = False | ||
go _ (Full {}) (Leaf {}) = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be INLINE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't isSubmapOfBy
a bit large for INLINE
? This would create a lot of code bloat for every call of isSubmapOfBy
. Why not INLINABLE
? Then users can still specialize it within their code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. We want to be sure to inline it into isSubmapOf
; does that happen if this is just marked INLINABLE
(and that is too)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can collapse these three Leaf
cases into one and put it up above, matching the approach we take to Empty
:
go s (Leaf h1 (L k1 v1)) t2 = lookupCont (\_ -> False) (\v2 _ -> comp v1 v2) h1 k1 s t2
go _ _ (Leaf {}) = False
This makes no difference in the compiled code, but the source is shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. We want to be sure to inline it into isSubmapOf; does that happen if this is just marked INLINABLE (and that is too)?
Another option is to mark both functions INLINABLE and then inline the isSubmapOfBy
function at the isSubmapOf
call site with the inline
function.
General question: would it be useful to implement a lookup-by-hash function that returns the |
Yes, this function would allow us to remove a bit of code duplication. Furthermore, we can probably implement the regular |
How do the benchmarks look with the simplifications in place? |
The simplifications only affect negative cases where |
Something is fishy with the benchmarks. When I run only a single benchmark with
When I run multiple benchmarks with
Either garbage collection kicks in, or there is something wrong with the benchmark environment |
Definitely an issue. Could caching play a role too? I don't know enough about these things. |
@svenkeidel You might be running into haskell/criterion#166?! You can try the |
@sjakobi, Nonetheless, I think it is a bad idea to share a single benchmark environment across all benchmarks. It is reallocated for every benchmark group. We should split the environment into smaller environments for each benchmark group to reduce memory pressure. |
Ok, I rebased everything on master, removed the changes to the benchmark suite and squashed everything into one commit on this branch: https://github.com/svenkeidel/unordered-containers/tree/subset-squashed |
@svenkeidel Is it OK if we finish the review in this PR, so the commit to |
The former makes most sense to me. |
I haven't reviewed them specifically, but rewrite rules in the benchmark suite make me very nervous in principle, because that takes the suite further from realistic code. |
@treeowl, don't worry about them, I removed them from the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions about the INLIN[ABL]E
pragmas – mostly because I'm somewhat confused about where to prefer each. I hope @treeowl can give some principled advice.
How do the new definitions stack up against the naive ones in the benchmark now, BTW?
-- >>> fromList [1,2] `isSubsetOf` fromList [1,3] | ||
-- False | ||
isSubsetOf :: (Eq a, Hashable a) => HashSet a -> HashSet a -> Bool | ||
isSubsetOf s1 s2 = H.isSubmapOfBy (\_ _ -> True) (asMap s1) (asMap s2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isSubsetOf s1 s2 = H.isSubmapOfBy (\_ _ -> True) (asMap s1) (asMap s2) | |
isSubsetOf s1 s2 = H.isSubmapOfBy (\_ _ -> True) (asMap s1) (asMap s2) | |
{-# INLINABLE isSubsetOf #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If INLINABLE helps with specialization, it would be good to use here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best thing to do is always to write some sample code and pore over the Core. But that's not a hard requirement when adding new functions.
Oh, if my questions about the pragmas cannot be answered with the current state of the benchmark suite, then never mind. It's fine for me if this is addressed in follow-up patches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Tweaks can happen later.
(I'd suggest a squashed commit here.)
@treeowl Are you done with reviewing this PR? |
I want to make sure someone's verified that this doesn't hurt |
@svenkeidel What did your benchmark spreadsheet say about this? I unfortunately can't find it in this thread anymore. |
Please squash to one or two logical commits. |
@treeowl, see #282 (comment) |
I'll let you and @sjakobi work out the merge mechanics. |
You're welcome, @svenkeidel! And thanks for the feedback on the test and benchmark setup! :) Any further help with that would be very much appreciated. Would you mind if I'd prepare a release now, or do you see anything that needs to be done before that? |
Not at all. Sounds good 👍 |
Dear Johan, Dear David,
First of all, thanks for putting all this hard work into this library. We use it heavily in Sturdy to implement static code analyses in Haskell. To this end, we need a fast subset operation on hashmaps and hashsets. Unfortunately, while a naive implementation is easy to understand, it is rather slow because it compares keys with different hashes:
Instead, I worked on an implementation that only compares keys with the same hash. I had to extend the
lookupCont
function with the number of the subkey to avoid code duplication. I tested this implementation with a few QuickCheck properties. What do you think?I would be delighted if you would add this function to the library.
Regards,
Sven