cats.kernel.Hash port for Scala CHAMP HashSet #4185
This is very exciting!
I personally don't see a strong reason for cats-collections to exist in a separate repo, but it does. I imagine this will see much more use if added here, but then again, use will probably be small no matter what.
It is very cool. I will give a more detailed review.
I'm strongly of the opinion that cats-collections should remain separate at least until stabilized. The bar is very high for breaking Cats. Too high for a library where things are just getting bootstrapped. I'm also generally in favor of it remaining separate on a permanent basis, but the reasoning is less objectively strong there.
Since this is a port, does this contain code adapted from the Scala standard library? Cats is MIT licensed whereas the Scala standard library is Apache 2 licensed. If you're adapting code from the standard library, you probably need to change your license to Apache 2 (or separately license some files as Apache 2? But that seems confusing).
@smarter Good question. I actually referred mostly to @msteindorfer's initial PR to the collections-strawman, but that is also Apache licensed.
I did not start with any existing source file from the standard library and modify it but rather used Michael's work and the Scala standard library code as inspiration and reference. In many places that essentially means writing exactly the same code, so I don't know if that falls under the "modified files" case but I would certainly say it is a derivative work. Would it be sufficient to include the Apache license at the root of the cats repo, and add attribution notices to those files based on the scala/scala NOTICE file in addition to the (auto-generated) cats headers in those files? To be honest it has come as a surprise to me that cats is MIT licensed as the bulk of the ecosystem is on Apache 2.0!
You'd also have to include it in Line 321 in fe40bc2
Just following up on this discussion, I realized that Cats has some compatibility-related code in the tests which is derived from the Scala library. One such piece of code that I am personally aware of: cats/tests/shared/src/test/scala-2.12/cats/tests/compat/SeqOps.scala Lines 30 to 46 in fe40bc2
@smarter @DavidGregory084 – wdyt?
Yes, you are right @satorg. I don't think this is a new problem in cats: see e.g. the file HashCompat in the main branch, which already contains code from the Scala standard library. EDIT: btw I have raised a new issue for this, #4189
I have made many things private and final. I'm not exactly sure how such modifiers are typically used in cats - is usage of
…gations as a derived work
@@ -415,6 +415,12 @@ object arbitrary extends ArbitraryInstances0 with ScalaVersionSpecific.Arbitrary

  implicit val catsLawsArbitraryForMiniInt: Arbitrary[MiniInt] =
    Arbitrary(Gen.oneOf(MiniInt.allValues))

  implicit def catsLawsArbitraryForHashSet[A](implicit A: Arbitrary[A], hash: Hash[A]): Arbitrary[HashSet[A]] =
    Arbitrary(getArbitrary[List[A]].map(HashSet.fromSeq(_)(hash)))
I would rather see a different way to build a hashset. My opinion is that the generator should exercise all the fundamental operations:
- empty
- fromSeq
- recurse + a
- recurse - a
- recurse0 | recurse1
- recurse0 & recurse1
- recurse0 -- recurse1

Something like this (but you will need to use Gen.frequency to make sure the probability that we branch into two is low enough that this doesn't go on forever).
The idea here is to simulate all the paths of creating a HashSet to make sure there aren't some buggy paths.
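A minimal sketch of the kind of recursive generator suggested above, under some loud assumptions: it uses the standard library's immutable `Set` as a stand-in for the PR's `HashSet`, the names `SetGen` and `genSet` are hypothetical, and the weights passed to `Gen.frequency` are illustrative rather than tuned. With these weights each step recurses once with probability 0.4 and twice with probability 0.1, so the expected number of recursive calls per step is 0.6, which is below 1 and keeps the generated expression tree finite.

```scala
import org.scalacheck.Gen

object SetGen {
  // Hypothetical generator; Set stands in for the PR's HashSet.
  def genSet[A](genA: Gen[A]): Gen[Set[A]] = {
    // Gen.lzy takes its argument by name, so the recursion is only
    // unfolded when a value is actually generated.
    val recurse: Gen[Set[A]] = Gen.lzy(genSet(genA))

    // Non-recursive cases: empty and fromSeq.
    val leaf: Gen[Set[A]] = Gen.oneOf(
      Gen.const(Set.empty[A]),
      Gen.listOf(genA).map(_.toSet[A])
    )

    // One recursive call: recurse + a, recurse - a.
    val unary: Gen[Set[A]] = Gen.oneOf(
      recurse.flatMap(s => genA.map(a => s + a)),
      recurse.flatMap(s => genA.map(a => s - a))
    )

    // Two recursive calls: union, intersection, difference.
    val binary: Gen[Set[A]] = Gen.oneOf(
      recurse.flatMap(l => recurse.map(r => l | r)),
      recurse.flatMap(l => recurse.map(r => l & r)),
      recurse.flatMap(l => recurse.map(r => l -- r))
    )

    // Assumed weights: branching into two is the rarest case.
    Gen.frequency(5 -> leaf, 4 -> unary, 1 -> binary)
  }
}
```

For example, `SetGen.genSet(Gen.choose(0, 100)).sample` draws one set built through a random mix of these construction paths. An `Arbitrary` instance could wrap such a generator in place of the `fromSeq`-only version.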
OK, as per discussion on #4193 I have reopened this as typelevel/cats-collections#533
This PR implements an immutable hash set using the CHAMP encoding and cats.kernel.Hash for hashing of elements, as per #4147.
I've written up some basics about CHAMP here in case anyone wants to take a look.
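For readers unfamiliar with CHAMP, the following is a rough sketch of the bitmap indexing that CHAMP-style tries generally rely on. It is not taken from this PR: the object name `ChampIndexing` and its members are hypothetical, and it assumes 32-bit hashes consumed 5 bits per trie level. Each level picks one of 32 slots from its slice of the hash, and a population count over the lower bits of a node's bitmap gives the position of that slot in the node's compact array.

```scala
object ChampIndexing {
  // Assumption: 5 hash bits per level, so up to 32 slots per node.
  val BitPartitionSize: Int = 5
  val BitPartitionMask: Int = (1 << BitPartitionSize) - 1

  // The 5-bit slice of the hash used at a given depth.
  // Only meaningful for depths 0 to 6 with a 32-bit hash.
  def maskFrom(hash: Int, depth: Int): Int =
    (hash >>> (depth * BitPartitionSize)) & BitPartitionMask

  // A single set bit marking the slot selected by the mask.
  def bitPosFrom(mask: Int): Int = 1 << mask

  // Position in the compact array: the number of occupied
  // slots that come before the selected one.
  def indexFrom(bitmap: Int, bitPos: Int): Int =
    Integer.bitCount(bitmap & (bitPos - 1))
}
```

A node would typically keep one such bitmap for inline elements and another for child nodes, and use `indexFrom` against whichever bitmap has the selected bit set.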
Big questions remain:
- Should the Hash constraint appear at the method level or at the constructor level in HashSet?

Contributed on behalf of the @opencastsoftware open source team. 👋
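To make the open question about the Hash constraint concrete, here is a small sketch of the two placements. Everything in it is hypothetical: `Hash` is a local stand-in for `cats.kernel.Hash`, and `CtorSet` and `MethodSet` are toy list-backed sets, not the CHAMP implementation from this PR.

```scala
// Local stand-in for cats.kernel.Hash (assumption, not the real trait).
trait Hash[A] {
  def hash(a: A): Int
  def eqv(x: A, y: A): Boolean
}

// Option 1: the constraint is captured once, at construction.
final class CtorSet[A] private (elems: List[A])(implicit hash: Hash[A]) {
  def contains(a: A): Boolean = elems.exists(hash.eqv(_, a))
  def add(a: A): CtorSet[A] =
    if (contains(a)) this else new CtorSet[A](a :: elems)
  def size: Int = elems.length
}

object CtorSet {
  def empty[A](implicit hash: Hash[A]): CtorSet[A] = new CtorSet[A](Nil)
}

// Option 2: the constraint is demanded by each method that needs it.
final class MethodSet[A] private (elems: List[A]) {
  def contains(a: A)(implicit hash: Hash[A]): Boolean =
    elems.exists(hash.eqv(_, a))
  def add(a: A)(implicit hash: Hash[A]): MethodSet[A] =
    if (contains(a)) this else new MethodSet[A](a :: elems)
  def size: Int = elems.length
}

object MethodSet {
  def empty[A]: MethodSet[A] = new MethodSet[A](Nil)
}
```

In the first form every operation on a given set uses the same instance, at the cost of storing it in the set. In the second form the set carries no instance, but callers could in principle supply different instances to different calls on the same set.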