Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support the creation of Sets from Maps #21

Open
NthPortal opened this issue Oct 24, 2020 · 13 comments · May be fixed by #35
Open

Support the creation of Sets from Maps #21

NthPortal opened this issue Oct 24, 2020 · 13 comments · May be fixed by #35
Labels
enhancement New feature or request library:collections status:pending This enhancement request lacks consensus on whether or not to include it

Comments

@NthPortal
Copy link
Contributor

NthPortal commented Oct 24, 2020

Add implementations for Sets backed by Maps (and corresponding factories).

Add fluent fromMap APIs to Set companions.

Motivation:

  • It should be first-class
  • Java has it, why don't we?
@NthPortal NthPortal added enhancement New feature or request library:collections labels Oct 24, 2020
@joshlemer
Copy link

Could we get a bit more motivation for this addition? What use-case isn't covered by myMap.keySet?

@NthPortal
Copy link
Contributor Author

most of the keySet implementations will make a full copy of the Set if you modify it in any way, and will also just use the default set implementation for that copy. With this, it will be backed by the Map implementation, and will not make a full copy if you modify it

@joshlemer
Copy link

immutable.HashMap uses structural sharing for inserts as well as removals https://github.com/scala/scala/blob/2.13.x/src/library/scala/collection/immutable/HashMap.scala#L70

Maybe we can port those optimizations to immutable.Map and the other abstract Map subclasses (SeqMap, SortedMap) without having to introduce new collection types

@NthPortal
Copy link
Contributor Author

what about mutable Maps?

@NthPortal
Copy link
Contributor Author

Also, I don't think it's a reasonable burden to put on Map implementors that their keySet implementation must act as a full Set implementation.

Two of the main motivators for this are to provide implementations for the following types (that do not currently exist) based on currently existing Map implementations:

  • scala.collection.concurrent.Set
  • scala.collection.immutable.SeqSet

@joshlemer
Copy link

joshlemer commented Oct 26, 2020

I don't think there need be any burden put on Map implementers. In immutable.Map just add the optimizations (might not compile but you get the jist):

// Map.scala

self => 

  override def keySet: Set[K] = new ImmutableKeySet

  /** The implementation class of the set returned by `keySet` */
  protected class ImmutableKeySet extends AbstractSet[K] with GenKeySet with DefaultSerializable {
    def incl(elem: K): Set[K] = if (this(elem)) this else (self + (elem -> null)).keySet
    def excl(elem: K): Set[K] = if (this(elem)) (self - elem).keySet else this
  }

Then in immutable.SortedMap you'd do almost the same thing

// SortedMap.scala
self => 
  override def keySet: SortedSet[K] = new ImmutableKeySortedSet

  /** The implementation class of the set returned by `keySet` */
  protected class ImmutableKeySortedSet extends AbstractSet[K] with SortedSet[K] with GenKeySet with GenKeySortedSet {
    def rangeImpl(from: Option[K], until: Option[K]): SortedSet[K] = {
      val map = self.rangeImpl(from, until)
      new map.ImmutableKeySortedSet
    }
    def incl(elem: K): SortedSet[K] = if (this(elem)) this else (self + (elem -> null)).keySet
    def excl(elem: K): SortedSet[K] = if (this(elem)) (self - elem).keySet else this
  }

SeqMap already gets the optimization inherited from Map, and then hypothetically if we ever added SeqSet then you'd just do basically the same thing with that one:

// SeqMap.scala

self => 
  override def keySet: SeqSet[K] = new ImmutableKeySeqSet

  protected class ImmutableKeySeqSet extends AbstractSet[K] with SortedSet[K] with GenKeySet with GenKeySeqSet {
    def incl(elem: K): SeqSet[K] = if (this(elem)) this else (self + (elem -> null)).keySet
    def excl(elem: K): SeqSet[K] = if (this(elem)) (self - elem).keySet else this
  }

Any new implementation of Map, SeqMap or SortedMap with get the optimization passed down and wouldn't need to define it again unless they wanted to add their own special semantics around the keySet

@NthPortal NthPortal changed the title Allow the creation of Sets from Maps Support the creation of Sets from Maps Oct 28, 2020
@NthPortal
Copy link
Contributor Author

There are still several things you're missing:

  • transformations (map, filter, etc.) should return a Set backed by the same type of Map, rather than returning the default Set implementation. and if your solution to that is to override all those methods in ImmutableKeySet as well, you have successfully written the majority of a Set-from-Map implementation.
  • keySet does not provide a Factory
  • not all Maps permit null values (though you could probably use () or something for immutable Maps)
  • this doesn't work for mutable Maps, where adding elements modifies the Map on which keySet was called, and where keys are expected to be mapped to meaningful and non-null values

@NthPortal NthPortal linked a pull request Nov 2, 2020 that will close this issue
3 tasks
@joshlemer
Copy link

joshlemer commented Nov 3, 2020

I'm not seeing the practical use-case of a SetFromMap whose map/filter/etc operations return the same type of SetFromMap, rather than the generic Set, or why I would want to use a keySet-producing factory. This still seems to me like quite niche requirements, and would advise against adding a whole new collection type for such a thing (in the stdlib at least, for now, until the usecase is more proven).

@NthPortal
Copy link
Contributor Author

NthPortal commented Nov 3, 2020

I'm not seeing the practical use-case of a SetFromMap whose map/filter/etc operations return the same type of SetFromMap

the same use case as all other collections that return the same implementation for transformations?

This still seems to me like quite niche requirements

apparently common enough for Java to have it (Collections.newSetFromMap)

would advise against adding a whole new collection type

the implementation would be private; just the functionality would be public

@NthPortal NthPortal added the status:pending This enhancement request lacks consensus on whether or not to include it label Nov 21, 2020
@NthPortal
Copy link
Contributor Author

I'm curious to get others' thoughts on this (cc @julienrf @Ichoran)

@NthPortal NthPortal mentioned this issue Dec 3, 2020
@julienrf
Copy link
Contributor

julienrf commented Dec 3, 2020

@NthPortal I must say that I’m not really convinced by the use case. Maybe I misunderstood. Could we improve the implementation of keySet instead?

@NthPortal
Copy link
Contributor Author

I plan to reply to this, but I don't have the capacity right now

@NthPortal
Copy link
Contributor Author

NthPortal commented Aug 24, 2021

sorry it's taken me so long to reply.

There are basically two questions:

  1. do we support the creation of sets from maps?
  2. if so, what's the API for that?

I opine that the answer to (1) is "yes".

For a variety of reasons, I do not believe keySet is a good API for this task, and in fact cannot (consistently) be an API for it.

mutable.Map#keySet returns a collection.Set, because returning a mutable.Set would allow you to add keys to the map without adding associated values, which would fundamentally break the coherence of the map. thus, keySet cannot be the API for creating a set backed by a map for mutable.Sets. If we wanted to add such an API where a key set would be able to have values added without breaking the coherency of the underlying map, we would need an API similar to Java's ConcurrentHashMap#keySet(V) which provides a default value to associate with added keys.

Additionally, I just don't think that keySet is the correct API for creating a set backed by a map. The purpose of keySet is to return the set of keys in the map; having it also be the API for creating sets backed by maps would be giving it a "double-duty" of sorts, which I don't think is great design in general. On top of this, you should not have to create an instance of the backing map yourself to create the set; if you look for example at Java's ConcurrentHashMap, it has a static newKeySet method which does not require creating an instance of the map yourself.

I also don't think keySet a particularly discoverable API for creating a set backed by a map—it has the wrong name. There's nothing about keySet that says that it can't just be a copy-on-write set. Java actually has an API for creating sets backed by maps (though not with more specific types than java.util.Set, which I believe is why ConcurrentHashMap.newKeySet exists): Collections.newSetFromMap. This is a discoverable API with an intuitive name describing its purpose. In Scala, we could do slightly better even with Set.fromMap and similar methods for more specific types of set (e.g. SortedSet.fromMap). Such an API would, in its very name, commit to giving you a set that's always backed by the given map type when possible. keySet makes no such commitment.


all that said, I'm happy to move my implementation to collection-contrib if that's deemed a more appropriate place for it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request library:collections status:pending This enhancement request lacks consensus on whether or not to include it
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants