Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Set#sorted to make a sorted Seq without unnecessary copies #11711

Open
smarter opened this issue Aug 26, 2019 · 9 comments
Open

Add Set#sorted to make a sorted Seq without unnecessary copies #11711

smarter opened this issue Aug 26, 2019 · 9 comments

Comments

@smarter
Copy link
Member

smarter commented Aug 26, 2019

Right now it seems that to get a sorted Seq out of a Set, one has to write .toSeq.sorted which creates an unnecessary intermediate collection.

@SethTisue
Copy link
Member

@joshlemer wdyt?

@joshlemer
Copy link

joshlemer commented Sep 3, 2019

I think having a method sorted directly on Set seems like a bit of a special-case (why not also on Maps, MultiSets, MultiMaps, Views?) and also might look as if set.sorted should return a SortedSet (and likewise, Map -> SortedMap, MultiSet -> SortedMultiSet, MultiMap -> SortedMultiMap). I think it'd be nicer if we could solve this with a nice implementation of set.view.sorted.toSeq.

in the hypothetical class View.Sorted[+A] we could override toSeq with something like...

object View {
  class Sorted[+A](it: SomeIterableOps[A])(o: Ordering[A]) {
    override def toSeq: Seq[A] = 
      ArraySeq.unsafeWrapArray(it.toArray.sortInPlace.array)
  }
}

@smarter
Copy link
Member Author

smarter commented Sep 4, 2019

@texasbruce has a different proposal which might also work for this: #11725

@joshlemer
Copy link

I think going through Views is more versatile and less heavy way to deal with this.

For one thing, we already have SortedMultiSet (maybe one day SortedBag) in ScalaCollectionContrib, and as mentioned in the closed issue, SortedSeq doesn't really behave like a Seq (what should SortedSeq(1,2,3,4,5).updated(2, 20) do?).

Second, inventing a new collection type doesn't help when you want to go from a Set to a sorted List, Vector, ArrayBuffer, Array, etc. The view approach does though:

val v = Set(1,2,3).view.sorted
v.to(Vector)
v.to(ArrayDeque)
...

Last, by making sorted and take completely we will get orders-of-magnitude speedup for things like "take the top 10 items" (I confirmed the orders-of-magnitude speedup with an implementation involving use of google-guava's min-max heap) which will not be the case for SortedSeq's.

(1 to 10000000).to(Set).view.sorted.take(10).toSeq

@neontorrent
Copy link

neontorrent commented Sep 4, 2019

As a subclass of Seq, it makes sense to have a more restrictive behavior than the super class, but I don't think it makes it a different type of thing. We can definitely discuss more on the "unintuitive" behaviors that may imply.

Using view is a good solution for Sorting an existing collection, but if we have a collection that is consistently adding/removing elements, it seems a little tedious to create a view on that on every +/- operation, and realizing it to read from it.


Should we have an updated method on Seq in the first place? Seq is unindexed collection and why do we have a method that takes an index and update the element? Shouldn't we use IndexedSeq in that case?

@joshlemer
Copy link

@texasbruce Have you considered using SortedMultiSet? I would be interested to hear how that collection differs from the SortedSeq.

@neontorrent
Copy link

neontorrent commented Sep 4, 2019

@joshlemer It could work in my use case. Just concern about the performance. Do we have documentation of the big O for the collections in the scala-collection-contrib module?

Also is there any plan to merge into the core library?

@joshlemer
Copy link

joshlemer commented Sep 5, 2019

I don't know of anywhere where the performance characteristics of the collections-contrib data structures are documented, that would be a great issue to create 😉 .

But basically, under the hood the SortedMultiSet[A] will just be a counted, sorted set: TreeMap[A, Int]. So inserting/removing amounts to updating a key in the map, so roughly log(n). Traversal will be n log(n) Sorry, since it is a counted set, it is proportional to the number of distinct elements in the set, k, so log(k) for insertion/removal, and k log(k) + n , respectively

@texasbruce I am not aware of any plan to merge these data structures into core.

@NthPortal
Copy link

another possibility is to add a sortedTo (or toSorted) method that takes a Seq (or arguably also a SeqSet that doesn't exist yet, or a SeqMap? idk)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants