Idea: Add sortedTo (and friends) to Iterator #19
Comments
@joshlemer might have an opinion? |
I think that having a fully lazy It would also come in handy with more than just terminal operations, if you could have a lazy sort:
case class Employee(name: String, salary: Double)
val employees: Iterable[Employee] = ???
val top3 = employees.{view, iterator}.sortBy(_.salary).take(3).toSeq |
the lazy sorting would open the door to optimizing the above:
self =>
def take(n: Int): Iterator[A] = {
  if (self.knownSize != -1 && self.knownSize < n) {
    // nothing to be gained from heapifying. Just sort it
    return Iterator.empty.concat(self.toArray.sortInPlace().iterator)
  }
  new Iterator[A] {
    // MinMaxHeap is assumed here; it is not a standard library class
    var heap: MinMaxHeap[A] = null
    def hasNext: Boolean = (heap != null && heap.nonEmpty) || self.hasNext
    def next(): A = {
      if (heap == null) {
        // must force `self` now, keeping only the n smallest elements
        heap = new MinMaxHeap[A]
        self.foreach { a =>
          if (heap.size < n) heap.addOne(a)
          else if (a < heap.peekLast()) {
            heap.popLast()
            heap.addOne(a)
          }
        }
      }
      heap.popMin()
    }
  }
} |
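For reference, the bounded-heap idea in the snippet above can be approximated with what the standard library already ships. This is only a sketch: it substitutes `mutable.PriorityQueue` (a max-heap) for the `MinMaxHeap` assumed above, and `TopN.smallest` is a made-up name for illustration, not an existing or proposed API.

```scala
import scala.collection.mutable

object TopN {
  // Returns the n smallest elements of `it` in ascending order.
  // At most n elements are retained at any time, so the cost is
  // O(N log n) instead of the O(N log N) of a full sort.
  def smallest[A](it: Iterator[A], n: Int)(implicit ord: Ordering[A]): List[A] = {
    if (n <= 0) return Nil
    // PriorityQueue is a max-heap with respect to `ord`,
    // so `head` is the largest element currently retained.
    val heap = mutable.PriorityQueue.empty[A]
    it.foreach { a =>
      if (heap.size < n) heap.enqueue(a)
      else if (ord.lt(a, heap.head)) {
        heap.dequeue() // evict the current largest
        heap.enqueue(a)
      }
    }
    // dequeue() yields largest first; prepending gives ascending order.
    var acc = List.empty[A]
    while (heap.nonEmpty) acc = heap.dequeue() :: acc
    acc
  }
}
```

Unlike the snippet above, this version is strict: it consumes the iterator as soon as it is called.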
regardless of whether or not we have something for
while a min-max heap could drastically improve |
What about having I may be mistaken, but I do not think that a It seems (but I may be mistaken) that it would be easier and better to implement a lazy sort in the context of a consumption of the Iterator. |
Just my opinion but I think it's lighter to just have
class SortedIterator[A](underlying: Iterator[A]) {
  // optionally, we can optimize these methods, which one cannot do if you take the `sortedTo` approach
  override def take(n: Int): Iterator[A] = ???
  override def drop(n: Int): Iterator[A] = ???
  override def takeRight(n: Int): Iterator[A] = ???
  override def dropRight(n: Int): Iterator[A] = ???
  override def slice(i: Int, j: Int): Iterator[A] = ???
  // whatever optimizations you would have put into sortedTo, just put in here
  override def to[C](f: Factory[A, C]): C = ???
}
This is no less simple than making a new sortedTo method. It's much less intrusive than duplicating all of the
We could have a heuristic like "only kick in the heap once we confirm that the iterator is larger than (say, for example) 5 or 10 times
there isn't a |
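A minimal sketch of what such a `SortedIterator` wrapper might look like, under a few assumptions: it takes an implicit `Ordering`, it only specialises `take`, and it uses `mutable.PriorityQueue` as the bounded heap. None of this is an existing library class, and the heuristic mentioned above is not modelled.

```scala
import scala.collection.mutable

// Hypothetical wrapper: nothing is pulled from `underlying` until the
// first hasNext/next call, at which point everything is buffered and sorted.
final class SortedIterator[A](underlying: Iterator[A])(implicit ord: Ordering[A])
    extends Iterator[A] {

  private var forced = false

  private lazy val sorted: Iterator[A] = {
    forced = true
    mutable.ArrayBuffer.from(underlying).sortInPlace().iterator
  }

  def hasNext: Boolean = sorted.hasNext
  def next(): A = sorted.next()

  // Specialised take: if nothing has been forced yet, keep only the
  // n smallest elements in a bounded max-heap instead of sorting everything.
  override def take(n: Int): Iterator[A] =
    if (forced || n <= 0) super.take(n)
    else new Iterator[A] {
      private lazy val smallest: Iterator[A] = {
        val heap = mutable.PriorityQueue.empty[A]
        underlying.foreach { a =>
          if (heap.size < n) heap.enqueue(a)
          else if (ord.lt(a, heap.head)) {
            heap.dequeue()
            heap.enqueue(a)
          }
        }
        // dequeue() yields largest first; prepending gives ascending order.
        var acc = List.empty[A]
        while (heap.nonEmpty) acc = heap.dequeue() :: acc
        acc.iterator
      }
      def hasNext: Boolean = smallest.hasNext
      def next(): A = smallest.next()
    }
}
```

As with any `Iterator`, the original instance should not be reused after calling `take` on it.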
I'm pretty sure they just meant variants for |
found a previous overlapping discussion: scala/bug#11711 |
@SethTisue got me, I am realizing I've been beating this drum for ages 🤣 |
I am for adding lazy sort for view and iterator |
there's only so lazily you can sort. at the end of the day, you need to at the very least iterate over all elements |
If I understand correctly, the value of such an operation would be to be able to sort the X first elements in O(N) rather than in O(N*log(N)), right? (when X is small and in particular X << N) |
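To put rough numbers on that, assuming the bounded-heap approach discussed earlier in the thread (a heap that never holds more than $X$ elements):

$$
T_{\text{full sort}}(N) = O(N \log N), \qquad
T_{\text{heap}}(N, X) = O(N \log X) + O(X \log X) = O(N \log X) \quad (X \le N)
$$

The first term of $T_{\text{heap}}$ is the pass over all $N$ elements with at most one heap update each, and the second is draining the heap in order. For a fixed small $X$ this is $O(N)$.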
The benefits would be:
View.fill(1000000)(Random.nextInt()).sorted.take(5).toList
this goes for take and friends (slice, takeRight, drop, dropRight, ...)
val v: View[Int] = ???
View.fill(10)(1).concat(v.sorted).take(5).toList
^ Under the strict approach, v would have been forced, but it turned out we never even needed it at all. |
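The second point can be checked with existing building blocks. In this sketch, `lazySorted` is a hypothetical stand-in for a lazy `sorted` on `View`, built on `View.fromIteratorProvider`, and the `forced` flag is only there to observe whether `v` was ever traversed.

```scala
import scala.collection.View

object LazySortDemo {
  // Hypothetical helper: a view whose elements are sorted only when
  // an iterator over it is actually requested.
  def lazySorted[A](v: View[A])(implicit ord: Ordering[A]): View[A] =
    View.fromIteratorProvider(() => v.toSeq.sorted.iterator)

  // Returns the result of the pipeline and whether `v` was traversed.
  def run(): (List[Int], Boolean) = {
    var forced = false
    val v: View[Int] = View.fromIteratorProvider { () =>
      forced = true
      Iterator(3, 1, 2)
    }
    val result = View.fill(10)(1).concat(lazySorted(v)).take(5).toList
    (result, forced)
  }
}
```

Here `take(5)` is satisfied by the ten leading ones, so the suffix view is never asked for an iterator.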
Either way it works for me. It is not a lazy operation but I think it just conforms to the new way of collection conversion, which is using |
currently, |
I feel like having a sortedTo(factory) (as well as all the other sorting variants) on Iterator would be a good addition.
It would be good to discuss if it can produce any performance advantage over iterator.to(factory).sorted or if it can be considered useful just as an API improvement to avoid two calls.
It is also worth discussing if this would be useful for all collections in general; for example, list.sortedTo(ArraySeq).
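To make the proposal concrete, here is one way `sortedTo` could be prototyped outside the library as an extension method. This is a sketch under assumptions: the name and signature follow the issue text, while the implementation (buffer into an `ArrayBuffer`, sort in place, then build the target collection once) is only one possible choice. Whether it beats `iterator.to(factory).sorted` would need measuring.

```scala
import scala.collection.{Factory, mutable}

object SortedToSyntax {
  // Hypothetical extension; `sortedTo` is not part of the standard library.
  implicit class SortedToOps[A](private val it: Iterator[A]) {
    def sortedTo[C](factory: Factory[A, C])(implicit ord: Ordering[A]): C = {
      // Buffer once, sort the buffer in place, then build the target
      // collection directly from the sorted buffer.
      val buf = mutable.ArrayBuffer.from(it)
      buf.sortInPlace()
      factory.fromSpecific(buf)
    }
  }
}
```

With `import SortedToSyntax._` in scope, `Iterator(3, 1, 2).sortedTo(Vector)` produces a sorted `Vector`. Collection companions such as `Vector` are accepted the same way they are for `to`.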