Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Don’t force views elements prematurely. #343

Merged
merged 1 commit into from
Jan 22, 2018

Conversation

julienrf
Copy link
Contributor

@julienrf julienrf commented Jan 16, 2018

IndexedView[A] was previously trying to return an IndexedView[A] on
transformation operations such as filter or take. Doing so sometimes
required to evaluate the views elements to build an indexed sequence
from which to get an IndexedView (see the last case):

 override protected[this] def fromSpecificIterable(it: Iterable[A]): IndexedView[A] =
   it match {
     case v: IndexedView[A] => v
     case i: IndexedSeq[A] => i.view
     case _ => it.to(IndexedSeq).view
   }

Now IndexedView[A] returns only a View[A] on such operations.

Another solution would have been to have IndexedView based versions of all transformation operations that are View based (i.e. IndexedView.Filter, IndexedView.Map). The drawback of that solution would be that we would duplicate all transformation operations (we would have View.Filter and IndexedView.Filter, etc.), but the advantage would be that the fact that a collection is indexed would be preserved across transformations applied to its view.


I discovered this issue while benchmarking the following code:

      xs.view
        .filter(x => x % 2L == 0L)
        .map(x => x * x)
        .sum

When xs was an IndexedSeq, then it’s filter operation returned fromSpecificIterable(View.Filter(…)), which was converting the View.Filter(…) into a strict IndexedSeq to take its view.

As a consequence, that code was (up to 2×) slower than the same code using strict transformation operations only (in red on the following chart):

sumofsquareseven

With this PR, the version that uses views rather than strict transformation is slightly faster but still a bit slower than the strict version:

sumofsquareseven-after

I’ve run a profiler and noticed that most of the time is spent on the hasNext operation. It is not clear to me why the view based version is still slower because in the strict version we also iterate on the same transformations and call the same hasNext operations…

IndexedView[A] was previously trying to return an IndexedView[A] on
transformation operations such as filter or take. Doing so sometimes
required to evaluate a views elements to build an indexed sequence
from which to get an IndexedView.

Now IndexedView[A] returns only a View[A] on such operations.
@julienrf julienrf requested review from szeiger, Ichoran and lrytz January 16, 2018 10:46
@Ichoran
Copy link
Contributor

Ichoran commented Jan 16, 2018

Doesn't this change clobber the performance of xs.map(f).drop(1000) because it can't jump to the 1000th element any more?

@julienrf
Copy link
Contributor Author

Yes.

@Ichoran
Copy link
Contributor

Ichoran commented Jan 16, 2018

Well, maybe it doesn't because map is overridden, but we should be careful to think about which operations are going to be mysteriously slow because views are used. Often we don't think very much about chained operations, but in the case of views there isn't much reason to use them explicitly unless they're going to be chained. I don't have time now to think through them all, though.

@julienrf
Copy link
Contributor Author

Well to be more precise: xs.view.map(f).drop(1000) (given xs: IndexedSeq[?]) will not take advantage of xs being an IndexedSeq to perform the drop part. I agree that this is a severe limitation. I can try to fix it by following the strategy mentioned in the PR description (override all view operations inherited from View such that they all return an IndexedView now).

@Ichoran
Copy link
Contributor

Ichoran commented Jan 16, 2018

I think there's a big difference between non-indexable and indexable operations. filter inherently can't be indexed lazily because you need to build up an intermediate data structure that is almost as expensive as just eagerly doing the filter. Same with flatMap and collect. But span and drop and map and splitAt and so on have fast ways to compute the index, so are view-friendly. Even if there isn't anything in the inheritance hierarchy that distinguishes these two types of operations, it would be good for there to be an override for all the relevant ones in IndexedView. If it's overridden, it doesn't really matter whether the type drops back to View or not; you'll still get the better performance from dynamic dispatch.

@julienrf
Copy link
Contributor Author

oh sorry we were talking about map but I was thinking of filter. So, of course xs.view.map(f).drop(1000) will jump to the 1000th element.

Copy link
Contributor

@szeiger szeiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct. I think this was the intention all the time. We already have the overrides of IndexedView methods to return an IndexedView where appropriate.

@@ -294,19 +294,7 @@ trait ArrayLike[+A] extends Any {
}

/** View defined in terms of indexing a range */
trait IndexedView[+A] extends View[A] with ArrayLike[A] with SeqOps[A, View, IndexedView[A]] { self =>

final override def toSeq: immutable.Seq[A] = to(immutable.IndexedSeq)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider making this the default for Iterable.toSeq and build something like an ArrayBuffer. If you care about linear characteristics for consing / unconsing you have to request an appropriate type, otherwise you get the most efficient representation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most efficient representation for what?

Building a List is faster than building a Vector, IIRC.

@julienrf julienrf merged commit 5f39e81 into scala:master Jan 22, 2018
@julienrf julienrf deleted the indexed-view branch January 22, 2018 08:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants