Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Primitive array types in ArrayOps / ArrayBuilder / ImmutableArray #278

Merged
merged 1 commit into from
Jan 8, 2018

Conversation

szeiger
Copy link
Contributor

@szeiger szeiger commented Oct 27, 2017

This is similar to the old implementation. Primitive array types are not
polymorphic on the JVM so we need to special-case them in order to be
able to efficiently wrap existing arrays as collections.

@szeiger szeiger force-pushed the wip/specialized-arrays branch from 0378cb4 to 8a711b7 Compare October 27, 2017 17:28
@szeiger
Copy link
Contributor Author

szeiger commented Oct 27, 2017

The test class doesn't compile on Dotty anymore with these changes. The error message is not very helpful:

[info] Compiling 1 Scala source to /Users/szeiger/code/scala/collection-strawman/collections/.jvm/target/scala-0.4/test-classes...
[error]
[error] Type argument strawman.collection.immutable.IndexedSeq has not the same kind as its bound  <: [_$61] => scala.Any
[error] one error found
[error] (collectionsJVM/test:compileIncremental) Compilation failed

// contract all empty ones must be equal, so discriminating based on the reference
// equality of an empty array should not come up) but we may as well be
// conservative since wrapRefArray contributes most of the unnecessary allocations.
def make[T](x: AnyRef): ArrayView[T] = (x match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the type of x AnyRef rather than Array[_] (for instance)? Do we want this method to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently done with AnyRef but it should work with Array[_] (which also erases to Object). Since this is the only way of creating an ArrayView that takes primitive arrays into account, it needs to be public.

def elemTag = ClassTag.Unit
def length: Int = array.length
def apply(index: Int): Unit = array(index)
override def hashCode = arrayViewHash(array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to override ArrayView’s hashcode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make use of specialization

case x: Array[Unit] => new ofUnit(x)
}).asInstanceOf[ImmutableArray[T]]

final class ofRef[T <: AnyRef](val array: Array[T]) extends ImmutableArray[T] with Serializable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should follow the well established naming convention where class names begin with an upper case letter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're public and copied directly from the current API. I wouldn't want to do any unnecessary breaking changes here.

while (newsize < size) newsize *= 2
resize(newsize)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an idea of the loss of performance we would get if we defined these methods in ArrayBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't benchmarked it but I wouldn't expect any performance difference. ensureSize and sizeHint don't touch the actual array. They should be OK to pull up.

*
* @tparam T type of elements for the array builder, subtype of `AnyRef` with a `ClassTag` context bound.
*/
final class ofRef[T <: AnyRef : ClassTag] extends ArrayBuilder[T] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make those private? Scala.js does not like them. We have an entirely different implementation of ArrayBuilder in Scala.js. For example see https://github.com/scala-js/scala-js/blob/7ea476aa62349f681bb2dbba16b7ffa6470bd786/scalalib/overrides-2.13/scala/collection/mutable/ArrayBuilder.scala#L32-L73

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These types are exposed as part of the API so that you can use them directly with unboxed primitive element types in all cases where you don't need polymorphism over primitive and reference types. Having to code against a non-specialized superclass means you need to box and unbox every single element you append to the Builder.

@szeiger szeiger force-pushed the wip/specialized-arrays branch from 8a711b7 to ff24f7d Compare November 3, 2017 15:55
@szeiger
Copy link
Contributor Author

szeiger commented Nov 3, 2017

Updated:

  • Rebased
  • Workaround for Dotty bug (see comment in ArrayOps.scala)
  • Removed ArrayView in favor of ImmutableArray
  • More abstraction in specialized ArrayBuilders

@szeiger szeiger force-pushed the wip/specialized-arrays branch from ff24f7d to 6ba4846 Compare November 3, 2017 16:26

def toIterable = ArrayView(xs)
def toIterable: ImmutableArray[A] = ImmutableArray.wrapArray(xs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think it is not safe to expose it as an immutable one because the underlying array xs is still available and can be mutated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a general risk with ImmutableArray but it is unavoidable if we want to make the default Seq a collection.immutable.Seq and support varargs as Seq.

Or should we limit the implicit conversion from Array to produce a generic Seq? Varargs could still expose it as an immutable one. It may still be somewhat confusing once we make the immutable Seq the default one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would expose collection.IndexedSeq as the public result type of toIterable. We can still safely use ImmutableArray to support varargs since the reference to the underlying mutable Array will not leak (or will it?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will but I'd classify that as an abuse of varargs. If you mutate an array after passing it to a varags method all bets are off.

@@ -104,7 +104,7 @@ class LowPriority {
import strawman.collection._

/** Convert array to iterable via view. Lower priority than ArrayOps */
implicit def arrayToView[T](xs: Array[T]): ArrayView[T] = ArrayView[T](xs)
implicit def arrayToImmutableArray[T](xs: Array[T]): immutable.ImmutableArray[T] = immutable.ImmutableArray.wrapArray[T](xs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, we should use collection.IndexedSeq[T] as a return type instead of immutable.ImmutableArray[T]


final class ofRef[T <: AnyRef](val array: Array[T]) extends ImmutableArray[T] with Serializable {
protected def elements: Array[T] = array
def length: Int = array.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pull up length’s implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would require a polymorphic array field (which erases to Object) in ImmutableArray. All methods which operate on the actual array field need to be in the subclasses.

Copy link
Contributor

@julienrf julienrf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments I left: we shouldn’t publish the Array as an immutable collection as long as the reference to the underlying mutable Array has also publicly leaked.

@szeiger szeiger force-pushed the wip/specialized-arrays branch from 6ba4846 to 987cf68 Compare November 7, 2017 16:27
@szeiger
Copy link
Contributor Author

szeiger commented Nov 7, 2017

Updated to return a generic IndexedSeq as the static type when implicitly wrapping an Array as an ImmutableArray.


protected def fromTaggedIterable[B: ClassTag](coll: Iterable[B]): Array[B] = coll.toArray[B]

// Note: Declaring the return type as ArrayOps.WithFilter[A] breaks Dotty 0.4.0-RC1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we won’t get back an Array but an IndexedSeq if we call map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you file an issue in the dotty repository (lampepfl/dotty)? These issues have high priority and might be fixed in a few days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strangely enough I was unable to reproduce the error again. Let's see how CI fares.

ImmutableArray.wrapArray(elements)
}

def wrapArray[T](x: AnyRef): ImmutableArray[T] = (x match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method have to be public? We should at least document what it does and also warn about the fact that it doesn’t create a copy of the given Array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be public in the bytecode so that the codegen can generate code that calls it, in cases that are known to be safe (i.e., varargs generation). However it could be private-ish at the language level.

One way is to make it protected[immutable].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will buy us anything to hide the method. The specialized subclasses expose the constructors and the wrapped arrays, and both are necessary to write high-performance code that uses ImmutableArray. Also exposing wrapArray on top of that does not make it any less safe. The current collections do the same and I have not seen any problems in the wild that were caused by this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it non-public would just prevent confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current collections do the same and I have not seen any problems in the wild that were caused by this.

Hum ... the current collections don't have ImmutableArrays to begin with, so I'm not sure what you're talking about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called WrappedArray in the current collections. (WrappedArray is a mutable collection though, so maybe it would be more problematic for ImmutableArray)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WrappedArray is a mutable collection though

Ah, well, yes. That changes everything!

@szeiger szeiger force-pushed the wip/specialized-arrays branch from 987cf68 to 2b4a83c Compare November 13, 2017 15:48
@szeiger
Copy link
Contributor Author

szeiger commented Nov 13, 2017

Opened scala/scala3#3467 against Dotty

@szeiger
Copy link
Contributor Author

szeiger commented Nov 17, 2017

So what should we do about the ImmutableArray constructors / extractors? IMHO being able to wrap and unwrap arrays into collections is too important to forego.

  • For decent varargs support with the default scala.Seq being immutable we need the full ImmutableArray, including specialization. Access to internals could be hidden in the scala namespace. But we have two additional use cases:

  • Explicit conversion from arrays to collections: Can we rely on users to use this responsibly when we pretend that an ImmutableArray is immutable even though we expose the non-copying constructors? I think we can. It is already commonplace to treat mutable collections as effectively immutable after construction. ImmutableArray goes a bit further by being an immutable.Seq but it provides sensible semantics for this use case: The additional methods for immutable Seqs are implemented without any structural sharing. If we can't agree on this interpretation we will have to bring WrappedArray back. It would be exactly the same as ImmutableArray except that it implements mutable.Seq instead of immutable.Seq (and is therefore not as convenient to use because it is not a scala.Seq anymore)

  • Implicit array wrapping: Pretending to have an immutable collection could be dangerous because these conversions are not always obvious. I would make this contingent on the decision for the previous case. If we add WrappedArray, that's the right one to use, otherwise use ImmutableArray but declare the return type as collection.Seq rather than immutable.Seq.

@julienrf
Copy link
Contributor

Explicit conversion from arrays to collections: Can we rely on users to use this responsibly when we pretend that an ImmutableArray is immutable even though we expose the non-copying constructors? I think we can

In such a case I would make it as clear as possible in the method name. E.g. ImmutableArray.fromArrayNoCopy.

@sjrd
Copy link
Member

sjrd commented Nov 17, 2017

I definitely think we need WrappedArray to wrap an existing Array into a Seq.

The current collection.Colls are read-only, not immutable. We do not have precedent to create an immutable.Coll that we can still mutate. immutable.Colls need to stay immutable.

@julienrf
Copy link
Contributor

julienrf commented Nov 17, 2017

@sjrd What about the following implementation?

package object collection {
  type WrappedArray[A] <: Seq[A]
  def wrap[A](array: Array[A]): WrappedArray[A] = ImmutableArray.fromArrayNoCopy(array)
}

My point is that we can always statically expose the collection.Seq type even though the actual implementation type is ImmutableArray. This approach would fail, though, if one checks the type of his collection at runtime… (xs match { case _: immutable.Seq[_] => … })

@sjrd
Copy link
Member

sjrd commented Nov 17, 2017

In addition to the run-time type check issue, that implementation is not helpful when one wants to wrap an Array and give to someone who wants a mutable.Seq, in order to actually mutate it.

@Ichoran
Copy link
Contributor

Ichoran commented Nov 17, 2017

@sjrd - immutable.BitSet has fromBitMaskNoCopy that results in an "immutable" object where the backing array can be mutated. So there is precedent. (That doesn't make it a good idea, though.)

@julienrf
Copy link
Contributor

@szeiger I think that bringing back WrappedArray would make sense. I created #300 to keep track of that work (which is out of the scope of that PR). In the meanwhile do you agree with making fromArray package private?

@szeiger
Copy link
Contributor Author

szeiger commented Nov 28, 2017

I still think we need non-private wrapping and unwrapping even in the presence of WrappedArray. This is also the only way to implement them separately but I'd still prefer to add WrappedArray back to this PR (it was present in the original version) directly.

@julienrf julienrf added this to the 0.7.0 milestone Nov 30, 2017
@szeiger
Copy link
Contributor Author

szeiger commented Nov 30, 2017

Looking at WrappedArray again I wonder why we should have it at all. Why not add the manual specialization to ArrayBuffer instead? There doesn't seem to be any substantial difference between the current ArrayBuffer and WrappedArray.ofRef.

@julienrf
Copy link
Contributor

The difference is that ArrayBuffer is a Growable, whereas WrappedArray isn’t. Not sure if it is that important in practice, though.

@sjrd
Copy link
Member

sjrd commented Nov 30, 2017

WrappedArray guarantees that it is a view of an Array. Changes to one are visible from the other. ArrayBuffer is just a buffer that happens to backed, privately, by an Array (to provide its complexity guarantees). That's the fundamental difference (much more so than Growable versus not-growable).

@julienrf
Copy link
Contributor

Right @sjrd I missed that one and I agree it’s important.

@szeiger
Copy link
Contributor Author

szeiger commented Nov 30, 2017

We'll need to make some bigger changes for that. Currently every mutable.Iterable is Growable and furthermore every mutable.Seq is Shrinkable.

@szeiger szeiger force-pushed the wip/specialized-arrays branch 3 times, most recently from aa72779 to 4be1730 Compare December 1, 2017 18:18
@szeiger
Copy link
Contributor Author

szeiger commented Dec 4, 2017

Everything should be ready here, including WrappedArray and rebasing to build on Dotty 0.5.0

@julienrf
Copy link
Contributor

julienrf commented Dec 4, 2017

I’m a bit lost with the builders. Why do we need ArrayBuilder at all? Can’t ArrayBuffer[A] be a Builder[A, Array[A]] (like we do with ListBuffer)?

def newBuilder[A](): Builder[A, WrappedArray[A]] =
ArrayBuffer.newBuilder[A]().mapResult(fromArrayBuffer)

def newBuilder[A](implicit t: ClassTag[A]): Builder[A, WrappedArray[A]] = new WrappedArrayBuilder[A](t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need WrappedArrayBuilder? Can’t we use ArrayBuilder.mapResult(fromArray)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason why we couldn't.

}

def fromArrayBuffer[A](arr: ArrayBuffer[A]): WrappedArray[A] =
make[A](arr.asInstanceOf[ArrayBuffer[Any]].toArray)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why fromArrayBuffer rather than fromArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have that in the form of make. fromArrayBuffer is only used in two places. We could inline it or make the method private.

@Ichoran
Copy link
Contributor

Ichoran commented Dec 4, 2017

At least in the existing collections, ArrayBuffer stores elements as AnyRef; ArrayBuilder has to build an array of the correct type (including primitive type, if asked for).

@julienrf julienrf modified the milestones: 0.7.0, 0.8.0 Dec 6, 2017
@julienrf julienrf modified the milestones: 0.8.0, 0.9.0 Jan 3, 2018
@szeiger szeiger force-pushed the wip/specialized-arrays branch from 4be1730 to b0cbdf9 Compare January 3, 2018 18:14
@szeiger
Copy link
Contributor Author

szeiger commented Jan 3, 2018

Updated: Removed WrappedArrayBuilder, made WrappedArray.fromArrayBuffer private, rebased

class ArrayOps[A](val xs: Array[A]) extends AnyVal
with IterableOnce[A]
with IndexedSeqOps[A, immutable.IndexedSeq, Array[A]]
with StrictOptimizedIterableOps[A, Seq, Array[A]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StrictOptimizedSeqOps is even better!

@throws[ArrayIndexOutOfBoundsException]
def apply(n: Int) = xs(n)
override def className = "ArrayView"
def append[B >: A : ClassTag](x: B): Array[B] = fromTaggedIterable(View.Append(toIterable, x))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods have been renamed appended and prepended.

We should also add appendedAll, prependedAll, and concat, BTW.

extends IndexedSeq[A]
with IndexedSeqOps[A, ImmutableArray, ImmutableArray[A]]
with StrictOptimizedSeqOps[A, ImmutableArray, ImmutableArray[A]] {

def iterableFactory: SeqFactory[ImmutableArray] = ImmutableArray

/* The wrapped mutable `Array` */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clearly remind that mutating the Array will break the immutability contract.

/* The wrapped mutable `Array` */
def unsafeArray: Array[A @uncheckedVariance]
// uncheckedVariance should be safe: Array[A] for reference types A is covariant at the JVM level. Array[A] for
// primitive types A can only be widened to Array[Any] which erases to Object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it is not safe:

val rectangles: ImmutableArray[Rectangle] = ImmutableArray(rect1, rect2)
val shapes: ImmutableArray[Shape] = rectangles // assuming Rectangle is a Shape
val shapesArray: Array[Shape] = shapes.unsafeArray
shapesArray(0) = someCircle // assuming Circle is a Shape which is not a Rectangle
rectangles.foreach(rect => println(rect.height)) // ClassCastException! Circle can not be cast to Rectangle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this? I would expect it to fail with an ArrayStoreException in shapesArray(0) = someCircle. But both this error and the one you demonstrate are due to the inherent unsoundness of covariant arrays.

This could be mitigated in ImmutableArray by making it invariant, but for an immutable array this should not be necessary. The JVM array actually is covariant and. If you use it in a mutable way all bets are off. That's why it is called unsafe in the first place. Any immutable use should be sound.

}

/** Wrap an existing `Array` into an `ImmutableArray` of the proper primitive specialization type without copying. */
def unsafeWrapArray[T](x: AnyRef): ImmutableArray[T] = (x match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious here: x: Array[_] doesn’t work?

case x: Array[Short] => new ofShort(x)
case x: Array[Boolean] => new ofBoolean(x)
case x: Array[Unit] => new ofUnit(x)
}).asInstanceOf[ImmutableArray[T]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the end we never statically know that we have an underlying ofInt instance. We always manipulate an ImmutableArray[X]. How does the JVM know that it doesn’t have to box/unbox elements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't. Any access through ImmutableArray is boxed. If you want unboxed access you need to cast down or instantiate a primitive subclass directly. It's the same for WrappedArray. In theory a @specialized annotation should be beneficial for both cases but since WrappedArray isn't currently specialized I didn't want to do any speculative changes without more complete benchmarks.

if(s > 0) b.sizeHint(s)
b ++= coll
WrappedArray.make(b.result())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like iterableFactory.from(coll) would be a better implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the IterableFactory cannot make use of the existing elemTag. It would always have to build a boxed array.

b ++= coll
WrappedArray.make(b.result())
}
protected[this] def newSpecificBuilder(): Builder[T, WrappedArray[T]] = ArrayBuilder.make()(elemTag).mapResult(WrappedArray.make)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have different implementations here and in WrappedArray.newBuilder?

@julienrf
Copy link
Contributor

julienrf commented Jan 4, 2018

Did you measure the performance difference?

This is similar to the old implementation. Primitive array types are not
polymorphic on the JVM so we need to special-case them in order to be
able to efficiently wrap existing arrays as collections.

We also remove ArrayView in favor of WrappedArray and ImmutableArray.
With the new equality semantics (where Views are never equal unless they
are the same reference) using a View as a default wrapper type for Array
is problematic. ImmutableArray is basically the same as ArrayView, except
it is an immutable.Seq and WrappedArray is the same for mutable.Seq, so
we get the expected equality semantics and strict operations for both
of them.

Implicit conversions from Array to Seq always produce a WrappedArray.
This is the safe choice because there is no guarantee that the array
is effectively immutable. ImmutableArray is intended for use in
compiler-generated varargs bridges and manually by the user via an
explicit conversion call.

Unlike the 2.12 collections the strawman currently requires a mutable
Seq to be Growable and Shrinkable. This needs to be changed in order to
implement WrappedArray. The new definition of mutable.Seq allows for
fixed-size collections like WrappedArray. A mutable.Seq which is
growable and shrinkable is a mutable.Buffer.

We also introduce mutable.IndexedSeq as a new type. It does not define
any new operations of its own but merely combines mutable.Seq with
collection.IndexedSeq for orthogonality with immutable.IndexedSeq.
@szeiger szeiger force-pushed the wip/specialized-arrays branch from b0cbdf9 to 848f5d5 Compare January 5, 2018 14:35
@szeiger
Copy link
Contributor Author

szeiger commented Jan 5, 2018

Addressed review comments. I didn't do any performance tests since this only brings the implementation up to the previous standard in 2.12.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants