
Implement ImmutableArray #41777

Closed
wants to merge 1 commit into from

Conversation

Keno
Member

@Keno Keno commented Aug 3, 2021

This rebases #31630 with several fixes and modifications.
After #31630, we had originally decided to hold off on said
PR in favor of implementing either more efficient layouts for
tuples or some sort of variable-sized struct type. However, in
the two years since, neither of those has happened (I had a go
at improving tuples and made some progress, but there is much
still to be done there). In the meantime, all across the package
ecosystem, we've seen an increasing creep of pre-allocation and
mutating operations, primarily caused by our lack of sufficiently
powerful immutable array abstractions and array optimizations.

This works fine for the individual packages in question, but it
causes a fair bit of trouble when trying to compose these packages
with transformation passes such as AD or domain specific optimizations,
since many of those passes do not play well with mutation. More
generally, we would like to avoid people needing to pierce
abstractions for performance reasons.

Given these developments, I think it's getting quite important
that we start to seriously look at arrays and try to provide
performant and well-optimized arrays in the language. More
importantly, I think this is somewhat independent from the
actual implementation details. To be sure, it would be nice
to move more of the array implementation into Julia by making
use of one of the above-mentioned language features, but that
is a bit of an orthogonal concern and not absolutely required.

This PR provides an `ImmutableArray` type that is identical
in functionality and implementation to `Array`, except that
it is immutable. Two new intrinsics `Core.arrayfreeze` and
`Core.arraythaw` are provided which are semantically copies
and turn a mutable array into an immutable array and vice
versa.

In the original PR, I additionally provided generic functions
`freeze` and `thaw` that would simply forward to these
intrinsics. However, said generic functions have been omitted
from this PR in favor of simply using constructors to go
between mutable and immutable arrays at the high level.
Generic `freeze`/`thaw` functions can always be added later,
once we have a more complete picture of how these functions
would work on non-Array datatypes.
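
As a rough illustration of the constructor-based API described above (hypothetical, since this PR was not merged; the type and constructor names are taken from the PR itself):

```julia
a = [1.0, 2.0, 3.0]      # ordinary mutable Vector{Float64}
ia = ImmutableArray(a)   # semantically a copy; `ia` cannot be mutated
# ia[1] = 0.0            # would throw, since `ia` is immutable
b = Array(ia)            # back to a mutable array, again semantically a copy
```

Presumably these constructors forward to the `Core.arrayfreeze` and `Core.arraythaw` intrinsics mentioned above.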

Some basic compiler support is provided to elide these copies
when the compiler can prove that the original object is
dead after the copy. For instance, in the following example:
```julia
function simple()
    a = Vector{Float64}(undef, 5)
    for i = 1:5
        a[i] = i
    end
    ImmutableArray(a)
end
```

the compiler will recognize that the array `a` is dead after
its use in `ImmutableArray` and the optimized implementation
will simply rewrite the type tag in the originally allocated
array to now mark it as immutable. It should be pointed out
however, that *semantically* there is still no mutation of the
original array, this is simply an optimization.

At the moment this compiler transform is rather limited, since
the analysis requires escape information in order to compute
whether or not the copy may be elided. However, more complete
escape analysis is being worked on at the moment, so hopefully
this analysis should become more powerful in the very near future.

I would like to get this cleaned up and merged reasonably quickly,
and then crowdsource some improvements to the Array APIs more
generally. There are still a number of APIs that are quite bound
to the notion of mutable `Array`s. StaticArrays and other packages
have been inventing conventions for how to generalize those, but
we should form a view in Base of what those APIs should look like and
harmonize them. Having the `ImmutableArray` in Base should help
with that.
@Tokazama
Contributor

Tokazama commented Aug 7, 2021

Is the goal here to be able to replace `SArray` with something like this:

```julia
struct SArray{S,T,N,L}
    data::ImmutableArray{T,N}
end
```

...or is it an entirely unique thing?

@timholy
Member

timholy commented Aug 7, 2021

These are heap-allocated (when they allocate...) and so it's a little different, but the basic idea is the same. For example, the construct

```julia
X .+= 1
```

might be the best way to implement an elementwise increment if `X` is mutable, but it's an error if `X` is immutable. As you know well from your work on ArrayInterface, that makes it harder to write generic code. If we have immutable arrays, then it should be easier to write this as

```julia
X += 1
```

(without the dot) and have the compiler determine that there are circumstances where the operation could be performed in place. In other words, it should allow code to become more generic without loss of performance.

However, the escape-analysis that @Keno refers to would be important for making this a reality.
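
A minimal sketch of the kind of generic code this comment describes (illustrative only; whether the copy is actually elided depends on the escape analysis being worked on):

```julia
# Non-mutating, so it is valid for both mutable and immutable arrays.
increment(X) = X .+ 1

# For a mutable input, a sufficiently smart compiler could reuse X's
# storage when X is dead after the call, matching `X .+= 1` in cost.
Y = increment(ImmutableArray([1, 2, 3]))
```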

@AriMKatz

AriMKatz commented Aug 8, 2021

Related: @tkf 's https://github.com/tkf/Mutabilities.jl

For reference, I believe this is the escape analysis work? https://github.com/aviatesk/EscapeAnalysis.jl

@tkf
Member

tkf commented Aug 8, 2021

(I think Keno's strategy mentioned in the OP was to keep it minimal and postpone harder design decisions. So, I posted my comment in the original PR, which already contained various discussions: #31630 (comment))

@chriselrod
Contributor

Any plans to communicate immutability via alias info or invariance to LLVM?

@fingolfin
Contributor

What is the rationale for providing a "thaw" function, though? Doesn't the mere possibility that one can "thaw" an "immutable" array render some optimizations impossible?

@andyferris
Member

Doesn't the mere possibility that one can "thaw" an "immutable" array render some optimizations impossible?

My understanding is no: thaw would make a mutable copy in the default case, unless the compiler can prove you are thawing the only live reference to the array, in which case it is safe to simply make it mutable without copying.
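
A sketch of the semantics described here (hypothetical, since `ImmutableArray` is the type proposed by this PR and never landed in Base):

```julia
ia = ImmutableArray([1, 2, 3])
a = Array(ia)   # "thaw": a fresh mutable copy in the general case
a[1] = 10       # fine, and never observable through `ia`
# Only if the compiler proves `ia` is dead after the conversion may it
# skip the copy and simply re-tag the existing buffer as mutable.
```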

@tecosaur
Contributor

tecosaur commented Sep 5, 2021

I'm curious, will this allow for similar performance to that currently seen with StaticArrays for small matrix operations? If so, that would be brilliant to have in Base.

Multiplying a 2x2 matrix ~1350x faster with StaticArrays and Julia 1.6

@timholy
Member

timholy commented Sep 5, 2021

@tecosaur, times that are much less than the duration of a CPU clock tick indicate that the compiler is just eliminating the entire workload. Here's the right way to do it:

```
julia> @btime A*A setup=(A=rand(SMatrix{2,2}));
  1.485 ns (0 allocations: 0 bytes)

julia> @btime A*A setup=(A=rand(2,2));
  36.058 ns (1 allocation: 112 bytes)
```

And no, this proposal won't narrow the entire gap. The sizes aren't static, the values are. But if the compiler doesn't actually have to instantiate the array, then most of that time may disappear.

@tecosaur
Contributor

tecosaur commented Sep 5, 2021

Ah, thanks for explaining that @timholy 👍. Given the still huge performance difference that your benchmark shows, I think it would be nice if I didn't need a package for high-performance small matrix operations, but it's nice to hear that this proposal may narrow the gap.

@chriselrod
Contributor

chriselrod commented Sep 5, 2021

It's the stack allocation and static sizing that are good for performance.
Immutability is largely orthogonal to these (but can potentially enable some optimizations, like removing alias checks when used with mutated arrays).

@jpsamaroo
Member

Immutability can also be good for performance. If I understand this PR correctly, when we freeze an array, we've frozen its size, shape, and values, and thus multiple loads of the same attributes or values may potentially be coalesced by the optimizer.

@chriselrod
Contributor

Immutability can also be good for performance. If I understand this PR correctly, when we freeze an array, we've frozen its size, shape, and values, and thus multiple loads of the same attributes or values may potentially be coalesced by the optimizer.

This should be happening in many cases anyway, but TBAA information often fails to propagate.

I also don't know if it's just that the associated LLVM pass isn't turned on (like why @llvm.expect doesn't work), but invariant information doesn't really seem to work / enable the optimizations I would expect it to in Julia.

@tkf
Member

tkf commented Sep 5, 2021

when we freeze an array, we've frozen its size, shape, and values

In principle, we can freeze values and "shape" separately (as a possible concrete API, see freezevalue and freezeindex in Mutabilities.jl). A value-frozen but shape-not-frozen vector can act as an append-only data structure where the compiler can assume a loaded value won't change even after new values are appended. I'm not sure how much is doable at the LLVM level, since it'd look like the buffer is swapped for a larger one and then memcpy'ed. Maybe it could be useful if you incrementally build an array while passing intermediate results to some other outlined functions.

This should be happening in many cases anyway, but TBAA information often fails to propagate.

I think adding invariance at Julia's type level helps with things LLVM (alone) cannot reason about. For example, in

```julia
xs::ImmutableArray
x = xs[1]
dynamic_dispatch(xs)
x = xs[1]
```

the second load can be elided by the Julia compiler (or Julia helping LLVM?).

@vtjnash
Member

vtjnash commented Feb 1, 2022

Moved to #42465

@vtjnash vtjnash closed this Feb 1, 2022
@vtjnash vtjnash deleted the kf/immutablearray branch February 1, 2022 23:05