Julep/Very WIP - Heap allocated immutable arrays and compiler support #31630
Conversation
Yes! I'm really glad this is being looked at, @Keno. I think an immutable array would be a wonderful improvement - but I think it is only the beginning, and we can take the idea further. There are a bunch of related things that I feel an interface like this can solve.
I think there are massive opportunities here. :) Are there ambitions for these kinds of broad changes for Julia v2.0?
```c
JL_CALLABLE(jl_f_arraymelt)
{
    JL_NARGSV(arrayfreeze, 1);
```
`arraymelt`? (The `JL_NARGSV` macro names the function, but here it still says `arrayfreeze`.)
```c
JL_CALLABLE(jl_f_mutating_arrayfreeze)
{
    JL_NARGSV(arrayfreeze, 1);
```
`mutating_arrayfreeze`? (The `JL_NARGSV` macro here still says `arrayfreeze`.)
```c
    return (jl_value_t*)na;
}

JL_CALLABLE(jl_f_mutating_arrayfreeze)
```
At first, I was thrown by what this name meant. It makes sense in the end; in this case you are sort-of mutating the type of the array but not the value/data... though I did wonder if it would be better language to describe this as "taking" or "stealing" the array, or something like that. The most trivial possible comment, but I think that ...

How does thawing/melting work? If the compiler can prove that there's only one reference to an object, I can see it, but what about the general case? What if there are multiple references to the same object? Does the general case require doing a full GC to figure out if there's only one reference?
@StefanKarpinski My interpretation was that in the general case you would make a copy, which you may then mutate without affecting the other readers of the original immutable array.
Ideally, we would check the size of the buffer and use something like ... Then the price for the compiler failing to prove safety of re-using the buffer during melt/freeze would be a syscall, not a copy (obviously only good for non-tiny arrays). In the common case that only part of the array gets modified after melting, or only part of the parent array gets modified after freezing, we would only copy (and allocate) a handful of pages instead of a giant object. The resulting ...
Tangentially related, the new ...
@chethega: that does seem like a very cool implementation strategy if it can be made to work well.
Yes, I've been thinking about how to make this work. I think there's something here, but it probably needs to be more complicated than just freezing a mutable struct.
Don't know yet; that's part of the reason to make these PRs. One could imagine that the default ...
Yes, ideally.
I haven't really thought through how this API extends to non-array collections yet, so I don't yet have an answer to these questions. I think it'll depend on the answer to question 1. Ideally these various combinations would just fall out.
Maybe, but I'm not entirely sure. Part of the appeal of SArray is that you can specialize for every size of the array and do size-specific optimizations. That could potentially be replaced by constant prop and specialization hints, but for me it's not currently in scope to do anything about this. I do plan some general infrastructure improvements to make SArray perform better.
Yes.
Yes, runtime support for using the MMU to lazily perform the copy is very much in scope.
Yes, as @andyferris said, you get a copy (or something that semantically behaves like one) in either direction if the compiler can't prove that there's currently only a single reference. One could imagine in the future having a mode of the language that enforces that property by disallowing certain values from escaping or doing static analysis to that extent (a poor man's borrow checker), but that's not really necessary to design for the current proposal.
So, I did some quick googling and thinking on how to possibly use the MMU for copy-on-write freeze/thaw. Unfortunately, it does not look pretty. At least the linux kernel appears to fail to provide us with the necessary tools.

One way that works on most operating systems is the following ugly kludge: large buffer allocations don't go to ... Now we have five operations: allocate, freeze, thaw, free, segfault. I already talked about allocation (grab free pages from the file or grow the file, ...). On segfault, we need to check whether we attempted to write to one of our ...

It is unfortunate that the linux kernel apparently does not permit us to create complex mappings and use its in-built pagefault handler for copy-on-write (as used by ...). A more elegant approach is maybe to ... Or is there another way of offloading all this to the kernel?
Your analysis is basically correct. Dual-mapped memory regions are not super convenient on linux. The one mechanism you missed is ...
In particular something like https://lwn.net/ml/linux-api/20180328101729.GB1743@rapoport-lnx/ might help if it ever gets added.
This brings to mind the following speculation: I've long wanted to claim the array literal syntax ... These kinds of short array literals are quite pervasive in certain types of code. It would be really nice if this minimalistic syntax also produced objects with minimal runtime overhead.
You mean ...?
I've always wondered why we don't just give Tuples a full abstract array interface to make them the real SVector. Would there be any weird side effects?
Continuing this thought, would it be a bad idea if (1 2
3 4) constructed an equivalent of ...?
Except we really want something like #31681; otherwise there are some poor corner cases of the type system that get angry w/ things like ...
By the time you've got 10_000 elements, you're not really dealing with small literal arrays anymore though, so it's unclear if one really needs the ...
I think the main issue is that they have a different type-level structure; if they are subtypes of ...
If we could make it work, I'd take it! But the behavior of ...
Which leads to wondering whether array literals must be mutable and, if not, whether size information could be added to them. Side note... I once implemented unicode brackets as an experiment for small array literals.
Oh I would love to parse all those brackets if we could decide how to handle them. Making literal arrays immutable also makes sense to me.
Is ... ?

```julia
julia> {1,2,3}
ERROR: syntax: { } vector syntax is discontinued
```
I object. Having ... If we want better syntax for literal static vectors, then something like ...
I don't think this was the idea; in both cases you have an array literal.
I do think that having ...
@chethega I always liked ...
Btw, this StackOverflow post "Allocating copy on write memory within a process" has some unsuccessful attempts that may be interesting. On the idea of a lightweight borrow-checker alternative, "deny capabilities" seem powerful and easier to understand/manage. See paper, video, more info introducing a handful of capabilities and alias types for high-performance safe concurrency (and an implementation in the Pony language). Another interesting development is "Separating Permissions from Data in Rust", ICFP 2021.
This rebases #31630 with several fixes and modifications. After #31630, we had originally decided to hold off on said PR in favor of implementing either more efficient layouts for tuples or some sort of variable-sized struct type. However, in the two years since, neither of those have happened (I had a go at improving tuples and made some progress, but there is much still to be done there). In the meantime, all across the package ecosystem, we've seen an increasing creep of pre-allocation and mutating operations, primarily caused by our lack of sufficiently powerful immutable array abstractions and array optimizations. This works fine for the individual packages in question, but it causes a fair bit of trouble when trying to compose these packages with transformation passes such as AD or domain-specific optimizations, since many of those passes do not play well with mutation. More generally, we would like to avoid people needing to pierce abstractions for performance reasons.

Given these developments, I think it's getting quite important that we start to seriously look at arrays and try to provide performant and well-optimized arrays in the language. More importantly, I think this is somewhat independent from the actual implementation details. To be sure, it would be nice to move more of the array implementation into Julia by making use of one of the abovementioned language features, but that is a bit of an orthogonal concern and not absolutely required.

This PR provides an `ImmutableArray` type that is identical in functionality and implementation to `Array`, except that it is immutable. Two new intrinsics `Core.arrayfreeze` and `Core.arraythaw` are provided which are semantically copies and turn a mutable array into an immutable array and vice versa. In the original PR, I additionally provided generic functions `freeze` and `thaw` that would simply forward to these intrinsics. However, said generic functions have been omitted from this PR in favor of simply using constructors to go between mutable and immutable arrays at the high level. Generic `freeze`/`thaw` functions can always be added later, once we have a more complete picture of how these functions would work on non-Array datatypes.

Some basic compiler support is provided to elide these copies when the compiler can prove that the original object is dead after the copy. For instance, in the following example:

```julia
function simple()
    a = Vector{Float64}(undef, 5)
    for i = 1:5
        a[i] = i
    end
    ImmutableArray(a)
end
```

the compiler will recognize that the array `a` is dead after its use in `ImmutableArray`, and the optimized implementation will simply rewrite the type tag in the originally allocated array to now mark it as immutable. It should be pointed out, however, that *semantically* there is still no mutation of the original array; this is simply an optimization.

At the moment this compiler transform is rather limited, since the analysis requires escape information in order to compute whether or not the copy may be elided. However, more complete escape analysis is being worked on at the moment, so hopefully this analysis should become more powerful in the very near future.

I would like to get this cleaned up and merged reasonably quickly, and then crowdsource some improvements to the Array APIs more generally. There are still a number of APIs that are quite bound to the notion of mutable `Array`s. StaticArrays and other packages have been inventing conventions for how to generalize those, but we should form a view in Base what those APIs should look like and harmonize them. Having the `ImmutableArray` in Base should help with that.
Replying to #41777 (comment), the idea I explored in Mutabilities.jl was (1) how we can expose ownership to the user so that we can write memory-optimized code behind a non-inplace surface API and (2) whether manually doing so right now is beneficial without compiler support. The answer to the second question was "kind of, but it's cumbersome." So I keep using the more direct linear/affine update API provided via BangBang.jl. Mutabilities.jl is essentially a copy of C++'s move (... I think. Not that I know C++). There might be better interfaces, but I think it's still a valid design direction for exposing linear and affine update semantics to user-defined functions. Though maybe a fundamental question is whether it should be exposed at the language level and, if so, whether it should be an "unsafe" or checked API.
Some mutable value semantics stuff from the s4tf team: https://arxiv.org/abs/2106.12678

From @tkf:

> In the purest form of mutable value semantics, references are second-class: they are only created implicitly, at function boundaries, and cannot be stored in variables or object fields. Hence, variables can never share mutable state.
Rebased and moved to #41777
This is part of a larger set of overhauls I'd like to do in the 2.0 timeframe (along with #21912 and other things along these lines). As such this is more of a straw-man implementation to play with various ideas. I doubt any of this code will get merged as is, but should provide a place for experimentation and we may start picking off good ideas from here.
The basic concept here is that I think we're missing a heap-allocated immutable array. We have Array (mutable and dynamically sized), StaticArray (immutable and statically sized) and MArray (mutable and statically sized), but we don't really have an immutable dynamically sized array. This PR adds that.
In my ideal world, most functions would return immutable arrays. In a lot of code, it is fairly rare to require semantically mutable arrays at the highest level of the API (of course operations are internally often implemented as mutating operations) and even in a good chunk of the cases that make use of them, they are used as a performance optimization rather than a semantic necessity.
On the other hand, having an immutability guarantee can be quite useful. For example, it would solve a lot of the performance problems around aliasing (the reason LLVM can't vectorize in a lot of cases is that it doesn't know that the output array doesn't overlap the input array - if the input array is immutable that obviously can't happen).
Immutability is also nice for higher level applications. Since views and slices are the same thing in immutable arrays, operations that would semantically be required to make copies on mutable arrays (say an AD system taking captures during the forward pass), can use views instead.
Now, the problem here of course is that sometimes you do want mutation, particularly during construction of various objects (i.e. you construct the object once by setting something to every memory location, but then it's immutable afterwards). This PR introduces the `freeze` function, which takes a mutable array and returns an immutable array with the same memory contents. Semantically this function is a copy, but the idea is that the compiler will be able to elide this copy in most circumstances, thus allowing immutable arrays to be constructed using mutating operations without overhead. Similarly, there is the `melt` function, which does the opposite. Once this infrastructure is mature, it should be trivial to get either the immutable or the mutable version of any array function in a zero-overhead (after compiler optimizations) manner just by adding the appropriate freeze/melt function. The 2.0 goal would then be to actually make most array operations return the immutable versions of the array.

Another motivation here is to make it easier to write code that is generic over mutability in order to support things like XLA and other optimizing linear algebra compilers that operate on immutable tensors as their primitives. By having a well defined way to talk about mutability in the standard library, it should be easier to plug in those external implementations seamlessly.