Allocations when using dualcache #31

Open · dbstein opened this issue Aug 5, 2022 · 19 comments

dbstein commented Aug 5, 2022

Perhaps I'm doing something wrong in how I set things up, but dualcache seems to always generate allocations when requesting the dual portion of the cache. Here's the most minimal example I could think of; I posted a slightly less minimal example here. Writing my own (simpler, less flexible) version of get_tmp eliminates the issue. The following code:

using ForwardDiff, PreallocationTools

function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
    return dc.du
end

function test()
    ChunkSize = 12
    for n in (10, 20)
        println("\nTesting for n=", n)
        A = zeros(n);
        DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, n);
        cache = dualcache(A, ChunkSize);
        print("... get_tmp on Vector{Float64}\n")
        a = @allocated get_tmp(cache, A); println(a)
        print("... get_tmp on Vector{Dual}\n")
        a = @allocated get_tmp(cache, DA); println(a)
        print("... my_get_tmp on Vector{Float64}\n")
        a = @allocated my_get_tmp(cache, A); println(a)
        print("... my_get_tmp on Vector{Dual}\n")
        a = @allocated my_get_tmp(cache, DA); println(a)
    end
end

test()

gives:

julia> test()
Testing for n=10
... get_tmp on Vector{Float64}
0
... get_tmp on Vector{Dual}
1168
... my_get_tmp on Vector{Float64}
0
... my_get_tmp on Vector{Dual}
0

Testing for n=20
... get_tmp on Vector{Float64}
0
... get_tmp on Vector{Dual}
2240
... my_get_tmp on Vector{Float64}
0
... my_get_tmp on Vector{Dual}
0

That is, get_tmp results in allocations that scale with the vector size and are slightly larger than what I would think is needed (8*ChunkSize*n = 960 bytes for n=10 and 1920 bytes for n=20). This is with Julia v1.7.3 and PreallocationTools v0.4.0 on macOS 12.5.
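
For reference, a quick size check (a sketch): each Dual here stores its value plus ChunkSize partials, so the raw payload is 8*(ChunkSize+1)*n bytes, and the remainder of the observed figures is presumably the freshly allocated Array object itself.

using ForwardDiff

# One Dual with chunk size 12 holds 1 value + 12 partials, all Float64:
D = ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, 12}
sizeof(D)       # 104 bytes
10 * sizeof(D)  # 1040-byte payload for n = 10 (observed: 1168)
20 * sizeof(D)  # 2080-byte payload for n = 20 (observed: 2240)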

thomvet (Contributor) commented Aug 5, 2022

Using a simple reshape() instead of ArrayInterfaceCore.restructure() makes the allocations disappear. Copy-paste version below:

using ForwardDiff, PreallocationTools

function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: Float64}
    return dc.du
end
function get_tmp2(dc::PreallocationTools.DiffCache, u::AbstractArray{T}) where T<:ForwardDiff.Dual
    nelem = div(sizeof(T), sizeof(eltype(dc.dual_du))) * length(dc.du)
    if nelem > length(dc.dual_du)
        # internal helper in PreallocationTools; not exported, so qualify it
        PreallocationTools.enlargedualcache!(dc, nelem)
    end
    reshape(reinterpret(T, view(dc.dual_du, 1:nelem)), size(dc.du))
end

function test()
    ChunkSize = 12
    for n in (10, 20)
        println("\nTesting for n=", n)
        A = zeros(n);
        DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, n);
        cache = dualcache(A, ChunkSize);
        print("... get_tmp on Vector{Float64}\n")
        a = @allocated get_tmp(cache, A); println(a)
        print("... get_tmp on Vector{Dual}\n")
        a = @allocated get_tmp(cache, DA); println(a)
        print("... my_get_tmp on Vector{Float64}\n")
        a = @allocated my_get_tmp(cache, A); println(a)
        print("... my_get_tmp on Vector{Dual}\n")
        a = @allocated my_get_tmp(cache, DA); println(a)
        print("... get_tmp2 on Vector{Dual}\n")
        a = @allocated get_tmp2(cache, DA); println(a)
    end
end

test()

dbstein (Author) commented Aug 5, 2022

Removing the view was the first thing I tried when writing my own; it didn't help:

using ForwardDiff, PreallocationTools, ArrayInterfaceCore

function my_get_tmp(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    return reshape(reinterpret(T, dc.dual_du), size(dc.du))
end
function get_tmp2(dc::PreallocationTools.DiffCache, u::Vector{T}) where {T <: ForwardDiff.Dual}
    ArrayInterfaceCore.restructure(dc.du, reinterpret(T, dc.dual_du))
end

function test()
    ChunkSize = 12
    n = 10
    A = zeros(n);
    DA = zeros(ForwardDiff.Dual{ForwardDiff.Tag{nothing, Float64}, Float64, ChunkSize}, n);
    cache = dualcache(A, ChunkSize);
    print("... get_tmp on Vector{Dual}\n")
    a = @allocated get_tmp(cache, DA); println(a)
    print("... get_tmp2 on Vector{Dual}\n")
    a = @allocated get_tmp2(cache, DA); println(a)
    print("... my_get_tmp on Vector{Dual}\n")
    a = @allocated my_get_tmp(cache, DA); println(a)
end

test()

gives:

... get_tmp on Vector{Dual}
1168
... get_tmp2 on Vector{Dual}
1168
... my_get_tmp on Vector{Dual}
0

Since I didn't know everything that's going on under the hood in restructure, I switched it to reshape and voilà.
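
A quick illustration of why the reshape route is free (a sketch): both reinterpret and reshape return lazy wrappers over the original storage rather than materializing a new array.

a = zeros(10)                # backing Float64 storage
r = reinterpret(Float32, a)  # Base.ReinterpretArray: shares a's memory
m = reshape(r, 4, 5)         # Base.ReshapedArray: still no copy
typeof(m)                    # Base.ReshapedArray{Float32, 2, ...}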

ChrisRackauckas (Member) commented

Is restructure changing the view to an array?

thomvet (Contributor) commented Aug 5, 2022

I checked with earlier versions of PreallocationTools.jl (down to 0.2.4), which still used ArrayInterface 5.y.z; the allocations happen there as well.

ChrisRackauckas (Member) commented

And is it because of changing a view to an Array?

thomvet (Contributor) commented Aug 5, 2022

Is this the right way to test it?

using ArrayInterfaceCore
a = zeros(10)
ArrayInterfaceCore.restructure(a, reinterpret(Float32, view(a, 1:5)))

Then yes:

10-element Vector{Float32}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

ChrisRackauckas (Member) commented

There's the allocation then. Does Adapt.jl's adapt allocate here? Maybe that's the primitive that should be used. Otherwise we could specialize restructure to keep the view here.
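
A way to probe that question directly (a sketch, assuming Adapt.jl is available; Adapt is designed to rebuild wrappers around adapted storage, so the interesting part is whether the result stays a lazy wrapper or gets materialized):

using Adapt
a = zeros(10)
r = reinterpret(Float32, view(a, 1:5))
typeof(adapt(Array, r))     # lazy wrapper preserved, or a plain Vector{Float32}?
@allocated adapt(Array, r)  # and does the call itself allocate?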

thomvet (Contributor) commented Aug 6, 2022

I'll have to look into this some more, but it looks like restructure() always allocates for anything that isn't a Base Array. Maybe it needs a general fix?

Regarding adapt(), I'll have to play around with it more and understand it better, but at the moment I don't see how it could deliver a "reshape" (it just takes care of the wrappers?).

My proposal right now would be to fix the normal-Array case (simply doing a reshape, as I did above; same as dbstein's version, but keeping the view) and take care of the more advanced AbstractArrays later (ArrayPartition and friends); a sketch follows below.
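
A minimal sketch of that proposal (the helper name and exact dispatch are illustrative; the eventual PR may differ):

# Plain Arrays take the lazy, non-allocating reshape path;
# everything else keeps the general (allocating) restructure.
_restructure(normal_cache::Array, duals) = reshape(duals, size(normal_cache))
_restructure(normal_cache, duals) = ArrayInterfaceCore.restructure(normal_cache, duals)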

CuArray would still need adapt, though, right? (I don't have a readily accessible CUDA GPU at hand to test.)

ChrisRackauckas (Member) commented

Yeah, we can create a specialized dispatch on Array for now. I think in general it cannot just be a reshape because, yes, it may need to be type-matching, in which case the view would get converted.

thomvet (Contributor) commented Aug 9, 2022

Just a note here to say that I am working on this. I hope to have a PR with an (at least partial) fix today or tomorrow.

dbstein (Author) commented Aug 9, 2022

Thanks for the fast work on it. Writing non-allocating Jacobians for complicated functions (or even allocating ones, for that matter) is a huge pain, and getting them for free is nothing short of magical.

thomvet (Contributor) commented Aug 10, 2022

#32 fixes the allocations for Base Arrays now. However, I couldn't get rid of the reshape(reinterpret(..., view(...))) wrappers and stay allocation-free.

That said, I don't know whether the wrappers interfere negatively with anything downstream. At least in the cases I use PreallocationTools for, they didn't cause a performance loss (actually, a small gain, since the allocations are gone).

I tested with Adapt, but also with unsafe_wrap (as a last resort):

function _restructure(normal_cache::Array, duals) 
    unsafe_wrap(Array, pointer(duals), size(normal_cache))
end

That gets rid of the wrappers and returns a correctly sized Array with duals, but it also causes allocations (because the view() is also removed?).
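
One likely source of those allocations is the Array object that unsafe_wrap creates on every call, independent of the view (a quick check, as a sketch):

x = zeros(20)
# The returned Array shares x's memory, but the Array object itself is
# freshly heap-allocated each time (real code must also keep x alive,
# e.g. with GC.@preserve):
@allocated unsafe_wrap(Array, pointer(x), (10,))  # nonzero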

Finally, the more complex AbstractArray cases still allocate.

ChrisRackauckas (Member) commented

@chriselrod do you know if there's a more general solution here?

chriselrod commented

I got:

julia> test()
... get_tmp on Vector{Dual}
0
... get_tmp2 on Vector{Dual}
1168
... my_get_tmp on Vector{Dual}
0

get_tmp2 calls ArrayInterface.restructure(::Array, ::ReinterpretArray), which in turn calls convert(Array, ::ReinterpretArray); that is why it allocates.
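
That chain is easy to confirm in isolation (a sketch):

a = zeros(10)
r = reinterpret(Float32, a)    # lazy wrapper, no copy yet
typeof(convert(Array, r))      # Vector{Float32}: the data got materialized
@allocated convert(Array, r)   # nonzero: a fresh array is allocated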

ChrisRackauckas (Member) commented

Yes we know. But is it possible to solve?

chriselrod commented Oct 23, 2022

> Yes we know. But is it possible to solve?

Don't make a copy?
Why the need for convert? What's wrong with the reinterpret array?
Where is restructure being called?

Like, obviously we can avoid these allocations. SimpleChains uses just a single UInt8 buffer vector for all its temporaries, whether Bool, Float32, Float64, or ForwardDiff.Dual.

I'd just need more context on the design and why things are the way they are now to suggest an alternative approach.
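
For illustration, a minimal sketch of that single-buffer idea (the view_as helper is hypothetical, not SimpleChains' actual internals):

# One UInt8 buffer serves every element type; each request is just a
# reinterpret of a view, so nothing new is allocated once the buffer
# is large enough (note: resizing would invalidate earlier views).
buf = Vector{UInt8}(undef, 0)
function view_as(buf::Vector{UInt8}, ::Type{T}, n::Integer) where {T}
    nbytes = sizeof(T) * n
    nbytes > length(buf) && resize!(buf, nbytes)
    return reinterpret(T, view(buf, 1:nbytes))
end
xf64 = view_as(buf, Float64, 10)  # 10 Float64s backed by buf
xf32 = view_as(buf, Float32, 10)  # the same bytes, viewed as Float32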

thomvet (Contributor) commented Dec 2, 2022

I am intrigued by this comment; it might be simpler to adopt that approach here as well. I don't think it will end up as a general solution for arbitrary AbstractArray types, though, but maybe it's a start towards one?

ChrisRackauckas (Member) commented

The main issue is that this is a missing primitive, so it needs something in ArrayInterfaceCore to capture it. There is no primitive for "restructure, but non-allocating". restructure is supposed to be type-matching: if you started with an ArrayPartition of Arrays, then restructure should turn a Vector into an ArrayPartition of Arrays. @chriselrod's point is of course that for this to be non-allocating it needs to be an ArrayPartition of SubArrays, which is correct, but we don't have a primitive operation for "create a type that's like X but not exactly matching X, where the allowed differences are shortcuts that improve performance".

I tend to think the more fundamental issue is that SubArray -> Array should just be a pointer operation and shouldn't actually allocate a new array; I'm not sure why it does in Julia, so it would probably be best to just get that fixed. But this notion of restructuring is actually quite important to PreallocationTools (in fact, it's how the current DiffCache works), so having a vocabulary for requesting this kind of reshaping functionality is likely useful in many places as long as the SubArray -> Array allocation exists.
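
For completeness, the SubArray -> Array conversion in question is indeed a copy today (a quick check, as a sketch):

a = zeros(100)
v = view(a, 1:50)
c = convert(Array, v)        # materializes a fresh Vector{Float64}
pointer(c) == pointer(a)     # false: the data was copied, not aliased
@allocated convert(Array, v) # nonzero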
