Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocations with StaticArrays in Julia v1.11 #1282

Open
kaipartmann opened this issue Oct 22, 2024 · 4 comments
Open

Allocations with StaticArrays in Julia v1.11 #1282

kaipartmann opened this issue Oct 22, 2024 · 4 comments

Comments

@kaipartmann
Copy link

I am the main author of the package Peridynamics.jl and recently changed to Julia v1.11.
With the new Julia version, a significant increase in the simulation time can be found when using a certain material.

Peridynamics.jl Benchmark

Please see the following benchmark, which is also described in more detail in the tutorials (Tutorial mode I fracture).

using Peridynamics

function mode_i(mat, npyz)
    l, Δx, δ, a = 1.0, 1/npyz, 3.015/npyz, 0.5
    pos, vol = uniform_box(l, l, 0.1l, Δx)
    ids = sortperm(pos[2,:])
    body = Body(mat, pos[:, ids], vol[ids])
    material!(body; horizon=3.015Δx, E=2.1e5, nu=0.25, rho=8e-6, Gc=2.7)
    point_set!(p -> p[1]  -l/2+a && 0  p[2]  2δ, body, :set_a)
    point_set!(p -> p[1]  -l/2+a && -2δ  p[2] < 0, body, :set_b)
    precrack!(body, :set_a, :set_b)
    point_set!(p -> p[2] > l/2-Δx, body, :set_top)
    point_set!(p -> p[2] < -l/2+Δx, body, :set_bottom)
    velocity_bc!(t -> -30, body, :set_bottom, :y)
    velocity_bc!(t -> 30, body, :set_top, :y)
    vv = VelocityVerlet(steps=2000)
    job = Job(body, vv; path=joinpath(@__DIR__, "results", "mode_i"))
    @time submit(job)
    return nothing
end

mode_i(NOSBMaterial(), 25) # compilation
mode_i(NOSBMaterial(), 50) # work
❯ julia +1.10.5 --project -t 6 mode_i_benchmark.jl
  0.719477 seconds (1.39 M allocations: 450.082 MiB, 2.77% gc time, 74.16% compilation time)
  6.090774 seconds (1.28 M allocations: 1.118 GiB, 1.06% gc time, 2.17% compilation time)

❯ julia +1.11.1 --project -t 6 mode_i_benchmark.jl
  9.574071 seconds (2.15 G allocations: 82.415 GiB, 15.59% gc time, 6.09% compilation time)
177.261652 seconds (40.34 G allocations: 1.501 TiB, 23.15% gc time)

The same simulation takes 29x longer on v1.11.1 than v1.10.5.

When investigating the problem, I narrowed it down to the calculation of the deformation gradient:
https://github.com/kaipartmann/Peridynamics.jl/blob/f9b8fe60e743d828cf96d2366610c6a69d7ac054/src/physics/correspondence.jl#L174-L194

MWE

The same behavior can be reproduced with a small MWE:

using LinearAlgebra, StaticArrays, BenchmarkTools

function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * X * X'
    end
    return K
end

@btime mwe1(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);
❯ julia +1.10.5 --project -t 6 mwe.jl 
  1.070 ms (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl 
  129.890 ms (7000000 allocations: 289.92 MiB)

I think the problem is related to allocations when using += with a SMatrix or SVector.

As a quick fix, rewriting the calculation by hand leads to even slightly better performance on v1.11:

function mwe2(a, X, n)
    K11, K12, K13 = 0.0, 0.0, 0.0
    K21, K22, K23 = 0.0, 0.0, 0.0
    K31, K32, K33 = 0.0, 0.0, 0.0
    X1, X2, X3 = X[1], X[2], X[3]
    for i in 1:n
        k = a * i
        K11 += k * X1 * X1
        K12 += k * X1 * X2
        K13 += k * X1 * X3
        K21 += k * X2 * X1
        K22 += k * X2 * X2
        K23 += k * X2 * X3
        K31 += k * X3 * X1
        K32 += k * X3 * X2
        K33 += k * X3 * X3
    end
    K = SMatrix{3,3}(K11, K21, K31, K12, K22, K32, K13, K23, K33)
    return K
end

@btime mwe2(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);
❯ julia +1.10.5 --project -t 6 mwe.jl 
  1.211 ms (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl 
  1.072 ms (0 allocations: 0 bytes)
@kaipartmann kaipartmann changed the title Allocations with StaticArrays in Julia v1.11 Allocations with StaticArrays and += in Julia v1.11 Oct 22, 2024
@mateuszbaran
Copy link
Collaborator

Interesting. This looks like a regression in Julia compiler. I'd have expected call-site @inline to fix this but it does not.

@KristofferC
Copy link
Contributor

I think this should have a corresponding issue in the julia repo since it is probably a regression there.

@kaipartmann kaipartmann changed the title Allocations with StaticArrays and += in Julia v1.11 Allocations with StaticArrays in Julia v1.11 Oct 22, 2024
@kaipartmann
Copy link
Author

@KristofferC I opened an issue in the julia repo.
Interestingly, the problem can be solved by changing k * X * X' to k * (X * X'):

function mwe3(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * (X * X')
    end
    return K
end
@btime mwe3(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);
❯ julia +1.10.5 --project -t 6 mwe.jl 
  805.458 μs (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl
  708.208 μs (0 allocations: 0 bytes)

Since the problem is likely not related to +=, I changed the title accordingly.

@mcabbott
Copy link
Collaborator

Before Julia 1.7, I believe this would have done K += (k * X) * X', which also seems fast now.

After 3-arg *, this calls K += broadcast(*, k, X, X'), which is slow, like mwe1.

Strangely, using K += k .* X .* X' seems to be fast, like mwe3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants