improve performance of `apply!` by rewriting the part that zeros out rows #489

KristofferC · 2022-09-27T10:51:24Z

As a consequence the strategy kwarg is no longer useful.

Benchmark code:

using Ferrite

grid = generate_grid(Hexahedron, (100, 100, 100));
dim = 3
ip = Lagrange{dim, RefCube, 1}()
qr = QuadratureRule{dim, RefCube}(2)
cellvalues = CellScalarValues(qr, ip);
dh = DofHandler(grid)
push!(dh, :u, 1)
close!(dh);
K = create_sparsity_pattern(dh)
ch = ConstraintHandler(dh);
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
add!(ch, dbc);
close!(ch)
update!(ch, 0.0);
f = zeros(size(K, 1))

using BenchmarkTools
@btime apply_zero!(K, f, ch)

julia> @btime apply_zero!(K, f, ch) # master
  692.526 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch)
  687.897 ms (4 allocations: 544 bytes)

Time is approximately the same, but we no longer need to keep two copies of the stiffness matrix in memory at the same time.
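To illustrate what zeroing rows without a second copy of the matrix can look like, here is a minimal sketch on a plain `SparseMatrixCSC`. It is not the code from this PR: `zero_rows!` and `is_prescribed` are hypothetical names, and the sketch allocates a lookup mask, so it will not reproduce the allocation figures above.

```julia
using SparseArrays

# Hypothetical helper (not part of Ferrite): zero every stored entry that
# lies in one of `rows`, writing directly into the matrix' value array.
function zero_rows!(K::SparseMatrixCSC, rows::AbstractVector{<:Integer})
    is_prescribed = falses(size(K, 1))   # one bit per row
    is_prescribed[rows] .= true
    rv = rowvals(K)                      # row index of each stored entry
    nz = nonzeros(K)                     # stored values, same ordering as rv
    @inbounds for i in eachindex(rv)
        if is_prescribed[rv[i]]
            nz[i] = zero(eltype(nz))
        end
    end
    return K
end

K = sparse([1.0 2.0 0.0; 3.0 4.0 5.0; 0.0 6.0 7.0])
zero_rows!(K, [2])
```

The sparsity pattern is left untouched; only the stored values in the selected rows are overwritten, so no transposed or copied matrix is needed.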

KristofferC · 2022-09-27T18:09:55Z

Could someone just verify the performance difference?

koehlerson · 2022-09-27T19:15:41Z

julia> @btime apply_zero!(K, f, ch) #master
  340.741 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch) #kc/row_zero
  408.249 ms (4 allocations: 544 bytes)

KnutAM · 2022-09-28T00:04:08Z

498.754 ms (16 allocations: 423.98 MiB) # master
665.802 ms (4 allocations: 544 bytes) # kc/row_zero
145.603 ms (4 allocations: 544 bytes) # kam/both_zero

Perhaps you already discussed why not storing the indices of nzvals directly today, but at least it seems faster :)

The downside is that creating the constraint handler is slower, even when giving the matrix to close! for added performance (it needs K internally to find the indices; this is also possible directly from dh, but was too much work :))

138.396 ms (120174 allocations: 77.98 MiB) # master
143.173 ms (120163 allocations: 80.44 MiB) # kc/row_zero
713.404 ms (160598 allocations: 107.49 MiB) # kam/both_zero (with `close!(ch, K)`)
1.824 s (160622 allocations: 2.96 GiB) # kam/both_zero (with `close!(ch)`)

(For the close!(ch) case, about 1 GB of allocations and 0.3 s can be saved by filling with a singleton type instead of Float64)
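The idea of storing positions in the value array can be sketched as follows. This is an illustration only, not the kam/both_zero branch: `prescribed_nz_indices` is a hypothetical helper, and it handles rows only, whereas the branch name suggests that rows and columns are both treated.

```julia
using SparseArrays

# Hypothetical helper: positions in nonzeros(K) whose row is prescribed.
# This is the expensive part, which would be done once up front.
function prescribed_nz_indices(K::SparseMatrixCSC, rows)
    mask = falses(size(K, 1))
    mask[rows] .= true
    return findall(r -> mask[r], rowvals(K))
end

K = sparse([1.0 2.0 0.0; 3.0 4.0 5.0; 0.0 6.0 7.0])
idx = prescribed_nz_indices(K, [2])   # cached index list
nonzeros(K)[idx] .= 0.0               # cheap step, repeatable after each assembly
```

The trade-off matches the timings above: building `idx` costs time and memory once, after which each application only touches the cached positions.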

KristofferC · 2022-09-28T06:54:37Z

> Perhaps you already discussed why not storing the indices of nzvals directly today, but at least it seems faster :)

Then you are storing approx the equivalent memory of half a stiffness matrix there which maybe is too much.

KristofferC · 2022-09-28T06:59:42Z

It is interesting we get quite different benchmarks. On my desktop with a pretty beefy CPU (i9-12900K) I get:

julia> @btime apply_zero!(K, f, ch) # master
  407.463 ms (16 allocations: 423.98 MiB)

julia> @btime apply_zero!(K, f, ch) # PR
  369.013 ms (4 allocations: 544 bytes)

If I slap a Threads.@threads on it and run with 8 cores I get

julia> @btime apply_zero!(K, f, ch)
  168.278 ms (53 allocations: 4.89 KiB)
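A threaded variant can be sketched by looping over columns, since each column owns a disjoint range of the value array and the writes therefore never overlap. This is an assumption about how such a loop could be parallelized, not the code that was benchmarked; `zero_rows_threaded!` is a hypothetical name.

```julia
using SparseArrays

# Hypothetical helper: zero stored entries in prescribed rows, one column per task.
function zero_rows_threaded!(K::SparseMatrixCSC, is_prescribed::AbstractVector{Bool})
    rv = rowvals(K)
    nz = nonzeros(K)
    Threads.@threads for col in 1:size(K, 2)
        # nzrange(K, col) is the slice of rv/nz that belongs to this column
        for i in nzrange(K, col)
            if is_prescribed[rv[i]]
                nz[i] = zero(eltype(nz))
            end
        end
    end
    return K
end

Kt = sparse([1.0 2.0 0.0; 3.0 4.0 5.0; 0.0 6.0 7.0])
zero_rows_threaded!(Kt, [false, true, false])
```

Start Julia with several threads (for example `julia -t 8`) for the loop to actually run in parallel; with a single thread it behaves like the serial loop.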

KnutAM · 2022-09-28T07:40:52Z

> > Perhaps you already discussed why not storing the indices of nzvals directly today, but at least it seems faster :)
>
> Then you are storing approx the equivalent memory of half a stiffness matrix there which maybe is too much.

Perhaps I misunderstand here, but wouldn't it be (on average) 2*num_prescribed_dofs*bandwidth (and only Ints)?
(But it was more for fun, I suppose this will always be dominated by linear solve and your solution is less invasive)
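For a rough sense of scale, the estimate can be evaluated for the benchmark grid from the first comment. The numbers below are a back-of-the-envelope sketch under stated assumptions: a structured grid of trilinear hexahedra with one scalar dof per node, at most 27 stored entries per row standing in for the "bandwidth", 64-bit `Int`, and the four face sets forming two pairs of opposite faces.

```julia
n = 101                        # nodes per direction for 100 elements
ndofs = n^3                    # one scalar dof per node

# Each node couples to its 3x3x3 neighborhood, clipped at the boundary,
# which gives (3n - 2)^3 stored entries for the whole matrix.
nnz_K = (3n - 2)^3

# Two pairs of opposite faces; the pairs share four edges of n nodes each.
nprescribed = 4n^2 - 4n

max_row_entries = 27           # assumed upper bound on entries per row
cache_bytes = 2 * nprescribed * max_row_entries * sizeof(Int)

# CSC storage: values + row indices per entry, plus the column pointers.
matrix_bytes = nnz_K * (sizeof(Float64) + sizeof(Int)) + (ndofs + 1) * sizeof(Int)

println("index cache ≈ ", round(cache_bytes / 2^20; digits = 2), " MiB")
println("sparse K    ≈ ", round(matrix_bytes / 2^20; digits = 2), " MiB")
```

Under these assumptions the index cache is a small fraction of the matrix itself, and the matrix estimate lands close to the 423.98 MiB allocation reported for master.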

KristofferC · 2022-09-28T08:51:37Z

Yeah, I'm wrong; you only need the mapping for the constrained dofs (of course). Caching this mapping in the constraint handler makes sense to me, I think; it would just move a part of the logic from the new function here to there. I guess the only drawback is that we typically do not pass the stiffness matrix to the constraint handler when it is created...

koehlerson · 2022-09-28T11:18:26Z

> It is interesting we get quite different benchmarks. On my desktop with a pretty beefy CPU (i9-12900K) I get:
>
> julia> @btime apply_zero!(K, f, ch) # master
>   407.463 ms (16 allocations: 423.98 MiB)
>
> julia> @btime apply_zero!(K, f, ch) # PR
>   369.013 ms (4 allocations: 544 bytes)
>
> If I slap a Threads.@threads on it and run with 8 cores I get
>
> julia> @btime apply_zero!(K, f, ch)
>   168.278 ms (53 allocations: 4.89 KiB)

I used my laptop with a 12th Gen i5-1240P, but I can retry with threads if you want.

termi-official · 2022-09-28T11:41:24Z

Probably a stupid question, but if the performance of applying (affine) constraints is of concern, why don't we apply the constraints at the element level as in deal.II? (see e.g. https://www.dealii.org/current/doxygen/deal.II/classAffineConstraints.html#a373fbdacd8c486e675b8d2bff8943192 and https://www.dealii.org/current/doxygen/deal.II/step_27.html#Creatingthesparsitypattern)

lijas · 2022-09-28T12:12:45Z

> Probably stupid question, but if the performance of applying (affine) constraints is of concern, why don't we apply the constraints on element-level as in deal.ii?

Why not both :D

I think this PR can be merged, even though it seems to be a bit slower in @koehlerson's benchmarks.

fredrikekre · 2022-09-29T20:09:15Z

master:

@btime apply_zero!($K, $f, $ch)
794.046 ms (16 allocations: 423.98 MiB)

kc/row_zero:

@btime apply_zero!($K, $f, $ch)
643.115 ms (4 allocations: 544 bytes)

KristofferC · 2022-09-29T20:10:23Z

I should probably get rid of the identity hashing thing. It is a bit unclear whether that is optimizing for the specific benchmark.

This patch improves the performance of apply! and apply_zero! by rewriting the part that zeros out rows of the matrix. As a result, the `strategy` keyword argument is obsolete and thus ignored.

Co-authored-by: Kristoffer Carlsson <[email protected]>
Co-authored-by: Fredrik Ekre <[email protected]>

codecov-commenter · 2022-10-26T12:45:26Z

Codecov Report

Base: 92.20% // Head: 92.28% // Increases project coverage by +0.07% 🎉

Coverage data is based on head (2c261c0) compared to base (f3057f3).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #489      +/-   ##
==========================================
+ Coverage   92.20%   92.28%   +0.07%     
==========================================
  Files          22       22              
  Lines        3783     3783              
==========================================
+ Hits         3488     3491       +3     
+ Misses        295      292       -3

Impacted Files	Coverage Δ
src/Dofs/ConstraintHandler.jl	`95.08% <100.00%> (+0.35%)`	⬆️

KristofferC force-pushed the kc/row_zero branch from 22a73b6 to 1c883e7 on September 27, 2022 11:26

fredrikekre approved these changes Sep 27, 2022

fredrikekre force-pushed the kc/row_zero branch from 1c883e7 to 2c261c0 on October 26, 2022 12:44

fredrikekre merged commit 17f993a into master Oct 26, 2022

fredrikekre deleted the kc/row_zero branch October 26, 2022 13:06

KnutAM added a commit that referenced this pull request Jul 31, 2024

remove ApplyStrategy enums (#1013)

f7f2592

ApplyStrategy was deprecated in #489, but kept for backwards compat (having no effect). This PR removes them completely.
