Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement 3D support for P4estMesh #637

Merged
merged 46 commits into from
Jun 24, 2021
Merged

Conversation

efaulhaber
Copy link
Member

@efaulhaber efaulhaber commented Jun 10, 2021

This PR implements all 2D features of P4estMesh in 3D (eighth point of #584).
The only thing that's not working yet is FSP on a non-conforming mesh, but I'm afraid that requires a bit more work. I'll do this in another PR.

@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #637 (8a34799) into main (b2d0d98) will increase coverage by 0.03%.
The diff coverage is 93.70%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #637      +/-   ##
==========================================
+ Coverage   93.40%   93.44%   +0.03%     
==========================================
  Files         161      169       +8     
  Lines       16002    16503     +501     
==========================================
+ Hits        14947    15421     +474     
- Misses       1055     1082      +27     
Flag Coverage Δ
unittests 93.44% <93.70%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...2d/elixir_advection_amr_p4est_unstructured_flag.jl 100.00% <ø> (ø)
...s/2d/elixir_advection_p4est_non_conforming_flag.jl 100.00% <ø> (ø)
...dvection_p4est_non_conforming_flag_unstructured.jl 100.00% <ø> (ø)
examples/2d/elixir_euler_free_stream_p4est.jl 100.00% <ø> (ø)
src/Trixi.jl 83.33% <ø> (ø)
src/auxiliary/auxiliary.jl 90.00% <ø> (+3.33%) ⬆️
src/callbacks_step/amr_dg2d.jl 98.69% <ø> (-0.16%) ⬇️
src/solvers/dg_curved/dg_3d.jl 97.20% <ø> (ø)
src/solvers/dg_tree/dg.jl 87.96% <ø> (ø)
src/solvers/dg_tree/dg_3d.jl 97.48% <ø> (ø)
... and 30 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2d0d98...8a34799. Read the comment docs.

@ranocha
Copy link
Member

ranocha commented Jun 12, 2021

You should update some 3D code following the changes I made in #638 to get better performance when you're ready for AMR in 3D.

@efaulhaber
Copy link
Member Author

efaulhaber commented Jun 14, 2021

I had to optimize the AMR code here because the performance was truly terrible (about 100x slower than TreeMesh, while allocating multiple GB). The core problem here was calc_contravariant_vectors! of CurvedMesh.

Here's a benchmark before optimizing:

julia> @benchmark Trixi.calc_contravariant_vectors!($semi.cache.elements.contravariant_vectors, 1, $semi.cache.elements.jacobian_matrix, $semi.cache.elements.node_coordinates, $solver.basis)
BenchmarkTools.Trial: 
  memory estimate:  137.25 KiB
  allocs estimate:  1008
  --------------
  minimum time:     64.400 μs (0.00% GC)
  median time:      96.800 μs (0.00% GC)
  mean time:        98.911 μs (8.82% GC)
  maximum time:     11.350 ms (99.19% GC)
  --------------
  samples:          10000
  evals/sample:     1

I rewrote this function to plain loops instead of vectorized multiplications. Then, I put each of the six original operations in an @turbo loop, like this (the old code is just commented out):

# The general form is
# Jaⁱₙ = 0.5 * ( ∇ × (Xₘ ∇ Xₗ - Xₗ ∇ Xₘ) )ᵢ where (n, m, l) cyclic and ∇ = (∂/∂ξ, ∂/∂η, ∂/∂ζ)ᵀ
# # Calculate the first summand of the cross product in each dimension
# for n in 1:3, j in eachnode(basis), i in eachnode(basis)
# # (n, m, l) cyclic
# m = (n % 3) + 1
# l = ((n + 1) % 3) + 1
# # Calc only the first summand 0.5 * (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_η of
# # Ja¹ₙ = 0.5 * [ (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_η - (Xₘ Xₗ_η - Xₗ Xₘ_η)_ζ ]
# @views contravariant_vectors[n, 1, i, :, j, element] = 0.5 * basis.derivative_matrix * (
# node_coordinates[m, i, :, j, element] .* jacobian_matrix[l, 3, i, :, j, element] .-
# node_coordinates[l, i, :, j, element] .* jacobian_matrix[m, 3, i, :, j, element])
# # Calc only the first summand 0.5 * (Xₘ Xₗ_ξ - Xₗ Xₘ_ξ)_ζ of
# # Ja²ₙ = 0.5 * [ (Xₘ Xₗ_ξ - Xₗ Xₘ_ξ)_ζ - (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_ξ ]
# @views contravariant_vectors[n, 2, i, j, :, element] = 0.5 * basis.derivative_matrix * (
# node_coordinates[m, i, j, :, element] .* jacobian_matrix[l, 1, i, j, :, element] .-
# node_coordinates[l, i, j, :, element] .* jacobian_matrix[m, 1, i, j, :, element])
# # Calc only the first summand 0.5 * (Xₘ Xₗ_η - Xₗ Xₘ_η)_ξ of
# # Ja³ₙ = 0.5 * [ (Xₘ Xₗ_η - Xₗ Xₘ_η)_ξ - (Xₘ Xₗ_ξ - Xₗ Xₘ_ξ)_η ]
# @views contravariant_vectors[n, 3, :, i, j, element] = 0.5 * basis.derivative_matrix * (
# node_coordinates[m, :, i, j, element] .* jacobian_matrix[l, 2, :, i, j, element] .-
# node_coordinates[l, :, i, j, element] .* jacobian_matrix[m, 2, :, i, j, element])
# end
@turbo for n in 1:3
# (n, m, l) cyclic
m = (n % 3) + 1
l = ((n + 1) % 3) + 1
for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (5, 4, 5)),
ii in indices((contravariant_vectors, derivative_matrix), (4, 1)),
i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (3, 2, 3))
result = zero(eltype(contravariant_vectors))
for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 3, 4))
result += 0.5 * derivative_matrix[ii, jj] * (
node_coordinates[m, i, jj, j, element] * jacobian_matrix[l, 3, i, jj, j, element] -
node_coordinates[l, i, jj, j, element] * jacobian_matrix[m, 3, i, jj, j, element])
end
contravariant_vectors[n, 1, i, ii, j, element] = result
end
end
@turbo for n in 1:3
# (n, m, l) cyclic
m = (n % 3) + 1
l = ((n + 1) % 3) + 1
for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (4, 3, 4)),
ii in indices((contravariant_vectors, derivative_matrix), (5, 1)),
i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (3, 2, 3))
result = zero(eltype(contravariant_vectors))
for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 4, 5))
result += 0.5 * derivative_matrix[ii, jj] * (
node_coordinates[m, i, j, jj, element] * jacobian_matrix[l, 1, i, j, jj, element] -
node_coordinates[l, i, j, jj, element] * jacobian_matrix[m, 1, i, j, jj, element])
end
contravariant_vectors[n, 2, i, j, ii, element] = result
end
end
@turbo for n in 1:3
# (n, m, l) cyclic
m = (n % 3) + 1
l = ((n + 1) % 3) + 1
for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (5, 4, 5)),
ii in indices((contravariant_vectors, derivative_matrix), (3, 1)),
i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (4, 3, 4))
result = zero(eltype(contravariant_vectors))
for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 2, 3))
result += 0.5 * derivative_matrix[ii, jj] * (
node_coordinates[m, jj, i, j, element] * jacobian_matrix[l, 2, jj, i, j, element] -
node_coordinates[l, jj, i, j, element] * jacobian_matrix[m, 2, jj, i, j, element])
end
contravariant_vectors[n, 3, ii, i, j, element] = result
end
end
# Calculate the second summand of the cross product in each dimension
# for n in 1:3, j in eachnode(basis), i in eachnode(basis)
# # (n, m, l) cyclic
# m = (n % 3) + 1
# l = ((n + 1) % 3) + 1
# # Calc only the second summand -0.5 * (Xₘ Xₗ_η - Xₗ Xₘ_η)_ζ of
# # Ja¹ₙ = 0.5 * [ (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_η - (Xₘ Xₗ_η - Xₗ Xₘ_η)_ζ ]
# @views contravariant_vectors[n, 1, i, j, :, element] -= 0.5 * basis.derivative_matrix * (
# node_coordinates[m, i, j, :, element] .* jacobian_matrix[l, 2, i, j, :, element] .-
# node_coordinates[l, i, j, :, element] .* jacobian_matrix[m, 2, i, j, :, element])
# # Calc only the second summand -0.5 * (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_ξ of
# # Ja²ₙ = 0.5 * [ (Xₘ Xₗ_ξ - Xₗ Xₘ_ξ)_ζ - (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_ξ ]
# @views contravariant_vectors[n, 2, :, i, j, element] -= 0.5 * basis.derivative_matrix * (
# node_coordinates[m, :, i, j, element] .* jacobian_matrix[l, 3, :, i, j, element] .-
# node_coordinates[l, :, i, j, element] .* jacobian_matrix[m, 3, :, i, j, element])
# # Calc only the second summand -0.5 * (Xₘ Xₗ_ξ - Xₗ Xₘ_ξ)_η of
# # Ja³ₙ = 0.5 * [ (Xₘ Xₗ_η - Xₗ Xₘ_η)_ξ - (Xₘ Xₗ_ξ - Xₗ Xₘ_ξ)_η ]
# @views contravariant_vectors[n, 3, i, :, j, element] -= 0.5 * basis.derivative_matrix * (
# node_coordinates[m, i, :, j, element] .* jacobian_matrix[l, 1, i, :, j, element] .-
# node_coordinates[l, i, :, j, element] .* jacobian_matrix[m, 1, i, :, j, element])
# end
@turbo for n in 1:3
# (n, m, l) cyclic
m = (n % 3) + 1
l = ((n + 1) % 3) + 1
for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (4, 3, 4)),
ii in indices((contravariant_vectors, derivative_matrix), (5, 1)),
i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (3, 2, 3))
result = zero(eltype(contravariant_vectors))
for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 4, 5))
result += 0.5 * derivative_matrix[ii, jj] * (
node_coordinates[m, i, j, jj, element] * jacobian_matrix[l, 2, i, j, jj, element] -
node_coordinates[l, i, j, jj, element] * jacobian_matrix[m, 2, i, j, jj, element])
end
contravariant_vectors[n, 1, i, j, ii, element] -= result
end
end
@turbo for n in 1:3
# (n, m, l) cyclic
m = (n % 3) + 1
l = ((n + 1) % 3) + 1
for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (5, 4, 5)),
ii in indices((contravariant_vectors, derivative_matrix), (3, 1)),
i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (4, 3, 4))
result = zero(eltype(contravariant_vectors))
for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 2, 3))
result += 0.5 * derivative_matrix[ii, jj] * (
node_coordinates[m, jj, i, j, element] * jacobian_matrix[l, 3, jj, i, j, element] -
node_coordinates[l, jj, i, j, element] * jacobian_matrix[m, 3, jj, i, j, element])
end
contravariant_vectors[n, 2, ii, i, j, element] -= result
end
end
@turbo for n in 1:3
# (n, m, l) cyclic
m = (n % 3) + 1
l = ((n + 1) % 3) + 1
for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (5, 4, 5)),
ii in indices((contravariant_vectors, derivative_matrix), (4, 1)),
i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (3, 2, 3))
result = zero(eltype(contravariant_vectors))
for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 3, 4))
result += 0.5 * derivative_matrix[ii, jj] * (
node_coordinates[m, i, jj, j, element] * jacobian_matrix[l, 1, i, jj, j, element] -
node_coordinates[l, i, jj, j, element] * jacobian_matrix[m, 1, i, jj, j, element])
end
contravariant_vectors[n, 3, i, ii, j, element] -= result
end
end
return contravariant_vectors

Here's a benchmark. I wasn't able to improve these times any more.

julia> @benchmark Trixi.calc_contravariant_vectors!($semi.cache.elements.contravariant_vectors, 1, $semi.cache.elements.jacobian_matrix, $semi.cache.elements.node_coordinates, $solver.basis)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.100 μs (0.00% GC)
  median time:      2.144 μs (0.00% GC)
  mean time:        2.156 μs (0.00% GC)
  maximum time:     4.311 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

Now, I did a few readability optimizations, which unfortunately slowed the code down by a factor of 4.
I'll document exactly what I did, so you don't have to check out each commit to do a benchmark.

The first thing I did was to rename the loop variables and make all loops run over eachnodes(basis) (the latter also prepared for the following readability optimizations):

-    for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (5, 4, 5)),
-        ii in indices((contravariant_vectors, derivative_matrix), (4, 1)),
-        i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (3, 2, 3))
-
+    for k in eachnode(basis), j in eachnode(basis), i in eachnode(basis)
       result = zero(eltype(contravariant_vectors))
-      for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 3, 4))
-        result += 0.5 * derivative_matrix[ii, jj] * (
-          node_coordinates[m, i, jj, j, element] * jacobian_matrix[l, 3, i, jj, j, element] -
-          node_coordinates[l, i, jj, j, element] * jacobian_matrix[m, 3, i, jj, j, element])
+
+      for ii in eachnode(basis)
+        result += 0.5 * derivative_matrix[j, ii] * (
+          node_coordinates[m, i, ii, k, element] * jacobian_matrix[l, 3, i, ii, k, element] -
+          node_coordinates[l, i, ii, k, element] * jacobian_matrix[m, 3, i, ii, k, element])
       end
 
-      contravariant_vectors[n, 1, i, ii, j, element] = result
+      contravariant_vectors[n, 1, i, j, k, element] = result
     end

This had negligible impact on performance:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.167 μs (0.00% GC)
  median time:      2.222 μs (0.00% GC)
  mean time:        2.225 μs (0.00% GC)
  maximum time:     4.667 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

The next change had a huge impact on performance. I used to compute the second summand of each vector entry later (because it was the only way when using vectorized multiplications like I did before). Now, I could move these two summands into one loop to make everything a lot more readable:

for k in eachnode(basis), j in eachnode(basis), i in eachnode(basis)
result = zero(eltype(contravariant_vectors))
for ii in eachnode(basis)
# First summand 0.5 * (Xₘ Xₗ_ζ - Xₗ Xₘ_ζ)_η
# Multiply derivative_matrix to j-dimension to differentiate wrt η
result += 0.5 * derivative_matrix[j, ii] * (
node_coordinates[m, i, ii, k, element] * jacobian_matrix[l, 3, i, ii, k, element] -
node_coordinates[l, i, ii, k, element] * jacobian_matrix[m, 3, i, ii, k, element])
# Second summand -0.5 * (Xₘ Xₗ_η - Xₗ Xₘ_η)_ζ
# Multiply derivative_matrix to k-dimension to differentiate wrt ζ
result -= 0.5 * derivative_matrix[k, ii] * (
node_coordinates[m, i, j, ii, element] * jacobian_matrix[l, 2, i, j, ii, element] -
node_coordinates[l, i, j, ii, element] * jacobian_matrix[m, 2, i, j, ii, element])
end
contravariant_vectors[n, 1, i, j, k, element] = result
end
end

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.233 μs (0.00% GC)
  median time:      8.400 μs (0.00% GC)
  mean time:        8.466 μs (0.00% GC)
  maximum time:     16.233 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3

The last two things I did were to move @turbo to the inner loop,

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.100 μs (0.00% GC)
  median time:      8.267 μs (0.00% GC)
  mean time:        8.334 μs (0.00% GC)
  maximum time:     17.733 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3

which allowed me to then move all three loops into one outer loop over n:

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.133 μs (0.00% GC)
  median time:      8.333 μs (0.00% GC)
  mean time:        8.380 μs (0.00% GC)
  maximum time:     16.933 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     3

@ranocha, do you have any ideas how I can get the performance of the first version with the readability I have now (i.e., with both summands computed right next to each other)?

src/mesh/p4est_mesh.jl Outdated Show resolved Hide resolved
@ranocha
Copy link
Member

ranocha commented Jun 15, 2021

do you have any ideas how I can get the performance of the first version with the readability I have now (i.e., with both summands computed right next to each other)?

Not really. The memory access pattern is significantly less regular/localized when you try to pack both summands in one loop, so it's expected that this impacts the performance.

src/auxiliary/p4est.jl Outdated Show resolved Hide resolved
Copy link
Member

@ranocha ranocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this! Please find below some comments and questions.

examples/3d/elixir_advection_amr_p4est.jl Outdated Show resolved Hide resolved
examples/3d/elixir_advection_basic_p4est.jl Outdated Show resolved Hide resolved
examples/3d/elixir_advection_p4est_non_conforming.jl Outdated Show resolved Hide resolved
examples/3d/elixir_euler_free_stream_p4est.jl Outdated Show resolved Hide resolved
src/auxiliary/p4est.jl Outdated Show resolved Hide resolved
src/mesh/p4est_mesh.jl Show resolved Hide resolved
src/mesh/p4est_mesh.jl Outdated Show resolved Hide resolved
src/solvers/dg_p4est/containers_2d.jl Outdated Show resolved Hide resolved
test/test_examples_3d_p4est.jl Outdated Show resolved Hide resolved
Copy link
Member

@sloede sloede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very impressive work, @efaulhaber! I have a few comments and questions, but overall I don't see any major issues.

src/solvers/dg_p4est/dg_2d.jl Show resolved Hide resolved
src/callbacks_step/amr.jl Show resolved Hide resolved
src/callbacks_step/amr.jl Show resolved Hide resolved
src/solvers/dg_p4est/dg_3d.jl Outdated Show resolved Hide resolved
src/solvers/dg_p4est/dg_3d.jl Outdated Show resolved Hide resolved
src/solvers/dg_p4est/containers.jl Outdated Show resolved Hide resolved
src/solvers/dg_p4est/containers_3d.jl Show resolved Hide resolved
src/solvers/dg_p4est/containers_3d.jl Outdated Show resolved Hide resolved
src/solvers/dg_p4est/dg_3d.jl Outdated Show resolved Hide resolved
@efaulhaber efaulhaber requested a review from sloede June 23, 2021 13:44
Copy link
Member

@sloede sloede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding all changes!

@ranocha ranocha merged commit e6aa4bd into trixi-framework:main Jun 24, 2021
@efaulhaber efaulhaber deleted the p4est-3d branch September 25, 2021 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants