-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement 3D support for P4estMesh
#637
Conversation
Codecov Report
@@ Coverage Diff @@
## main #637 +/- ##
==========================================
+ Coverage 93.40% 93.44% +0.03%
==========================================
Files 161 169 +8
Lines 16002 16503 +501
==========================================
+ Hits 14947 15421 +474
- Misses 1055 1082 +27
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
You should update some 3D code following the changes I made in #638 to get better performance when you're ready for AMR in 3D. |
I had to optimize the AMR code here because the performance was truly terrible (about 100x slower than Here's a benchmark before optimizing:
I rewrote this function to plain loops instead of vectorized multiplications. Then, I put each of the six original operations in an Trixi.jl/src/solvers/dg_curved/containers_3d.jl Lines 76 to 250 in 98781f0
Here's a benchmark. I wasn't able to improve these times any more.
Now, I did a few readability optimizations, which unfortunately slowed the code down by a factor of 4. The first thing I did was to rename the loop variables and make all loops run over - for j in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (5, 4, 5)),
- ii in indices((contravariant_vectors, derivative_matrix), (4, 1)),
- i in indices((contravariant_vectors, node_coordinates, jacobian_matrix), (3, 2, 3))
-
+ for k in eachnode(basis), j in eachnode(basis), i in eachnode(basis)
result = zero(eltype(contravariant_vectors))
- for jj in indices((derivative_matrix, node_coordinates, jacobian_matrix), (2, 3, 4))
- result += 0.5 * derivative_matrix[ii, jj] * (
- node_coordinates[m, i, jj, j, element] * jacobian_matrix[l, 3, i, jj, j, element] -
- node_coordinates[l, i, jj, j, element] * jacobian_matrix[m, 3, i, jj, j, element])
+
+ for ii in eachnode(basis)
+ result += 0.5 * derivative_matrix[j, ii] * (
+ node_coordinates[m, i, ii, k, element] * jacobian_matrix[l, 3, i, ii, k, element] -
+ node_coordinates[l, i, ii, k, element] * jacobian_matrix[m, 3, i, ii, k, element])
end
- contravariant_vectors[n, 1, i, ii, j, element] = result
+ contravariant_vectors[n, 1, i, j, k, element] = result
end This had negligible impact on performance:
The next change had a huge impact on performance. I used to compute the second summand of each vector entry later (because it was the only way when using vectorized multiplications like I did before). Now, I could move these two summands into one loop to make everything a lot more readable: Trixi.jl/src/solvers/dg_curved/containers_3d.jl Lines 85 to 104 in 7dd259b
The last two things I did were to move
which allowed me to then move all three loops into one outer loop over
@ranocha, do you have any ideas how I can get the performance of the first version with the readability I have now (i.e., with both summands computed right next to each other)? |
Not really. The memory access pattern is significantly less regular/localized when you try to pack both summands in one loop, so it's expected that this impacts the performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this! Please find below some comments and questions.
Co-authored-by: Hendrik Ranocha <[email protected]>
Co-authored-by: Hendrik Ranocha <[email protected]>
Co-authored-by: Hendrik Ranocha <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very impressive work, @efaulhaber! I have a few comments and questions, but overall I don't see any major issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for adding all changes!
This PR implements all 2D features of
P4estMesh
in 3D (eighth point of #584).The only thing that's not working yet is FSP on a non-conforming mesh, but I'm afraid that requires a bit more work. I'll do this in another PR.