Vectorization of RANS2P2D #1232
base: main
Conversation
Codecov Report

@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
+ Coverage   47.56%   52.74%   +5.18%
==========================================
  Files          90      531     +441
  Lines       71776   109533   +37757
==========================================
+ Hits        34140    57777   +23637
- Misses      37636    51756   +14120

Continue to review the full report at Codecov.
@cekees @zhang-alvin @tridelat This is ready for a review. I haven't replaced the

Can you confirm that this does not hurt performance?
I ran a CutFEM-based 2D case using this branch and the master branch. There was no major difference in the running times.
Nice work! @jhcollins you might have some parallel jobs set up where you could do a timing comparison as well. My allocations on HPC are not ready yet, but I'll test some compute-intensive jobs on macOS and Linux. @JohanMabille did you make this conversion by hand or did you write a Python script? If via script, it would be nice if you could add it to the scripts directory for future use.
I did this one by hand because I wanted to see if I could add other simplifications (like replacing initialization loops). I can work on a Python script for the other files.
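For illustration, here is a minimal sketch of the kind of rewrite under discussion, assuming xtensor-style containers; the function and array names are hypothetical and not taken from the actual RANS2P2D kernels:

    #include <xtensor/xtensor.hpp>
    #include <xtensor/xmath.hpp>
    #include <xtensor/xnoalias.hpp>

    // Hand-written form: explicit loops over raw storage.
    void update_mass_loop(const double* vf, double rho, double* m,
                          double* elementResidual, int n)
    {
        for (int i = 0; i < n; ++i)
            elementResidual[i] = 0.0;   // initialization loop
        for (int k = 0; k < n; ++k)
            m[k] = rho * vf[k];         // element-wise update loop
    }

    // Vectorized form: each loop collapses to one array statement.
    void update_mass_vectorized(const xt::xtensor<double, 1>& vf, double rho,
                                xt::xtensor<double, 1>& m,
                                xt::xtensor<double, 1>& elementResidual)
    {
        elementResidual.fill(0.0);  // replaces the initialization loop
        xt::noalias(m) = rho * vf;  // replaces the element-wise loop
    }

Collapsing loops into whole-array statements like this is also the kind of mechanical pattern a conversion script could automate for the remaining files.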
@cekees do you want the parallel timing comparison using a mesh-conforming setup or CutFEM like Alvin's?
Sorry, just saw this. I think we need to verify the performance on something we've run a lot, and load up the cores with mesh nodes. Maybe a dambreak or one of your wave flume simulations, run at 2 or 3 core counts so you get roughly 1000, 2000, and 4000 vertices per core. In 2D you can likely go as high as 20,000 vertices per core.

If you run it with --profiling you should get a list of the top 20 functions. Typically the residual and Jacobian for RANS2P make it onto the list: the PETSc solve and preconditioner setup are the top costs, in the 80-90% range, and below those we should see the calculateResidual and calculateJacobian functions. If you have a go-to FSI simulation, like a floating caisson with ALE, that would be handy because it exercises more of the functionality.
My timings are looking great @JohanMabille. I'll merge this tomorrow once a few large jobs have run on HPC platforms from Cray and SGI and I've confirmed that the results are identical and the timings equivalent. So far I see some cases where the new implementation appears faster, but that may just be load fluctuation (though these tests are done on dedicated nodes).
@JohanMabille and @jhcollins I verified that the numerical results are essentially identical on a 2D dambreak (two-phase) and a 2D FSI case (two-phase with mesh deformation/ALE). There are some differences on the order of 1e-22, which I suspect come from the compiler taking different paths at the aggressive -O3 optimization level. For both a "standard load" of 2000 vertices per core and a heavier load of about 10,000 vertices per core, the new indexing is actually slightly faster. @jhcollins let me know if you are able to identify the issue where you found non-trivial differences in the computed solutions. I tested on a Cray XC40 with GNU 7.3.0 compilers.
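As a sketch of the kind of regression check described above (not the actual test harness; array names and the tolerance are assumptions), one could compare the two branches' solution vectors with an absolute tolerance that comfortably covers differences on the order of 1e-22:

    #include <xtensor/xtensor.hpp>
    #include <xtensor/xmath.hpp>

    // Returns true if the vectorized branch reproduces the reference
    // solution to within a tight absolute tolerance. rtol is set to 0
    // so only the absolute difference matters.
    bool solutions_match(const xt::xtensor<double, 1>& u_reference,
                         const xt::xtensor<double, 1>& u_vectorized)
    {
        return xt::allclose(u_reference, u_vectorized, 0.0, 1e-12);
    }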
Looks great! 👍