replace pthreads with openmp #37
Changes Unknown when pulling 1843574 on mode89:feat/openmp into UCI-CARL:master. |
Why do you want to replace pthreads with OpenMP? pthreads is a lower-level multithreading API that lets us hard-assign threads to cores, which is something we want. Create two new branches:
Run benchmark4 on both branches using multicore machines (1, 4, 8, 16, 32 cores) and share the comparison. |
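For readers unfamiliar with what "hard-assigning threads to cores" looks like, here is a minimal sketch of pinning the calling thread to one core on Linux with glibc. The function name `pinCurrentThreadToCore` is hypothetical and is not taken from CARLsim; `pthread_setaffinity_np` is a GNU extension, so the sketch simply reports failure on other platforms.

```cpp
#ifdef __linux__
#include <pthread.h>
#include <sched.h>
#endif

// Hypothetical helper, not CARLsim code: pin the calling thread to `core`.
// Returns true only if the affinity mask was applied.
bool pinCurrentThreadToCore(int core) {
#ifdef __linux__
    // Reject indices that cannot be represented in a cpu_set_t.
    if (core < 0 || core >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);       // start with an empty mask
    CPU_SET(core, &set);  // allow exactly one core
    // pthread_setaffinity_np returns 0 on success and an error number otherwise.
    return pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set) == 0;
#else
    (void)core;           // no portable equivalent; report failure
    return false;
#endif
}
```

A worker thread would typically call this once at start-up, before entering its simulation loop. On older glibc versions the program may need to be linked with `-pthread`.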
Hi, @hkashyap. I'll do benchmarking. I did the replacement because:
|
Hi @hkashyap, I ran benchmark4 on a machine with 8 logical cores. To keep the run time reasonable, I ran the simulations with 300 and 400 synapses only. Here is the run_benchmark4 script that I used. Here is the summary:
OpenMP: Output and record.csv files.
pthreads: Output and record.csv files.
I've created the branches feat/benchmark-openmp and feat/benchmark-pthreads. |
Hi Andrew @mode89, |
Hi Ting-Shuo @tingshuc. Yeah, the results are interesting. On Saturday I ran some simulations on a 4-core machine and, as far as I remember, the OpenMP implementation outperformed pthreads. Later I will run the benchmark on that machine and share the results with you. I've implemented building with OpenMP in CMake only. If you plan to build with Make, you will probably have to enable OpenMP support by passing some additional compiler/linker flags. If you use GCC and the default GNU OpenMP implementation, then the compiler flag should be Thank you for the acknowledgment. I can be contacted via [email protected] |
Hi @mode89 thank you for the benchmark comparison. We will double check them using 500 and 600 synapses and with more cores (on a cluster) when we get some time. We will definitely do an analysis. |
Speaking of clusters, we run these multicore simulations on clusters with many nodes, not on one node with many cores. Using OpenMP would restrict us to a single node. If we really need to use many nodes in an HPC setting, don't we need something like MPI? |
Actually we need OpenMP + MPI. If this is something @mode89 would help, we can jump into CARLsim5. |
Hey guys. I've never worked with MPI, but it sounds interesting and I'm keen on helping you with this. Hirak @hkashyap, yes, you are right that OpenMP is bound to a single node, but I think it can work in conjunction with MPI: MPI launches one process per node, and each process uses OpenMP to parallelize its work across all of that node's CPUs. |
Here are the results of running benchmark4 on Intel Core i5-4210U with 4 logical cores.
Output for OpenMP and pthreads |
Hi @mode89 what do you mean by running 16 core simulations using 4 logical cores? You need 16 physical cores (on one node in case of OpenMP) to run the SNN simulations on 16 cores. |
Hi @hkashyap, I meant the 4 logical cores of the CPU. The word "cores" in the table means the number that is passed to the benchmark executable: it defines the number of partitions used. In the run_benchmark4 script it's referred to as "number of cores". I've replaced the word "Cores" with "Partitions" in the tables above. |
My personal opinion on these numbers is that the difference is minor, and the more neurons we have, the smaller the difference gets. Actually, pthreads is quite low-level, and I believe a pthread-based CARLsim can be optimized to match the performance of OpenMP. I think GNU's OpenMP is even based on pthreads under the hood, because libgomp is linked against libpthread. My main concerns were readability and maintainability. The pthread-based implementation requires more code, more variables to keep track of, and additional helper functions. Actually, the statistics of this pull request show 100 lines against 800 lines. Plus, OpenMP is cross-platform and all popular compilers support it out of the box. |
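To illustrate the readability argument, here is a minimal sketch of per-partition work written as an OpenMP loop. It is not CARLsim code: `sumSlice` and `sumPartitions` are hypothetical stand-ins for "advance one partition", and the workload is just a sum over a slice of an array. When the compiler's OpenMP support is switched off, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-partition work: sum data[begin, end).
double sumSlice(const std::vector<double>& data, std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) sum += data[i];
    return sum;
}

// Split `data` into `numPartitions` contiguous slices and process each one.
// With OpenMP enabled, the iterations are shared among the runtime's threads;
// there is no explicit thread creation, argument struct, or join.
std::vector<double> sumPartitions(const std::vector<double>& data, int numPartitions) {
    assert(numPartitions > 0);
    std::vector<double> partial(static_cast<std::size_t>(numPartitions), 0.0);
    const std::size_t n = data.size();
    const std::size_t parts = static_cast<std::size_t>(numPartitions);
    #pragma omp parallel for schedule(static)
    for (int p = 0; p < numPartitions; ++p) {
        const std::size_t idx = static_cast<std::size_t>(p);
        const std::size_t begin = n * idx / parts;
        const std::size_t end = n * (idx + 1) / parts;
        partial[idx] = sumSlice(data, begin, end);  // each iteration writes its own slot
    }
    return partial;
}
```

Calling `sumPartitions(data, 4)` on an 8-element vector gives one partial sum per pair of elements. A pthread version of the same loop would additionally need a thread function, a per-thread argument struct, and explicit create and join calls.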
@mode89 now I see what's going on here. My best guess is that your CPU offers only 4-way core-level parallelism, so nothing improves beyond 4 cores. OpenMP automatically assigns the work to these four cores for any number of runtimes >= 4. On the other hand, since with pthreads we manually try to hard-assign 8/16 threads to four cores, the performance goes down. Anyway, as I already explained, we need to assign threads to multiple physical cores on multiple cluster nodes, and that is the main focus above all. Thank you for the details. |
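The arithmetic behind that guess can be sketched as follows. This assumes threads are pinned round-robin (thread t goes to core t % numCores), which is an illustrative assumption and not a description of how CARLsim actually assigns threads; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: round-robin pinning, thread t -> core (t % numCores).
// Returns how many threads end up on each core.
std::vector<int> threadsPerCore(int numThreads, int numCores) {
    assert(numThreads >= 0 && numCores > 0);
    std::vector<int> load(static_cast<std::size_t>(numCores), 0);
    for (int t = 0; t < numThreads; ++t) {
        ++load[static_cast<std::size_t>(t % numCores)];
    }
    return load;
}

// Largest number of threads sharing a single core.
// A value above 1 means at least one core is oversubscribed.
int maxThreadsOnOneCore(int numThreads, int numCores) {
    int worst = 0;
    for (int n : threadsPerCore(numThreads, numCores)) {
        if (n > worst) worst = n;
    }
    return worst;
}
```

Under this assumption, 8 threads on 4 cores put 2 threads on every core and 16 threads put 4 on every core, so the pinned threads have to time-share each core.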
Hi @mode89, I am finalizing this pull request by comparing against pthreads. Sorry for the delay, as I had a very busy summer. I have two questions:
OpenMP: number of cores is 1
pthreads: number of cores is 1 |
Hi @hkashyap, no problem!