This tutorial shows how to build a simple network model using NEURON and simulating it using CoreNEURON.
For up to date NEURON installation instructions with CoreNEURON, see documentation here. As CoreNEURON rely on auto-vectorisation compiler, make sure to use vendor compilers like Intel, Cray, AMD, NVIDIA HPC SDK.
Once NEURON is installed, and you set PATH
and PYTHONPATH
environmental variables then you should be able to do:
python -c "from neuron import h; from neuron import coreneuron"
If you get ImportError
then make sure PYTHONPATH
is set up correctly and python
version is same as the one used for NEURON installation.
Now we will start building the ringtest
model. Clone the GitHub repository as:
git clone https://github.com/nrnhines/ringtest.git
cd ringtest
This repository contains a mod
sub-directory which has MOD files (for the gap-junction related test). Using the standard NEURON workflow, we can build a special
executable using nrnivmodl
and -coreneuron
option. Make sure to load necessary compiler and MPI modules:
nrnivmodl -coreneuron mod
This will create x86_64/special
executable in the ringtest
directory (where x86_64
is platform architecture).
As described in the documentation here, one may have to make minor modifications to make a model compatible with CoreNEURON.
In ringtest.py
, the relevant changes have already been made:
h.cvode.cache_efficient(1)
if use_coreneuron:
from neuron import coreneuron
coreneuron.enable = True
coreneuron.gpu = coreneuron_gpu
so the -coreneuron
(use_coreneuron
) and -gpu
(coreneuron_gpu
) arguments can be used to enable CoreNEURON.
By using coreneuron
module, one can enable CoreNEURON as shown above. Note that CoreNEURON requires NEURON's internal data structures to be in cache efficient form and hence the cvode.cache_efficient(1)
method must be executed prior to initialization of the model i.e. h.stdinit()
.
We are ready to run this model using NEURON or CoreNEURON. To run with NEURON, you can do:
mpiexec -n 2 ./x86_64/special -mpi -python ringtest.py -tstop 100
This will run NEURON for 100 milliseconds
using 2 MPI processes and writes spike output to the spk2.std
file. These are standard steps for running simulations with NEURON (which you are already familiar with).
Note that ringtest.py
has a prun
method which internally calls pc.psolve(tstop)
. If coreneuron.enable
is set to True
then NEURON will internally use CoreNEURON to simulate the model. We can now use -coreneuron
CLI parameter to run the model using CoreNEURON as:
mpiexec -n 2 ./x86_64/special -mpi -python ringtest.py -tstop 100 -coreneuron
This will run simulation using CoreNEURON for 100 milliseconds
and writes spike output to the spk2.std
file. If you compare the spikes generated by NEURON run and CoreNEURON run then should be identical. Make sure to sort spikes using sortspike
command provided by NEURON:
sortspike spk2.std spk2.coreneuron.std
If you have compiled NEURON+CoreNEURON with GPU support, you can run the model on GPU using -gpu
CLI option:
mpiexec -n 2 ./x86_64/special -mpi -python ringtest.py -tstop 100 -coreneuron -gpu
We typically use the number of MPI ranks per node equal to the number of GPUs per node. So, if you have X
number of nodes where each node has Y
GPUs then the total number of MPI ranks are X x Y
.
NEURON uses PThread
to support threading whereas CoreNEURON uses OpenMP. In order to enable thread usage you have to set appropriate number of threads using ParallelContext.nthread(). With this, if you run simulation using CoreNEURON, for each thread on NEURON side CoreNEURON will create an equivalent OpenMP thread. With this ring test we have CLI option -nt
that you can use to enable threads:
mpiexec -n 2 ./x86_64/special -mpi -python ringtest.py -tstop 100 -coreneuron -nt 2
With -nt 2
, each MPI process will start 2 OpenMP threads on CoreNEURON side for simulation.
Here are some additional points if you want to compare performance between NEURON and CoreNEURON :
- Make sure to use optimization flags depending upon your compiler suite (see CoreNEURON page)
- Prefer Intel/Cray/PGI compilers over GCC and Clang (specifically for CoreNEURON to enable vectorisation)
- In order to compare performance, your model should be sufficiently large (and not a trivial test). For example, in the above tutorial, we built a small network with 16 rings each with 8 cells. You can build a larger network using additional command line arguments as:
mpirun -n 4 ./x86_64/special -mpi -python ringtest.py -tstop 100 -nring 1024 -ncell 128 -branch 32 64
See command line arguments for more information about arguments:
→ python ringtest.py -h
usage: ringtest.py [-h] [-nring N] [-ncell N] [-npt N] [-branch N N]
[-compart N N] [-tstop float] [-gran N] [-rparm]
[-filemode] [-gpu] [-show] [-gap] [-coreneuron] [-nt N]
[-multisplit]
optional arguments:
-h, --help show this help message and exit
-nring N number of rings (default 16)
-ncell N number of cells per ring (default 8)
-npt N number of cells per type (default 8)
-branch N N range of branches per cell (default 10 20)
-compart N N range of compartments per branch (default [1,1])
-tstop float stop time (ms) (default 100.0)
-gran N global Random123 index (default 0)
-rparm randomize parameters
-filemode Run CoreNEURON with file mode
-gpu Run CoreNEURON on GPU
-show show type topologies
-gap use gap junctions
-coreneuron run coreneuron
-nt N nthread
-multisplit intra-rank thread balance. All pieces of cell on same rank.
In order to compare the performance of NEURON and CoreNEURON, we compiled both simulators using Intel compilers. Note that the ringtest is not ideal for benchmarking due to low comutational complexity of each cell (cell.hoc
uses HH
channel in soma compartment).
NOTE : When you run simulation using NEURON and call pc.psolve(tstop)
, it will directly start execution of timesteps. But in case of CoreNEURON, when you call pc.psolve(tstop)
, NEURON will internally do the following steps:
- Copy in-memory model to CoreNEURON
- CoreNEURON will copy data to GPU if GPU is enabled
- CoreNEURON will run timesteps
- CoreNEURON will copy back results from CPU to GPU if GPU is enabled
- NEURON will copy results back from CoreNEURON
These steps are typically fast but if you are running a very small model for short duration then you might see an overhead. So make sure to check timing stats printed by CoreNEURON. For example, CoreNEURON prints simulation time (i.e. Step 3) in the form of Solver Time : X Seconds
.
For comparison of NEURON and CoreNEURON performance, here are some executions with NEURON and CoreNEURON running on single core:
# NEURON CPU Run
$ mpiexec -n 1 ./x86_64/special -mpi -python ringtest.py -tstop 10 -nring 128 -ncell 128 -branch 32 64
.........
runtime=20.86 load_balance=100.0% avg_comp_time=20.8548
.......
# CoreNEURON CPU Run
$ mpiexec -n 1 ./x86_64/special -mpi -python ringtest.py -tstop 10 -nring 128 -ncell 128 -branch 32 64 -coreneuron
.........
Solver Time : 15.846
.........
# CoreNEURON GPU Run
$ mpiexec -n 1 ./x86_64/special -mpi -python ringtest.py -tstop 10 -nring 128 -ncell 128 -branch 32 64 -coreneuron -gpu
..........
Solver Time : 0.431226
..........
For above test the execution time is reduced from 174sec to 78sec for CPU and to 22.7sec for GPU. This speedup is still less than expected due to lower computational complexity of the model. But also note that we are running CPU execution with single core. Make sure to use all CPU resources for actual comparison. Try with your model and if you see any performance issues, please open an issue on NEURON GitHub repository.