Replace synRemap mechanism with much simpler one #511
Merged
In older versions of GeNN, which used a plain CSR matrix format for sparse connectivity, figuring out which presynaptic and postsynaptic neuron a synapse dynamics kernel thread was associated with required a data structure called `indInG`. When we switched to the new 'ragged' format, this was no longer necessary but, for some reason, I introduced a new lookup structure (`synRemap`) which maps contiguous [0, numSynapses) thread indices to indices into the ragged matrix. This adds another 32 bits of memory per sparse synapse. It also significantly slows down synapse dynamics, because each synapse has to read its 32-bit index from global memory. Finally, it makes structural plasticity much harder than it needs to be, because the `synRemap` structure has to be kept up to date with all the other bits of synapse state. In theory, this could have saved a few idle threads, but it doesn't: GeNN has no way of knowing how many synapses there are at compile time, so it launches `sg.getSrcNeuronGroup()->getNumNeurons() * sg.getMaxConnections()` threads anyway.

The `synRemap` structure is an internal detail which users shouldn't ever be messing with, so in this PR I have removed it entirely and just divide the same number of threads into pre and post indices with `/` and `%` respectively (duh). On my machine, this reduces the time spent in synapse dynamics kernels by around 25%!