Model population container type with scoring methods via settable prediction class attributes. #199
@russelljjarvis OK, so the main idea here is that you would like the SciUnit model being judged, and all the methods being called on it, to be a population of smaller, possibly heterogeneous models. In the concrete example of neuron simulations, the population is a population of neurons. As you may know, you can already set […]. Feature extraction could also be handled outside of SciUnit, and the […]. What I might be misunderstanding is whether there is any specific role for the notion of populations in SciUnit or not. Since in this scenario the only thing left for SciUnit to do is to compare predictions and observations, the corresponding Test.compute_score would just operate on arrays of predictions and observations rather than single observations and predictions, and return an array of scores (or a ScoreArray indexed by the individual models in the population). So this might require a Test.compute_scores which explicitly takes prediction and observation arrays and returns score arrays, and internally runs map or dask.map to vectorize that process. Would this do the trick?
"scenario the only thing left for SciUnit to do is to compare predictions and observations, then the corresponding Test.compute_score would just operate on arrays of predictions and observations rather than single observations and predictions, and return an array of scores (or a ScoreArray indexed by individual models in the population). So this might require a Test.compute_scores which explicitly takes prediction and observation arrays and returns score arrays, and internally runs map or dask.map to vectorize that process."

Yes! I totally agree with this. In the end I think I bypassed models completely and just did this (with observations and predictions settable, you can bypass models). Using StaticModels + dask seemed cumbersome, and it meant more lines of code than necessary. The only problem is that the SciUnit philosophy has been very model-oriented up until now, in that I am not aware of any documentation where scores are obtained in the absence of models. Below is how I used to do things, but I think it's probably not memory-efficient to create a heap of original SciUnit model objects and SciUnit score objects. A lookup cache might work. As you suggest, generating a list of predictions/features outside SciUnit is often the more pragmatic choice.

Pre-implemented […]

We implement […]

It's 3 where the problem is. I have used the 3a approach successfully in the past; it's flexible. The main problems are: […]
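The vectorized `Test.compute_scores` discussed above can be sketched in plain Python. This is a minimal illustration, not SciUnit's actual API: the class, the z-score logic, and the `compute_scores` method are all hypothetical stand-ins for the proposal, and `map` is where `dask` could be swapped in for parallelism.

```python
# Minimal sketch of the proposed Test.compute_scores, independent of SciUnit.
# All names here are hypothetical, following the discussion above.

class ToyZScoreTest:
    def __init__(self, observation):
        # observation: dict with "mean" and "std" of the reference data
        self.observation = observation

    def compute_score(self, observation, prediction):
        # Ordinary single-model scoring: a z-score of the prediction.
        return (prediction - observation["mean"]) / observation["std"]

    def compute_scores(self, observations, predictions):
        # Vectorized variant: map compute_score over paired arrays.
        # `map` could be replaced by dask.bag.map or client.map to
        # parallelize across a model population.
        return list(map(self.compute_score, observations, predictions))

test = ToyZScoreTest({"mean": 10.0, "std": 2.0})
observations = [{"mean": 10.0, "std": 2.0}] * 3
predictions = [10.0, 12.0, 8.0]   # one prediction per model in the population
scores = test.compute_scores(observations, predictions)
print(scores)  # [0.0, 1.0, -1.0]
```

Returning a list here plays the role of the proposed `ScoreArray`: one score per model in the population, indexed the same way as the prediction array.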
Related to the half-merged PR (see #173).
Although SciUnit has good existing methods for scoring collections of models, when neuron network simulators are applied to the task of single-neuron model optimization, three notable simulators (PyNN, NEST, SNN.jl, and also Brian2) make it more convenient to model a population of cells (N>1) than a single cell (N=1). Indeed, these simulators appear to be optimized for evaluating a cell population.
Note that there are pre-existing alternative implementations that solve this problem: for example, the Numba-JIT adaptive exponential and Izhikevich models mean that you can avoid using network-level simulators to run single-cell models.
In the context of genetic algorithm optimization, a NEST neuron population can conveniently map straight onto a GA chromosome population, so in theory NeuronUnit+GA should be well suited to exploit the efficiency of network simulators to evaluate whole populations at once.
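The chromosome-to-population mapping can be sketched without any simulator dependency. Everything below is illustrative: `build_population` is a hypothetical stand-in for something like `nest.Create` followed by per-cell parameter assignment, and the parameter names are placeholders.

```python
# Illustrative sketch (no NEST dependency): a GA chromosome population maps
# one-to-one onto a simulated cell population, so a single network-level run
# evaluates every chromosome at once. All names are hypothetical.
import random

random.seed(0)
N = 4
# Each chromosome is a parameter vector, e.g. (a, b) of an Izhikevich cell.
chromosomes = [(random.uniform(0.01, 0.1), random.uniform(0.1, 0.3))
               for _ in range(N)]

def build_population(chromosomes):
    # Stand-in for nest.Create("izhikevich", N) followed by setting
    # per-cell parameters drawn from the chromosomes.
    return [{"a": a, "b": b} for a, b in chromosomes]

cells = build_population(chromosomes)
assert len(cells) == len(chromosomes)  # one simulated cell per chromosome
```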
Unfortunately, in practice the NEST and PyNN simulation platforms are a bad match for the current design pattern of SciUnit/NeuronUnit/BluePyOpt. To clarify: collecting models by putting them inside a SciUnit model-collection container is essentially a separate and different task from collecting models in a NEST/PyNN/SNN.jl population.
To confound matters, many so-called Python neuron simulators are not Python-native. When collections of external simulator models are stored inside SciUnit containers, memory problems ensue, byte-code efficiency is poor, and parallel distributed methods are very unlikely to work.
The problem is that PyNN and NEST have their own preferred, safe ways to parallelize population model evaluations. When you contain many single-cell models in a SciUnit collection and distribute that code using dask (which is not what NEST expects), the SciUnit container class looks foreign and unsuited from the external simulator's point of view, and distributing it with dask is not safe because of nested parallelization. PyNN and NEST don't really support modelling a single cell so much as creating cell populations of size N=1.

I propose a second design of SciUnit model collections called SimPopulationModel. In this second design, a simulator population is simply contained by a new SciUnit class, which inherits SciUnit attributes (RunnableModel, etc.).
In the SimPopulationModel SciUnit class, model predictions and observations are just regular getter- and setter-based attributes that have been decoupled from SciUnit's generate_prediction/extract_feature methods. This means simulator users are completely free to generate predictions/features independently of the SciUnit infrastructure. This is again necessary to exploit NEST's ability to act on populations quickly, which means features are obtained in parallel in a highly NEST/PyNN/SNN.jl-specific way.
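A minimal sketch of the settable-attribute idea, assuming a stand-in base class so the example is self-contained (SciUnit's real RunnableModel is not imported, and the class/attribute names are the hypothetical ones from this proposal):

```python
# Hypothetical sketch of the proposed SimPopulationModel. A stub base class
# stands in for sciunit.models.RunnableModel to keep the example runnable.

class RunnableModelStub:
    def __init__(self, name):
        self.name = name

class SimPopulationModel(RunnableModelStub):
    """Wraps a simulator-side population (NEST/PyNN/SNN.jl). Predictions are
    plain settable attributes, decoupled from generate_prediction."""

    def __init__(self, name, backend_population=None):
        super().__init__(name)
        self.backend_population = backend_population  # e.g. a NEST population
        self._predictions = None

    @property
    def predictions(self):
        if self._predictions is None:
            raise ValueError("Predictions must be set before judging.")
        return self._predictions

    @predictions.setter
    def predictions(self, value):
        # Features computed outside SciUnit (e.g. in parallel by the
        # simulator itself) are simply injected here.
        self._predictions = value

pop = SimPopulationModel("izhikevich_pop")
pop.predictions = [0.1, 0.2, 0.3]  # e.g. one extracted feature per cell
print(pop.predictions)
```

The property guard makes the contract explicit: judging a SimPopulationModel whose predictions were never injected fails loudly instead of silently triggering a simulation run.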
By the time a SimPopulationModel is ready to be judged, all the computationally heavy work of feature extraction has already been completed, and SciUnit is free to do what it does best, since predictions (features) and observations have been provided by other infrastructure.
This method expects users to have pre-existing code that performs something like judge/generate-prediction. The prediction is then just treated as a settable object attribute, which can be updated. Doing things this way is more amenable to high-throughput feature generation, i.e. if EFEL generates 100 new features per model.
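The high-throughput update pattern can be illustrated with plain dictionaries. The extractor below is a fake stand-in for an EFEL-style tool, and all names are hypothetical; the point is only that per-model feature dicts are updated wholesale as new feature batches arrive.

```python
# Hedged illustration of bulk feature injection: each model's prediction is a
# feature dict that can be updated wholesale as new features arrive from an
# external extractor. All names here are illustrative stand-ins.

def fake_feature_extractor(model_id):
    # Stand-in for an EFEL-style extractor returning many features per model.
    return {f"feature_{i}": model_id * i for i in range(5)}

predictions = {}  # model_id -> feature dict; settable and updatable
for model_id in range(3):
    predictions.setdefault(model_id, {}).update(fake_feature_extractor(model_id))

print(len(predictions[2]))  # 5 features recorded for model 2
```

Because predictions are just attributes, a second extraction pass can call `update` again with 100 more features per model without touching any SciUnit machinery.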
@all-contributors please add @russelljjarvis for ideas