GSoC 2016 Project Ideas
MDAnalysis already comes with a range of standard analysis tools but currently lacks an implementation of a general dimension reduction algorithm that can select an arbitrary number of dimensions of interest. Three common general techniques are:
- Principal Component Analysis
- Time-lagged Independent Component Analysis (TICA)
- [Diffusion Maps](http://arxiv.org/abs/1506.06259)
There are Python implementations of all of these algorithms, but none of them currently works with MDAnalysis out of the box. The existing implementations operate on ordinary NumPy arrays that hold a complete trajectory in memory, whereas MDAnalysis never loads the whole trajectory, only one frame at a time. This approach allows MDAnalysis to handle very large systems on an ordinary laptop or workstation.
Of course, you can also suggest another dimension reduction algorithm that you would like to implement.
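As a rough illustration of what frame-by-frame dimension reduction could look like, here is a minimal sketch of a single-pass PCA that only accumulates running sums. The `iter_frames` generator and the `streaming_pca` function are hypothetical names used for illustration and are not part of MDAnalysis. A real implementation would probably also need structural alignment and a numerically more robust covariance update.

```python
import numpy as np


def iter_frames(n_frames=200, n_atoms=5, seed=0):
    """Hypothetical stand-in for a trajectory reader.

    Yields one (n_atoms, 3) coordinate array at a time, so the full
    trajectory never has to exist in memory.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=(n_atoms, 3))
    direction /= np.linalg.norm(direction)
    for _ in range(n_frames):
        amplitude = rng.normal(scale=5.0)  # one dominant collective motion
        noise = rng.normal(scale=0.1, size=(n_atoms, 3))
        yield amplitude * direction + noise


def streaming_pca(frames, n_components=2):
    """PCA from a single pass over the frames, storing only running sums."""
    n = 0
    total = None   # running sum of flattened coordinates
    outer = None   # running sum of outer products
    for frame in frames:
        x = np.asarray(frame, dtype=np.float64).ravel()
        if total is None:
            total = np.zeros_like(x)
            outer = np.zeros((x.size, x.size))
        total += x
        outer += np.outer(x, x)
        n += 1
    mean = total / n
    # Unbiased covariance estimate; requires at least two frames.
    cov = (outer - n * np.outer(mean, mean)) / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvals[order], eigvecs[:, order]


mean, variances, components = streaming_pca(iter_frames(), n_components=2)
```

Memory use here scales with the square of the number of coordinates rather than with the number of frames, which is the trade-off that makes a one-frame-at-a-time reader usable.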
MDAnalysis already supports a wide range of MD file formats, but some are still missing, such as the new TNG file format from Gromacs. You can also propose a format that you personally want to use in MDAnalysis.
To check whether a new analysis method works as intended, it is often a good idea to apply it to a random walk in different simple energy landscapes (flat, harmonic well, double well). In this project you would develop a 'Reader' that produces random trajectories.
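To make the idea more concrete, the sketch below generates such random trajectories with a simple overdamped Langevin (Brownian dynamics) update. The function name `random_walk_frames`, the choice of integrator, the potentials, and all parameters are assumptions for illustration only. This is not an existing MDAnalysis Reader.

```python
import numpy as np

# Gradients dU/dx of the three landscapes, applied to each coordinate
# independently (arbitrary units).
GRADIENTS = {
    "flat": lambda x: np.zeros_like(x),               # U = 0
    "harmonic": lambda x: x,                          # U = x**2 / 2
    "double_well": lambda x: 4.0 * x * (x**2 - 1.0),  # U = (x**2 - 1)**2
}


def random_walk_frames(landscape, n_frames, n_particles=1, dt=1e-3, kT=1.0, seed=0):
    """Yield (n_particles, 3) coordinate arrays, one frame at a time.

    Uses the Euler-Maruyama update x <- x - dU/dx * dt + sqrt(2 kT dt) * noise
    with unit friction. All particles start at the origin.
    """
    rng = np.random.default_rng(seed)
    grad = GRADIENTS[landscape]
    x = np.zeros((n_particles, 3))
    step = np.sqrt(2.0 * kT * dt)
    for _ in range(n_frames):
        x = x - grad(x) * dt + step * rng.normal(size=x.shape)
        yield x.copy()
```

Because the stationary distribution of each landscape is known analytically, the output of a new analysis method can be compared against an exact expectation, for example a coordinate variance close to `kT` in the harmonic well.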
Python 3 is being adopted by a growing number of users, and Unix distributions are starting to switch to it. MDAnalysis currently cannot run under Python 3, mostly because of its C/Cython extensions. We are moving our C extensions to Cython, which supports Python 2 and 3 from a single source. See also #260.
None of the current developers has a Windows environment, but some research groups do use Windows and it would be nice if they could use MDAnalysis as well. Since none of us has experience with Python extensions on Windows, we do not know exactly what is needed to make this happen.
Most MD simulations produce far more data than fits into RAM, even on a modern computer. To cope with this, MDAnalysis never loads a full trajectory but only one frame at a time, which comes with a performance penalty. New Python packages such as Dask and Blaze could potentially help here. You should look into the different distributed numerical array libraries in Python and implement a reader using one of them during the summer.
Help us implement a general pipeline that uses multiple CPU cores for analysis tasks (Dask, MPI, or even a hybrid approach).
Work with domain-decomposition algorithms to improve our distance search algorithms (cell grids) and/or implement distance search on GPUs.
Implement a formal, flexible parser for atom selections (using pyparsing; see also our discussion on it).
Combine spatial densities (e.g. from time averaged quantities or experimental data such as electron densities) with atom-based queries in order to aid multiscaling approaches and comparisons between experiment and simulation.
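For the cell-grid distance search mentioned among the ideas above, the basic technique can be sketched as follows. This is a plain NumPy illustration without periodic boundaries, not the MDAnalysis implementation. The function name `pairs_within_cutoff` is hypothetical, and a serious version would vectorise the inner loops and handle the simulation box.

```python
import itertools
from collections import defaultdict

import numpy as np


def pairs_within_cutoff(positions, cutoff):
    """Return sorted (i, j) pairs with i < j and distance <= cutoff.

    Atoms are binned into cubic cells with edge length `cutoff`, so any
    pair within the cutoff lies in the same or in directly adjacent cells.
    """
    positions = np.asarray(positions, dtype=np.float64)
    cell_index = np.floor((positions - positions.min(axis=0)) / cutoff).astype(int)

    # Map each occupied cell to the atoms it contains.
    cells = defaultdict(list)
    for atom, key in enumerate(map(tuple, cell_index.tolist())):
        cells[key].append(atom)

    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = []
    for key, atoms in cells.items():
        for offset in offsets:
            neighbour = tuple(k + o for k, o in zip(key, offset))
            # .get() avoids creating empty cells while iterating.
            for j in cells.get(neighbour, ()):
                for i in atoms:
                    if i < j and np.linalg.norm(positions[i] - positions[j]) <= cutoff:
                        pairs.append((i, j))
    return sorted(pairs)
```

Only the 27 cells around each atom are inspected, so for roughly uniform densities the cost grows linearly with the number of atoms instead of quadratically as in an all-pairs search.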
Raise an issue in the Issue Tracker or contact us via the developer Google group.