-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new topology system post #41
Conversation
Included some of what @richardjgowers wrote on the wiki way back when.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great posts thanks. I made some minor comments. I don't currently have time to work on them myself.
@@ -0,0 +1,102 @@ | |||
--- | |||
layout: post | |||
title: A shiny, new topology system |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A shiny, new and faster topology system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. :D
title: A shiny, new topology system | ||
--- | ||
|
||
With MDAnalysis 0.16.0 on the horizon, we wanted to showcase a major development that most users will probably not notice if we've done our job well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did our job well they should notice that MDAnalysis has become faster!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kain88-de good point; replacing this statement with something less pointed.
|
||
For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458). | ||
|
||
## External changes that may affect how you use MDAnalysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start with the external changes. They are most interesting to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure there haven't been any deprecations that we definitely removed with this. I remember that some old attributes do not exist any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I first also thought to start with the external changes. But in this case I would start with the internal ones because this is really what changed and it provides a positive motivation for the incompatibilities described in this section. Start on a positive note ;-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah starting with the positives makes sense from a selling point.
|
||
``` | ||
"Atoms->Residues"[[2, 0, 1, 2]] --> [1, 0, 2, 1] | ||
"Resnames"[[1, 0, 2, 1]] --> ['LYS', 'GLU', 'ALA', 'LYS'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use valid mdanalysis code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just meant to illustrate what's happening internally; this isn't something that should be done by a user, though the equivalent would be to do:
u.residues[u.atoms[[2, 0, 1, 2]].resindices].resnames
I've adjusted the statement to be more clear that this is internally how the machinery works.
3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname. | ||
4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of resiude-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object. | ||
|
||
For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating the notebook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notebook says
Our systems were vesicle systems using repeats of vesicles from the vesicle library publicly hosted on github. At the moment these particular systems can't simply be downloaded because they are rather large,
Please fix this in the notebook: the vesicles are available as stated in the README:
A set of large vesicle systems, ranging in size from 1.75 M to 10 M particles are made available under doi:10.6084/m9.figshare.3406708.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DeprecationWarning
in the notebook are really ugly. Can you get rid of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orbeckst fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notebook is now ok.
- It links to the vesicle library and there is a note pointing to doi:10.6084/m9.figshare.3406708 so that should suffice.
- deprecation warnings are gone
For example, does ``ResidueGroup.charges`` give the charge of the residues or the atoms? | ||
Also, it was unclear what size a given output would be (see [issue 411](https://github.com/MDAnalysis/mdanalysis/issues/411)). | ||
|
||
### How to work around this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Work around sounds like we introduced a bug. I would just say "working with the new system".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes: How to work with the new system
### How to work around this | ||
|
||
To access atom-level information from anything that isn't an ``AtomGroup``, use the `.atoms` level accessor. | ||
For example, changing all `.positions` calls on anything that isn't an `AtomGroup` to `.atoms.positions`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is would also be good to mention what residue.positions
returns now. As a direct contrast to above to show the benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... what does residue.positions
return??? AttributeError
... I am not sure what you're thinking of, @kain88-de .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it returns an attribute error now. I was adding the remark because above @dotsdl mentions that this was allowed but not well defined in the old topology. It may be worth mentioning that we have removed ill defined attributes like ResidueGroup.positions
What @kain88-de says... will have a closer look later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, see comments inline. Some notes on the gist performance notebook which I am posting here and not there.
--- | ||
|
||
With MDAnalysis 0.16.0 on the horizon, we wanted to showcase a major development that most users will probably not notice if we've done our job well. | ||
In fall 2015, @richardjgowers and I set to work on redesigning the topology system from scratch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe write in 3rd person and replace "I" with "@dotsdl"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the image of the blackboard for some additional history...
https://pbs.twimg.com/media/CwsdYTCVQAUUQsV.jpg
https://twitter.com/orbeckst/status/795762560767180800
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little amazed at how close to our first picture (I think this is a tidied version though) we managed to get
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include it in the post?
Now, over a year later, the finishing touches on this work are being prepared for release. | ||
This post is meant to serve as a brief view to what has changed internally, what has changed externally, and what benefits this gives us looking forward to the future. | ||
|
||
## Internal changes that shouldn't affect external behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike @kain88-de I actually come to like starting with the internals better. However, you make it sound dull. Rather use a heading such as
## Invisible changes that make working with MDAnalysis *faster*
Most of the changes are (or should be) invisible to the user. But they made some of the most fundamental operations in MDAnalysis quite a bit *faster*. Although this section is mostly of interest to developers, it is useful for all users to know the operations that MDAnalysis can now do much faster than before (and why).
|
||
In the new system, each atom is a member of exactly one residue, and each residue is a member of exactly one segment. | ||
The new `Topology` object keeps an array giving the residue membership of each atom, and likewise an array giving segment membership of each residue. | ||
Getting the resname of the residue of a group of atoms, then, is achieved by taking the indices of these atoms to fancy-index the `Atoms->Residues` array, and then using the result of this to fancy-index the `Resnames` array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe link fancy-index to the numpy docs on indexing with arrays?
2. **Memory**. We don't store, for example, a resname for each atom, but instead store attributes at the level they make sense for. | ||
3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname. | ||
4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of resiude-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Topologies become serializable and changes to topologies can be easily saved and communicated around. (TODO...)
3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname. | ||
4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of resiude-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object. | ||
|
||
For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The notebook says
Our systems were vesicle systems using repeats of vesicles from the vesicle library publicly hosted on github. At the moment these particular systems can't simply be downloaded because they are rather large,
Please fix this in the notebook: the vesicles are available as stated in the README:
A set of large vesicle systems, ranging in size from 1.75 M to 10 M particles are made available under doi:10.6084/m9.figshare.3406708.
``` | ||
|
||
Now each object only contains information pertaining to that particular object. | ||
A ``Residue`` object only yields information about the residue; to get to the atoms, use ``Residue.atoms``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, to get the atoms from a Segment
or a SegmentGroup
use Segment.atoms
or SegmentGroup.atoms
. As before, you can get all residues associated with a group with Group.residues
(which returns a ResidueGroup
) and all segments with Group.segments
(a SegmentGroup
). Bottom line: you should now always be explicit about what you want.
### How to work around this | ||
|
||
To access atom-level information from anything that isn't an ``AtomGroup``, use the `.atoms` level accessor. | ||
For example, changing all `.positions` calls on anything that isn't an `AtomGroup` to `.atoms.positions`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... what does residue.positions
return??? AttributeError
... I am not sure what you're thinking of, @kain88-de .
For example, does ``ResidueGroup.charges`` give the charge of the residues or the atoms? | ||
Also, it was unclear what size a given output would be (see [issue 411](https://github.com/MDAnalysis/mdanalysis/issues/411)). | ||
|
||
### How to work around this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes: How to work with the new system
|
||
A major benefit of the new topology system is that information about the topology of a ``Universe`` is now completely encapsulated in the ``Topology`` object. | ||
This not only makes development and maintenance easier, but also opens the door to some exciting new possibilities as simulation systems grow larger. | ||
A single ``Topology`` object can now be cleanly shared by multiple ``Universe`` instances, each with their own trajectory reader(s), making common operations such as fitting a trajectory to a reference structure or doing parallel analysis of many trajectories more feasible for large systems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"more feasible" is weird in a post like this: either it make it feasible or not (and if not, just strike the sentence)
We look forward to the benefits this brings not only to the project, but also to all our users going forward. | ||
We hope you like what we've done here. | ||
|
||
-- @dotsdl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially if this based on @richardjgowers 's wiki page then I think you and him should sign it – in any case, that seems appropriate as you're both the architects of the new topology system.
@richardjgowers you should sign the bottom, too. :D |
The only thing I can see that I think you've missed is that moving from array of structs to struct of arrays (AoS to SoA) also gives more flexibility of what topology information we do include. I think most of the memory gains come from fewer total Python objects as Atoms are spawned lazily, and the Atom "struct" not holding every possible attribute, only those loaded in. (from being able to add another array running parallel to all others). This also gives Universe and AtomGroup a lot more flexibility in what they can show (ie adding Similarly the performance gains are because our access patterns are predominantly comparing a single attribute across atoms rather than values within an atom (favouring SoA not AoS). Or maybe that a bit computer sciency to include, and we should keep the focus on how it feels to use it? We could write up a separate technical blog where we talk about how we arrived at our crazy AtomGroup transplanting pattern |
What is holding this blog post back? |
|
@dotsdl can you attend to the outstanding changes, please? This is all that is holding up the post. |
Also change the date on the post to the day when you push your changes + 1. |
@orbeckst I addressed your comments |
Added some minor changes and changed date to April 3, 2017. Please merge if you're happy. |
@dotsdl and @richardjgowers Is the post OK with you guys like that? |
Yep LGTM |
Included some of what @richardjgowers wrote on the wiki way back when. Feel free to add material or edit accordingly.