Added new topology system post #41

dotsdl · 2017-02-27T21:51:18Z

Included some of what @richardjgowers wrote on the wiki way back when. Feel free to add material or edit accordingly.

kain88-de

Great post, thanks. I made some minor comments. I don't currently have time to work on them myself.

kain88-de · 2017-02-27T22:48:31Z

_posts/2017-03-06-new-topology-system.md

@@ -0,0 +1,102 @@
+---
+layout: post
+title: A shiny, new topology system 


A shiny, new and faster topology system

kain88-de · 2017-02-27T22:49:17Z

_posts/2017-03-06-new-topology-system.md

+title: A shiny, new topology system 
+---
+
+With MDAnalysis 0.16.0 on the horizon, we wanted to showcase a major development that most users will probably not notice if we've done our job well.


If we did our job well they should notice that MDAnalysis has become faster!

@kain88-de good point; replacing this statement with something less pointed.

kain88-de · 2017-02-27T22:53:04Z

_posts/2017-03-06-new-topology-system.md

+
+For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458).
+
+## External changes that may affect how you use MDAnalysis


Start with the external changes. They are the most interesting to the user.

Are you sure there haven't been any deprecations that we definitively removed with this? I remember that some old attributes no longer exist.

At first I also thought we should start with the external changes. But in this case I would start with the internal ones, because this is really what changed and it provides a positive motivation for the incompatibilities described in this section. Start on a positive note ;-).

OK, yeah, starting with the positives makes sense from a selling standpoint.

kain88-de · 2017-02-27T22:54:13Z

_posts/2017-03-06-new-topology-system.md

+
+```
+"Atoms->Residues"[[2, 0, 1, 2]] --> [1, 0, 2, 1]
+"Resnames"[[1, 0, 2, 1]]        --> ['LYS', 'GLU', 'ALA', 'LYS']


Can you use valid MDAnalysis code here?

This is just meant to illustrate what's happening internally; it isn't something a user should do, though the equivalent would be:

`u.residues[u.atoms[[2, 0, 1, 2]].resindices].resnames`

I've adjusted the statement to make it clearer that this is how the machinery works internally.
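
For readers following the thread, the two-step lookup in the quoted snippet can be sketched with plain NumPy arrays. This is only an illustration, not MDAnalysis code: the names `atom_to_residue` and `resnames` are hypothetical stand-ins for the `Atoms->Residues` and `Resnames` arrays, and the values are made up so that they reproduce the numbers in the snippet.

```python
import numpy as np

# Hypothetical stand-ins for the internal arrays (not the real MDAnalysis objects).
# Values are chosen only so that the result matches the quoted snippet.
atom_to_residue = np.array([0, 2, 1])        # residue index of atoms 0, 1, 2
resnames = np.array(["GLU", "LYS", "ALA"])   # resname of residues 0, 1, 2

# Indices of the atoms in the group; repeats are allowed.
atom_indices = np.array([2, 0, 1, 2])

# Step 1: fancy-index the atom->residue array with the atom indices.
residue_indices = atom_to_residue[atom_indices]

# Step 2: fancy-index the resnames array with the resulting residue indices.
atom_resnames = resnames[residue_indices]

print(residue_indices.tolist())
print(atom_resnames.tolist())
```

Both steps are single array operations, so the cost does not depend on creating any per-atom or per-residue Python objects.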

kain88-de · 2017-02-27T22:55:42Z

_posts/2017-03-06-new-topology-system.md

+3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname.
+4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of residue-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object.
+
+For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458).


Thanks for updating the notebook.

The notebook says

Our systems were vesicle systems using repeats of vesicles from the vesicle library publicly hosted on github. At the moment these particular systems can't simply be downloaded because they are rather large,

Please fix this in the notebook: the vesicles are available as stated in the README:

A set of large vesicle systems, ranging in size from 1.75 M to 10 M particles are made available under doi:10.6084/m9.figshare.3406708.

The `DeprecationWarning`s in the notebook are really ugly. Can you get rid of them?

@orbeckst fixed.

Notebook is now ok.

It links to the vesicle library and there is a note pointing to doi:10.6084/m9.figshare.3406708 so that should suffice.

The deprecation warnings are gone.

kain88-de · 2017-02-27T22:59:52Z

_posts/2017-03-06-new-topology-system.md

+For example, does ``ResidueGroup.charges`` give the charge of the residues or the atoms?
+Also, it was unclear what size a given output would be (see [issue 411](https://github.com/MDAnalysis/mdanalysis/issues/411)).
+
+### How to work around this


"Work around" sounds like we introduced a bug. I would just say "working with the new system".

yes: How to work with the new system

kain88-de · 2017-02-27T23:00:25Z

_posts/2017-03-06-new-topology-system.md

+### How to work around this
+
+To access atom-level information from anything that isn't an ``AtomGroup``, use the `.atoms` level accessor.
+For example, changing all `.positions` calls on anything that isn't an `AtomGroup` to `.atoms.positions`.


Here it would also be good to mention what `residue.positions` returns now, as a direct contrast to the above, to show the benefit.

Yes... what does residue.positions return??? AttributeError... I am not sure what you're thinking of, @kain88-de .

Yes, it raises an `AttributeError` now. I added the remark because above @dotsdl mentions that this was allowed but not well defined in the old topology system. It may be worth mentioning that we have removed ill-defined attributes like `ResidueGroup.positions`.

orbeckst · 2017-02-27T23:17:40Z

What @kain88-de says... will have a closer look later.

orbeckst

Thanks a lot, see comments inline. Some notes on the gist performance notebook which I am posting here and not there.

orbeckst · 2017-02-28T05:46:09Z

_posts/2017-03-06-new-topology-system.md

+---
+
+With MDAnalysis 0.16.0 on the horizon, we wanted to showcase a major development that most users will probably not notice if we've done our job well.
+In fall 2015, @richardjgowers and I set to work on redesigning the topology system from scratch.


Maybe write in 3rd person and replace "I" with "@dotsdl"?

Include the image of the blackboard for some additional history...

https://pbs.twimg.com/media/CwsdYTCVQAUUQsV.jpg
https://twitter.com/orbeckst/status/795762560767180800

I'm still a little amazed at how close to our first picture (I think this is a tidied version though) we managed to get.

Can you include it in the post?

orbeckst · 2017-02-28T06:09:36Z

_posts/2017-03-06-new-topology-system.md

+Now, over a year later, the finishing touches on this work are being prepared for release.
+This post is meant to serve as a brief view to what has changed internally, what has changed externally, and what benefits this gives us looking forward to the future.
+
+## Internal changes that shouldn't affect external behavior


Unlike @kain88-de, I have actually come to like starting with the internals better. However, you make it sound dull. Rather, use a heading and an opening paragraph such as:

## Invisible changes that make working with MDAnalysis *faster*

Most of the changes are (or should be) invisible to the user. But they made some of the most fundamental operations in MDAnalysis quite a bit *faster*. Although this section is mostly of interest to developers, it is useful for all users to know the operations that MDAnalysis can now do much faster than before (and why).

orbeckst · 2017-02-28T06:12:17Z

_posts/2017-03-06-new-topology-system.md

+
+In the new system, each atom is a member of exactly one residue, and each residue is a member of exactly one segment.
+The new `Topology` object keeps an array giving the residue membership of each atom, and likewise an array giving segment membership of each residue.
+Getting the resname of the residue of a group of atoms, then, is achieved by taking the indices of these atoms to fancy-index the `Atoms->Residues` array, and then using the result of this to fancy-index the `Resnames` array.


Maybe link fancy-index to the numpy docs on indexing with arrays?

orbeckst · 2017-02-28T06:14:08Z

_posts/2017-03-06-new-topology-system.md

+2. **Memory**. We don't store, for example, a resname for each atom, but instead store attributes at the level they make sense for.
+3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname.
+4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of residue-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object.
+
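
The "no staleness" point in the quoted list can be illustrated with a toy sketch. Everything here is hypothetical: `ToyTopology` and `ToyResidueGroup` are made-up stand-ins, not the MDAnalysis classes, and only mimic the idea that a group stores indices while the data lives once in the topology.

```python
import numpy as np


class ToyTopology:
    """Hypothetical stand-in: holds residue-level data exactly once."""

    def __init__(self, resnames):
        # Object dtype so that names of any length can be assigned later.
        self.resnames = np.array(resnames, dtype=object)


class ToyResidueGroup:
    """Hypothetical stand-in: stores only indices and a topology reference."""

    def __init__(self, topology, indices):
        self._top = topology
        self._ix = np.asarray(indices, dtype=int)

    @property
    def resnames(self):
        # Always read from the shared topology, never from a local copy.
        return self._top.resnames[self._ix]

    @resnames.setter
    def resnames(self, values):
        # Write through to the shared topology.
        self._top.resnames[self._ix] = values


top = ToyTopology(["GLU", "LYS", "ALA"])
group_a = ToyResidueGroup(top, [0, 1])
group_b = ToyResidueGroup(top, [1, 2])  # overlaps with group_a at residue 1

# Renaming through group_a is immediately visible through group_b.
group_a.resnames = ["ASP", "ARG"]
seen_by_b = group_b.resnames.tolist()
print(seen_by_b)
```

Because neither group holds its own copy of the names, there is nothing that can go stale.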


Topologies become serializable and changes to topologies can be easily saved and communicated around. (TODO...)

orbeckst · 2017-02-28T06:18:51Z

_posts/2017-03-06-new-topology-system.md

+3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname.
+4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of residue-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object.
+
+For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458).


The notebook says

Our systems were vesicle systems using repeats of vesicles from the vesicle library publicly hosted on github. At the moment these particular systems can't simply be downloaded because they are rather large,

Please fix this in the notebook: the vesicles are available as stated in the README:

A set of large vesicle systems, ranging in size from 1.75 M to 10 M particles are made available under doi:10.6084/m9.figshare.3406708.

orbeckst · 2017-02-28T06:35:50Z

_posts/2017-03-06-new-topology-system.md

+```
+
+Now each object only contains information pertaining to that particular object.
+A ``Residue`` object only yields information about the residue; to get to the atoms, use ``Residue.atoms``.


Similarly, to get the atoms from a ``Segment`` or a ``SegmentGroup`` use ``Segment.atoms`` or ``SegmentGroup.atoms``. As before, you can get all residues associated with a group with ``Group.residues`` (which returns a ``ResidueGroup``) and all segments with ``Group.segments`` (a ``SegmentGroup``). Bottom line: you should now always be explicit about what you want.

orbeckst · 2017-02-28T06:39:28Z

_posts/2017-03-06-new-topology-system.md

+### How to work around this
+
+To access atom-level information from anything that isn't an ``AtomGroup``, use the `.atoms` level accessor.
+For example, changing all `.positions` calls on anything that isn't an `AtomGroup` to `.atoms.positions`.


Yes... what does residue.positions return??? AttributeError... I am not sure what you're thinking of, @kain88-de .

orbeckst · 2017-02-28T06:40:31Z

_posts/2017-03-06-new-topology-system.md

+For example, does ``ResidueGroup.charges`` give the charge of the residues or the atoms?
+Also, it was unclear what size a given output would be (see [issue 411](https://github.com/MDAnalysis/mdanalysis/issues/411)).
+
+### How to work around this


yes: How to work with the new system

orbeckst · 2017-02-28T06:44:58Z

_posts/2017-03-06-new-topology-system.md

+
+A major benefit of the new topology system is that information about the topology of a ``Universe`` is now completely encapsulated in the ``Topology`` object.
+This not only makes development and maintenance easier, but also opens the door to some exciting new possibilities as simulation systems grow larger.
+A single ``Topology`` object can now be cleanly shared by multiple ``Universe`` instances, each with their own trajectory reader(s), making common operations such as fitting a trajectory to a reference structure or doing parallel analysis of many trajectories more feasible for large systems.


"more feasible" is weird in a post like this: either it makes it feasible or it doesn't (and if not, just strike the sentence).

orbeckst · 2017-02-28T06:46:51Z

_posts/2017-03-06-new-topology-system.md

+We look forward to the benefits this brings not only to the project, but also to all our users going forward.
+We hope you like what we've done here.
+
+-- @dotsdl


Especially if this is based on @richardjgowers's wiki page, I think you and he should both sign it. In any case, that seems appropriate, as you're both the architects of the new topology system.

dotsdl · 2017-03-01T04:57:42Z

@richardjgowers you should sign the bottom, too. :D

richardjgowers · 2017-03-01T11:15:52Z

The only thing I can see that I think you've missed is that moving from array of structs to struct of arrays (AoS to SoA) also gives more flexibility of what topology information we do include.

I think most of the memory gains come from having fewer Python objects in total, since Atoms are spawned lazily, and from the Atom "struct" not holding every possible attribute, only those that were loaded (any further attribute is just another array running parallel to all the others). This also gives Universe and AtomGroup a lot more flexibility in what they can expose (i.e. adding AtomGroup.happiness).

Similarly the performance gains are because our access patterns are predominantly comparing a single attribute across atoms rather than values within an atom (favouring SoA not AoS).
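
To make the AoS vs SoA distinction concrete, here is a minimal sketch. It is not how MDAnalysis is implemented: `AtomRecord`, the `soa` dictionary and the numbers are all made up for illustration, and `happiness` is simply the joke attribute from the comment above.

```python
import numpy as np


# Array of structs (AoS): one Python object per atom, each carrying all fields.
class AtomRecord:
    def __init__(self, name, mass, charge):
        self.name = name
        self.mass = mass
        self.charge = charge


aos = [
    AtomRecord("N", 14.007, -0.3),
    AtomRecord("CA", 12.011, 0.1),
    AtomRecord("O", 15.999, -0.5),
]

# Struct of arrays (SoA): one array per attribute, all indexed by atom.
soa = {
    "names": np.array(["N", "CA", "O"]),
    "masses": np.array([14.007, 12.011, 15.999]),
    "charges": np.array([-0.3, 0.1, -0.5]),
}

# Typical access pattern: a single attribute across many atoms.
total_mass_aos = sum(atom.mass for atom in aos)  # visits every object
total_mass_soa = soa["masses"].sum()             # one pass over one array

# Adding an attribute is just one more array running parallel to the others.
soa["happiness"] = np.ones(len(soa["masses"]))

print(total_mass_aos, total_mass_soa)
```

In the SoA layout an attribute that was never loaded simply has no array, which is where the flexibility mentioned above comes from.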

Or maybe that's a bit too computer-sciency to include, and we should keep the focus on how it feels to use it? We could write up a separate technical blog post where we talk about how we arrived at our crazy AtomGroup transplanting pattern.

jbarnoud · 2017-03-16T00:04:51Z

What is holding this blog post back?

orbeckst · 2017-03-16T00:10:34Z

- fixing the notebook https://gist.github.com/dotsdl/0e0fbd409e3e102d0458#gistcomment-2028604
- addressing other requested changes (~~reorganization~~ see comments)

orbeckst · 2017-03-29T00:57:46Z

@dotsdl can you attend to the outstanding changes, please? This is all that is holding up the post.

orbeckst · 2017-03-29T00:59:13Z

Also change the date on the post to the day when you push your changes + 1.

kain88-de · 2017-04-02T10:57:28Z

@orbeckst I addressed your comments

orbeckst · 2017-04-03T05:52:11Z

Added some minor changes and changed date to April 3, 2017. Please merge if you're happy.

kain88-de · 2017-04-03T12:11:17Z

@dotsdl and @richardjgowers Is the post OK with you guys like that?

richardjgowers · 2017-04-03T12:44:01Z

Yep LGTM

Added new topology system post.

2ed2568

Included some of what @richardjgowers wrote on the wiki way back when.

kain88-de reviewed Feb 27, 2017

kain88-de mentioned this pull request Feb 28, 2017

Blog Post for release 0.16.0 #26

Merged

6 tasks

orbeckst requested changes Feb 28, 2017

Update 2017-03-06-new-topology-system.md

36995a0

address comments

be74d77

orbeckst approved these changes Apr 3, 2017

orbeckst added 2 commits April 2, 2017 22:48

topology posts: minor edits

aab7587

topology post: changed date to tomorrow

e388195

richardjgowers approved these changes Apr 3, 2017

kain88-de merged commit 5eb2f9c into master Apr 3, 2017

kain88-de deleted the newtopology branch April 3, 2017 12:47

