Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added new topology system post #41

Merged
merged 5 commits into from
Apr 3, 2017
Merged

Added new topology system post #41

merged 5 commits into from
Apr 3, 2017

Conversation

dotsdl
Copy link
Member

@dotsdl dotsdl commented Feb 27, 2017

Included some of what @richardjgowers wrote on the wiki way back when. Feel free to add material or edit accordingly.

Included some of what @richardjgowers wrote on the wiki way back when.
Copy link
Member

@kain88-de kain88-de left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great posts thanks. I made some minor comments. I don't currently have time to work on them myself.

@@ -0,0 +1,102 @@
---
layout: post
title: A shiny, new topology system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A shiny, new and faster topology system

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. :D

title: A shiny, new topology system
---

With MDAnalysis 0.16.0 on the horizon, we wanted to showcase a major development that most users will probably not notice if we've done our job well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did our job well they should notice that MDAnalysis has become faster!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kain88-de good point; replacing this statement with something less pointed.


For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458).

## External changes that may affect how you use MDAnalysis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start with the external changes. They are most interesting to the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure there haven't been any deprecations that we definitely removed with this. I remember that some old attributes do not exist any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first also thought to start with the external changes. But in this case I would start with the internal ones because this is really what changed and it provides a positive motivation for the incompatibilities described in this section. Start on a positive note ;-).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yeah starting with the positives makes sense from a selling point.


```
"Atoms->Residues"[[2, 0, 1, 2]] --> [1, 0, 2, 1]
"Resnames"[[1, 0, 2, 1]] --> ['LYS', 'GLU', 'ALA', 'LYS']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use valid mdanalysis code here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just meant to illustrate what's happening internally; this isn't something that should be done by a user, though the equivalent would be to do:

u.residues[u.atoms[[2, 0, 1, 2]].resindices].resnames

I've adjusted the statement to be more clear that this is internally how the machinery works.

3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname.
4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of resiude-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object.

For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating the notebook

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebook says

Our systems were vesicle systems using repeats of vesicles from the vesicle library publicly hosted on github. At the moment these particular systems can't simply be downloaded because they are rather large,

Please fix this in the notebook: the vesicles are available as stated in the README:

A set of large vesicle systems, ranging in size from 1.75 M to 10 M particles are made available under doi:10.6084/m9.figshare.3406708.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DeprecationWarning in the notebook are really ugly. Can you get rid of them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notebook is now ok.

  • It links to the vesicle library and there is a note pointing to doi:10.6084/m9.figshare.3406708 so that should suffice.
  • deprecation warnings are gone

For example, does ``ResidueGroup.charges`` give the charge of the residues or the atoms?
Also, it was unclear what size a given output would be (see [issue 411](https://github.com/MDAnalysis/mdanalysis/issues/411)).

### How to work around this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Work around sounds like we introduced a bug. I would just say "working with the new system".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes: How to work with the new system

### How to work around this

To access atom-level information from anything that isn't an ``AtomGroup``, use the `.atoms` level accessor.
For example, changing all `.positions` calls on anything that isn't an `AtomGroup` to `.atoms.positions`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is would also be good to mention what residue.positions returns now. As a direct contrast to above to show the benefit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... what does residue.positions return??? AttributeError... I am not sure what you're thinking of, @kain88-de .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it returns an attribute error now. I was adding the remark because above @dotsdl mentions that this was allowed but not well defined in the old topology. It may be worth mentioning that we have removed ill defined attributes like ResidueGroup.positions

@orbeckst
Copy link
Member

What @kain88-de says... will have a closer look later.

@kain88-de kain88-de mentioned this pull request Feb 28, 2017
6 tasks
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, see comments inline. Some notes on the gist performance notebook which I am posting here and not there.

---

With MDAnalysis 0.16.0 on the horizon, we wanted to showcase a major development that most users will probably not notice if we've done our job well.
In fall 2015, @richardjgowers and I set to work on redesigning the topology system from scratch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe write in 3rd person and replace "I" with "@dotsdl"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include the image of the blackboard for some additional history...

https://pbs.twimg.com/media/CwsdYTCVQAUUQsV.jpg
https://twitter.com/orbeckst/status/795762560767180800

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little amazed at how close to our first picture (I think this is a tidied version though) we managed to get

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include it in the post?

Now, over a year later, the finishing touches on this work are being prepared for release.
This post is meant to serve as a brief view to what has changed internally, what has changed externally, and what benefits this gives us looking forward to the future.

## Internal changes that shouldn't affect external behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike @kain88-de I actually come to like starting with the internals better. However, you make it sound dull. Rather use a heading such as

## Invisible changes that make working with MDAnalysis *faster*

Most of the changes are (or should be) invisible to the user. But they made some of the most fundamental operations in MDAnalysis quite a bit *faster*. Although this section is mostly of interest to developers, it is useful for all users to know the operations that MDAnalysis can now do much faster than before (and why).


In the new system, each atom is a member of exactly one residue, and each residue is a member of exactly one segment.
The new `Topology` object keeps an array giving the residue membership of each atom, and likewise an array giving segment membership of each residue.
Getting the resname of the residue of a group of atoms, then, is achieved by taking the indices of these atoms to fancy-index the `Atoms->Residues` array, and then using the result of this to fancy-index the `Resnames` array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link fancy-index to the numpy docs on indexing with arrays?

2. **Memory**. We don't store, for example, a resname for each atom, but instead store attributes at the level they make sense for.
3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname.
4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of resiude-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Topologies become serializable and changes to topologies can be easily saved and communicated around. (TODO...)

3. **Consistency**. Since attributes are stored in one place, we avoid cases where the topology is in an inconsistent state, e.g. two atoms in the same residue give a different resname.
4. **No staleness**. Because e.g. `ResidueGroup`s are only an array of indices, not a list of `Residue` objects generated upon creation of the group, changes of resiude-level properties by another `ResidueGroup` are always reflected consistently by every other one. Data is not duplicated anywhere in this scheme, and is all contained in the `Topology` object.

For further performance comparisons, check out this [notebook](http://nbviewer.jupyter.org/gist/dotsdl/0e0fbd409e3e102d0458).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebook says

Our systems were vesicle systems using repeats of vesicles from the vesicle library publicly hosted on github. At the moment these particular systems can't simply be downloaded because they are rather large,

Please fix this in the notebook: the vesicles are available as stated in the README:

A set of large vesicle systems, ranging in size from 1.75 M to 10 M particles are made available under doi:10.6084/m9.figshare.3406708.

```

Now each object only contains information pertaining to that particular object.
A ``Residue`` object only yields information about the residue; to get to the atoms, use ``Residue.atoms``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, to get the atoms from a Segment or a SegmentGroup use Segment.atoms or SegmentGroup.atoms. As before, you can get all residues associated with a group with Group.residues (which returns a ResidueGroup) and all segments with Group.segments (a SegmentGroup). Bottom line: you should now always be explicit about what you want.

### How to work around this

To access atom-level information from anything that isn't an ``AtomGroup``, use the `.atoms` level accessor.
For example, changing all `.positions` calls on anything that isn't an `AtomGroup` to `.atoms.positions`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... what does residue.positions return??? AttributeError... I am not sure what you're thinking of, @kain88-de .

For example, does ``ResidueGroup.charges`` give the charge of the residues or the atoms?
Also, it was unclear what size a given output would be (see [issue 411](https://github.com/MDAnalysis/mdanalysis/issues/411)).

### How to work around this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes: How to work with the new system


A major benefit of the new topology system is that information about the topology of a ``Universe`` is now completely encapsulated in the ``Topology`` object.
This not only makes development and maintenance easier, but also opens the door to some exciting new possibilities as simulation systems grow larger.
A single ``Topology`` object can now be cleanly shared by multiple ``Universe`` instances, each with their own trajectory reader(s), making common operations such as fitting a trajectory to a reference structure or doing parallel analysis of many trajectories more feasible for large systems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"more feasible" is weird in a post like this: either it make it feasible or not (and if not, just strike the sentence)

We look forward to the benefits this brings not only to the project, but also to all our users going forward.
We hope you like what we've done here.

-- @dotsdl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially if this based on @richardjgowers 's wiki page then I think you and him should sign it – in any case, that seems appropriate as you're both the architects of the new topology system.

@dotsdl
Copy link
Member Author

dotsdl commented Mar 1, 2017

@richardjgowers you should sign the bottom, too. :D

@richardjgowers
Copy link
Member

The only thing I can see that I think you've missed is that moving from array of structs to struct of arrays (AoS to SoA) also gives more flexibility of what topology information we do include.

I think most of the memory gains come from fewer total Python objects as Atoms are spawned lazily, and the Atom "struct" not holding every possible attribute, only those loaded in. (from being able to add another array running parallel to all others). This also gives Universe and AtomGroup a lot more flexibility in what they can show (ie adding AtomGroup.happiness)

Similarly the performance gains are because our access patterns are predominantly comparing a single attribute across atoms rather than values within an atom (favouring SoA not AoS).

Or maybe that a bit computer sciency to include, and we should keep the focus on how it feels to use it? We could write up a separate technical blog where we talk about how we arrived at our crazy AtomGroup transplanting pattern

@jbarnoud
Copy link
Contributor

What is holding this blog post back?

@orbeckst
Copy link
Member

orbeckst commented Mar 16, 2017

@orbeckst
Copy link
Member

@dotsdl can you attend to the outstanding changes, please? This is all that is holding up the post.

@orbeckst
Copy link
Member

Also change the date on the post to the day when you push your changes + 1.

@kain88-de
Copy link
Member

@orbeckst I addressed your comments

@orbeckst
Copy link
Member

orbeckst commented Apr 3, 2017

Added some minor changes and changed date to April 3, 2017. Please merge if you're happy.

@kain88-de
Copy link
Member

@dotsdl and @richardjgowers Is the post OK with you guys like that?

@richardjgowers
Copy link
Member

Yep LGTM

@kain88-de kain88-de merged commit 5eb2f9c into master Apr 3, 2017
@kain88-de kain88-de deleted the newtopology branch April 3, 2017 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants