Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecations on all components to disappear or change in Topology Refactor #599

Closed
11 tasks done
dotsdl opened this issue Dec 27, 2015 · 77 comments
Closed
11 tasks done

Comments

@dotsdl
Copy link
Member

dotsdl commented Dec 27, 2015

The 0.16 release features a complete overhaul of the topology system, giving better performance and consistency in a major part of MDAnalysis' core functionality. This also includes the disappearance of some API elements that no longer made sense; these must be marked as deprecated in the 0.15 release.

Add to this list, and check when marked deprecated in develop branch.

Deprecate and recommend alternative:

  • AtomGroup.get_* getters
  • AtomGroup.set_* setters
  • ResidueGroup.get_* getters
  • ResidueGroup.set_* setters
  • SegmentGroup.get_* getters
  • SegmentGroup.set_* setters

Deprecate and indicate workaround (eg use RG.atoms.names)

  • Residue as AtomGroup
  • ResidueGroup as AtomGroup
  • Segment as AtomGroup
  • SegmentGroup as AtomGroup

Indicate when the deprecation will happen

@richardjgowers
Copy link
Member

Ahh this is a good point. We'll have to start a list on the wiki or something... once everything is finalised

@orbeckst
Copy link
Member

Wiki/Issue363-Changes by @richardjgowers :

@richardjgowers
Copy link
Member

@orbeckst Yes. If you're using Residues, ResidueGroups Segments or SegmentGroups as AtomGroups, (accessing atom level properties such as positions off them) then yes it will break things.

WRT #411, an XGroup now always returns things of len(XGroup), with the exception being .atoms .residues and .segments which return a different XGroup

RG.atoms returns a set* of the Atoms in that ResidueGroup
RG.atoms.positions returns all the positions of all the atoms in the RG

  • I think we made all level shifts (.atoms, .residues and .segments) return sets

@kain88-de
Copy link
Member

@richardjgowers @dotsdl the changes seem to be very nice. For the changes page. Could you maybe create some real-code example how things will change from the current Topology Model to the new Model. I also would'nt mind if you start doing that in a notebook that you upload and link somewhere (a gist or maybe already as a new blog-post).

@dotsdl
Copy link
Member Author

dotsdl commented Jan 4, 2016

I think we made all level shifts (.atoms, .residues and .segments) return sets

Correct. Doing

AG.atoms

will give an AtomGroup containing only one instance of each atom represented within AG in atom index order. The same goes for:

RG.atoms
SG.atoms

The same principle applies for .residues and .segments from any Group. You always get what amounts to a set.

@richardjgowers
Copy link
Member

I've been thinking about this, and it's a little complicated to add in deprecation warnings, but I think it's possible. So for example, RG.masses currently just inherits the AG.masses method, so you can't actually decorate the RG version as it doesn't exist as its own object.

But if we made a DeprecatedAtomGroup class, we could then deprecate everything...

class AtomGroup
    # mostly unchanged

class DeprecatedAtomGroup(AtomGroup)
    # add in some magic here to give deprecation warnings
    # maybe even hacking into getattr.......

class Residue(DeprecatedAtomGroup)

class RG(DeprecatedAtomGroup)

@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2016

On 5 Jan, 2016, at 05:35, Richard Gowers wrote:

I've been thinking about this, and it's a little complicated to add in deprecation warnings, but I think it's possible. So for example, RG.masses currently just inherits the AG.masses method, so you can't actually decorate the RG version as it doesn't exist as its own object.

But if we made a DeprecatedAtomGroup class, we could then deprecate everything...

There's still @kain88-de's valid question as to should we deprecate if we cannot provide an alternative yet.

I think the answer is that this will be a second API break and we need to think about how to mitigate this. The last break already annoyed users, as I was being told.

@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2016

Question about

I think we made all level shifts (.atoms, .residues and .segments) return sets

Are these groups unordered or ordered, i.e. like real sets (random order) or lists with unique elements in defined order?

@richardjgowers
Copy link
Member

I don't think @dotsdl and I really 100% decided on the set idea, it just seemed like the neatest solution.

So considering a trivial system with 4 atoms 2 residues:

(a1, r1)
(a2, r1)
(a3, r2)
(a4, r2)

I think upshifts should (and currently do?) return a set:

AtomGroup(a1, a2, a3).residues = RG(r1, r2)

AtomGroup(a1, a3, a2).residues = RG(r1, r2)

It's the downshifts that are a little confusing, so this seems clear

RG(r1, r2).atoms - AG(a1, a2, a3, a4)

But then others are more confusing

RG(r1, r2, r1).atoms = AG(a1, a2, a3, a4) or AG(a1, a2, a3, a4, a1, a2)

Or more generically:

RG.atoms = set(R.atoms for R in RG) or [R.atoms for R in RG]

None of this will require much work to change on the new branch, so a discussion is welcome!

@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2016

The current implementation does something like

RG.atoms = set(R.atoms for R in RG)

I also wondered if you would want duplicate atoms for RG(r1, r2, r1).atoms and I am tending towards the existing behavior that is also semantically more clear: "Give me the atoms contained in these residues". If you want specific atom orderings then you need to use + or select_atoms(sel1, sel2, sel1). However: discuss!

EDIT P.S.: If sets then I would want ordered sets, i.e. atoms in ascending atom index order.

@dotsdl
Copy link
Member Author

dotsdl commented Jan 5, 2016

Are these groups unordered or ordered, i.e. like real sets (random order) or lists with unique elements in defined order?

They are always ordered by {atom,residue,segment} index.

@dotsdl
Copy link
Member Author

dotsdl commented Jan 5, 2016

To elaborate further on the why...

Making *Group.atoms, *Group.residues, and *Group.segments always yield ordered sets, no matter the level coming from or going to, is the simplest rule that also accommodates an intuitive use. If one wants the resids of each atom in an AtomGroup, one can already get that with:

AG.resids

If one wants the resids of the residues represented among the atoms in the AtomGroup, one can then do:

AG.residues.resids

Likewise, if one wants an AtomGroup with no duplicates, that's easy with:

AG.atoms

The point is, making .atoms, .residues, .segments always yield ordered sets is a simple rule that also affords functionality that is gained, but not lost anywhere.

@dotsdl
Copy link
Member Author

dotsdl commented Jan 5, 2016

TLDR: Any other scheme requires probably different rules for what you get depending on which level you are coming from, which gets confusing and unintuitive real fast. We considered not using sets for about a day before we realized other schemes were hard to explain even to ourselves.

@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2016

On 5 Jan, 2016, at 12:47, David Dotson wrote:

Likewise, if one wants an AtomGroup with no duplicates, that's easy with:

AG.atoms

The point is, making .atoms, .residues, .segments always yield ordered sets is a simple rule that also affords functionality that is gained, but not lost anywhere.

I am not so sure that I like the fact that you can have AG != atoms... at the moment

ag = u.atoms[[0,1,0]]
ag.atoms == ag

You might break code in rather subtle ways.

@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2016

On 5 Jan, 2016, at 12:50, David Dotson wrote:

TLDR: Any other scheme requires probably different rules for what you get depending on which level you are coming from, which gets confusing and unintuitive real fast. We considered not using sets for about a day before we realized other schemes were hard to explain even to ourselves.

Fair enough, but I think the discussion here really shows that we have to be careful how we introduce this and how we prepare users. It's all good to be excited about a sleek new implementation (which is great, don't get me wrong!) but you also have a responsibility towards the user base � and most of them hate having to debug code that "used to work". Thus, it should be very clear what changes (and the wiki page is a starting point). I also liked the idea of writing a blog post before a merge, something like a vision for what needs to be changed and why and explain some of the rationale.

@dotsdl
Copy link
Member Author

dotsdl commented Jan 5, 2016

I am not so sure that I like the fact that you can have AG != atoms... at the moment

ag = u.atoms[[0,1,0]]
ag.atoms == ag

You might break code in rather subtle ways.

What is the point of ag.atoms if it gives you exactly the same object as simply ag? I get that there's probably a lot of code that uses these semantics, but I think it makes about as much sense as if a numpy array had a .array property that just pointed to itself.

@orbeckst
Copy link
Member

orbeckst commented Jan 5, 2016

Because the underlying principle was "everything is an AtomGroup", it was very convenient to either use the object, let's call it g itself or g.atoms, if you were not sure that g was actually a base AtomGroup and not e.g. a ResidueGroup. It made it much more convenient to use groups at different levels of the hierarchy.

What your are proposing for .atoms changes the meaning quite dramatically, from "these are the atoms as we have them recorded and arranged in this group" to "unique atoms in the group".

I am not saying that we can't change this but one has to be a bit sensitive to current usage.

@kain88-de
Copy link
Member

How do you say an atom is unique? Do you go with the ID of the atom?

I still think it would be very nice to have an example where old code has to be changed or would be broken by the changes. That would make it easier to see how the changes would affect users.

@richardjgowers
Copy link
Member

@kain88-de yeah, an Atom is just an index. Or rather, all Atoms/Residues/Segments are uniquely defined according to (Universe, Level (A/R/S), Index). The class defines the first two, so within the class you just worry about indices.

@dotsdl
Copy link
Member Author

dotsdl commented Jan 5, 2016

@orbeckst I think there are a few possible options for this whole scheme we can debate over to help focus discussion

1. Everything gives a set

Same rule for everything, in which a call to .atoms, .residues, .segments always gives a group of unique elements represented at that level.

Call Gives
AtomGroup.atoms ordered AtomGroup of unique atoms represented in AtomGroup
AtomGroup.residues ordered ResidueGroup of unique residues represented in AtomGroup
AtomGroup.segments ordered SegmentGroup of unique segments represented in AtomGroup
ResidueGroup.atoms ordered AtomGroup of unique atoms represented in ResidueGroup
ResidueGroup.residues ordered ResidueGroup of unique residues represented in ResidueGroup
ResidueGroup.segments ordered SegmentGroup of unique segments represented in ResidueGroup
SegmentGroup.atoms ordered AtomGroup of unique atoms represented in SegmentGroup
SegmentGroup.residues ordered ResidueGroup of unique residues represented in SegmentGroup
SegmentGroup.segments ordered SegmentGroup of unique segments represented in SegmentGroup

2. Everything gives a set except for the same level

Same as 1, with the exception that at the same level you get back the same object. Calling ResidueGroup.atoms gets you a set always, whereas AtomGroup.atoms does not.

Call Gives
AtomGroup.atoms AtomGroup identical to the calling AtomGroup
AtomGroup.residues ordered ResidueGroup of unique residues represented in AtomGroup
AtomGroup.segments ordered SegmentGroup of unique segments represented in AtomGroup
ResidueGroup.atoms ordered AtomGroup of unique atoms represented in ResidueGroup
ResidueGroup.residues ResidueGroup identical to the calling ResidueGroup
ResidueGroup.segments ordered SegmentGroup of unique segments represented in ResidueGroup
SegmentGroup.atoms ordered AtomGroup of unique atoms represented in SegmentGroup
SegmentGroup.residues ordered ResidueGroup of unique residues represented in SegmentGroup
SegmentGroup.segments SegmentGroup identical to the calling SegmentGroup

3. Order and repeat elements preserved as much as possible

Adds unique method/property to each *Group class that returns a set of the object.

At the atom level, calling .atoms, .residues, or .segments gets a group with an element for each atom.

At the residue level, calling .atoms gives an AtomGroup with repeats if the calling ResidueGroup itself had repeats, and atoms are given in the same order as the residues in the ResidueGroup. Calling .residues and .segments gets a group with an element for each residue.

At the segment level, calling .residues gives a ResidueGroup with repeats if the calling SegmentGroup itself had repeats, and residues are given in the same order as the segments in the SegmentGroup. Calling .atoms gives an AtomGroup of the atoms represented by each residue in residue order in each segment in segment order in the calling SegmentGroup, including repeats. Calling segments gets a group with an element for each segment.

Call Gives
AtomGroup.atoms AtomGroup identical to the calling AtomGroup
AtomGroup.residues ResidueGroup of len(AtomGroup) giving the residue for each atom
AtomGroup.segments SegmentGroup of len(AtomGroup) giving the segment for each atom
ResidueGroup.atoms AtomGroup giving the atoms for each residue in the calling ResidueGroup's order
ResidueGroup.residues ResidueGroup identical to the calling ResidueGroup
ResidueGroup.segments SegmentGroup of len(ResidueGroup) giving the segment for each residue
SegmentGroup.atoms AtomGroup giving the atoms for each segment in the calling SegmentGroup's order
SegmentGroup.residues ResidueGroup giving the residues for each segment in the calling SegmentGroup's order
SegmentGroup.segments SegmentGroup identical to the calling SegmentGroup

4 Upwards give sets, downwards gives list comprehensions

Similar to 3 except Giving AtomGroup.residues as len(AG) is a little silly, so use 1's rule for upwards shifts.

Adds unique to return a set of the object.

1's downwards rules can be done using ResidueGroup.unique.atoms (but 1 can't really recreate the downward shifts here).

Call Gives
AtomGroup.atoms AtomGroup back again (AtomGroup.atoms == AtomGroup)
AtomGroup.unique set(AtomGroup)
AtomGroup.residues set([Atom.residue for Atom in AG])
AtomGroup.segments set([Atom.segment for Atom in AG])
ResidueGroup.atom concatenate([Residue.atoms for Residue in RG])
ResidueGroup.unique set(ResidueGroup)
ResidueGroup.residues ResidueGroup
ResidueGroup.segments set([R.segment for R in ResidueGroup])
SegmentGroup.atoms concatenate([S.atoms for S in SegmentGroup])
SegmentGroup.unique set(SegmentGroup)
SegmentGroup.residues concatenate([S.residues for S in SegmentGroup])
SegmentGroup.segments SegmentGroup

@dotsdl
Copy link
Member Author

dotsdl commented Jan 5, 2016

I'm an advocate for 1, although the others can technically work.

@jbarnoud
Copy link
Contributor

jbarnoud commented Jan 6, 2016

The main difference between 1 and 2 is that, in 1, AtomGroup.atoms removes duplicates, while 2 keeps them. Am I right?

To what extent are AtomGroups mutable? Indeed, the other difference I see is that, in 1, AtomGroup.atoms return a copy, it does not in 2.

I foresee less surprise with 1 as we can say that these properties always return a new object ans that they always deduplicate elements.

I am curious about 3. Is it possible to get (even not directly), with 1, what 3 would output?

@richardjgowers
Copy link
Member

@jbarnoud yes I think you're correct about 1 vs 2

What I don't like in 1 is that AG.atoms doesn't explicitly say it's a set, I'd much rather that AG.atoms == AG and then add AG.unique == set(AG)

I think I'd like to invent 4, which is identical to 3 except AG.residues returns a set (and AG.segments, RG.segments)

@tylerjereddy
Copy link
Member

I'll concede to just skimming this issue, but do let me know if I need to write any lib2to3 fixers for API changes. Extending ten2eleven to provide warning messages where the deprecation architecture is complicated or philosophically not correct might be an idea.

@hainm
Copy link
Contributor

hainm commented Feb 13, 2016

And what does segment do? Why not only atoms and residues?

@orbeckst
Copy link
Member

On 12 Feb, 2016, at 17:31, Hai Nguyen wrote:

And what does segment do? Why not only atoms and residues?

Larger units, such as, say actual molecules.

Atoms are real, molecules are real, residues are convenient ;-)

(But of course, the whole solvent or the membrane could also conveniently be grouped as a segment, so that's why we are not calling it a molecule... that's what fragments are for, but they don't fit into the hierarchy.)

@dotsdl
Copy link
Member Author

dotsdl commented Feb 13, 2016

@hainm our new topology system is more performant, both in speed and memory usage, and more flexible (no hardcodes of attributes) than either the current MDAnalysis topology system or that of mdtraj. There is no reason to regress on this front. The new system also preserves much of the existing intuitive API of MDAnalysis that works quite well, but internally eliminates the possibility of staleness along with the performance benefits.

The only things that need deciding are what attribute access down the hierarchy returns. See the conversation earlier and my gist on the question.

@dotsdl
Copy link
Member Author

dotsdl commented Feb 13, 2016

@orbeckst

So to sidestep the ambiguity of accessing lower-level attributes from a single object (Segment, Residue), we've so far left out properties for these. That means that there is no Segment.names or Residue.names, even though there is a SegmentGroup.names and ResidueGroup.names.

Instead, accessing names from a Segment can be done with either Segment.atoms.names or Segment.residues.names. Accessing names from a Segment in this way requires one to be explicit in the level desired, which under my proposal would yield iterables structured according to the level used.

Is this a satisfying resolution? In short, we leave out lower-level properties from Residue and Segment, requiring explicit level specification. This is internally consistent with the idea that what is returned is an iterable of equal length to the calling object, since Residue and Segment aren't iterables at all.

@dotsdl
Copy link
Member Author

dotsdl commented Feb 13, 2016

Ah, should have re-read your summary:

Thus I propose to resolve any ambiguous cases by referring to properties in the next lower tier in the hierarchy.

You're in favor of making Segment behave roughly like ResidueGroup and Residue behave roughly like AtomGroup, then? I think that's my reading of it. I think it's doable, but would love to hear what @richardjgowers thinks.

@hainm
Copy link
Contributor

hainm commented Feb 13, 2016

@hainm our new topology system is more performant, both in speed and memory usage, and more flexible (no hardcodes of attributes) than either the current MDAnalysis topology system or that of mdtraj. There is no reason to regress on this front. The new system also preserves much of the existing intuitive API of MDAnalysis that works quite well, but internally eliminates the possibility of staleness along with the performance benefits.

yeah, I only make comment based on what I look at MDA and mdtraj tutorials. So I don't know what users want to use. Can you propose a detail example about new topology? Whether the new design will be widely used from users? How often it is used? I am afraid that we are trying to introduce many idea but ending no one ever uses (I throwed away 1/2 pytraj code that I wrote for 1/2 year to get simplicity :D).

@hainm
Copy link
Contributor

hainm commented Feb 13, 2016

For example: how often people use '_setter' method for SegmentGroup? If this design is for Topology editit, should let user to use another more-dedicated program?

@dotsdl
Copy link
Member Author

dotsdl commented Feb 13, 2016

Can you propose a detail example about new topology?

The new topology system works under the hood. Most users won't notice it, save for the performance gains and the API breaks here and there (which we are discussing now, in this very thread, and deprecating ahead of time what we can).

See the wiki for some details on what's changed and why. More will be written about it as we get closer to the merge of issue-363 into develop. You can give it a spin if you like, too. Most things work already.

@dotsdl
Copy link
Member Author

dotsdl commented Feb 13, 2016

For example: how often people use '_setter' method for SegmentGroup? If this design is for Topology editit, should let user to use another more-dedicated program?

Huh? I have no idea what you're asking. At any rate, this thread is pretty focused on a particular question, and I'd rather we keep it focused on that question until it's resolved. You are welcome to have a look at the issue-363 branch yourself to see what's going on there.

@hainm
Copy link
Contributor

hainm commented Feb 13, 2016

thanks.

sorry to dilute this issue.

@richardjgowers
Copy link
Member

Hey

I think the nested returns are new since we last spoke, but I think they make sense. The only problem I see with them is that some atom properties don't work like that (mass and charge) and so it's not completely smooth. Ie RG.Atomprop isnt always always the same shape (but is always the same len or shape[0])

@orbeckst
Copy link
Member

@dotsdl ,

You're in favor of making Segment behave roughly like ResidueGroup and Residue behave roughly like AtomGroup, then? I think that's my reading of it. I think it's doable, but would love to hear what @richardjgowers thinks.

Yes, together with your proposal to use nested lists (or iterables) where appropriate.

It is true that this breaks orthogonality (as @richardjgowers pointed out) but I think we can gain more in functionality. At a minimum we do guarantee that for plural attributes one always gets an iterable of len(group), with the items either being aggregated (charge, mass, ...) or being iterables themselves.

@dotsdl dotsdl modified the milestones: 0.14, 0.14.1 - Bugfixes Feb 18, 2016
@dotsdl
Copy link
Member Author

dotsdl commented Mar 11, 2016

@orbeckst given your comment here, I take it this means we're fine with leaving out things like Residue.names that go downward in the hierarchy? That is, getting an atom-level property from a residue means doing e.g. Residue.atoms.names? If so, I'm in favor and will move forward with this (actually, I think we're already there...).

@orbeckst
Copy link
Member

Yes.

Oliver Beckstein
email: [email protected]

Am Mar 10, 2016 um 19:18 schrieb David Dotson [email protected]:

@orbeckst given your comment here, I take it this means we're fine with leaving out things like Residue.names that go downward in the hierarchy? That is, getting an atom-level property from a residue means doing e.g. Residue.atoms.names? If so, I'm in favor and will move forward with this (actually, I think we're already there...).


Reply to this email directly or view it on GitHub.

@dotsdl
Copy link
Member Author

dotsdl commented Mar 11, 2016

Excellent. There's a fairly clear path forward then. Will keep hammering to get everything in the right place. As I said, I think we're at least 90% there already.

@mnmelo
Copy link
Member

mnmelo commented Mar 11, 2016

I'm also cool with that, but what is then the recommended way to have a list of all residue names? (Unduplicated, one name per residue)

@dotsdl
Copy link
Member Author

dotsdl commented Mar 11, 2016

@mnmelo no matter what group you're using to start with, you'd do grp.residues.resnames.

@mnmelo
Copy link
Member

mnmelo commented Mar 11, 2016

Ok, suspected as much. Cool with me.

@richardjgowers
Copy link
Member

@dotsdl if you had a RG, doesn't RG.residues just return itself? I think grp.residues.unique is the most foolproof way?

@dotsdl
Copy link
Member Author

dotsdl commented Mar 11, 2016

@richardjgowers ah yeah, fair enough. doing grp.residues will give you a ResidueGroup formed according to the table (4).

@mnmelo if you want resnames for each residue but with absolutely no duplicate residues, you would do grp.residues.unique.resnames to get this.

dotsdl added a commit that referenced this issue Mar 28, 2016
@dotsdl
Copy link
Member Author

dotsdl commented Mar 30, 2016

Once #698 is merged, I'll fill in the checklist and we'll close. :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants