Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contact improvements #792

Merged
merged 3 commits into from
Apr 22, 2016
Merged

Contact improvements #792

merged 3 commits into from
Apr 22, 2016

Conversation

kain88-de
Copy link
Member

@kain88-de kain88-de commented Mar 20, 2016

This continues the work in #708. I have stripped down Contacts to the absolute bare essential (besides single_frame everything is gone, renamed or moved to it's own function). I think this results in a cleaner implementation in the end and also one that is more flexible. I'm aware that I might have removed useful functionality but I'm happy to add that again if someone explains to me why they need it.

I've created a ipython notebook in the root-directory where I outlined the normal fraction of native contacts analysis and a new analysis for an easy binary decision algorithm if a protein is folded or two partners are bound. I plan to keep this notebook always at the top comment and remove it before I merge. I would like some help if this is all the usage that people want and what use cases are currently missing as example, or if they can't be done with the class right now)

The plots are now removed as well because I don't think the plot_qavg contained a lot of content (basically only a complicated way to visualize np.mean(q)). The timeseries plot is removed because with arbitrary functions allowed as methods it might now represent accurately the data that is plotted.

Changes made in this Pull Request:

  • removed all untested code from Contacts
  • Contacts now accepts arbitrary functions as method kwarg
  • old contact analysis classes are moved to be deprecated soon

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • [ ] Issue raised/referenced?
  • deprecate old contact classes
  • [ ] pbc keyword

@@ -0,0 +1,276 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we moved all the examples out into the cookbook, so this should probably go there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but I think this way it is currently easier to have the use-case notebook with the changes in the contacts class. This is thought of a convenience for me and testers. I will remove this before I merge the PR and collect the examples in the module docstring.

@kain88-de
Copy link
Member Author

I had a look for the suggested speed improvements in _single_frame and to add pbc for the distance calculations. While both are possible I don't think I have time to implement both right now, this is because the minimal image functions for the different pbc boxes aren't accessible in python-code right now.

I would use the slow distance_array for now and add a pbc keyword. The speed improvements could be done later.

@richardjgowers
Copy link
Member

@kain88-de no worries. Which function needs access to minimum image?

@kain88-de
Copy link
Member Author

minimum_image_triclinic and minimum_image, plus a nice python wrapper that selects the appropriate one of the two depending on the box.

@richardjgowers
Copy link
Member

I can't see where you need to use those? Aren't you just using
distance_array for all the distances in this module?

On Fri, 25 Mar 2016 06:41 kain88-de, [email protected] wrote:

minimum_image_triclinic

static void minimum_image_triclinic(double *dx, coordinate* box, float* box_half)

and minimum_image
static void minimum_image(double *x, float *box, float *inverse_box)
,
plus a nice python wrapper that selects the appropriate one of the two
depending on the box.


You are receiving this because you were assigned.

Reply to this email directly or view it on GitHub
#792 (comment)

@kain88-de
Copy link
Member Author

Not when I want to implement your speed improvement suggestions

def  _single_frame(self):
    r = np.linalg.norm(self.grpA.positions[self.idx_grpA] - self.grpB.positions[self.idx_grpB], axis=1)
    r0 = self.r0[self.initial_contacs]
    ...

Where self.idx_grp* is calculated in __init__ based on the initial contacts. For the test case example this reduces the distance calculations for each from from 40*70 to 60, an order of magnitude less then pure distance_array!

But to have this also with periodic boundary conditions I need a function to calculate the minimal image convention distance before I calculate the norm.

@richardjgowers
Copy link
Member

So you want to zip over the distances? Calc_bonds does this

On Fri, 25 Mar 2016 08:15 kain88-de, [email protected] wrote:

Not when I want to implement your speed improvement suggestions

def _single_frame(self):
r = np.linalg.norm(self.grpA.positions[self.idx_grpA] - self.grpB.positions[self.idx_grpB], axis=1)
r0 = self.r0[self.initial_contacs]
...

Where self.idx_grp* is calculated in init based on the initial
contacts. For the test case example this reduces the distance calculations
for each from from 40*70 to 60, an order of magnitude less then pure
distance_array!

But to have this also with periodic boundary conditions I need a function
to calculate the minimal image convention distance before I calculate the
norm.


You are receiving this because you were assigned.

Reply to this email directly or view it on GitHub
#792 (comment)

@richardjgowers
Copy link
Member

Anyway is this ready to merge?

On Fri, 25 Mar 2016 08:29 Richard Gowers, [email protected] wrote:

So you want to zip over the distances? Calc_bonds does this

On Fri, 25 Mar 2016 08:15 kain88-de, [email protected] wrote:

Not when I want to implement your speed improvement suggestions

def _single_frame(self):
r = np.linalg.norm(self.grpA.positions[self.idx_grpA] - self.grpB.positions[self.idx_grpB], axis=1)
r0 = self.r0[self.initial_contacs]
...

Where self.idx_grp* is calculated in init based on the initial
contacts. For the test case example this reduces the distance calculations
for each from from 40*70 to 60, an order of magnitude less then pure
distance_array!

But to have this also with periodic boundary conditions I need a function
to calculate the minimal image convention distance before I calculate the
norm.


You are receiving this because you were assigned.

Reply to this email directly or view it on GitHub
#792 (comment)

@kain88-de
Copy link
Member Author

Anyway is this ready to merge?

No docs are missing. There is still a bit of work left. I have to do some other stuff for my PhD right now. I hope I have some time next week to get some things done. The big 4 that still need doing are [docs, tests, pbc, deprecations for old classes].

While we are at it. What is a good deprecation release?

@jandom Does this still work for you now? I removed quite a lot of things.

@orbeckst You were against removing the old contact classes because of some missing functionality. Could you tell me what that was and how it was used, the old code is not that easy to understand.

@orbeckst
Copy link
Member

  1. Deprecated ContactAnalysis1 should go eventually (in 0.15 or later?), @jandom 's Contacts is the new hotness.
  2. ContactAnalysis does a q1-q2 analysis with native contacts relative to an initial and a final state. This functionality is not available in Contacts (unless you find a way to compute native contacts for multiple reference structures). It's fugly, without a question, but it does get used...

@kain88-de
Copy link
Member Author

unless you find a way to compute native contacts for multiple reference structures

This means I have to allow several (ref_groupA, ref_groupB) tuples to be passed to the initialization?

If I understand you correctly the q1-q2 analysis would currently be calculated going over the trajectory twice.

refA = u.select_atoms('...')
refB = u.setlect_atoms('...')

q1 = contacts.Contacts(u, (selA, selB), (refA, refB))

u.trajectory[-1]
q2 = contacts.Contacts(u,  (selA, selB), (refA, refB))

What you would want is something like this?

q = contacts.Contacts(u, (selA, selB), ((refA_initial, refB_initial),
                                        (refA_final, refB_final)))

If that is correct it should be easy to extend. The preparation of the reference AtomGroup is a little bit more involved though then the current one. Essentially you would need to load the Universe twice and go to the last frame once. This would be made easier if we could get freeze_copy of a AtomGroup so that it unlinks from any trajectory/universe object.

@orbeckst
Copy link
Member

Yes to all.

Oliver Beckstein
email: [email protected]

Am Mar 26, 2016 um 8:57 schrieb kain88-de [email protected]:

unless you find a way to compute native contacts for multiple reference structures

This means I have to allow several (ref_groupA, ref_groupB) tuples to be passed to the initialization?

@kain88-de
Copy link
Member Author

Yes to all.

Nice. That shouldn't be complicated.

I'm only waiting for @jandom to answer if I deleted anything he really needed.

@kain88-de kain88-de force-pushed the contact-improvements branch from 1cd2b97 to 60a25e1 Compare March 28, 2016 18:04
@kain88-de
Copy link
Member Author

It is now possible to create several reference groups to compare against (not only two). And I have added a small wrapper to quickly create a q1-q2 analysis.

@orbeckst Have a look at the notebook. The q1q2 analysis is in the last section. Does this satisfy you?

@orbeckst
Copy link
Member

The q1-q2 looks good.

As a last test, could you please also run it as the original example http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/contacts.html#two-dimensional-contact-analysis-q1-q2 (you might need to reverse-engineer the default values for the selections, I can't remember if these were C-alphas or all heavy or all backbone) and see if it reproduces the figure http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3144279/figure/F5/ ?

@kain88-de kain88-de force-pushed the contact-improvements branch from 60a25e1 to b82a105 Compare March 28, 2016 19:53
@kain88-de
Copy link
Member Author

yeah that doesn't work at all -.-. I've updated the notebook and I'm quite sure about the selections since the old code matches with the publication figure.

But When I look at the code the old contact analysis does things a little bit different from me. I'm not sure why that is. I appreciate some help with this.

@orbeckst
Copy link
Member

Did you post the output from your code for the AdK problem? I did not see it (only the salt bridge example).

@kain88-de kain88-de force-pushed the contact-improvements branch from 3fa3172 to 802ba4f Compare April 3, 2016 19:46
@kain88-de
Copy link
Member Author

@orbeckst I should have fixed the q1q2 wrapper function now. The problem was that the hard_cut_q doesn't calculate the same as the old ContactAnalysis. In short the old class counted a native contact it if was formed in the reference and the distance shorter then some radius r. The normal hard_cut_q will only count a native contact if the distance is closer then in the reference. I updated the notebook as weil for you to look at.

This brings me to another thing I would like some help with naming.

  • hard_cut_q count a contact if the distance is closer then the reference distance
  • min_cut count a contact if the distance is smaller then some radius r in the current frame and closer then r in the reference
  • soft_cut_q do a soft-potential cut on the native contacts formed in the reference, being 0.5 at the exact refreence distance

I think min_cut is wrong but I can't think of anything better.

@orbeckst
Copy link
Member

orbeckst commented Apr 5, 2016

The updated q1-q2 looks good now.
What you named min_cut is what I had found as fraction of native contacts in

J. Franklin, P. Koehl, S. Doniach, and M. Delarue. MinActionPath: maximum likelihood trajectory for large- scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Res., 35 (Web Server issue):W477–W482, 2007. doi: 10.1093/nar/gkm342.

Maybe just call it FKDD_q or Franklin_Koehl_Doniach_Delarue_q?

soft_cut_q is still Best-Hummer, right? Maybe keep the explicit name because there are, in principle, many ways to implement a soft cutoff. (mdtraj called it best_hummer_q).

@kain88-de
Copy link
Member Author

I don't like to give this stuff the author names. It requires people to know the literature very well. I can imagine some new students that don't use this because they don't read the docs and don't know the names or papers, so they don't have any idea of what the function does and so rather go with what they know from the name). I rather have some more intuitive name that gives a sense of what is being done.

@orbeckst
Copy link
Member

orbeckst commented Apr 5, 2016

On 4 Apr, 2016, at 23:34, kain88-de wrote:

I don't like to give this stuff the author names. It requires people to know the literature very well. I can imagine some new students that don't use this because they don't read the docs and don't know the names or papers, so they don't have any idea of what the function does and so rather go with what they know from the name). I rather have some more intuitive name that gives a sense of what is being done.

Fair enough.

@kain88-de kain88-de force-pushed the contact-improvements branch 2 times, most recently from 13d7faa to 04f76c3 Compare April 7, 2016 21:47
@kain88-de
Copy link
Member Author

I've gotten a little bit further. Here is an overview of what is still to do. I'm happy for help as this really has to go into 0.15.0 because the current implementation doesn't work correct, all of which is fixed in this version.

Questions

  1. What do we need to cite? I have the Best & Hummer paper and the Franklin paper so far. Am I missing someone?
  2. Should we only calculate the fraction of contacts or also the absolute number of contacts?

Still TODO

  • update module documentation (notebook should serve as example reference)
  • write new tests for updated class
    anything else?

Saved for Later

  • Periodic Boundary Conditions (lets do that in a separate issue)
  • performance improvements

@orbeckst
Copy link
Member

orbeckst commented Apr 8, 2016

Re questions:

  1. That's the only two references that I remember looking at.
  2. Typically, fractions are reported, aren't they? I think if we need absolute numbers we'll raise an issue...

Main thing is docs and fixing tests (are you getting rid of ContactAnalysis1 because then the test failures will go away...). I would drop ContactAnalysis1 cold turkey and only leave a stub in place that says that this functionality is now provided by .... and link to the docs.

@kain88-de kain88-de force-pushed the contact-improvements branch from 04f76c3 to 2612ae4 Compare April 10, 2016 21:16
@kain88-de
Copy link
Member Author

So docs are done. I have included examples for the normal usage , the q1q2 analysis and the new option to pass custom method functions. Comments are welcome.

time. This is especially useful in order to look at native contacts during
an equilibrium simulation where one can also look at the average matrix of
native contacts (see :meth:`ContactAnalysis1.plot_qavg`).
2. *Soft Cut*: Tha atom pair *i* and *j* is assigned based on a soft potential
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The

@richardjgowers
Copy link
Member

Looks good to me apart from a few doc comments, just remember to make a new issue for PBC for this as it's a nice starter issue for someone.

This is a list of all the changes done in this commit

- PEP8 changes
- allow arbitrary method to be passed to Contacts
- contact_matrix is a separate function
- simplifies Contacts class, cut everything non essential
- update docs
- removed plot functions
- analysis saving is done with extra function call
- deprecate ContactAnalysis and ContactAnalysis1
- Use python3 division in module
- add explicit q1q2 analysis
This updates all the old test with some refactoring. Each function gets
its own test now as well. According to coverage all code that isn't
deprecated is covered to 100%
@kain88-de kain88-de force-pushed the contact-improvements branch from 2612ae4 to 0ebbbe6 Compare April 16, 2016 18:33
@kain88-de
Copy link
Member Author

OK on my side this is done now. The docs are updated for Contacts and there are more tests, the examples have been added as tests as well. For the changelog I have updated the message from @jandom.

@jandom any comments on the changes I made?

@kain88-de kain88-de mentioned this pull request Apr 18, 2016
@orbeckst
Copy link
Member

This seems good to go. One last ping @jandom but if there are not other comments over the next day we can merge.

@kain88-de did you raise an issue for the PBC specific stuff and referenced this PR (just to have a trail of breadcrumbs)?

@kain88-de
Copy link
Member Author

Not yet I will do once this is merged.

@orbeckst orbeckst assigned orbeckst and unassigned richardjgowers Apr 22, 2016
@orbeckst orbeckst merged commit 0ebbbe6 into develop Apr 22, 2016
@orbeckst orbeckst deleted the contact-improvements branch April 22, 2016 06:23
@orbeckst orbeckst changed the title WIP: Contact improvements Contact improvements Apr 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants