Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pbc conditions to in mapping coordinates to grid. #36

Closed
2 tasks
ojeda-e opened this issue Jun 25, 2021 · 17 comments · Fixed by #48
Closed
2 tasks

Add pbc conditions to in mapping coordinates to grid. #36

ojeda-e opened this issue Jun 25, 2021 · 17 comments · Fixed by #48
Assignees
Labels
refactoring code rewrites without functional changes testing testing framework

Comments

@ojeda-e
Copy link
Member

ojeda-e commented Jun 25, 2021

Periodic Boundary Conditions (PBC) are not applied when mapping coordinates to grid (grid_map).

To fix this issue:

  • Add PBC conditions to map coordinates.
  • Add test using dummy coordinates to map values that exceed by defect or excess the boundaries of the grid.

For example:

  • System with 9 beads with coordinates of x -1, 0, 1 and y -1, 0 ,1
  • System with 16 beads with coordinates of x -2, -1, 0, 1, and y -2, -1, 0 ,1, 2.
@ojeda-e ojeda-e self-assigned this Jun 25, 2021
@ojeda-e ojeda-e added refactoring code rewrites without functional changes testing testing framework labels Jun 25, 2021
@richardjgowers
Copy link
Member

Try and avoid implementing this yourself, check out lib.distances.apply_PBC

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 12, 2021

I was considering to apply PBC is in AnalysisBase, in __init__:

        # Apply PBC conditions
        if self.pbc == True:
            self.ag.wrap()
        else:
            warnings.warn(" `PBC == False` may result in inaccurate calculation "
                          "of membrane curvature. Surfaces will be derived from "
                          "a reduced number of atoms.")

I think this is a better approach than applying it in _single_frame or in the get_z_surface function (but I may be wrong). If you are ok with this approach, I can add it with respective tests to the current PR #41 and this issue may get fixed.

What do you think @IAlibay, @lilyminium, @orbeckst, @fiona-naughton?

@IAlibay
Copy link
Member

IAlibay commented Jul 12, 2021

I was considering to apply PBC is in AnalysisBase, in __init__:

        # Apply PBC conditions
        if self.pbc == True:
            self.ag.wrap()
        else:
            warnings.warn(" `PBC == False` may result in inaccurate calculation "
                          "of membrane curvature. Surfaces will be derived from "
                          "a reduced number of atoms.")

I think this is a better approach than applying it in _single_frame or in the get_z_surface function (but I may be wrong). If you are ok with this approach, I can add it with respective tests to the current PR #41 and this issue may get fixed.

What do you think @IAlibay, @lilyminium, @orbeckst, @fiona-naughton?

I might be misunderstanding what you're trying to do here, but isn't ag.wrap() only applied to that timestep as it is being accessed? I.e. when you traverse through the trajectory the wrap() stops being applied right?

This was referenced Jul 13, 2021
@orbeckst
Copy link
Member

Following up on #48 (comment) : Before I dive into the code in PR #48 I'd like you to describe your algorithm for solving the PBC problem in words and equations and perhaps pseudo code. State your assumptions about your data (e.g., is the input trajectory raw, centered, rotated, ...; are molecules whole or broken across PBCs, are atoms inside the primary unit cell, ...).

This might not be a trivial problem so it's worthwhile to step back and be clear about what needs to be done. It helps you and your reviewers.

The description in words makes your intent clear and reveals potential logic gaps. Adding equations forces you to be precise about what you really need to do.

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 17, 2021

Thanks for the comments @orbeckst.
Before I go into the pseudo code, and to clarify PR #48, I realized too late that I didn't mention two fundamental points.

  1. Broadly speaking, there are two main types of systems to consider in MembraneCurvature.
    1.1 Membrane only.
    1.2 Membrane-protein.

And in 1.2, broadly speaking, we can have two possible scenarios, which depend on how the trajectory was performed.
1.2.1 Protein with positions restraints. The protein does not move, does not rotate.
1.2.2 Protein with no position restraints. The protein translates, rotates, etc.

In both cases, 1.2.1 and 1.2.2, although it depends on the choice of n_bins, undefined values will appear, as described in #45

  1. Since MembraneCurvature derives surfaces from AtomGroup, the atoms there included shouldn't be broken. We can fix it with a preprocessing. For example, in gromacs one would process the trajectory with trjconv -pbc whole.

Now, depending on the type of system, and the purpose of the calculation. If the interest is to asses what is the curvature induced by the protein, the most difficult scenario to deal with is 1.2.2, because it would require additional preprocessing.
We need to rotate and center the protein. For example, in gromacs, the trajectory would be processed with trjconv -pbc whole + trjconv fit -rot+trans.
As a consequence, we will have a system that rotates or "accomodates" around the membrane. I may have not enough expertise, but in 100% of the systems I have worked on in the 1.2.2 category, the resulting system is a rhomboid. And I may be mistaking here, but then in this situation, wrapping atoms wouldn't make sense in the calculation because it must be taken into account in the preprocessing part.

I didn't mean #48 was the complete solution, it was intended to cover the type of PBC conditions for systems of the type 1.1, and 2.2.1.

I have been thinking about the best way to address this limitation.
One option is, maybe by adding to __init__ a boolean protein_fit attribute that can takes the input of preprocessed trajectories of 2.2.1 and treats the surface accordingly.

For example,

def __init__(self, universe, select='all',
                 n_x_bins=100, n_y_bins=100,
                 x_range=None,
                 y_range=None,
                 protein_fit=False 
                 pbc=True, **kwargs):

and then do the respective calculation accordingly. Then the treatment wouldn't have wrapped coordinates.
I will post the pseudo code in a different comment

@orbeckst
Copy link
Member

From the above it wasn't clear to me if you want to have

  1. all atoms inside the primary unitcell (ag.wrap()), OR
  2. all lipid molecules (fragments) whole (ag.unwrap(compound="fragment") if I remember correctly) ?

Either way relies on the PBC code, which cannot deal with rotation-fitted systems. (The problem is not the triclinic shape of the cell — our PBC routines can handle it (it was A LOT of work for @richardjgowers and others to get that right) but rather that the unitcell representation is not rotated with the rotational fit.)

It wasn't clear to me that PR #48 would only deal with a subset of scenarios, namely 1.1 and 1.2.1 (you might want to make an unwrap step optional because it is expensive and if you already have a properly processed trajectory; I think wrap is fairly cheap). A fully processed 1.2.2 (membrane+protein unwrapped (or wrapped?) and rotational fitted is also fine) --- you just use the trajectory as is. You'll get regions that are not covered and your membrane around the region is best cut out as a circle (as you showed us in your presentation) but that's ok. This is how we typically deal with volumetric densities.

I am throwing the following out for discussion (and I might have missed something so please correct me/explain): I know that we stressed the need for dealing with PBC. But looking at your options here perhaps all you can really do is ensure that your input trajectory is already correctly processed: Either using external tools like gmx trjconv -pbc whole -ur compact -c; gmx trjconv -fit rot+transxy or using MDAnalysis on-the-fly transformations (although on-the-fly transformations cannot (yet) make -ur compact boxes). You could leave it to the user to provide properly preprocessed trajectories (at least in the initial version) and then later add code for doing the preprocessing.

@lilyminium
Copy link
Member

lilyminium commented Jul 18, 2021 via email

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 19, 2021

A quick answer to @orbeckst, what I was thinking was to have all the atoms in the primary unit cell, so using ag.wrap().
Now, let me give more details about the PBC I was considering. I'll go by parts.

Recap from the previous conversation:
We can have three possible scenarios when calculating curvature:
1.1 Membrane only.
1.2 Membrane-protein, which can be
1.2.1 Protein with positions restraints. The protein does not move, does not rotate.
1.2.2 Protein with no position restraints. The protein translates, rotates, etc.

image

For the three possible outcomes, I will start with the two "trivial" ones. 1.1 and 1.2.1. When passing a Universe with a trajectory that has only lipids or lipids with a fixed protein, what we need to guarantee is PBC. Meaning, wrapping the coordinates in order to get those "extra" atoms in the same primary unit cell, as it is shown in the figure:

image

In this way I guarantee two things:

  • I have more points to provide as reference to derive the surface. This will strongly improve sampling.
  • Empty places in the raw trajectory will be populated, and therefore we will avoid np.nans. This means our gradients won't have lots of nans.
  • Additionally, by adding this PBC, we can address one of the points that were highlighted earlier in the project about how do we do when the membrane is too close to the edge of the simulation box, that we would eventually have the atoms of references split in lower/upper side of the simulation box. (Raised by @lilyminium, suggesting to have negative coordinates in each dimension to run tests in the now called get_z_surface Add test for the generation of z_ref (core_fast_leaflet) on toy system #22 )

In summary, what we need is to wrap coordinate to run MembraneCurvature in the 1.1 and 1.2.1 cases.

  • Then, how to apply PBC?
    We use ag.wrap() in base.py. When we initialize, to check if PBC was called "True", and then in _single_frame. Which was as the code submitted in PR Add coordinate wrapping  #48.

In the scenario in which I am a very distracted user, and I preprocess the trajectory with PBC instead of the raw trajectory, applying coordinate wrapping won't hurt. Unless I am missing a point here, applying ag.wrap() won't hurt.

Now, let's consider the general landscape and skip the details: what we need is either apply PBC / ag.wrap, OR further processing/calculations.
However, and in the way I think and conceptualize the problem is: How do we know when to use either way? How do we know that we either apply PBC and that suffices to run MembraneCurvature, or either do we do something different?
That first limitation, and again, as I see the problems, goes like this: "hey user, let me know what type of system I am going to help calculating curvature".

So we can initialize knowing what we can use as Universe. I haven't come to a key name for that attribute, but let's say something like protein_fit=False. Then we know that under the condition of having a system like 1.1 or 1.2.1, we don't perform more than coordinate wrapping to run MembraneCurvature().

__init__(self, universe, select='all',
                 n_x_bins=100, n_y_bins=100,
                 x_range=None,
                 y_range=None,
                 protein_fit=False  # alternatively, center_protein=False, or similar.
                 pbc=True, **kwargs):

So far so good? @orbeckst @richardjgowers @IAlibay @lilyminium @fiona-naughton
I would like to know we are on the same page, at least for these two "trivial" cases.

I will post a different comment for the treatment of case 1.2.1.

@orbeckst
Copy link
Member

orbeckst commented Jul 19, 2021 via email

@ojeda-e
Copy link
Member Author

ojeda-e commented Jul 19, 2021

Now, in more detail, let's see the case of systems 1.2.2
Two main points to consider:

  • In this type of system, the scientific question behind this type of analysis is "What is the curvature induced by the protein?"
  • While the protein is diffusing, the principal axes of the protein change in every frame, as the sequence below shows.

image

If we pass the trajectory by applying wrapping coordinates only, then the question stated above can't be answered because the protein diffuses and therefore the output obtained from MembraneCurvature would be meaningless. Hence, we need to treat this system in a particular fashion.
Again, in the way I see the problem, there are two options.
image

I will briefly describe each one, and highlight pros and cons:

- Solution No. 1

  • Trajectory needs preprocessing. Example, using gromacs,
    gmx trjconv -pbc whole -ur compact -c, to center the protein
    gmx trjconv -fit rot+transxy, to move the rest of the system to get a trajectory with the protein as a reference.
    Important to note, since PBC are applied in Step 1, pbc=False should be passed in __init__.
    -The resulting trajectory would be passed to calculate MembraneCurvature.
  • A final clipping would be also needed (step 3 in the figure below).
    To clarify, clipping is needed only in the very last step for plotting. I just have no handy figure to illustrate.
    image

I thought of additional improvements to this approach in order to avoid clipping and, instead, adding periodic images around the original box, but realized things can get very complicated because it would be box-dependent etc. So I won't elaborate on this unless I am asked to.

- Solution No. 2

  • Raw trajectory is passed to init.
  • We have to pass PBC true in order to get all the atoms in the same primary unit cell.
  • Apply on-the-fly transformations to rotate and translate the atoms of reference with respect to the protein.
  • We can use the principal axes of the protein (see the first figure) to identify how to rotate the system.
  • Finally, we rotate and translate the lipids in every frame of the trajectory, with respect to the protein.

So it would look something like this

def _on_the_fly(self):
        self.protein_reference =  #  we would need a reference provided by the user too
        self.ag.wrap() # wrap coordinates  
        self.center_in_box(self.protein_reference) # center protein
        self.translate(self.ag)  # translate 
        self.rotate.rotateby(angle_from_protein_reference, direction = principal_axes_of_protein_reference, ag=self.ag)
  • Pros:
    Solution 1: I would think that the user knows what they are doing when preprocessing on gromacs. It would force the user to understand the type of trajectory that is passing (or I would expect so).
    Solution 2: Completely independent of third parties for preprocessing. Full MDA dependent.

Cons:
Solution 1: Clipping when plotting is mandatory. Further improvement is not straightforward (I didn't elaborate on this).
Solution 2: I suspect the beginner user will be tempted to pass the raw trajectory and just kind of "boom! let's see what does this gives me as an output" instead of caring about the calculation behind it. (But again, that my perception and also questionable.).

Edit: This is my view of the problem. The first option supports my laziness and makes the user accountable for what is giving as input. The second demands a little bit more work and would fully rely on MDAnalysis.

Any objections here to any of the solutions suggested? @orbeckst @lilyminium @IAlibay @fiona-naughton @richardjgowers
Probably I am missing something, mostly due to the fact I've never worked with on_the_fly transformations on MDAnalysis, so suggestions are welcome.

@orbeckst
Copy link
Member

Thanks for the detailed explanation — I'd say you have your next blog post already written, including nice illustrations.

I don't see your two solutions as exclusive. I'd initially work with solution 1 (make the user input the preprocessed trajectory) and work on the core parts of your problem. Once that is done, go back and make life easier for users (add solution 2).

@orbeckst
Copy link
Member

orbeckst commented Jul 19, 2021

More detailed comments

  • We can discuss your ideas to avoid clipping on the next call. My hunch is that this is ultimately related to the problem that we have been mentioning about tracking rotations to the unit cell.
  • I don't understand why you'd need the principal axes for anything? Why would you not use the rotation/translation fit? Then the protein will be rotated to the frame of reference provided by the reference structure, which is by definition fixed. (Also, in my experience at least ( in particular in some of @fiona-naughton 's work), the principal components are pretty floppy, especially the ones in the X-Y plane, and can easily change so they are not particularly robust for orienting molecules.)
  • I'd also try OTF-transformations. In principle, they are a very flexible, powerful and unique way to work with MDAnalysis. The User Guide: On the fly transformations with the link to the blog post is a good starting point to learn more.
  • (EDIT) Even with the normal rotation and PBC wrapping by the OTF transformation, you will still want to clip the circular region in the figure: that's not any different from pre-processing.
  • (EDIT) The only way that I see for avoiding the clipping is keeping track of the rot/fit transformation and then repacking the system into a new unitcell in the fixed frame of the reference structure. That was, more or less, my suggestion for another GSOC project...

@lilyminium
Copy link
Member

lilyminium commented Jul 19, 2021

Just some feedback on your solutions.

Meaning, wrapping the coordinates in order to get those "extra" atoms in the same primary unit cell, as it is shown in the figure:

If the trajectory is not unwrapped (as is assumed in 1.1 and 1.2.1), I don't think this will do anything. I think you need to consider the edge rows to be continuous with each other, so you don't end up with np.nans in the curvature arrays as is currently implemented. That's what I would consider treating PBC.

Additionally, by adding this PBC, we can address one of the points that were highlighted earlier in the project about how do we do when the membrane is too close to the edge of the simulation box, that we would eventually have the atoms of references split in lower/upper side of the simulation box.

Simply wrapping coordinates on the z-axis won't fix this in the case that the membrane is split between the bottom and the top of the box. In fact, unwrapping would be a better way to deal with it. Can we unwrap only along certain axes @richardjgowers @orbeckst?

  • Then, how to apply PBC?
    We use ag.wrap() in base.py.

I don't really think using wrap() will treat PBC. I'm interpreting the PBC problem on the x-y plane to be that you have np.nans on your edges because you only consider the minimum image, when the system is actually continuous. On the z-axis, that the membrane might split over the edge and end up with really weird curvature values. Are we talking about the same thing?

Completely independent of third parties for preprocessing. Full MDA dependent.

I really like this perk of the second solution -- users do often just treat analysis packages as magic boxes. However, I would think that clipping is also necessary for the second solution, as you're just doing the same thing as solution 1 but within MDA.

The only way that I see for avoiding the clipping is keeping track of the rot/fit transformation and then repacking the system into a new unitcell in the fixed frame of the reference structure. That was, more or less, my suggestion for another GSOC project...

In #36 (comment) I somewhat briefly proposed something similar that doesn't need the new implementation of the box -- get the rot+trans matrices of every frame, tile it around the defined center (probably "protein") and shift the surface generated in derive_surface.

@lilyminium
Copy link
Member

PS I love your pictures, and flow charts, very clear!

@orbeckst
Copy link
Member

Thank you for reminding me about the issues with the gradient, @lilyminium .

@ojeda-e do the "trivial" cases include NPT simulations where the box size fluctuates or are they only NVT?

I'd say that NVT is the only case where the "trivial" case is actually trivial and wrapping atoms into the primary unitcell will be sufficient. Then you get a surface that is well defined on all grid points (except protein) and you can easily calculate a gradient under PBC by wrapping the finite difference around the edges.

In NPT it's already a question how the fluctuating box is handled at the edges (even if you don't have to keep track of rotations) so you'll already get NaNs. If you were to compute gradients for each frame then that would work, except that these gradients would be awfully noisy. If you compute them from the average surface then the surface is not guaranteed to be PBC-continuous (I think) and hence the gradients are not well defined at the edges.

It seems that there are whole lot of different considerations to be made. What is the smallest "unit of problem" here? Can you identify it, work on it, and simply note any other points as a current limitation? Then work on reducing the known limitations.

@orbeckst
Copy link
Member

I think @lilyminium 's #36 (comment) idea to manually do the rot+trans fit to obtain the rot/trans matrix M and then apply it to the surface could work, at least for NVT where the histogramming grid stays the same. You also might need an interpolating function when you rotate the discrete grid or figure out an algorithm to rebin correctly (something like the algorithms used for rotating pixel images). In general (NPT), I think, you might have to rotate the system and replicate the coordinates of the lipids and then histogram the PBC-extended coordinates on the fixed grid. That would work in all cases.

@lilyminium
Copy link
Member

lilyminium commented Jul 19, 2021

Sorry, yes, that sentence was very vague and "it" was doing a lot of work. A naive step breakdown of what I was thinking could look something like:

  1. Define a grid around the central protein
    • this could be user provided
    • or default to half the box-length of the first frame
    • or default to half the minimum box-length of the entire trajectory
  2. for each frame, calculate the rotation + translation matrices needed to fit around the protein rotationally + translationally (which MDAnalysis already has tools for)
  3. tile the coordinates of the lipid atoms into a 3x3 grid as below

Screen Shot 2021-07-19 at 12 49 03 pm

3b. optionally perform singular value decomposition to get the plane of best fit for each frame, so any protein tilt doesn't make the whole membrane slant across the z-axis (edit: this should come before the tiling, it scales horribly edit: in fact, performing the SVD on the surface of averaged z-coordinates would be better -- this could be the last step. edit: or a much simpler way could be to make sure that the edge borders of the averaged z-surface grid are the same.)
4. apply the transformation matrices to these coordinates
5. "clip" by binning into the original grid in step 1 around the protein

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring code rewrites without functional changes testing testing framework
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants