-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pbc conditions to in mapping coordinates to grid. #36
Comments
Try and avoid implementing this yourself, check out lib.distances.apply_PBC |
I was considering to apply PBC is in
I think this is a better approach than applying it in What do you think @IAlibay, @lilyminium, @orbeckst, @fiona-naughton? |
I might be misunderstanding what you're trying to do here, but isn't |
Following up on #48 (comment) : Before I dive into the code in PR #48 I'd like you to describe your algorithm for solving the PBC problem in words and equations and perhaps pseudo code. State your assumptions about your data (e.g., is the input trajectory raw, centered, rotated, ...; are molecules whole or broken across PBCs, are atoms inside the primary unit cell, ...). This might not be a trivial problem so it's worthwhile to step back and be clear about what needs to be done. It helps you and your reviewers. The description in words makes your intent clear and reveals potential logic gaps. Adding equations forces you to be precise about what you really need to do. |
Thanks for the comments @orbeckst.
And in 1.2, broadly speaking, we can have two possible scenarios, which depend on how the trajectory was performed. In both cases, 1.2.1 and 1.2.2, although it depends on the choice of
Now, depending on the type of system, and the purpose of the calculation. If the interest is to asses what is the curvature induced by the protein, the most difficult scenario to deal with is 1.2.2, because it would require additional preprocessing. I didn't mean #48 was the complete solution, it was intended to cover the type of PBC conditions for systems of the type 1.1, and 2.2.1. I have been thinking about the best way to address this limitation. For example,
and then do the respective calculation accordingly. Then the treatment wouldn't have wrapped coordinates. |
From the above it wasn't clear to me if you want to have
Either way relies on the PBC code, which cannot deal with rotation-fitted systems. (The problem is not the triclinic shape of the cell — our PBC routines can handle it (it was A LOT of work for @richardjgowers and others to get that right) but rather that the unitcell representation is not rotated with the rotational fit.) It wasn't clear to me that PR #48 would only deal with a subset of scenarios, namely 1.1 and 1.2.1 (you might want to make an unwrap step optional because it is expensive and if you already have a properly processed trajectory; I think wrap is fairly cheap). A fully processed 1.2.2 (membrane+protein unwrapped (or wrapped?) and rotational fitted is also fine) --- you just use the trajectory as is. You'll get regions that are not covered and your membrane around the region is best cut out as a circle (as you showed us in your presentation) but that's ok. This is how we typically deal with volumetric densities. I am throwing the following out for discussion (and I might have missed something so please correct me/explain): I know that we stressed the need for dealing with PBC. But looking at your options here perhaps all you can really do is ensure that your input trajectory is already correctly processed: Either using external tools like |
I know this was not in the original scope, but Irfan and I were chatting about it. What about _not_ preprocessing the original trajectory for rotation and translation but instead defining an atom group (typically the protein) to be the center? Then for each frame, you could obtain the rotation and translation matrix needed to fit the surface to the reference (typically the first frame). Because you could define the protein as the center and take the PBC-tiledmembrane around it, this would eliminate the need for nans.
I don’t think that the molecules _must_ be made whole, as firstly you are averaging the z coordinates, and secondly the typical atom selection I’ve seen in your examples and tests has been one bead anyway.
Cheers,
Lily
… On Jul 17, 2021, at 3:14 PM, Oliver Beckstein ***@***.***> wrote:
From the above it wasn't clear to me if you want to have
all atoms inside the primary unitcell (ag.wrap()), OR
all lipid molecules (fragments) whole (ag.unwrap(compound="fragment") if I remember correctly) ?
Either way relies on the PBC code, which cannot deal with rotation-fitted systems. (The problem is not the triclinic shape of the cell — our PBC routines can handle it (it was A LOT of work for @richardjgowers and others to get that right) but rather that the unitcell representation is not rotated with the rotational fit.)
It wasn't clear to me that PR #48 would only deal with a subset of scenarios, namely 1.1 and 1.2.1 (you might want to make an unwrap step optional because it is expensive and if you already have a properly processed trajectory; I think wrap is fairly cheap). A fully processed 1.2.2 (membrane+protein unwrapped (or wrapped?) and rotational fitted is also fine) --- you just use the trajectory as is. You'll get regions that are not covered and your membrane around the region is best cut out as a circle (as you showed us in your presentation) but that's ok. This is how we typically deal with volumetric densities.
I am throwing the following out for discussion (and I might have missed something so please correct me/explain): I know that we stressed the need for dealing with PBC. But looking at your options here perhaps all you can really do is ensure that your input trajectory is already correctly processed: Either using external tools like gmx trjconv -pbc whole -ur compact -c; gmx trjconv -fit rot+transxy or using MDAnalysis on-the-fly transformations (although on-the-fly transformations cannot (yet) make -ur compact boxes). You could leave it to the user to provide properly preprocessed trajectories (at least in the initial version) and then later add code for doing the preprocessing.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
A quick answer to @orbeckst, what I was thinking was to have all the atoms in the primary unit cell, so using Recap from the previous conversation: For the three possible outcomes, I will start with the two "trivial" ones. 1.1 and 1.2.1. When passing a Universe with a trajectory that has only lipids or lipids with a fixed protein, what we need to guarantee is PBC. Meaning, wrapping the coordinates in order to get those "extra" atoms in the same primary unit cell, as it is shown in the figure: In this way I guarantee two things:
In summary, what we need is to wrap coordinate to run MembraneCurvature in the 1.1 and 1.2.1 cases.
In the scenario in which I am a very distracted user, and I preprocess the trajectory with PBC instead of the raw trajectory, applying coordinate wrapping won't hurt. Unless I am missing a point here, applying Now, let's consider the general landscape and skip the details: what we need is either apply PBC / So we can initialize knowing what we can use as Universe. I haven't come to a key name for that attribute, but let's say something like
So far so good? @orbeckst @richardjgowers @IAlibay @lilyminium @fiona-naughton I will post a different comment for the treatment of case 1.2.1. |
Your treatment of the “trivial” cases makes sense to me.
Applying wrapping will only be a problem if it’s a rotationally-fit trajectory so as you say, the problem is how to know about that case. For a start it’s perfectly acceptable to put the onus onto the user to provide correct input and correctly declare it.
… Am 7/18/21 um 23:35 schrieb Estefania Barreto-Ojeda ***@***.***>:
A quick answer to @orbeckst, what I was thinking was to have all the atoms in the primary unit cell, so using ag.wrap().
Now, let me give more details about the PBC I was considering. I'll go by parts.
Recap from the previous conversation:
We can have three possible scenarios when calculating curvature:
1.1 Membrane only.
1.2 Membrane-protein, which can be
1.2.1 Protein with positions restraints. The protein does not move, does not rotate.
1.2.2 Protein with no position restraints. The protein translates, rotates, etc.
For the three possible outcomes, I will start with the two "trivial" ones. 1.1 and 1.2.1. When passing a Universe with a trajectory that has only lipids or lipids with a fixed protein, what we need to guarantee is PBC. Meaning, wrapping the coordinates in order to get those "extra" atoms in the same primary unit cell, as it is shown in the figure:
In this way I guarantee two things:
I have more points to provide as reference to derive the surface. This will strongly improve sampling.
Empty places in the raw trajectory will be populated, and therefore we will avoid np.nans. This means our gradients won't have lots of nans.
Additionally, by adding this PBC, we can address one of the points that were highlighted earlier in the project about how do we do when the membrane is too close to the edge of the simulation box, that we would eventually have the atoms of references split in lower/upper side of the simulation box. (Raised by @lilyminium, suggesting to have negative coordinates in each dimension to run tests in the now called get_z_surface #22 )
In summary, what we need is to wrap coordinate to run MembraneCurvature in the 1.1 and 1.2.1 cases.
Then, how to apply PBC?
We use ag.wrap() in base.py. When we initialize, to check if PBC was called "True", and then in _single_frame. Which was as the code submitted in PR #48.
In the scenario in which I am a very distracted user, and I preprocess the trajectory with PBC instead of the raw trajectory, applying coordinate wrapping won't hurt. Unless I am missing a point here, applying ag.wrap() won't hurt.
Now, let's consider the general landscape and skip the details: what we need is either apply PBC / ag.wrap, OR further processing/calculations.
However, and in the way I think and conceptualize the problem is: How do we know when to use either way? How do we know that we either apply PBC and that suffices to run MembraneCurvature, or either do we do something different?
That first limitation, and again, as I see the problems, goes like this: "hey user, let me know what type of system I am going to help calculating curvature".
So we can initialize knowing what we can use as Universe. I haven't come to a key name for that attribute, but let's say something like protein_fit=False. Then we know that under the condition of having a system like 1.1 or 1.2.1, we don't perform more than coordinate wrapping to run MembraneCurvature().
__init__(self, universe, select='all',
n_x_bins=100, n_y_bins=100,
x_range=None,
y_range=None,
protein_fit=False # alternatively, center_protein=False, or similar.
pbc=True, **kwargs):
So far so good? @orbeckst @richardjgowers @IAlibay @lilyminium @fiona-naughton
I would like to know we are on the same page, at least for these two "trivial" cases.
I will post a different comment for the treatment of case 1.2.1.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Now, in more detail, let's see the case of systems 1.2.2
If we pass the trajectory by applying wrapping coordinates only, then the question stated above can't be answered because the protein diffuses and therefore the output obtained from MembraneCurvature would be meaningless. Hence, we need to treat this system in a particular fashion. I will briefly describe each one, and highlight pros and cons: - Solution No. 1
I thought of additional improvements to this approach in order to avoid clipping and, instead, adding periodic images around the original box, but realized things can get very complicated because it would be box-dependent etc. So I won't elaborate on this unless I am asked to. - Solution No. 2
So it would look something like this
Cons: Edit: This is my view of the problem. The first option supports my laziness and makes the user accountable for what is giving as input. The second demands a little bit more work and would fully rely on MDAnalysis. Any objections here to any of the solutions suggested? @orbeckst @lilyminium @IAlibay @fiona-naughton @richardjgowers |
Thanks for the detailed explanation — I'd say you have your next blog post already written, including nice illustrations. I don't see your two solutions as exclusive. I'd initially work with solution 1 (make the user input the preprocessed trajectory) and work on the core parts of your problem. Once that is done, go back and make life easier for users (add solution 2). |
More detailed comments
|
Just some feedback on your solutions.
If the trajectory is not unwrapped (as is assumed in 1.1 and 1.2.1), I don't think this will do anything. I think you need to consider the edge rows to be continuous with each other, so you don't end up with
Simply wrapping coordinates on the z-axis won't fix this in the case that the membrane is split between the bottom and the top of the box. In fact, unwrapping would be a better way to deal with it. Can we unwrap only along certain axes @richardjgowers @orbeckst?
I don't really think using
I really like this perk of the second solution -- users do often just treat analysis packages as magic boxes. However, I would think that clipping is also necessary for the second solution, as you're just doing the same thing as solution 1 but within MDA.
In #36 (comment) I somewhat briefly proposed something similar that doesn't need the new implementation of the box -- get the rot+trans matrices of every frame, tile it around the defined center (probably "protein") and shift the surface generated in |
PS I love your pictures, and flow charts, very clear! |
Thank you for reminding me about the issues with the gradient, @lilyminium . @ojeda-e do the "trivial" cases include NPT simulations where the box size fluctuates or are they only NVT? I'd say that NVT is the only case where the "trivial" case is actually trivial and wrapping atoms into the primary unitcell will be sufficient. Then you get a surface that is well defined on all grid points (except protein) and you can easily calculate a gradient under PBC by wrapping the finite difference around the edges. In NPT it's already a question how the fluctuating box is handled at the edges (even if you don't have to keep track of rotations) so you'll already get NaNs. If you were to compute gradients for each frame then that would work, except that these gradients would be awfully noisy. If you compute them from the average surface then the surface is not guaranteed to be PBC-continuous (I think) and hence the gradients are not well defined at the edges. It seems that there are whole lot of different considerations to be made. What is the smallest "unit of problem" here? Can you identify it, work on it, and simply note any other points as a current limitation? Then work on reducing the known limitations. |
I think @lilyminium 's #36 (comment) idea to manually do the rot+trans fit to obtain the rot/trans matrix M and then apply it to the surface could work, at least for NVT where the histogramming grid stays the same. You also might need an interpolating function when you rotate the discrete grid or figure out an algorithm to rebin correctly (something like the algorithms used for rotating pixel images). In general (NPT), I think, you might have to rotate the system and replicate the coordinates of the lipids and then histogram the PBC-extended coordinates on the fixed grid. That would work in all cases. |
Sorry, yes, that sentence was very vague and "it" was doing a lot of work. A naive step breakdown of what I was thinking could look something like:
3b. optionally perform singular value decomposition to get the plane of best fit for each frame, so any protein tilt doesn't make the whole membrane slant across the z-axis (edit: |
Periodic Boundary Conditions (PBC) are not applied when mapping coordinates to grid (
grid_map
).To fix this issue:
For example:
x
-1, 0, 1 andy
-1, 0 ,1x
-2, -1, 0, 1, andy
-2, -1, 0 ,1, 2.The text was updated successfully, but these errors were encountered: