occupancy and b-factors not updated when importing multi-frame PBD #3825

AntonJansen96 · 2022-09-12T16:07:01Z

MDAnalysis uses the occupancies/b-factors of the first frame for all subsequent frames when importing a multi-frame PDB.

Code to reproduce the behavior

import MDAnalysis

u   = MDAnalysis.Universe('traj.pdb')
sel = u.select_atoms("resid 178 and segid A and name HD2")

for _ in u.trajectory:
   print(sel.positions)

for _ in u.trajectory:
   print(list(sel.occupancies))

for _ in u.trajectory:
   print(list(sel.tempfactors))

Also,

grep 'HD2 ASPTA 178' traj.pdb                                
ATOM   2808  HD2 ASPTA 178      81.480  35.990  49.120  0.00  0.00           H
ATOM   2808  HD2 ASPTA 178      83.700  34.570  50.290  1.00  1.00           H
ATOM   2808  HD2 ASPTA 178      80.540  40.020  49.060  0.06  0.06           H
ATOM   2808  HD2 ASPTA 178      81.100  37.920  49.720  0.20  0.20           H
ATOM   2808  HD2 ASPTA 178      84.130  38.260  50.810  0.09  0.09           H
ATOM   2808  HD2 ASPTA 178      81.790  39.980  49.520  0.04  0.04           H

And link to traj.pdb.

Expected behavior

[[81.48 35.99 49.12]]
[[83.7  34.57 50.29]]
[[80.54 40.02 49.06]]
[[81.1  37.92 49.72]]
[[84.13 38.26 50.81]]
[[81.79 39.98 49.52]]
[0.00]
[1.00]
[0.06]
[0.20]
[0.09]
[0.04]
[0.00]
[1.00]
[0.06]
[0.20]
[0.09]
[0.04]

Actual behavior

[[81.48 35.99 49.12]]
[[83.7  34.57 50.29]]
[[80.54 40.02 49.06]]
[[81.1  37.92 49.72]]
[[84.13 38.26 50.81]]
[[81.79 39.98 49.52]]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]
[0.0]

Current version of MDAnalysis

Which version are you using? 2.3.0
Which version of Python ? 3.10.6
Which operating system? MacOS

The text was updated successfully, but these errors were encountered:

IAlibay · 2022-09-12T16:56:39Z

This is an interesting one. We treat things like the occupancy and bfactors main attributes as a "topology" things, so they don't don't get updated as you read a new frame.

However, specifically for occupancy, we do actually store that time dependent data under ts.data.

So for example if you did:

import MDAnalysis as mda

u = mda.Universe('traj.pdb')

for ts in u.trajectory:
    print(ts.data['occupancy'])

You should be able to access the time dependent occupancy data.

Adding tempfactors to this isn't a very difficult task, and would only require adding similar code bits to what we do with occupancy in this method:

mdanalysis/package/MDAnalysis/coordinates/PDB.py

Line 377 in e7ee5a4

def _read_frame(self, frame):

This offers a quick fix, but doesn't remove the fact that we have two competing attributes and asking folks to look in ts.data over the topology attribute isn't very intuitive / error prone. Hooking back the ts.data values back into the topology attribute might require a bit more work / extra thought. @orbeckst had one option in #3504 which could maybe work (#3504 (comment)) but probably changes the expected behaviour of some attributes.

AntonJansen96 · 2022-09-12T19:38:37Z

Hi, thanks for getting back to me so quickly. I wasn't aware one is able to access the time-depedent data through ts.data. This is very helpful in my particular case.

BradyAJohnston · 2023-01-02T07:23:27Z

I can second the want for this. Lots of people store frame-specific data in the b_factor column for visualisation purposes, and it would be really useful to have easy access to that value. I've just tested out the use of the occupancy column and works well (BradyAJohnston/MolecularNodes#128 (comment)) but having it available for tempfactors would be great.

IAlibay · 2023-01-02T09:26:09Z

@BradyAJohnston for arbitrary frame specific data we do have the auxiliary system, that probably would be the best place to fetch & store such data if looking to interfacing formally with the MDAnalysis API.

BradyAJohnston · 2023-01-02T11:21:29Z

@IAlibay Thanks but the use case I have I use MDAnalysis under the hood so there is no user interfacing with it. The data goes straight from topology / trajectory to 3D model inside of Blender so the arbitrary data needs to already exist inside of the trajectory (in this case it would be multi-model .pdb files usually).

IAlibay · 2023-01-02T11:24:58Z

Dynamically updating a topology is a behaviour change I doubt we could allow any time soon (it's a pretty big overhaul of how we expect things to work), is there any way users could pass in a separate file with timeseries data that could be read alongside? I.e. like a CSV, XVG, or EDR file or maybe some kind of serialised xarray input?

richardjgowers · 2023-01-02T11:37:20Z

So one way to (ab)use the current tooling would be via a Transformation... these are in some ways a callback that is triggered on each frame read, and they could have vision of an AtomGroup (even of all atoms). They are intended to modify the Timestep (positions) based on AtomGroups (i.e. unwrapping coords) but you could flip this relationship

class BFactorUpdater(TransformationBase):
    def __init__(self, ag):
        self._ag = ag

    def _transform(self, ts):
        self._ag.bfactors = ts.data['occupancy']  # assumes this is populated by the Reader, true for PDB
        return ts

u = mda.Universe()

bfactor_updater = BFactorUpdater(u.atoms)
u.trajectory.add_transformations([bfactor_updater])

that's some scrap code off the top of my head, but it should be possible along those lines

BradyAJohnston · 2023-01-02T13:21:05Z

For my use case at least, the accessing via:

for ts in univ.traj:
    ts.data['tempfactors']

With the same setup as the occupancies would work totally fine. I don't need it to be topology related, just a frame-specific data access.

There is potential for reading .csv or other formats alongside the data and adding it via index, but as it's already common practice for a lot of people to use the (now deprecated) .pdb format for storing arbitrary data in the B-factor column, I'm hoping to support that as best I can. For now at least getting them to store the data inside of the occupancy instead will achieve the same result, but having access to ts.data['tempfactor'] would be ideal.

IAlibay · 2023-01-02T14:49:53Z

 but as it's already common practice for a lot of people to use the (now deprecated) .pdb format for storing arbitrary data in the B-factor column,

Herein lies the issue. It's a non-standard use case for a deprecated format. I myself am guilty of abusing the PDB format this way so I'm happy to open a PR with the relevant fix as a temporary stopgap, but I would very much like to open the discussion about how this should be dealt with going forward.

Having these arbitrary changes to readers are both user unfriendly and end up having a reasonably large maintenance cost in the long run. I'd like to be able to offer something that suits the downstream needs that would be the canonical way we encourage folks to use things.

BradyAJohnston · 2023-01-04T00:02:53Z

I would certainly be very appreciative of any direction you wish to take, both in short-term solutions and longer-term design plans.

orbeckst added Format-PDB Component-Topology labels Sep 12, 2022

BradyAJohnston mentioned this issue Jan 2, 2023

Multi-Model b_factor currently not available BradyAJohnston/MolecularNodes#128

Open

richardjgowers added a commit that referenced this issue Jan 4, 2023

taking a swing at #3825

0b4c337

richardjgowers mentioned this issue Jan 4, 2023

taking a swing at #3825 #3988

Merged

4 tasks

richardjgowers added a commit that referenced this issue Feb 19, 2023

taking a swing at #3825

1355297

IAlibay added this to the 2.5.0 milestone Apr 28, 2023

IAlibay closed this as completed in #3988 May 25, 2023

IAlibay closed this as completed in 5ddbc85 May 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

occupancy and b-factors not updated when importing multi-frame PBD #3825

occupancy and b-factors not updated when importing multi-frame PBD #3825

AntonJansen96 commented Sep 12, 2022

IAlibay commented Sep 12, 2022

AntonJansen96 commented Sep 12, 2022

BradyAJohnston commented Jan 2, 2023 •

edited

Loading

IAlibay commented Jan 2, 2023

BradyAJohnston commented Jan 2, 2023

IAlibay commented Jan 2, 2023

richardjgowers commented Jan 2, 2023

BradyAJohnston commented Jan 2, 2023

IAlibay commented Jan 2, 2023 •

edited

Loading

BradyAJohnston commented Jan 4, 2023

occupancy and b-factors not updated when importing multi-frame PBD #3825

occupancy and b-factors not updated when importing multi-frame PBD #3825

Comments

AntonJansen96 commented Sep 12, 2022

Code to reproduce the behavior

Expected behavior

Actual behavior

Current version of MDAnalysis

IAlibay commented Sep 12, 2022

AntonJansen96 commented Sep 12, 2022

BradyAJohnston commented Jan 2, 2023 • edited Loading

IAlibay commented Jan 2, 2023

BradyAJohnston commented Jan 2, 2023

IAlibay commented Jan 2, 2023

richardjgowers commented Jan 2, 2023

BradyAJohnston commented Jan 2, 2023

IAlibay commented Jan 2, 2023 • edited Loading

BradyAJohnston commented Jan 4, 2023

BradyAJohnston commented Jan 2, 2023 •

edited

Loading

IAlibay commented Jan 2, 2023 •

edited

Loading