Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes chi1_selections (#4108) #4109

Merged
merged 5 commits into from
Apr 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions package/AUTHORS
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,7 @@ Chronological list of authors
- Xiaoxu Ruan
- Egor Marin
- Shaivi Malik
- Daniel J. Evans

External code
-------------
Expand Down
3 changes: 3 additions & 0 deletions package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ The rules for this file:
* 2.5.0

Fixes
* Fix chi1_selections() ignored atom names CG1 OG OG1 SG and incorrectly returned
None for amino acids CYS, ILE, SER, THR, VAL (Issue #4108)
* Fix parsing of box vector in H5MD reader (Issue #4076)
* Fix the misleading 'AtomGroup.guess_bonds()' docs and passed missing
arguments (PR #4059)
Expand All @@ -37,6 +39,7 @@ Fixes
(Issue #3336)

Enhancements
* Improved speed of chi1_selection (PR #4109)
* Add `progressbar_kwargs` parameter to `AnalysisBase.run` method, allowing to modify description, position etc of tqdm progressbars.
* Add a nojump transformation, which unwraps trajectories so that particle
paths are continuous. (Issue #3703, PR #4031)
Expand Down
11 changes: 6 additions & 5 deletions package/MDAnalysis/core/topologyattrs.py
Original file line number Diff line number Diff line change
Expand Up @@ -1233,15 +1233,16 @@ def chi1_selection(residue, n_name='N', ca_name='CA', cb_name='CB',
faster atom matching with boolean arrays.
"""
names = [n_name, ca_name, cb_name, cg_name]
ags = [residue.atoms.select_atoms(f"name {n}") for n in names]
atnames = residue.atoms.names
ags = [residue.atoms[np.in1d(atnames, n.split())] for n in names]
if any(len(ag) != 1 for ag in ags):
return None
return sum(ags)

transplants[Residue].append(('chi1_selection', chi1_selection))

def chi1_selections(residues, n_name='N', ca_name='CA', cb_name='CB',
cg_name='CG'):
cg_name='CG CG1 OG OG1 SG'):
"""Select list of AtomGroups corresponding to the chi1 sidechain dihedral
N-CA-CB-CG.

Expand All @@ -1266,13 +1267,13 @@ def chi1_selections(residues, n_name='N', ca_name='CA', cb_name='CB',
"""
results = np.array([None]*len(residues))
names = [n_name, ca_name, cb_name, cg_name]
keep = [all(sum(r.atoms.names == n) == 1 for n in names)
for r in residues]
keep = [all(sum(np.in1d(r.atoms.names, n.split())) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your updated code.

@IAlibay @richardjgowers are we ok with also updating chi1_selection() with the same solution, which avoids select_atoms()? We could do a separate PR but this seems unnecessary bureaucracy to me, unless we want a cleaner separation of work in CHANGELOG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context #4109 (comment)

I based my fix on chi1_selection, which also uses atom selection. But I agree that this is slow. I wrote an alternate version that avoids atom selection. In my testing, the newest code takes 18 us while the atom selection code takes 147 us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the CHANGELOG; I had a couple ideas:

  1. List the changes to both chi1_selections and chi1_selection in a single entry in the Fixes section. Maybe something like this: "Fix chi1_selections incorrectly returning None (Issue 4108) and chi_selection running unnecessarily slowly".
  2. List the change to chi1_selections as a Fix, and the change to chi1_selection as an enhancement. Maybe "Fix chi1_selections incorrectly returning None (Issue 4108)" in Fixes, and "Improved speed of chi1_selection" in Enhancements.

I think I favor option (2). But I don't know the history or culture behind this project's CHANGELOG, so I don't have a strong opinion. And it's definitely possible there's a better option that I haven't thought of!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) seems better to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanielJamesEvans I haven't heard any opposition to you updating chi1_selection() in this PR. I encourage you to add it to your PR. Once you've done so, please ping me so that I can review again.

Obviously, anyone else is also welcome to review and if dissenting opinions are voiced in other reviews we can address them then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I added the chi1_selection() update to my PR.

for n in names) for r in residues]
keepix = np.where(keep)[0]
residues = residues[keep]

atnames = residues.atoms.names
ags = [residues.atoms[atnames == n] for n in names]
ags = [residues.atoms[np.in1d(atnames, n.split())] for n in names]
results[keepix] = [sum(atoms) for atoms in zip(*ags)]
return list(results)

Expand Down
6 changes: 6 additions & 0 deletions testsuite/MDAnalysisTests/core/test_atomgroup.py
Original file line number Diff line number Diff line change
Expand Up @@ -874,6 +874,12 @@ def test_chi1_selections(self, resgroup):
rssel = [r.chi1_selection() for r in resgroup]
assert_equal(rgsel, rssel)

@pytest.mark.parametrize("resname", ["CYS", "ILE", "SER", "THR", "VAL"])
def test_chi1_selections_non_cg(self, resname, PSFDCD):
resgroup = PSFDCD.select_atoms(f"resname {resname}").residues
rgsel = resgroup.chi1_selections()
assert not any(sel is None for sel in rgsel)

@pytest.mark.parametrize("resname", ["CYSH", "ILE", "SER", "THR", "VAL"])
def test_chi1_selection_non_cg_gromacs(self, resname, TPR):
resgroup = TPR.select_atoms(f"resname {resname}").residues
Expand Down