Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor make_downshift_arrays and enhance the performance of topology construction #3724

Merged

Conversation

yuxuanzhuang
Copy link
Contributor

@yuxuanzhuang yuxuanzhuang commented Jun 19, 2022

Fixes #3721

Changes made in this Pull Request:

  • refactor make_downshift_arrays
  • lazy building Universe._topology.tt. _RA, _SR during construction and serialization.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jun 19, 2022

Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-25 08:23:04 UTC

@codecov
Copy link

codecov bot commented Jun 19, 2022

Codecov Report

Merging #3724 (267dcb6) into develop (886cb60) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 267dcb6 differs from pull request most recent head b6c42c6. Consider uploading reports for the commit b6c42c6 to get more accurate results

@@             Coverage Diff             @@
##           develop    #3724      +/-   ##
===========================================
- Coverage    94.31%   94.30%   -0.02%     
===========================================
  Files          192      192              
  Lines        24757    24713      -44     
  Branches      3340     3328      -12     
===========================================
- Hits         23350    23306      -44     
  Misses        1359     1359              
  Partials        48       48              
Impacted Files Coverage Δ
package/MDAnalysis/core/topology.py 99.44% <100.00%> (-0.56%) ⬇️
package/MDAnalysis/coordinates/PDB.py 94.85% <0.00%> (-0.28%) ⬇️
package/MDAnalysis/lib/util.py 90.84% <0.00%> (-0.14%) ⬇️
package/MDAnalysis/lib/distances.py 98.48% <0.00%> (-0.01%) ⬇️
package/MDAnalysis/core/groups.py 98.58% <0.00%> (ø)
package/MDAnalysis/topology/PDBParser.py 100.00% <0.00%> (ø)
package/MDAnalysis/topology/ExtendedPDBParser.py 100.00% <0.00%> (ø)
package/MDAnalysis/core/topologyattrs.py 96.65% <0.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 886cb60...b6c42c6. Read the comment docs.

@yuxuanzhuang yuxuanzhuang marked this pull request as draft June 23, 2022 09:47
@yuxuanzhuang yuxuanzhuang changed the title Refactor make_downshift_arrays and enhance the performance of topology serialization (WIP) Refactor make_downshift_arrays and enhance the performance of topology construction Jun 23, 2022
@yuxuanzhuang yuxuanzhuang marked this pull request as ready for review June 23, 2022 19:19
@yuxuanzhuang
Copy link
Contributor Author

Performance Improvement

from MDAnalysis.tests.datafiles import PDB
import MDAnalysis as mda
from MDAnalysis.core.topology import (
    TransTable,
    make_downshift_arrays,
)

u = mda.Universe(PDB)
n_atoms = u._topology.tt.n_atoms
# 47681 atoms
n_residues = u._topology.tt.n_residues
# 11302 residues
n_segments = u._topology.tt.n_segments
# 1 segment
AR = u._topology.tt._AR
RS = u._topology.tt._RS

%timeit make_downshift_arrays(AR, n_residues)
# original
29.6 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# now
12.1 ms ± 73.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit TransTable(n_atoms, n_residues, n_segments, AR, RS)
# original 
29.8 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# now
17 µs ± 57.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

tt = TransTable(n_atoms, n_residues, n_segments, AR, RS)
%timeit pickle.loads(pickle.dumps(tt))
# original
29.6 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# now
69.7 µs ± 740 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @yuxuanzhuang , this looks great

@IAlibay
Copy link
Member

IAlibay commented Jul 21, 2022

@richardjgowers this is approved but unmerged, is it good to go?

@orbeckst orbeckst assigned orbeckst and richardjgowers and unassigned orbeckst Jul 25, 2022
@orbeckst
Copy link
Member

@richardjgowers I assigned the PR to you — please merge if all is good. Thanks!

@richardjgowers richardjgowers merged commit 86efe83 into MDAnalysis:develop Jul 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Universe serialization performance
5 participants