Feat speed optimization #338

Merged
merged 5 commits into from
Oct 31, 2019

taoliu (Contributor) commented Oct 31, 2019

  1. Remove hashtable.pyx and use a Python 3 dict for the pvalue-qvalue lookup (related issue #335: "Feat: Optimize MACS speed in Python3").

    After measuring CPU and memory usage in v2.1.4 (Python 2.7) and v2.2.4 (Python 3.7), I found that the old hashtable.pyx implementation, copied from a very old version of Pandas, performs poorly under Python 3. With identical Cython code, it makes the pvalue-qvalue lookup (in CallPeakUnit.pyx) about 40 times slower. On my laptop, testing 5 million ChIP reads against 5 million control reads (the new test data in the test/ folder), the getitem function in hashtable.pyx took 3.5 s for 37,382,037 calls in MACS2 v2.1.4, but 148.6 s for the same number of calls in MACS2 v2.2.4 (148.6 s / 3.5 s ≈ 42x). I therefore fell back to the standard Python dictionary for the pqtable lookup. It is faster than the old Python 2 hashtable.pyx but uses a bit more memory: with the new implementation (Python 3 dict) in the feat_speed_optimization branch, the 5M-read test finishes 20% faster than MACS2 v2.1.4 while using 15% more memory.

  2. Add 5-million-read ChIP (CTCF from ENCODE2) and control datasets for testing MACS2 performance. The test.sh script now outputs a summary of CPU and memory usage for the 5M test run.
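To illustrate item 1, here is a minimal sketch of a p-score to q-score table held in a plain Python dict. It assumes scores are -log10 values and that each distinct p-score carries a count of tests sharing it. The function name `build_pqtable` and this input format are hypothetical, and the q-values come from the textbook Benjamini-Hochberg procedure, which may differ in detail from what CallPeakUnit.pyx does.

```python
import math


def build_pqtable(pscore_counts):
    """Map each -log10(p) score to a -log10(q) score via Benjamini-Hochberg.

    pscore_counts: {pscore: count}, where count (positive) is how many tests,
    for example genomic positions, share that -log10(p) score.
    Hypothetical input format, for illustration only.
    """
    total = sum(pscore_counts.values())
    # Most significant first: largest -log10(p), i.e. smallest p-value.
    ordered = sorted(pscore_counts, reverse=True)
    raw = []
    rank = 0
    for p in ordered:
        rank += pscore_counts[p]  # rank of the last test that shares this score
        # -log10(p * N / k) == pscore + log10(k) - log10(N)
        raw.append(p + math.log10(rank) - math.log10(total))
    # BH monotonicity: q_(i) = min over j >= i of p_(j) * N / j. In -log10 space
    # that is a running maximum, walking from the least significant score up.
    pqtable = {}
    best = 0.0  # clamps q at 1, i.e. -log10(q) >= 0
    for p, q in zip(reversed(ordered), reversed(raw)):
        best = max(best, q)
        pqtable[p] = best
    return pqtable
```

Once the table is built, each lookup is a single `pqtable[pscore]`, an average O(1) hash-table access. For example, `build_pqtable({3.0: 1, 2.0: 1, 1.0: 8})[3.0]` is `2.0`, i.e. q = 0.01 for p = 0.001 among 10 tests.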

PS: This may be related to #334.
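For item 2, the kind of CPU and memory summary that test.sh reports can also be gathered from Python on Unix-like systems with the standard `resource` module. This is a hypothetical helper (`run_with_usage` is not part of MACS2 or its test suite) and does not reproduce what test.sh actually does. Note that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS.

```python
import resource
import subprocess
import sys
import time


def run_with_usage(cmd):
    """Run cmd to completion and return wall time, CPU time and peak RSS.

    Hypothetical helper for illustration; Unix-only because it relies on the
    standard `resource` module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time consumed by the child: user + system, as a delta.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is the peak RSS of the largest child waited for so far (not a
    # delta); kilobytes on Linux, bytes on macOS.
    return {"wall_s": wall, "cpu_s": cpu, "max_rss": after.ru_maxrss}


if __name__ == "__main__":
    usage = run_with_usage([sys.executable, "-c", "data = list(range(10**6))"])
    print(f"wall {usage['wall_s']:.2f}s  cpu {usage['cpu_s']:.2f}s  maxrss {usage['max_rss']}")
```

Wrapping the real MACS2 command in such a helper would give wall time, CPU time and peak memory for a run. Because `ru_maxrss` is not a delta, compare runs in separate processes.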

taoliu added the v2.2.5 label on Oct 31, 2019
taoliu self-assigned this on Oct 31, 2019
codecov bot commented Oct 31, 2019

Codecov Report

Merging #338 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #338   +/-   ##
=======================================
  Coverage   94.54%   94.54%           
=======================================
  Files           8        8           
  Lines         440      440           
=======================================
  Hits          416      416           
  Misses         24       24
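The report's percentages follow from the raw counts. Coverage is hits / lines, so 416 / 440 = 94.5454...%, which would round to 94.55 but is shown as 94.54. That is consistent with truncating to two decimals, which we understand to be Codecov's default rounding (precision 2, round down); treat that as an assumption. Here is a minimal sketch of the calculation (`coverage_percent` is a hypothetical name):

```python
from decimal import ROUND_DOWN, Decimal


def coverage_percent(hits, lines, precision=2):
    """Return hits / lines * 100, truncated (rounded down) to `precision` decimals.

    Rounding down is an assumption about Codecov's default behaviour.
    """
    pct = Decimal(hits) * 100 / Decimal(lines)
    return pct.quantize(Decimal(10) ** -precision, rounding=ROUND_DOWN)


hits, lines = 416, 440
print(coverage_percent(hits, lines), lines - hits)  # coverage and misses
```

The same counts also give the Misses row: 440 - 416 = 24.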

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e871a8...46ec485.

taoliu merged commit 88b322f into master on Oct 31, 2019
taoliu mentioned this pull request on Oct 31, 2019
taoliu deleted the feat_speed_optimization branch on September 6, 2024 20:04