Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Remove hashtable.pyx and use py3 dict for pvalue-qvalue checkup Feat: Optimize MACS speed in Python3 #335
After testing the CPU/mem usage in v2.1.4 (py2.7) and v2.2.4 (py3.7), I found that the old hashtable.pyx implementation copied from Pandas (very old version) doesn't work well in Python3. It slows down the pvalue-qvalue checkup (in CallPeakUnit.pyx) for about 40 times with the identical Cython codes. While testing on my laptop on 5million ChIP vs 5 million Control (newly added testing data in test/ folder), the getitem function in the hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but 148.6s with the same number of calls in MACS2 v2.2.4. Therefore, I fall back to the standard python dictionary implementation for pqtable checkup. It is faster than the old py2 version hashtable.pyx, but uses a bit more memory. As an example, with the new implementation (py3 dict) in the branch
feat_speed_optimization
can finish 5M reads test 20% faster than MACS2 v2.1.4, and use 15% more memory than MACS2 v2.1.4.Add 5Million reads ChIP (CTCF from ENCODE2) and Ctrl for testing MACS2 performance. Now the test.sh will output a summary of CPU and memory usage of the 5M test run.
PS: May relate to #334