Faster filter #215
So I don't get how it does the diff, but this branch is 3 commits ahead and 61 commits behind master. If you switch to the branch on the code page on GitHub it shows that.
Yeah, weird. I'll look into this tomorrow.
Apparently my faster code makes the tests run too slow? lol https://travis-ci.org/VDBWRAIR/ngs_mapper/builds/118045722#L2242
edit: Indeed, my "faster" code was written in a delusional state, apparently. It's excessively slow, and not lazy at all because of the use of
dropRead = hasN or indexIsBad
total += 1
if not dropRead:
    keptReads = chain([read], keptReads)
It's possible this use of itertools.chain is slow.
edit: it's very slow.
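For anyone reading this later, here is a rough sketch of why that pattern hurts. Each chain([read], keptReads) call wraps the previous iterator in one more layer, so reaching the oldest kept read has to pass through one layer per kept read, and the reads come back in reverse order. Nothing can be consumed until the whole input has been looped over. A plain generator avoids all three problems. The function names and the is_bad predicate below are made up for illustration and are not the code in this branch.

```python
from itertools import chain


def keep_reads_chain(reads, is_bad):
    """Mimics the snippet above: prepend each kept read with chain()."""
    kept = iter(())
    total = 0
    for read in reads:
        total += 1
        if not is_bad(read):
            # One more nesting level per kept read.
            kept = chain([read], kept)
    return kept, total


def keep_reads_lazy(reads, is_bad):
    """Generator alternative: yields kept reads in input order, one at a time."""
    for read in reads:
        if not is_bad(read):
            yield read


if __name__ == "__main__":
    reads = ["ACGT", "ANGT", "GGCC"]
    print(list(keep_reads_lazy(reads, lambda r: "N" in r)))
```

The generator version does not track total; if the count is still needed, it would have to be kept somewhere else, for example in a small counter object updated inside the loop.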
Assuming the second build passes, this is ready for review. If not, I'll fix it up Monday.
Weird unrelated test failure: https://travis-ci.org/VDBWRAIR/ngs_mapper/builds/119042228#L2233-L2242 Note that slow_tests pass anyway.
Wow, that failure was literally a fluke: https://travis-ci.org/VDBWRAIR/ngs_mapper/builds/119050624#L2233-L2235
@@ -104,14 +104,25 @@ def is_valid(fn):
msg= "Skipped files %s that were not within chosen platforms %s" % ( plat_files, platforms)
if not files:
    raise ValueError("No fastq or sff files found in directory %s" % readsdir + '\n' + msg)
if parallel:
if threads > 1:
This looks like it would work fine, but I'm wondering why you took this approach instead of just using pool = multiprocessing.Pool(threads)?
I didn't know about that option
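For reference, a minimal sketch of what the multiprocessing.Pool(threads) suggestion could look like. Here filter_file and run_filters are hypothetical stand-ins, not functions from ngs_filter, and the sketch uses the Python 3 context-manager form of Pool.

```python
import multiprocessing


def filter_file(path):
    """Hypothetical per-file worker; real filtering would go here."""
    return path + ".filtered"


def run_filters(paths, threads):
    """Filter every path, using a worker pool only when threads > 1."""
    if threads > 1:
        # Pool(threads) starts that many worker processes; map() blocks
        # until all results are back and preserves input order.
        with multiprocessing.Pool(threads) as pool:
            return pool.map(filter_file, paths)
    return [filter_file(p) for p in paths]
```

The worker has to be a top-level function so it can be pickled, and on platforms that start workers by spawning a fresh interpreter the call to run_filters should sit under an if __name__ == "__main__": guard.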
Closes #204
This changes ngs_filter to:
In theory this should be significantly faster. But it still looks at every sequence and index sequence, and Biopython will load and convert quality, etc. Further optimizations would be possible by skipping that process.
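As a rough idea of what skipping that conversion could look like, here is a sketch that treats FASTQ records as plain strings and never decodes the quality line. It assumes strict four-line records (no wrapped sequences), and raw_fastq_records and drop_reads_with_n are hypothetical names, not part of ngs_filter. Biopython's FastqGeneralIterator offers a similar string-tuple view of a FASTQ file.

```python
from itertools import islice


def raw_fastq_records(handle):
    """Yield (header, seq, plus, qual) string tuples from 4-line FASTQ records."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("malformed or truncated FASTQ record: %r" % (record,))
        yield tuple(record)


def drop_reads_with_n(handle):
    """Lazily yield only the records whose sequence contains no N."""
    for header, seq, plus, qual in raw_fastq_records(handle):
        if "N" not in seq.upper():
            # The quality string is passed through untouched.
            yield header, seq, plus, qual
```

Because the quality string is only copied, the per-read cost is mostly the membership check on the sequence.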