
what is ngs_filter doing? #204

Closed
necrolyte2 opened this issue Mar 23, 2016 · 3 comments

@necrolyte2
Member

I know what it can do, but the stage takes a long time to run even though nothing is selected for it to do.

Not a huge deal, but from an outsider's perspective it is confusing.

@averagehat
Contributor

I changed ngs_filter to simply copy the file over when the settings are such that it would do nothing. Is this what you're looking for?
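A minimal sketch of that short-circuit, assuming the filter only does work when an index file plus a quality threshold is given or `dropNs` is set (those parameter names come from the snippet further down this thread). The wrapper name `run_ngs_filter` and its signature are hypothetical, not the project's actual API.

```python
import shutil


def run_ngs_filter(readpath, outpath, index=None, idxQualMin=None, dropNs=False):
    """Hypothetical wrapper around the filter stage.

    Returns True if the input was copied unchanged, False if real filtering is needed.
    """
    # Mirrors the `if index and idxQualMin` check in the filtering code:
    # the index filter only runs when both values are set.
    index_filter_on = bool(index and idxQualMin)
    if not index_filter_on and not dropNs:
        # Nothing would be removed, so skip parsing and copy the file byte for byte.
        shutil.copyfile(readpath, outpath)
        return True
    # The actual read-by-read filtering would run here; omitted in this sketch.
    return False
```

Copying avoids parsing every record only to write the same records back out, which is where the time goes when no filter is active.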

@necrolyte2
Member Author

I think we should investigate that stage some more. I saw very high CPU and memory usage while that stage was running, which is what concerned me. It also added about 5-10 minutes to the analysis.

I'm wondering if we should consider somehow combining the filter and trim output; otherwise you end up with input fastq + filter fastq + trim fastq + bam.

Essentially 4x the data size.
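One way to avoid the intermediate filter fastq is to chain the filter and trim steps as generators and write only the final result. This is a sketch under assumptions: plain four-lines-per-record FASTQ with Phred+33 qualities, a filter step that only drops reads containing `N`, and a hypothetical `trim_low_quality` step that cuts each read at its first base below a quality threshold. The project's real trim logic is not described in this thread, and all function names here are illustrative.

```python
from typing import Iterable, Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield records from a plain four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield Read(header[1:], seq, qual)


def drop_ns(reads: Iterable[Read]) -> Iterator[Read]:
    """Filter step: skip any read containing an ambiguous base."""
    return (r for r in reads if "N" not in r.seq.upper())


def trim_low_quality(reads: Iterable[Read], min_q: int = 20) -> Iterator[Read]:
    """Hypothetical trim step: cut each read at its first base below min_q (Phred+33)."""
    for r in reads:
        cut = next((i for i, c in enumerate(r.qual) if ord(c) - 33 < min_q), len(r.qual))
        if cut:  # reads trimmed to zero length are dropped
            yield Read(r.name, r.seq[:cut], r.qual[:cut])


def filter_and_trim(src: TextIO, dst: TextIO, min_q: int = 20) -> int:
    """Run both steps in one pass and write a single output; returns reads written."""
    written = 0
    for r in trim_low_quality(drop_ns(parse_fastq(src)), min_q):
        dst.write(f"@{r.name}\n{r.seq}\n+\n{r.qual}\n")
        written += 1
    return written
```

With real files, `src` and `dst` would just be open text handles. Only one record is held in memory at a time, and the on-disk footprint drops to input fastq + one processed fastq + bam.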

@averagehat
Contributor

The code uses lists rather than generators for simplicity.

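```python
from itertools import ifilterfalse, izip  # Python 2; in Python 3 use itertools.filterfalse and built-in zip

# fq_open and idx_filter are helpers defined elsewhere in ngs_filter.
def filter_reads(readpath, index, idxQualMin, dropNs):  # hypothetical name; the excerpt starts mid-function
    try:
        reads = list(fq_open(readpath))  # loads every read into memory at once
    except AssertionError as E:
        print("skipping biopython assertion error")
        print(readpath)
        #sys.stderr.write(str(E))
        reads = []  # assumption: treat an unreadable file as empty, otherwise `reads` is undefined below
    if index and idxQualMin:
        idxreads = fq_open(index)
        # keep reads whose paired index read passes the quality threshold
        intermediate = [r for r, idx in izip(reads, idxreads) if idx_filter(r, idx, idxQualMin)]
    else:
        intermediate = reads
    if dropNs:
        hasNs = lambda rec: 'N' in str(rec.seq).upper()
        filtered = list(ifilterfalse(hasNs, intermediate))
    else:
        filtered = intermediate
    total, badIndex, hadN = len(reads), len(reads) - len(intermediate), \
        len(intermediate) - len(filtered)
    return filtered, total, badIndex, hadN
```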
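For comparison, here is a sketch of the same logic written with generators, so only one read (and its index read) is held in memory at a time. It assumes reads have a `.seq` attribute, as Biopython `SeqRecord`s do, so `reads` could come from `Bio.SeqIO.parse(path, "fastq")`. The `idx_filter` predicate is passed in rather than taken from the project, and the names `stream_filter` and `FilterStats` are hypothetical. The counters are only complete once the returned iterator has been fully consumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass
class FilterStats:
    total: int = 0
    badIndex: int = 0
    hadN: int = 0


def stream_filter(reads: Iterable, idxreads: Optional[Iterable] = None,
                  idx_filter: Optional[Callable] = None, idxQualMin=None,
                  dropNs: bool = False) -> Tuple[Iterator, FilterStats]:
    """Generator version of the list-based filter; stats are final once the iterator is exhausted."""
    stats = FilterStats()
    use_index = idxreads is not None and bool(idxQualMin)

    def gen():
        idx_iter = iter(idxreads) if use_index else None
        for r in reads:
            stats.total += 1
            if use_index:
                idx = next(idx_iter, None)
                # A read with no matching index record counts as a bad index,
                # as len(reads) - len(intermediate) does in the list version.
                if idx is None or not idx_filter(r, idx, idxQualMin):
                    stats.badIndex += 1
                    continue
            if dropNs and "N" in str(r.seq).upper():
                stats.hadN += 1
                continue
            yield r

    return gen(), stats
```

The filtered iterator can be passed straight to `Bio.SeqIO.write(records, handle, "fastq")`, after which `stats` holds the same three counts the list version returns. One trade-off: parse errors such as Biopython's assertion would surface during iteration rather than at the `list(...)` call, so the `try`/`except` would need to move around the consuming loop.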
