
Add complexity filter? #13

Closed
jfy133 opened this issue Jun 6, 2018 · 6 comments
Labels
enhancement New feature or request feature

Comments

@jfy133
Member

jfy133 commented Jun 6, 2018

In our group we've noticed that we regularly get lots of poly-G reads from NextSeq data that aren't discarded by the sequencer or the demultiplexer. If they aren't thrown out, they can skew some downstream statistics.

Maybe we could consider adding some form of complexity filter as a module, to remove low-complexity reads?
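To make "low complexity" concrete, here is a minimal Python sketch. It assumes the complexity measure that the fastp README documents for its `-y`/`--low_complexity_filter` option: the fraction of bases that differ from the next base, with reads kept only at or above a threshold (fastp documents a default of 30% for `-Y`/`--complexity_threshold`). The function names are hypothetical and this is not fastp's own code; the boundary handling (keeping reads exactly at the threshold) is our assumption. A poly-G read scores 0 and is dropped.

```python
from typing import List


def complexity(seq: str) -> float:
    """Fraction of adjacent base pairs that differ (seq[i] != seq[i + 1])."""
    if len(seq) < 2:
        return 0.0  # too short to measure; treat as low complexity
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return changes / (len(seq) - 1)


def filter_low_complexity(reads: List[str], threshold: float = 0.30) -> List[str]:
    """Keep reads whose complexity is at or above the threshold (0.30 = 30%)."""
    return [r for r in reads if complexity(r) >= threshold]


reads = ["GGGGGGGGGGGGGGGGGGGG", "ACGTTGCAAGTCCATG"]
print([round(complexity(r), 2) for r in reads])
print(filter_low_complexity(reads))
```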

@apeltzer
Member

apeltzer commented Jun 6, 2018

Good idea. Do you know of any tool that can do that?

@jfy133
Member Author

jfy133 commented Jun 6, 2018

@jfy133
Member Author

jfy133 commented Aug 2, 2018

Possibly a better (more recent) tool designed specifically for the case described above: https://github.com/OpenGene/fastp#polyg-tail-trimming
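A simplified sketch of what poly-G tail trimming does, assuming a read and its quality string are plain Python strings: if the 3' end of the read is a run of at least `min_len` G bases (10 mirrors fastp's documented `--poly_g_min_len` default), the run and the matching qualities are cut off. fastp's real implementation is more involved (for instance, it does not require a perfect run of G), so treat the hypothetical `trim_poly_g` below as an illustration only.

```python
from typing import Tuple


def trim_poly_g(seq: str, qual: str, min_len: int = 10) -> Tuple[str, str]:
    """Cut a 3' run of G (and its qualities) if it is at least min_len long.

    Simplified exact-match version: any non-G base ends the run.
    """
    run = len(seq) - len(seq.rstrip("G"))  # length of the trailing G run
    if run >= min_len:
        keep = len(seq) - run
        return seq[:keep], qual[:keep]
    return seq, qual  # run too short (or absent): leave the read untouched


seq = "ACGTACCT" + "G" * 12
qual = "I" * len(seq)
print(trim_poly_g(seq, qual))              # tail of 12 G is removed
print(trim_poly_g("ACGTGGG", "IIIIIII"))   # tail of 3 G is kept
```

Trimming the quality string by the same amount keeps the FASTQ record consistent; an all-G read ends up empty, which is where a downstream length filter would normally step in.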

@apeltzer apeltzer added enhancement New feature or request feature labels Aug 5, 2018
@apeltzer apeltzer added this to the V2.0 "Gray Wolf" milestone Aug 15, 2018
@apeltzer
Member

Fastp is really a nice tool.

I'll add this before AdapterRemoval, leaving adapters and qualities untouched and performing poly-G trimming only on demand (default off, but people can turn it on if they want to!)
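The ordering described here can be sketched as a small opt-in stage list. The stage names are hypothetical placeholders, not the pipeline's actual process names; the point is only that poly-G trimming is off by default and, when enabled, runs before AdapterRemoval.

```python
from typing import List


def preprocessing_stages(poly_g_trimming: bool = False) -> List[str]:
    """Ordered preprocessing stages; poly-G trimming is opt-in (off by default)."""
    stages: List[str] = []
    if poly_g_trimming:
        # Runs first and only trims poly-G tails; adapters and base
        # qualities are left for AdapterRemoval to handle.
        stages.append("fastp_poly_g_trim")  # hypothetical stage name
    stages.append("adapter_removal")        # hypothetical stage name
    return stages


print(preprocessing_stages())
print(preprocessing_stages(poly_g_trimming=True))
```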

@apeltzer
Member

Some notes for myself:

SE:

fastp --in1 read1 --out1 "${read.baseName}.pG.fq.gz" -A -g --poly_g_min_len 10 -Q -L \
-w ${task.cpus} --json "${read.baseName}"_fastp.json

PE:
fastp --in1 read1 --in2 read2 --out1 "${read.baseName}.R1.pG.fq.gz" --out2 "${read.baseName}.R2.pG.fq.gz" -A -g --poly_g_min_len 10 -Q -L \
-w ${task.cpus} --json "${read.baseName}"_fastp.json


parameters to add:

params.complexity_filter = false
params.complexity_filter_poly_g_min = 10
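A Python sketch of how the notes above could translate into a command, assuming the two proposed parameters are available as a dict; the function name, its arguments and the output file names are hypothetical. It returns `None` when the filter is off (the proposed default) and otherwise builds a fastp argument list in which `-A`, `-Q` and `-L` disable fastp's adapter trimming, quality filtering and length filtering, so `-g` poly-G trimming is the only change made to the reads. The paired-end branch uses distinct R1/R2 output names so the two output files do not overwrite each other.

```python
from typing import Any, Dict, List, Optional

# Defaults mirroring the two parameters proposed in the notes.
DEFAULT_PARAMS: Dict[str, Any] = {
    "complexity_filter": False,
    "complexity_filter_poly_g_min": 10,
}


def fastp_command(params: Dict[str, Any], read1: str, read2: Optional[str],
                  base_name: str, cpus: int) -> Optional[List[str]]:
    """Build a fastp argument list that only trims poly-G tails.

    Returns None when the filter is disabled (the proposed default).
    read2=None means single-end; otherwise paired-end.
    """
    if not params["complexity_filter"]:
        return None
    cmd = ["fastp", "--in1", read1]
    if read2 is None:
        cmd += ["--out1", f"{base_name}.pG.fq.gz"]
    else:
        # Distinct names so R1 and R2 outputs do not overwrite each other.
        cmd += ["--in2", read2,
                "--out1", f"{base_name}.R1.pG.fq.gz",
                "--out2", f"{base_name}.R2.pG.fq.gz"]
    cmd += [
        "-A",  # --disable_adapter_trimming: adapters are left for AdapterRemoval
        "-g",  # --trim_poly_g
        "--poly_g_min_len", str(params["complexity_filter_poly_g_min"]),
        "-Q",  # --disable_quality_filtering
        "-L",  # --disable_length_filtering
        "-w", str(cpus),  # --thread
        "--json", f"{base_name}_fastp.json",
    ]
    return cmd


params = dict(DEFAULT_PARAMS, complexity_filter=True)
print(fastp_command(DEFAULT_PARAMS, "sample.fq.gz", None, "sample", 4))
print(" ".join(fastp_command(params, "s_R1.fq.gz", "s_R2.fq.gz", "s", 4)))
```

If fastp is installed, the returned list can be passed straight to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.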

@apeltzer
Member

As of commit 24c3329, this is implemented and covered by test cases for both single-end and paired-end data.

apeltzer pushed a commit that referenced this issue Nov 12, 2019
jfy133 added a commit that referenced this issue May 14, 2021

2 participants