-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Require a minimum sequence length of 50bp after ITSx #718
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thanks! My only concern is that the length filtering is generally applied, and not only when running DADA2 for taxonomies. E.g. sintax by default discards sequences shorter than 32, not 50, but this can be changed with a parameter and I think there is no requirement on sequence length (not going into details on how short a sequence can be to achieve sensible results). So for sintax a sequence length of 50 is not technically needed. That being said, I guess it's easy to change the filtering cutoff in ampliseq with a custom config if one thinks 50 is too much.
Thanks for your review! Unfortunately I seem to have merged the PR just before your comment. |
Yes, I realized I was too slow when posting my comment :D I'm fine with this solution too. |
Sequences below 50bp cannot be used by DADA2 and most likely also other kmer-based classification methods fail with too short sequences (see
https://github.com/benjjneb/dada2/issues/601
). This can be usually easily prevented with--min_len_asv
, but not when ITSx is used (because this is after--min_len_asv
is appplied). Therefore, another length filtering step with 50bp min length is added here. The minimum value of 50 is not represented by a parameter but can be adjusted via a config, if required.Closes #704.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).