[Feature request] Add number of ambiguous characters in seqkit stats
#490
Comments
Do ambiguous characters mean "N" for DNA/RNA and "N/X" for amino acids?
Yes. Sorry, the title is indeed confusing.
OK, so it means we just need to count "N" for DNA/RNA, and "N"+"X" for proteins.
Yep!
Added. It just counts N/n/X/x for any kind of sequence.
Hi Wei. Could you also add a feature that lets us filter nt/aa sequences in FASTA files by the number and/or proportion of ambiguous characters (i.e. N/X) in each individual sequence? Thanks :D
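Until something like this exists in seqkit itself, the per-sequence filter asked for above can be roughly sketched in plain Python. This is only an illustration and not how seqkit works internally: `read_fasta`, `count_ambiguous`, `filter_records`, `max_count` and `max_fraction` are hypothetical names made up for this sketch. The default character set is the N/n/X/x mentioned in the thread; since N also stands for asparagine in protein sequences, the set is left as a parameter so that only X/x can be counted for amino acids.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Default set taken from the thread (N/n/X/x); pass a different set if needed,
# e.g. "Xx" for proteins, where N is a regular residue (asparagine).
AMBIGUOUS = frozenset("NnXx")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_ambiguous(seq: str, ambiguous=AMBIGUOUS) -> int:
    """Count the characters of seq that belong to the ambiguous set."""
    return sum(1 for c in seq if c in ambiguous)


def filter_records(
    records: Iterable[Tuple[str, str]],
    max_count: Optional[int] = None,
    max_fraction: Optional[float] = None,
    ambiguous=AMBIGUOUS,
) -> Iterator[Tuple[str, str]]:
    """Keep records whose ambiguous count and fraction do not exceed the limits."""
    for header, seq in records:
        n = count_ambiguous(seq, ambiguous)
        fraction = n / len(seq) if seq else 0.0
        if max_count is not None and n > max_count:
            continue
        if max_fraction is not None and fraction > max_fraction:
            continue
        yield header, seq


if __name__ == "__main__":
    fasta = [">a", "ACGTNN", ">b", "ACGTACGT", ">c", "nnnn"]
    for header, seq in filter_records(read_fasta(fasta), max_fraction=0.25):
        print(f">{header}\n{seq}")
```

With `max_fraction=0.25`, the toy input keeps only record `b`: record `a` has 2 of 6 characters ambiguous and record `c` is entirely `n`. To read a real file, pass an open file handle to `read_fasta` instead of the list.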
The number of ambiguous characters is often a good metric for genome quality. As far as I know, there's no way to add the count of a given character to seqkit stats. I'm using -G "Nn" to fool seqkit into thinking that N is a gap character, but it would be nice to have that as an additional (optional) column.