[Feature request] Add number of ambiguous characters in `seqkit stats` #490

apcamargo · 2024-10-30T02:45:36Z

The number of ambiguous characters is often a good metric for genome quality. As far as I know, there's no way to report the count of a given character in `seqkit stats`. I'm using `-G "Nn"` to trick seqkit into treating N as a gap character, but it would be nice to have this as an additional (optional) column.


shenwei356 · 2024-10-30T02:58:38Z

Do ambiguous characters mean "N" for DNA/RNA and "N/X" for amino acids?

apcamargo · 2024-10-30T03:04:00Z

Yes. Sorry, the title is indeed confusing.

shenwei356 · 2024-10-30T03:12:27Z

OK, so it means we just need to count "N" for DNA/RNA, and "N"+"X" for proteins.

apcamargo · 2024-10-30T03:16:45Z

Yep!


shenwei356 · 2024-10-30T11:19:39Z

Added. It just counts N/n/X/x for any kind of sequence.
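
For readers who want to see exactly what is being counted, here is a minimal sketch in plain awk. It is not seqkit's implementation, only an illustration of the rule described above (count N/n/X/x regardless of sequence type). The function name `count_ambiguous` is hypothetical, and the sketch reports per-record counts; summing the third column gives a per-file total.

```shell
# count_ambiguous (hypothetical helper, not part of seqkit):
# print "<id> TAB <length> TAB <number of N/n/X/x>" for each FASTA record.
count_ambiguous() {
    awk '
        # Print the record collected so far, if any.
        function emit() { if (id != "") printf "%s\t%d\t%d\n", id, len, amb }
        /^>/ { emit(); id = substr($1, 2); len = 0; amb = 0; next }
        {
            len += length($0)
            amb += gsub(/[NnXx]/, "")   # gsub returns the number of matches it replaced
        }
        END { emit() }
    ' "$@"
}

# Two records; the first spans two sequence lines.
printf '>s1\nACTGNN\nnnxx\n>s2\nACGT\n' | count_ambiguous
# s1	10	6
# s2	4	0
```

Like the rule above, the sketch does not try to guess whether a record is nucleotide or protein, so an X in a DNA sequence is counted as well.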

kakuk9 · 2024-11-14T13:10:05Z

Hi Wei. Could you also add a feature that lets us filter nt/aa sequences in FASTA files by the number and/or proportion of ambiguous characters (i.e. N/X) in each sequence? Thanks :D

shenwei356 · 2024-11-16T11:08:32Z

@kakuk9

$ echo -ne ">s\nactgn\n" | seqkit fx2tab -B NX
s       actgn           20.00

$ echo -ne ">s\nactgn\n" | seqkit fx2tab -B NX | awk -F'\t' '$4 >= 20' | seqkit tab2fx 
>s
actgn
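
The pipeline above keeps records whose N/X content is at least 20%. For quality filtering, the opposite direction is often wanted: keep only records at or below a threshold. The following is a rough sketch of that idea in plain awk, for cases where seqkit is not at hand. The function name `filter_ambiguous` and its argument order are assumptions made for illustration, not a seqkit feature.

```shell
# filter_ambiguous MAX_PCT [FILE...] (hypothetical helper, not part of seqkit):
# keep FASTA records in which at most MAX_PCT percent of residues are N/n/X/x.
# Multi-line sequences are written back out on a single line.
filter_ambiguous() {
    max_pct=$1
    shift
    awk -v max="$max_pct" '
        # Decide whether to print the record collected so far.
        function emit(   tmp, n) {
            if (hdr == "") return
            tmp = seq
            n = gsub(/[NnXx]/, "", tmp)          # number of ambiguous characters
            if (length(seq) > 0 && 100 * n / length(seq) <= max)
                printf "%s\n%s\n", hdr, seq
        }
        /^>/ { emit(); hdr = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$@"
}

# Record "a" is 20% N and is dropped; record "b" is 10% N and is kept.
printf '>a\nactgn\n>b\nactgactgan\n' | filter_ambiguous 10
# >b
# actgactgan
```

Records with an empty sequence are dropped in this sketch because their proportion is undefined.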

shenwei356 added a commit that referenced this issue Oct 30, 2024

stats: add an extra column 'sum_n' to count the number of ambiguous c…

1c96e2f

…haracters. #490

shenwei356 mentioned this issue Nov 1, 2024

Update SeqKit to v2.9.0 bioconda/bioconda-recipes#51860

Merged

BrewTestBot mentioned this issue Nov 1, 2024

seqkit 2.9.0 Homebrew/homebrew-core#196354

Merged

shenwei356 closed this as completed Nov 16, 2024
