peanut

GAF alignment evaluation tool.

peanut calculates alignment metrics for a GAF file produced by GraphAligner by evaluating the CIGAR string of each alignment. It outputs four metrics:

qsc
uniq
multi
nonaln

Optionally, it writes the nonaln query regions to BED.
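
BED is a tab-delimited format with 0-based, half-open intervals (name, start, end). As a sketch of what writing such regions could look like, the function below collapses query bases without a sequence match into runs and formats one BED line per run. The function name `nonaln_bed` and the per-base `covered` flags used as input are assumptions made for illustration; peanut's actual BED output may be produced differently and may carry other columns.

```rust
/// Format runs of uncovered query bases as BED lines (0-based, half-open).
/// `covered[i]` is true if query base i has at least one sequence match.
/// Hypothetical helper for illustration; not taken from peanut's source.
fn nonaln_bed(query_name: &str, covered: &[bool]) -> Vec<String> {
    let mut lines = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, &c) in covered.iter().enumerate() {
        match (c, run_start) {
            // an uncovered base opens a new run
            (false, None) => run_start = Some(i),
            // a covered base closes the current run
            (true, Some(s)) => {
                lines.push(format!("{}\t{}\t{}", query_name, s, i));
                run_start = None;
            }
            _ => {}
        }
    }
    // close a run that extends to the end of the query
    if let Some(s) = run_start {
        lines.push(format!("{}\t{}\t{}", query_name, s, covered.len()));
    }
    lines
}

fn main() {
    let covered = [true, false, false, true, true, false];
    for line in nonaln_bed("read1", &covered) {
        println!("{}", line);
    }
}
```

For the flags in `main`, this yields two regions for `read1`: bases 1 to 3 and bases 5 to 6 (end-exclusive).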

metrics

query sequence containment (qsc)

$qsc=\frac{\#\!\!E}{query\_lens}$

#E is the number of sequence matches (= or E symbol) in the GAF file. Nucleotide positions with sequence matches in multiple alignments are counted only once.
query_lens is the total length of all queries in the GAF, in nucleotides.
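
As a rough illustration of how sequence matches could be read from a CIGAR string, the sketch below walks the string and records the query intervals covered by `=` (or `E`) operations. It is not peanut's actual code: the function name `match_intervals`, the `q_start` parameter, and the assumption that the CIGAR uses the common operations `X`, `M`, `I` (which consume query bases) and `D`, `N` (which do not) are illustrative.

```rust
/// Return the half-open query intervals [start, end) covered by
/// sequence-match operations (`=` or `E`) in a CIGAR string.
/// Hypothetical helper for illustration; not taken from peanut's source.
fn match_intervals(cigar: &str, q_start: usize) -> Vec<(usize, usize)> {
    let mut intervals = Vec::new();
    let mut pos = q_start; // current position on the query
    let mut len = 0usize; // length of the operation being parsed
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            // accumulate the decimal length that precedes the operation
            len = len * 10 + d as usize;
            continue;
        }
        match c {
            // sequence match: record the interval and advance on the query
            '=' | 'E' => {
                intervals.push((pos, pos + len));
                pos += len;
            }
            // these consume query bases but are not counted as sequence matches
            'X' | 'M' | 'I' => pos += len,
            // deletions, skips and anything else leave the query position unchanged
            _ => {}
        }
        len = 0;
    }
    intervals
}

fn main() {
    // 5 matches, 1 mismatch, 2 inserted bases, 3 deleted bases, 4 matches
    let intervals = match_intervals("5=1X2I3D4=", 10);
    println!("{:?}", intervals);
}
```

Merging such intervals across all alignments of a query, so that each position is counted once, and dividing the covered length by the total query length would give a value in the spirit of qsc.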

unique query sequence matches (uniq)

$uniq=\frac{uniq\_\!\!\#\!\!E}{query\_lens}$

uniq_#E is the number of unique sequence matches in the GAF file.
query_lens is the total length of all queries in the GAF, in nucleotides.

multi query sequence matches (multi)

$multi=\frac{multi\_\!\!\#\!\!E}{query\_lens}$

multi_#E is the number of multiple sequence matches in the GAF file. Nucleotide positions covered by sequence matches in more than one alignment are counted only once.
query_lens is the total length of all queries in the GAF, in nucleotides.

non query sequence matches (nonaln)

$nonaln=\frac{nonaln\_\!\!\#\!\!E}{query\_lens}$

nonaln_#E is the number of query nucleotide positions without a sequence match in the GAF file.
query_lens is the total length of all queries in the GAF, in nucleotides.
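
One way to read the four definitions together is as a partition of query bases by how many alignments place a sequence match on them: none (nonaln), exactly one (uniq), or more than one (multi), with qsc covering the last two. The sketch below follows that reading. This interpretation, the `Metrics` struct, and the `metrics` function are assumptions for illustration, not peanut's implementation. For brevity it handles a single query of non-zero length.

```rust
/// The four fractions, named after the metrics described above.
#[derive(Debug, PartialEq)]
struct Metrics {
    qsc: f64,
    uniq: f64,
    multi: f64,
    nonaln: f64,
}

/// Compute the metrics for one query of length `query_len` (> 0) from
/// half-open match intervals [start, end) collected over all alignments.
/// Assumes intervals from the same alignment do not overlap each other.
fn metrics(query_len: usize, matches: &[(usize, usize)]) -> Metrics {
    // depth[i] = number of alignments with a sequence match at query base i
    let mut depth = vec![0u32; query_len];
    for &(start, end) in matches {
        // clamp to the query so malformed intervals cannot panic
        let end = end.min(query_len);
        let start = start.min(end);
        for d in &mut depth[start..end] {
            *d += 1;
        }
    }
    let uniq_n = depth.iter().filter(|&&d| d == 1).count();
    let multi_n = depth.iter().filter(|&&d| d >= 2).count();
    let nonaln_n = depth.iter().filter(|&&d| d == 0).count();
    let total = query_len as f64;
    Metrics {
        qsc: (uniq_n + multi_n) as f64 / total,
        uniq: uniq_n as f64 / total,
        multi: multi_n as f64 / total,
        nonaln: nonaln_n as f64 / total,
    }
}

fn main() {
    // Two alignments on a 10-base query; they overlap on bases 4 and 5.
    let m = metrics(10, &[(0, 6), (4, 8)]);
    println!("{}\t{}\t{}\t{}", m.qsc, m.uniq, m.multi, m.nonaln);
}
```

In this toy case six bases are matched once, two bases twice, and two bases not at all, so uniq and multi sum to qsc, and qsc and nonaln sum to 1.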

usage

building

git clone https://github.com/pangenome/rs-peanut.git
cd rs-peanut
cargo build --release

example

peanut requires a GAF file as input, passed with -g.

./target/release/peanut -g aln.gaf

The output is written to stdout in a tab-delimited format.

0.992910744238371	0.9926967987671109	0.00021394547126006352	0.007089255761628998

The four columns are, in order: qsc, uniq, multi, and nonaln.
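
For downstream use, the output line can be split on tabs and parsed into four floating-point numbers. The sketch below does this for the example line above and checks two relationships that hold for these particular numbers: uniq + multi equals qsc, and qsc + nonaln equals 1, both up to floating-point rounding. The function name `parse_peanut_line` is hypothetical.

```rust
/// Parse one tab-delimited peanut output line into [qsc, uniq, multi, nonaln].
/// Returns None if a field is not a number or the field count is not four.
fn parse_peanut_line(line: &str) -> Option<[f64; 4]> {
    let fields: Vec<f64> = line
        .trim()
        .split('\t')
        .map(|f| f.parse::<f64>())
        .collect::<Result<Vec<f64>, _>>()
        .ok()?;
    if fields.len() == 4 {
        Some([fields[0], fields[1], fields[2], fields[3]])
    } else {
        None
    }
}

fn main() {
    let line = "0.992910744238371\t0.9926967987671109\t0.00021394547126006352\t0.007089255761628998";
    let [qsc, uniq, multi, nonaln] = parse_peanut_line(line).expect("four numeric columns");
    // Consistency checks on the example values, with a small tolerance.
    assert!((uniq + multi - qsc).abs() < 1e-9);
    assert!((qsc + nonaln - 1.0).abs() < 1e-9);
    println!("qsc={} uniq={} multi={} nonaln={}", qsc, uniq, multi, nonaln);
}
```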

TODOs

Add query sequence alignment match mismatch (qsamm).
Describe qsc.
Remove non-helping metrics qsamm and qsm.
Add 3 new metrics: number of unique query base alignments, number of multiple query base alignments, and number of nonaln query bases.

limits

So far, peanut has not been tested on GAF files produced by tools other than GraphAligner.
