The meaning of the sixth column in the Augustus gff3 output file and the meaning of the --minexonintronprob parameter #418

Open
shiyi-pan opened this issue Jul 24, 2024 · 1 comment

Comments

@shiyi-pan

Hi, Augustus is an awesome de novo gene predictor and I want to use it to predict genes in my genome. Here is my script:

augustus --species=Soy10_1 --genemodel=complete --minexonintronprob=0.40 --minmeanexonintronprob=0.40 --uniqueGeneId=true --noInFrameStop=true --gff3=on --strand=both new_ptg000006l.fa > test_new_ptg000006l.out

I have two questions. First, what is the meaning of the sixth column in the Augustus gff3 output file? Is it the probability of the corresponding feature? Second, if so, why do I get a probability below 0.40 when I set the --minexonintronprob=0.40 parameter?
Here is an example of a predicted gene with an intron probability of 0.39:

start gene ptg000006l_RC.g3827

ptg000006l_RC AUGUSTUS gene 19703801 19704105 0.39 - . ID=ptg000006l_RC.g3827
ptg000006l_RC AUGUSTUS transcript 19703801 19704105 0.39 - . ID=ptg000006l_RC.g3827.t1;Parent=ptg000006l_RC.g3827
ptg000006l_RC AUGUSTUS stop_codon 19703801 19703803 . - 0 Parent=ptg000006l_RC.g3827.t1
ptg000006l_RC AUGUSTUS intron 19703902 19703990 0.39 - . Parent=ptg000006l_RC.g3827.t1
ptg000006l_RC AUGUSTUS CDS 19703801 19703901 0.39 - 2 ID=ptg000006l_RC.g3827.t1.cds;Parent=ptg000006l_RC.g3827.t1
ptg000006l_RC AUGUSTUS CDS 19703991 19704105 0.65 - 0 ID=ptg000006l_RC.g3827.t1.cds;Parent=ptg000006l_RC.g3827.t1
ptg000006l_RC AUGUSTUS start_codon 19704103 19704105 . - 0 Parent=ptg000006l_RC.g3827.t1

Looking forward to your reply very much. Thank you.

@MarioStanke
Contributor

Yes, the 6th column holds the posterior probability of the exons and introns, given the sequence and evidence.

It may be that the variable

keep_viterbi                true  # set to true if all Viterbi transcripts should be reported

is set to true in your species config file. Then all transcripts from the most likely parse are kept, even if their probabilities fall below your thresholds.
Please try to rerun with --keep_viterbi=false.
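
To check whether a rerun still reports features below the threshold, the 6th column can be scanned with a short script. This is only a minimal sketch, not part of Augustus: the function name `low_score_features` and its parameters are hypothetical, and it assumes whitespace- or tab-separated GFF3 lines like the ones pasted above.

```python
def low_score_features(gff_lines, threshold=0.40, types=("CDS", "intron")):
    """Yield (type, start, end, score) for features whose 6th-column
    score is below `threshold`. Hypothetical helper, not an Augustus API."""
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and comment lines (Augustus writes many '#' lines).
        if not line or line.startswith("#"):
            continue
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        fields = line.split(None, 8)
        if len(fields) < 8:
            continue  # not a feature line
        ftype, start, end, score = fields[2], fields[3], fields[4], fields[5]
        if ftype not in types or score == ".":
            continue  # feature type not of interest, or no score given
        try:
            value = float(score)
        except ValueError:
            continue  # malformed score column
        if value < threshold:
            yield ftype, int(start), int(end), value


if __name__ == "__main__":
    import sys

    # Usage (file name taken from the command above):
    #   python check_scores.py test_new_ptg000006l.out
    with open(sys.argv[1]) as handle:
        for ftype, start, end, score in low_score_features(handle):
            print(f"{ftype}\t{start}\t{end}\t{score}")
```

Run on the example gene above, this would flag the intron and the first CDS (both 0.39) and pass the second CDS (0.65).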
