See overview and documentation:
Merge gene predictions of BRAKER (braker-final.gff3
) with gene predictions INferred Directly (DI-final.gff3
).
Note: See the details to generate these two predictions in braker
and DirectInf
.
-
Consolidate all the transcripts from
braker-final.gff3
andDI-final.gff3
, and predict potential protein coding sequence by Mikado:-
Make a configure file and prepare transcripts
You should prepare a
list_BIND.txt
as below to include gtf path (1st column), gtf abbrev (2nd column), stranded-specific or not (3rd column):braker-final.gff3 br False DI-final.gff3 DI False
Then run the script as below:
./01_runMikado_round1.sh TAIR10_chr_all.fas junctions.bed list_BIND.txt BIND
This will generate
BIND_prepared.fasta
file that will be used for predicting ORFs in the next step.Note:
junctions.bed
is the same file generate from DirectInf step. -
Predict potential CDS from transcripts:
./02_runTransDecoder.sh BIND_prepared.fasta
We will use
BIND_prepared.fasta.transdecoder.bed
in the next step.Note: Here we only kept complete CDS for next step. You can revise
02_runTransDecoder.sh
to use both incomplete and complete CDS if you need. -
Pick best transcripts for each locus and annotate them as gene:
./03_runMikado_round2.sh BIND_prepared.fasta.transdecoder.bed BIND
This will generate:
mikado.metrics.tsv mikado.scores.tsv BIND.loci.gff3
-
-
Optional: Filter out transcripts with redundant CDS:
./04_rm_redundance.sh BIND.loci.gff3 TAIR10_chr_all.fas
-
Optional: Filter out transcripts whose predicted proteins mapped to transposon elements:
./05_TEsorter.sh filter.pep.fa BIND.loci.gff3
Note:
filter.pep.fa
is an output from previous step for removing redundant CDSs. You can also use all protein sequence if you don't want to remove redundant CDSs.