TERRACE is a circRNA assembler for paired-end RNA-seq data.
TERRACE can be easily installed via conda. To install it from source code instead, please follow INSTALL.
The usage of TERRACE is:
terrace -i <input.bam> -o <output.gtf> -fa <reference-genome.fa> --read_length <length-of-paired-end-reads> -r [reference_annotation.gtf] -fe [feature_file] [options]
The input.bam is the read alignment file generated by an RNA-seq aligner (for example, STAR or HISAT2). Make sure that it is sorted; otherwise, run samtools to sort it:
samtools sort input.bam > input.sort.bam
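Before sorting, it can be useful to check whether the file already declares a sort order. The sketch below is one way to do that, under the assumption that "sorted" means coordinate-sorted (the default of samtools sort) and that the aligner recorded the order in the @HD header line. The function name is_coordinate_sorted is hypothetical and not part of TERRACE or samtools.

```shell
# Succeeds (exit status 0) if the SAM header read from stdin declares
# coordinate sorting in its @HD line (SO:coordinate); fails otherwise.
# Assumption: the header was written correctly by the aligner or by samtools.
is_coordinate_sorted() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i == "SO:coordinate") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use (requires samtools; file names are placeholders):
#   if samtools view -H input.bam | is_coordinate_sorted; then
#       echo "input.bam is already coordinate-sorted"
#   else
#       samtools sort -o input.sort.bam input.bam
#   fi
```

A header without an @HD line, or with SO:unsorted, is treated as not sorted, so the check errs on the side of sorting again.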
The reconstructed circular transcripts are written in GTF format to output.gtf. Detailed documentation of the GTF format is available from Ensembl.
reference-genome.fa is the reference genome file in FASTA format. GENCODE GRCh37/GRCh38 is recommended.
length-of-paired-end-reads is the length of the reads used to produce the alignment file.
reference_annotation.gtf is the annotation file in GTF format. This parameter is optional.
feature_file is a CSV file generated by TERRACE that contains the circRNA features used by a machine learning model to assign confidence scores. This parameter is optional. For detailed usage of this file, see the section on Scoring below.
TERRACE supports the following parameters. Additional explanations follow the table.
Parameters | Default Value | Description |
---|---|---|
--help | | print usage of TERRACE and exit |
--version | | print version of TERRACE and exit |
--preview | | show the inferred library_type and exit |
--library_type | empty | chosen from {empty, unstranded, first, second} |
Providing --library_type is highly recommended. The values unstranded, first, and second correspond to fr-unstranded, fr-firststrand, and fr-secondstrand used in standard Illumina sequencing libraries. If none of them is given (the parameter is empty by default), TERRACE will try to infer the library_type by itself (see --preview). Note that this inference is based on the XS tag stored in the input BAM file. If the input BAM file does not contain the XS tag, it is essential to provide the library_type to TERRACE. You can use --preview to see the inferred library_type.
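Because the inference relies on the XS tag, a quick look at the alignments can tell whether the tag is present at all. The following is a minimal sketch, not part of TERRACE: the function name count_xs_tags is hypothetical, and it assumes the strand tag has the form XS:A:+ or XS:A:- (some aligners use XS with an integer type for an unrelated purpose, which is deliberately not counted).

```shell
# Reads SAM alignment records from stdin and prints two numbers:
# the count of records carrying an XS:A strand tag, and the total count.
count_xs_tags() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines if present
        {
            total++
            for (i = 12; i <= NF; i++)     # optional tags start at column 12
                if ($i ~ /^XS:A:[+-]$/) { with_xs++; break }
        }
        END { printf "%d %d\n", with_xs, total }
    '
}

# Typical use (requires samtools; inspects only the first 100000 records):
#   samtools view input.bam | head -n 100000 | count_xs_tags
```

Aligners often attach the strand tag only to spliced alignments, so a count well below the total is not unusual. A count of zero suggests that --library_type should be given explicitly.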
A small example of input data, example-input.bam, is available in the example directory.
Assuming TERRACE has been installed following the steps in the Installation section, the executable file terrace is located at src/terrace.
The following commands enter the example directory and run TERRACE using example-input.bam as input:
cd ./example
../src/terrace -i example-input.bam -o example-output.gtf --read_length 150
An output file named example-output.gtf will appear in the example directory.
The output file stores the reconstructed circular transcripts assembled by TERRACE in GTF format.
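As a quick sanity check on the result, the number of distinct transcripts in a GTF file can be counted from the transcript_id attribute in column 9. This is a generic GTF sketch rather than a TERRACE feature. The function name count_transcripts is hypothetical, and the sketch assumes that each record carries a transcript_id attribute, as is conventional for GTF.

```shell
# Reads a GTF file from stdin and prints the number of distinct
# transcript_id values found in the attribute column (column 9).
count_transcripts() {
    awk -F'\t' '
        /^#/ { next }                              # skip comment lines
        match($9, /transcript_id "[^"]+"/) {
            ids[substr($9, RSTART, RLENGTH)] = 1   # remember each id once
        }
        END { n = 0; for (id in ids) n++; print n }
    '
}

# Typical use:
#   count_transcripts < example-output.gtf
```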
By default, the score field of the output.gtf generated by TERRACE holds abundance values. We provide a pre-trained Random Forest model that generates more reliable scores (between 0 and 1) and integrates them into the score field of the GTF file. After the scores are integrated, a user-defined threshold can be provided to generate a supplementary precise.gtf file that contains the circRNAs with scores above the given threshold. Please refer to RF-scoring/README for details on score generation, integration, and the precise.gtf file.
To make use of the scoring functionality, TERRACE needs to be run so that it generates a feature file, as follows.
cd ./example
../src/terrace -i example-input.bam -o example-output.gtf --read_length 150 -fe feature_file
An output file named example-output.gtf and a feature file named feature_file will appear in the example directory.
The output file stores the reconstructed circular transcripts assembled by TERRACE in GTF format. The feature file stores the features of the output circRNAs needed for score generation.