
Tool Use Cases

Ken Nakatsu edited this page Aug 19, 2023 · 3 revisions

Use cases for the GTF scripts.

RNAcentral hosts ncRNA annotations for many species. This walkthrough processes its human annotations into a GTF that sRNAfrag can use.

First, filter for the biotype you want. Here we select Y RNAs:

python gtf_modifiers.py select_gtf /Users/kenminsoo/Desktop/unprocessed-annotations/testing_env/final/homo_sapies_nolnc_UCSC.GRCh38.gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/yrna.gtf type Y_RNA
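To make the filtering step concrete, here is a minimal sketch of selecting rows by an attribute value. It is not the code in gtf_modifiers.py: the helper names `parse_attributes` and `select_by_attribute` are hypothetical, the example rows are made up, and it assumes GTF-style `key "value";` attributes with no semicolons inside quoted values.

```python
def parse_attributes(text):
    """Parse a GTF attribute column such as 'ID "x"; type "Y_RNA";' into a dict."""
    attrs = {}
    for part in text.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def select_by_attribute(lines, key, value):
    """Keep GTF lines whose attribute `key` equals `value` exactly."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and rows without an attribute column
        if parse_attributes(fields[8]).get(key) == value:
            kept.append(line)
    return kept


# Made-up rows for illustration only.
example = [
    'chr1\tsrc\ttranscript\t100\t200\t.\t+\t.\tID "a"; type "Y_RNA";\n',
    'chr1\tsrc\ttranscript\t300\t400\t.\t-\t.\tID "b"; type "tRNA";\n',
]
y_rna_rows = select_by_attribute(example, "type", "Y_RNA")
print(len(y_rna_rows))
```

Matching on the parsed value rather than on a substring avoids accidentally keeping a biotype whose name merely contains `Y_RNA`.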

Next, try adding sequences:

python gtf_modifiers.py add_sequence_gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/yrna.gtf /Users/kenminsoo/Desktop/unprocessed-annotations/hg38_std.fa /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna.gtf sequence

Unfortunately, this throws an error:

Differing number of GFF fields encountered at line: 7.  Exiting...
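This page does not say exactly which inconsistency triggers the message, so a quick diagnostic can help. The sketch below groups rows by their "shape": the number of tab-separated columns plus the attribute keys in order. The function name `row_shapes` and the sample rows are hypothetical; if more than one shape shows up, the file is not uniform.

```python
from collections import Counter


def row_shapes(lines):
    """Count rows by (number of columns, tuple of attribute keys)."""
    shapes = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # ignore comments and blank lines
        fields = line.rstrip("\n").split("\t")
        keys = ()
        if len(fields) >= 9:
            # The first word of each ';'-separated attribute is its key.
            keys = tuple(
                part.strip().split(" ", 1)[0]
                for part in fields[8].split(";")
                if part.strip()
            )
        shapes[(len(fields), keys)] += 1
    return shapes


# Made-up rows: the second one carries an extra attribute.
rows = [
    'chr1\tsrc\ttranscript\t1\t10\t.\t+\t.\tID "a"; type "Y_RNA";\n',
    'chr1\tsrc\texon\t1\t10\t.\t+\t.\tID "b"; type "Y_RNA"; source "x";\n',
]
shapes = row_shapes(rows)
for shape, count in shapes.items():
    print(count, shape)
```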

We must standardize the fields. First, try selecting only the "transcript" features. Since sRNAfrag considers every fragment, it will still detect what are essentially ncRNA exons.

python gtf_modifiers.py select_column /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/yrna.gtf  /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna.gtf 2 transcript

The error persists, so we must also standardize the attributes:

python gtf_modifiers.py standardize_attributes /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna.gtf  /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_2.gtf '{"ID":"transcript_id", "type":"biotype"}'
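One plausible reading of the JSON argument is "old attribute name to new attribute name", with every row rewritten to carry exactly those attributes. That reading is an assumption and is not confirmed by this page. The sketch below illustrates it with a hypothetical function `rename_attributes`; filling a missing attribute with `NA` is also an assumption, as is the sample attribute string.

```python
import json


def rename_attributes(attr_text, mapping, missing="NA"):
    """Rewrite an attribute column so it holds only the mapped keys, renamed."""
    attrs = {}
    for part in attr_text.strip().split(";"):
        part = part.strip()
        if part:
            key, _, value = part.partition(" ")
            attrs[key] = value.strip().strip('"')
    pieces = []
    for old, new in mapping.items():
        value = attrs.get(old, missing)  # assumption: placeholder when absent
        pieces.append(f'{new} "{value}";')
    return " ".join(pieces)


mapping = json.loads('{"ID":"transcript_id", "type":"biotype"}')
# Made-up attribute column with one extra key that gets dropped.
result = rename_attributes('ID "URS0001"; type "Y_RNA"; source "x";', mapping)
print(result)  # transcript_id "URS0001"; biotype "Y_RNA";
```

After a pass like this every row has the same attribute keys in the same order, which is the uniformity the earlier error message complains about.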

That should be it, but it is not: not all of the chromosomes are found! The default command for standardizing chromosomes assumes human, but here is how it can be done for any species. First, find the chromosome names in the FASTA:

python alias_work.py fasta_chr_extract  /Users/kenminsoo/Desktop/unprocessed-annotations/hg38_std.fa 

Output:

['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY']
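For reference, collecting sequence names from a FASTA file takes only a few lines. This is a sketch rather than the code in alias_work.py; the function name `chromosome_names` is hypothetical, and it assumes the name is the first whitespace-delimited token of each `>` header line.

```python
import io


def chromosome_names(handle):
    """Return the sorted sequence names found in FASTA header lines."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                names.add(header.split()[0])
    return sorted(names)


# Tiny in-memory FASTA standing in for a real genome file.
demo = io.StringIO(">chr1 some description\nACGT\n>chr10\nGGCC\n>chr2\nTTAA\n")
names = chromosome_names(demo)
print(names)  # ['chr1', 'chr10', 'chr2']
```

The plain string sort gives the same ordering style as the output above, where `chr10` comes before `chr2`. For a real file, pass `open(path)` instead of the `StringIO` object.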

Now select only the annotations on these chromosomes:

python alias_work.py gtf_chr_select /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_2.gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_3.gtf "['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY']"
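The chromosome list is passed as a quoted Python-style list. A sketch of how such an argument could be parsed and applied is shown below; whether alias_work.py parses it this way is an assumption, and `select_chromosomes` and the sample rows are hypothetical.

```python
import ast


def select_chromosomes(lines, chromosomes):
    """Keep GTF lines whose first column is one of `chromosomes`."""
    wanted = set(chromosomes)
    return [
        line
        for line in lines
        if not line.startswith("#") and line.split("\t", 1)[0] in wanted
    ]


# literal_eval safely turns the quoted list into a real Python list.
chr_arg = "['chr1', 'chrM']"
wanted = ast.literal_eval(chr_arg)

# Made-up rows; 'scaffold_x' stands for any name missing from the FASTA.
rows = [
    'chr1\tsrc\ttranscript\t1\t10\t.\t+\t.\ttranscript_id "a";\n',
    'scaffold_x\tsrc\ttranscript\t1\t10\t.\t+\t.\ttranscript_id "b";\n',
    'chrM\tsrc\ttranscript\t1\t10\t.\t+\t.\ttranscript_id "c";\n',
]
kept = select_chromosomes(rows, wanted)
print(len(kept))
```

`ast.literal_eval` only accepts literals, so it is a safer choice than `eval` for a command-line argument.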

Finally, we can add sequences:

python gtf_modifiers.py add_sequence_gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_3.gtf /Users/kenminsoo/Desktop/unprocessed-annotations/hg38_std.fa /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_4.gtf sequence
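Conceptually, adding a sequence means slicing the genome at each feature's coordinates and appending the result as a new attribute. The sketch below shows that idea under stated assumptions: GTF coordinates are 1-based and inclusive, minus-strand features are reverse-complemented, and the final command argument (`sequence`) is taken to be the attribute name. None of this is confirmed as the behavior of add_sequence_gtf, and all function names and sample data are hypothetical.

```python
import io

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Load a FASTA file into {name: sequence}. Fine for small inputs only."""
    genome, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def feature_sequence(genome, chrom, start, end, strand):
    """Extract a 1-based inclusive interval, reverse-complemented on '-'."""
    seq = genome[chrom][start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


def add_sequence(line, genome, attr_name="sequence"):
    """Append `attr_name "SEQ";` to the attribute column of one GTF line."""
    fields = line.rstrip("\n").split("\t")
    seq = feature_sequence(genome, fields[0], int(fields[3]), int(fields[4]), fields[6])
    fields[8] = fields[8].rstrip() + f' {attr_name} "{seq}";'
    return "\t".join(fields) + "\n"


genome = read_fasta(io.StringIO(">chr1\nAACC\nGGTT\n"))
row = 'chr1\tsrc\ttranscript\t1\t4\t.\t-\t.\ttranscript_id "a"; biotype "Y_RNA";\n'
annotated = add_sequence(row, genome)
print(annotated, end="")
```

A lookup like `genome[chrom]` fails with a `KeyError` when the GTF names a chromosome that is absent from the FASTA, which is why the chromosome selection step comes first. For a full genome, an indexed FASTA reader would be preferable to loading everything into memory.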

The GTF file is now ready for use in the sRNAfrag pipeline.
