Tool Use Cases
RNAcentral provides annotations for many species. Let's process them.
First, filter for your selected biotype. Here we select Y RNAs.
python gtf_modifiers.py select_gtf /Users/kenminsoo/Desktop/unprocessed-annotations/testing_env/final/homo_sapies_nolnc_UCSC.GRCh38.gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/yrna.gtf type Y_RNA
Then, add sequences.
python gtf_modifiers.py add_sequence_gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/yrna.gtf /Users/kenminsoo/Desktop/unprocessed-annotations/hg38_std.fa /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna.gtf sequence
But unfortunately this throws an error!
Differing number of GFF fields encountered at line: 7. Exiting...
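One way to see where records differ is to tally how many tab-separated columns and how many attribute entries each line has. The sketch below is a standalone diagnostic and not part of gtf_modifiers.py; the function name `gtf_shape_counts` is hypothetical, the two records are made up for illustration, and it assumes GTF-style attributes written as `key "value";` with no semicolons inside values.

```python
from collections import Counter


def gtf_shape_counts(lines):
    """Tally (column count, attribute count) pairs over all records."""
    shapes = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        columns = line.split("\t")
        # Assumption: the ninth column holds semicolon-separated attributes.
        attributes = columns[8] if len(columns) > 8 else ""
        n_attributes = len([a for a in attributes.split(";") if a.strip()])
        shapes[(len(columns), n_attributes)] += 1
    return shapes


# Made-up records: same number of columns, different number of attributes.
example = [
    'chr1\tRNAcentral\ttranscript\t100\t200\t.\t+\t.\tID "a"; type "Y_RNA";',
    'chr1\tRNAcentral\texon\t100\t200\t.\t+\t.\tID "a"; type "Y_RNA"; Parent "a";',
]
print(gtf_shape_counts(example))
```

If the tally contains more than one shape, the records are not uniform, which is the kind of inconsistency the steps below try to remove.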
We must standardize the fields. First, let's try selecting only "transcript" features. Since sRNAfrag considers every fragment of a transcript, it will detect what are basically ncRNA exons on its own.
python gtf_modifiers.py select_column /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/yrna.gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna.gtf 2 transcript
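Conceptually, this step keeps only the rows whose feature column reads `transcript`. A minimal sketch of that filter follows; it is not the tool's implementation, the name `keep_rows_with_value` is hypothetical, and it assumes the column index in the command is zero-based, so that `2` refers to the GTF feature column.

```python
def keep_rows_with_value(lines, column_index, value):
    """Keep records whose tab-separated column at column_index equals value."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # drop comment lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) > column_index and columns[column_index] == value:
            kept.append(line)
    return kept


# Made-up records for illustration.
records = [
    'chr1\tRNAcentral\ttranscript\t100\t200\t.\t+\t.\tID "a";\n',
    'chr1\tRNAcentral\texon\t100\t200\t.\t+\t.\tID "a";\n',
]
transcripts_only = keep_rows_with_value(records, 2, "transcript")
```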
But the error persists. We must standardize attributes.
python gtf_modifiers.py standardize_attributes /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna.gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_2.gtf '{"ID":"transcript_id", "type":"biotype"}'
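To illustrate what standardizing attributes can mean, the sketch below rewrites an attribute field so that it carries only a fixed set of keys under new names. This is a guess at the idea and not the code behind `standardize_attributes`: it assumes the JSON maps existing keys to new names, that attributes are written as `key "value";`, and that unmapped attributes are dropped. The name `rename_attributes` and the sample field are hypothetical.

```python
import json
import re

# Matches GTF-style attribute pairs such as: ID "example_1"
ATTRIBUTE_PATTERN = re.compile(r'(\w+)\s+"([^"]*)"')


def rename_attributes(attribute_field, mapping):
    """Keep only the attributes named in mapping, renamed, in mapping order."""
    found = dict(ATTRIBUTE_PATTERN.findall(attribute_field))
    parts = [
        f'{new_key} "{found[old_key]}";'
        for old_key, new_key in mapping.items()
        if old_key in found  # silently skip keys this record lacks
    ]
    return " ".join(parts)


mapping = json.loads('{"ID":"transcript_id", "type":"biotype"}')
standardized = rename_attributes(
    'ID "example_1"; type "Y_RNA"; Parent "x";', mapping
)
print(standardized)
```

After a rewrite like this, every record that had both keys ends up with the same two attributes in the same order.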
Great, now we should be done... but we are not. Not all of the chromosomes are found! While the default command for standardizing chromosomes uses human, I will show how it can be done for any species. First, find the chromosome names.
python alias_work.py fasta_chr_extract /Users/kenminsoo/Desktop/unprocessed-annotations/hg38_std.fa
Output
['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY']
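For reference, collecting sequence names from a FASTA file only requires reading its header lines. The sketch below shows one way to do that; it is not the code behind `fasta_chr_extract`, and the name `fasta_sequence_names` and the toy records are hypothetical. It takes the first whitespace-delimited token after `>` as the name and sorts the names as plain strings, which gives the same kind of ordering as the list above (`chr1`, `chr10`, ..., `chr2`).

```python
def fasta_sequence_names(lines):
    """Return the sorted, de-duplicated sequence names from FASTA headers."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after ">".
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return sorted(names)


# Toy FASTA content for illustration.
fasta_lines = [
    ">chr10 some description\n", "ACGT\n",
    ">chr2\n", "GGCC\n",
    ">chr1\n", "TTAA\n",
]
print(fasta_sequence_names(fasta_lines))
```

For a real genome, pass an open file handle instead of a list, since the function only iterates over lines.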
Now we select for these chromosomes.
python alias_work.py gtf_chr_select /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_2.gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_3.gtf "['chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr2', 'chr20', 'chr21', 'chr22', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chrM', 'chrX', 'chrY']"
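The selection itself amounts to keeping records whose first column is in the allowed list. Here is a minimal sketch under that reading; it is not the code behind `gtf_chr_select`, and `keep_chromosomes` and the sample records are hypothetical. Parsing the quoted list with `ast.literal_eval` is one safe way to turn the string into a Python list, not necessarily what the script does.

```python
import ast


def keep_chromosomes(lines, chromosome_list_text):
    """Keep records whose first column is one of the listed chromosomes."""
    # literal_eval parses the list literal without executing arbitrary code.
    allowed = set(ast.literal_eval(chromosome_list_text))
    return [
        line
        for line in lines
        if not line.startswith("#") and line.split("\t", 1)[0] in allowed
    ]


# Made-up records: the second one sits on a scaffold absent from the list.
gtf_records = [
    'chr1\tRNAcentral\ttranscript\t100\t200\t.\t+\t.\ttranscript_id "a";\n',
    'scaffold_7\tRNAcentral\ttranscript\t5\t90\t.\t-\t.\ttranscript_id "b";\n',
    'chrM\tRNAcentral\ttranscript\t10\t80\t.\t+\t.\ttranscript_id "c";\n',
]
selected = keep_chromosomes(gtf_records, "['chr1', 'chrM']")
```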
Finally we can add sequences.
python gtf_modifiers.py add_sequence_gtf /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_3.gtf /Users/kenminsoo/Desktop/unprocessed-annotations/hg38_std.fa /Volumes/Extreme_SSD/Lung_smRNAseq-2021_BW/tRNA/seq_yrna_4.gtf sequence
The GTF file is now ready for use in the sRNAfrag pipeline.