readTranscriptFeatures with GTF #192

Open
igordot opened this issue Jun 10, 2020 · 6 comments

igordot commented Jun 10, 2020

readTranscriptFeatures() reads a BED file. Gene features are commonly stored as a GTF file. Is there a way to import a GTF file in the proper format for annotateWithGeneParts()? There is a convenient gffToGRanges() function, but you still need to convert the resulting GRanges object to a GRangesList. Is there already a function for that?

katwre (Contributor) commented Jun 15, 2020

Hi @igordot,
Yes, you are right, readTranscriptFeatures() doesn't support GFF/GTF files. You can either convert your GTF file to a BED file or, probably more useful, read it with gffToGRanges() first, manipulate the resulting GRanges object to extract the features you are interested in (promoters, exons, introns, etc.), and then use annotateWithFeatures(), e.g.:

library(genomation)
data(cage)
gff.file <- system.file("extdata/chr21.refseq.hg19.gtf", package = "genomation")
gr21 <- gffToGRanges(gff.file)
# here you can manipulate the GRanges object, e.g. split it by feature type:
grl21 <- as(split(gr21, gr21$type), "GRangesList")
annotateWithFeatures(cage, grl21)
summary of target set annotation with feature annotation:
Rows in target set: 2326
----------------------------
percentage of target elements overlapping with features:
       exon  stop_codon         CDS start_codon 
      29.45        0.47       13.54        1.46 

percentage of feature elements overlapping with target:
       exon  stop_codon         CDS start_codon 
      18.55        4.41       14.31       12.76 

Hope it helps,
Kasia

igordot (Author) commented Jun 15, 2020

My concern with that approach is that it gives you different results than the BED file. I assume "chr21.refseq.hg19.bed" and "chr21.refseq.hg19.gtf" should be comparable.
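One way to see where the difference comes from is to import both bundled files and compare what each route produces. The sketch below assumes both example files ship with the installed genomation package (the GTF path is the one used earlier in this thread; the BED path follows the same naming). It only inspects the objects and does not claim any particular annotation numbers.

```r
library(genomation)

bed.file <- system.file("extdata/chr21.refseq.hg19.bed", package = "genomation")
gtf.file <- system.file("extdata/chr21.refseq.hg19.gtf", package = "genomation")

# BED12 route: readTranscriptFeatures() derives gene parts from the
# transcript structure and returns them as a GRangesList
from.bed <- readTranscriptFeatures(bed.file)

# GTF route: gffToGRanges() returns the GTF records as they are, one range per line
from.gtf <- gffToGRanges(gtf.file)

# Compare the feature categories available on each side
names(from.bed)
table(from.gtf$type)
```

Splitting the GTF by `type` yields the record types stored in the file (exon, CDS, start_codon, stop_codon in the output above), whereas the BED route yields derived gene parts such as introns and promoters, so the two annotations are not expected to match without further processing.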

katwre (Contributor) commented Jun 15, 2020

I am not sure exactly what you are asking. Are you asking why a GTF and a BED file look different? They are two different file formats, but they should be comparable; I don't know the details. If you want to annotate your regions of interest with exons, introns, and promoters (the output regions of readTranscriptFeatures()) and you are not sure how to get their coordinates from a GTF file, then check out e.g. GenomicFeatures::makeTxDbFromGFF, GenomicFeatures::exonsBy and GenomicFeatures::intronsByTranscript. Promoters are just flanking regions around the TSS (1 kb by default in genomation).
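The TxDb route suggested here can be sketched roughly as follows. This is not a genomation feature, only an illustration under several assumptions: that the bundled GTF parses cleanly with makeTxDbFromGFF(), that a list named `exons`, `introns`, `promoters`, `TSSes` with a `name` metadata column is close enough to what readTranscriptFeatures() returns, and that 1 kb on each side of the TSS matches the genomation default. `make.txdb`, `flatten.by.tx` and `gene.parts` are hypothetical names introduced for this example. Note that exonsBy() and intronsByTranscript() are exported by GenomicFeatures.

```r
library(GenomicRanges)

gtf.file <- system.file("extdata/chr21.refseq.hg19.gtf", package = "genomation")

# makeTxDbFromGFF() moved from GenomicFeatures to txdbmaker in recent
# Bioconductor releases; use whichever is installed (assumption: one of them is)
make.txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDbFromGFF
} else {
  GenomicFeatures::makeTxDbFromGFF
}
txdb <- make.txdb(gtf.file, format = "gtf")
tx <- GenomicFeatures::transcripts(txdb)

# Hypothetical helper: flatten a per-transcript GRangesList into one GRanges
# and keep only a `name` column holding the parent transcript name
flatten.by.tx <- function(grl) {
  tx.name <- tx$tx_name[match(names(grl), as.character(tx$tx_id))]
  gr <- unlist(grl, use.names = FALSE)
  mcols(gr) <- DataFrame(name = rep(tx.name, lengths(grl)))
  gr
}

exons   <- flatten.by.tx(GenomicFeatures::exonsBy(txdb, by = "tx"))
introns <- flatten.by.tx(GenomicFeatures::intronsByTranscript(txdb))

# TSS: strand-aware first base of each transcript
tss <- resize(tx, width = 1, fix = "start")
mcols(tss) <- DataFrame(name = tx$tx_name)

# Promoters: 1 kb upstream and 1 kb downstream of the TSS (assumed default)
proms <- promoters(tx, upstream = 1000, downstream = 1000)
mcols(proms) <- DataFrame(name = tx$tx_name)

gene.parts <- GRangesList(exons = exons, introns = introns,
                          promoters = proms, TSSes = tss)
```

If the list has the shape annotateWithGeneParts() expects, it could then be passed as its `feature` argument. The resulting percentages may still differ from the BED-derived ones, for example because overlapping promoters or exons are not deduplicated here.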

igordot (Author) commented Jun 15, 2020

I understand that the BED and GTF files are different. I wanted to see if there was a way to achieve the same annotation results from both.

It sounds like it is possible, but requires a few extra steps, such as GenomicFeatures::exonsBy and GenomicFeatures::intronsByTranscript.

al2na (Member) commented Jun 15, 2020 via email

igordot (Author) commented Jun 15, 2020

Thanks for clarifying. I was hoping all or some parts were already included in the package. It would be a nice feature to have.
