Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Site upstream of transcription start site annotated as intronic #203

Open
pkothiyal opened this issue Jan 26, 2022 · 2 comments
Open

Site upstream of transcription start site annotated as intronic #203

pkothiyal opened this issue Jan 26, 2022 · 2 comments

Comments

@pkothiyal
Copy link

Hi, I tried annotating a few sites with annotateWithGeneParts and I'm getting an intron annotation even though the site is upstream of transcription start site. Here's an example:

BED12 entry for the associated transcript:

chr12      3906815 3908547 ENSMUST00000172913      0       +       3906815 3908265 0       3       68,138,295,     0,790,1437,

Site for annotation (chr12:3905814):

df <- data.frame (chr  = c("chr12"),
                  start = c(3905814),
                  end = c(3905814)
                  )

gr <- makeGRangesFromDataFrame(df, keep.extra.columns=FALSE,
                                      start.field="start",
                                      end.field="end")

as.data.frame(gr)

seqnames	start	end	width	strand
chr12	3905814	3905814	1	*

Annotate:

gene.obj=readTranscriptFeatures(mm10.bed", up.flank = 1000,down.flank = 1000, remove.unusual=TRUE, unique.prom = TRUE)
annRes=annotateWithGeneParts(gr,gene.obj)
cbind(getAssociationWithTSS(annRes), as.data.frame(getMembers(annRes)))

target.row	dist.to.feature	feature.name	feature.strand prom exon  intron
       1	-1002	ENSMUST00000172913	+               0    0    1

As seen above, the site is annotated in the intron of the transcript even though it is 1002 bases upstream of the transcription start site. The promoter flank is set at 1,000 so I understand it not being mapped to promoter but should it be intergenic instead of intronic in this case?

Thanks,
Prachi

@AstrobioMike
Copy link

Heya @pkothiyal and devs,

I happened to peek at this too, and just because of the older version of R where I tested, version 1.16.0 of genomation was installed, and this didn't happen in that version (in case it helps track down whatever is going on):

library(genomation)
packageVersion("genomation") # 1.16.0
library(GenomicRanges)
packageVersion("GenomicRanges") # 1.36.1

bed <- c("chr12", 3906815, 3908547, "ENSMUST00000172913", 0, "+", 3906815, 3908265, 0, 3, "68,138,295,", "0,790,1437,")
bed_df <- t(data.frame(bed))
write.table(bed_df, "mm10.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

df <- data.frame (chr  = c("chr12"),
                  start = c(3905814),
                  end = c(3905814)
                  )

gr <- makeGRangesFromDataFrame(df, keep.extra.columns = FALSE,
                                      start.field = "start",
                                      end.field = "end")

gene.obj <- readTranscriptFeatures("mm10.bed", up.flank = 1000, down.flank = 1000, remove.unusual = TRUE, unique.prom = TRUE)

annRes <- annotateWithGeneParts(gr, gene.obj)

cbind(getAssociationWithTSS(annRes), as.data.frame(getMembers(annRes)))

#   target.row dist.to.feature       feature.name feature.strand prom exon intron
#            1           -1002 ENSMUST00000172913              +    0    0      0

sessionInfo()
# R version 3.6.3 (2020-02-29)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS  10.16
# 
# Matrix products: default
# LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
#  [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0  IRanges_2.18.3       S4Vectors_0.22.1     BiocGenerics_0.30.0  genomation_1.16.0   
# 
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.8                  lattice_0.20-45             Rsamtools_2.0.3             Biostrings_2.52.0          
#  [5] assertthat_0.2.1            utf8_1.2.2                  gridBase_0.4-7              R6_2.5.1                   
#  [9] plyr_1.8.6                  ggplot2_3.3.5               pillar_1.6.5                zlibbioc_1.30.0            
# [13] rlang_1.0.0                 rstudioapi_0.13             data.table_1.14.2           Matrix_1.2-18              
# [17] BiocParallel_1.18.1         readr_2.1.1                 stringr_1.4.0               RCurl_1.98-1.5             
# [21] bit_4.0.4                   munsell_0.5.0               DelayedArray_0.10.0         compiler_3.6.3             
# [25] rtracklayer_1.44.4          pkgconfig_2.0.3             tidyselect_1.1.1            SummarizedExperiment_1.14.1
# [29] tibble_3.1.6                GenomeInfoDbData_1.2.1      matrixStats_0.61.0          XML_3.99-0.3               
# [33] fansi_1.0.2                 withr_2.4.3                 crayon_1.4.2                dplyr_1.0.7                
# [37] tzdb_0.2.0                  GenomicAlignments_1.20.1    bitops_1.0-7                gtable_0.3.0               
# [41] lifecycle_1.0.1             DBI_1.1.2                   magrittr_2.0.2              scales_1.1.1               
# [45] KernSmooth_2.23-20          cli_3.1.1                   stringi_1.7.6               impute_1.58.0              
# [49] vroom_1.5.7                 XVector_0.24.0              reshape2_1.4.4              ellipsis_0.3.2             
# [53] generics_0.1.1              vctrs_0.3.8                 tools_3.6.3                 bit64_4.0.5                
# [57] BSgenome_1.52.0             Biobase_2.44.0              glue_1.6.1                  seqPattern_1.16.0          
# [61] purrr_0.3.4                 hms_1.1.1                   plotrix_3.8-2               colorspace_2.0-2           
# [65] BiocManager_1.30.16    

@al2na
Copy link
Member

al2na commented Jan 31, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants