Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with annotation of variants #13

Open
SamakshSingh99 opened this issue Dec 23, 2023 · 1 comment
Open

Issues with annotation of variants #13

SamakshSingh99 opened this issue Dec 23, 2023 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@SamakshSingh99
Copy link

Hi!
I used the following command to run the VarGen pipeline but I am getting the following error:

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gtex_tissues = gtex_tissue,
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "Getting the GTEx variants..."
[1] "Loading GTEx lookup table... Please be patient"
|--------------------------------------------------|
|==================================================|
[1] "Number of GTEx ids removed (no corresponding rsid): 135"
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 1895121 rows; more than 471669 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
In addition: Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE

Following gtex tissues were taken into consideration :

gtex_tissue <- select_gtex_tissues("./vargen_data/GTEx_Analysis_v8_eQTL/", 
                                   c("adipose",
                                     "artery_coronary",
                                     "pancreas", 
                                     "breast", 
                                     "heart", 
                                     "liver",
                                     "stomach",
                                     "lung",
                                     "fibroblasts",
                                     "small_intestine", "whole_blood", "brain"))

Also, when I am not including the gtex tissue in the pipeline I am able to run the pipeline with some warnings but unable to perfrom annotations:

Running Pipeline :

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "No values for 'gtex_tissues', skipping GTEx step..."
[1] "Getting the gwas variants,,,"
[1] "Writing the variants to .//vargen_variants.tsv"
Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
8: In (function (seqlevels, genome, new_style)  :
  cannot switch GRCh38's seqlevels from NCBI to UCSC style

Performing annotation:

> disease_annotation <- annotate_variants(disease_variants$rsid, verbose = T)
Error in if (cj == upper) next : missing value where TRUE/FALSE needed
In addition: Warning message:
In (1:g) * nnm : NAs produced by integer overflow
@SamakshSingh99
Copy link
Author

Given the 'disease_variant' dataframe containing variants from sources 'gwas,' 'omim,' and 'fantom5,' if I split this dataframe based on the source of variants, perform annotation separately using the 'annotate_variants' function for each source, and then merge the annotated dataframes together, will this process yield the same results as annotating all variants together in a single dataframe?

@MCorentin MCorentin self-assigned this Jan 28, 2024
@MCorentin MCorentin added the bug Something isn't working label Jan 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants