Issues with annotation of variants #13

SamakshSingh99 · 2023-12-23T22:10:02Z

Hi!
I used the following command to run the VarGen pipeline but I am getting the following error:

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gtex_tissues = gtex_tissue,
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "Getting the GTEx variants..."
[1] "Loading GTEx lookup table... Please be patient"
|--------------------------------------------------|
|==================================================|
[1] "Number of GTEx ids removed (no corresponding rsid): 135"
Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__,  : 
  Join results in 1895121 rows; more than 471669 = nrow(x)+nrow(i). Check for duplicate key values in i each of which join to the same group in x over and over again. If that's ok, try by=.EACHI to run j for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
In addition: Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE

Following gtex tissues were taken into consideration :

gtex_tissue <- select_gtex_tissues("./vargen_data/GTEx_Analysis_v8_eQTL/", 
                                   c("adipose",
                                     "artery_coronary",
                                     "pancreas", 
                                     "breast", 
                                     "heart", 
                                     "liver",
                                     "stomach",
                                     "lung",
                                     "fibroblasts",
                                     "small_intestine", "whole_blood", "brain"))

Also, when I am not including the gtex tissue in the pipeline I am able to run the pipeline with some warnings but unable to perfrom annotations:

Running Pipeline :

> disease_variants <- vargen_pipeline(vargen_dir = "./vargen_data/", 
+                                     omim_morbid_ids = c("125853",
+                                                         "222100", 
+                                                         "601665", 
+                                                         "604302",
+                                                         "212750",
+                                                         "145500",
+                                                         "603813",
+                                                         "166710",
+                                                         "266600",
+                                                         "610938",
+                                                         "601367",
+                                                         "223100",
+                                                         "600807"),
+                                     gwas_traits = disease_traits, 
+                                     verbose = T)
[1] "Connecting to the gene mart..."
[1] "Connecting to the snp mart..."
[1] "Building the gwascat object..."
[1] "Reading the enhancer tss association file for FANTOM5... './vargen_data//enhancer_tss_associations.bed'"
[1] "Starting the pipeline..."
[1] "Getting genes for OMIM: 125853"
[1] "Getting genes for OMIM: 222100"
[1] "Getting genes for OMIM: 601665"
[1] "Getting genes for OMIM: 604302"
[1] "Getting genes for OMIM: 212750"
[1] "Getting genes for OMIM: 145500"
[1] "Getting genes for OMIM: 603813"
[1] "Getting genes for OMIM: 166710"
[1] "Getting genes for OMIM: 266600"
[1] "Getting genes for OMIM: 610938"
[1] "Getting genes for OMIM: 601367"
[1] "Getting genes for OMIM: 223100"
[1] "Getting genes for OMIM: 600807"
[1] "Writing the list of genes to: .//genes_info.tsv"
[1] "No values for 'gtex_tissues', skipping GTEx step..."
[1] "Getting the gwas variants,,,"
[1] "Writing the variants to .//vargen_variants.tsv"
Warning messages:
1: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Gene mart not provided (or not a valid Mart object).We used one from connect_to_gene_ensembl() instead.
2: In vargen_pipeline(vargen_dir = "./vargen_data/", omim_morbid_ids = c("125853",  :
  Snp mart not provided (or not a valid Mart object).We used one from connect_to_snp_ensembl() instead.
3: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
4: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
5: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
6: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
7: In type.convert.default(X[[i]], ...) :
  'as.is' should be specified by the caller; using TRUE
8: In (function (seqlevels, genome, new_style)  :
  cannot switch GRCh38's seqlevels from NCBI to UCSC style

Performing annotation:

> disease_annotation <- annotate_variants(disease_variants$rsid, verbose = T)
Error in if (cj == upper) next : missing value where TRUE/FALSE needed
In addition: Warning message:
In (1:g) * nnm : NAs produced by integer overflow

The text was updated successfully, but these errors were encountered:

SamakshSingh99 · 2023-12-24T01:00:59Z

Given the 'disease_variant' dataframe containing variants from sources 'gwas,' 'omim,' and 'fantom5,' if I split this dataframe based on the source of variants, perform annotation separately using the 'annotate_variants' function for each source, and then merge the annotated dataframes together, will this process yield the same results as annotating all variants together in a single dataframe?

MCorentin self-assigned this Jan 28, 2024

MCorentin added the bug Something isn't working label Jan 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues with annotation of variants #13

Issues with annotation of variants #13

SamakshSingh99 commented Dec 23, 2023

SamakshSingh99 commented Dec 24, 2023

Issues with annotation of variants #13

Issues with annotation of variants #13

Comments

SamakshSingh99 commented Dec 23, 2023

SamakshSingh99 commented Dec 24, 2023