Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unite() function reassigns sample ID's #283

Closed
laurahspencer opened this issue May 15, 2023 · 7 comments
Closed

unite() function reassigns sample ID's #283

laurahspencer opened this issue May 15, 2023 · 7 comments

Comments

@laurahspencer
Copy link

When running the unite() function in methylKit, the sample IDs specified in the methylRaw object are not correctly being assigned. I have 13 samples, with the IDs 1, 2, 3, 4, 7, 8, 10, 11, 12, 15, 17, 18, 20 (i.e. not all consecutive numbers). When I run unite() the resulting columns of data in the methylBase object contain sample IDs #1-13. Thinking that unite() renamed them consecutively I tried to spot-check the data to see if I can confidently rename the columns using my original IDs, but the data doesn't line up (possibly because I've destranded, but I can't be sure). How can I make sure that the correct sample IDs are transferred over to the methylBase object?

Here is a directory containing the methylRaw object meth_10x, and using the code methylKit::unite(meth_10x, destrand=TRUE) I created the methylBase object meth_unite_10x, which has the incorrect sample IDs.

@alexg9010
Copy link
Collaborator

Hi @laurahspencer,

Thanks for reporting your issue.
Unfortunately, I am not able to reproduce your issue on my machine ( see attached output ).

Could you please give me more information about your session, by posting the sessionInfo()? Maybe you are using an outdated version of R or methylKit which is causing the issue.

Best,
Alex

library(methylKit)

load(file = "~/R-dev/methylkit-dev/issue283/meth_10x")

getSampleID(meth_10x)
#>  [1] "1"  "2"  "3"  "4"  "7"  "8"  "10" "11" "12" "15" "17" "18" "20"
getSampleID(unite(meth_10x))
#> uniting...
#>  [1] "1"  "2"  "3"  "4"  "7"  "8"  "10" "11" "12" "15" "17" "18" "20"
getSampleID(unite(meth_10x, destrand = TRUE))
#> destranding...
#> uniting...
#>  [1] "1"  "2"  "3"  "4"  "7"  "8"  "10" "11" "12" "15" "17" "18" "20"

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C         LC_TIME=C           
#>  [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=C       
#>  [7] LC_PAPER=C           LC_NAME=C            LC_ADDRESS=C        
#> [10] LC_TELEPHONE=C       LC_MEASUREMENT=C     LC_IDENTIFICATION=C 
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] methylKit_1.22.0     GenomicRanges_1.48.0 GenomeInfoDb_1.32.2 
#> [4] IRanges_2.30.0       S4Vectors_0.34.0     BiocGenerics_0.42.0 
#> 
#> loaded via a namespace (and not attached):
#>  [1] MatrixGenerics_1.8.1        Biobase_2.56.0             
#>  [3] splines_4.2.0               R.utils_2.12.0             
#>  [5] gtools_3.9.3                assertthat_0.2.1           
#>  [7] highr_0.9                   GenomeInfoDbData_1.2.8     
#>  [9] Rsamtools_2.12.0            yaml_2.3.5                 
#> [11] numDeriv_2016.8-1.1         pillar_1.7.0               
#> [13] lattice_0.20-45             glue_1.6.2                 
#> [15] limma_3.52.2                bbmle_1.0.25               
#> [17] digest_0.6.29               XVector_0.36.0             
#> [19] qvalue_2.28.0               colorspace_2.0-3           
#> [21] htmltools_0.5.2             Matrix_1.4-1               
#> [23] R.oo_1.25.0                 plyr_1.8.7                 
#> [25] XML_3.99-0.10               pkgconfig_2.0.3            
#> [27] emdbook_1.3.12              zlibbioc_1.42.0            
#> [29] purrr_0.3.4                 mvtnorm_1.1-3              
#> [31] scales_1.2.0                BiocParallel_1.30.3        
#> [33] tibble_3.1.7                styler_1.8.1               
#> [35] generics_0.1.3              ggplot2_3.4.0              
#> [37] ellipsis_0.3.2              SummarizedExperiment_1.26.1
#> [39] withr_2.5.0                 cli_3.4.1                  
#> [41] magrittr_2.0.3              crayon_1.5.2               
#> [43] mclust_5.4.10               evaluate_0.15              
#> [45] R.methodsS3_1.8.2           fs_1.5.2                   
#> [47] fansi_1.0.3                 R.cache_0.16.0             
#> [49] MASS_7.3-56                 tools_4.2.0                
#> [51] data.table_1.14.2           matrixStats_0.62.0         
#> [53] BiocIO_1.6.0                lifecycle_1.0.3            
#> [55] stringr_1.4.0               munsell_0.5.0              
#> [57] reprex_2.0.2                DelayedArray_0.22.0        
#> [59] Biostrings_2.64.0           compiler_4.2.0             
#> [61] fastseg_1.42.0              rlang_1.0.6                
#> [63] grid_4.2.0                  RCurl_1.98-1.7             
#> [65] rstudioapi_0.14             rjson_0.2.21               
#> [67] bitops_1.0-7                rmarkdown_2.14             
#> [69] restfulr_0.0.15             gtable_0.3.0               
#> [71] codetools_0.2-18            DBI_1.1.3                  
#> [73] reshape2_1.4.4              R6_2.5.1                   
#> [75] GenomicAlignments_1.32.1    knitr_1.39                 
#> [77] dplyr_1.0.9                 rtracklayer_1.56.1         
#> [79] bdsmatrix_1.3-6             fastmap_1.1.0              
#> [81] utf8_1.2.2                  stringi_1.7.8              
#> [83] parallel_4.2.0              Rcpp_1.0.9                 
#> [85] vctrs_0.5.1                 tidyselect_1.2.0           
#> [87] xfun_0.36                   coda_0.19-4

Created on 2023-05-16 with reprex v2.0.2

@laurahspencer
Copy link
Author

Interesting! Here is my session info:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clipr_0.8.0                BiocManager_1.30.19        ggrepel_0.9.2              googlesheets4_1.0.1        janitor_2.1.0             
 [6] corrplot_0.92              PerformanceAnalytics_2.0.4 xts_0.12.2                 zoo_1.8-11                 factoextra_1.0.7          
[11] vegan_2.6-4                lattice_0.20-45            permute_0.9-7              viridis_0.6.2              viridisLite_0.4.1         
[16] Pstat_1.2                  matrixStats_0.62.0         ggforce_0.4.1              methylKit_1.24.0           GenomicRanges_1.49.0      
[21] GenomeInfoDb_1.34.2        IRanges_2.32.0             S4Vectors_0.36.0           BiocGenerics_0.44.0        here_1.0.1                
[26] reshape2_1.4.4             forcats_0.5.2              stringr_1.4.1              dplyr_1.0.10               purrr_0.3.5               
[31] readr_2.1.3                tidyr_1.2.1                tibble_3.1.8               ggplot2_3.4.0              tidyverse_1.3.2           

loaded via a namespace (and not attached):
  [1] Rsamtools_2.14.0            rprojroot_2.0.3             crayon_1.5.2                MASS_7.3-58.1               nlme_3.1-160               
  [6] backports_1.4.1             reprex_2.0.2                rlang_1.0.6                 XVector_0.38.0              readxl_1.4.1               
 [11] limma_3.54.0                BiocParallel_1.32.1         rjson_0.2.21                bit64_4.0.5                 glue_1.6.2                 
 [16] parallel_4.2.2              haven_2.5.1                 tidyselect_1.2.0            SummarizedExperiment_1.28.0 XML_3.99-0.12              
 [21] GenomicAlignments_1.34.0    xtable_1.8-4                magrittr_2.0.3              evaluate_0.18               cli_3.4.1                  
 [26] zlibbioc_1.44.0             rstudioapi_0.14             xfun_0.34                   askpass_1.1                 cluster_2.1.4              
 [31] Biostrings_2.66.0           withr_2.5.0                 bitops_1.0-7                plyr_1.8.7                  cellranger_1.1.0           
 [36] coda_0.19-4                 pillar_1.8.1                fs_1.5.2                    scatterplot3d_0.3-42        vctrs_0.5.0                
 [41] ellipsis_0.3.2              generics_0.1.3              tools_4.2.2                 munsell_0.5.0               tweenr_2.0.2               
 [46] emmeans_1.8.2               DelayedArray_0.23.2         fastmap_1.1.0               compiler_4.2.2              abind_1.4-5                
 [51] rtracklayer_1.58.0          GenomeInfoDbData_1.2.9      gridExtra_2.3               utf8_1.2.2                  jsonlite_1.8.3             
 [56] scales_1.2.1                carData_3.0-5               estimability_1.4.1          car_3.1-1                   R.utils_2.12.2             
 [61] rmarkdown_2.17              Biobase_2.58.0              numDeriv_2016.8-1.1         yaml_2.3.6                  htmltools_0.5.3            
 [66] BiocIO_1.8.0                quadprog_1.5-8              digest_0.6.30               assertthat_0.2.1            leaps_3.1                  
 [71] rappdirs_0.3.3              emdbook_1.3.12              data.table_1.14.4           R.oo_1.25.0                 splines_4.2.2              
 [76] labeling_0.4.2              googledrive_2.0.0           RCurl_1.98-1.9              broom_1.0.1                 hms_1.1.2                  
 [81] modelr_0.1.9                colorspace_2.0-3            Rcpp_1.0.9                  mclust_6.0.0                mvtnorm_1.1-3              
 [86] multcompView_0.1-8          fansi_1.0.3                 tzdb_0.3.0                  R6_2.5.1                    grid_4.2.2                 
 [91] lifecycle_1.0.3             curl_4.3.3                  snakecase_0.11.0            Matrix_1.5-1                qvalue_2.30.0              
 [96] fastseg_1.44.0              polyclip_1.10-4             timechange_0.1.1            rvest_1.0.3                 mgcv_1.8-41                
[101] openssl_2.0.4               flashClust_1.01-2           bdsmatrix_1.3-6             codetools_0.2-18            lubridate_1.9.0            
[106] gtools_3.9.3                dbplyr_2.2.1                R.methodsS3_1.8.2           gtable_0.3.1                DBI_1.1.3                  
[111] httr_1.4.4                  KernSmooth_2.23-20          stringi_1.7.8               vroom_1.6.0                 farver_2.1.1               
[116] xml2_1.3.3                  bbmle_1.0.25                restfulr_0.0.15             bit_4.0.4                   MatrixGenerics_1.10.0      
[121] pkgconfig_2.0.3             gargle_1.2.1                knitr_1.40                 

@laurahspencer
Copy link
Author

laurahspencer commented May 16, 2023

When I run getSampleID() on the united object meth_unite_10x it provides the correct sample IDs, but the actual column names are incorrect:

colnames(meth_unite_10x)

[1] "chr"        "start"      "end"        "strand"     "coverage1"  "numCs1"     "numTs1"     "coverage2"  "numCs2"     "numTs2"     "coverage3" 
[12] "numCs3"     "numTs3"     "coverage4"  "numCs4"     "numTs4"     "coverage5"  "numCs5"     "numTs5"     "coverage6"  "numCs6"     "numTs6"    
[23] "coverage7"  "numCs7"     "numTs7"     "coverage8"  "numCs8"     "numTs8"     "coverage9"  "numCs9"     "numTs9"     "coverage10" "numCs10"   
[34] "numTs10"    "coverage11" "numCs11"    "numTs11"    "coverage12" "numCs12"    "numTs12"    "coverage13" "numCs13"    "numTs13"  

@laurahspencer
Copy link
Author

And similarly,

colnames(unite(meth_10x, destrand = TRUE))

destranding...
uniting...
 [1] "chr"        "start"      "end"        "strand"     "coverage1"  "numCs1"     "numTs1"     "coverage2"  "numCs2"     "numTs2"     "coverage3" 
[12] "numCs3"     "numTs3"     "coverage4"  "numCs4"     "numTs4"     "coverage5"  "numCs5"     "numTs5"     "coverage6"  "numCs6"     "numTs6"    
[23] "coverage7"  "numCs7"     "numTs7"     "coverage8"  "numCs8"     "numTs8"     "coverage9"  "numCs9"     "numTs9"     "coverage10" "numCs10"   
[34] "numTs10"    "coverage11" "numCs11"    "numTs11"    "coverage12" "numCs12"    "numTs12"    "coverage13" "numCs13"    "numTs13"   

@alexg9010
Copy link
Collaborator

alexg9010 commented May 16, 2023 via email

@laurahspencer
Copy link
Author

That makes sense- so when I run getData(meth_unite_10x) the resulting dataframe's columns containing the methylation data (coverage#, numCs#, numTs#) can safely be renamed using the vector of sampleIDs that I extract using getSampleID(meth_unite_10x)? Or is there a methylKit function that does that for me that I've overlooked?

@alexg9010
Copy link
Collaborator

alexg9010 commented May 18, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants