-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to read lipidblast-2022 data #114
Comments
Thanks for adding this issue. I'm currently working on a solution to import/process large json files in a chunk-wise manner. |
- Add parameter `n` to `compound_tbl_lipidblast()` to enable and support import and processing of the data in sets of lines to reduce memory demand (issue #114).
Hi @pratarora , I've added a new parameter Thus, the code below (with fl <-("MoNA-export-LipidBlast_2022.json")
cmps <- compound_tbl_lipidblast(fl, n = 100000, verbose = TRUE) For now, you would need to install this version of the package from github using |
Hi Johannes Thanks a lot for the quick fix. I tried it, but it works partially. The exactmass column is converted to character while reading.
|
oooooh, thanks for the report! I will check that. Should be numeric of course! |
I am working on it, I will post the code for it soon. |
- fix `compound_tbl_lipidblast()` to return field *exactmass* as numeric (issue #114).
should be fixed now. thanks for reporting! |
Hi Johannes, I modified the code. I tried to make it a bit parallel as well. It can now read all the fields for both the old lipidblast
|
Hi Johannes, |
Hi
Thanks for such a great package!
I am trying to make a compounddb object from lipidblast-2022 data (https://mona.fiehnlab.ucdavis.edu/rest/downloads/retrieve/d0899a6d-15b7-46ad-a836-f07313465fb8). But it keeps on failing, with an empty data frame as an output. I first thought it was memory issue, so I tried with bigger memory(256GB, it doesn't work).
I also tried to split the json to smaller chunks, in this case it can read the file but it gives an empty data frame.
I have checked the structure of the json file it seems to be fine as the original.
any idea what i might be missing?
Any place i could directly find the sqlite file?
my session info is :
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] jsonlite_1.8.8 mzR_2.36.0 Rcpp_1.0.12
[4] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[7] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[10] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
[13] tidyverse_2.0.0 CompoundDb_1.6.0 AnnotationFilter_1.26.0
[16] MetaboCoreUtils_1.10.0 MetaboAnnotation_1.6.1 Spectra_1.12.0
[19] ProtGenerics_1.34.0 BiocParallel_1.36.0 S4Vectors_0.40.2
[22] BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] MultiAssayExperiment_1.28.0 magrittr_2.0.3
[3] fs_1.6.4 zlibbioc_1.48.2
[5] vctrs_0.6.5 memoise_2.0.1
[7] RCurl_1.98-1.14 base64enc_0.1-3
[9] htmltools_0.5.8.1 S4Arrays_1.2.1
[11] AnnotationHub_3.10.0 curl_5.2.1
[13] SparseArray_1.2.4 htmlwidgets_1.6.4
[15] cachem_1.0.8 uuid_1.2-0
[17] igraph_2.0.3 mime_0.12
[19] lifecycle_1.0.4 pkgconfig_2.0.3
[21] Matrix_1.6-5 R6_2.5.1
[23] fastmap_1.1.1 GenomeInfoDbData_1.2.11
[25] MatrixGenerics_1.14.0 shiny_1.8.0
[27] clue_0.3-65 digest_0.6.35
[29] rsvg_2.6.0 colorspace_2.1-0
[31] AnnotationDbi_1.64.1 GenomicRanges_1.54.1
[33] RSQLite_2.3.6 filelock_1.0.3
[35] fansi_1.0.6 timechange_0.3.0
[37] httr_1.4.7 abind_1.4-5
[39] compiler_4.3.2 bit64_4.0.5
[41] withr_3.0.0 DBI_1.2.2
[43] MASS_7.3-60 ChemmineR_3.54.0
[45] rappdirs_0.3.3 DelayedArray_0.28.0
[47] rjson_0.2.21 tools_4.3.2
[49] interactiveDisplayBase_1.40.0 httpuv_1.6.14
[51] glue_1.7.0 QFeatures_1.12.0
[53] promises_1.2.1 grid_4.3.2
[55] pbdZMQ_0.3-11 cluster_2.1.4
[57] generics_0.1.3 gtable_0.3.5
[59] tzdb_0.4.0 hms_1.1.3
[61] xml2_1.3.6 utf8_1.2.4
[63] XVector_0.42.0 BiocVersion_3.18.1
[65] pillar_1.9.0 IRdisplay_1.1
[67] later_1.3.2 BiocFileCache_2.10.1
[69] lattice_0.22-5 bit_4.0.5
[71] tidyselect_1.2.1 Biostrings_2.70.3
[73] gridExtra_2.3 IRanges_2.36.0
[75] SummarizedExperiment_1.32.0 Biobase_2.62.0
[77] matrixStats_1.3.0 DT_0.32
[79] stringi_1.8.4 lazyeval_0.2.2
[81] yaml_2.3.8 evaluate_0.23
[83] codetools_0.2-19 MsCoreUtils_1.14.1
[85] BiocManager_1.30.23 cli_3.6.2
[87] IRkernel_1.3.2 xtable_1.8-4
[89] repr_1.1.7 munsell_0.5.1
[91] GenomeInfoDb_1.38.8 dbplyr_2.5.0
[93] png_0.1-8 parallel_4.3.2
[95] ellipsis_0.3.2 blob_1.2.4
[97] bitops_1.0-7 scales_1.3.0
[99] ncdf4_1.22 crayon_1.5.2
[101] rlang_1.1.3 KEGGREST_1.42.0
I split the json file in python so that i can avoid loading on memory. The code i used to split the json:
The text was updated successfully, but these errors were encountered: