
exceed 2^31-1 bytes in methylDB object during conversion to non-DB variant #141

Closed

alexg9010 opened this issue Jan 14, 2019 · 8 comments

Comments

@alexg9010

People are running into an issue when trying to convert a big methylDB object into the in-memory variant.
This code triggered the explicit error:

> objDB=as(methDB,"methylBase")

Error in paste(tabixRes[[1]], collapse = "\n") : 
  result would exceed 2^31-1 bytes

I was able to pinpoint the problematic line (https://github.com/al2na/methylKit/blob/master/R/tabix.functions.R#L317), which is called whenever we use a select or subsetting (`[`) call:

select/`[` --> headTabix --> getTabixByChunk --> tabix2dt --> fread(paste(tabixRes[[1]],collapse="\n"),"\n" )

where tabixRes is a list with one element per region, and each element is a character vector of the records in that region.
The error occurs because the size of the pasted string exceeds R's limit of 2^31-1 bytes per character string (see https://stackoverflow.com/questions/53120436/error-in-pastev-collapse-n-result-would-exceed-231-1-bytes).
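To illustrate the failing pattern (a minimal, self-contained sketch, not methylKit's actual code): all records are collapsed into a single string before fread parses it, and that one string is what runs into R's per-string byte cap.

```r
library(data.table)

# Toy stand-in for tabixRes[[1]]: a character vector with one
# tab-separated record per entry. In the failing case this vector
# holds millions of rows.
records <- sprintf("chr1\t%d\t%d\t10\t5\t5", 1:5, 2:6)

# The pattern that breaks at scale: paste() builds ONE string holding
# every record, and an R character string is capped at 2^31 - 1 bytes.
dt <- fread(paste(records, collapse = "\n"), sep = "\t", header = FALSE)
```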

@al2na

al2na commented Jan 15, 2019 via email

@alexg9010

My idea is to replace headTabix/getTabixByChunk with applyTabixByChunk in all functions that fetch the whole methylDB object into memory, so that the fetch is done in chunks (of 1e6 lines). This way we should no longer hit the string length limit.
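A minimal sketch of the chunk-wise idea (illustrative names, not methylKit's actual applyTabixByChunk API): parse the records in fixed-size blocks and bind the partial data.tables, so no single string ever approaches the 2^31-1 byte cap.

```r
library(data.table)

# Toy input: tab-separated records, normally streamed from the tabix file.
records <- sprintf("chr1\t%d\t%d\t10\t5\t5", 1:25, 2:26)

chunk.size <- 10  # methylKit would use something on the order of 1e6 lines
starts <- seq(1, length(records), by = chunk.size)

# Parse each block separately, then bind; only chunk.size lines are
# ever handed to fread at once.
chunks <- lapply(starts, function(i) {
  block <- records[i:min(i + chunk.size - 1, length(records))]
  fread(text = block, sep = "\t", header = FALSE)
})
dt <- rbindlist(chunks)
```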

@al2na

al2na commented Jan 15, 2019 via email

@al2na

al2na commented Jan 15, 2019 via email

@al2na

al2na commented Jan 16, 2019

@alexg9010! @katwre uses data.table::fread to read the whole tabix file into memory; maybe it is something to replace scanBam with when the user wants to read the whole file.

@katwre

katwre commented Jan 16, 2019

Yes, I sometimes use data.table::fread to read even really big tabix files (1.7 GB) and it's quite fast, e.g.:

fread('zcat ~/methylBase_meth.deT.nodups.txt.bgz')
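For reference (and as far as I know, available since data.table 1.11.6), recent data.table releases prefer making the shell invocation explicit via the `cmd` argument; same file path as above:

```r
library(data.table)

# Equivalent to fread('zcat ...'), but explicit that a command is run.
dt <- fread(cmd = "zcat ~/methylBase_meth.deT.nodups.txt.bgz")
```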

@alexg9010

Yes, I would like to stay with fread too; we are already using it extensively.
tabix2dt and its kind were supposed to read only chunks, not whole files, so I think we need to limit their use to those cases.

alexg9010 added a commit that referenced this issue Jan 25, 2019
alexg9010 added a commit that referenced this issue Feb 5, 2019
@alexg9010

should be fixed in aa6a0d1
