exceed 2^31-1 bytes in methylDB object during conversion to non-DB variant #141
Comments
any idea how to fix it?
On Mon, Jan 14, 2019 at 12:29 PM Alexander Gosdschan < ***@***.***> wrote:
People are running into an issue when trying to convert a big methylDB
object into the in-memory variant.
This code caused the following error:
> objDB=as(methDB,"methylBase")
Error in paste(tabixRes[[1]], collapse = "\n") :
result would exceed 2^31-1 bytes
I was able to pinpoint the problematic line (
https://github.com/al2na/methylKit/blob/master/R/tabix.functions.R#L317),
which is reached whenever we use a select or subsetting ([) call:
select/`[` --> headTabix --> getTabixByChunk --> tabix2dt --> fread(paste(tabixRes[[1]],collapse="\n"),"\n" )
Here tabixRes is a list with one element per region. Each element of
the list is a character vector representing the records in that region.
The error occurs because the size of the pasted string exceeds R's length
limit for a single string, 2^31-1 bytes (
https://stackoverflow.com/questions/53120436/error-in-pastev-collapse-n-result-would-exceed-231-1-bytes
).
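To make the limit concrete, here is a minimal sketch (not methylKit code) that estimates how large the collapsed string would be before paste() is ever called. The helper names collapsed_bytes and fits_in_r_string and the sample records are hypothetical, made up for illustration only.

```r
# Hypothetical helper: size in bytes of paste(x, collapse = "\n"),
# computed without actually building the string.
collapsed_bytes <- function(x) {
  # bytes of every record plus one "\n" between consecutive records;
  # as.numeric() avoids integer overflow when summing
  sum(as.numeric(nchar(x, type = "bytes"))) + max(length(x) - 1, 0)
}

# Hypothetical helper: TRUE if the collapsed string would stay within
# the 2^31-1 byte limit (.Machine$integer.max equals 2^31-1).
fits_in_r_string <- function(x) {
  collapsed_bytes(x) <= .Machine$integer.max
}

# Made-up records standing in for one element of tabixRes.
records <- c("chr1\t100\t100\t+\t10\t7\t3",
             "chr1\t200\t200\t-\t12\t4\t8")

collapsed_bytes(records)
fits_in_r_string(records)
```

A check like this could be used to decide whether a region is safe to paste in one go or has to be processed in pieces.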
|
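My idea is to replace headTabix/getTabixByChunk with applyTabixByChunk in all functions that fetch the whole methylDB into memory, so that this is done in chunks (of 1e6 lines). This way we should no longer reach the string length limit. |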
maybe we can look at what these guys are doing:
https://www.rdocumentation.org/packages/seqminer/versions/6.7/topics/tabix.read.table
On Tue, Jan 15, 2019 at 10:18 AM Alexander Gosdschan < ***@***.***> wrote:
My idea is to replace headTabix/getTabixByChunk with applyTabixByChunk
in all functions that fetch the whole methylDB into memory, so that this
is done in chunks (of 1e6 lines). This way we should no longer reach the
string length limit.
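The chunking idea can be sketched roughly as follows. This is not the applyTabixByChunk implementation; parse_in_chunks and the sample records are hypothetical, and the sketch assumes data.table is installed and that fread's text= argument is available (data.table 1.11.0 or later).

```r
library(data.table)

# Hypothetical helper: parse a character vector of tab-separated records in
# fixed-size chunks, so no single string ever has to hold all records.
parse_in_chunks <- function(x, chunk.size = 1e6) {
  # group record indices into consecutive chunks of at most chunk.size
  idx <- split(seq_along(x), ceiling(seq_along(x) / chunk.size))
  parts <- lapply(idx, function(i) {
    chunk <- x[i]
    # a single record without a newline would be taken for a file name,
    # so terminate it explicitly
    if (length(chunk) == 1L) chunk <- paste0(chunk, "\n")
    fread(text = chunk, sep = "\t", header = FALSE)
  })
  # stack the per-chunk tables into one data.table
  rbindlist(parts)
}

# Made-up records standing in for one element of tabixRes.
records <- sprintf("chr1\t%d\t%d\t+\t10\t7\t3", 1:10, 1:10)

dt <- parse_in_chunks(records, chunk.size = 4)
```

The chunk size of 4 is only there to exercise several chunks on a tiny input; the thread suggests 1e6 lines per chunk.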
|
You can also look at the example in Rsamtools. I don't know how fast it
would be; they use Map instead of reading everything at once.
library(Rsamtools)
# tbx: path to a bgzip-compressed, tabix-indexed file (or a TabixFile)
res <- scanTabix(tbx)
dff <- Map(function(elt) {
  read.csv(textConnection(elt), sep="\t", header=FALSE)
}, res)
The easiest fix would be to change the tabix2dt, tabix2df and tabix2gr functions to use Map in this way, so that we do not run into this problem. I don't know whether it would be faster than depending on seqminer, which is C/C++ based, for reading the tabix file into a data frame, but it is the easiest option.
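For readers without a tabix file at hand, the per-region Map pattern can be tried on a mock list. This is only a sketch: res below is a hand-written stand-in for what scanTabix returns (a list with one character vector of records per region), and the region names and records are invented.

```r
# Mock of a scanTabix() result: one character vector of records per region.
# Region names and records are made up for illustration.
res <- list(
  "chr1:1-1000" = c("chr1\t100\t100\t+\t10\t7\t3",
                    "chr1\t200\t200\t-\t12\t4\t8"),
  "chr2:1-1000" = c("chr2\t50\t50\t+\t20\t15\t5")
)

# Parse each region on its own, so no string spanning all regions is built.
per_region <- Map(function(elt) {
  read.table(text = elt, sep = "\t", header = FALSE,
             stringsAsFactors = FALSE)
}, res)

# Combine the per-region data frames into a single data frame.
df <- do.call(rbind, per_region)
rownames(df) <- NULL
```

Parsing per region keeps each intermediate object small, at the cost of one parser call per region.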
|
@alexg9010! @katwre uses data.table::fread to read the whole tabix file into memory; maybe it is something that could replace scanBam when the user wants to read the whole file. |
yes, I sometimes use data.table::fread to read even really big tabix files (1.7G) and it's quite fast, e.g.:
|
yes, I would like to stay with fread too, and we are already using it extensively. |
should be fixed in aa6a0d1 |