'init' not passed on to bpiterate in .reduceByYield_iterate #5

Open
teunbrand opened this issue Feb 9, 2021 · 0 comments

Hello everyone,

I was trying to use reduceByYield(..., init = DF, iterate = TRUE, parallel = TRUE), but the init argument does not seem to be passed on to the downstream reduce function. I can show the problem by adapting an example from the documentation. The setup below is identical to that example:

suppressPackageStartupMessages({
    library(Rsamtools)
    library(GenomicFiles)
})

fl <- system.file(package="Rsamtools", "extdata", "ex1.bam")
bf <- BamFile(fl, yieldSize=500)

YIELD <- function(X, ...) {
    flag = scanBamFlag(isUnmappedQuery=FALSE)
    param = ScanBamParam(flag=flag, what="seq")
    scanBam(X, param=param, ...)[[1]][['seq']]
}
MAP <- function(value, ...) {
    requireNamespace("Biostrings", quietly=TRUE)
    Biostrings::alphabetFrequency(value, collapse=TRUE)
}
REDUCE <- `+`

Now suppose we want to offset every count by 100 and try to do this through the init parameter.

init <- alphabetFrequency(DNAStringSet())
init <- setNames(rep(100, ncol(init)), colnames(init))
print(init)
#>   A   C   G   T   M   R   W   S   Y   K   V   H   D   B   N   -   +   . 
#> 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100

When we do this, the output is identical to the output we'd get if we had not set the init argument.

outcome <- reduceByYield(bf, YIELD, MAP, REDUCE, parallel=TRUE, init = init)

print(outcome)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 39904 23195 20477 31681     0     0     0     0     0     0     0     0     0 
#>     B     N     -     +     . 
#>     0    29     0     0     0

The following is the outcome I had expected, and is also the outcome when setting parallel = FALSE.

print(outcome + 100)
#>     A     C     G     T     M     R     W     S     Y     K     V     H     D 
#> 40004 23295 20577 31781   100   100   100   100   100   100   100   100   100 
#>     B     N     -     +     . 
#>   100   129   100   100   100
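
For reference, the expected behaviour matches an ordinary fold with a starting value. Below is a minimal base R sketch using made-up two-letter counts as stand-ins for the per-chunk MAP results. The names chunks, without_init and with_init are illustrative only and do not come from GenomicFiles.

```r
# Stand-ins for the per-chunk results that MAP would return (made-up numbers).
chunks <- list(c(A = 1, C = 2), c(A = 3, C = 4))

# Seed value, analogous to the `init` vector of 100s above.
init <- c(A = 100, C = 100)

# Fold without a seed: just the sum over chunks.
without_init <- Reduce(`+`, chunks)

# Fold with a seed: the seed is combined with the first chunk, then the rest.
with_init <- Reduce(`+`, chunks, init)

print(without_init)
print(with_init)

# With an associative REDUCE such as `+`, seeding shifts every entry by init.
stopifnot(identical(with_init, without_init + 100))
```

This is the relationship I expected between the results with and without init, regardless of the parallel setting.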

I think the line below doesn't pass the init parameter on to bpiterate, but I don't know whether this is intended.

result <- bpiterate(ITER, FUN=MAP, REDUCE=REDUCE, ...)

I assumed this is a bug because changing the parallel parameter shouldn't affect the outcome, but it does, so I thought I'd report it here.
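
Until this is resolved, one possible user-side workaround is to fold init into the parallel result afterwards. The sketch below assumes REDUCE is associative (as `+` is), so that applying init at the end gives the same answer as seeding the fold with it. apply_init is a hypothetical helper, not part of GenomicFiles or BiocParallel, and the outcome vector is a hand-copied subset of the output printed earlier.

```r
# Hypothetical helper (not part of GenomicFiles): combine `init` with a result
# that was computed without it. Only valid when REDUCE is associative.
apply_init <- function(result, REDUCE, init) {
    if (missing(init)) return(result)
    REDUCE(init, result)
}

# Stand-in for the parallel output shown above (first four letters only).
outcome <- c(A = 39904, C = 23195, G = 20477, T = 31681)
init <- setNames(rep(100, length(outcome)), names(outcome))

fixed <- apply_init(outcome, `+`, init)
print(fixed)
```

For a REDUCE that is not associative this would not reproduce the parallel = FALSE result, so it is a stopgap rather than a fix.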

Thanks for reading!
