-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add cores arg to methSeg #120
base: master
Are you sure you want to change the base?
Conversation
- changed parameter name from `estimate.params.density` to `initialize.on.subset` - allow for values higher than 1, which directly yields number of samples - priorize Mclust argument `initialization` - check for to few samples, 9 seems to be the magic number - add tests
Remove duplicate check for sample size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Job! @katwre
expect_equal(a,b) | ||
}) | ||
|
||
test_that("check if methSeg with cores > 1 is the same as cores=1 (non-tabix file)" ,{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be the same test as before.
tests/testthat/test-8-methSeg.r
Outdated
expect_equal(a,b) | ||
}) | ||
|
||
methylRawDB.obj <- methRead( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really a tabix based object?
I think you have to set save.db=TRUE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I am correct it's dbtype="tabix"
R/methSeg.R
Outdated
gr0 = gr0[,"meth"] | ||
}else if("meth.diff" %in% names(mcols(gr0))){ | ||
gr0 = gr0[,"meth.diff"] | ||
}else if (class(obj) != "GRanges"){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary since you are already checking above for the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right!
R/methSeg.R
Outdated
# match argument names to fastseg arguments | ||
args.fastseg=dots[names(dots) %in% names(formals(fastseg)[-1] ) ] | ||
initialize.on.subset=1, | ||
cores=1, ...){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cores should be mc.cores to keep argument names consistent with other methylKit functions
seg.res <- do.call("fastseg", args.fastseg) | ||
|
||
# stop if segmentation produced only one range | ||
if(length(seg.res)==1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this if clause be called after fastseg and before calling mclust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think I know what you mean, this if clause is here in the original code and you return seg.res if there is only 1 segment
Line 127 in 0d007f2
return(seg.res) |
R/methSeg.R
Outdated
# methylKit naming convention | ||
df2getcolnames = as.data.frame(gr0[1]) | ||
df2getcolnames$width = NULL | ||
methylKit:::.setMethylDBNames(df2getcolnames) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this line ?
methylKit:::.setMethylDBNames(df2getcolnames)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used to predict the column names of the given data.frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we do not need the methylKit:::
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry! my fault
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I got confused, I thought we are resetting names on the tabix files but this doesn't do that, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, actually the data.frame that we get from the tabix file does not have column names and with that function we retrieve them.
R/methSeg.R
Outdated
## Tabix files | ||
} else if(class(obj)=="methylDiffDB" | class(obj)=="methylRawDB"){ | ||
|
||
.run.fastseg.tabix = function(gr0, ...){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest a function which actually takes the class of the object as argument:
.run.fastseg.tabix = function(gr0, class ,...) {
### and then you can directly set the colnames
.setMethylDBNames(df2getcolnames,class)
}
I improved the code according to your suggestions besides @al2na suggestion about the if clause #120 (diff) |
could we also comment the code wherever possible, please think about people who will maintain this in the future or your future selves. Certain things that are trivial are not going to be trivial after 3 months of not looking at the code. |
@al2na I added more comments, hope it's better now |
there is something wrong when |
I checked if with methylRawDB and multiple cores is faster than using methylRaw object on example of data with ~350K Cs (two chromosomes) and methylRaw is faster. I don't know why. Maybe it depends on the size of the input, I will check that
|
please check datasets that have multiple chromosomes lets say at least 5
chromosomes, compare also memory consumption.
…On Wed, Jul 4, 2018 at 10:55 AM katwre ***@***.***> wrote:
I checked if with methylRawDB and multiple cores is faster than using
methylRaw object on example of data with ~350K Cs and methylRaw is faster.
I don't know why. Maybe it depends on the size of the input, I will check
that
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#120 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAm9ERdF218-iOSwujVjdGoOCvaGaiKcks5uDIL8gaJpZM4U7trW>
.
|
I checked it using 5 chromosomes and it's not better.
thanks @alexg9010 for the suggestion to use profvis, but it didnt work for me, I got an error that I didn't what to do with. I used profmem instead and it showed that memory usage when there are parallel cores is smaller than without using multiple cores.
|
@al2na @alexg9010 I didn't manage to show that this method is faster. Should we close this pull request? |
Hi guys,
I added parallel methSeg() with methylDB objects and non-tabix objects.
I separated code for running fastseg and mclust into two auxiliary function .run.fastseg and .run.mclust. Parallelization is in the step of fastseg and each fastseg run is concatenated. For that, I added to
return.type
inapplyTbxByChr
"GRanges" to concatenate GRanges.I wrote some tests, but I think I maybe should add more of them
Let me know what do you think about it
Kasia