Fix bug in robustSummary()
#349
When the sample names of the input are not alphabetically ordered, the order of the `robustSummary()` output is incorrect. `model.matrix()` orders the elements of a categorical variable alphabetically, so the coefficients of `rlm` are also ordered alphabetically. This order should be mapped back in the output to the order of the input sample names. The proposed code change fixes this.
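To see the ordering issue in isolation, here is a minimal base-R sketch. The sample names are made up for illustration and are not taken from MSnbase; the snippet only shows that `model.matrix()` orders the dummy columns by the sorted values of a character variable, not by order of appearance.

```r
## Hypothetical sample names, deliberately not in alphabetical order
samples <- c("zzz", "b", "a")

## Two observations per sample, in the original sample order
sample <- rep(samples, each = 2)

## A character variable is converted to a factor with sorted levels,
## so the design matrix columns come out alphabetically
X <- model.matrix(~ 0 + sample)
colnames(X)
## [1] "samplea"   "sampleb"   "samplezzz"
```

Any coefficients fitted against `X` inherit this column order, which is why taking the first `ncol(e)` coefficients positionally can assign values to the wrong samples.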
Just a note: you could avoid this alphabetical re-ordering from the beginning by defining the factor with explicit levels:
## Instead of
sample <- rep(colnames(e), each = nrow(e))[p]
## Use
sample <- factor(rep(colnames(e), each = nrow(e))[p], levels = colnames(e))
Then there should be no need to sort them afterwards.
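As a small illustration of this suggestion (a sketch with made-up sample names, not the MSnbase code itself), passing explicit `levels` to `factor()` makes the design matrix columns follow the input order:

```r
## Hypothetical sample names, deliberately not in alphabetical order
samples <- c("zzz", "b", "a")
sample_chr <- rep(samples, each = 2)

## Explicit levels keep the original sample order
sample_fct <- factor(sample_chr, levels = samples)

colnames(model.matrix(~ 0 + sample_chr))
## [1] "sample_chra"   "sample_chrb"   "sample_chrzzz"
colnames(model.matrix(~ 0 + sample_fct))
## [1] "sample_fctzzz" "sample_fctb"   "sample_fcta"
```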
Yes, you are of course right. That would be a more elegant way to handle the sample ordering.
## Instead of
sample <- factor(rep(colnames(e), each = nrow(e))[p], levels = colnames(e))
## Use
sample <- factor(rep(colnames(e), each = nrow(e))[p], levels = colnames(e)[p])
As an aside, note that my suggested patch also handles missing values in the output (samples with no intensity = NA) while reducing the involved lines of code from 4 to 2 :)
coef = fit$coefficients[sampleid]
## Sort the sample coefficients in the same way as the samplenames of expression matrix
## Puts NA for the samples without any expression value
coef[paste0('sample',colnames(e))]
}
Why don't you replace line 318,
fit <- MASS::rlm(X, expression, ...)
with
MASS::rlm(X, expression)$coefficients[paste0("sample", colnames(e))]
(maybe wrapped in `unname()`) and remove all the following lines?
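The name-based lookup discussed here relies on standard R behaviour: indexing a named vector by a name that is absent yields `NA`. A minimal sketch, with hypothetical coefficient values and sample names standing in for the `rlm` fit:

```r
## Hypothetical coefficients, as a fit would return them:
## alphabetical order, and no entry for sample "b" (no observed values)
coef <- c(samplea = 1.5, samplezzz = 3.0)

## Sample names in the order of the (hypothetical) expression matrix
sample_names <- c("zzz", "b", "a")

## Look up by name: restores the input order and yields NA for "b"
out <- coef[paste0("sample", sample_names)]

## Drop the names, since the missing entry gets an NA name
unname(out)
## [1] 3.0  NA 1.5
```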
@sgibb Yes, this should indeed give the wanted result in the fewest lines of code. @lgatto Not sure how to proceed from here. Can this still be fixed somehow in the stable Bioconductor release? Greets
Yes, that is possible and recommended. I will fix the bug in the devel and release. I hope to be able to review the PR tonight.
Sorry for the delay - looking into this now. @adder - in general, in such cases I would suggest also providing a unit test that identifies the bug and verifies that it is fixed.
Here's an example unit test:
test_that("Robust summary and sample names order (bug PR# 349)", {
    ## This identified the bug
    data(msnset)
    msnset2 <- msnset <- log(filterNA(msnset), 2)
    ## Expected results
    res0 <- combineFeatures(msnset,
                            fcol = "ProteinAccession",
                            fun = "robust")
    ## Identify the bug
    sampleNames(msnset2)[1] <- "zzz"
    res2 <- combineFeatures(msnset2,
                            fcol = "ProteinAccession",
                            fun = "robust")
    ## Re-set sample name
    sampleNames(res2) <- sampleNames(res0)
    expect_equal(exprs(res0), exprs(res2))
})
This identifies the bug:
> ## This identified the bug
> data(msnset)
> msnset2 <- msnset <- log(filterNA(msnset), 2)
> res0 <- combineFeatures(msnset,
+ fcol = "ProteinAccession",
+ fun = "robust")
> sampleNames(msnset2)[1] <- "zzz"
> res2 <- combineFeatures(msnset2,
+ fcol = "ProteinAccession",
+ fun = "robust")
> sampleNames(res2) <- sampleNames(res0)
> expect_equal(exprs(res0), exprs(res2))
Error: exprs(res0) not equal to exprs(res2).
32/160 mismatches (average diff: 0.397)
[1] 9.52 - 10.16 == -0.6410
[3] 11.09 - 11.30 == -0.2125
[5] 10.06 - 10.01 == 0.0468
[26] 10.89 - 11.10 == -0.2134
[29] 11.50 - 11.59 == -0.0900
[34] 10.04 - 9.87 == 0.1789
[39] 12.49 - 12.55 == -0.0655
[40] 11.46 - 10.82 == 0.6406
[41] 10.16 - 11.08 == -0.9193
...
> tail(exprs(res0))
        iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
ECA4030  14.754384   15.02730  14.970478  14.700696
ECA4037  10.613557   10.49613  10.507692  10.604738
ECA4512   9.496451   10.04831   9.733812   9.883604
ECA4513  13.309896   13.35631  13.427600  13.462701
ECA4514  12.489348   12.55487  12.576657  12.621516
ENO      11.455669   10.81511  10.140333   9.225928
> tail(exprs(res2))
        iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
ECA4030  14.754384   15.02730  14.970478  14.700696
ECA4037  10.613557   10.49613  10.507692  10.604738
ECA4512   9.496451   10.04831   9.733812   9.883604
ECA4513  13.309896   13.35631  13.427600  13.462701
ECA4514  12.554874   12.57666  12.621516  12.489348
ENO      10.815109   10.14033   9.225928  11.455669
Now with the code in this PR:
> res2 <- combineFeatures(msnset2,
+ fcol = "ProteinAccession",
+ fun = "robust")
> sampleNames(res2) <- sampleNames(res0)
> expect_equal(exprs(res0), exprs(res2))
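The same check can be reproduced without MSnbase on a toy matrix. The sketch below is an assumption-laden stand-in, not the package code: `toy_summary()` is a hypothetical helper, the data are invented, and `lm.fit()` replaces `MASS::rlm()` so that only base R is needed. It compares positional and name-based extraction of the sample coefficients after renaming the first sample to "zzz".

```r
## Hypothetical helper mimicking the structure discussed in this PR;
## lm.fit() is used here as a non-robust stand-in for MASS::rlm()
toy_summary <- function(e, by_name = TRUE) {
    p <- !is.na(e)
    expression <- e[p]                                  # flattened, column-major
    sample <- rep(colnames(e), each = nrow(e))[p]
    feature <- rep(rownames(e), times = ncol(e))[p]
    X <- model.matrix(~ -1 + sample + feature)
    coef <- lm.fit(X, expression)$coefficients
    if (by_name) {
        ## Name-based lookup: follows the column order of e
        out <- coef[paste0("sample", colnames(e))]
    } else {
        ## Positional lookup: follows the alphabetical order
        out <- coef[seq_len(ncol(e))]
    }
    unname(out)
}

## Invented, exactly additive data: feature effect + sample effect
e <- outer(c(0, 1, 2), c(10, 20, 30), "+")
dimnames(e) <- list(c("f1", "f2", "f3"), c("s1", "s2", "s3"))

## Rename the first sample so that it sorts last
e2 <- e
colnames(e2)[1] <- "zzz"

toy_summary(e2, by_name = FALSE)  # positional: 20 30 10 (wrong order)
toy_summary(e2, by_name = TRUE)   # by name:    10 20 30 (input order)
```

With the name-based lookup, renaming a sample only relabels the output, which is the property the unit test above asserts.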