Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'useNames = TRUE' is the new default in matrixStats v1.0.0 #31

Closed
const-ae opened this issue Jun 2, 2023 · 11 comments
Closed

'useNames = TRUE' is the new default in matrixStats v1.0.0 #31

const-ae opened this issue Jun 2, 2023 · 11 comments

Comments

@const-ae
Copy link
Collaborator

const-ae commented Jun 2, 2023

matrixStats v1.0.0 was just published on CRAN (HenrikBengtsson/matrixStats#232). The main change is that the old default useNames = NA is now deprecated and the new default useNames = TRUE.

I wonder if we should patch the current release version to use useNames = FALSE because that avoids deprecation warnings and won't change the behavior all of a sudden. And then with the next Bioconductor release change the default to useNames = TRUE

@hpages
Copy link
Contributor

hpages commented Jun 2, 2023

the new default useNames = TRUE

The new behaviour is more inline with colSums() and colMeans() propagating the names in base R. Never understood the purpose of the useNames=NA thing...

I wonder if we should patch the current release version to use useNames = FALSE ...

I guess that would work.

I'd be more confortable though if we didn't touch useNames=NA in the generics and only made the following simple change in the methods:

.matrixStats_rowVars <- function(x, rows=NULL, cols=NULL, na.rm=FALSE, center=NULL, dim.=dim(x), ...,
                                    useNames=NA)  # <-- keep this
{
  matrixStats::rowVars(x, rows=rows, cols=cols, na.rm=na.rm, center=center, dim.=dim., ...,
                          useNames=isTRUE(useNames))  # <-- proposed change
}

That way the argument list of the generics doesn't change and remains in sync with the argument list of all the methods in sparseMatrixStats, SparseArray, DelayedArray, and DelayedMatrixStats.

H.

@const-ae
Copy link
Collaborator Author

const-ae commented Jun 2, 2023

I'd be more confortable though if we didn't touch useNames=NA in the generics and only made the following simple change in the methods:
matrixStats::rowVars(x, rows=rows, cols=cols, na.rm=na.rm, center=center, dim.=dim., ..., useNames=isTRUE(useNames)) # <-- proposed change

I like that idea.


I have already started to implement the new default in sparseMatrixStats (const-ae/sparseMatrixStats@5441917), but then I would only merge this in the devel branch.

@HenrikBengtsson
Copy link

HenrikBengtsson commented Jun 2, 2023

Never understood the purpose of the useNames=NA thing...

In the beginning, matrixStats didn't control for names attributes, so we got whatever the implementation happened to output (yet there were packages depending on it). In 2021, we ran a GSoC project for adding support for useNames = FALSE and useNames = TRUE. We decided to keep the old, undefined behavior via useNames = NA until we settled on what the new default should be, which has now been done. This will allow us to defunct useNames = NA in a near future.

There is no need for you to keep supporting useNames = NA. Actually, the sooner you can drop support for that, the better; it'll probably help speeding up its removal from matrixStats too.

@hpages
Copy link
Contributor

hpages commented Jun 2, 2023

I see. That means that in order to prevent disruption in release, we can't just blindly replace useNames=useNames with useNames=isTRUE(useNames) everywhere like I was proposing we do for matrixStats::rowVars above. We'll need to figure out what the method was doing for useNames=NA, and replace with useNames=isTRUE(useNames) if it was not propagating the names, and with useNames=!isFALSE(useNames) if it was propagating the names.

@hpages
Copy link
Contributor

hpages commented Jun 2, 2023

@const-ae I will give this a try unless you already started to work on it?

@const-ae
Copy link
Collaborator Author

const-ae commented Jun 2, 2023

Go ahead, I am currently just barely staying afloat as I am finishing up my PhD thesis 😅

@hpages
Copy link
Contributor

hpages commented Jun 2, 2023

Exciting times! I suppose you don't need this kind of distraction to keep you entertained 😉

@hpages
Copy link
Contributor

hpages commented Jun 3, 2023

ok so it looks like most functions did NOT propagate the names in matrixStats 0.63.0

Keeping track of this here for reference:

### With matrixStats 0.63.0 (useNames=NA)

library(matrixStats)
stopifnot(packageVersion("matrixStats") == '0.63.0')

# 36 row* + 36 col* generics in MatrixGenerics.
# All of them have the 'useNames' arg, except rowAvgsPerColSet() and
# colAvgsPerRowSet().
# Also MatrixGenerics does not define generics for the *AnyMissings() functions.
all_matrixStats_symbols <- ls('package:matrixStats')
rowfuns <- setdiff(grep('^row', all_matrixStats_symbols, value=TRUE),
                   c("rowAvgsPerColSet", "rowAnyMissings"))
colfuns <- setdiff(grep('^col', all_matrixStats_symbols, value=TRUE),
                   c("colAvgsPerRowSet", "colAnyMissings"))

### Figuring out whether they propagate the names or not:

m <- matrix(11:22, ncol=3, dimnames=list(letters[1:4], LETTERS[1:3]))

rowfun_propagates_the_names <- function(fun, useNames=NA)
{
    FUN <- match.fun(fun)
    if (fun == "rowCollapse") {
        idxs <- sample(ncol(m), nrow(m), replace=TRUE)
        res <- FUN(m, idxs=idxs, useNames=useNames)
    } else if (fun == "rowOrderStats") {
        res <- FUN(m, which=1L, useNames=useNames)
    } else {
        res <- FUN(m, useNames=useNames)
    }
    names_on_res <- if (is.matrix(res)) rownames(res) else names(res)
    identical(names_on_res, rownames(m))
}

colfun_propagates_the_names <- function(fun, useNames=NA)
{   
    FUN <- match.fun(fun)
    if (fun == "colCollapse") { 
        idxs <- sample(nrow(m), ncol(m), replace=TRUE)
        res <- FUN(m, idxs=idxs, useNames=useNames)
    } else if (fun == "colOrderStats") {
        res <- FUN(m, which=1L, useNames=useNames)
    } else {
        res <- FUN(m, useNames=useNames)
    }
    if (fun %in% c("colQuantiles", "colRanges", "colRanks", "colTabulates")) {
        names_on_res <- rownames(res)
    } else {
        names_on_res <- if (is.matrix(res)) colnames(res) else names(res)
    }
    identical(names_on_res, colnames(m))
}

yes_or_no <- sapply(rowfuns, rowfun_propagates_the_names)
table(yes_or_no)
# FALSE  TRUE 
#    25    10 

### row* functions that propagate the names in matrixStats 0.63.0:
rowfuns[yes_or_no]
# [1] "rowIQRDiffs"      "rowLogSumExps"    "rowMadDiffs"      "rowQuantiles"    
# [5] "rowSdDiffs"       "rowVarDiffs"      "rowWeightedMads"  "rowWeightedMeans"
# [9] "rowWeightedSds"   "rowWeightedVars" 

yes_or_no <- sapply(colfuns, colfun_propagates_the_names)
table(yes_or_no)
# FALSE  TRUE 
#    25    10

### col* functions that propagate the names in matrixStats 0.63.0:
colfuns[yes_or_no]
# [1] "colIQRDiffs"      "colLogSumExps"    "colMadDiffs"      "colQuantiles"    
# [5] "colSdDiffs"       "colVarDiffs"      "colWeightedMads"  "colWeightedMeans"
# [9] "colWeightedSds"   "colWeightedVars" 

hpages added a commit that referenced this issue Jun 5, 2023
Modify calls to matrixStats functions to mitigate non-backward comptible
changes in matrixStats 1.0.0. More precisely:
- 'useNames=NA' became deprecated in matrixStats 1.0.0 so we make sure to
  never pass 'useNames=NA' when calling a matrixStats function;
- furthermore, we make sure to pass a 'useNames' value that preserves the
  pre-matrixStats 1.0.0 behavior in order to avoid disruption.
See #31 for a
discussion of these changes.
@hpages
Copy link
Contributor

hpages commented Jun 5, 2023

@PeteHaitch @const-ae If you guys want to take a look at this before I merge: #32 This is a PR against the RELEASE_3_17 branch.

I'll start working on a PR against the devel branch today. If you're ok with this, my plan is to just switch the useNames default used in the argument lists of the generics from NA to TRUE, like in matrixStats. These changes will actually be simpler than the changes in release. However, unlike the latter, they won't be backward compatible.

hpages added a commit that referenced this issue Jun 5, 2023
All generic functions and methods defined in MatrixGenerics now
set the 'useNames' argument to TRUE instead of NA by default.
See #31 for
a discussion of this change.
@hpages
Copy link
Contributor

hpages commented Jun 5, 2023

And here is the proposed changes for the devel version: #33

hpages added a commit that referenced this issue Jun 8, 2023
Modify calls to matrixStats functions to mitigate non-backward comptible
changes in matrixStats 1.0.0. More precisely:
- 'useNames=NA' became deprecated in matrixStats 1.0.0 so we make sure to
  never pass 'useNames=NA' when calling a matrixStats function;
- furthermore, we make sure to pass a 'useNames' value that preserves the
  pre-matrixStats 1.0.0 behavior in order to avoid disruption.
See #31 for a
discussion of these changes.
PeteHaitch pushed a commit that referenced this issue Jul 24, 2023
All generic functions and methods defined in MatrixGenerics now
set the 'useNames' argument to TRUE instead of NA by default.
See #31 for
a discussion of this change.
@PeteHaitch
Copy link
Collaborator

MatrixGenerics (along with DelayedMatrixStats and sparseMatrixStats) has now been updated in both the release (3.17) and devel (3.18) branches of Bioconductor to handle this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants