New 'useNames = TRUE' implementation distinguishes matrix without dimnames attribute and 'dimnames = list(NULL, NULL)' #234
Thanks again.
First observation
There's nothing that changed in the behavior of
library(matrixStats)
stopifnot(packageVersion("matrixStats") == "0.63.0")
X <- matrix(1:5, ncol = 1L)
Xs <- list(
none = X,
null = structure(X, dimnames = list(NULL, NULL))
)
ys <- lapply(Xs, FUN = colCumsums, useNames = TRUE)
str(ys)
gives
List of 2
$ none: int [1:5, 1] 1 3 6 10 15
$ null: int [1:5, 1] 1 3 6 10 15
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : NULL
What changed is that we're moving from
Second observation
Base R treats empty dimnames the same as no dimnames, e.g.
> ys <- lapply(Xs, FUN = colSums)
> str(ys)
List of 2
$ none: num 15
$ null: num 15
> ys <- lapply(Xs, FUN = colMeans)
> str(ys)
List of 2
$ none: num 3
$ null: num 3
> ys <- lapply(Xs, FUN = apply, MARGIN = 2L, sum)
> str(ys)
List of 2
$ none: int 15
$ null: int 15
> ys <- lapply(Xs, FUN = apply, MARGIN = 2L, mean)
> str(ys)
List of 2
$ none: num 3
$ null: num 3
Thus, this is how matrixStats should also do it.
Third observation
matrixStats behaves like base R, except for the cumulative functions, e.g. all of the following return unnamed results:
ys <- lapply(Xs, FUN = colMeans2, useNames = TRUE)
ys <- lapply(Xs, FUN = colMedians, useNames = TRUE)
ys <- lapply(Xs, FUN = colMads, useNames = TRUE)
ys <- lapply(Xs, FUN = colVars, useNames = TRUE)
ys <- lapply(Xs, FUN = colIQRs, useNames = TRUE)
ys <- lapply(Xs, FUN = colLogSumExps, useNames = TRUE)
ys <- lapply(Xs, FUN = colWeightedMeans, useNames = TRUE)
ys <- lapply(Xs, FUN = colWeightedMedians, useNames = TRUE)
However, the following cumulative functions treat the two cases differently:
ys <- lapply(Xs, FUN = colCumsums, useNames = TRUE)
ys <- lapply(Xs, FUN = colCumprods, useNames = TRUE)
ys <- lapply(Xs, FUN = colCummins, useNames = TRUE)
ys <- lapply(Xs, FUN = colCummaxs, useNames = TRUE)
Fourth observation
BTW, setting dimnames to
> str(X)
int [1:5, 1] 1 2 3 4 5
> rownames(X) <- character(0)
> str(X)
int [1:5, 1] 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL
> rownames(X) <- NULL
> str(X)
int [1:5, 1] 1 2 3 4 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : NULL
In other words, we don't have to treat
Action
Correction
There are more functions:
ys <- lapply(Xs, FUN = colRanges, useNames = TRUE)
ys <- lapply(Xs, FUN = colRanks, useNames = TRUE)
also preserve NULL dimnames. I think what they have in common is that they call an internal C function
$ grep -F "setDimnames(" src/*.c
src/colRanges.c: setDimnames(ans, dimnames, ncols, ccols, 0, crows, TRUE);
src/colRanges.c: setDimnames(ans, dimnames, ncols, ccols, 0, crows, TRUE);
src/naming.c:void setDimnames(SEXP mat/*Answer matrix*/, SEXP dimnames, R_xlen_t nrows,
src/rowCummaxs.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummaxs.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummins.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummins.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumprods.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumprods.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumsums.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumsums.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanges.c: setDimnames(ans, dimnames, nrows, crows, 0, ccols, FALSE);
src/rowRanges.c: setDimnames(ans, dimnames, nrows, crows, 0, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c: setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
which means we can probably fix this in a single location. I'll have a look.
@yaccos, could you please have a look at
Lines 61 to 117 in 7a45f85
and see if the problem reported here can be fixed there?
Yes, you can. If you add the special case test:
/* If both elements of dimnames are NULL, we disregard the dimnames
   attribute completely in order to conform to base R behavior */
if (VECTOR_ELT(dimnames, 0) == R_NilValue && VECTOR_ELT(dimnames, 1) == R_NilValue) {
  return;
}
at the top of the function, you should get the expected behavior. Besides, it annoys me that many of the calls to
matrixStats/src/rowRanksWithTies.c Lines 68 to 401 in 7a45f85
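For readers who do not work with R's C API, the idea behind the special case above can be sketched at the R level. This is only an analogue written for illustration, not the code in the package: the helper name drop_empty_dimnames() is made up here, and instead of returning early before any dimnames are set (as the C check does), it removes a dimnames attribute whose elements are all NULL.

```r
# Hypothetical R-level analogue of the C special case (not part of matrixStats):
# drop a dimnames attribute whose elements are all NULL, leave anything else as is.
drop_empty_dimnames <- function(x) {
  dn <- attr(x, "dimnames", exact = TRUE)
  if (is.list(dn) && all(vapply(dn, FUN = is.null, FUN.VALUE = logical(1L)))) {
    attr(x, "dimnames") <- NULL
  }
  x
}

X <- matrix(1:5, ncol = 1L)
X_null <- structure(X, dimnames = list(NULL, NULL))
X_named <- structure(X, dimnames = list(NULL, "a"))

stopifnot(
  identical(drop_empty_dimnames(X_null), X),         # empty dimnames are dropped
  identical(drop_empty_dimnames(X_named), X_named),  # real names are kept
  identical(drop_empty_dimnames(X), X)               # no dimnames: unchanged
)
```

Doing the check in one shared place, as suggested above, has the advantage that every function routed through it gets the same treatment.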
…in line with how base R does it [#234]
Thanks for the prompt fix @yaccos! I've added it to matrixStats 0.63.0-9010.
If you've got time, feel free to do a PR for this. And absolutely no worries; it's really hard to be on top of everything at all times, and it can sometimes take years and a fresh mind to spot even the most obvious things.
Thanks for the quick fix and sorry for the poor description. There is one more function (
mat_empty_names <- matrix(1, nrow = 5, ncol = 1, dimnames = list(NULL, NULL))
str(matrixStats::colCumsums(mat_empty_names))
#> num [1:5, 1] 1 2 3 4 5
str(matrixStats::colDiffs(mat_empty_names))
#> num [1:4, 1] 0 0 0 0
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : NULL
Created on 2023-06-01 with reprex v2.0.2
Line 128 in fa94b84
Line 193 in fa94b84
…ndled consistently and in line with how base R does it [#234]
Thanks both. Fixed in matrixStats 0.63.0-9011.
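A consistency check of the kind used throughout this thread could be sketched as follows. It is a minimal sketch using only base R so that it runs without matrixStats; the helper name treats_alike() is made up for this example, and the two test matrices are the ones from the first observation.

```r
# Two matrices with the same values: one without a dimnames attribute,
# one carrying dimnames = list(NULL, NULL).
X <- matrix(1:5, ncol = 1L)
Xs <- list(
  none = X,
  null = structure(X, dimnames = list(NULL, NULL))
)

# Hypothetical helper: TRUE if 'fun' returns identical results for both inputs.
treats_alike <- function(fun, ...) {
  ys <- lapply(Xs, FUN = fun, ...)
  identical(ys$none, ys$null)
}

# Base R column summaries ignore the empty dimnames.
fcns <- list(colSums = colSums, colMeans = colMeans)
res <- vapply(fcns, FUN = treats_alike, FUN.VALUE = logical(1L))
print(res)
```

If matrixStats is installed, the same helper could be called as treats_alike(matrixStats::colCumsums, useNames = TRUE) to see whether a given version handles the two cases alike.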
Continuing the discussion from #232.
With the new implementation of useNames = TRUE, matrixStats::colCumsums and other functions that return a matrix treat a matrix without a dimnames attribute and dimnames = list(NULL, NULL) differently (v0.63.0-9008).
Created on 2023-05-30 with reprex v2.0.2
Previously, they were treated the same (v0.63.0):
Created on 2023-05-30 with reprex v2.0.2
From the sparse matrix point-of-view, the old behavior was preferable because dgCMatrix treats matrices without a dimnames attribute and dimnames = list(NULL, NULL) the same.