Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New 'useNames = TRUE' implementation distinguishes matrix without dimnames attribute and 'dimnames = list(NULL, NULL)' #234

Closed
const-ae opened this issue May 30, 2023 · 8 comments
Labels
Milestone

Comments

@const-ae
Copy link
Contributor

const-ae commented May 30, 2023

Continuing the discussion from #232.

With the new implementation of useNames = TRUE, matrixStats::colCumsums and other functions that return a matrix, treat a matrix without a dimnames attribute and dimnames = list(NULL, NULL) differently (v0.63.0-9008).

mat_names <- matrix(nrow = 5, ncol = 1, dimnames = list(NULL, NULL))
mat_no_names <- matrix(nrow = 5, ncol = 1)

# Matrices with NULL names and no names differ in R
str(mat_names)
#>  logi [1:5, 1] NA NA NA NA NA
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : NULL
str(mat_no_names)
#>  logi [1:5, 1] NA NA NA NA NA

str(matrixStats::colCumsums(mat_names))  
#>  int [1:5, 1] NA NA NA NA NA
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : NULL
str(matrixStats::colCumsums(mat_no_names))
#>  int [1:5, 1] NA NA NA NA NA

Created on 2023-05-30 with reprex v2.0.2

Previously, they were treated the same (v0.63.0):

mat_names <- matrix(nrow = 5, ncol = 1, dimnames = list(NULL, NULL))
mat_no_names <- matrix(nrow = 5, ncol = 1)

# Matrices with NULL names and no names differ in R
str(mat_names)
#>  logi [1:5, 1] NA NA NA NA NA
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : NULL
str(mat_no_names)
#>  logi [1:5, 1] NA NA NA NA NA

str(matrixStats::colCumsums(mat_names))  
#>  int [1:5, 1] NA NA NA NA NA
str(matrixStats::colCumsums(mat_no_names))
#>  int [1:5, 1] NA NA NA NA NA

Created on 2023-05-30 with reprex v2.0.2


From the sparse matrix point-of-view, the old behavior was preferable because dgCMatrix treats matrices without a dimnames attribute and dimnames = list(NULL, NULL) the same.

@HenrikBengtsson
Copy link
Owner

Thanks again.

First observation

There's nothing that changed in the behavior of useNames = TRUE since matrixStats 0.63.0;

library(matrixStats)
stopifnot(packageVersion("matrixStats") == "0.63.0")
X <- matrix(1:5, ncol = 1L)
Xs <- list(
  none  = X,
  null  = structure(X, dimnames = list(NULL, NULL))
)

ys <- lapply(Xs, FUN = colCumsums, useNames = TRUE)
str(ys)

gives

List of 2
 $ none: int [1:5, 1] 1 3 6 10 15
 $ null: int [1:5, 1] 1 3 6 10 15
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL

What changed is that we're moving from useNames = NA to useNames = TRUE, by default (#232).

Second observation

Base R treats empty dimnames the same as no dimnames, e.g.

> ys <- lapply(Xs, FUN = colSums)
> str(ys)
List of 2
 $ none: num 15
 $ null: num 15

ys <- lapply(Xs, FUN = colMeans)
> str(ys)
List of 2
 $ none: num 3
 $ null: num 3

ys <- lapply(Xs, FUN = apply, MARGIN = 2L, sum)
> str(ys)
List of 2
 $ none: int 15
 $ null: int 15

ys <- lapply(Xs, FUN = apply, MARGIN = 2L, mean)
> str(ys)
List of 2
 $ none: num 3
 $ null: num 3

Thus, this is how matrixStats should also do it.

Third observation

matrixStats behaves like base R, except for the cumulative functions, e.g. all of the following returns unnamed results:

ys <- lapply(Xs, FUN = colMeans2, useNames = TRUE)
ys <- lapply(Xs, FUN = colMedians, useNames = TRUE)
ys <- lapply(Xs, FUN = colMads, useNames = TRUE)
ys <- lapply(Xs, FUN = colVars, useNames = TRUE)
ys <- lapply(Xs, FUN = colIQRs, useNames = TRUE)
ys <- lapply(Xs, FUN = colLogSumExps, useNames = TRUE)
ys <- lapply(Xs, FUN = colWeightedMeans, useNames = TRUE)
ys <- lapply(Xs, FUN = colWeightedMedians, useNames = TRUE)

However, the following, cumulative functions treats the two cases differently:

ys <- lapply(Xs, FUN = colCumsums, useNames = TRUE)
ys <- lapply(Xs, FUN = colCumprods, useNames = TRUE)
ys <- lapply(Xs, FUN = colCummins, useNames = TRUE)
ys <- lapply(Xs, FUN = colCummaxs, useNames = TRUE)

Fourth observation

BTW, setting dimnames to character(0L) is the same as setting them to NULL in R, e.g.

> str(X)
 int [1:5, 1] 1 2 3 4 5
> rownames(X) <- character(0)
> str(X)
 int [1:5, 1] 1 2 3 4 5
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : NULL
> rownames(X) <- NULL
> str(X)
 int [1:5, 1] 1 2 3 4 5
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : NULL

In other words, we don't have to treat character(0) specially; it's taken care of by R itself.

Action

  • Fix all cumulative functions to treat NULL dimnames the same as non-existing dimnames.

@HenrikBengtsson
Copy link
Owner

Correction

There are more functions:

ys <- lapply(Xs, FUN = colRanges, useNames = TRUE)
ys <- lapply(Xs, FUN = colRanks, useNames = TRUE)

also preserves NULL dimnames.

I think what they have in common is that they call an internal C function setDimnames();

$ grep -F "setDimnames(" src/*.c
src/colRanges.c:            setDimnames(ans, dimnames, ncols, ccols, 0, crows, TRUE);
src/colRanges.c:            setDimnames(ans, dimnames, ncols, ccols, 0, crows, TRUE);
src/naming.c:void setDimnames(SEXP mat/*Answer matrix*/, SEXP dimnames, R_xlen_t nrows,
src/rowCummaxs.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummaxs.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummins.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCummins.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumprods.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumprods.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumsums.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowCumsums.c:        setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanges.c:            setDimnames(ans, dimnames, nrows, crows, 0, ccols, FALSE);
src/rowRanges.c:            setDimnames(ans, dimnames, nrows, crows, 0, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
src/rowRanksWithTies.c:              setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);

which means we can probably fix this in a single location. I'll have a look.

@HenrikBengtsson
Copy link
Owner

@yaccos , could you please have a lock at

matrixStats/src/naming.c

Lines 61 to 117 in 7a45f85

void setDimnames(SEXP mat/*Answer matrix*/, SEXP dimnames, R_xlen_t nrows,
R_xlen_t *crows, R_xlen_t ncols, R_xlen_t *ccols, Rboolean reverseDimnames) {
if (crows == NULL && ccols == NULL && nrows > 0 && ncols > 0){
dimnamesgets(mat, dimnames);
return;
}
SEXP rownames = VECTOR_ELT(dimnames, reverseDimnames ? 1 : 0);
SEXP colnames = VECTOR_ELT(dimnames, reverseDimnames ? 0 : 1);
SEXP ansDimnames = PROTECT(allocVector(VECSXP, 2));
if (nrows == 0 || rownames == R_NilValue) {
SET_VECTOR_ELT(ansDimnames, 0, R_NilValue);
} else if (crows == NULL) {
SET_VECTOR_ELT(ansDimnames, 0, rownames);
} else {
SEXP ansRownames = PROTECT(allocVector(STRSXP, nrows));
R_xlen_t i, thisIdx;
for (i = 0; i < nrows; i++) {
thisIdx = crows[i];
if (thisIdx == NA_R_XLEN_T) {
SET_STRING_ELT(ansRownames, i, NA_STRING);
}
else {
SEXP eltElement = STRING_ELT(rownames, thisIdx);
SET_STRING_ELT(ansRownames, i, eltElement);
}
}
SET_VECTOR_ELT(ansDimnames, 0, ansRownames);
UNPROTECT(1);
}
if (ncols == 0 || colnames == R_NilValue) {
SET_VECTOR_ELT(ansDimnames, 1, R_NilValue);
} else if (ccols == NULL) {
SET_VECTOR_ELT(ansDimnames, 1, colnames);
} else {
if (colnames != R_NilValue) {
SEXP ansColnames = PROTECT(allocVector(STRSXP, ncols));
R_xlen_t i, thisIdx;
for (i = 0; i < ncols; i++) {
thisIdx = ccols[i];
if (thisIdx == NA_R_XLEN_T) {
SET_STRING_ELT(ansColnames, i, NA_STRING);
}
else {
SEXP eltElement = STRING_ELT(colnames, thisIdx);
SET_STRING_ELT(ansColnames, i, eltElement);
}
}
SET_VECTOR_ELT(ansDimnames, 1, ansColnames);
UNPROTECT(1);
}
}
dimnamesgets(mat, ansDimnames);
UNPROTECT(1);
}

and see if the problem reported here, can be fixed there?

@yaccos
Copy link
Contributor

yaccos commented May 30, 2023

Yes, you can. If you add the special case test:

/* In case both elements of the dimname is NULL, we disregard the name 
attribute completely in order to conform to base R behavior */
if (VECTOR_ELT(dimnames, 0) == R_NilValue && VECTOR_ELT(dimnames, 1) == R_NilValue) {
    return;
}

at the top of the function, you should get the expected behavior. Besides, it annoyes me that many of the calls to setDimnames is duplicated even if there is no good reason for it. The worst offender here is rowRanksWithTies where the code for generating the answer is split across a multitude of cases, but the naming would be achieved the same way for all cases if it was put to the end of the function. I apologize for not being vigilent enough to discover this redundancy during the GSOC project.

if (isReal(x)) {
if (byrow) {
switch (tiesmethod) {
case 1:
PROTECT(ans = allocMatrix(REALSXP, nrows, ncols));
rowRanksWithTies_Average_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, REAL(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 2:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_First_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 3:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Last_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 4:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
GetRNGstate();
rowRanksWithTies_Random_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
PutRNGstate();
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 5:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Max_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 6:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Min_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 7:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Dense_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
} /* switch */
} else {
switch (tiesmethod) {
case 1:
PROTECT(ans = allocMatrix(REALSXP, nrows, ncols));
colRanksWithTies_Average_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, REAL(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 2:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_First_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 3:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Last_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 4:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
GetRNGstate();
colRanksWithTies_Random_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
PutRNGstate();
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 5:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Max_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 6:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Min_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 7:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Dense_dbl(REAL(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
} /* switch */
}
} else if (isInteger(x)) {
if (byrow) {
switch (tiesmethod) {
case 1:
PROTECT(ans = allocMatrix(REALSXP, nrows, ncols));
rowRanksWithTies_Average_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, REAL(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 2:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_First_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 3:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Last_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 4:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
GetRNGstate();
rowRanksWithTies_Random_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
PutRNGstate();
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 5:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Max_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 6:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Min_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 7:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
rowRanksWithTies_Dense_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
} /* switch */
} else {
switch (tiesmethod) {
case 1:
PROTECT(ans = allocMatrix(REALSXP, nrows, ncols));
colRanksWithTies_Average_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, REAL(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 2:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_First_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 3:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Last_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 4:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
GetRNGstate();
colRanksWithTies_Random_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
PutRNGstate();
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 5:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Max_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 6:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Min_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
case 7:
PROTECT(ans = allocMatrix(INTSXP, nrows, ncols));
colRanksWithTies_Dense_int(INTEGER(x), nrow, ncol, crows, nrows, rowsHasNA, ccols, ncols, colsHasNA, INTEGER(ans));
if (usenames != NA_LOGICAL && usenames){
SEXP dimnames = getAttrib(x, R_DimNamesSymbol);
if (dimnames != R_NilValue) {
setDimnames(ans, dimnames, nrows, crows, ncols, ccols, FALSE);
}
}
UNPROTECT(1);
break;
} /* switch */
}
}

HenrikBengtsson added a commit that referenced this issue May 30, 2023
@HenrikBengtsson
Copy link
Owner

Thanks for the prompt fix @yaccos ! I've added it to matrixStats 0.63.0-9010.

... the naming would be achieved the same way for all cases if it was put to the end of the function. I apologize for not being vigilent enough to discover this redundancy during the GSOC project.

If you've got time, feel free to do a PR for this. And absolutely no worries; it's really hard to be on top of everything at all time, and can take years and a fresh mind to sometimes spot even the most obvious things.

@const-ae
Copy link
Contributor Author

const-ae commented Jun 1, 2023

Thanks for the quick fix and sorry for the poor description.


There is one more function (colDiffs) that returns empty dimnames:

mat_empty_names <- matrix(1, nrow = 5, ncol = 1, dimnames = list(NULL, NULL))
str(matrixStats::colCumsums(mat_empty_names))
#>  num [1:5, 1] 1 2 3 4 5
str(matrixStats::colDiffs(mat_empty_names))
#>  num [1:4, 1] 0 0 0 0
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : NULL

Created on 2023-06-01 with reprex v2.0.2

@yaccos
Copy link
Contributor

yaccos commented Jun 1, 2023

Thanks for the quick fix and sorry for the poor description.

There is one more function (colDiffs) that returns empty dimnames:

mat_empty_names <- matrix(1, nrow = 5, ncol = 1, dimnames = list(NULL, NULL))
str(matrixStats::colCumsums(mat_empty_names))
#>  num [1:5, 1] 1 2 3 4 5
str(matrixStats::colDiffs(mat_empty_names))
#>  num [1:4, 1] 0 0 0 0
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : NULL

Created on 2023-06-01 with reprex v2.0.2

colDiffs() and rowDiffs() have their own respective C-level functions for naming (

void set_rowDiffs_Dimnames(SEXP mat/*Answer matrix*/, SEXP dimnames, R_xlen_t nrows,
and
void set_colDiffs_Dimnames(SEXP mat/*Answer matrix*/, SEXP dimnames, R_xlen_t nrows, R_xlen_t nrow_ans,
). This means that the same fix must be applied to these locations as well.

HenrikBengtsson added a commit that referenced this issue Jun 1, 2023
…ndled consistently and in line with how base R does it [#234]
@HenrikBengtsson
Copy link
Owner

Thanks both. Fixed in matrixStats 0.63.0-9011.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants