rowMaxs() et al. can be improved #200

Open
HenrikBengtsson opened this issue Feb 22, 2021 · 0 comments

The underlying implementation of rowRanges(), rowMins(), and rowMaxs() can probably be improved. The benchmarks against Rfast below certainly suggest so: on rows, Rfast is more than twice as fast at the median.

> m <- matrix(rnorm(1000*1000), nrow=1000, ncol=1000)

> byrow <- microbenchmark::microbenchmark(matrixStats = matrixStats::rowMaxs(m), Rfast = Rfast::rowMaxs(m, value=TRUE), apply = apply(m, MARGIN=1, FUN=max))
> byrow
Unit: microseconds
        expr       min         lq      mean     median        uq       max neval
 matrixStats  1957.475  2089.3485  2304.821  2228.9490  2395.270  4203.469   100
       Rfast   693.778   866.4185  1032.467   935.8735  1045.914  2446.249   100
       apply 12781.334 18051.5830 21182.688 20964.8795 23095.604 70352.344   100

Note that the colNnn() implementation is already optimized and performs on par with Rfast:

> bycol <- microbenchmark::microbenchmark(matrixStats = matrixStats::colMaxs(m), Rfast = Rfast::colMaxs(m, value=TRUE), apply = apply(m, MARGIN=2, FUN=max))

> bycol
Unit: milliseconds
        expr      min       lq      mean    median        uq       max neval
 matrixStats 1.204109 1.366232  1.503899  1.440197  1.572250  2.750286   100
       Rfast 1.287229 1.390556  1.553244  1.491345  1.631343  2.800478   100
       apply 8.864002 9.924433 12.371923 12.143643 13.839154 24.526761   100

The row versions are most likely slower because the implementation attempts to reuse the same code/macro base for both rows and columns, and this does not carry over cleanly to the row case, e.g. see:

/* rowMaxs() */
maxs = ans;
for (jj=0; jj < ncols; jj++) {
  colBegin = R_INDEX_OP(COL_INDEX(ccols,jj), *, nrow);
  for (ii=0; ii < nrows; ii++) {
    if (!narm && skip[ii]) continue;
    idx = R_INDEX_OP(colBegin, +, ROW_INDEX(crows,ii));
    value = R_INDEX_GET(x, idx, X_NA);
    if (X_ISNAN(value)) {
      if (!narm) {
        maxs[ii] = value;
        is_counted[ii] = 1;
        /* Early stopping? */
#if X_TYPE == 'i'
        skip[ii] = 1;
#elif X_TYPE == 'r'
        if (X_ISNA(value)) skip[ii] = 1;
#endif
      }
    } else if (!is_counted[ii]) {
      maxs[ii] = value;
      is_counted[ii] = 1;
    } else if (value > maxs[ii]) {
      maxs[ii] = value;
    }
  }
} /* for (jj ...) */
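
For comparison, here is a minimal sketch of what a dedicated fast path for the row case might look like. It is an illustration only, not the matrixStats or Rfast implementation: `row_maxs_dense` is a hypothetical name, and the sketch assumes a dense column-major matrix of doubles with no row/column subsetting and no missing values, so it drops the per-element `skip[]`, `is_counted[]`, and index-macro bookkeeping seen above. Whether that bookkeeping actually accounts for the gap to Rfast has not been measured here.

```c
#include <stddef.h>

/* Hypothetical helper (not part of matrixStats): row-wise maxima of a
 * dense, column-major nrow x ncol matrix of doubles.
 * Assumptions: no row/column subsetting, no NA/NaN handling. */
void row_maxs_dense(const double *x, size_t nrow, size_t ncol, double *maxs) {
  if (nrow == 0 || ncol == 0) return;  /* nothing to compute */

  /* Seed each row's maximum with the first column; this removes the
   * need for a separate is_counted[] flag per row. */
  for (size_t ii = 0; ii < nrow; ii++) {
    maxs[ii] = x[ii];
  }

  /* Walk the remaining columns in memory order, updating all rows. */
  for (size_t jj = 1; jj < ncol; jj++) {
    const double *col = x + jj * nrow;  /* start of column jj */
    for (size_t ii = 0; ii < nrow; ii++) {
      if (col[ii] > maxs[ii]) maxs[ii] = col[ii];
    }
  }
}
```

A real implementation would still need the NA/NaN semantics and subsetting support of the macro version; the sketch could serve only as a special case that is dispatched to when neither is needed.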
