rowMaxs() et al. can be improved #200

HenrikBengtsson · 2021-02-22T19:37:18Z

The underlying implementation of rowRanges(), rowMins(), and rowMaxs() can probably be improved. In the benchmarks below, Rfast::rowMaxs() is roughly 2.4 times faster than matrixStats::rowMaxs() at the median.

> m <- matrix(rnorm(1000*1000), nrow=1000, ncol=1000)

> byrow <- microbenchmark::microbenchmark(matrixStats = matrixStats::rowMaxs(m), Rfast = Rfast::rowMaxs(m, value=TRUE), apply = apply(m, MARGIN=1, FUN=max))
> byrow
Unit: microseconds
        expr       min         lq      mean     median        uq       max neval
 matrixStats  1957.475  2089.3485  2304.821  2228.9490  2395.270  4203.469   100
       Rfast   693.778   866.4185  1032.467   935.8735  1045.914  2446.249   100
       apply 12781.334 18051.5830 21182.688 20964.8795 23095.604 70352.344   100

Note that the colNnn() implementation is already optimized and performs on par with Rfast:

> bycol <- microbenchmark::microbenchmark(matrixStats = matrixStats::colMaxs(m), Rfast = Rfast::colMaxs(m, value=TRUE), apply = apply(m, MARGIN=2, FUN=max))

> bycol
Unit: milliseconds
        expr      min       lq      mean    median        uq       max neval
 matrixStats 1.204109 1.366232  1.503899  1.440197  1.572250  2.750286   100
       Rfast 1.287229 1.390556  1.553244  1.491345  1.631343  2.800478   100
       apply 8.864002 9.924433 12.371923 12.143643 13.839154 24.526761   100

The row versions are most likely slower because the implementation attempts to reuse the same code/macro base for both rows and columns, and that reuse does not carry over fully to the row case, e.g. see

matrixStats/src/rowRanges_lowlevel_template.h

Lines 100 to 130 in 483be54

    
                 /* rowMaxs() */ 
        
                 maxs = ans; 
        
                 for (jj=0; jj < ncols; jj++) { 
        
                   colBegin = R_INDEX_OP(COL_INDEX(ccols,jj), *, nrow); 
        
                   for (ii=0; ii < nrows; ii++) { 
        
                     if (!narm && skip[ii]) continue; 
        
                     idx = R_INDEX_OP(colBegin, +, ROW_INDEX(crows,ii)); 
        
                     value = R_INDEX_GET(x, idx, X_NA); 
        
                     if (X_ISNAN(value)) { 
        
                       if (!narm) { 
        
                         maxs[ii] = value; 
        
                         is_counted[ii] = 1; 
        
                         /* Early stopping? */ 
        
           #if X_TYPE == 'i' 
        
                         skip[ii] = 1; 
        
           #elif X_TYPE == 'r' 
        
                         if (X_ISNA(value)) skip[ii] = 1; 
        
           #endif 
        
                       } 
        
                     } else if (!is_counted[ii]) { 
        
                       maxs[ii] = value; 
        
                       is_counted[ii] = 1; 
        
                     } else if (value > maxs[ii]) { 
        
                       maxs[ii] = value; 
        
                     } 
        
                   } 
        
                 } /* for (jj ...) */
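
For comparison, here is a minimal sketch of a dedicated row-maximum loop without the per-element skip[] and is_counted[] bookkeeping. It is not the matrixStats implementation and has not been benchmarked here. It assumes a column-major matrix of doubles with at least one row and one column, no NA/NaN values, and no row or column subsetting. The name row_maxs_simple() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not part of matrixStats): row-wise maxima of a
 * column-major nrow x ncol matrix of doubles.
 * Assumptions: nrow >= 1, ncol >= 1, no NA/NaN values, no subsetting. */
void row_maxs_simple(const double *x, size_t nrow, size_t ncol, double *maxs) {
  /* Seed each row's maximum from the first column, so that no
   * per-row "is_counted" flag is needed in the main loop. */
  for (size_t ii = 0; ii < nrow; ii++) {
    maxs[ii] = x[ii];
  }

  /* Sweep the remaining columns; the inner loop walks contiguous memory. */
  for (size_t jj = 1; jj < ncol; jj++) {
    const double *col = x + jj * nrow;
    for (size_t ii = 0; ii < nrow; ii++) {
      if (col[ii] > maxs[ii]) maxs[ii] = col[ii];
    }
  }
}
```

A real replacement would still need the NA handling (narm) and the index subsetting (crows, ccols) that the template above provides. One option might be to choose between a simple loop like this and the general one after a check for missing values.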

HenrikBengtsson added this to the Future release (not next) milestone Feb 22, 2021

frederikziebell mentioned this issue Sep 2, 2023

Improve speed of rowwise computations #238

Open
