Memory fault during unsymmetric scaling of singular matrix with Hungarian algorithm #200

chrhansk · 2024-06-08T13:31:20Z

Consider the following example problem:

#include <stdlib.h>
#include <stdio.h>
#include "spral.h"

int main(void) {
   /* Derived types */
   struct spral_scaling_hungarian_options options;
   struct spral_scaling_hungarian_inform inform;

   /* Other variables */
   int match[5];
   double rscaling[5], cscaling[5];

   /* Data for unsymmetric matrix:
    * ( 2,  1,  0,  0,  0)
    * ( 1,  4,  1,  0,  1)
    * ( 0,  0,  0,  0,  0)
    */
   int m = 3, n = 5;
   int ptr[]    = { 1, 3, 5, 6, 6, 7};
   int row[]    = { 1, 2, 1, 2, 2, 2};
   double val[] = { 2., 1., 1., 4., 1., 1. };
   printf("Initial matrix:\n");
   spral_print_matrix(-1, SPRAL_MATRIX_REAL_UNSYM, m, n, ptr, row, val, 1);

   /* Perform unsymmetric scaling */
   spral_scaling_hungarian_default_options(&options);

   options.array_base = 1;
   options.scale_if_singular = true;

   spral_scaling_hungarian_unsym(m, n, ptr, row, val, rscaling, cscaling, match,
         &options, &inform);

   if(inform.flag<0) {
      printf("spral_scaling_hungarian_unsym() returned with error %5d", inform.flag);
      exit(1);
   }

   /* Print scaling and matching */
   printf("Matching:");
   for(int i=0; i<n; i++) printf(" %10d", match[i]);
   printf("\nRow Scaling: ");
   for(int i=0; i<m; i++) printf(" %10.2le", rscaling[i]);
   printf("\nCol Scaling: ");
   for(int i=0; i<n; i++) printf(" %10.2le", cscaling[i]);
   printf("\n");

   return 0;
}

This outputs the following:

Initial matrix:
Real unsymmetric matrix, dimension 3x5 with 6 entries.
1:   2.0000E+00   1.0000E+00                                       
2:   1.0000E+00   4.0000E+00   1.0000E+00                1.0000E+00
3:                                                                 
Matching:          1          5         -3      32522         -4
Row Scaling:    8.41e-01   8.41e-01   8.41e-01
Col Scaling:    5.95e-01   2.97e-01   1.19e+00   1.00e+00   1.19e+00

The matching has the clearly invalid entry 32522 (random, changing at each iteration). There is also a valgrind error:

==213605== Invalid write of size 8
==213605==    at 0x49AF0A5: __spral_scaling_MOD_match_postproc (scaling.f90:1675)
==213605==    by 0x49B59AB: __spral_scaling_MOD_hungarian_wrapper (scaling.f90:684)
==213605==    by 0x49B9679: __spral_scaling_MOD_hungarian_scale_unsym_int64 (scaling.f90:225)
==213605==    by 0x49B9C02: __spral_scaling_MOD_hungarian_scale_unsym_int32 (scaling.f90:202)
==213605==    by 0x48EA1D8: spral_scaling_hungarian_unsym (scaling.f90:729)
==213605==    by 0x10936A: main (hungarian_unsym.c:33)
==213605==  Address 0x7260180 is 32 bytes before a block of size 48 in arena "client"

The offending line attempts to access the rscaling variable at the match(i) index, where i = 3 and match = (1, 5, -3). It is clearly infeasible to access the array at a negative index, which is causing the initial memory fault. On the other hand, I don't know why the index becomes negative in the first place...

The text was updated successfully, but these errors were encountered:

mjacobse · 2024-06-08T14:53:48Z

The matching used in this scaling is documented to be based on MC64

spral/src/scaling.f90

Lines 936 to 938 in 662c7ac

    
           ! This code is adapted from MC64 v 1.6.0 
        
           ! 
        
             subroutine hungarian_match(m,n,ptr,row,val,iperm,num,dualu,dualv,st)

which itself documents that "If the matrix is structurally singular [...], then the entries of [match] for which there is
no entry on the diagonal of the permuted matrix are set negative" (https://www.hsl.rl.ac.uk/specs/mc64.pdf). So in the example above, row 3 is causing the matrix to be structurally singular and supposedly that's why -3 is returned.

I'm not quite sure how the scaling has to handle such cases. Perhaps the negative entries of match should just be set to 0 to signal "unmatched", similar to what the auction algorithm does here

spral/src/scaling.f90

Lines 1487 to 1488 in 662c7ac

    
           ! We expect unmatched columns to have match(col) = 0 
        
           where(match(:) .eq. -1) match(:) = 0

such that they are skipped in match_postproc, e.g. the aforementioned offending line.

chrhansk · 2024-06-08T17:21:06Z

That is probably a good idea. Seeing as the spral_scaling_hungarian_options struct has an entry scale_if_singular, a user would expect that singular matrices can be scaled (albeit with a warning)...

jfowkes · 2024-06-10T09:30:45Z

It has been implemented in this way in MC64 so that the rows which cause the matrix to be structurally singular can be identified, but I agree that it probably makes sense to set the negative entries of match to 0. Happy to accept a PR.

mjacobse · 2024-06-18T20:33:49Z

I noticed that this actually already goes wrong before what was mentioned. This was hidden with match being allocated larger than necessary in the example (see #207). When allocating with size m (on the heap instead of the stack to make it easier to detect with valgrind) like so

/* examples/C/scaling/hungarian_unsym.c - Example code for SPRAL_SCALING */
#include <stdlib.h>
#include <stdio.h>
#include "spral.h"

int main(void) {
   /* Derived types */
   struct spral_scaling_hungarian_options options;
   struct spral_scaling_hungarian_inform inform;

   /* Data for unsymmetric matrix:
    * ( 2  1         )
    * ( 1  4  1    1 )
    * (              ) */
   int m = 3, n = 5;
   int ptr[]    = { 0,        2,        4,   5,   5,  6 };
   int row[]    = { 0,   1,   0,   1,   1,   1   };
   double val[] = { 2.0, 1.0, 1.0, 4.0, 1.0, 1.0 };
   printf("Initial matrix:\n");
   spral_print_matrix(-1, SPRAL_MATRIX_REAL_UNSYM, m, n, ptr, row, val, 0);

   /* Perform symmetric scaling */
   int* match = malloc(sizeof(int)*m);
   double* rscaling = malloc(sizeof(double)*m);
   double* cscaling = malloc(sizeof(double)*n);
   spral_scaling_hungarian_default_options(&options);
   spral_scaling_hungarian_unsym(m, n, ptr, row, val, rscaling, cscaling, match,
         &options, &inform);
   if(inform.flag<0) {
      printf("spral_scaling_hungarian_unsym() returned with error %5d", inform.flag);
      exit(1);
   }

   /* Print scaling and matching */
   printf("Matching:");
   for(int i=0; i<m; i++) printf(" %10d", match[i]);
   printf("\nRow Scaling: ");
   for(int i=0; i<m; i++) printf(" %10.2le", rscaling[i]);
   printf("\nCol Scaling: ");
   for(int i=0; i<n; i++) printf(" %10.2le", cscaling[i]);
   printf("\n");

   /* Calculate scaled matrix and print it */
   for(int i=0; i<n; i++) {
      for(int j=ptr[i]; j<ptr[i+1]; j++)
         val[j] = rscaling[row[j]] * val[j] * cscaling[i];
   }
   printf("Scaled matrix:\n");
   spral_print_matrix(-1, SPRAL_MATRIX_REAL_UNSYM, m, n, ptr, row, val, 0);

   free(rscaling);
   free(cscaling);
   free(match);
   /* Success */
   return 0;
}

we already get an error inside of hungarian_match, without even setting options.scale_if_singular to true:

Initial matrix:
Real unsymmetric matrix, dimension 3x5 with 6 entries.
0:   2.0000E+00   1.0000E+00                                       
1:   1.0000E+00   4.0000E+00   1.0000E+00                1.0000E+00
2:                                                                 
==29266== Invalid write of size 4
==29266==    at 0x124E1B: __spral_scaling_MOD_hungarian_match (scaling.f90:1192)
==29266==    by 0x126BA1: __spral_scaling_MOD_hungarian_wrapper (scaling.f90:660)
==29266==    by 0x12AAFF: __spral_scaling_MOD_hungarian_scale_unsym_int64 (scaling.f90:225)
==29266==    by 0x12B0B6: __spral_scaling_MOD_hungarian_scale_unsym_int32 (scaling.f90:201)
==29266==    by 0x10F7F5: spral_scaling_hungarian_unsym (scaling.f90:723)
==29266==    by 0x1092AB: main (hungarian_unsym.c:27)
==29266==  Address 0x4ee02d0 is 4 bytes after a block of size 12 alloc'd
==29266==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==29266==    by 0x109258: main (hungarian_unsym.c:23)

The offending code

spral/src/scaling.f90

Lines 1174 to 1193 in 7fc5ec3

    
           ! Otherwise, matrix is structurally singular, complete iperm. 
        
           ! jperm, out are work arrays 
        
           jperm(1:n) = 0 
        
           k = 0 
        
           do i = 1, m 
        
              if (iperm(i) .eq. 0) then 
        
                 k = k + 1 
        
                 out(k) = i 
        
              else 
        
                 j = iperm(i) 
        
                 jperm(j) = i 
        
              end if 
        
           end do 
        
           k = 0 
        
           do j = 1, n 
        
              if (jperm(j) .ne. 0) cycle 
        
              k = k + 1 
        
              jdum = int(out(k)) 
        
              iperm(jdum) = -j 
        
           end do

looks dubious to me; for this example with one unmatched and two matched rows, one index is stored in out and two in jperm. That leaves three 0-entries in jperm, causing three indices to be read from out - despite it only containing one. So we essentially overflow and read what happens to remain in the work array from earlier, which just so happens to be out of bounds for iperm. I am not sure right now how this is supposed to work.

This "inner" issue definitely has to be fixed before fixing the "outer" issue can then also allow using options.scale_if_singular == true.

jfowkes · 2024-06-18T20:42:44Z

@nimgould any thoughts on what was intended here: #200 (comment)

nimgould · 2024-06-19T07:37:58Z

It is unclear to me what the purpose of this last 1174-1193 section is. The documentation implies that either iperm(i) > 0 gives a match of row i to column iperm(i) or iperm(i) = 0, and there is no match. But this last loop seems to be setting negative values to unmatched rows; what does this mean? And as @mjacobse says there is an inconsistency between the counts for the variable k when marching through the rows (first loop) and columns (second loop). The iperm array becomes the match array in hungarian_scale_unsym & hungarian_scale_sym, and I see no attempt there to correct negative components of match. And hungarian_wrapper is not used anywhere else in spral.

I would be tempted to comment out this segment and see what happens. Perhaps Jonathan had something else in mind?

mjacobse · 2024-06-19T08:05:49Z

Reassuring to hear that this section seems strange to you too, many thanks.

My understanding is that this function is from MC64, so while it is true that the SPRAL documentation says that 0 is returned for unmatched rows, MC64 itself returns negative entries according to its documentation (#200 (comment)). And indeed, as it stands the SPRAL documentation is incorrect, the hungarian_scale_[un]sym functions do return those negative entries as noticed in #205 (review).

Just returning 0 from hungarian_match leads to other problems in other SPRAL-internal uses of hungarian_match that do expect the negative entries, as we noticed in #205 (comment).

But I tend to agree that it might be easiest to just return 0 and adapt all uses to work with that instead of with negative entries. But to be honest I still do not really understand what further information the negative indices in MC64 provide, so perhaps I am missing how they are crucially important for some usecase.

jfowkes · 2024-06-19T08:39:23Z

According to @isduff the negative entries in MC64 are intended to signal which of the rows are problematic as the negative entry is in fact minus the index of the problematic row. Now the question is, does SSIDS actually use this information anywhere? If not, then I agree with @mjacobse that we should just standardise on returning zero for unmatched entries.

nimgould · 2024-06-19T08:49:39Z

All of this makes sense for square matrices, but for rectangular ones a 0 is simply a reflection that one cannot possibly match all rows to columns. But, to me anyway, rank defficiency is more than simply not having enough rows or columns, but that there are zero singular values. I do not see how the scheme before the offending lines (the bulk of the subroutine) distinguishes between the two, and thus the offending lines add nothing extra (except for memory trouble!). Thus, as I said earlier, I would simply comment this out.

jfowkes · 2024-06-19T08:54:20Z

Thanks @nimgould, it looks like we are converging on a solution.

jfowkes self-assigned this Jun 13, 2024

jfowkes added the bug label Jun 13, 2024

jfowkes removed their assignment Jun 13, 2024

chrhansk linked a pull request Jun 14, 2024 that will close this issue

Fix memory fault during scaling of singular matrix #205

Open

mjacobse mentioned this issue Jun 18, 2024

Dimension of argument match for matching-based scaling methods is unclear for non-square matrices #207

Closed

jfowkes mentioned this issue Jun 19, 2024

Avoid oversized arrays in scaling tests #214

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory fault during unsymmetric scaling of singular matrix with Hungarian algorithm #200

Memory fault during unsymmetric scaling of singular matrix with Hungarian algorithm #200

chrhansk commented Jun 8, 2024

mjacobse commented Jun 8, 2024 •

edited

Loading

chrhansk commented Jun 8, 2024

jfowkes commented Jun 10, 2024 •

edited

Loading

mjacobse commented Jun 18, 2024 •

edited

Loading

jfowkes commented Jun 18, 2024

nimgould commented Jun 19, 2024 •

edited

Loading

mjacobse commented Jun 19, 2024 •

edited

Loading

jfowkes commented Jun 19, 2024 •

edited

Loading

nimgould commented Jun 19, 2024

jfowkes commented Jun 19, 2024

Memory fault during unsymmetric scaling of singular matrix with Hungarian algorithm #200

Memory fault during unsymmetric scaling of singular matrix with Hungarian algorithm #200

Comments

chrhansk commented Jun 8, 2024

mjacobse commented Jun 8, 2024 • edited Loading

chrhansk commented Jun 8, 2024

jfowkes commented Jun 10, 2024 • edited Loading

mjacobse commented Jun 18, 2024 • edited Loading

jfowkes commented Jun 18, 2024

nimgould commented Jun 19, 2024 • edited Loading

mjacobse commented Jun 19, 2024 • edited Loading

jfowkes commented Jun 19, 2024 • edited Loading

nimgould commented Jun 19, 2024

jfowkes commented Jun 19, 2024

mjacobse commented Jun 8, 2024 •

edited

Loading

jfowkes commented Jun 10, 2024 •

edited

Loading

mjacobse commented Jun 18, 2024 •

edited

Loading

nimgould commented Jun 19, 2024 •

edited

Loading

mjacobse commented Jun 19, 2024 •

edited

Loading

jfowkes commented Jun 19, 2024 •

edited

Loading