-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory fault during unsymmetric scaling of singular matrix with Hungarian algorithm #200
Comments
The matching used in this scaling is documented to be based on MC64 Lines 936 to 938 in 662c7ac
which itself documents that "If the matrix is structurally singular [...], then the entries of [match] for which there is I'm not quite sure how the scaling has to handle such cases. Perhaps the negative entries of Lines 1487 to 1488 in 662c7ac
such that they are skipped in |
That is probably a good idea. Seeing as the |
It has been implemented in this way in MC64 so that the rows which cause the matrix to be structurally singular can be identified, but I agree that it probably makes sense to set the negative entries of |
I noticed that this actually already goes wrong before what was mentioned. This was hidden with match being allocated larger than necessary in the example (see #207). When allocating with size /* examples/C/scaling/hungarian_unsym.c - Example code for SPRAL_SCALING */
#include <stdlib.h>
#include <stdio.h>
#include "spral.h"
int main(void) {
/* Derived types */
struct spral_scaling_hungarian_options options;
struct spral_scaling_hungarian_inform inform;
/* Data for unsymmetric matrix:
* ( 2 1 )
* ( 1 4 1 1 )
* ( ) */
int m = 3, n = 5;
int ptr[] = { 0, 2, 4, 5, 5, 6 };
int row[] = { 0, 1, 0, 1, 1, 1 };
double val[] = { 2.0, 1.0, 1.0, 4.0, 1.0, 1.0 };
printf("Initial matrix:\n");
spral_print_matrix(-1, SPRAL_MATRIX_REAL_UNSYM, m, n, ptr, row, val, 0);
/* Perform symmetric scaling */
int* match = malloc(sizeof(int)*m);
double* rscaling = malloc(sizeof(double)*m);
double* cscaling = malloc(sizeof(double)*n);
spral_scaling_hungarian_default_options(&options);
spral_scaling_hungarian_unsym(m, n, ptr, row, val, rscaling, cscaling, match,
&options, &inform);
if(inform.flag<0) {
printf("spral_scaling_hungarian_unsym() returned with error %5d", inform.flag);
exit(1);
}
/* Print scaling and matching */
printf("Matching:");
for(int i=0; i<m; i++) printf(" %10d", match[i]);
printf("\nRow Scaling: ");
for(int i=0; i<m; i++) printf(" %10.2le", rscaling[i]);
printf("\nCol Scaling: ");
for(int i=0; i<n; i++) printf(" %10.2le", cscaling[i]);
printf("\n");
/* Calculate scaled matrix and print it */
for(int i=0; i<n; i++) {
for(int j=ptr[i]; j<ptr[i+1]; j++)
val[j] = rscaling[row[j]] * val[j] * cscaling[i];
}
printf("Scaled matrix:\n");
spral_print_matrix(-1, SPRAL_MATRIX_REAL_UNSYM, m, n, ptr, row, val, 0);
free(rscaling);
free(cscaling);
free(match);
/* Success */
return 0;
} we already get an error inside of
The offending code Lines 1174 to 1193 in 7fc5ec3
out and two in jperm . That leaves three 0-entries in jperm , causing three indices to be read from out - despite it only containing one. So we essentially overflow and read what happens to remain in the work array from earlier, which just so happens to be out of bounds for iperm . I am not sure right now how this is supposed to work.
This "inner" issue definitely has to be fixed before fixing the "outer" issue can then also allow using |
@nimgould any thoughts on what was intended here: #200 (comment) |
It is unclear to me what the purpose of this last 1174-1193 section is. The documentation implies that either iperm(i) > 0 gives a match of row i to column iperm(i) or iperm(i) = 0, and there is no match. But this last loop seems to be setting negative values to unmatched rows; what does this mean? And as @mjacobse says there is an inconsistency between the counts for the variable k when marching through the rows (first loop) and columns (second loop). The iperm array becomes the match array in hungarian_scale_unsym & hungarian_scale_sym, and I see no attempt there to correct negative components of match. And hungarian_wrapper is not used anywhere else in spral. I would be tempted to comment out this segment and see what happens. Perhaps Jonathan had something else in mind? |
Reassuring to hear that this section seems strange to you too, many thanks. My understanding is that this function is from MC64, so while it is true that the SPRAL documentation says that 0 is returned for unmatched rows, MC64 itself returns negative entries according to its documentation (#200 (comment)). And indeed, as it stands the SPRAL documentation is incorrect, the Just returning 0 from But I tend to agree that it might be easiest to just return 0 and adapt all uses to work with that instead of with negative entries. But to be honest I still do not really understand what further information the negative indices in MC64 provide, so perhaps I am missing how they are crucially important for some usecase. |
According to @isduff the negative entries in MC64 are intended to signal which of the rows are problematic as the negative entry is in fact minus the index of the problematic row. Now the question is, does SSIDS actually use this information anywhere? If not, then I agree with @mjacobse that we should just standardise on returning zero for unmatched entries. |
All of this makes sense for square matrices, but for rectangular ones a 0 is simply a reflection that one cannot possibly match all rows to columns. But, to me anyway, rank defficiency is more than simply not having enough rows or columns, but that there are zero singular values. I do not see how the scheme before the offending lines (the bulk of the subroutine) distinguishes between the two, and thus the offending lines add nothing extra (except for memory trouble!). Thus, as I said earlier, I would simply comment this out. |
Thanks @nimgould, it looks like we are converging on a solution. |
Consider the following example problem:
This outputs the following:
The matching has the clearly invalid entry 32522 (random, changing at each iteration). There is also a valgrind error:
The offending line attempts to access the
rscaling
variable at thematch(i)
index, wherei = 3
andmatch = (1, 5, -3)
. It is clearly infeasible to access the array at a negative index, which is causing the initial memory fault. On the other hand, I don't know why the index becomes negative in the first place...The text was updated successfully, but these errors were encountered: