Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A problem with corrupted files #216

Open
Ri0n opened this issue Aug 18, 2023 · 7 comments
Open

A problem with corrupted files #216

Ri0n opened this issue Aug 18, 2023 · 7 comments

Comments

@Ri0n
Copy link
Contributor

Ri0n commented Aug 18, 2023

There is another thing I discovered with corrupted files.

matvar->data = calloc(nelems_x_nfields, matvar->data_size);

code like this may try to allocate enormous amount of memory. And I have an corrupted files which causes just this.

I thought to quickly fix it by adding the file size in mat_t struct and change all the allocations to check it. But since the file or some variable can be compressed it's hard to say what's the max size. Of course it's possible to use a multiplier to find approximate max of the max, but still it's sort of ugly.
Another idea would be to introduce a sort of compile-time constant which will be checked on every allocation, but probably this option is even worse.

more complex solution would be

  • set limit on maximal size of values which can be returned by Matio API and processed without Matio aid. For example max size of a string.
  • for everything else allocate memory in chunks and reallocate when necessary and/or design a multi-chunk structure and iterators over it. Maybe it's even better while iterating to handle just a current chunk, releasing the previous one.
  • track total amount of allocated/de-allocated memory to have an impression how much more it's still reasonable to allocate. But this is only useful if we can somehow know in advance the max amount of memory available for the parent container.
@Ri0n
Copy link
Contributor Author

Ri0n commented Aug 19, 2023

ah I just realized the line above allocates nelems_x_nfields of pointers to variables and for the actual data. Then it's definitely can be limited, most likely even in runtime.

@Ri0n
Copy link
Contributor Author

Ri0n commented Aug 19, 2023

Just an example which works for me.

diff --git a/src/mat.c b/src/mat.c
index 9c48578..20370db 100644
--- a/src/mat.c
+++ b/src/mat.c
@@ -506,6 +506,7 @@ Mat_Open(const char *matname, int mode)
     mat->version = 0;
     mat->byteswap = 0;
     mat->num_datasets = 0;
+    mat->struct_vars_limit = 0;
 #if defined(MAT73) && MAT73
     mat->refs_id = -1;
 #endif
@@ -2967,3 +2968,17 @@ Mat_VarWriteAppend(mat_t *mat, matvar_t *matvar, enum matio_compression compress
 
     return err;
 }
+
+/** @brief Limit max number of variables to allocate memory for
+ *
+ * This sets a limit on maximum number of variables which may be allocated for a struct.
+ * max number of variables is number of struct members * number of elements per member.
+ * By default there is not limit.
+ * @ingroup MAT
+ * @param mat MAT file
+ * @param limit max number of variables in the struct
+ */
+void Mat_VarSetSrtuctVarsLimit(mat_t *mat, size_t limit)
+{
+    mat->struct_vars_limit = limit;
+}
diff --git a/src/mat5.c b/src/mat5.c
index eb4b384..258b0d7 100644
--- a/src/mat5.c
+++ b/src/mat5.c
@@ -1468,6 +1468,10 @@ ReadNextStructField(mat_t *mat, matvar_t *matvar)
         if ( !matvar->nbytes )
             return bytesread;
 
+        if (mat->struct_vars_limit && nelems_x_nfields > mat->struct_vars_limit) {
+            Mat_Critical("Struct vars limit exceeded. Allocation failed");
+            return bytesread;
+        }
         matvar->data = calloc(nelems_x_nfields, matvar->data_size);
         if ( NULL == matvar->data ) {
             Mat_Critical("Couldn't allocate memory for the data");
@@ -1739,6 +1743,10 @@ ReadNextStructField(mat_t *mat, matvar_t *matvar)
         if ( !matvar->nbytes )
             return bytesread;
 
+        if (mat->struct_vars_limit && nelems_x_nfields > mat->struct_vars_limit) {
+            Mat_Critical("Struct vars limit exceeded. Allocation failed");
+            return bytesread;
+        }
         matvar->data = calloc(nelems_x_nfields, matvar->data_size);
         if ( NULL == matvar->data ) {
             Mat_Critical("Couldn't allocate memory for the data");
diff --git a/src/matio.h b/src/matio.h
index 8a4caeb..78e7168 100644
--- a/src/matio.h
+++ b/src/matio.h
@@ -347,6 +347,7 @@ EXTERN int Mat_VarWriteAppend(mat_t *mat, matvar_t *matvar, enum matio_compressi
 EXTERN int Mat_VarWriteInfo(mat_t *mat, matvar_t *matvar);
 EXTERN int Mat_VarWriteData(mat_t *mat, matvar_t *matvar, void *data, int *start, int *stride,
                             int *edge);
+EXTERN void Mat_VarSetSrtuctVarsLimit(mat_t *mat, size_t limit);
 
 /* Other functions */
 EXTERN int Mat_CalcSingleSubscript(int rank, int *dims, int *subs);
diff --git a/src/matio_private.h b/src/matio_private.h
index d6d22b6..51f9594 100644
--- a/src/matio_private.h
+++ b/src/matio_private.h
@@ -113,6 +113,7 @@ struct _mat_t
     mat_off_t bof;       /**< Beginning of file not including any header */
     size_t next_index;   /**< Index/File position of next variable to read */
     size_t num_datasets; /**< Number of datasets in the file */
+    size_t struct_vars_limit; /**< A limit to max number of variable in a struct. n_members * n_elems_per_member */
 #if defined(MAT73) && MAT73
     hid_t refs_id; /**< Id of the /#refs# group in HDF5 */
 #endif

@tbeu
Copy link
Owner

tbeu commented Aug 19, 2023

I wonder where you get those corrupted MAT files. I doubt it can be created by MATLAB save command.

@Ri0n
Copy link
Contributor Author

Ri0n commented Aug 19, 2023

I don't really know. Those come to me from other people. Maybe they have special software to generate all sort of invalid stuff just to check software stability.

@tbeu
Copy link
Owner

tbeu commented Oct 26, 2023

@Ri0n Do you think somethink like 0adfda6 could also work here?

@Ri0n
Copy link
Contributor Author

Ri0n commented Oct 26, 2023

@tbeu
I'm just worried about compressed stuff inside of mat files, which may trigger false positives for failure. But I'm not an expert to say where it's the case for this commit

@vladimir1899
Copy link

Hello, @tbeu , I continue the Ri0n's mission. So, 0adfda6 does not work for us, since the problem arises later in the code, when we invoke function:

if ( matvar->class_type == MAT_C_STRUCT ) (void)ReadNextStructField(mat, matvar);

mat5.c:5479 in this commit

In ReadNextStructField we have matvar->data = calloc(nelems_x_nfields, matvar->data_size); were nelems_x_nfields comes from our corrupted file and is unrealistically big.

I tried to open this file in the Matlab and it immediately throw the error kinda: "you need 45 GB of memory to open the file, but only 32 GB are available".

May be it is good idea to follow the Matlab way and implement such check, something like that ?

// Get total memory ram in bytes
#if ( defined(_WIN64) || defined(_WIN32) ) && !defined(__CYGWIN__)
#include <windows.h>
long long getPhysMemSize()
{
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);
    GlobalMemoryStatusEx(&status);
    return status.ullTotalPhys;
}
#define physRamVolumeBytes true;
#else
#include <sys/sysinfo.h>
long long getPhysMemSize()
{
    struct sysinfo info;
    sysinfo(&info);
    return info.totalram;
}
#define physRamVolumeBytes true;
#endif

// check before allocation
#ifdef physRamVolumeBytes
        if ( getPhysMemSize() < nelems_x_nfields * matvar->data_size )
        {
            Mat_Critical("Couldn't allocate memory for the data. Not enough memory");
            return bytesread;
        }
#endif
        matvar->data = calloc(nelems_x_nfields, matvar->data_size);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants