Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmap lacks error handling (SEH) on Windows which can lead to interpreter crashes #118209

Closed
Dobatymo opened this issue Apr 24, 2024 · 4 comments
Closed
Labels
extension-modules C modules in the Modules dir OS-windows type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@Dobatymo
Copy link
Contributor

Dobatymo commented Apr 24, 2024

Crash report

What happened?

mmap reads and writes require structured exception handling (SEH) for correct handling of errors (like hardware read issues, or disk full write issues) on Windows. See https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-mapviewoffile#remarks for a description of the possible exceptions and https://learn.microsoft.com/en-us/windows/win32/memory/reading-and-writing-from-a-file-view for how to handle them.

I found this issue when trying to read a file from a corrupt hdd. A normal open().read() would raise a OSError: [Errno 22] Invalid argument exception. Whereas doing the same using mmap crashes the interpreter.

I only tested on Windows 7 and Python 3.8, but from looking at the Microsoft docs and the Python source code, this issue should exist for all Windows versions and the latest Python source. (I don't have access to the broken file from a newer machine)

I think there is nothing here which guards against these errors

static PyObject *
mmap_read_method(mmap_object *self,
PyObject *args)
{
Py_ssize_t num_bytes = PY_SSIZE_T_MAX, remaining;
PyObject *result;
CHECK_VALID(NULL);
if (!PyArg_ParseTuple(args, "|O&:read", _Py_convert_optional_to_ssize_t, &num_bytes))
return NULL;
CHECK_VALID(NULL);
/* silently 'adjust' out-of-range requests */
remaining = (self->pos < self->size) ? self->size - self->pos : 0;
if (num_bytes < 0 || num_bytes > remaining)
num_bytes = remaining;
result = PyBytes_FromStringAndSize(&self->data[self->pos], num_bytes);
self->pos += num_bytes;
return result;
}

with open(path, "rb") as fp:
    with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as fr:
        fr.read(1024*8)   # this crashes when `path` is on a corrupt hdd for example.

CPython versions tested on:

3.8

Operating systems tested on:

Windows

Output from running 'python -VV' on the command line:

No response

Linked PRs

@Dobatymo Dobatymo added the type-crash A hard crash of the interpreter, possibly with a core dump label Apr 24, 2024
@Eclips4 Eclips4 added OS-windows extension-modules C modules in the Modules dir labels Apr 24, 2024
@zooba
Copy link
Member

zooba commented Apr 24, 2024

Posted this on the PR, but repeating here:

One other possibility may be to protect PyBytes_FromString internally, which could cut off an entire class of crashes instead of just one. Thoughts?

@eryksun
Copy link
Contributor

eryksun commented Apr 24, 2024

On POSIX, the mmap type doesn't currently support handling synchronous SIGSEGV or SIGBUS signals generated by an error while reading or writing in the address range of a memory-mapped file. That requires a handler set by sigaction() that's configured with SA_SIGINFO in order to be able to check whether the faulting si_addr is in the range of the mapped file.

Is it premature to implement SEH or VEH support for this on Windows, given nothing similar is currently implemented on POSIX?

@zooba
Copy link
Member

zooba commented Apr 24, 2024

Is it premature to implement SEH or VEH support for this on Windows, given nothing similar is currently implemented on POSIX?

Something has to be implemented first, and the likelihood of finding someone capable of doing both is pretty slim.

I think we're best to implement one first, but in a way that's easily adapted to support other platforms (e.g. my safe_memcpy suggestion on the PR). That way another contributor has the framework for adding platform support, and we don't have to force collaboration or conflict resolution too early.

@zooba
Copy link
Member

zooba commented May 10, 2024

Thanks for the contribution! This is now merged into 3.13 for beta 2.

If someone would like to add POSIX/signal support, please feel free, but let's do it on a new issue. Similarly with any additional operations that we can protect using SEH.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir OS-windows type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

4 participants