Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PyMem_Raw{New,Resize} convenience macros #50

Open
picnixz opened this issue Nov 29, 2024 · 10 comments
Open

Add PyMem_Raw{New,Resize} convenience macros #50

picnixz opened this issue Nov 29, 2024 · 10 comments

Comments

@picnixz
Copy link

picnixz commented Nov 29, 2024

We have convenience macros for PyMem_New and PyMem_Resize to allocate memory for n objects of the given type without checking if n * sizeof(T) would overflow but we don't have any for the raw allocators. I suggest adding those two macros.

@encukou expressed some reservations on the existing issue (python/cpython#127415) concerning their usefulness:

PyMem_RawCalloc(sizeof(T), n) isn't that much worse to type than PyMem_RawNew(T, n)

Concerning this point, there are benefits of using PyMem_RawNew over the existing PyMem_RawCalloc. First, PyMem_RawNew uses malloc() instead of calloc() which downstream users might find convenient for their behaviour (calloc() initializes the memory being allocated while malloc() doesn't) and, possibly, for their performances (calloc() might be slightly slower). Second, using the correct semantics may be useful for downstream users as well.

PyMem_Resize isn't very convenient if you want proper error handling.

Maybe so, but it's a documented way to resize memory and can be used directly as if (PyMem_Resize(...) == NULL) { ... } which could be convenient. Again, this may not necessarily be useful for CPython itself but we do provide convenience macros for the PyMem_* APIs, so I don't know why we wouldn't do the same for the PyMem_Raw* APIs.

@encukou
Copy link
Collaborator

encukou commented Dec 2, 2024

What's the reason to add these?
AFAIK, we generally want new additions to be demonstrably useful on their own.
In this case, AFAIK the Raw domain is intended to be much more low-level, so it can do without some convenience macros.


[PyMem_Resize] can be used directly as if (PyMem_Resize(...) == NULL) { ... } which could be convenient.

That's convenient, but on error it will forget the pointer's originaf value -- as PyMem_Resize says, you must “save the original value of p to avoid losing memory when handling errors.”
IMO, this one not a good convenience macro. It can save keystrokes, but makes up for that in mental load :(

@picnixz
Copy link
Author

picnixz commented Dec 2, 2024

AFAIK, we generally want new additions to be demonstrably useful on their own.

When you do the multiplication, you can have an overflow. So you cannot just replace it by PyMem_RawMalloc(sizeof(T) * n) since sizeof(T) * n could overflow a size_t (n could be Py_ssize_t but we downcast it to size_t in the macro forms, so we should probably document this as well). With the convenience macro, the overflow check is (size_t)n < PY_SSIZE_T_MAX / sizeof(T) which would not overflow (assuming that n fits on a size_t)

For downstream users, that would be helpful (otherwise the overflow check in PyMem_RawMalloc does not make sense) and protect against overflows.

IMO, this one not a good convenience macro.

For the resize, I can live without it I think.

@encukou
Copy link
Collaborator

encukou commented Dec 3, 2024

When you do the multiplication, you can have an overflow.

Right, there's some value in that, though I'm not convinced there's enough value. Are there any examples of projects that would want this?

The check isn't something that needs a macro, so if we want this, we should first add a function with a calloc-like signature, and then add a macro to do the macro-only part. (Or not -- users can easily type sizeof(TYPE) manually.)

Maybe we instead want a size_t PyMem_ArraySize(size_t element_size, size_t n) that returns an unallocatable size ((size_t)-1) on overflow? Then users could use that directly with the other functions, and we wouldn't be pressured to add a two-arg variant to every “alloc”, “realloc”, and “alloc-and-zero” function.

(But I still think we don't need this. We're not building a general-purpose C library here.)

@picnixz
Copy link
Author

picnixz commented Dec 3, 2024

Are there any examples of projects that would want this?

Well, my first idea was CPython itself as a project.


I don't think I'll be able to convince you more on this matter :( But I don't have more arguments except the consistency with the existing PyMem_New macro and the overflow checks (together with the automatic type cast).

Should I tag the other members of the C API WG for voting?

@vstinner
Copy link

vstinner commented Dec 3, 2024

Well, my first idea was CPython itself as a project.

Do you have some examples?

@picnixz
Copy link
Author

picnixz commented Dec 3, 2024

In BZ2_Malloc (similar in PyLzma_Malloc and PyZlib_Malloc)

static void*
BZ2_Malloc(void* ctx, int items, int size)
{
    if (items < 0 || size < 0)
        return NULL;
    if (size != 0 && (size_t)items > (size_t)PY_SSIZE_T_MAX / (size_t)size)
        return NULL;
    /* PyMem_Malloc() cannot be used: compress() and decompress()
       release the GIL */
    return PyMem_RawMalloc((size_t)items * (size_t)size);
}

In subprocess_fork_exec_impl:

        if (extra_group_size > 0) {
            extra_groups = PyMem_RawMalloc(extra_group_size * sizeof(gid_t));
            if (extra_groups == NULL) {
                PyErr_SetString(PyExc_MemoryError,
                        "failed to allocate memory for group list");
                goto cleanup;
            }
        }

In _Py_DecodeUTF8Ex:

    /* Note: size will always be longer than the resulting Unicode
       character count */
    if (PY_SSIZE_T_MAX / (Py_ssize_t)sizeof(wchar_t) - 1 < size) {
        return -1;
    }

    unicode = PyMem_RawMalloc((size + 1) * sizeof(wchar_t));
    if (!unicode) {
        return -1;
    }

In _Py_EncodeUTF8Ex:

    if (raw_malloc) {
        bytes = PyMem_RawMalloc((len + 1) * max_char_size);
    }

In decode_current_locale of fileutils.c:

        if (argsize > PY_SSIZE_T_MAX / sizeof(wchar_t) - 1) {
            return -1;
        }
        res = (wchar_t *)PyMem_RawMalloc((argsize + 1) * sizeof(wchar_t));
        if (!res) {
            return -1;
        }

And in some other places (perhaps 4-5 more). Note that there are many places where RawMalloc is invoked without a multiplied operand so those places are ok.

@picnixz
Copy link
Author

picnixz commented Dec 3, 2024

Note that for the Unicode-related calls, we should perhaps leave them as is since size + 1 could overflow as well. Alternatively, deducing the expected size could be an alternative to the macro.

@ZeroIntensity
Copy link

IMO, if we want to make arrays for us easier, the more ideal solution would be something like an internal array API, not more allocator macros. For users, I could see some usage, but in my experience, users choose either the system malloc or PyMem_Malloc, never really the raw allocator. A code search is saying otherwise though.

@vstinner
Copy link

vstinner commented Dec 3, 2024

For users, I could see some usage, but in my experience, users choose either the system malloc or PyMem_Malloc, never really the raw allocator.

I added PyMem_RawMalloc() to be able to trace these memory allocation made by Python using the tracemalloc debug tool.

@vstinner
Copy link

vstinner commented Dec 3, 2024

I'm +0 on adding PyMem_RawNew() and PyMem_RawResize(). I can see the benefits of "built-in" integer overflow checks, but I'm also used to test manually for integer overflow. I don't have a strong opinion, since it's less common to use PyMem_RawMalloc() than using PyMem_Malloc()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants