Add `PyMem_Raw{New,Resize}` convenience macros #50

picnixz · 2024-11-29T20:03:31Z

We have convenience macros for PyMem_New and PyMem_Resize to allocate memory for n objects of the given type without checking if n * sizeof(T) would overflow but we don't have any for the raw allocators. I suggest adding those two macros.

@encukou expressed some reservations on the existing issue (python/cpython#127415) concerning their usefulness:

PyMem_RawCalloc(sizeof(T), n) isn't that much worse to type than PyMem_RawNew(T, n)

Concerning this point, there are benefits of using PyMem_RawNew over the existing PyMem_RawCalloc. First, PyMem_RawNew uses malloc() instead of calloc() which downstream users might find convenient for their behaviour (calloc() initializes the memory being allocated while malloc() doesn't) and, possibly, for their performances (calloc() might be slightly slower). Second, using the correct semantics may be useful for downstream users as well.

PyMem_Resize isn't very convenient if you want proper error handling.

Maybe so, but it's a documented way to resize memory and can be used directly as if (PyMem_Resize(...) == NULL) { ... } which could be convenient. Again, this may not necessarily be useful for CPython itself but we do provide convenience macros for the PyMem_* APIs, so I don't know why we wouldn't do the same for the PyMem_Raw* APIs.

The text was updated successfully, but these errors were encountered:

encukou · 2024-12-02T16:27:02Z

What's the reason to add these?
AFAIK, we generally want new additions to be demonstrably useful on their own.
In this case, AFAIK the Raw domain is intended to be much more low-level, so it can do without some convenience macros.

[PyMem_Resize] can be used directly as if (PyMem_Resize(...) == NULL) { ... } which could be convenient.

That's convenient, but on error it will forget the pointer's originaf value -- as PyMem_Resize says, you must “save the original value of p to avoid losing memory when handling errors.”
IMO, this one not a good convenience macro. It can save keystrokes, but makes up for that in mental load :(

picnixz · 2024-12-02T16:38:35Z

AFAIK, we generally want new additions to be demonstrably useful on their own.

When you do the multiplication, you can have an overflow. So you cannot just replace it by PyMem_RawMalloc(sizeof(T) * n) since sizeof(T) * n could overflow a size_t (n could be Py_ssize_t but we downcast it to size_t in the macro forms, so we should probably document this as well). With the convenience macro, the overflow check is (size_t)n < PY_SSIZE_T_MAX / sizeof(T) which would not overflow (assuming that n fits on a size_t)

For downstream users, that would be helpful (otherwise the overflow check in PyMem_RawMalloc does not make sense) and protect against overflows.

IMO, this one not a good convenience macro.

For the resize, I can live without it I think.

encukou · 2024-12-03T10:27:42Z

When you do the multiplication, you can have an overflow.

Right, there's some value in that, though I'm not convinced there's enough value. Are there any examples of projects that would want this?

The check isn't something that needs a macro, so if we want this, we should first add a function with a calloc-like signature, and then add a macro to do the macro-only part. (Or not -- users can easily type sizeof(TYPE) manually.)

Maybe we instead want a size_t PyMem_ArraySize(size_t element_size, size_t n) that returns an unallocatable size ((size_t)-1) on overflow? Then users could use that directly with the other functions, and we wouldn't be pressured to add a two-arg variant to every “alloc”, “realloc”, and “alloc-and-zero” function.

(But I still think we don't need this. We're not building a general-purpose C library here.)

picnixz · 2024-12-03T10:56:27Z

Are there any examples of projects that would want this?

Well, my first idea was CPython itself as a project.

I don't think I'll be able to convince you more on this matter :( But I don't have more arguments except the consistency with the existing PyMem_New macro and the overflow checks (together with the automatic type cast).

Should I tag the other members of the C API WG for voting?

vstinner · 2024-12-03T13:54:04Z

Well, my first idea was CPython itself as a project.

Do you have some examples?

picnixz · 2024-12-03T14:00:32Z

In BZ2_Malloc (similar in PyLzma_Malloc and PyZlib_Malloc)

static void*
BZ2_Malloc(void* ctx, int items, int size)
{
    if (items < 0 || size < 0)
        return NULL;
    if (size != 0 && (size_t)items > (size_t)PY_SSIZE_T_MAX / (size_t)size)
        return NULL;
    /* PyMem_Malloc() cannot be used: compress() and decompress()
       release the GIL */
    return PyMem_RawMalloc((size_t)items * (size_t)size);
}

In subprocess_fork_exec_impl:

        if (extra_group_size > 0) {
            extra_groups = PyMem_RawMalloc(extra_group_size * sizeof(gid_t));
            if (extra_groups == NULL) {
                PyErr_SetString(PyExc_MemoryError,
                        "failed to allocate memory for group list");
                goto cleanup;
            }
        }

In _Py_DecodeUTF8Ex:

    /* Note: size will always be longer than the resulting Unicode
       character count */
    if (PY_SSIZE_T_MAX / (Py_ssize_t)sizeof(wchar_t) - 1 < size) {
        return -1;
    }

    unicode = PyMem_RawMalloc((size + 1) * sizeof(wchar_t));
    if (!unicode) {
        return -1;
    }

In _Py_EncodeUTF8Ex:

    if (raw_malloc) {
        bytes = PyMem_RawMalloc((len + 1) * max_char_size);
    }

In decode_current_locale of fileutils.c:

        if (argsize > PY_SSIZE_T_MAX / sizeof(wchar_t) - 1) {
            return -1;
        }
        res = (wchar_t *)PyMem_RawMalloc((argsize + 1) * sizeof(wchar_t));
        if (!res) {
            return -1;
        }

And in some other places (perhaps 4-5 more). Note that there are many places where RawMalloc is invoked without a multiplied operand so those places are ok.

picnixz · 2024-12-03T14:01:41Z

Note that for the Unicode-related calls, we should perhaps leave them as is since size + 1 could overflow as well. Alternatively, deducing the expected size could be an alternative to the macro.

ZeroIntensity · 2024-12-03T14:55:56Z

IMO, if we want to make arrays for us easier, the more ideal solution would be something like an internal array API, not more allocator macros. For users, I could see some usage, but in my experience, users choose either the system malloc or PyMem_Malloc, never really the raw allocator. A code search is saying otherwise though.

vstinner · 2024-12-03T14:59:32Z

For users, I could see some usage, but in my experience, users choose either the system malloc or PyMem_Malloc, never really the raw allocator.

I added PyMem_RawMalloc() to be able to trace these memory allocation made by Python using the tracemalloc debug tool.

vstinner · 2024-12-03T15:03:13Z

I'm +0 on adding PyMem_RawNew() and PyMem_RawResize(). I can see the benefits of "built-in" integer overflow checks, but I'm also used to test manually for integer overflow. I don't have a strong opinion, since it's less common to use PyMem_RawMalloc() than using PyMem_Malloc()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `PyMem_Raw{New,Resize}` convenience macros #50

Add `PyMem_Raw{New,Resize}` convenience macros #50

picnixz commented Nov 29, 2024 •

edited

Loading

encukou commented Dec 2, 2024

picnixz commented Dec 2, 2024

encukou commented Dec 3, 2024

picnixz commented Dec 3, 2024

vstinner commented Dec 3, 2024

picnixz commented Dec 3, 2024

picnixz commented Dec 3, 2024 •

edited

Loading

ZeroIntensity commented Dec 3, 2024

vstinner commented Dec 3, 2024

vstinner commented Dec 3, 2024 •

edited

Loading

Add PyMem_Raw{New,Resize} convenience macros #50

Add PyMem_Raw{New,Resize} convenience macros #50

Comments

picnixz commented Nov 29, 2024 • edited Loading

encukou commented Dec 2, 2024

picnixz commented Dec 2, 2024

encukou commented Dec 3, 2024

picnixz commented Dec 3, 2024

vstinner commented Dec 3, 2024

picnixz commented Dec 3, 2024

picnixz commented Dec 3, 2024 • edited Loading

ZeroIntensity commented Dec 3, 2024

vstinner commented Dec 3, 2024

vstinner commented Dec 3, 2024 • edited Loading

Add `PyMem_Raw{New,Resize}` convenience macros #50

Add `PyMem_Raw{New,Resize}` convenience macros #50

picnixz commented Nov 29, 2024 •

edited

Loading

picnixz commented Dec 3, 2024 •

edited

Loading

vstinner commented Dec 3, 2024 •

edited

Loading