Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python reports vectorcall as available even when using it will not help and even hurt performance #123372

Closed
ChayimFriedman2 opened this issue Aug 27, 2024 · 2 comments
Labels
performance Performance or resource usage topic-C-API

Comments

@ChayimFriedman2
Copy link

ChayimFriedman2 commented Aug 27, 2024

Bug report

Bug description:

The following Python function

def foo(*args): pass

Has vectorcall enabled (PyVectorcall_Function() returns a non-NULL function), but because it collects args, Python needs to convert the arguments into a tuple, and so vectorcall won't be any faster than allocating a tuple directly.

This is not a problem if we know the number of arguments beforehand, but if we allocate the array for vectorcall, we will allocate twice needlessly.

Avoiding allocating for vectorcall also does not always provide the best performance, because vectorcall can have an edge with bound methods with the flag PY_VECTORCALL_ARGUMENTS_OFFSET.

Per the docs:

A class should not implement vectorcall if that would be slower than tp_call. For example, if the callee needs to convert the arguments to an args tuple and kwargs dict anyway, then there is no point in implementing vectorcall.

As such, I believe this function (and similar functions) should not implement vectorcall.

Originally reported on Stack Overflow.

CPython versions tested on:

3.12

Operating systems tested on:

Windows

@ChayimFriedman2 ChayimFriedman2 added the type-bug An unexpected behavior, bug, or error label Aug 27, 2024
@python python deleted a comment Aug 27, 2024
@python python deleted a comment Aug 27, 2024
@python python deleted a comment from amir1387aht Aug 27, 2024
@python python deleted a comment from amir1387aht Aug 27, 2024
@python python deleted a comment Aug 27, 2024
@python python deleted a comment Aug 27, 2024
@mdboom mdboom added the performance Performance or resource usage label Aug 27, 2024
@encukou
Copy link
Member

encukou commented Sep 13, 2024

As the docs mention, vectorcall support is a per-class property. For most Python functions, vectorcall is faster, so the function type implements vectorcall.

I believe this function (and similar functions) should not implement vectorcall.

It is possible to disable vectorcall function per instance, but that's not the only thing that would need to change: Python functions would also need to implement tuple-and-dict calls, which they currently don't. In fact, there is a substantial chunk of code that passes arguments in the vectorcall format -- from _PyFunction_Vectorcall to initialize_locals:

#6  0x00000000005b91d4 in initialize_locals at Python/ceval.c:1471
#7  0x00000000005ba259 in _PyEvalFramePushAndInit at Python/ceval.c:1733
#8  0x00000000005bef71 in _PyEval_EvalFrameDefault at Python/generated_cases.c.h:894
#9  0x00000000005d3a70 in _PyEval_EvalFrame at ./Include/internal/pycore_ceval.h:119
#10 0x00000000005d3b9e in _PyEval_Vector at Python/ceval.c:1851
#11 0x000000000049e580 in _PyFunction_Vectorcall at Objects/call.c:413

All this would need a separate code path in order to preserve a tuple. I doubt that it would be worth the speedup for (*args) functions.

@encukou encukou removed the type-bug An unexpected behavior, bug, or error label Sep 13, 2024
@encukou
Copy link
Member

encukou commented Sep 20, 2024

There is no clear way forward, so I'll close this.
If you want to push the idea forward it might be best to start with a benchmark at pyperformance, or an implementation draft.

@encukou encukou closed this as completed Sep 20, 2024
@encukou encukou closed this as not planned Won't fix, can't repro, duplicate, stale Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-C-API
Projects
None yet
Development

No branches or pull requests

4 participants