Optimization of valid index calculation #498

eendebakpt · 2022-11-18T19:59:29Z

In listobject.c there is an optimization to check whether an index is valid (e.g. 0 <= index < N) using a single comparison.
The same optimization is not used in other files such as tupleobject.c (except for _collectionsmodule.c)

Is there a reason the optimization is used only for the list and not a tuple?
Would a PR applying this optimization be acceptable? (either only for the tuple, or for all locations where we can do this)

I created a branch with the idea for the tupleobject.c since that might be the most performance critical:

python/cpython@main...eendebakpt:cpython:list_tuple

I could not measure significant performance improvements on my system, but perhaps there are other systems/compilers where there is a measurable difference.

The text was updated successfully, but these errors were encountered:

brandtbucher · 2022-11-18T20:11:48Z

If we do this, we should just define a nice, readable macro for it in pymacro.h and stop reinventing the wheel everywhere. We do similar moves, for example, all over bytecodes.c.

eendebakpt · 2022-11-22T20:51:33Z

If we do this, we should just define a nice, readable macro for it in pymacro.h and stop reinventing the wheel everywhere. We do similar moves, for example, all over bytecodes.c.

@brandtbucher Do you mean bytesarray.c or bytesobject.c? In those I can find index checks, but not in bytecodes.c.

Putting a macro in pymacro.h seems like a good idea. A real macro, or an inline function then? Such a change would make the code more consistent, but it would also mean many (very small) changes in the codebase.

gvanrossum · 2022-11-22T21:15:42Z

I think Brandt is referring to things like

            Py_ssize_t signed_magnitude = Py_SIZE(sub);
            DEOPT_IF(((size_t)signed_magnitude) > 1, BINARY_SUBSCR);

which I found at line 386 in Python/bytecodes.c.

eendebakpt · 2022-11-24T22:02:47Z

That is indeed a similar trick! I found 3 occurrences in bytecodes.c, the code then changes from:

DEOPT_IF(((size_t)signed_magnitude) > 1, BINARY_SUBSCR);

to

DEOPT_IF(invalid_index(signed_magnitude, 2), BINARY_SUBSCR);

JelleZijlstra · 2022-11-24T23:44:59Z

Does this actually make a difference in practice? Can't the compiler already do this optimization?

eendebakpt · 2022-11-25T10:09:13Z

Does this actually make a difference in practice? Can't the compiler already do this optimization?

The PR is created is more about consistency than optimization. With optimizations on there is a difference in the generated code (an extra test %rsi,%rsi, see details below), but I do not think it makes a large difference (perhaps only in the tuple/list indexing or the usage in bytecodes.c)

Output of `gdb -batch -ex "disassemble/rs tupleitem" ./python`

Main build with optimizations on:

Dump of assembler code for function tupleitem:
Objects/tupleobject.c:
363	{
   0x0000000000244e50 <+0>:	f3 0f 1e fa	endbr64 

364	    if (i < 0 || i >= Py_SIZE(a)) {
   0x0000000000244e54 <+4>:	48 85 f6	test   %rsi,%rsi
   0x0000000000244e57 <+7>:	78 10	js     0x244e69 <tupleitem+25>

./Include/object.h:
144	    return var_ob->ob_size;
   0x0000000000244e59 <+9>:	48 3b 77 10	cmp    0x10(%rdi),%rsi
   0x0000000000244e5d <+13>:	7d 0a	jge    0x244e69 <tupleitem+25>

Objects/tupleobject.c:
368	    return Py_NewRef(a->ob_item[i]);
   0x0000000000244e5f <+15>:	48 8b 44 f7 18	mov    0x18(%rdi,%rsi,8),%rax

./Include/object.h:
641	    Py_INCREF(obj);
   0x0000000000244e64 <+20>:	48 83 00 01	addq   $0x1,(%rax)

642	    return obj;
   0x0000000000244e68 <+24>:	c3	retq   

Objects/tupleobject.c:
365	        PyErr_SetString(PyExc_IndexError, "tuple index out of range");
   0x0000000000244e69 <+25>:	50	push   %rax
   0x0000000000244e6a <+26>:	48 8b 3d 8f 52 28 00	mov    0x28528f(%rip),%rdi        # 0x4ca100 <PyExc_IndexError>
   0x0000000000244e71 <+33>:	48 8d 35 fa e5 12 00	lea    0x12e5fa(%rip),%rsi        # 0x373472
   0x0000000000244e78 <+40>:	e8 13 17 05 00	callq  0x296590 <PyErr_SetString>

366	        return NULL;
   0x0000000000244e7d <+45>:	31 c0	xor    %eax,%eax
   0x0000000000244e7f <+47>:	5a	pop    %rdx
   0x0000000000244e80 <+48>:	c3	retq   
End of assembler dump.

PR build with optimizations on:

Dump of assembler code for function tupleitem:
Objects/tupleobject.c:
363	{
   0x0000000000245820 <+0>:	f3 0f 1e fa	endbr64 

./Include/pymacro.h:
171	    return (size_t)i >= (size_t)limit;
   0x0000000000245824 <+4>:	48 39 77 10	cmp    %rsi,0x10(%rdi)
   0x0000000000245828 <+8>:	76 0a	jbe    0x245834 <tupleitem+20>

Objects/tupleobject.c:
368	    return Py_NewRef(a->ob_item[i]);
   0x000000000024582a <+10>:	48 8b 44 f7 18	mov    0x18(%rdi,%rsi,8),%rax

./Include/object.h:
646	    Py_INCREF(obj);
   0x000000000024582f <+15>:	48 83 00 01	addq   $0x1,(%rax)

647	    return obj;
   0x0000000000245833 <+19>:	c3	retq   

Objects/tupleobject.c:
365	        PyErr_SetString(PyExc_IndexError, "tuple index out of range");
   0x0000000000245834 <+20>:	50	push   %rax
   0x0000000000245835 <+21>:	48 8b 3d c4 48 28 00	mov    0x2848c4(%rip),%rdi        # 0x4ca100 <PyExc_IndexError>
   0x000000000024583c <+28>:	48 8d 35 2f dc 12 00	lea    0x12dc2f(%rip),%rsi        # 0x373472
   0x0000000000245843 <+35>:	e8 68 17 05 00	callq  0x296fb0 <PyErr_SetString>

366	        return NULL;
   0x0000000000245848 <+40>:	31 c0	xor    %eax,%eax
   0x000000000024584a <+42>:	5a	pop    %rdx
   0x000000000024584b <+43>:	c3	retq   
End of assembler dump.

System: linux, gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

eendebakpt mentioned this issue Nov 24, 2022

Consistent use of valid index check python/cpython#99761

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimization of valid index calculation #498

Optimization of valid index calculation #498

eendebakpt commented Nov 18, 2022

brandtbucher commented Nov 18, 2022

eendebakpt commented Nov 22, 2022

gvanrossum commented Nov 22, 2022

eendebakpt commented Nov 24, 2022

JelleZijlstra commented Nov 24, 2022

eendebakpt commented Nov 25, 2022 •

edited

Loading

Optimization of valid index calculation #498

Optimization of valid index calculation #498

Comments

eendebakpt commented Nov 18, 2022

brandtbucher commented Nov 18, 2022

eendebakpt commented Nov 22, 2022

gvanrossum commented Nov 22, 2022

eendebakpt commented Nov 24, 2022

JelleZijlstra commented Nov 24, 2022

eendebakpt commented Nov 25, 2022 • edited Loading

eendebakpt commented Nov 25, 2022 •

edited

Loading