
Low effort, precise specialization in tier 2 using lookup tables, replace and replicate. #660

Open
markshannon opened this issue Mar 7, 2024 · 5 comments

Comments

@markshannon
Member

markshannon commented Mar 7, 2024

The primary motivation for this is BINARY_OP, but it can apply to BINARY_SUBSCR, COMPARE_OP, and CONTAINS_OP.

The idea is that we add a single tier 1 specialization that contains a table lookup. The table is a const table of function pointers.
That is:

replaced op(_BINARY_OP_TABLE_LOOKUP, (index/1, left, right -- res)) {
    res = BinaryOpTable[index](left, right);
}
macro(BINARY_OP_TABLE) = _BINARY_TYPE_CHECK + _BINARY_OP_TABLE_LOOKUP;
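
For concreteness, here is a minimal standalone C sketch of what such a table and one of its entries could look like. The names binary_op_fn and binary_op_add_int, and the use of PyNumber_Add, are illustrative assumptions for this sketch, not existing CPython symbols or the intended fast paths:

#include <Python.h>

typedef PyObject *(*binary_op_fn)(PyObject *left, PyObject *right);

static PyObject *
binary_op_add_int(PyObject *left, PyObject *right)
{
    /* Placeholder body: a real entry would rely on _BINARY_TYPE_CHECK having
       already verified the operand types and would call the int fast path
       directly instead of the generic PyNumber_Add. */
    return PyNumber_Add(left, right);
}

static const binary_op_fn BinaryOpTable[] = {
    binary_op_add_int,
    /* ... one entry per (operator, operand types) pair we choose to cover ... */
};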

We replace _BINARY_OP_TABLE_LOOKUP in tier 2 with:

replicate(N) op(_BINARY_OP_INLINE, (left, right -- res)) {
    res = BinaryOpTable[oparg](left, right);
}

Note how _BINARY_OP_INLINE replaces index with oparg.
As oparg is a compile-time constant in each replicated copy, the C compiler will (at least it should) eliminate the table lookup.
If we declare the relevant functions as inline, they will be inlined and we end up with an optimal tier 2 uop.
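
The constant-folding claim can be illustrated outside CPython with a small self-contained C sketch (all names below are made up for the example): indexing a static const table of pointers to inline functions with a literal index is something current optimizing compilers can generally turn into a direct, inlined call, although this still needs to be verified experimentally for the compilers and table shapes we actually care about.

typedef long (*binop_fn)(long, long);

static inline long add_fn(long a, long b) { return a + b; }
static inline long mul_fn(long a, long b) { return a * b; }

static const binop_fn table[] = { add_fn, mul_fn };

long
replicated_uop_body(long left, long right)
{
    /* The literal 0 plays the role of the hard-coded oparg in one replicated
       copy of the uop: the compiler can resolve table[0] at compile time and
       inline add_fn here. */
    return table[0](left, right);
}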

@Fidget-Spinner
Collaborator

This will slow down tier 1, though. The last time we tried this on BINARY_SUBSCR way back, it was a wash in perf. Are you hoping that tier 2 will take over soon enough that that doesn't matter?

@markshannon
Member Author

If we eliminate the existing specializations of BINARY_OP, specifically int and float addition and multiplication, it might slow things down. We can leave those and add BINARY_OP_TABLE for all the other cases if it gives better performance.

@markshannon
Member Author

markshannon commented Mar 7, 2024

> The last time we tried this on BINARY_SUBSCR way back, it was a wash in perf. Are you hoping that tier 2 will take over soon enough that that doesn't matter?

I don't expect this to help in tier 1. We should expect a speedup in tier 2.
So, as long as it isn't slower for tier 1, we should get an overall speedup.

@Fidget-Spinner
Collaborator

> If we eliminate the existing specializations of BINARY_OP, specifically int and float addition and multiplication, it might slow things down. We can leave those and add BINARY_OP_TABLE for all the other cases if it gives better performance.

Makes sense to me.

@gvanrossum
Collaborator

> As oparg is a compile-time constant in each replicated copy, the C compiler will (at least it should) eliminate the table lookup.
> If we declare the relevant functions as inline, they will be inlined and we end up with an optimal tier 2 uop.

This expresses a lot of confidence in the compiler (taking a const function pointer and expanding the call inline). Have you tried it experimentally to see whether that actually happens?

What would _BINARY_TYPE_CHECK do?

Suppose we'd want to do this for BINARY_OP_ADD_UNICODE: what would be in the table? I guess a function that's the equivalent of _BINARY_OP_ADD_UNICODE -- but that has an ERROR_IF() call, which might be tricky (it uses goto). Assuming we solve that (or use a different example that doesn't error), would _BINARY_TYPE_CHECK also use a table? With a different-valued index, perhaps? (I'm sure we can fit two 8-bit index values in a single cache entry -- but that's still one more than BINARY_OP currently has.)
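
One possible shape for such a table entry, sketched under the assumption that entries report failure by returning NULL (the function name here is hypothetical): the entry itself then needs no goto, and the uop that performs the lookup can follow the call with the usual ERROR_IF(res == NULL, error) that other instructions already use.

static PyObject *
table_entry_add_unicode(PyObject *left, PyObject *right)
{
    /* PyUnicode_Concat returns NULL and sets an exception on failure, so the
       error is propagated to the calling uop without any goto in the entry. */
    return PyUnicode_Concat(left, right);
}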
