[mypyc] Avoid boxing/unboxing when coercing between tuple types #14899
Instead, coerce each tuple item individually. This makes some coercions between tuple types much faster, primarily because there is less (or no) allocation and deallocation going on. This speeds up the raytrace benchmark by about 7% (when using native floats). Related to mypyc/mypyc#99.
@@ -666,13 +665,11 @@ L0:
     return r0
 def g():
     r0 :: tuple[int, int]
-    r1 :: object
-    r2 :: tuple[int64, int64]
+    r1 :: tuple[int64, int64]
 L0:
     r0 = (2, 4)
It's a little unfortunate that this remains in the IR, but it's nothing DCE can't fix if we ever implement it.
tuple_T288 CPyDef_g(void) {
    tuple_T2II cpy_r_r0;
    tuple_T288 cpy_r_r1;
CPyL0: ;
    cpy_r_r0.f0 = 2;
    cpy_r_r0.f1 = 4;
    CPyTagged_INCREF(cpy_r_r0.f0);
    CPyTagged_INCREF(cpy_r_r0.f1);
    CPyTagged_DECREF(cpy_r_r0.f0);
    CPyTagged_DECREF(cpy_r_r0.f1);
    cpy_r_r1.f0 = 1;
    cpy_r_r1.f1 = 2;
    return cpy_r_r1;
}
The C compiler may optimize some of these away. Also, optimizing these shouldn't be too difficult in mypyc. Added mypyc/mypyc#983 to track this.
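A note on the generated C above: r0 holds 2 and 4 while r1 holds 1 and 2 because the two tuples use different item representations. The sketch below models this under the assumption that mypyc stores a short int as a tagged value (the integer shifted left by one bit, low bit clear), while int64 items are plain machine integers. The helper names are hypothetical and only illustrate the per-item coercion; they are not mypyc APIs.

```python
# Illustrative model only: these helpers are hypothetical, not part of mypyc.

def tag_short_int(value: int) -> int:
    """Model the assumed tagged representation of a short int: value << 1."""
    return value << 1


def untag_short_int(tagged: int) -> int:
    """Recover the plain integer from a tagged short int (low bit must be 0)."""
    assert tagged & 1 == 0, "low bit set would mean a boxed (heap) integer"
    return tagged >> 1


def coerce_tuple_items(tagged_items: tuple) -> tuple:
    """Coerce each item on its own, without building an intermediate boxed tuple."""
    return tuple(untag_short_int(item) for item in tagged_items)


# The Python-level tuple (1, 2) as a tuple[int, int] of tagged values.
r0 = (tag_short_int(1), tag_short_int(2))
# The same tuple as tuple[int64, int64]: plain machine integers.
r1 = coerce_tuple_items(r0)
```

Here r0 corresponds to the fields written into cpy_r_r0 and r1 to the fields written into cpy_r_r1.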
# We can't reuse register values, since they can be modified.
if not isinstance(item, Register):
This is not a blocking comment, more a question for my own learning: in what situations can a register be modified during the coerce operation?
This is about code like this:
t = (n, 0)
n += 1
t2: tuple[float, float] = t
We can't use `n` to refer to the first item of `t` when constructing `t2`, since `n` was incremented on line 2.
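A runnable version of the snippet makes the hazard concrete. This is a minimal sketch in plain Python; the function name `build` and the argument value are made up for illustration. The tuple captures the value `n` had when it was built, so the coerced tuple must read its first item from `t`, not from the current value of `n`.

```python
from __future__ import annotations


def build(n: int) -> tuple[float, float]:
    t = (n, 0)  # t captures the value of n at this point
    n += 1      # n changes afterwards; t must not be affected
    t2: tuple[float, float] = t  # coercion has to use t's items, not n
    return t2


result = build(5)
# Reusing the register for n here would wrongly yield 6 as the first item.
```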
Ah right. I should've thought of that, painfully obvious now you show it. Thanks!