Tuple-dialect has performance consequences (anonymous functions?) #17126
Comments
Great catch as always, @Jutho. Doesn't seem to affect the timing, though; the reason the |
Worth noting that making the top-level functions |
Hmm... That sounds a bit strict. The threads I read defined "pure" as any function the programmer is happy to tell the compiler it can run at any time (once at compile time, more than once at run time, etc.) without worrying about the consequences for program logic. The specific implementation of the compiler will affect speed, inference quality, and so on, but will not change the program output.
Follow-up: Obviously, having it be a function of types makes it most obvious that it can be run at compile time, but inference now tracks many constant values too...
I guess part of my thinking was that it would be nice to mark Would be great to be able to say "do the expansion at compile time, but save the evaluation of the user function for run time" (naturally, |
I think you are safe to mark |
Based on discussion in #17729,
It feels like we (you @timholy) keep finding these cases where adding explicit |
I would love to see improvements in our inlining heuristics. I don't understand them at all, so that's not an area I've looked at or plan to look at. I try to be fairly conservative about not adding |
This may be #15276, but it seems that issue has various gradations of challenge, so perhaps additional examples are useful.
This contains test code implementing the equivalent of `map(s->1:s, size(A))` using 3 different strategies: one based on `ntuple`, several variants of `map`, and one that manually assembles the tuples. On my laptop, the variation in performance of these methods appears to be >5x; presumably, we'd prefer that they all be the same (and all good!). Naturally, the best of these is also the most laborious (the manual method), and it beats all the others by a factor of 2.
This is admittedly nitty-gritty microoptimization, but was necessary to get #16260 to pass nanosoldier; as I work on the cleanup (#16973), I thought I'd better take the time to document some of the challenges.
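For readers without the linked test code at hand, here is a minimal sketch of what the three strategies might look like. It is not the code from this issue: the function names (`ranges_ntuple`, `ranges_map`, `ranges_manual`, `_ranges`) are hypothetical, and it assumes Julia 1.x syntax (`where` clauses, `Val(N)`) rather than the version current when this was filed. It only shows that the three approaches produce the same tuple; it says nothing about their relative speed.

```julia
# Illustrative sketch only; all function names here are hypothetical.

# Strategy 1: ntuple with an anonymous function over the dimension index.
ranges_ntuple(A::AbstractArray{T,N}) where {T,N} =
    ntuple(d -> 1:size(A, d), Val(N))

# Strategy 2: map an anonymous function over the size tuple.
ranges_map(A::AbstractArray) = map(s -> 1:s, size(A))

# Strategy 3: assemble the tuple manually, peeling off one size at a time.
ranges_manual(A::AbstractArray) = _ranges((), size(A)...)
_ranges(out) = out                                          # no sizes left
_ranges(out, s, rest...) = _ranges((out..., 1:s), rest...)  # append 1:s, recurse

A = zeros(2, 3, 4)
r1, r2, r3 = ranges_ntuple(A), ranges_map(A), ranges_manual(A)
```

To compare timings on a current Julia build, one could wrap each call in a benchmarking macro such as `@btime` from BenchmarkTools.jl (an external package, not used in the sketch above).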
Results: