[WIP] experimental memref_stream unroll-and-jam pass using interpreter-based cost model #3724
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #3724      +/-   ##
==========================================
- Coverage   91.30%   91.28%    -0.02%
==========================================
  Files         468      471        +3
  Lines       58636    58697       +61
  Branches     5656     5661        +5
==========================================
+ Hits        53535    53581       +46
- Misses       3650     3662       +12
- Partials     1451     1454        +3
```

View full report in Codecov by Sentry.
Force-pushed from d0066cc to d770896.
In the marimo notebook, when I click on

uh oh
I have a feeling it was a Python 3.12 thing; it should work now.

OK, despite the CI still failing, it should work locally for you.
It's nice. I like the "lazy pass" pattern, and I like the fact that the unroll-and-jam is confined to the linalg level (rather than materializing loops). In the case of this cost model, is it the reduction in the number of jumps/blt instructions (thanks to unrolling) that lowers the cost?
Yep, basically. It should be relatively easy to make the cost model a bit smarter and to account for instruction latencies.
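To make the discussion above concrete, here is a minimal toy sketch (not the PR's actual cost model) of why unrolling reduces cost under a branch-counting model: the per-iteration body work stays the same, but unrolling by a factor divides the number of loop-back branches (`blt`). The function names and the `branch_cost` constant are illustrative assumptions, not part of the PR.

```python
# Toy cost model, illustrative only: unrolling by `factor` divides the
# number of backward branches while leaving total body work unchanged.

def branch_count(trip_count: int, unroll_factor: int) -> int:
    """Number of loop-back branches executed after unrolling."""
    return trip_count // unroll_factor


def cost(trip_count: int, body_cost: int, unroll_factor: int,
         branch_cost: int = 3) -> int:
    """Total cost = body work + branch overhead (hypothetical weights)."""
    return (trip_count * body_cost
            + branch_count(trip_count, unroll_factor) * branch_cost)


# Unrolling by 4 removes three quarters of the branches, so cost drops.
assert cost(64, 10, 1) > cost(64, 10, 4)
```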
Force-pushed from 59871dc to 45d57ed.
Most of the juicy stuff here is in the autotune.py marimo notebook, best experienced by checking out this branch and running `uv run marimo edit docs/marimo`. The main idea here is to have a system that proposes a bunch of rewrites, and then evaluates each of these rewrites based on an additional lowering plus interpreter tracing.
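The propose-then-evaluate loop described above can be sketched as follows. This is a hypothetical outline, not the notebook's implementation: `propose_rewrites`, `lower`, and `trace_cost` are stand-ins for the candidate generator, the additional lowering, and the interpreter-based cost measurement.

```python
# Hypothetical sketch of the autotuning loop: generate candidate rewrites,
# lower each one, trace it in an interpreter to get a cost, keep the best.
from typing import Callable, Iterable, TypeVar

M = TypeVar("M")  # stands in for a compiler IR module


def autotune(
    module: M,
    propose_rewrites: Callable[[M], Iterable[M]],  # candidate generator
    lower: Callable[[M], M],                       # additional lowering
    trace_cost: Callable[[M], float],              # interpreter tracing
) -> M:
    """Return the candidate whose lowered, traced cost is lowest."""
    best, best_cost = module, trace_cost(lower(module))
    for candidate in propose_rewrites(module):
        c = trace_cost(lower(candidate))
        if c < best_cost:
            best, best_cost = candidate, c
    return best
```

With strings standing in for modules and `len` as the cost, `autotune("dddd", lambda m: ["a", "bb"], lambda m: m, len)` picks the shortest candidate.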