[AIEX] Delay metalizing of multi-slot until iterative scheduling is converged #182
base: aie-public
@martien-de-jong, @andcarminati.
Hi @krishnamtibrewala, nice work! Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads up).
Could we have a test example where we see improvement?
46df116
to
54977ed
Compare
Do you have some results for the PixelShuffle*/PixelUnshuffle* benchmarks? If I remember correctly, we have some suboptimal mov desc assignments (movx should be selected instead of mova to not shift loads up).
Could we have a test example where we see improvement?
Based on a discussion with @gbossu, we were expecting some impact, but with the current implementation the QoR is unchanged.
54977ed
to
12024f9
Compare
$wh10 = VMOV_mv_w $wl0
JNZ $r3, %bb.1
DelayedSchedBarrier
bb.2:
As general advice, I think we should have a class to manage the description handling. We can encapsulate it and use it in both HazardRecognizer and InterBlockScheduling.
98514f1
to
4033a60
Compare
4033a60
to
aaf0aa8
Compare
aaf0aa8
to
904d379
Compare
904d379
to
49f3ad1
Compare
02d0b50
to
8bf356f
Compare
7a241b2
to
fdab2af
Compare
Could you summarize what this PR does? Maybe in the PR description. I'm particularly interested in:
fdab2af
to
74da8ee
Compare
74da8ee
to
f7ef02c
Compare
liveins: $r1, $r2
successors: %bb.3
$r2 = OR $r2, $r1
bb.3:
That test is unfortunately very hard to read. Could you think of something smaller that shows a diff? I'd also suggest avoiding two labels like on/off and sticking to unique CHECK lines unless the test is very concise.
QoR results are as follows. There are a few regressions; the rest of the benchmarks have the same QoR.
Select_aie2_bf16 409 440 REGR(+7.58%)
This PR allows multi-slot instructions to be kept as multi-slot during iterative scheduling of a loop.
Before this PR, we materialized each multi-slot instruction to its selected opcode/slot after the first iteration of iterative scheduling.
Now we wait until PostRA scheduling has converged to an acceptable schedule and we have decided to commit that schedule and move on to the next MBB.
Note: the information about which alternate opcode/desc was selected is stored in the Hazard Recognizer for a region.
By the time we reach the end of the MBB (i.e. leaveMBB()), we no longer have access to those Hazard Recognizer instances, so the alternate opcode/desc needs to be part of the BlockState.
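The deferral described above can be pictured with a minimal, self-contained sketch. It is not the code from this PR: `Instr`, `BlockState`, `recordSelection`, `clearSelections`, `materialize`, and the opcode names are all hypothetical stand-ins, and the real LLVM classes (MachineInstr, the hazard recognizer, the inter-block scheduler) are deliberately left out. The sketch only assumes that each scheduling iteration reports its chosen alternate opcode, and that the choice is applied once, when the block's schedule is committed.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical opcodes: one multi-slot pseudo and two concrete alternatives.
enum Opcode : unsigned { MOV_MULTISLOT, MOVA, MOVX };

// Hypothetical stand-in for a machine instruction.
struct Instr {
  unsigned Opc;
};

// Hypothetical per-block state that outlives the per-region hazard
// recognizers, so the selections are still available when the block is left.
class BlockState {
  std::unordered_map<Instr *, unsigned> SelectedAlt;

public:
  // Called while scheduling a region: remember the choice, do not apply it.
  void recordSelection(Instr &I, unsigned AltOpc) { SelectedAlt[&I] = AltOpc; }

  // Called before re-running an iteration: earlier choices are stale.
  void clearSelections() { SelectedAlt.clear(); }

  // Called once the schedule has converged (the leaveMBB() moment in the
  // description): rewrite each instruction to its selected opcode.
  std::size_t materialize() {
    std::size_t Count = 0;
    for (auto &Entry : SelectedAlt) {
      assert(Entry.first && "recorded a null instruction");
      Entry.first->Opc = Entry.second;
      ++Count;
    }
    SelectedAlt.clear();
    return Count;
  }
};
```

In this sketch the instruction keeps its multi-slot opcode across iterations, so a later iteration is free to pick a different slot (for example MOVX instead of MOVA) before anything is rewritten.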