-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accurate frest & frsqest #15079
Accurate frest & frsqest #15079
Conversation
96b369c
to
df6c8f7
Compare
const auto eval_sign = eval(extract(a_sign, i)); | ||
|
||
value_t<u32> r_fraction = load_const<u32>(m_spu_frest_fraction_lut, eval_fraction); | ||
value_t<u32> r_exponent = load_const<u32>(m_spu_frest_exponent_lut, eval_exponent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happened to the gather version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into a bit more, the only intrinsic that'd make sense for use is llvm.masked.gather (Intrinsic::masked_gather) and it pretty much what I do except it requires a vector of pointers as a starting point(see https://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics ). Of note is that it describes what the Instrinsic with all true mask results in which is very similar to what I already do(except I do the pointer calculations in the middle).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what's preventing this solution from being used here? You can load the base pointers into a 512 bit vector, add index, and load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the amount of pointers (8) is the problem you can try to force the upper 4 indices to be equal to the lowest index so only 4 memory locations are accessed.
It should be tested if it benefits performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried Intrinsic::vp_gather which seemed as good(though it returns an array of u64) but I can't seem to get it to work, it crashes during IR production from parsing the parameters of the CreateCall. If someone think they can make it work they're welcome to try.
Could this also fix physics issues e.g GTA IV falling through the ground without accurate Xfloat? |
inFamous has weird shadow striping with Accurate XFloat in this PR: inFamousSPUStripes.mp4Doesn't happen with Approximate though, but I don't know if this PR fixes the gameplay issues has without Accurate. |
NFS Most Wanted and The Run audio doesnt break when using approximate xfloat after testing this PR. |
Watch Dogs also works fine. |
49a6c65
to
2b32be5
Compare
inFamous still has the Accurate XFloat bug but it also seems to affect ASMJIT as well, here are some logs for them: |
However testing NFS Most Wanted with ASMJIT recompiler causes an audio regression to make loud and broken noises, which doesnt happen on master builds. |
Yes I just noticed that asmjit doesn't implement FI at all which the result of frest/frsqest rely on. |
db251fd
to
998ee39
Compare
998ee39
to
0071e88
Compare
Added FI implementation for ASMJit and removed special case for accurate xfloat in FI which just forwarded the value. |
The inFamous Striping issue has been resolved. |
They should now be 1:1 with ps3.
Could fix weird graphical issues coming from spu accuracy.