-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VFPU sin/cos #16984
VFPU sin/cos #16984
Conversation
Also fix the license text.
Looks good, though I'm not the biggest fan of giant text-format headers with binary data. Maybe a binary file in /assets would make sense? "inc-bin"-type approaches are probably out due to cross platform issues. |
Actually maybe it's not so bad these days with the header thing, compiling them isn't that slow.. |
Not terribly pleased with the header myself. As for speed, this has some benchmarks (this specific header represents 473KB of binary, so 400KB column is closest: compile time should be around 2s for GCC, and 4s for MSVC). Memory benchmarks below that are also interesting. |
So I assume this one is obsoleted. |
VFS rework is done, but shouldn't make much difference. Yeah, I think it makes more sense to have a PR with everything in at once, then this can be tested against the math-problematic games like Ridge Racer more easily. |
Hm, storing tables as files in theory requires to do something about endianness. Anyway, I'm having trouble testing this, because on my machine (32-bit Linux) complied PPSSPP (master, e.g. Some details, in case it is useful: GDB
It crashes fairly consistently at this location, regardless of what game is loaded. |
Hm, sorry for that, I'll take a look. It's my texture replacer stuff, somehow. Working on fixing up some stuff I broke. As for endianness, meh, we only support little-endian platforms currently anyway. Big-endian is gone. |
Actually you should hopefully be fine if you update to the current master now. |
Yes, fixed now. |
Looks like PPSSPP doesn't do As for Ridge Racer, it seems that while JIT, Interpreter and IR all give different results, none of them are affected by these functions (car path, that is; camera is affected if I make them return something funky). Or I might have missed something. I just added the functions to MIPSVFPUUtils.cpp, and tweaked MIPSIntVFPU.cpp to actually use them. Didn't touch JIT (e.g. CompVFPU.cpp) which currently seems to be doing its own thing for at least Might add the commits, so you can take a look, but I need to figure out how to make |
Okay, after Again, this only affects 32-bit x86 (and only on GCC). |
That's a good catch, great to have 32-bit match! |
Ok, WIP version. |
...and I got it (copying to assets/vfpu on build) wrong anyway (it was mostly guessing, without actually knowing CMake). Other notes:
|
Rather, it looks like the |
Yeah that wouldn't surprise me, feel free to add that. |
Let's try this again. |
Also fix the license text.
Such globals I think are fine for this kind of use, but I do generally prefer to put them at the top of the file instead - statics in functions are harder to see and care needs to be taken with globals in general. |
Right. Just to say, we should also not need to really care about threading here since only one thread should ever actively call these funcs.
Ah, that's less reduction than I imagined, might still be a win at low compression levels, though they probably compress worse individually. -[Unknown] |
I think C++11 guarantees that local statics are initialized exactly once (on "first" invocation), even in multi-threaded setting, which is what I'm trying to leverage (can these functions be called from multiple threads?), rather than rolling my own thread-safe loader. The actual code looks like this: float vfpu_exp2(float x) {
static bool loaded =
LOAD_TABLE(vfpu_exp2_lut65536, 512)&&
LOAD_TABLE(vfpu_exp2_lut, 262144);
// Rest of the function.
} The tables themselves are still declared globally. |
As unknown says, the threading behavior is not relevant since we only run the CPU emulation on one thread. |
Some tidbits, if you find them useful.
but, perhaps, that's what it's supposed to do.
|
Load-on-demand stuff (no fallbacks presently). |
Quick-fixed the AVX bug in #17101. It's just that we were testing AVX codegen and disassembly. -[Unknown] |
Actually
No, I don't know why. |
Hm, vfpu_sin_lut8192 is nullptr for some reason on Windows when running the unit tests, which is why it crashes. Let's see. Or not, my working dir for the debug session had the wrong assets I think. -[Unknown] |
Ok, lets try to bring in #17102 (and just up-to-date with master). |
Actually, I think it's still not working - I forgot to merge when I pushed to actions, I only tested the merge locally. I suspect it's still broken, testing a change on my fork's actions now in case. If so, might have to merge one more time (sorry.) -[Unknown] |
Ok, now (after #17105) it seems to work. To reiterate: JIT is still mostly not using these functions (it already uses At least one hex-float constant (needs C++17) is still there, but it seems to compile fine on all platforms, so leaving it as-is. No fallback for now: if it cannot load the tables most calls would just crash (e.g.
Can be both HP or BRV attack, from either player or opponent. |
Problematic replays from #6710 are still available, but just in case, here's a copy, in zip format: Haven't tested yet. |
Hm, Replay16 looks different now, but (I think) still desyncs. |
Well, since artifacts are available, someone might want to check the status of: |
Oh, I'm dumb.
I started to wonder, why the tables for Also, tables for |
Hm, with Dissidia replays are also affected, but not fixed. |
Building this, I'm getting lot of warnings about applying unary minus to an unsigned value, like these:
But I think we can indeed trust that all compilers will do the same thing with that, so we should probably just add a #pragma warning(ignore) kind of thing? Anyway, I tested a bit on Android, no unknown issues (not that I would expect any). |
Sounds ok. |
Alright, let's just get it in now. Not super happy about the large tables of course, but the improved accuracy should be worth it in the long run. I'm thinking maybe we should fall back to inaccurate functions if the files are not there, but we already have several hard dependencies on the assets folder so we'd be screwed anyway. Though we are largely compatible with mismatched-version assets. |
Now that it's in the codebase, just to list some remaining things:
Probably not by me (at least, I'm not confident enough to mess with JIT). |
…sing. PR #16984 added more accurate versions of these functions, but they require large lookup tables stored in assets/. If these files are missing, PPSSPP would simply crash, which isn't good. We should probably try to warn the user somehow that these files are missing, though...
Implements #16946, with the recent fix.