riscv: Initial vertexjit #16957
These don't feel very efficient, but overall they beat non-jit.
Looks very nice and straightforward. By the way, before you do the skinning, I've been informed that I've been doing skinning backwards from the "common wisdom" - instead of accumulating the matrices using the weights and then using that matrix to transform the vector, the standard approach is to transform the input vector by each matrix, and then accumulate those small transformed vectors by the weights. Since it's all linear, those are equivalent barring floating point errors. I haven't counted operations to verify whether the common wisdom really is the best though, it may be that interpolating the matrices is faster in practice or maybe it doesn't matter at all.
Oh yeah, would be curious to know how hardware skinning does on the chip. Depending on how bad drawcall overhead is, of course, it can be hard to tell where the win or loss comes from (since hw skinning usually causes a lot more drawcalls due to all the matrix switching).
Yeah, that'll be worth testing. It could be better. I'm still trapped in GLES land which, to my understanding, is actually just translating to Vulkan via ANGLE. There might be interesting overheads there.
Hm. Transforming first would mean doing, say, 8 vector-matrix multiplies (probably at least 6+ instructions each), and then weight summing them (7 simd muladds). Currently we do the weight sums (7 * 4 simd muladds) and one vector-matrix multiply (6+). I feel like our current way should be much cheaper, especially when we reuse the calculated matrix for norm and pos... -[Unknown]
Oh yeah, reusing the matrix for both nrm and pos makes the matrix interpolation way better, of course... Probably not even worth testing the other way.
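To make the comparison above concrete, here is a rough sketch of both skinning orders plus the back-of-envelope instruction counts from this thread. It is plain Python for illustration only, not the project's code: the function names, the 3-rows-of-4 matrix layout, and the `op_estimate` cost model (6 instructions per vector-matrix multiply, 4 simd vectors per matrix) are all assumptions taken from the rough numbers quoted in the comments.

```python
def transform(m, v, w=1.0):
    # m: hypothetical layout, 3 rows of [r0, r1, r2, t].
    # w=1.0 for positions (translation applies), w=0.0 for normals.
    return [row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * w
            for row in m]


def blend_matrices(mats, weights):
    # Weighted sum of the bone matrices, element by element.
    out = [[0.0] * 4 for _ in range(3)]
    for m, wt in zip(mats, weights):
        for r in range(3):
            for c in range(4):
                out[r][c] += wt * m[r][c]
    return out


def skin_blend_then_transform(mats, weights, v, w=1.0):
    # "Current" order in this thread: blend matrices, then one transform.
    return transform(blend_matrices(mats, weights), v, w)


def skin_transform_then_blend(mats, weights, v, w=1.0):
    # "Common wisdom" order: transform by each matrix, then blend the vectors.
    out = [0.0] * 3
    for m, wt in zip(mats, weights):
        t = transform(m, v, w)
        for i in range(3):
            out[i] += wt * t[i]
    return out


def op_estimate(n_weights, n_attrs, matmul_cost=6, vecs_per_matrix=4):
    # Rough lower-bound counts using the thread's guesses, not measurements.
    # n_attrs is how many vectors share the bones (1 = pos, 2 = pos + nrm).
    blend_first = (n_weights - 1) * vecs_per_matrix + n_attrs * matmul_cost
    transform_first = n_attrs * (n_weights * matmul_cost + (n_weights - 1))
    return blend_first, transform_first
```

Under these assumed costs, `op_estimate(8, 1)` gives 34 versus 55 instructions, and `op_estimate(8, 2)` gives 40 versus 110, which is why blending the matrices first looks better once the blended matrix is reused for both nrm and pos. The two skinning functions should agree up to floating point rounding, since the operation is linear.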
This lacks skinning (probably most important) and morph, but at least has all the standard stuff. I didn't try to do any of it with vector instructions since I don't have a device with them.
Gave a modest improvement in Crisis Core, from ~75% speed to ~77% speed, while using IR with vertexjit forced on. For now, it's not actually enabled for anyone due to the jit setting.
-[Unknown]