riscv: Initial vertexjit #16957

Merged: 7 commits, Feb 13, 2023

unknownbrackets
Collaborator

This lacks skinning (probably the most important part) and morphing, but it at least has all the standard stuff. I didn't try to do any of it with RISC-V vector (V extension) instructions, since I don't have a device that supports them.

This gave a modest improvement in Crisis Core, from ~75% speed to ~77% speed, using IR with vertexjit forced on. For now, it isn't actually enabled for anyone, because of the jit setting.

-[Unknown]

unknownbrackets added this to the v1.15.0 milestone on Feb 13, 2023
hrydgard
Owner

Looks very nice and straightforward.

By the way, before you do the skinning: I've been informed that I've been doing skinning backwards from the "common wisdom". Instead of accumulating the matrices using the weights and then using that blended matrix to transform the vector, the standard approach is to transform the input vector by each matrix and then sum those transformed vectors by the weights. Since it's all linear, the two are equivalent apart from floating-point rounding. I haven't counted operations to verify whether the common wisdom really is best, though; it may be that interpolating the matrices is faster in practice, or it may not matter at all.
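A minimal sketch of the two orders described above, assuming plain 4x4 row-major matrices and homogeneous `[x, y, z, 1]` vertices. This is not PPSSPP's actual bone-matrix layout or vertexjit code, and the function names are hypothetical. Because both paths are linear, they give the same result up to floating-point rounding:

```python
# Hypothetical illustration of the two skinning orders (not PPSSPP code).
# Matrices are 4x4 row-major nested lists; vertices are [x, y, z, 1].

def mat_vec(m, v):
    """Transform vector v by 4x4 matrix m (computes m * v)."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def skin_blend_matrices(bones, weights, v):
    """Blend the bone matrices by weight first, then do a single transform."""
    blended = [[0.0] * 4 for _ in range(4)]
    for m, w in zip(bones, weights):
        # 16 scalar multiply-adds per bone here; a 4-wide SIMD version
        # would need roughly one muladd per matrix column (4 per bone).
        for r in range(4):
            for c in range(4):
                blended[r][c] += w * m[r][c]
    return mat_vec(blended, v)

def skin_blend_vectors(bones, weights, v):
    """Transform v by each bone matrix, then sum the results by weight."""
    out = [0.0] * 4
    for m, w in zip(bones, weights):
        t = mat_vec(m, v)  # one full vector-matrix multiply per bone
        for i in range(4):
            out[i] += w * t[i]
    return out

# Example data (hypothetical): two bones, two weights summing to 1.
bones = [
    [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],   # translate x by 2
    [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],  # rotate 90 degrees about z
]
weights = [0.25, 0.75]
v = [1.0, 2.0, 3.0, 1.0]

print(skin_blend_matrices(bones, weights, v))  # [-0.75, 1.25, 3.0, 1.0]
print(skin_blend_vectors(bones, weights, v))   # [-0.75, 1.25, 3.0, 1.0]
```

The blend-first version ends up with a single matrix that can also be applied to the normal, whereas the transform-first version repeats all of its per-bone multiplies for every attribute it skins.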

hrydgard
Owner

Oh yeah, I'd be curious to know how hardware skinning performs on that chip. Depending on how bad the drawcall overhead is, of course, it can be hard to tell where the win or loss comes from, since hardware skinning usually causes a lot more drawcalls due to all the matrix switching.

hrydgard merged commit 86a19ce into hrydgard:master on Feb 13, 2023
unknownbrackets
Collaborator Author

Yeah, that'll be worth testing. It could be better. I'm still stuck in GLES land, which, as I understand it, is actually just being translated to Vulkan via ANGLE. There might be interesting overheads there.

Hm. That'd mean doing, say, 8 vector-matrix multiplies (probably at least 6 instructions each) and then summing them by weight (7 SIMD muladds). Currently we do the weight sums (7 * 4 SIMD muladds) and one vector-matrix multiply (6+ instructions). I feel like our current way should be much cheaper, especially since we reuse the blended matrix for both the normal and the position...
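A rough per-vertex tally of those estimates for 8 weights, taking 6 instructions as the lower bound for one vector-matrix multiply and counting the normal as a second vector-matrix multiply. These are back-of-the-envelope figures from the comment's estimates, not measurements:

```latex
\begin{aligned}
\text{transform, then blend (pos only)}  &\ge 8 \cdot 6 + 7 = 55 \\
\text{blend, then transform (pos only)}  &\ge 7 \cdot 4 + 6 = 34 \\
\text{transform, then blend (pos + nrm)} &\ge 2\,(8 \cdot 6 + 7) = 110 \\
\text{blend, then transform (pos + nrm)} &\ge 7 \cdot 4 + 2 \cdot 6 = 40
\end{aligned}
```

The blended matrix costs 28 muladds once, and every extra attribute then costs only one more multiply, while the transform-first approach pays the full per-bone cost again for each attribute.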

-[Unknown]

unknownbrackets deleted the riscv-vertexjit branch on February 13, 2023, 07:42
hrydgard
Owner

Oh yeah, reusing the matrix for both nrm and pos makes matrix interpolation way better, of course... Probably not even worth testing the other way.
