riscv: Initial vertexjit #16957

unknownbrackets · 2023-02-13T07:08:39Z

This lacks skinning (probably most important) and morph, but at least has all the standard stuff. I didn't try to do any of it with vector instructions since I don't have a device with them.

Gave a modest improvement in Crisis Core, from ~75% speed to ~77% speed, while using IR with vertexjit forced on. For now, it's not actually enabled for anyone due to the jit setting.

-[Unknown]

hrydgard · 2023-02-13T07:19:55Z

Looks very nice and straightforward.

By the way, before you do the skinning: I've been informed that I've been doing skinning backwards from the "common wisdom". Instead of accumulating the matrices using the weights and then using that matrix to transform the vector, the standard approach is to transform the input vector by each matrix, and then accumulate those small transformed vectors by the weights. Since it's all linear, those are equivalent barring floating point errors. I haven't counted operations to verify whether the common wisdom really is the best, though. It may be that interpolating the matrices is faster in practice, or maybe it doesn't matter at all.
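To make the equivalence concrete, here is a minimal sketch of both orderings. It is not the PPSSPP implementation: the function names, the 3x4 affine matrix layout, and the sample bones and weights are all assumptions chosen for illustration.

```python
# Illustrative only: hypothetical helpers, not PPSSPP code.
# A "matrix" here is 3 rows of [a, b, c, t] (3x3 part plus translation).

def transform(m, v):
    """Apply a 3x4 affine matrix to a 3D point (implicit w = 1)."""
    x, y, z = v
    return [row[0] * x + row[1] * y + row[2] * z + row[3] for row in m]

def blend_matrices(mats, weights):
    """Weighted sum of the matrices, element by element."""
    return [
        [sum(w * m[r][c] for m, w in zip(mats, weights)) for c in range(4)]
        for r in range(3)
    ]

def skin_matrix_first(mats, weights, v):
    """Accumulate the matrices by weight, then transform once."""
    return transform(blend_matrices(mats, weights), v)

def skin_vector_first(mats, weights, v):
    """Transform by every matrix, then accumulate the results by weight."""
    outs = [transform(m, v) for m in mats]
    return [sum(w * o[i] for o, w in zip(outs, weights)) for i in range(3)]

# Made-up sample data: a translation and a rotation plus translation.
bones = [
    [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[0.0, -1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 4.0]],
]
weights = [0.25, 0.75]
vertex = [1.0, 2.0, 3.0]

a = skin_matrix_first(bones, weights, vertex)
b = skin_vector_first(bones, weights, vertex)
print(a, b)
```

Both calls should agree up to floating point rounding, so the choice between them comes down to operation count rather than correctness.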

hrydgard · 2023-02-13T07:21:39Z

Oh yeah, would be curious to know how hardware skinning does on the chip. Depending on how bad drawcall overhead is, of course, it can be hard to tell where the win or loss comes from (since hw skinning usually causes a lot more drawcalls due to all the matrix switching).

unknownbrackets · 2023-02-13T07:41:57Z

Yeah, that'll be worth testing. It could be better. I'm still trapped in GLES land which, to my understanding, is actually just translating to Vulkan via ANGLE. There might be interesting overheads there.

Hm. That'd mean doing, say, 8 vector-matrix multiplies (probably 6+ instructions each), and then weight summing them (7 SIMD muladds). Currently we do the weight sums (7 * 4 SIMD muladds) and one vector-matrix multiply (6+). I feel like our current way should be much cheaper, especially when we reuse the calculated matrix for norm and pos...
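The estimate above can be written out as a rough tally. This is only a sketch of the arithmetic in this comment: the function names are hypothetical, 6 is taken as the low end of the "6+" guess for a vector-matrix multiply, and loads, weight conversion, and the initial multiply of each sum are ignored.

```python
# Rough instruction tally using the estimates from the comment above.
# All names are hypothetical; mul_cost=6 is the low end of "6+".

def vector_first_cost(num_weights, attrs, mul_cost=6):
    """Transform by every matrix, then weight-sum the results, per attribute."""
    per_attr = num_weights * mul_cost + (num_weights - 1)
    return attrs * per_attr

def matrix_first_cost(num_weights, attrs, mul_cost=6, matrix_vectors=4):
    """Weight-sum the matrices once, then one transform per attribute."""
    blend = (num_weights - 1) * matrix_vectors
    return blend + attrs * mul_cost

for attrs in (1, 2):  # 1 = pos only, 2 = pos and nrm sharing one matrix
    print(attrs, vector_first_cost(8, attrs), matrix_first_cost(8, attrs))
```

Under these assumptions, 8 weights cost about 55 instructions vector-first versus 34 matrix-first for position alone, and 110 versus 40 once the normal reuses the blended matrix.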

-[Unknown]

hrydgard · 2023-02-13T09:23:03Z

Oh yeah, reusing the matrix for both nrm and pos makes the matrix interpolation way better, of course... Probably not even worth testing the other way.

unknownbrackets added 7 commits February 12, 2023 10:06

Osk: Update by button flag consistently.

0532b35

riscv: Cleanup missing Poison, Crash.

89c18d8

riscv: Fix poison with compressed instructions.

0b05d20

riscv: Initial vertexjit.

219e0db

riscv: Add basic steps to vertex decode.

ee10fae

riscv: Add colors to vertexjit.

77b2e63

These don't feel very efficient, but overall they beat non-jit.

riscv: Add prescale to vertexjit.

dc4136d

unknownbrackets added the RISC-V label Feb 13, 2023

unknownbrackets added this to the v1.15.0 milestone Feb 13, 2023

hrydgard approved these changes Feb 13, 2023

hrydgard enabled auto-merge February 13, 2023 07:22

hrydgard merged commit 86a19ce into hrydgard:master Feb 13, 2023

unknownbrackets deleted the riscv-vertexjit branch February 13, 2023 07:42
