Implement a few x86 vertexjit optimizations #9674

Merged 3 commits on May 8, 2017

unknownbrackets
Collaborator

Micro-benchmark results from the unit test are in the commit messages. One of the changes applies only to SSE 4.1.

-[Unknown]

This was commented out, but works fine and goes from 320% -> 450% of the speed
of non-jit for simple pos/col verts.
This was actually 270% -> 340% of the non-jit speed for pos-only verts.
This takes it from 150% -> 390% of the non-jit speed for pos-only verts.
@zminhquanz
Contributor

zminhquanz commented May 8, 2017

Does this improve performance on SSSE3 and SSE3 as well?

@unknownbrackets
Collaborator Author

Yes, except for the s16 pos through-mode change, which only works on SSE 4.1. The rest are SSE2+.

Note that these improvements are only measured for vertex decoding. Vertex decoding is only a fraction of the overall rendering process, so don't expect a huge impact in most games.

-[Unknown]
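
To put the point about overall impact in numbers, here is a rough sketch using Amdahl's law. The 150% and 390% figures come from the commit message above; the share of frame time spent decoding vertices (`decode_fraction`) is a purely hypothetical value chosen for illustration, not a measurement from this PR.

```python
def relative_speedup(before_pct: float, after_pct: float) -> float:
    """Speedup of the decoder itself, given two 'percent of non-jit speed' figures."""
    return after_pct / before_pct


def overall_speedup(decode_fraction: float, decode_speedup: float) -> float:
    """Amdahl's law: only decode_fraction of the total time gets faster."""
    return 1.0 / ((1.0 - decode_fraction) + decode_fraction / decode_speedup)


# 150% -> 390% of non-jit speed for pos-only verts (from the commit message).
decoder_gain = relative_speedup(150.0, 390.0)

# ASSUMPTION: vertex decoding takes 10% of total frame time (hypothetical).
decode_fraction = 0.10
print(f"decoder alone: {decoder_gain:.2f}x")
print(f"whole frame:   {overall_speedup(decode_fraction, decoder_gain):.3f}x")
```

Under that assumed 10% share, a decoder that is more than twice as fast still only speeds up the whole frame by a few percent, which matches the caution above.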

@hrydgard
Owner

hrydgard commented May 8, 2017

If you have Haswell or better (BMI2 extensions), it's possible to do an even faster 4444->8888 through PDEP:

(color in eax; PDEP takes its mask from a register or memory, not an immediate)
mov edx, 0xF0F0F0F0
pdep ebx, eax, edx
mov edx, 0x0F0F0F0F
pdep ecx, eax, edx
or ebx, ecx
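
As a way to check the bit manipulation without BMI2 hardware, the PDEP sequence above can be modeled in software. This is only an illustrative sketch: `pdep32`, `expand_4444_to_8888`, and `expand_reference` are hypothetical helper names, not functions from the PPSSPP code base, and the bit-by-bit loop is a plain emulation of the instruction's documented behavior rather than anything performance-relevant.

```python
def pdep32(src: int, mask: int) -> int:
    """Software model of 32-bit PDEP: deposit the low bits of src, in order,
    into the positions where mask has a 1 bit."""
    result = 0
    next_src_bit = 0  # index of the next low-order source bit to deposit
    for pos in range(32):
        if (mask >> pos) & 1:
            if (src >> next_src_bit) & 1:
                result |= 1 << pos
            next_src_bit += 1
    return result


def expand_4444_to_8888(color: int) -> int:
    """Expand a 16-bit 4444 color to 8888 by copying each nibble into both
    halves of its output byte (two PDEPs and an OR)."""
    high = pdep32(color, 0xF0F0F0F0)  # nibbles land in the upper half of each byte
    low = pdep32(color, 0x0F0F0F0F)   # nibbles land in the lower half of each byte
    return high | low


def expand_reference(color: int) -> int:
    """Straightforward reference: each 4-bit channel n becomes n * 0x11."""
    out = 0
    for i in range(4):
        nibble = (color >> (4 * i)) & 0xF
        out |= (nibble * 0x11) << (8 * i)
    return out


print(hex(expand_4444_to_8888(0x1234)))  # 0x11223344
```

Replicating the nibble into both halves of the byte maps 0x0 to 0x00 and 0xF to 0xFF, so the full 4-bit range covers the full 8-bit range.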

@hrydgard hrydgard merged commit f06daba into hrydgard:master May 8, 2017
@unknownbrackets
Collaborator Author

unknownbrackets commented May 8, 2017

I'm still rocking my ivy, will probably upgrade at some point but it's been serving me quite well (aside from lacking AVX2 and BMI.) Definitely some cool things can be done there.

-[Unknown]

@unknownbrackets unknownbrackets deleted the vertexjit branch May 8, 2017 18:47
@hrydgard
Owner

hrydgard commented May 8, 2017

Still rocking an Ivy as a main machine as well, but I've got a more modern CPU in the laptop. It's fast enough though, so not a very important optimization. PDEP/PEXT are really cool instructions though, I'm sure they have more interesting uses.

@iOS4all

iOS4all commented May 8, 2017

Is this fix related to iOS micro stutter or not?
Thanks.

@unknownbrackets
Collaborator Author

No. This only applies to x86 (Intel) CPUs, which are not used in any iOS devices.

-[Unknown]

@iOS4all

iOS4all commented May 9, 2017

Oh.... ok thanks
