[x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection #7805

rootjalex · 2023-08-24T22:05:24Z

There are cases where bounds inference allows us to safely reinterpret unsigned values as signed values, in order to target x86 saturating narrow instructions. I give an example below. I don't add tests to simd_op_check_x86 because we can't test for removing min instructions via the existing mechanism.

Example:

Var x("x");
ImageParam in0(UInt(8), 1);
Func out("out");
out(x) = u8_sat(u16(in0(x)) * 5);
out.vectorize(x, 32);
Target x86("x86-64-linux-avx-avx2-fma-sse41");
out.compile_to_assembly("x86-test0.asm", out.infer_arguments(), x86);

Before:

vpmovzxbw	(%rax,%rcx), %ymm2      # ymm2 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero,mem[8],zero,mem[9],zero,mem[10],zero,mem[11],zero,mem[12],zero,mem[13],zero,mem[14],zero,mem[15],zero
vpmovzxbw	16(%rax,%rcx), %ymm3    # ymm3 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero,mem[8],zero,mem[9],zero,mem[10],zero,mem[11],zero,mem[12],zero,mem[13],zero,mem[14],zero,mem[15],zero
vpmullw	%ymm0, %ymm3, %ymm3
vpmullw	%ymm0, %ymm2, %ymm2
vpminuw	%ymm1, %ymm2, %ymm2
vpminuw	%ymm1, %ymm3, %ymm3
vpackuswb	%ymm3, %ymm2, %ymm2
vpermq	$216, %ymm2, %ymm2              # ymm2 = ymm2[0,2,1,3]
vmovdqu	%ymm2, (%r13,%rcx)
addq	$32, %rcx
cmpq	%rcx, %rdx
jne	.LBB219_44

After (removed 2 vpminuw instructions):

vpmovzxbw	(%rax,%rcx), %ymm1      # ymm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero,mem[8],zero,mem[9],zero,mem[10],zero,mem[11],zero,mem[12],zero,mem[13],zero,mem[14],zero,mem[15],zero
vpmovzxbw	16(%rax,%rcx), %ymm2    # ymm2 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero,mem[8],zero,mem[9],zero,mem[10],zero,mem[11],zero,mem[12],zero,mem[13],zero,mem[14],zero,mem[15],zero
vpmullw	%ymm0, %ymm2, %ymm2
vpmullw	%ymm0, %ymm1, %ymm1
vpackuswb	%ymm2, %ymm1, %ymm1
vpermq	$216, %ymm1, %ymm1              # ymm1 = ymm1[0,2,1,3]
vmovdqu	%ymm1, (%r13,%rcx)
addq	$32, %rcx
cmpq	%rcx, %rdx
jne	.LBB219_44
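
Why dropping the min is safe here can be checked with a little arithmetic: `in0` is u8, so `u16(in0(x)) * 5` is at most 255 * 5 = 1275, which fits in a signed 16-bit lane (at most 32767). `vpackuswb` reads each 16-bit lane as signed and saturates it to [0, 255], so for values in that range it agrees with an unsigned saturating narrow. The following is only a scalar model of that reasoning, not Halide code; all function names are hypothetical, and the clamp constant 255 in the "before" path is an assumption, since the listing does not show what `ymm1` holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of one lane of (v)packuswb: the 16-bit input is read as
// SIGNED and saturated to the unsigned 8-bit range [0, 255].
inline uint8_t packus_lane(int16_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return static_cast<uint8_t>(v);
}

// Reference semantics of u8_sat applied to a u16 value.
inline uint8_t u8_sat_reference(uint16_t v) {
    return static_cast<uint8_t>(std::min<uint16_t>(v, 255));
}

// "Before" lowering: unsigned min first (the vpminuw), then pack.
// The constant 255 is an assumption made for this sketch.
inline uint8_t lowered_with_min(uint16_t v) {
    uint16_t clamped = std::min<uint16_t>(v, 255);
    return packus_lane(static_cast<int16_t>(clamped));
}

// "After" lowering: reinterpret as signed and pack directly.
// Only valid when the value is known to fit in int16_t.
inline uint8_t lowered_without_min(uint16_t v) {
    assert(v <= INT16_MAX);
    return packus_lane(static_cast<int16_t>(v));
}

// Exhaustively checks the example from this PR: u8 input times 5.
inline bool example_is_safe() {
    for (int in = 0; in <= 255; ++in) {
        uint16_t widened = static_cast<uint16_t>(in * 5);  // at most 1275
        if (lowered_with_min(widened) != u8_sat_reference(widened)) return false;
        if (lowered_without_min(widened) != u8_sat_reference(widened)) return false;
    }
    return true;
}
```

Calling `example_is_safe()` walks all 256 possible inputs. Without the bound, a u16 value above 32767 would be read as negative by the pack on typical two's-complement targets and saturate to 0 instead of 255, which is why the min cannot be dropped unconditionally.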

steven-johnson · 2023-08-24T22:08:50Z

> I don't add tests to simd_op_check_x86

Understood, but is there any other reasonable way to add checks for this? This sort of stuff is often pretty fragile.

src/Type.cpp

steven-johnson

LGTM but I feel like if we don't have some way to test this, we'll eventually find it broken :-/

rootjalex · 2023-08-24T22:23:17Z

> > I don't add tests to simd_op_check_x86
>
> Understood, but is there any other reasonable way to add checks for this? This sort of stuff is often pretty fragile.

I thought about adding a "check_not" to simd_op_check, because that's the only way I could think to test this sort of optimization, but that seems like heavy functionality to use for just this case (I'm also not sure we can verify that this optimization runs by saying "don't give me a vpmin", because other parts of the generated code could contain a vpmin and we don't want false failures). I'm open to suggestions (especially because I intend to make a similar PR for HVX).
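
To make the "check_not" idea concrete, here is a minimal sketch of a negative check over assembly text. It is not part of simd_op_check; the helper name `asm_contains_op` and the comment handling (AT&T-style `#` comments, as in the listings above) are assumptions for illustration only.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if any line of asm_text has a first
// token (the mnemonic) that starts with `op`. Text after '#' is treated
// as a comment and ignored.
inline bool asm_contains_op(const std::string &asm_text, const std::string &op) {
    assert(!op.empty());
    std::istringstream lines(asm_text);
    std::string line;
    while (std::getline(lines, line)) {
        // Strip the trailing comment, if any.
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) {
            line.erase(hash);
        }
        // The first whitespace-separated token is the mnemonic (or a label).
        std::istringstream tokens(line);
        std::string mnemonic;
        if (tokens >> mnemonic && mnemonic.compare(0, op.size(), op) == 0) {
            return true;
        }
    }
    return false;
}
```

A check_not for this PR would then assert `!asm_contains_op(generated_asm, "vpmin")`. Scanning the whole file like this has exactly the weakness described above: an unrelated vpmin elsewhere in the output would fail the test, so a real version would likely need to restrict the scan to the loop under test.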

rootjalex · 2023-08-25T22:25:46Z

We can do the same thing for HVX saturating narrow instructions, which I was going to do in a separate PR, but it requires the same changes to Bounds.cpp and Type.h that the x86 version does, so I am making this one PR.

rootjalex · 2023-08-25T22:29:36Z

@steven-johnson Luckily, the improvement to HVX can be tested (unlike x86). Unluckily, this revealed that constant bounds inference does not work on u32 like I expected (which is part of why @abadams opened #7807)

src/CodeGen_X86.cpp

src/HexagonOptimize.cpp

steven-johnson · 2023-09-25T23:12:04Z

Is this PR still an active one? It seems pretty stale.

steven-johnson · 2023-11-28T15:22:23Z

What's the story, needs more work?

rootjalex · 2024-04-28T21:33:44Z

This has gotten stale, but I am trying to find time to update it. Andrew and I discussed some issues with using bounds inference in this way, but I think they are fixed by using the new infrastructure in #8179. I will aim to do this in the next two weeks; I've finally started to catch up on my insane backlog.

abadams · 2024-04-28T21:47:04Z

I think it should be fairly straightforward to adapt it to use ConstantInterval, now that the ConstantInterval refactor is in. You could do it now and there wouldn't be much of a merge conflict, or you could wait for #8155.

Here's how I use ConstantInterval to do something similar: https://github.com/halide/Halide/pull/8155/files#diff-6cd3cc01186f9ce4abaa1a98c9201bd440983ecc5620fbccae383df1116bdce6R704

All your uses are about something being safe to reinterpret as signed/unsigned, so I think you want a few instances of const bool safe_to_reinterpret = some_expr.type().with_code(...).can_represent(constant_integer_bounds(some_expr))
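
The shape of that check can be sketched without Halide. The types below are simplified, hypothetical stand-ins (`Interval`, `IntType`, `with_signedness`, `bounds_of_u8_times`), not Halide's `ConstantInterval` or `Type`; they only model the idea that a reinterpretation is safe when the target type can represent every value in the expression's constant bounds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a constant integer interval [min, max].
struct Interval {
    int64_t min;
    int64_t max;
};

// Hypothetical stand-in for an integer type: signedness plus bit width.
struct IntType {
    bool is_signed;
    int bits;

    int64_t min_value() const {
        assert(bits >= 1 && bits <= 32);  // keeps the shifts below in range
        return is_signed ? -(int64_t(1) << (bits - 1)) : 0;
    }
    int64_t max_value() const {
        assert(bits >= 1 && bits <= 32);
        return is_signed ? (int64_t(1) << (bits - 1)) - 1
                         : (int64_t(1) << bits) - 1;
    }
    // True if every value in the interval fits in this type.
    bool can_represent(const Interval &i) const {
        return min_value() <= i.min && i.max <= max_value();
    }
    // Same width, different signedness (loosely modelled on with_code).
    IntType with_signedness(bool s) const {
        return {s, bits};
    }
};

// Bounds of (u8 value) * factor, assuming factor >= 0.
inline Interval bounds_of_u8_times(int64_t factor) {
    return {0, 255 * factor};
}
```

With these stand-ins, the PR's example reads `IntType{false, 16}.with_signedness(true).can_represent(bounds_of_u8_times(5))`, which holds because [0, 1275] fits in a signed 16-bit value. A factor of 200 gives [0, 51000], which does not fit, so the reinterpretation would be rejected.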

rootjalex · 2024-04-28T22:12:49Z

Yeah, realized it wasn't a ton of work, and was bored during an ASPLOS workshop. This is good for review now! Also, the u32 HVX tests now work, because the new ConstantInterval stuff is stronger than find_constant_bounds on u32. Nice side effect.

src/CodeGen_X86.cpp

src/HexagonOptimize.cpp

rootjalex · 2024-04-29T18:31:01Z

@abadams Sorry to add a change after your approval, but I just saw a wasm talk and was reminded that we can do this for wasm too. It's basically a copy-paste from x86, but would be nice to have eyes just to be safe.
@steven-johnson just want to ping you about the wasm change, if you're curious to look

abadams · 2024-04-29T18:38:22Z

The big TODO for wasm now is exploiting constant_integer_bounds to hit the relaxed simd instructions in cases where we know they don't have UB

rootjalex · 2024-04-29T18:39:42Z

> The big TODO for wasm now is exploiting constant_integer_bounds to hit the relaxed simd instructions in cases where we know they don't have UB

Yep, I think #7312 is for tracking that. This is still a nice side benefit of constant bounds for wasm though

rootjalex · 2024-04-29T18:49:31Z

I don't think relaxed SIMD has actually shipped in wasm yet, so that's still waiting on the wasm folks

steven-johnson · 2024-04-29T19:46:21Z

> I don't think relaxed SIMD has actually shipped in wasm yet, so that's still waiting on the wasm folks

https://webassembly.org/features/

TL;DR: shipped in Chrome, behind flags elsewhere

rootjalex added 3 commits August 23, 2023 14:28

first attempt at x86 bounds inference

9c7a40a

Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-sat

e04f520

clean-up truncate cast attempt

a6bcfeb

rootjalex requested review from steven-johnson and abadams August 24, 2023 22:05

clang format

ff4e838

steven-johnson reviewed Aug 24, 2023

src/Type.cpp (outdated, resolved)

steven-johnson self-requested a review August 24, 2023 22:12

steven-johnson approved these changes Aug 24, 2023

constexpr min/max int functions

47ee2ec

rootjalex and others added 3 commits August 24, 2023 15:35

clang format

ebac6b9

Merge branch 'main' into rootjalex/x86-sat

42858d0

bounds inference for HVX too

b4faf7a

rootjalex changed the title ~~[x86] Use bounds inference for saturating_narrow instruction selection~~ [x86 & HVX] Use bounds inference for saturating_narrow instruction selection Aug 25, 2023

rootjalex requested review from steven-johnson and pranavb-ca August 25, 2023 22:25

Merge branch 'rootjalex/x86-sat' of github.com:halide/Halide into roo…

f826002

…tjalex/x86-sat

steven-johnson approved these changes Aug 25, 2023

abadams reviewed Aug 28, 2023

src/CodeGen_X86.cpp (outdated, resolved)

abadams reviewed Aug 28, 2023

src/HexagonOptimize.cpp (outdated, resolved)

abadams reviewed Aug 28, 2023

src/HexagonOptimize.cpp (outdated, resolved)

pranavb-ca mentioned this pull request Aug 29, 2023

[HVX] Failure on seemingly valid schedule #7806

Closed

pranavb-ca approved these changes Sep 6, 2023

rootjalex added 2 commits April 28, 2024 15:08

address reviewer comments + use new constant_bounds infra

7b99f11

revert unneeded changes to Type.h

d182915

abadams reviewed Apr 28, 2024

src/CodeGen_X86.cpp (outdated, resolved)

abadams reviewed Apr 28, 2024

src/HexagonOptimize.cpp (outdated, resolved)

abadams reviewed Apr 28, 2024

src/HexagonOptimize.cpp (outdated, resolved)

rootjalex added 2 commits April 29, 2024 06:23

use t.with_code() and update comments

36a8b5f

use can_represent(ConstantInterval) + clang-format

5e6397c

abadams approved these changes Apr 29, 2024

use bounds inference for WASM IS too + add tests

9dcd113

rootjalex changed the title ~~[x86 & HVX] Use bounds inference for saturating_narrow instruction selection~~ [x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection Apr 29, 2024

rootjalex mentioned this pull request Apr 29, 2024

Might want to keep track of constant bounds during codegen #8212

Open

rootjalex added 2 commits April 29, 2024 11:35

add tracking issue for scoped constant bounds

c9203e7

add tracking issue for scoped constant bounds

5281406

add TODO about lossless_cast usage

3a3e995

rootjalex merged commit 8141197 into main Apr 30, 2024
19 checks passed

rootjalex deleted the rootjalex/x86-sat branch April 30, 2024 13:44

rootjalex mentioned this pull request May 25, 2024

Rework the simplifier to use ConstantInterval for bounds #8222

Merged

BrewTestBot mentioned this pull request Jul 17, 2024

halide 18.0.0 Homebrew/homebrew-core#177657

Closed

1 task
