Outsmart the LLVM optimizer #8073

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

steven-johnson
Contributor

The old definitions of bool_1, bool_2, and bool_3 in simd_op_check_x86 (etc.) all referred to the same entry in in_f32. As of llvm/llvm-project#76367, the LLVM optimizer is smart enough to realize that (e.g.) bool_1 != bool_2 by construction, and it optimizes away the code that tests their conditions, such as the checks for andps and orps. Initializing them from different locations is enough to outsmart the compiler.

(The bug was only noticed in the x86 test, but I updated the other tests to guard against future optimizer improvements there too.)
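For readers unfamiliar with the failure mode, here is a minimal plain C++ sketch of the idea. It is an analogy only, not the Halide test code: the function names and the 0.5f thresholds are hypothetical, and whether a given compiler actually folds the first function depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Old style (hypothetical stand-in): both conditions read the SAME element.
// An optimizer can prove that a and b are never true together, so the AND
// can be folded to a constant and no "and" instruction needs to be emitted.
inline bool both_same_source(const std::vector<float> &in, std::size_t i) {
    bool a = in[i] > 0.5f;
    bool b = in[i] < -0.5f;
    return a && b;  // always false, whatever in[i] is
}

// New style (hypothetical stand-in): each condition reads a DIFFERENT
// element. Nothing relates the two values, so the AND must be computed.
inline bool both_distinct_sources(const std::vector<float> &in, std::size_t i) {
    assert(i + 1 < in.size());  // the second condition reads the next element
    bool a = in[i] > 0.5f;
    bool b = in[i + 1] < -0.5f;
    return a && b;
}
```

The first function returns false for every input, which is exactly the kind of fact a sufficiently smart optimizer can exploit; the second can return either value depending on the data, so the logical operation survives into the generated code.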

@steven-johnson steven-johnson merged commit 84fe565 into main Feb 7, 2024
17 of 19 checks passed
@steven-johnson steven-johnson deleted the srj/llvm-fp-fix branch February 7, 2024 17:41
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024