LLVM assertion failure when casting a SIMD type with -O3 #10425

Closed
abrown opened this issue Feb 12, 2020 · 11 comments

@abrown
Contributor

abrown commented Feb 12, 2020

While attempting to build a Wasm SIMD version of libaom using SIMDe, I ran into the following crash when adding the -O3 flag (it does not crash without this flag):

...
clang: /b/s/w/ir/cache/builder/emscripten-releases/llvm-project/llvm/include/llvm/Support/Casting.h:264: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::ConstantInt, Y = llvm::Value]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
Stack dump:
0.      Program arguments: /home/abrown/Code/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -D__EMSCRIPTEN_major__=1 -D__EMSCRIPTEN_minor__=39 -D__EMSCRIPTEN_tiny__=6 -D_LIBCPP_ABI_VERSION=2 -Dunix -D__unix -D__unix__ -Werror=implicit-function-declaration -Xclang -nostdsysteminc -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/include/libcxx -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/lib/libcxxabi/include -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/include/compat -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/include -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/include/libc -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/lib/libc/musl/arch/emscripten -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/local/include -Xclang -isystem/home/abrown/.emscripten_cache/wasm-obj/include -DENABLE_SIMDE=1 -DSIMDE_ENABLE_NATIVE_ALIASES=1 -I/home/abrown/Code/aom -I/home/abrown/Code/aom-build-analyzer -I/home/abrown/Code/aom/apps -I/home/abrown/Code/aom/common -I/home/abrown/Code/aom/examples -I/home/abrown/Code/aom/stats -I/home/abrown/Code/aom/third_party/simde -I/home/abrown/Code/aom/third_party/libyuv/include -gline-tables-only -std=c99 -Wall -Wdisabled-optimization -Wextra -Wfloat-conversion -Wimplicit-function-declaration -Wpointer-arith -Wsign-compare -Wstring-conversion -Wtype-limits -Wuninitialized -Wunused -Wvla -Wshorten-64-to-32 -Wshadow -Wundef -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -O3 -D_POSIX_SOURCE -D_POSIX_SOURCE -fopenmp-simd -DSIMDE_ENABLE_OPENMP -mssse3 -c -DEMSCRIPTEN -msimd128 /home/abrown/Code/aom/aom_dsp/x86/intrapred_ssse3.c -Xclang -isystem/home/abrown/Code/emsdk/upstream/emscripten/system/include/SDL -c -o CMakeFiles/aom_dsp_common_ssse3_intrinsics.dir/aom_dsp/x86/intrapred_ssse3.c.o -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr 
1.      <eof> parser at end of file
2.      Per-module optimization passes
3.      Running pass 'Function Pass Manager' on module '/home/abrown/Code/aom/aom_dsp/x86/intrapred_ssse3.c'.
4.      Running pass 'SLP Vectorizer' on function '@aom_paeth_predictor_16x4_ssse3'
...

This failure occurs with the following emcc -v output:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 1.39.6
clang version 10.0.0 (/b/s/w/ir/cache/git/chromium.googlesource.com-external-github.com-llvm-llvm--project 06cfcdcca7de9c88a1e885eff0d0c4c07090ad48)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/abrown/Code/emsdk/upstream/bin
Found candidate GCC installation: /usr/lib/gcc/x86_64-redhat-linux/9
Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/9
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
shared:INFO: (Emscripten: Running sanity checks)

The code being compiled is here, and @nemequ reduced that file to a smaller test case:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t __m128i __attribute__((aligned(16), vector_size(16), __may_alias__));
typedef int16_t __m128i_i16 __attribute__((aligned(16), vector_size(16), __may_alias__));
typedef int8_t __m128i_i8 __attribute__((aligned(16), vector_size(16), __may_alias__));

static __m128i _mm_cvtsi32_si128(int32_t a) { return ((__m128i){a, 0, 0, 0}); }

static __m128i _mm_set1_epi16(int16_t v) {
  __m128i_i16 r16 = {v, v, v, v, v, v, v, v};
  __m128i r32;
  memcpy(&r32, &r16, sizeof(r32));
  return r32;
}

static __m128i _mm_shuffle_epi8(__m128i a, __m128i b) {
  __m128i r;
  __m128i_i8 ta, tb, tr;

  memcpy(&ta, &a, sizeof(ta));
  memcpy(&tb, &b, sizeof(tb));

  for (size_t i = 0; i < (sizeof(tr) / sizeof(tr[0])); i++) {
    tr[i] = ta[tb[i] & 15] & (tb[i] >> 7);
  }

  memcpy(&r, &tr, sizeof(r));
  return r;
}

static void _mm_store_si128(__m128i *mem_addr, __m128i a) {
  *mem_addr = a;
}

static __m128i _mm_add_epi16(__m128i a, __m128i b) {
  return a + b;
}

void aom_paeth_predictor_16x4_ssse3(uint8_t *dst, ptrdiff_t stride,
                                    const uint8_t *left) {
  // All fail
  //__m128i l = _mm_cvtsi32_si128(((const uint32_t *)left)[0]);
  //__m128i l = _mm_cvtsi32_si128(*((const uint32_t *)left));
  __m128i l; memcpy(&l, left, sizeof(l));

  __m128i rep = { 0, };

  for (int i = 0; i < 4; ++i) {
    const __m128i l16 = _mm_shuffle_epi8(l, rep);

    // Both fail
    // memcpy(dst, &l16, sizeof(l16));
    _mm_store_si128((__m128i *)dst, l16);

    dst += stride;

    rep = _mm_add_epi16(rep, _mm_set1_epi16(1));
  }
}

I'm not exactly sure what is going on here, so feel free to update the title or re-route this to LLVM if it is a bug there; I felt it was best to triage in Emscripten first to see what you all think.
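A note on the reduced test case: its _mm_shuffle_epi8 only needs to trigger the compiler crash, so it is not a faithful model of SSSE3 pshufb. Real pshufb sets a result byte to zero when the high bit of the corresponding control byte is set, and otherwise selects a[b[i] & 15]. The reduced case's mask (tb[i] >> 7) is all-ones exactly when the high bit is set, so it keeps the byte in that case instead of zeroing it. The sketch below is a portable scalar model of the intended semantics in two equivalent forms; the names shuffle_epi8_ref and shuffle_epi8_masked are hypothetical and not taken from SIMDe or libaom. The masked form assumes that right-shifting a negative int is an arithmetic shift, which is implementation-defined in C but is what GCC and Clang do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of SSSE3 pshufb (_mm_shuffle_epi8) on 16 bytes. */
static void shuffle_epi8_ref(const uint8_t a[16], const uint8_t b[16], uint8_t r[16]) {
  for (size_t i = 0; i < 16; i++) {
    /* High bit of the control byte set: result byte is zero.
       Otherwise the low 4 bits of the control byte select a byte of a. */
    r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
  }
}

/* Same semantics written in the branch-free style of the reduced case, but with
   the mask inverted: ~b[i] >> 7 is all-ones when the high bit of b[i] is clear
   and zero when it is set (assumes arithmetic right shift of negative ints). */
static void shuffle_epi8_masked(const int8_t a[16], const int8_t b[16], int8_t r[16]) {
  for (size_t i = 0; i < 16; i++) {
    r[i] = (int8_t)(a[b[i] & 15] & (~b[i] >> 7));
  }
}
```

In the same spirit, the reduced _mm_add_epi16 adds 32-bit lanes, because __m128i is declared with int32_t elements. For the small counts in this loop (each 32-bit lane grows by 0x00010001 per iteration) no carry crosses a 16-bit boundary, so it produces the same bytes as a true 16-bit add.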

@nemequ

nemequ commented Feb 13, 2020

It's worth noting that the crash also requires -s SIMD=1. It happens at -O2 as well, though not at -O1.

tlively self-assigned this Feb 14, 2020
@tlively
Member

tlively commented Feb 14, 2020

Will investigate! Thanks for the report :)

@tlively
Member

tlively commented Feb 14, 2020

So it looks like this is a bug in the SLP vectorizer pass. I have no idea how that pass works, and it is not WebAssembly-specific, so it would be great if you could file an LLVM bug about this.
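For readers unfamiliar with the pass named in the stack dump: the SLP (superword-level parallelism) vectorizer looks for groups of independent, isomorphic scalar operations in straight-line code and tries to replace them with vector operations. The function below is a hypothetical minimal illustration of that code shape (add4 is not from libaom or the report); whether LLVM actually vectorizes it depends on the target's cost model. Clang's driver enables this pass at -O2 and above, which is consistent with the observation that the crash appears at -O2 and -O3 but not at -O1. Passing clang's -fno-slp-vectorize flag disables the pass and would likely sidestep this particular crash, at the cost of that optimization.

```c
#include <stdint.h>

/* Four independent, isomorphic 32-bit additions on adjacent elements:
   the straight-line pattern the SLP vectorizer can pack into one
   4 x i32 vector add. restrict tells the compiler the arrays do not alias. */
void add4(int32_t *restrict out, const int32_t *restrict x, const int32_t *restrict y) {
  out[0] = x[0] + y[0];
  out[1] = x[1] + y[1];
  out[2] = x[2] + y[2];
  out[3] = x[3] + y[3];
}
```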

@tlively
Member

tlively commented Feb 14, 2020

Also, whoever wants to fix this bug probably doesn't have Emscripten installed, so it would be helpful if you uploaded the automatically generated test file and script when you file that bug report.

@nemequ

nemequ commented Feb 18, 2020

Thanks for looking into this!

Reported to LLVM as https://bugs.llvm.org/show_bug.cgi?id=44954

@abrown
Contributor Author

abrown commented Feb 20, 2020

@tlively, looks like a fix was merged. I'm not too familiar with how Emscripten tracks LLVM; when do you think the fix would get downstreamed into Emscripten?

@sbc100
Collaborator

sbc100 commented Feb 20, 2020

It should already be available via ./emsdk install tot, and will be available as latest as soon as we tag a release.

@abrown
Contributor Author

abrown commented Mar 11, 2020

I'm not seeing it fixed when I do ./emsdk install tot && ./emsdk activate tot; I posted a stack trace in the LLVM issue.

@tlively
Member

tlively commented Mar 11, 2020

@abrown, as mentioned in the LLVM issue, this looks like a different bug. Can you cc me on any new issue you file?

@abrown
Contributor Author

abrown commented Mar 11, 2020

@tlively, in case I didn't do things the right way in Bugzilla, here's a link to the new issue: https://bugs.llvm.org/show_bug.cgi?id=45178.

@tlively
Member

tlively commented Jun 1, 2020

This has been fixed.

tlively closed this as completed Jun 1, 2020