This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Clang is incompatible with deflate_quick asm #6

Closed
mp15 opened this issue Jun 10, 2014 · 3 comments

Comments

@mp15
Contributor

mp15 commented Jun 10, 2014

Clang (masquerading as gcc on OS X) rejects your inline assembler syntax in deflate_quick.c. From what I understand, this may be related to http://llvm.org/bugs/show_bug.cgi?id=7459#c5, where clang is stricter about certain assumptions than gas.

gcc -O3  -DHAVE_HIDDEN -DX86_64 -DUNALIGNED_OK -DADLER32_UNROLL_LESS -DCRC32_UNROLL_LESS -UCHECK_SSE2 -DHAVE_SSE2 -DUSE_SSE4_2_CRC_HASH -DHAVE_PCLMULQDQ -DUSE_QUICK -DUSE_MEDIUM -msse4 -I. -c -o deflate_quick.o deflate_quick.c
deflate_quick.c:46:27: error: invalid instruction mnemonic 'movzxw'
        "jb         loop\n\t"
                          ^
<inline asm>:12:2: note: instantiated into assembly here
        movzxw     (%rsi, %rbx), %rax
        ^~~~~~
deflate_quick.c:48:50: error: invalid operand for instruction
        "movzxw     (%[src0], %[result]), %[ax]\n\t"
                                                 ^
<inline asm>:13:28: note: instantiated into assembly here
        xorw        (%rdi, %rbx), %rax
                                  ^~~~
deflate_quick.c:34:5: error: invalid symbol redefinition
    "loop:\n\t"
    ^
<inline asm>:1:2: note: instantiated into assembly here
        loop:
        ^
deflate_quick.c:46:27: error: invalid instruction mnemonic 'movzxw'
        "jb         loop\n\t"
                          ^
<inline asm>:12:2: note: instantiated into assembly here
        movzxw     (%rsi, %rbx), %rax
        ^~~~~~
deflate_quick.c:48:50: error: invalid operand for instruction
        "movzxw     (%[src0], %[result]), %[ax]\n\t"
                                                 ^
<inline asm>:13:28: note: instantiated into assembly here
        xorw        (%rdi, %rbx), %rax
                                  ^~~~
deflate_quick.c:54:6: error: invalid symbol redefinition
    "miscompare16:\n\t"
     ^
<inline asm>:17:1: note: instantiated into assembly here
miscompare16:
^
deflate_quick.c:57:6: error: invalid symbol redefinition
    "miscompare:\n\t"
     ^
<inline asm>:20:1: note: instantiated into assembly here
miscompare:
^
deflate_quick.c:59:6: error: invalid symbol redefinition
    "end:\n\t"
     ^
<inline asm>:22:1: note: instantiated into assembly here
end:
^
8 errors generated.
make: *** [deflate_quick.o] Error 1

@mp15
Contributor Author

mp15 commented Jun 11, 2014

I've attempted a patch for this that switches the assembly to local labels and fixes the mov/xor operand errors, and incorporated it into pull request #5.
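
For readers hitting the same errors, here is a minimal sketch of the three kinds of fixes the log points at, applied to a hypothetical helper `match_len` (this is not the deflate_quick code, nor the change in PR #5 or commit 4026219). It uses numeric local labels (`1:` referenced as `1b`, `2f`, `3f`) instead of named labels like `loop:`; the AT&T zero-extend mnemonic `movzwl` instead of `movzxw`, which clang's assembler rejects; and the `%w` operand modifier so the register size matches the 16-bit `xorw`. It assumes x86-64 with GCC or Clang and falls back to plain C elsewhere.

```c
#include <stddef.h>

/* Hypothetical helper: number of leading bytes that match in a and b,
 * examining at most len bytes. Illustrative only. */
static inline size_t match_len(const unsigned char *a,
                               const unsigned char *b, size_t len)
{
    size_t i = 0;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    size_t n2 = len & ~(size_t)1;   /* bytes compared two at a time */
    unsigned int t;                 /* scratch register */
    __asm__ (
        "test   %[n2], %[n2]\n\t"
        "jz     3f\n\t"
        "1:\n\t"                              /* local label, safe to duplicate */
        "movzwl (%[a], %[i]), %k[t]\n\t"      /* zero-extend 16 -> 32 bits */
        "xorw   (%[b], %[i]), %w[t]\n\t"      /* 16-bit register matches xorw */
        "jnz    2f\n\t"
        "add    $2, %[i]\n\t"
        "cmp    %[n2], %[i]\n\t"
        "jb     1b\n\t"
        "jmp    3f\n\t"
        "2:\n\t"                              /* this pair differs */
        "test   %b[t], %b[t]\n\t"             /* low byte = first byte (little-endian) */
        "jnz    3f\n\t"
        "add    $1, %[i]\n\t"                 /* first byte matched, second did not */
        "3:\n\t"
        : [i] "+r" (i), [t] "=&r" (t)
        : [a] "r" (a), [b] "r" (b), [n2] "r" (n2)
        : "cc", "memory");
#endif
    /* Handles an odd trailing byte (or everything on other targets).
     * If the asm stopped at a mismatch, a[i] != b[i] and this exits at once. */
    while (i < len && a[i] == b[i])
        i++;
    return i;
}
```

Numeric local labels may be defined any number of times in one object file, so the asm body can be inlined at several call sites; duplicated instantiation of the named labels is the likely cause of the "invalid symbol redefinition" errors in the log above.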

@jtkukunas
Contributor

mp15,

I've pushed a branch, issue6, that should fix this problem. Can you verify that it resolves the issue for you? Specifically, the fix is introduced in commit 4026219.

Thanks.

@mp15
Contributor Author

mp15 commented Jun 16, 2014

Seems to compile cleanly for me and passes checks:

$ make check
hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek:  hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
        *** zlib test OK ***
hello world
zlib version 1.2.8 = 0x1280, compile flags = 0xa9
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek:  hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
        *** zlib shared test OK ***

jtkukunas added a commit that referenced this issue Jul 26, 2014
The deflate_quick strategy is designed to provide maximum
deflate performance.

deflate_quick achieves this through:
    - only checking the first hash match
    - using a small inline SSE4.2-optimized longest_match
    - forcing a window size of 8K, and using a precomputed dist/len
      table
    - forcing the static Huffman tree and emitting codes immediately
      instead of tallying
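
To make "only checking the first hash match" concrete, here is a minimal sketch of a single-probe matcher: the hash table keeps only the most recent position for each 3-byte hash (no chains), and a lookup verifies that one candidate and gives up otherwise. The table size, hash function, the `quick_find` name, and the window check are assumptions for illustration, not the deflate_quick implementation; only the 8K window figure comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define QUICK_WSIZE     8192u  /* 8K window, per the commit message */
#define QUICK_HASH_BITS 12     /* hypothetical table size: 4096 heads */
#define QUICK_MIN_MATCH 3
#define QUICK_MAX_MATCH 258

/* Hypothetical multiplicative hash of the next 3 bytes. */
static uint32_t quick_hash(const unsigned char *p)
{
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    return (v * 2654435761u) >> (32 - QUICK_HASH_BITS);
}

/* head[] stores pos + 1 so that 0 means "empty". Inserts pos, then checks
 * only the single previous position with the same hash.
 * Returns the match length (0 if none) and sets *dist on success. */
static size_t quick_find(uint32_t head[1u << QUICK_HASH_BITS],
                         const unsigned char *buf, size_t pos, size_t len,
                         size_t *dist)
{
    if (pos + QUICK_MIN_MATCH > len)
        return 0;
    uint32_t h = quick_hash(buf + pos);
    uint32_t cand = head[h];
    head[h] = (uint32_t)pos + 1;          /* newest position replaces the old one */
    if (cand == 0)
        return 0;
    size_t prev = cand - 1;
    if (pos - prev > QUICK_WSIZE)         /* candidate is outside the window */
        return 0;
    size_t max = len - pos;
    if (max > QUICK_MAX_MATCH)
        max = QUICK_MAX_MATCH;
    size_t n = 0;
    while (n < max && buf[prev + n] == buf[pos + n])
        n++;
    if (n < QUICK_MIN_MATCH)              /* hash collision or too short */
        return 0;
    *dist = pos - prev;
    return n;
}
```

Checking a single candidate trades compression ratio for speed: a collision, or a longer match further back, is simply missed and the encoder moves on.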

This patch changes the scope of flush_pending, bi_windup, and
static_ltree to ZLIB_INTERNAL and moves END_BLOCK, send_code,
put_short, and send_bits to deflate.h.
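
As a rough illustration of "forcing the static Huffman tree and emitting codes immediately instead of tallying", here is a minimal standalone sketch in the spirit of send_bits/put_short: an LSB-first bit writer plus a function that writes a literal/length symbol's fixed Huffman code (RFC 1951, section 3.2.6) straight to the output. The `bitwriter` struct and all function names are hypothetical; zlib's macros operate on the deflate state (bi_buf, bi_valid, pending_buf) and look codes up in the precomputed static_ltree table instead of computing them on the fly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical standalone bit writer; DEFLATE packs bits LSB-first. */
typedef struct {
    unsigned char *out;
    size_t pos;          /* bytes written */
    uint32_t buf;        /* pending bits */
    int nbits;           /* number of valid bits in buf */
} bitwriter;

static void put_bits(bitwriter *w, uint32_t value, int length)
{
    w->buf |= value << w->nbits;
    w->nbits += length;
    while (w->nbits >= 8) {              /* flush complete bytes */
        w->out[w->pos++] = (unsigned char)(w->buf & 0xff);
        w->buf >>= 8;
        w->nbits -= 8;
    }
}

/* Flush a final partial byte, zero-padded (compare zlib's bi_windup). */
static void flush_bits(bitwriter *w)
{
    if (w->nbits > 0)
        w->out[w->pos++] = (unsigned char)(w->buf & 0xff);
    w->buf = 0;
    w->nbits = 0;
}

/* Huffman codes are defined MSB-first, so reverse them for the
 * LSB-first writer. */
static uint32_t reverse_bits(uint32_t code, int length)
{
    uint32_t r = 0;
    for (int i = 0; i < length; i++) {
        r = (r << 1) | (code & 1);
        code >>= 1;
    }
    return r;
}

/* Emit one literal/length symbol (0..287) with the fixed Huffman code. */
static void send_fixed_symbol(bitwriter *w, unsigned sym)
{
    uint32_t code;
    int len;
    if (sym < 144)      { code = 0x30 + sym;          len = 8; }
    else if (sym < 256) { code = 0x190 + (sym - 144); len = 9; }
    else if (sym < 280) { code = sym - 256;           len = 7; }
    else                { code = 0xC0 + (sym - 280);  len = 8; }
    put_bits(w, reverse_bits(code, len), len);
}
```

For example, writing the block header bits with put_bits (BFINAL = 1, then BTYPE = 01 for fixed Huffman), the literal 'A', the end-of-block symbol 256, and finally calling flush_bits should yield the three bytes 73 04 00, a complete raw DEFLATE stream for the input "A".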

Updates the configure script to enable it by default on x86. On systems
without SSE4.2, it falls back to the deflate_fast strategy.

Fixes #6
Fixes #8

jtkukunas added a commit that referenced this issue Jun 21, 2018
busykai pushed a commit to busykai/zlib that referenced this issue Jan 26, 2022
busykai pushed a commit to busykai/zlib that referenced this issue Jan 28, 2022
jtkukunas added a commit that referenced this issue Apr 15, 2022
jtkukunas added a commit that referenced this issue Apr 25, 2022
jtkukunas added a commit that referenced this issue Aug 29, 2022
busykai pushed a commit to busykai/zlib that referenced this issue Nov 9, 2022
busykai pushed a commit that referenced this issue Nov 19, 2023