bug fix to encode_arm64.s: some registers overwritten in memmove call #56

AWSjswinney · 2020-10-02T15:53:27Z

In an upcoming change to the Go runtime, the memmove function makes use of new registers that exposed this bug. If that change is merged, it will be more important to also merge this change.

This change has passed go-fuzz testing:

2020/10/01 17:56:44 workers: 2, corpus: 228 (1h2m ago), crashers: 0, restarts: 1/10000, execs: 596937882 (11640/sec), cover: 157, uptime: 14h14m

In encode_arm64.s, encodeBlock, two of the registers added during the port from
amd64 were not saved or restored for the memmove call. Instead of saving them,
just recalculate their values. Additionally, I made a few small changes to
improve things since I've learned a bit more about ARMv8 assembly.

The CMP instruction accepts an immediate as the first argument
use LDP/STP instead of SIMD instructions

The change to use the load-pair and store-pair instructions instead of the SIMD
instructions results in some modest performance improvements as meastured on
Neoverse N1 (Graviton 2).

name              old time/op    new time/op    delta
WordsDecode1e1-2    25.9ns ± 1%    26.1ns ± 1%  +0.66%  (p=0.005 n=10+10)
WordsDecode1e2-2     107ns ± 0%     105ns ± 0%  -1.87%  (p=0.000 n=10+10)
WordsDecode1e3-2     953ns ± 0%     901ns ± 0%  -5.50%  (p=0.000 n=10+10)
WordsDecode1e4-2    10.6µs ± 0%     9.9µs ± 2%  -6.60%  (p=0.000 n=7+10)
WordsDecode1e5-2     170µs ± 1%     164µs ± 1%  -3.12%  (p=0.000 n=10+9)
WordsDecode1e6-2    1.71ms ± 0%    1.66ms ± 0%  -2.98%  (p=0.000 n=10+10)
WordsEncode1e1-2    22.0ns ± 1%    21.9ns ± 1%  -0.67%  (p=0.006 n=8+10)
WordsEncode1e2-2     248ns ± 0%     245ns ± 0%  -1.21%  (p=0.002 n=8+10)
WordsEncode1e3-2    2.50µs ± 0%    2.49µs ± 0%    ~     (p=0.103 n=10+9)
WordsEncode1e4-2    27.8µs ± 3%    28.0µs ± 2%    ~     (p=0.075 n=10+10)
WordsEncode1e5-2     339µs ± 0%     343µs ± 0%  +1.18%  (p=0.000 n=9+10)
WordsEncode1e6-2    3.39ms ± 0%    3.42ms ± 0%  +0.94%  (p=0.000 n=10+10)
RandomEncode-2      74.8µs ± 1%    77.1µs ± 1%  +3.16%  (p=0.000 n=10+10)
_UFlat0-2           68.8µs ± 1%    66.4µs ± 2%  -3.54%  (p=0.000 n=10+10)
_UFlat1-2            770µs ± 0%     740µs ± 1%  -3.93%  (p=0.000 n=10+10)
_UFlat2-2           6.57µs ± 0%    6.55µs ± 0%  -0.25%  (p=0.000 n=8+10)
_UFlat3-2            183ns ± 0%     178ns ± 1%  -2.84%  (p=0.000 n=9+10)
_UFlat4-2           9.76µs ± 1%    9.56µs ± 0%  -2.07%  (p=0.000 n=10+9)
_UFlat5-2            301µs ± 0%     293µs ± 0%  -2.67%  (p=0.000 n=9+10)
_UFlat6-2            280µs ± 1%     267µs ± 1%  -4.63%  (p=0.000 n=10+10)
_UFlat7-2            241µs ± 0%     230µs ± 1%  -4.68%  (p=0.000 n=9+10)
_UFlat8-2            745µs ± 0%     715µs ± 1%  -4.11%  (p=0.000 n=10+10)
_UFlat9-2           1.01ms ± 0%    0.96ms ± 0%  -4.60%  (p=0.000 n=10+10)
_UFlat10-2          62.3µs ± 1%    59.3µs ± 1%  -4.72%  (p=0.000 n=10+9)
_UFlat11-2           258µs ± 0%     252µs ± 1%  -2.56%  (p=0.000 n=10+10)
_ZFlat0-2            135µs ± 1%     132µs ± 1%  -1.88%  (p=0.000 n=10+8)
_ZFlat1-2           1.76ms ± 0%    1.74ms ± 0%  -1.00%  (p=0.000 n=9+9)
_ZFlat2-2           9.54µs ± 0%    9.84µs ± 5%  +3.18%  (p=0.000 n=10+10)
_ZFlat3-2            449ns ± 0%     447ns ± 0%  -0.38%  (p=0.000 n=10+9)
_ZFlat4-2           15.6µs ± 0%    16.0µs ± 4%    ~     (p=0.118 n=9+10)
_ZFlat5-2            560µs ± 1%     555µs ± 1%  -0.89%  (p=0.000 n=9+9)
_ZFlat6-2            531µs ± 0%     534µs ± 0%  +0.64%  (p=0.000 n=10+10)
_ZFlat7-2            466µs ± 0%     468µs ± 0%  +0.32%  (p=0.003 n=10+10)
_ZFlat8-2           1.42ms ± 0%    1.42ms ± 0%  +0.43%  (p=0.000 n=10+10)
_ZFlat9-2           1.93ms ± 0%    1.94ms ± 0%  +0.44%  (p=0.000 n=10+10)
_ZFlat10-2           120µs ± 0%     121µs ± 3%    ~     (p=0.436 n=9+9)
_ZFlat11-2           433µs ± 0%     437µs ± 0%  +1.03%  (p=0.000 n=10+10)
ExtendMatch-2       9.77µs ± 0%    9.76µs ± 0%  -0.13%  (p=0.050 n=10+10)

As measured on Cortex-A53 (Raspberry Pi 3)

name              old time/op    new time/op    delta
WordsDecode1e1-4     152ns ± 2%     151ns ± 0%    ~     (p=0.536 n=10+8)
WordsDecode1e2-4     639ns ± 0%     617ns ± 0%  -3.54%  (p=0.000 n=9+8)
WordsDecode1e3-4    6.74µs ± 2%    6.35µs ± 0%  -5.75%  (p=0.000 n=10+9)
WordsDecode1e4-4    66.7µs ± 0%    63.5µs ± 0%  -4.69%  (p=0.000 n=9+9)
WordsDecode1e5-4     715µs ± 0%     684µs ± 0%  -4.38%  (p=0.000 n=8+8)
WordsDecode1e6-4    6.87ms ± 2%    6.53ms ± 1%  -4.99%  (p=0.000 n=10+9)
WordsEncode1e1-4     127ns ± 2%     126ns ± 0%    ~     (p=0.065 n=10+9)
WordsEncode1e2-4    1.58µs ± 0%    1.57µs ± 0%  -0.99%  (p=0.000 n=8+8)
WordsEncode1e3-4    15.1µs ± 0%    14.9µs ± 0%  -1.46%  (p=0.000 n=9+8)
WordsEncode1e4-4     148µs ± 0%     148µs ± 4%    ~     (p=0.497 n=9+10)
WordsEncode1e5-4    1.54ms ± 0%    1.54ms ± 0%  +0.12%  (p=0.012 n=10+8)
WordsEncode1e6-4    14.4ms ± 0%    14.4ms ± 1%  -0.47%  (p=0.015 n=9+8)
RandomEncode-4      1.13ms ± 1%    1.13ms ± 1%    ~     (p=0.529 n=10+10)
_UFlat0-4            294µs ± 0%     288µs ± 1%  -2.08%  (p=0.000 n=9+9)
_UFlat1-4           3.05ms ± 1%    2.98ms ± 1%  -2.22%  (p=0.000 n=9+9)
_UFlat2-4           37.3µs ± 0%    37.4µs ± 1%    ~     (p=0.093 n=8+9)
_UFlat3-4            909ns ± 0%     914ns ± 2%    ~     (p=0.526 n=8+10)
_UFlat4-4           58.7µs ± 0%    58.1µs ± 0%  -1.09%  (p=0.000 n=8+10)
_UFlat5-4           1.22ms ± 0%    1.19ms ± 1%  -2.14%  (p=0.000 n=8+8)
_UFlat6-4           1.03ms ± 0%    0.99ms ± 0%  -3.28%  (p=0.000 n=9+8)
_UFlat7-4            895µs ± 0%     861µs ± 0%  -3.79%  (p=0.000 n=8+8)
_UFlat8-4           2.83ms ± 0%    2.75ms ± 0%  -2.88%  (p=0.000 n=7+8)
_UFlat9-4           3.85ms ± 1%    3.73ms ± 1%  -3.03%  (p=0.000 n=8+9)
_UFlat10-4           286µs ± 0%     282µs ± 0%  -1.59%  (p=0.000 n=9+9)
_UFlat11-4          1.06ms ± 0%    1.02ms ± 0%  -3.58%  (p=0.000 n=8+9)
_ZFlat0-4            620µs ± 0%     620µs ± 1%    ~     (p=0.963 n=9+8)
_ZFlat1-4           9.49ms ± 1%    9.67ms ± 3%  +1.87%  (p=0.000 n=9+10)
_ZFlat2-4           61.8µs ± 0%    62.3µs ± 3%    ~     (p=0.829 n=8+10)
_ZFlat3-4           2.80µs ± 1%    2.79µs ± 0%  -0.55%  (p=0.000 n=8+8)
_ZFlat4-4            108µs ± 0%     109µs ± 0%  +0.55%  (p=0.000 n=10+8)
_ZFlat5-4           2.59ms ± 2%    2.58ms ± 1%    ~     (p=0.274 n=10+8)
_ZFlat6-4           2.39ms ± 3%    2.40ms ± 1%    ~     (p=0.631 n=10+10)
_ZFlat7-4           2.11ms ± 0%    2.08ms ± 1%  -1.23%  (p=0.000 n=10+9)
_ZFlat8-4           6.86ms ± 0%    6.92ms ± 1%  +0.78%  (p=0.000 n=9+8)
_ZFlat9-4           9.42ms ± 0%    9.40ms ± 1%    ~     (p=0.606 n=8+9)
_ZFlat10-4           620µs ± 1%     621µs ± 4%    ~     (p=0.173 n=8+10)
_ZFlat11-4          1.94ms ± 0%    1.93ms ± 0%  -0.52%  (p=0.001 n=9+8)
ExtendMatch-4       69.3µs ± 2%    69.2µs ± 0%    ~     (p=0.515 n=10+8)

In encode_arm64.s, encodeBlock, two of the registers added during the port from amd64 were not saved or restored for the memmove call. Instead of saving them, just recalculate their values. Additionally, I made a few small changes to improve things since I've learned a bit more about ARMv8 assembly. - The CMP instruction accepts an immediate as the first argument - use LDP/STP instead of SIMD instructions The change to use the load-pair and store-pair instructions instead of the SIMD instructions results in some modest performance improvements as meastured on Neoverse N1 (Graviton 2). name old time/op new time/op delta WordsDecode1e1-2 25.9ns ± 1% 26.1ns ± 1% +0.66% (p=0.005 n=10+10) WordsDecode1e2-2 107ns ± 0% 105ns ± 0% -1.87% (p=0.000 n=10+10) WordsDecode1e3-2 953ns ± 0% 901ns ± 0% -5.50% (p=0.000 n=10+10) WordsDecode1e4-2 10.6µs ± 0% 9.9µs ± 2% -6.60% (p=0.000 n=7+10) WordsDecode1e5-2 170µs ± 1% 164µs ± 1% -3.12% (p=0.000 n=10+9) WordsDecode1e6-2 1.71ms ± 0% 1.66ms ± 0% -2.98% (p=0.000 n=10+10) WordsEncode1e1-2 22.0ns ± 1% 21.9ns ± 1% -0.67% (p=0.006 n=8+10) WordsEncode1e2-2 248ns ± 0% 245ns ± 0% -1.21% (p=0.002 n=8+10) WordsEncode1e3-2 2.50µs ± 0% 2.49µs ± 0% ~ (p=0.103 n=10+9) WordsEncode1e4-2 27.8µs ± 3% 28.0µs ± 2% ~ (p=0.075 n=10+10) WordsEncode1e5-2 339µs ± 0% 343µs ± 0% +1.18% (p=0.000 n=9+10) WordsEncode1e6-2 3.39ms ± 0% 3.42ms ± 0% +0.94% (p=0.000 n=10+10) RandomEncode-2 74.8µs ± 1% 77.1µs ± 1% +3.16% (p=0.000 n=10+10) _UFlat0-2 68.8µs ± 1% 66.4µs ± 2% -3.54% (p=0.000 n=10+10) _UFlat1-2 770µs ± 0% 740µs ± 1% -3.93% (p=0.000 n=10+10) _UFlat2-2 6.57µs ± 0% 6.55µs ± 0% -0.25% (p=0.000 n=8+10) _UFlat3-2 183ns ± 0% 178ns ± 1% -2.84% (p=0.000 n=9+10) _UFlat4-2 9.76µs ± 1% 9.56µs ± 0% -2.07% (p=0.000 n=10+9) _UFlat5-2 301µs ± 0% 293µs ± 0% -2.67% (p=0.000 n=9+10) _UFlat6-2 280µs ± 1% 267µs ± 1% -4.63% (p=0.000 n=10+10) _UFlat7-2 241µs ± 0% 230µs ± 1% -4.68% (p=0.000 n=9+10) _UFlat8-2 745µs ± 0% 715µs ± 1% -4.11% (p=0.000 n=10+10) _UFlat9-2 1.01ms ± 0% 0.96ms ± 0% -4.60% (p=0.000 n=10+10) _UFlat10-2 62.3µs ± 1% 59.3µs ± 1% -4.72% (p=0.000 n=10+9) _UFlat11-2 258µs ± 0% 252µs ± 1% -2.56% (p=0.000 n=10+10) _ZFlat0-2 135µs ± 1% 132µs ± 1% -1.88% (p=0.000 n=10+8) _ZFlat1-2 1.76ms ± 0% 1.74ms ± 0% -1.00% (p=0.000 n=9+9) _ZFlat2-2 9.54µs ± 0% 9.84µs ± 5% +3.18% (p=0.000 n=10+10) _ZFlat3-2 449ns ± 0% 447ns ± 0% -0.38% (p=0.000 n=10+9) _ZFlat4-2 15.6µs ± 0% 16.0µs ± 4% ~ (p=0.118 n=9+10) _ZFlat5-2 560µs ± 1% 555µs ± 1% -0.89% (p=0.000 n=9+9) _ZFlat6-2 531µs ± 0% 534µs ± 0% +0.64% (p=0.000 n=10+10) _ZFlat7-2 466µs ± 0% 468µs ± 0% +0.32% (p=0.003 n=10+10) _ZFlat8-2 1.42ms ± 0% 1.42ms ± 0% +0.43% (p=0.000 n=10+10) _ZFlat9-2 1.93ms ± 0% 1.94ms ± 0% +0.44% (p=0.000 n=10+10) _ZFlat10-2 120µs ± 0% 121µs ± 3% ~ (p=0.436 n=9+9) _ZFlat11-2 433µs ± 0% 437µs ± 0% +1.03% (p=0.000 n=10+10) ExtendMatch-2 9.77µs ± 0% 9.76µs ± 0% -0.13% (p=0.050 n=10+10) As measured on Cortex-A53 (Raspberry Pi 3) name old time/op new time/op delta WordsDecode1e1-4 152ns ± 2% 151ns ± 0% ~ (p=0.536 n=10+8) WordsDecode1e2-4 639ns ± 0% 617ns ± 0% -3.54% (p=0.000 n=9+8) WordsDecode1e3-4 6.74µs ± 2% 6.35µs ± 0% -5.75% (p=0.000 n=10+9) WordsDecode1e4-4 66.7µs ± 0% 63.5µs ± 0% -4.69% (p=0.000 n=9+9) WordsDecode1e5-4 715µs ± 0% 684µs ± 0% -4.38% (p=0.000 n=8+8) WordsDecode1e6-4 6.87ms ± 2% 6.53ms ± 1% -4.99% (p=0.000 n=10+9) WordsEncode1e1-4 127ns ± 2% 126ns ± 0% ~ (p=0.065 n=10+9) WordsEncode1e2-4 1.58µs ± 0% 1.57µs ± 0% -0.99% (p=0.000 n=8+8) WordsEncode1e3-4 15.1µs ± 0% 14.9µs ± 0% -1.46% (p=0.000 n=9+8) WordsEncode1e4-4 148µs ± 0% 148µs ± 4% ~ (p=0.497 n=9+10) WordsEncode1e5-4 1.54ms ± 0% 1.54ms ± 0% +0.12% (p=0.012 n=10+8) WordsEncode1e6-4 14.4ms ± 0% 14.4ms ± 1% -0.47% (p=0.015 n=9+8) RandomEncode-4 1.13ms ± 1% 1.13ms ± 1% ~ (p=0.529 n=10+10) _UFlat0-4 294µs ± 0% 288µs ± 1% -2.08% (p=0.000 n=9+9) _UFlat1-4 3.05ms ± 1% 2.98ms ± 1% -2.22% (p=0.000 n=9+9) _UFlat2-4 37.3µs ± 0% 37.4µs ± 1% ~ (p=0.093 n=8+9) _UFlat3-4 909ns ± 0% 914ns ± 2% ~ (p=0.526 n=8+10) _UFlat4-4 58.7µs ± 0% 58.1µs ± 0% -1.09% (p=0.000 n=8+10) _UFlat5-4 1.22ms ± 0% 1.19ms ± 1% -2.14% (p=0.000 n=8+8) _UFlat6-4 1.03ms ± 0% 0.99ms ± 0% -3.28% (p=0.000 n=9+8) _UFlat7-4 895µs ± 0% 861µs ± 0% -3.79% (p=0.000 n=8+8) _UFlat8-4 2.83ms ± 0% 2.75ms ± 0% -2.88% (p=0.000 n=7+8) _UFlat9-4 3.85ms ± 1% 3.73ms ± 1% -3.03% (p=0.000 n=8+9) _UFlat10-4 286µs ± 0% 282µs ± 0% -1.59% (p=0.000 n=9+9) _UFlat11-4 1.06ms ± 0% 1.02ms ± 0% -3.58% (p=0.000 n=8+9) _ZFlat0-4 620µs ± 0% 620µs ± 1% ~ (p=0.963 n=9+8) _ZFlat1-4 9.49ms ± 1% 9.67ms ± 3% +1.87% (p=0.000 n=9+10) _ZFlat2-4 61.8µs ± 0% 62.3µs ± 3% ~ (p=0.829 n=8+10) _ZFlat3-4 2.80µs ± 1% 2.79µs ± 0% -0.55% (p=0.000 n=8+8) _ZFlat4-4 108µs ± 0% 109µs ± 0% +0.55% (p=0.000 n=10+8) _ZFlat5-4 2.59ms ± 2% 2.58ms ± 1% ~ (p=0.274 n=10+8) _ZFlat6-4 2.39ms ± 3% 2.40ms ± 1% ~ (p=0.631 n=10+10) _ZFlat7-4 2.11ms ± 0% 2.08ms ± 1% -1.23% (p=0.000 n=10+9) _ZFlat8-4 6.86ms ± 0% 6.92ms ± 1% +0.78% (p=0.000 n=9+8) _ZFlat9-4 9.42ms ± 0% 9.40ms ± 1% ~ (p=0.606 n=8+9) _ZFlat10-4 620µs ± 1% 621µs ± 4% ~ (p=0.173 n=8+10) _ZFlat11-4 1.94ms ± 0% 1.93ms ± 0% -0.52% (p=0.001 n=9+8) ExtendMatch-4 69.3µs ± 2% 69.2µs ± 0% ~ (p=0.515 n=10+8)

AWSjswinney · 2020-11-03T15:44:20Z

@nigeltao The change to the Go compiler has now been merged and will expose this bug in snappy if we don't merge this fix. Can you take a look or refer someone who can?

Thanks!

See golang/snappy#56 Signed-off-by: Matthew Sykes <[email protected]>

bobrik · 2021-02-17T07:22:25Z

Should there be a test for this? The following does the trick, but there's probably a smaller input:

snappy.Encode(nil, []byte{0xa, 0x5c, 0xa, 0x17, 0xa, 0x8, 0x5f, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x5f, 0x5f, 0x12, 0xb, 0x4d, 0x61, 0x74, 0x63, 0x68, 0x53, 0x65, 0x72, 0x69, 0x65, 0x73, 0xa, 0xf, 0xa, 0x2, 0x64, 0x62, 0x12, 0x9, 0x54, 0x65, 0x6e, 0x4d, 0x69, 0x6e, 0x75, 0x74, 0x65, 0xa, 0x13, 0xa, 0xb, 0x54, 0x75, 0x72, 0x62, 0x69, 0x6e, 0x65, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4, 0x56, 0x31, 0x31, 0x32, 0xa, 0x9, 0xa, 0x4, 0x50, 0x61, 0x72, 0x6b, 0x12, 0x1, 0x31, 0x12, 0x10, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x10, 0xb5, 0xaf, 0xdc, 0xf6, 0xfa, 0x2e, 0xa, 0x5c, 0xa, 0x17, 0xa, 0x8, 0x5f, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x5f, 0x5f, 0x12, 0xb, 0x4d, 0x61, 0x74, 0x63, 0x68, 0x53, 0x65, 0x72, 0x69, 0x65, 0x73, 0xa, 0xf, 0xa, 0x2, 0x64, 0x62, 0x12, 0x9, 0x54, 0x65, 0x6e, 0x4d, 0x69, 0x6e, 0x75, 0x74, 0x65, 0xa, 0x13, 0xa, 0xb, 0x54, 0x75, 0x72, 0x62, 0x69, 0x6e, 0x65, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4, 0x56, 0x31, 0x31, 0x32, 0xa, 0x9, 0xa, 0x4, 0x50, 0x61, 0x72, 0x6b, 0x12, 0x1, 0x32, 0x12, 0x10, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x10, 0xb5, 0xaf, 0xdc, 0xf6, 0xfa, 0x2e, 0xa, 0x5c, 0xa, 0x17, 0xa, 0x8, 0x5f, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x5f, 0x5f, 0x12, 0xb, 0x4d, 0x61, 0x74, 0x63, 0x68, 0x53, 0x65, 0x72, 0x69, 0x65, 0x73, 0xa, 0xf, 0xa, 0x2, 0x64, 0x62, 0x12, 0x9, 0x54, 0x65, 0x6e, 0x4d, 0x69, 0x6e, 0x75, 0x74, 0x65, 0xa, 0x13, 0xa, 0xb, 0x54, 0x75, 0x72, 0x62, 0x69, 0x6e, 0x65, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4, 0x56, 0x31, 0x31, 0x32, 0xa, 0x9, 0xa, 0x4, 0x50, 0x61, 0x72, 0x6b, 0x12, 0x1, 0x33, 0x12, 0x10, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x10, 0xb5, 0xaf, 0xdc, 0xf6, 0xfa, 0x2e, 0xa, 0x5c, 0xa, 0x17, 0xa, 0x8, 0x5f, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x5f, 0x5f, 0x12, 0xb, 0x4d, 0x61, 0x74, 0x63, 0x68, 0x53, 0x65, 0x72, 0x69, 0x65, 0x73, 0xa, 0xf, 0xa, 0x2, 0x64, 0x62, 0x12, 0x9, 0x54, 0x65, 0x6e, 0x4d, 0x69, 0x6e, 0x75, 0x74, 0x65, 0xa, 0x13, 0xa, 0xb, 0x54, 0x75, 0x72, 0x62, 0x69, 0x6e, 0x65, 0x54, 0x79, 0x70, 0x65, 0x12, 0x4, 0x56, 0x31, 0x31, 0x32, 0xa, 0x9, 0xa, 0x4, 0x50, 0x61, 0x72, 0x6b, 0x12, 0x1, 0x34, 0x12, 0x10, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf0, 0x3f, 0x10, 0xb5, 0xaf, 0xdc, 0xf6, 0xfa, 0x2e})

It may be also good to create a new tag now that Go 1.16 is out.

nigeltao · 2021-02-25T12:07:38Z

Is it possible to get this fix tagged with v.0.0.3?

Done. Sorry for the delay.

…11396) Pick up golang/snappy#56 to handle arm64 architectures to fix panics. tldr; Golang 1.16 changed `memmove` implementation for arm64 requiring additional cpu registers that snappy wasn't preserving in its assembly implementation. Other projects have experienced this issue as well, searching for `encode_arm64.s:666` on your favorite search engine will reveal some. Vault updated the dependency earlier this August: hashicorp/vault#12371 . I believe this issue affects Nomad 1.2.x and 1.1.x. Nomad 1.0.x use Golang 1.15 and isn't affected. However, backporting the change to 1.0.x should be harmless. Fixed #11385 .

nigeltao merged commit 674baa8 into golang:master Nov 3, 2020

ucwong mentioned this pull request Nov 4, 2020

go.mod: update github.com/golang/snappy for encode_arm64 fix ethereum/go-ethereum#21785

Closed

yperbasis mentioned this pull request Feb 5, 2021

Update snappy to prevent crash on Apple M1 erigontech/erigon#1469

Merged

sykesm added a commit to sykesm/fabric that referenced this pull request Feb 15, 2021

deps: bump golang/snappy for arm64 fixes

e75f7ba

See golang/snappy#56 Signed-off-by: Matthew Sykes <[email protected]>

sykesm added a commit to sykesm/fabric that referenced this pull request Feb 16, 2021

deps: bump golang/snappy for arm64 fixes

54b7898

See golang/snappy#56 Signed-off-by: Matthew Sykes <[email protected]>

mastersingh24 pushed a commit to hyperledger/fabric that referenced this pull request Feb 16, 2021

deps: bump golang/snappy for arm64 fixes

84b332d

See golang/snappy#56 Signed-off-by: Matthew Sykes <[email protected]>

prestonvanloon mentioned this pull request Mar 2, 2021

Update snappy to fix ARM64 with go.1.16 (Reverts #8538) prysmaticlabs/prysm#8539

Merged

2 tasks

dependabot bot mentioned this pull request Mar 18, 2021

Bump github.com/golang/snappy from 0.0.2 to 0.0.3 mx-psi/opentelemetry-collector#10

Closed

cyriltovena mentioned this pull request Mar 22, 2021

promtail: journal scrape not working for aarch64 / rpi4 grafana/loki#3522

Closed

notnoop mentioned this pull request Oct 27, 2021

Fix arm64 panics by updating google/snappy library to latest, 0.0.4 hashicorp/nomad#11396

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug fix to encode_arm64.s: some registers overwritten in memmove call #56

bug fix to encode_arm64.s: some registers overwritten in memmove call #56

AWSjswinney commented Oct 2, 2020

AWSjswinney commented Nov 3, 2020

bobrik commented Feb 17, 2021

nigeltao commented Feb 25, 2021

bug fix to encode_arm64.s: some registers overwritten in memmove call #56

bug fix to encode_arm64.s: some registers overwritten in memmove call #56

Conversation

AWSjswinney commented Oct 2, 2020

AWSjswinney commented Nov 3, 2020

bobrik commented Feb 17, 2021

nigeltao commented Feb 25, 2021