Not working on apple m1 #62

Closed
leejw51crypto opened this issue May 6, 2021 · 10 comments

leejw51crypto commented May 6, 2021

Versions 0.0.2 and 0.0.3 crash; version 0.0.1 works.
The crash is in encode_arm64.s.

The cause appears to be related to snappy on arm64:
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:741 +0x230 fp=0x1400159e140 sp=0x1400159e100 pc=0x10303b9f0
github.com/golang/snappy.encodeBlock(0x14000dc7502, 0x1304, 0x1304, 0x14000dc6000, 0x102f, 0x13bc, 0x14000195b01)
        $HOME/go/pkg/mod/github.com/golang/snappy@v0.0.2/encode_arm64.s:666 +0x360 fp=0x140015a61e0 sp=0x1400159e150 pc=0x1037118c0
github.com/golang/snappy.Encode(0x14000dc7500, 0x1306, 0x1306, 0x0, 0x0, 0x0, 0x2, 0x4, 0x140015a62f8)
        $HOME/go/pkg/mod/github.com/golang/snappy@v0.0.2/encode.go:39 +0x17c fp=0x140015a6230 sp=0x140015a61e0 pc=0x103710dfc

istae commented May 24, 2021

Which Go version are you using?

alfianabdi commented Jun 8, 2021

Hi, I am experiencing this issue as well.
I built the Cortex binary for arm64 (Go 1.14.9).
At runtime I got the following error:

unexpected fault address 0x7900ef40aabd0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7900ef40aabd0 pc=0x4ec610]

goroutine 766 [running]:
runtime.throw(0x1d3340d, 0x5)
        /usr/local/go/src/runtime/panic.go:1117 +0x54 fp=0x4002529c00 sp=0x4002529bd0 pc=0x44614
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:741 +0x230 fp=0x4002529c40 sp=0x4002529c00 pc=0x5c220
github.com/golang/snappy.encodeBlock(0x4001e39105, 0x6ea, 0x6ea, 0x4001b69800, 0x5d4, 0x600, 0x50301)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/github.com/golang/snappy/encode_arm64.s:666 +0x360 fp=0x4002531ce0 sp=0x4002529c50 pc=0x4ec610
github.com/golang/snappy.Encode(0x4001e39103, 0x6ec, 0x6ec, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/github.com/golang/snappy/encode.go:39 +0x17c fp=0x4002531d30 sp=0x4002531ce0 pc=0x4eb29c
github.com/thanos-io/thanos/pkg/store.diffVarintSnappyEncode(0x2208670, 0x400087b700, 0x3be, 0x4002c05904, 0xefc, 0x101a6c, 0x0, 0x0)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/github.com/thanos-io/thanos/pkg/store/postings_codec.go:47 +0x100 fp=0x4002531dc0 sp=0x4002531d30 pc=0x1479430
github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).fetchPostings.func3(0x4000259d01, 0x4000cd08c0)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/github.com/thanos-io/thanos/pkg/store/bucket.go:1862 +0x354 fp=0x4002531f60 sp=0x4002531dc0 pc=0x147cb44
golang.org/x/sync/errgroup.(*Group).Go.func1(0x400279bdd0, 0x4000e03950)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x58 fp=0x4002531fc0 sp=0x4002531f60 pc=0x4e8b88
runtime.goexit()
        /usr/local/go/src/runtime/asm_arm64.s:1130 +0x4 fp=0x4002531fc0 sp=0x4002531fc0 pc=0x7bbc4
created by golang.org/x/sync/errgroup.(*Group).Go
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/golang.org/x/sync/errgroup/errgroup.go:54 +0x60

Update: I rebuilt with Go 1.16, and the resulting binary fails with the same error.

nigeltao (Contributor) commented

@AWSjswinney is this an arm64 assembly thing?

AWSjswinney (Contributor) commented

Yeah, definitely looks like it. I'll look into it.

AWSjswinney (Contributor) commented

@alfianabdi or @leejw51crypto Could one of you provide minimal instructions to reproduce this issue? Ideally they would work on Linux/aarch64, because it's harder for me to get access to an M1 system. I built the Cortex containers, but I'm not sure how to use them to reproduce the bug.

alfianabdi commented Jun 24, 2021

@AWSjswinney

I did the following on my local machine (WSL with Docker Desktop). The error I posted above came from Kubernetes, with a different Cortex configuration. Locally I run the arm64 Cortex container under emulation.

Create a Docker image for Cortex:

FROM       arm64v8/alpine:3.12
RUN        apk add --no-cache ca-certificates
RUN        addgroup -g 10002 -S cortex && \
           adduser -u 10002 -S cortex -G cortex
RUN        mkdir -p /etc/cortex /data && \
           chown -R cortex:cortex /etc/cortex /data
COPY       --chown=cortex:cortex migrations /migrations/
COPY       cortex-linux-arm64v8 /bin/cortex
WORKDIR    /data
EXPOSE     80
USER       cortex
ENTRYPOINT [ "/bin/cortex" ]

ARG revision
LABEL org.opencontainers.image.title="cortex" \
      org.opencontainers.image.source="https://github.com/cortexproject/cortex/tree/master/cmd/cortex" \
      org.opencontainers.image.revision="${revision}"

Create a config file for the Cortex single binary, cortex.yaml:

auth_enabled: false
server:
  http_listen_port: 8080
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  grpc_server_max_concurrent_streams: 1000
distributor:
  shard_by_all_labels: true
  pool:
    health_check_ingesters: true
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 104857600
    max_send_msg_size: 104857600
    grpc_compression: snappy
ingester:
  lifecycler:
    join_after: 0
    min_ready_duration: 0s
    final_sleep: 0s
    num_tokens: 512
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
storage:
  engine: blocks
blocks_storage:
  tsdb:
    dir: /tmp/cortex/tsdb
  bucket_store:
    sync_dir: /tmp/cortex/tsdb-sync
  backend: filesystem
compactor:
  data_dir: /tmp/cortex/compactor
  sharding_ring:
    kvstore:
      store: inmemory
frontend_worker:
  match_max_concurrent: true

Run the Cortex container:

docker run -d -v ${PWD}/cortex.yaml:/etc/cortex/cortex.yaml:ro -p 8080:8080 <Cortex ARM Image> -config.file=/etc/cortex/cortex.yaml

The Cortex container should now be running. Get its IP address and check the logs:

docker inspect <container id> --format '{{.NetworkSettings.Networks.bridge.IPAddress}}'
docker logs <container id>
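
For anyone unfamiliar with the `--format` flag used above: it takes a Go `text/template` expression that is evaluated against the inspected object. The sketch below only illustrates how that expression walks nested fields. The `sampleInspect` data is a hypothetical, heavily trimmed stand-in for real `docker inspect` output, and `renderFormat` is a helper name made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// renderFormat evaluates a --format style template string against data
// and returns the rendered text.
func renderFormat(format string, data interface{}) (string, error) {
	tmpl, err := template.New("format").Parse(format)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

// sampleInspect returns hypothetical data shaped like a small slice of
// `docker inspect` output. The IP address is an assumed example value.
func sampleInspect() map[string]interface{} {
	return map[string]interface{}{
		"NetworkSettings": map[string]interface{}{
			"Networks": map[string]interface{}{
				"bridge": map[string]interface{}{
					"IPAddress": "172.17.0.2",
				},
			},
		},
	}
}

func main() {
	ip, err := renderFormat("{{.NetworkSettings.Networks.bridge.IPAddress}}", sampleInspect())
	if err != nil {
		fmt.Fprintln(os.Stderr, "template error:", err)
		os.Exit(1)
	}
	fmt.Println(ip)
}
```

If the container is attached to a network other than `bridge`, the key after `Networks` would differ accordingly.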

Create a config file for Prometheus, prometheus.yaml. Replace the IP in remote_write with the Cortex container's IP.

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
remote_write:
- headers:
    X-Scope-OrgID: fake
  url: http://172.17.0.2:8080/api/v1/push
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

Create a Docker volume, set its permissions, and run the Prometheus container:

docker volume create prometheus
docker run --rm  --mount source=prometheus,target=/data alpine:3.13 chown -R 65534:65534 /data
docker run -d -v ${PWD}/prometheus.yaml:/etc/config/prometheus.yml:ro --mount source=prometheus,target=/data quay.io/prometheus/prometheus:v2.26.1 --storage.tsdb.retention.time=15d --config.file=/etc/config/prometheus.yml --storage.tsdb.path=/data --web.console.libraries=/etc/prometheus/console_libraries --web.console.templates=/etc/prometheus/consoles

The Cortex container should now have exited with this error:

unexpected fault address 0xffffffffff073000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0xffffffffff073000 pc=0x4ec610]

goroutine 1092 [running]:
runtime.throw(0x1d3340d, 0x5)
        /usr/local/go/src/runtime/panic.go:1117 +0x54 fp=0x4000dbceb0 sp=0x4000dbce80 pc=0x44614
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:741 +0x230 fp=0x4000dbcef0 sp=0x4000dbceb0 pc=0x5c220
github.com/golang/snappy.encodeBlock(0x4000d60015, 0x12ac7, 0x12ac7, 0x4000c0a000, 0xc079, 0x10000, 0x406367f31b)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/github.com/golang/snappy/encode_arm64.s:666 +0x360 fp=0x4000dc4f90 sp=0x4000dbcf00 pc=0x4ec610
github.com/golang/snappy.Encode(0x4000d60012, 0x12aca, 0x12aca, 0x0, 0x0, 0x0, 0x40008210c8, 0x1a0e0, 0x4000ac0500)
        /home/alfian/work/go/src/github.com/cortexproject/cortex/vendor/github.com/golang/snappy/encode.go:39 +0x17c fp=0x4000dc4fe0 sp=0x4000dc4f90 pc=0x4eb29c

AWSjswinney (Contributor) commented Jun 24, 2021

Thanks, @alfianabdi. I followed the instructions, but I haven't been able to reproduce the bug. Can you give me the commit hash of Cortex you are working on?

Can you verify that the project is using Snappy v0.0.3? Looking at the stack trace, I see that it points to a crash at encode_arm64.s:666. In the latest version that line is blank. However, v0.0.2 has a known bug that could explain a segfault on that line.

After looking more closely at the original report, I see that Snappy v0.0.2 is being used, which definitely explains the problem. @leejw51crypto Can you please try upgrading to v0.0.3?

alfianabdi commented

@AWSjswinney

Thanks for your help. Release 1.7 was indeed using v0.0.2, but the latest release uses v0.0.3.
I will try building the latest version.

nigeltao (Contributor) commented

@alfianabdi any news?

alfianabdi commented

@nigeltao and all

Cortex 1.9 with snappy compression (v0.0.3) works well.
