Nitro can't stop successfully: 'taking too long to stop' and after goes forward #2839

boogeroccam · 2024-12-17T15:03:02Z

Describe the bug

Nitro fails to shut down cleanly after receiving a termination signal (the log reports "shutting down because of sigint").

To Reproduce

Steps to reproduce the behavior:

docker-compose.yaml

version: '3.8'
services:
  arbitrum-node:
    image: 'offchainlabs/nitro-node:v3.2.1-d81324d'
    container_name: arbitrum-node
    ports:
      - "0.0.0.0:8547:8547"
      - "0.0.0.0:8546:8546"
      - "0.0.0.0:6070:6070"
    command:
      #- --init.prune=full
      #- --init.url=file:///home/user/.arbitrum/archive.tar
      - --parent-chain.connection.url=http://eth-node:8545
      - --parent-chain.blob-client.beacon-url=http://eth-node:5052
      - --chain.id=42161
      - --ws.addr=0.0.0.0
      - --ws.origins=*
      - --http.vhosts=*
      - --http.addr=0.0.0.0
      - --http.corsdomain=*
      - --metrics
      - --metrics-server.addr=0.0.0.0
      - --metrics-server.port=6070
    volumes:
      - /data/arbitrum:/home/user/.arbitrum
    restart: unless-stopped
    deploy:
      restart_policy:
        condition: on-failure
      update_config:
        delay: 10s
    stop_grace_period: 30s
    network_mode: host
    user: 1000:1000
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: "10"

docker compose up
docker exec arbitrum-node kill -15 1
Wait for the node process to exit; after more than 30 seconds it has still not stopped and the log reports an error.

Expected behavior

The process is expected to exit successfully and leave a snapshot that we can then prune.

Additional context

The node was first started from a pruned snapshot taken from https://snapshot-explorer.arbitrum.io/. After the database reached 2.9TB, I wanted to prune it. Having read other issues about pruning (#2441, #2805), I decided to give Nitro a chance to terminate cleanly first.

The full log from the nitro container is in the attached file:
arbitrium-logs.log

INFO [12-17|13:58:37.697] shutting down because of sigint
INFO [12-17|13:58:37.697] HTTP server stopped                      endpoint=[::]:8547
INFO [12-17|13:58:37.697] HTTP server stopped                      endpoint=[::]:8548
INFO [12-17|13:58:37.697] delayed sequencer: context done          err="context canceled"
INFO [12-17|13:58:38.394] created block                            l2Block=285,722,652 l2BlockHash=5a33dd..b036e3
...
WARN [12-17|13:59:07.700] taking too long to stop                  name=arbnode.MessagePruner delay[s]=30.000
WARN [12-17|13:59:07.701] goroutine 1 [running]:
github.com/offchainlabs/nitro/util/stopwaiter.getAllStackTraces()
	/workspace/util/stopwaiter/stopwaiter.go:121 +0x3d
github.com/offchainlabs/nitro/util/stopwaiter.(*StopWaiterSafe).stopAndWaitImpl(0xc037e9e300, 0x6fc23ac00)
	/workspace/util/stopwaiter/stopwaiter.go:139 +0xe5
github.com/offchainlabs/nitro/util/stopwaiter.(*StopWaiterSafe).StopAndWait(...)
	/workspace/util/stopwaiter/stopwaiter.go:116
github.com/offchainlabs/nitro/util/stopwaiter.(*StopWaiter).StopAndWait(...)
	/workspace/util/stopwaiter/stopwaiter.go:324
github.com/offchainlabs/nitro/arbnode.(*Node).StopAndWait(0xc000764fc0)
	/workspace/arbnode/node.go:977 +0x19a
main.mainImpl.func14()
	/workspace/cmd/nitro/nitro.go:638 +0x17
main.mainImpl.func6()
	/workspace/cmd/nitro/nitro.go:434 +0x35
main.mainImpl()
	/workspace/cmd/nitro/nitro.go:675 +0x5172
main.main()
	/workspace/cmd/nitro/nitro.go:142 +0x13
...
goroutine 64031 [select]:
github.com/ethereum/go-ethereum/core/state.(*subfetcher).loop(0xc11aeda800)
	/workspace/go-ethereum/core/state/trie_prefetcher.go:319 +0x553
created by github.com/ethereum/go-ethereum/core/state.newSubfetcher in goroutine 187
	/workspace/go-ethereum/core/state/trie_prefetcher.go:246 +0x206

INFO [12-17|13:59:08.451] created block                            l2Block=285,722,771 l2BlockHash=fbbf31..d89917
INFO [12-17|13:59:09.452] created block                            l2Block=285,722,775 l2BlockHash=da8fcb..dd1751
INFO [12-17|13:59:10.452] created block                            l2Block=285,722,779 l2BlockHash=87f0ef..50c265

joshuacolvin0 · 2024-12-23T20:51:17Z

The latest arb1 snapshot has been updated to use PebbleDB, which doesn't have the sigint issue, and it is slightly smaller now at 2.67GB.

All the same, we are working on a fix for the shutdown problem that occurs when using LevelDB.

joshuacolvin0 closed this as completed Dec 23, 2024
