Nitro can't stop successfully: 'taking too long to stop' and after goes forward #2839

Closed
boogeroccam opened this issue Dec 17, 2024 · 1 comment

@boogeroccam

Describe the bug

Nitro fails to stop after receiving a termination signal (the log reports sigint). It logs 'taking too long to stop' and then keeps creating blocks.

To Reproduce

Steps to reproduce the behavior:

  1. docker-compose.yaml

    version: '3.8'
    services:
      arbitrum-node:
        image: 'offchainlabs/nitro-node:v3.2.1-d81324d'
        container_name: arbitrum-node
        ports:
          - "0.0.0.0:8547:8547"
          - "0.0.0.0:8546:8546"
          - "0.0.0.0:6070:6070"
        command:
          #- --init.prune=full
          #- --init.url=file:///home/user/.arbitrum/archive.tar
          - --parent-chain.connection.url=http://eth-node:8545
          - --parent-chain.blob-client.beacon-url=http://eth-node:5052
          - --chain.id=42161
          - --ws.addr=0.0.0.0
          - --ws.origins=*
          - --http.vhosts=*
          - --http.addr=0.0.0.0
          - --http.corsdomain=*
          - --metrics
          - --metrics-server.addr=0.0.0.0
          - --metrics-server.port=6070
        volumes:
          - /data/arbitrum:/home/user/.arbitrum
        restart: unless-stopped
        deploy:
          restart_policy:
            condition: on-failure
          update_config:
            delay: 10s
        stop_grace_period: 30s
        network_mode: host
        user: 1000:1000
        logging:
          driver: json-file
          options:
            max-size: 10m
            max-file: "10"
  2. docker compose up

  3. docker exec arbitrum-node kill -15 1

  4. Wait for the node process to exit; after more than 30 seconds it is still running and logs an error.

Expected behavior

The process is expected to terminate successfully and create a snapshot from which we can prune.

Additional context

The node was first started from a pruned snapshot downloaded from https://snapshot-explorer.arbitrum.io/. After the database reached 2.9TB, I wanted to prune it, read some other issues about pruning (#2441, #2805), and decided to give nitro a chance to terminate successfully first.

Full log from nitro container in file
arbitrium-logs.log

INFO [12-17|13:58:37.697] shutting down because of sigint
INFO [12-17|13:58:37.697] HTTP server stopped                      endpoint=[::]:8547
INFO [12-17|13:58:37.697] HTTP server stopped                      endpoint=[::]:8548
INFO [12-17|13:58:37.697] delayed sequencer: context done          err="context canceled"
INFO [12-17|13:58:38.394] created block                            l2Block=285,722,652 l2BlockHash=5a33dd..b036e3
...
WARN [12-17|13:59:07.700] taking too long to stop                  name=arbnode.MessagePruner delay[s]=30.000
WARN [12-17|13:59:07.701] goroutine 1 [running]:
github.com/offchainlabs/nitro/util/stopwaiter.getAllStackTraces()
	/workspace/util/stopwaiter/stopwaiter.go:121 +0x3d
github.com/offchainlabs/nitro/util/stopwaiter.(*StopWaiterSafe).stopAndWaitImpl(0xc037e9e300, 0x6fc23ac00)
	/workspace/util/stopwaiter/stopwaiter.go:139 +0xe5
github.com/offchainlabs/nitro/util/stopwaiter.(*StopWaiterSafe).StopAndWait(...)
	/workspace/util/stopwaiter/stopwaiter.go:116
github.com/offchainlabs/nitro/util/stopwaiter.(*StopWaiter).StopAndWait(...)
	/workspace/util/stopwaiter/stopwaiter.go:324
github.com/offchainlabs/nitro/arbnode.(*Node).StopAndWait(0xc000764fc0)
	/workspace/arbnode/node.go:977 +0x19a
main.mainImpl.func14()
	/workspace/cmd/nitro/nitro.go:638 +0x17
main.mainImpl.func6()
	/workspace/cmd/nitro/nitro.go:434 +0x35
main.mainImpl()
	/workspace/cmd/nitro/nitro.go:675 +0x5172
main.main()
	/workspace/cmd/nitro/nitro.go:142 +0x13
...
goroutine 64031 [select]:
github.com/ethereum/go-ethereum/core/state.(*subfetcher).loop(0xc11aeda800)
	/workspace/go-ethereum/core/state/trie_prefetcher.go:319 +0x553
created by github.com/ethereum/go-ethereum/core/state.newSubfetcher in goroutine 187
	/workspace/go-ethereum/core/state/trie_prefetcher.go:246 +0x206

INFO [12-17|13:59:08.451] created block                            l2Block=285,722,771 l2BlockHash=fbbf31..d89917
INFO [12-17|13:59:09.452] created block                            l2Block=285,722,775 l2BlockHash=da8fcb..dd1751
INFO [12-17|13:59:10.452] created block                            l2Block=285,722,779 l2BlockHash=87f0ef..50c265
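
For anyone triaging a similar log, the timing in the excerpt above can be checked with a small script. This is only a minimal sketch and not part of Nitro: the helper names (`parse_line`, `shutdown_summary`), the regex, and the assumed year 2024 (taken from the issue date, since the log lines carry no year) are illustrative assumptions. It measures the delay between the sigint line and the 'taking too long to stop' warning, and counts the blocks created after that warning.

```python
import re
from datetime import datetime

# Matches geth-style log lines such as:
#   INFO [12-17|13:58:37.697] shutting down because of sigint
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s*\[(?P<ts>\d{2}-\d{2}\|\d{2}:\d{2}:\d{2}\.\d{3})\]\s+(?P<msg>.*)$"
)


def parse_line(line, year=2024):
    """Return (level, timestamp, message), or None for non-log lines
    such as stack frames. The year is an assumption: the log omits it."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year}-{m.group('ts')}", "%Y-%m-%d|%H:%M:%S.%f")
    return m.group("level"), ts, m.group("msg")


def shutdown_summary(lines, year=2024):
    """Return (seconds from sigint to the stop warning, blocks created after it)."""
    sigint_at = None
    warn_at = None
    blocks_after_warning = 0
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        _level, ts, msg = parsed
        if sigint_at is None and msg.startswith("shutting down because of sigint"):
            sigint_at = ts
        elif warn_at is None and msg.startswith("taking too long to stop"):
            warn_at = ts
        elif warn_at is not None and msg.startswith("created block"):
            blocks_after_warning += 1
    delay = None
    if sigint_at is not None and warn_at is not None:
        delay = (warn_at - sigint_at).total_seconds()
    return delay, blocks_after_warning


# Lines copied from the log excerpt in this issue.
SAMPLE_LOG = """\
INFO [12-17|13:58:37.697] shutting down because of sigint
INFO [12-17|13:58:37.697] HTTP server stopped                      endpoint=[::]:8547
INFO [12-17|13:58:38.394] created block                            l2Block=285,722,652 l2BlockHash=5a33dd..b036e3
WARN [12-17|13:59:07.700] taking too long to stop                  name=arbnode.MessagePruner delay[s]=30.000
WARN [12-17|13:59:07.701] goroutine 1 [running]:
github.com/offchainlabs/nitro/util/stopwaiter.getAllStackTraces()
INFO [12-17|13:59:08.451] created block                            l2Block=285,722,771 l2BlockHash=fbbf31..d89917
INFO [12-17|13:59:09.452] created block                            l2Block=285,722,775 l2BlockHash=da8fcb..dd1751
INFO [12-17|13:59:10.452] created block                            l2Block=285,722,779 l2BlockHash=87f0ef..50c265
"""

if __name__ == "__main__":
    delay, blocks = shutdown_summary(SAMPLE_LOG.splitlines())
    print(f"delay before warning: {delay:.3f}s, blocks created after warning: {blocks}")
```

On the excerpt above, this reports a delay of about 30 seconds and three blocks created after the warning, which matches the behavior described in the title: the node warns and then keeps going.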
@joshuacolvin0
Member

The latest arb1 snapshot has been updated to use PebbleDB, which doesn't have the sigint issue. It is also slightly smaller now, at 2.67GB.

All the same, we are working on a fix for shutting down if you are using LevelDB.
