op-geth stuck with "State not available" #130
Comments
Same issue here with a mainnet node |
Can you both provide the following information?
- System specs:
- Config:
- Any other context about the problem: |
System specs
Config:
Not sure what other context could be useful. Here is the complete log output with verbosity set to debug:
and then an endless stream of:
|
Many folks in the Optimism Discord have the same issue and there is no fix. Anyone finding this issue: just nuke your node, because this is unfixable. The OP team will tell you to try debug_setHead, then tell you "well, it's never worked before, but maybe it will for you". |
@eyooooo I'm 0xChupa from vfat lol, yes indeed, the only way to get our mainnet node running again was a fresh install |
It took 10 days to resync and, guess what, it died again two days after it finished syncing. This is such a shame and carelessness on the part of the people responsible. |
|
@imtipi I am sure this is not the case. It never happened before and it can't start happening to multiple people all of a sudden. |
Are you saying that anytime op-node is down while op-geth is running (e.g. because it's being updated in a docker-compose setting), there's a risk of the DB getting corrupted? |
No, you need to shut down op-node first, to make sure op-geth stops receiving data before it is shut down itself |
Doesn't really make sense to me. If I shut down op-geth first, it can't get data anyway, because it's not running. As I initially wrote, I suspect that in my case the data breakage was caused by the L1 being temporarily unavailable. |
more like a speeding car suddenly crashing into a wall |
Geth needs to persist in-memory state on shutdown, at least with the hash-based DB format. They introduced an experimental new format, which we're working on adopting, but for now the hash-based DB format is considered the more stable option.

After DB corruption from a force-shutdown, what ends up happening is that the "state" (storage/account mutations) is missing for the latest few blocks, but the block data is there. The corrupted geth node does not expose that the state is "missing", and ends up misleading the op-node into trying to build on top of the latest state, which is not there. And so, unless geth finishes the "regenerating state" phase (which it often does not, due to tricky assumptions around the corrupted DB), it gets stuck.

If you want to manually repair the DB, to avoid a sync from scratch, the steps to follow are:
It's not a pretty fix, but geth DB corruption is already a bad situation, and due to issues in geth we cannot automate this recovery process. Luckily, the new Path-DB format fixes this DB corruption issue, and we will be adopting these improvements from upstream geth as soon as we can. |
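As an illustration of the failure mode described above (a sketch, not the elided repair steps; assumes the node's HTTP RPC is exposed on localhost:8545): a state-dependent call such as eth_getBalance errors out when the state for the requested block is missing, while eth_getBlockByNumber still succeeds because the block data survived.

```bash
# Probe state availability at the head: if the state was lost on shutdown,
# geth answers with a "missing trie node"-style error instead of a balance.
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x0000000000000000000000000000000000000000","latest"],"id":1}' \
  http://localhost:8545
```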
The geth DB corruption that causes "state not available" happens when geth shuts down, and is unrelated to any op-node or L1 interactions. You must configure your op-geth setup to allow it to shut down gracefully, without hitting some docker/systemd/other time-out. If it does time out, the op-geth process is killed, state is not fully written to disk, and it ends up unavailable. |
thx @protolambda for this detailed explanation! Will try the recovery procedure, thanks again! |
Hope to see your feedback on the fix here. |
I'll defer to @protolambda's opinion, but I think you're going to want a much longer grace period. Something like 20 minutes, to be safe. |
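For a docker-compose setup, the knob that controls this is `stop_grace_period`; a minimal sketch using the 20-minute value suggested above (service name and image are placeholders for your own setup):

```yaml
services:
  op-geth:
    image: op-geth:local     # placeholder image
    stop_grace_period: 20m   # give geth up to 20 minutes to flush state on SIGTERM
```

Under systemd, the equivalent is raising `TimeoutStopSec` in the unit's `[Service]` section.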
@protolambda Could you please help me with some questions? I got the same issue as the author, due to a power cut. At step 2 I got this result. What number should I use in step 4?
I tried putting in this number and the return was null.
I checked the latest block and it seems to have changed, but I'm not sure whether it's correct or not.
I cannot run the last step to finish; it always returns an error code.
Could you please help me? I'm new to this. Thanks a lot. |
Try converting 110291757 to a hexadecimal string and run it under the optimism path |
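For reference, a sketch of that conversion and the subsequent head rewind (assumes the `debug` API namespace is enabled on the node's HTTP RPC at localhost:8545):

```bash
# Convert the decimal block number to the 0x-prefixed hex string geth expects
printf '0x%x\n' 110291757    # prints 0x692eb2d

# Rewind the chain head to that block via the debug API
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"debug_setHead","params":["0x692eb2d"],"id":1}' \
  http://localhost:8545
```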
Completed a repair as per the instructions, and I wanted to share the process here. I might provide more test results (currently testing for stability and working to reduce data loss). @sbvegan @protolambda @dandavid3000
|
Overview: after completing the aforementioned fix process, the node has been observed running for 7 days. @sbvegan @protolambda
|
Some questions: regarding the snapshotRecoveryNumber, sometimes it remains unchanged for extended periods, and at times querying it returns 'nil' (even though the node can sync properly after a restart). |
Thanks for the instructions. I tried your method many times and did not succeed. op-node keeps saying "Walking back ..." to a really old tx that was too far away from the current one. It never stopped, because the last command to finish the process always returned an error. I ended up starting a fresh node. It may take long, but it's better, and I will keep backing up the database once every 2 weeks. |
"Walking back ..." is not bad; it takes time, and I believe there is value in waiting. It is a good habit to back up your data regularly |
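A minimal backup sketch along those lines (service names and the datadir path are hypothetical; both processes must be stopped first so the copy is consistent):

```bash
docker compose stop op-node   # stop the driver first so geth stops receiving payloads
docker compose stop op-geth   # then let geth flush its state within the grace period

# Archive the geth datadir (hypothetical path) with a dated name
tar -czf "op-geth-backup-$(date +%F).tar.gz" /data/op-geth

docker compose start op-geth op-node
```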
@kaliubuntu0206 can you explain what this does?
|
After synchronizing using the |
I don't think the |
|
I had a similar problem after pruning state. Very nice, and it worked! Thanks! |
My optimism-goerli node got stuck with op-geth outputting this:
I can connect to the http RPC interface, but the head block is stuck at 12573071.
op-node meanwhile shows:
It's not clear to me what this means and what to do about it.
I suspect it's related to the connected goerli L1 having been unavailable for a while. But it's now good again, yet this Optimism node seems to not be recovering.
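One way to confirm the stuck head over the HTTP RPC (a sketch; 12573071 is 0xbfd98f in hex):

```bash
# If eth_blockNumber keeps returning 0xbfd98f (12573071) across calls,
# the node is not advancing past that block.
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545
```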