[BUG] - Invalid snapshot DiskSnapshot followed by replaying from genesis #4142

Closed
CarlosLopezDeLara opened this issue Jul 6, 2022 · 5 comments
Labels
  - 1.35.1 (Include in 1.35.1)
  - bug (Something isn't working)
  - comp: ledger
  - era: babbage
  - priority high (issues/PRs that MUST be addressed. The release can't happen without this)
  - priority: high (Needs to be addressed as soon as possible, probably within current sprint.)
  - severity: medium (Small defects that do not prevent any crucial functionality from working.)
  - tag: 1.35.1
  - type: bug (Something is not working)
  - type: regression (A feature that worked before stopped working)
  - type: upstream issue (Corresponds to an upstream component: consensus, networking, ledger, etc.)
  - user type: internal (Created by an IOG employee)
  - Vasil

Comments

@CarlosLopezDeLara
Contributor

Internal/External
Internal if an IOHK staff member.

Area
cardano-node/snapshot

Summary
On testnet, with the node synced to 100% (Babbage era), restarting the node reports the disk snapshots as invalid, and the node has to replay the ledger from genesis.

[CLR:cardano.node.ChainDB:Info:5] [2022-07-06 04:30:21.72 UTC] Started opening Ledger DB 
[CLR:cardano.node.ChainDB:Error:5] [2022-07-06 04:30:22.72 UTC] Invalid snapshot DiskSnapshot {dsNumber = 62643006, dsSuffix = Nothing}InitFailureRead (ReadFailed (DeserialiseFailure 31562794 "Decoding TxIx: too many bytes."))
[CLR:cardano.node.ChainDB:Error:5] [2022-07-06 04:30:23.69 UTC] Invalid snapshot DiskSnapshot {dsNumber = 62642581, dsSuffix = Nothing}InitFailureRead (ReadFailed (DeserialiseFailure 31562794 "Decoding TxIx: too many bytes."))
[CLR:cardano.node.ChainDB:Info:5] [2022-07-06 04:30:23.69 UTC] Replaying ledger from genesis
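To check a plain-text node log for this condition, one can scan for the `Invalid snapshot` entries and the `Replaying ledger from genesis` line that follows them. The following is a minimal Python sketch, not part of cardano-node. The helper name `summarize_ledger_db_open` and the regular expression are assumptions derived only from the excerpt above, and the sketch assumes each log entry sits on a single line.

```python
import re

# Pattern inferred from the log excerpt in this issue; this is an assumption,
# not a documented log format. It expects one log entry per line.
INVALID_SNAPSHOT = re.compile(
    r'Invalid snapshot DiskSnapshot \{dsNumber = (\d+),.*?'
    r'DeserialiseFailure (\d+) "([^"]*)"'
)
REPLAY_FROM_GENESIS = "Replaying ledger from genesis"


def summarize_ledger_db_open(log_text):
    """Return (invalid_snapshots, replayed_from_genesis) for a node log.

    invalid_snapshots is a list of (ds_number, byte_offset, message) tuples.
    """
    invalid = [
        (int(ds_number), int(offset), message)
        for ds_number, offset, message in INVALID_SNAPSHOT.findall(log_text)
    ]
    return invalid, REPLAY_FROM_GENESIS in log_text


# Sample entries copied from the log above, one entry per line.
sample = (
    '[CLR:cardano.node.ChainDB:Error:5] [2022-07-06 04:30:22.72 UTC] '
    'Invalid snapshot DiskSnapshot {dsNumber = 62643006, dsSuffix = Nothing}'
    'InitFailureRead (ReadFailed (DeserialiseFailure 31562794 '
    '"Decoding TxIx: too many bytes."))\n'
    '[CLR:cardano.node.ChainDB:Info:5] [2022-07-06 04:30:23.69 UTC] '
    'Replaying ledger from genesis\n'
)
invalid, from_genesis = summarize_ledger_db_open(sample)
print(invalid, from_genesis)
```

A non-empty `invalid` list together with `from_genesis == True` matches the behavior reported here.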

Steps to reproduce
Steps to reproduce the behavior:

  1. Run a testnet node
  2. Sync to 100%
  3. Wait for at least two snapshots to be taken in the Babbage era (use --snapshot-interval to take snapshots more often). Alonzo-era snapshots work fine.
  4. Restart the node

Expected behavior
Snapshots should be valid.
On restart, the node should resume synchronization from the latest snapshot.

System info (please complete the following information):

cardano-cli 1.35.0 - linux-x86_64 - ghc-8.10
git rev 9f1d7dc

cardano-node 1.35.0 - linux-x86_64 - ghc-8.10
git rev 9f1d7dc

Tested on:

Operating System: Kubuntu 20.04
Kernel Version: 5.14.0-1042-oem
OS Type: 64-bit

AND

Mac OS X 10.15.7 (Build 19H1824)
Architecture: x86_64h

@CarlosLopezDeLara CarlosLopezDeLara added bug Something isn't working Vasil 1.35.1 Include in 1.35.1 priority high issues/PRs that MUST be addressed. The release can't happen without this; labels Jul 6, 2022
@CarlosLopezDeLara
Contributor Author

Also reported here: #4128

@CarlosLopezDeLara
Contributor Author

Resolved by IntersectMBO/cardano-ledger#2897

@jmalcolea

node sync test - node0.json log:
{"app":[],"at":"2022-07-07T06:32:25.55Z","data":{"failure":"InitFailureRead (ReadFailed (DeserialiseFailure 31803211 \"Decoding TxIx: too many bytes.\"))","kind":"TraceLedgerEvent.InvalidSnapshot","snapshot":{"kind":"snapshot"}},"env":"1.35.0:9f1d7","host":"hostname","loc":null,"msg":"","ns":["cardano.node.ChainDB"],"pid":"29128","sev":"Error","thread":"5"}
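
For nodes that log in JSON, the same event can be picked out by the `kind` field seen in the entry above. This is a small Python sketch, assuming one valid JSON object per line with the inner quotes of `failure` escaped. `invalid_snapshot_failures` is a hypothetical helper name, and the sample entry is a trimmed copy of the one above.

```python
import json


def invalid_snapshot_failures(lines):
    """Yield (timestamp, failure) for each InvalidSnapshot ledger event."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a complete JSON value
        if not isinstance(entry, dict):
            continue
        data = entry.get("data")
        if isinstance(data, dict) and data.get("kind") == "TraceLedgerEvent.InvalidSnapshot":
            yield entry.get("at"), data.get("failure")


# Trimmed copy of the entry above, serialized so the inner quotes are escaped.
sample = json.dumps({
    "at": "2022-07-07T06:32:25.55Z",
    "data": {
        "failure": 'InitFailureRead (ReadFailed (DeserialiseFailure 31803211 '
                   '"Decoding TxIx: too many bytes."))',
        "kind": "TraceLedgerEvent.InvalidSnapshot",
    },
    "ns": ["cardano.node.ChainDB"],
    "sev": "Error",
})
failures = list(invalid_snapshot_failures([sample, "not json"]))
print(failures)
```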

@CarlosLopezDeLara CarlosLopezDeLara added 1.35.2 and removed 1.35.1 Include in 1.35.1 labels Jul 8, 2022
@CarlosLopezDeLara CarlosLopezDeLara added in-scope This item is being worked and will be part of an upcoming release 1.35.1 Include in 1.35.1 and removed in-scope This item is being worked and will be part of an upcoming release 1.35.2 labels Jul 22, 2022
@andrejpodzimek

This is still happening on 1.35.3

@CarlosLopezDeLara
Contributor Author

> This is still happening on 1.35.3

If you just upgraded from 1.34.x or older, the node will need to replay from genesis.

I just restarted a 1.35.3 node and it is working as expected.

...
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:56:14.40 UTC] Opened vol db
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:56:14.40 UTC] Started opening Ledger DB
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:06.51 UTC] Replaying ledger from snapshot at 7441c0d358335a158694d684cc932a1cd765ebdb9a4caf4d40fc3a05b976fb8a at slot 72215009
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:06.53 UTC] Replayed block: slot 72215021 out of 72215764. Progress: 1.59%
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:06.84 UTC] Opened lgr db
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:06.84 UTC] Started initial chain selection
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:07.52 UTC] Pushing ledger state for block b6fe5f8f8cafec4776b520473f852e34f73895716c7ded6adb03798531d06d7b at slot 72215777. Progress: 0.00%
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:21.26 UTC] before next, messages elided = 72215782
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:21.26 UTC] Pushing ledger state for block 9238330333b868a64384421bc2ada6df9c2837d7f3c3a139f2d5a825eba5700e at slot 72225740. Progress: 22.61%
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:21.26 UTC] Pushing ledger state for block 5e1aac7d9bd84b3a8b8b9bba9e4a8e14a2d8ae533feb6f978e1f00f7df413d2a at slot 72225797. Progress: 22.74%
[CLR-:cardano.node.ChainDB:Info:5] [2022-09-22 05:58:21.29 UTC] Pushing ledger state for block d2ac336033e69ad1e045981880ee9ac4986f90bd47f5dde5988360cd16920283 at slot 72225812. Progress: 22.77% 
...

% cardano-node --version
cardano-node 1.35.3 - darwin-x86_64 - ghc-8.10
git rev 950c4e222086fed5ca53564e642434ce9307b0b9
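
The `Progress: 1.59%` figure on the `Replayed block` line above is consistent with measuring the replayed slot range against the range from the snapshot slot to the target slot. This is an inference from the numbers in this log, not a statement about how cardano-node computes the value; `replay_progress` is a hypothetical helper.

```python
def replay_progress(snapshot_slot, replayed_slot, target_slot):
    """Percentage of the slot range between snapshot and target already replayed."""
    if target_slot <= snapshot_slot:
        return 100.0  # nothing to replay
    return 100.0 * (replayed_slot - snapshot_slot) / (target_slot - snapshot_slot)


# Values taken from the log above: snapshot at slot 72215009,
# "Replayed block: slot 72215021 out of 72215764".
progress = replay_progress(72215009, 72215021, 72215764)
print(f"{progress:.2f}%")  # 12 of 755 slots -> 1.59%
```

With a valid snapshot only 755 slots remain to be replayed here, which is why this restart reaches `Opened lgr db` within the same second that the replay starts.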

@dorin100 dorin100 added type: bug Something is not working user type: internal Created by an IOG employee era: babbage tag: 1.35.1 priority: high Needs to be addressed as soon as possible, probably within current sprint. labels Oct 21, 2022
@dorin100 dorin100 added severity: medium Small defects that do not prevent any crucial functionality from working. type: regression A feature that worked before stoped working type: upstream issue Corresponds to an upstream component (consensus, networking, ledger, etc.) comp: ledger labels Jan 29, 2023