Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cardano node 1.3.0, syncing stops due to ledger failure on testnet #461

Closed
johnalotoski opened this issue Jan 9, 2020 · 16 comments
Closed
Labels
bug Something isn't working byron Required for a Byron mainnet: replace the old core nodes with cardano-node. priority high issues/PRs that MUST be addressed. The release can't happen without this;
Milestone

Comments

@johnalotoski
Copy link
Contributor

johnalotoski commented Jan 9, 2020

After a 1.2.0 to 1.3.0 deployment and service restart on testnet, cardano-node 1.3.0 stopped syncing as observed by the following changes in behavior:

  1. No more tip or blockHeight updates reported to the journalctl for the cardano-node service
  2. While prometheus metrics are still reported at the expected port, blockHeight is missing
  3. The db storage path shows no filesystem writes to the immutable directory, and the most recent epoch files are not present (last epoch written is 28, current epoch is 33).

Systemd reports the cardano-node service is active and running. There are periodic 1 minute cardano.messagecounters.[aggregation|ekgview|monitoring] journalctl updates, but they don't offer any clue about what the state of the service is. There are no errors or warnings being issued.

The last few log lines which offer information which may help are:

[e-a-1:cardano.node.ChainDB:Debug:5] [2020-01-09 17:45:23.26 UTC] {"event":{"kind":"TraceImmDBEvent"},"kind":"TraceEvent","tip":"genesis(origin)"}
[e-a-1:cardano.node.ChainDB:Debug:5] [2020-01-09 17:45:23.26 UTC] {"event":{"immtip":{"kind":"Tip","tip":"151e2a8@625615"},"kind":"TraceOpenEvent.OpenedImmDB","epoch":"EpochNo {unEpochNo = 28}"},"kind":"TraceEvent","tip":"genesis(origin)"}node.ChainDB:Debug:5] [2020-01-09 17:45:23.36 UTC] {"event":{"kind":"TraceOpenEvent.OpenedVolDB"},"kind":"TraceEvent","tip":"genesis(origin)"}
[e-a-1:cardano.node.ChainDB:Info:5] [2020-01-09 17:45:23.37 UTC] {"event":{"snapshot":{"kind":"snapshot"},"kind":"TraceLedgerReplayEvent.ReplayFromSnapshot","tip":"Tip (At (Block {blockPointSlot = SlotNo {unSlotNo = 625539}, blockPointHash = ByronHash {unByronHash = AbstractHash 6714470a8c8cd83b7b1729913196e2abc377d6ea11280755bb1a0fd72f4df0b7}}))"},"kind":"TraceEvent","tip":"genesis(origin)"}
[e-a-1:cardano.node.ChainDB:Debug:5] [2020-01-09 17:45:23.40 UTC] {"event":{"kind":"TraceOpenEvent.OpenedLgrDB"},"kind":"TraceEvent","tip":"genesis(origin)"}
[e-a-1:cardano.node.ChainDB:Debug:5] [2020-01-09 17:45:24.19 UTC] {"event":{"kind":"InitChainSelValidation"},"kind":"TraceEvent","tip":"genesis(origin)"}
[e-a-1:cardano.node.ChainDB:Info:5] [2020-01-09 17:45:24.19 UTC] {"event":{"immtip":{"kind":"Tip","tip":"151e2a8@625615"},"kind":"TraceOpenEvent.OpenedDB","tip":{"kind":"Tip","tip":"50a1250@627774"}},"kind":"TraceEvent","tip":"genesis(origin)"}ChainDB:Info:70] [2020-01-09 17:45:24.21 UTC] {"event":{"snapshot":{"kind":"snapshot"},"kind":"TraceLedgerEvent.TookSnapshot","tip":"Tip (At (Block {blockPointSlot = SlotNo {unSlotNo = 625539}, blockPointHash = ByronHash {unByronHash = AbstractHash 6714470a8c8cd83b7b1729913196e2abc377d6ea11280755bb1a0fd72f4df0b7}}))"},"kind":"TraceEvent","tip":"50a1250@627774"}TraceEvent","tip":"50a1250@627774"}
[e-a-1:cardano.node.ChainDB:Info:142] [2020-01-09 17:45:24.43 UTC] {"event":{"readerid":"0","kind":"TraceGCEvent.PerformedGC"},"kind":"TraceEvent","tip":"50a1250@627774"}
[e-a-1:cardano.node.ChainDB:Info:181] [2020-01-09 17:45:34.14 UTC] {"event":{"readerid":"1","kind":"TraceGCEvent.PerformedGC"},"kind":"TraceEvent","tip":"50a1250@627774"}
[e-a-1:cardano.node.ChainDB:Info:191] [2020-01-09 17:45:34.28 UTC] {"event":{"readerid":"2","kind":"TraceGCEvent.PerformedGC"},"kind":"TraceEvent","tip":"50a1250@627774"}
[e-a-1:cardano.node.ChainDB:Info:202] [2020-01-09 17:45:43.95 UTC] {"event":{"readerid":"3","kind":"TraceGCEvent.PerformedGC"},"kind":"TraceEvent","tip":"50a1250@627774"}

Restarting the cardano-node service does not resolve the issue.

@disassembler
Copy link
Contributor

Removing volatile db and restarting the service caused metrics to report but node stopped syncing blocks. We're currently testing clearing entire state.

@johnalotoski
Copy link
Contributor Author

Clearing entire state did not resolve. The last blockHeight reported sync'd by the metrics is 626,706

@johnalotoski
Copy link
Contributor Author

While trying to sync a local byron-proxy to test if that works, it complained about an invalid block around a similar area, with an invalid csl-daedalus version and printed the following output:

[cardano_byron_proxy.index:Info:50] [2020-01-09 21:24:35.55 UTC] Rollforward (Block {blockPointSlot = SlotNo {unSlotNo = 627773}, blockPointHash = ByronHash {unByronHash = AbstractHash ce8a94a3314ad4b7cf6713f0ca2edd0570fcb1b32e2a071fa29bc6b46688eb4b}})
[cardano_byron_proxy.ChainDB:Error:156] [2020-01-09 21:24:35.55 UTC] InvalidBlock {_validationErr = ExtValidationErrorLedger (ChainValidationUpdateError (SlotNumber {unSlotNumber = 627775}) (Registration (InvalidSoftwareVersion (fromList [(ApplicationName {unApplicationName = "csl-daedalus"},(6,SlotNumber {unSlotNumber = 461812},fromList [(SystemTag {getSystemTag = "linux64"},InstallerHash {unInstallerHash = AbstractHash 767bdd873dc4d8d2567f691cea94a6a96a62c1ad58a60990470a3fc76e59f036}),(SystemTag {getSystemTag = "macos64"},InstallerHash {unInstallerHash = AbstractHash c9e9249fd922bd970aa0e3180ae28eb538db7c42c2f5cf83916873be4bad8ee3}),(SystemTag {getSystemTag = "win64"},InstallerHash {unInstallerHash = AbstractHash 0e11542d106aa52cab7ed847bc1a2ec33e470f45c2631e8e03bd3ec49c0c6af3})]))]) cardano-sl:0))), _invalidPoint = At (Block {blockPointSlot = SlotNo {unSlotNo = 627775}, blockPointHash = ByronHash {unByronHash = AbstractHash 8899363c32bd425fcea12dc2a06383085492c657b5a1a89a42e1e1851176b243}})}
[cardano_byron_proxy.index:Info:50] [2020-01-09 21:24:35.55 UTC] Rollforward (Block {blockPointSlot = SlotNo {unSlotNo = 627774}, blockPointHash = ByronHash {unByronHash = AbstractHash 50a125068d2097fe996419268aea32f328289394fafcf471dac05d847e206aa6}})
[cardano_byron_proxy.diffusion:Debug:156] [2020-01-09 21:24:35.65 UTC] Read block 4a756b2b difficulty 626915
[cardano_byron_proxy.diffusion:Debug:156] [2020-01-09 21:24:36.22 UTC] Read block 63efec60 difficulty 627171
[cardano_byron_proxy.diffusion:Debug:77] [2020-01-09 21:24:36.80 UTC] Announcing header to others:
MainBlockHeader:
    hash: 50a125068d2097fe996419268aea32f328289394fafcf471dac05d847e206aa6
    previous block: ce8a94a3314ad4b7cf6713f0ca2edd0570fcb1b32e2a071fa29bc6b46688eb4b
    slot: 1374th slot of 29th epoch
    difficulty: 626706
    leader: pub:29e75f1d
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:29e75f1d, dPk = pub:e764b034 } }
    block: v0.0.0
    software: cardano-sl:0

@shenyaqi9527
Copy link

I'm in the same situation. Is there any solution? BlockHeght stopped at 626706

@johnalotoski
Copy link
Contributor Author

I'm in the same situation. Is there any solution? BlockHeght stopped at 626706

Once the issue is solved, this ticket will be updated @shenyaqi9527.

@johnalotoski
Copy link
Contributor Author

A local sync of Byron proxy to testnet, syncs repeatedly, all the way to the real tip, and then loops back this problem block, 626,706 and repeats.

@disassembler
Copy link
Contributor

This is using the default proxy config:
nix-build -A scripts.testnet.proxy -o launch_proxy_testnet && ./launch_proxy_testnet

@disassembler
Copy link
Contributor

proposal UI gives these details for that proposal:

┃UnsafeUpdateProposal {upBlockVersion = BlockVersion {bvMajor = 1, bvMinor = 0, bvAlt = 0}, upBlockVersionMod = BlockVersionModifier  ┃
┃{bvmScriptVersion = Nothing, bvmSlotDuration = Nothing, bvmMaxBlockSize = Nothing, bvmMaxHeaderSize = Nothing, bvmMaxTxSize =        ┃
┃Nothing, bvmMaxProposalSize = Nothing, bvmMpcThd = Nothing, bvmHeavyDelThd = Nothing, bvmUpdateVoteThd = Nothing,                    ┃
┃bvmUpdateProposalThd = Nothing, bvmUpdateImplicit = Nothing, bvmSoftforkRule = Nothing, bvmTxFeePolicy = Nothing, bvmUnlockStakeEpoch┃
┃= Just (EpochIndex {getEpochIndex = 9999999999999999999})}, upSoftwareVersion = cardano-sl:0, upData = fromList [], upAttributes =   ┃
┃Attributes { data: () }, upFrom = PublicKey (XPub {xpubPublicKey =                                                                   ┃
┃"\231d\176\&4\r{5?_tX\145\ETX7t\228\190\171j\161E\138T\255)\161\&2L\ENQ\187\152v", xpubChaincode = ChainCode                         ┃
┃"\181t\207Y*\SI\134\253\209\251\250\251\237\222\158\"\206\149\"[/\DC1\133\SYN):\204\216\202\v0\SYN"}), upSignature = Signature       ┃
┃(XSignature {unXSignature =                                                                                                          ┃
┃"\212\248\156<Hd\200o9\151\155\199\200\&2\ae\ETX\239\174\v\239\235'1\STX>(]kJ\224\179X\EOT%\175\232\237Qi'<\253\255\222h\241G\194[\15┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Implicit: no                                                                                                                           
Proposed: AbstractHash 8899363c32bd425fcea12dc2a06383085492c657b5a1a89a42e1e1851176b243                                                
Decided: AbstractHash 8899363c32bd425fcea12dc2a06383085492c657b5a1a89a42e1e1851176b243                                                 
Confirmed: AbstractHash 4ce9f8194bf1205868ffbc6cf47d5ab92fd36a7f87dc62bc93d4120ed469337d                                               
Adopted: AbstractHash 37cd8cb13f529445b3754e9c151ecb8b27323041ce678181cbf86bd625335e14                                                 
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━StakeholderVotes━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃Votes: 4 fromList [(PublicKey (XPub {xpubPublicKey =                                                                                 ┃
┃"\229\188!\168\&6\SYN\188\204\254\&4>\195k\157\196\192l\144\233\DC3\223\GS\138\v\EOT`\be\USB\202\169", xpubChaincode = ChainCode     ┃
┃"J\174\134\251\253\196Xm\171\213G\167\233-\ap.3+\204\158}\215S#\157\200\r\ETX<Y\194"}),PositiveVote),(PublicKey (XPub {xpubPublicKey ┃
┃= "s\174A\236\162\190\&7\252\NAK\197ZP\214h\200d~\DLE\191\"!r\194\213\138\191\166\233\&1\SOYb", xpubChaincode = ChainCode            ┃
┃"?\143&\248\237\193\206\143i\139\213\220\159\234\163\SUB\203\195d\191\246\251\129\211C\255\&8&\235\221\141U"}),PositiveVote),(PublicK┃
┃(XPub {xpubPublicKey = "\223\182\NAK\166\NAKh\214\134\DELE\168\\2\"\DEL'\STXQ\128\215\&8\168\163\215\253<\146\159bMr9", xpubChaincode┃
┃= ChainCode ")\128\ESC46A\241H\b{\DC1~\135\233i_!]\188\248\&0\255\rX\160fh,v\ETX\246A"}),PositiveVote),(PublicKey (XPub              ┃
┃{xpubPublicKey = "\231d\176\&4\r{5?_tX\145\ETX7t\228\190\171j\161E\138T\255)\161\&2L\ENQ\187\152v", xpubChaincode = ChainCode        ┃
┃"\181t\207Y*\SI\134\253\209\251\250\251\237\222\158\"\206\149\"[/\DC1\133\SYN):\204\216\202\v0\SYN"}),PositiveVote)]                 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Positive: 23999997409054450                                                                                                            
Negative: 0

@disassembler
Copy link
Contributor

could this be because ledger rules require applicationVersion to be greater than previous 1 instead of greater than or equal to? That could be an easy fix to resolve.

@avieth
Copy link
Contributor

avieth commented Jan 10, 2020

could this be because ledger rules require applicationVersion to be greater than previous 1 instead of greater than or equal to? That could be an easy fix to resolve.

There are indeed some rules about how the major/minor/patch version numbers must be incremented but I can't recall off the top of my head. Will find them in the program text.

Edit: I was referring the "block version". I'd bet there's a similar thing for application version.

@avieth
Copy link
Contributor

avieth commented Jan 10, 2020

Old ledger (cardano-sl)

-- Here we check that a software version is either equal to, or 1
-- greater than, the last confirmed version of the given application.
verifySoftwareVersionPure
    :: Maybe NumSoftwareVersion
    -> SoftwareVersion
    -> SVChange
verifySoftwareVersionPure mAdoptedSVNum proposedSV =
    case mAdoptedSVNum of
        -- If there is no confirmed versions for given application,
        -- We check that version is 0.
        Nothing | svNumber proposedSV == 0 -> SVIncrement
                | otherwise                -> SVInvalid
        -- Otherwise we check that version is 1 more than stored
        -- version.
        Just n
            | svNumber proposedSV == n + 1 -> SVIncrement
            | svNumber proposedSV == n     -> SVNoChange
            | otherwise                    -> SVInvalid

New ledger (cardano-ledger)

-- | Check that a new 'SoftwareVersion' is a valid next version
--
--   The new version is valid for a given application if it is the same or one
--   more than the current version
svCanFollow :: ApplicationVersions -> SoftwareVersion -> Bool
svCanFollow avs softwareVersion = case M.lookup appName avs of
  Nothing -> appVersion == 1
  Just (currentAppVersion, _, _) -> appVersion == currentAppVersion + 1
  where SoftwareVersion appName appVersion = softwareVersion

Who is right? Old one says version must increase by 1 or stay the same. New version says it must increase by 1.

@dcoutts
Copy link
Contributor

dcoutts commented Jan 10, 2020

@johnalotoski
Copy link
Contributor Author

johnalotoski commented Jan 10, 2020

I rebuilt proxy with ledger commit e6651b9 from the branch @dcoutts provided and it still throws a InvalidBlock error at the same block upon a fresh state sync:

[cardano_byron_proxy.ChainDB:Error:156] [2020-01-10 18:48:06.82 UTC] InvalidBlock {_validationErr = ExtValidationErrorLedger (ChainValidationUpdateError (SlotNumber {unSlotNumber = 627775}) (Registration (InvalidSoftwareVersion (fromList [(ApplicationName {unApplicationName = "csl-daedalus"},(6,SlotNumber {unSlotNumber = 461812},fromList [(SystemTag {getSystemTag = "linux64"},InstallerHash {unInstallerHash = AbstractHash 767bdd873dc4d8d2567f691cea94a6a96a62c1ad58a60990470a3fc76e59f036}),(SystemTag {getSystemTag = "macos64"},InstallerHash {unInstallerHash = AbstractHash c9e9249fd922bd970aa0e3180ae28eb538db7c42c2f5cf83916873be4bad8ee3}),(SystemTag {getSystemTag = "win64"},InstallerHash {unInstallerHash = AbstractHash 0e11542d106aa52cab7ed847bc1a2ec33e470f45c2631e8e03bd3ec49c0c6af3})]))]) cardano-sl:0))), _invalidPoint = At (Block {blockPointSlot = SlotNo {unSlotNo = 627775}, blockPointHash = ByronHash {unByronHash = AbstractHash 8899363c32bd425fcea12dc2a06383085492c657b5a1a89a42e1e1851176b243}})}
[cardano_byron_proxy.index:Info:50] [2020-01-10 18:48:06.82 UTC] Rollforward (Block {blockPointSlot = SlotNo {unSlotNo = 627774}, blockPointHash = ByronHash {unByronHash = AbstractHash 50a125068d2097fe996419268aea32f328289394fafcf471dac05d847e206aa6}})
[cardano_byron_proxy.diffusion:Debug:156] [2020-01-10 18:48:07.14 UTC] Read block 4a756b2b difficulty 626915
[cardano_byron_proxy.diffusion:Debug:156] [2020-01-10 18:48:07.73 UTC] Read block 63efec60 difficulty 627171
[cardano_byron_proxy.diffusion:Debug:77] [2020-01-10 18:48:08.29 UTC] Announcing header to others:
MainBlockHeader:
    hash: 50a125068d2097fe996419268aea32f328289394fafcf471dac05d847e206aa6
    previous block: ce8a94a3314ad4b7cf6713f0ca2edd0570fcb1b32e2a071fa29bc6b46688eb4b
    slot: 1374th slot of 29th epoch
    difficulty: 626706
    leader: pub:29e75f1d
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:29e75f1d, dPk = pub:e764b034 } }
    block: v0.0.0
    software: cardano-sl:0

@johnalotoski
Copy link
Contributor Author

johnalotoski commented Jan 11, 2020

Rebuilding proxy with your latest commit hash on that branch @dcoutts (b3c8c59), or with @disassembler's similar branch (dedc232) results in a byron proxy which can sync with testnet fully without throwing the InvalidBlock error. Then, rebuilding cardano-node, also with either of these ledger updates and configuring topology to use the fully synced local proxy, the node now fully syncs!

@johnalotoski
Copy link
Contributor Author

A proxy and node sync also works with the cardano-ledger branch dcoutts/issue-cardano-node-461-v2

@i-o-m i-o-m added bug Something isn't working byron Required for a Byron mainnet: replace the old core nodes with cardano-node. priority high issues/PRs that MUST be addressed. The release can't happen without this; labels Jan 16, 2020
@vhulchenko-iohk vhulchenko-iohk added this to the S5 2020-01-30 milestone Jan 17, 2020
@disassembler disassembler changed the title Cardano node 1.3.0, syncing stops for unknown reasons Cardano node 1.3.0, syncing stops due to ledger failure on testnet Jan 21, 2020
@disassembler
Copy link
Contributor

This is resolved in current master and 1.4.0 pre-release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working byron Required for a Byron mainnet: replace the old core nodes with cardano-node. priority high issues/PRs that MUST be addressed. The release can't happen without this;
Projects
None yet
Development

No branches or pull requests

7 participants