
x/upgrade node cannot sync after crash when upgrading by upgrade-time #8540

Closed

orkunkl opened this issue Feb 8, 2021 · 3 comments
Labels: C:Cosmovisor (Issues and PRs related to Cosmovisor)


orkunkl commented Feb 8, 2021

Summary of Bug

During the cosmwasm musselnet-2 upgrade from wasmd v0.14.0 to v0.15.0, one of our voting-majority nodes crashed, possibly due to a cosmovisor setup mistake on our side. I have tested cosmovisor locally with the --upgrade-height flag when creating a software upgrade proposal: when one of the nodes crashed, I simply restarted it with the correct configuration and it applied the upgrade and synced blocks. On musselnet, however, it behaved differently...

Proposal message

{
  "proposal_id": "2",
  "content": {
    "@type": "/cosmos.upgrade.v1beta1.SoftwareUpgradeProposal",
    "title": "Musselnet wasmd v0.15.0 upgrade proposal",
    "description": "This proposal will upgrade the network to the latest cosmos-sdk and wasmd versions",
    "plan": {
      "name": "musselnet-3",
      "time": "2021-02-08T09:00:00Z",
      "height": "0",
      "info": "",
      "upgraded_client_state": null
    }
  },
  "status": "PROPOSAL_STATUS_PASSED",
  "final_tally_result": {
    "yes": "3346646481914258040",
    "abstain": "0",
    "no": "0",
    "no_with_veto": "0"
  },
  "submit_time": "2021-02-04T10:04:20.602433256Z",
  "deposit_end_time": "2021-02-06T10:04:20.602433256Z",
  "total_deposit": [
    {
      "denom": "ufrites",
      "amount": "10000000"
    }
  ],
  "voting_start_time": "2021-02-04T10:04:20.602433256Z",
  "voting_end_time": "2021-02-06T10:04:20.602433256Z"
}
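
For reference, output along these lines comes from the gov proposal query; a minimal sketch, assuming the wasmd binary and a locally reachable node:

wasmd query gov proposal 2 --node tcp://localhost:26657 --output json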

At 9 AM CET the upgrade ran. Some of the nodes applied the upgrade successfully, but the confio-3 validator crashed. Without this validator the voting power of the working nodes was below 67% and the network halted. I SSHed into the machine, fixed the cosmovisor setup, and re-ran the wasmd app. It kept crashing with an error saying that an upgrade was needed at 9 AM. When I checked /root/.wasmd/cosmovisor/current/bin, cosmovisor had applied the upgrade, yet the node still crashed. I suspect it crashes because the upgrade time has already passed.
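
For context, a sketch of the directory layout cosmovisor expects, assuming DAEMON_HOME=/root/.wasmd and DAEMON_NAME=wasmd (the upgrade directory name must match the plan name, musselnet-3):

/root/.wasmd/cosmovisor/
├── genesis/
│   └── bin/
│       └── wasmd                      # v0.14.0 binary
├── upgrades/
│   └── musselnet-3/
│       └── bin/
│           └── wasmd                  # v0.15.0 binary
└── current -> upgrades/musselnet-3    # symlink switched by cosmovisor at upgrade time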

Version

v0.40.0

Steps to Reproduce

  • Set up a local network.
  • Set up wasmd or gaiad with cosmovisor.
  • Make sure cosmovisor/upgrades/new-version/bin is empty.
  • Schedule an upgrade proposal using the --upgrade-time flag.
  • Let it fail, and kill the machine.
  • Fix the cosmovisor directory by adding the new binary to cosmovisor/upgrades/new-version/bin.
  • Then run gaiad start.

I assume these reproduction steps will work; a rough sketch of the commands involved follows below.
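
The sketch assumes wasmd, a key named validator, and flag names as they exist in cosmos-sdk v0.40 (adjust to your own setup):

# environment cosmovisor reads
export DAEMON_NAME=wasmd
export DAEMON_HOME=$HOME/.wasmd
export DAEMON_RESTART_AFTER_UPGRADE=true

# schedule the upgrade by time rather than by height
wasmd tx gov submit-proposal software-upgrade musselnet-3 \
  --upgrade-time "2021-02-08T09:00:00Z" \
  --title "Musselnet wasmd v0.15.0 upgrade proposal" \
  --description "Upgrade to the latest cosmos-sdk and wasmd versions" \
  --deposit 10000000ufrites \
  --from validator --chain-id musselnet-2

# with cosmovisor/upgrades/musselnet-3/bin left empty the node crashes at upgrade time;
# after copying the v0.15.0 binary there, restart through cosmovisor, not the bare binary
cosmovisor start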

Ref: #8538


  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned

orkunkl commented Feb 9, 2021

Seed node for debugging: http://188.34.180.20:26657

ryanchristo added this to the cosmovisor v1.0 milestone Jun 29, 2021
ryanchristo added the Status: Backlog and C:Cosmovisor labels Jun 29, 2021
robert-zaremba (Collaborator) commented

We ran many tests recently (related to the v0.44 upgrade and the next Gaia upgrade) and did not run into this error.

From what we see here:

  • the upgrade directory name is new-version, but it should be musselnet-3
  • in the last step, why are you using gaiad start instead of FLAGS... cosmovisor start?

BTW: --upgrade-time is deprecated, and --upgrade-height must be used now.
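
For reference, a height-based version of the same proposal would look roughly like this (a sketch; the height value is a placeholder):

wasmd tx gov submit-proposal software-upgrade musselnet-3 \
  --upgrade-height 500000 \
  --title "Musselnet wasmd v0.15.0 upgrade proposal" \
  --deposit 10000000ufrites --from validator --chain-id musselnet-2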

robert-zaremba (Collaborator) commented

Closing the issue. Please reopen if you encounter this problem again.
