Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart node instead of panicing and exiting #218

Merged
merged 4 commits into from
Apr 19, 2023

Conversation

BrandonWeng
Copy link
Contributor

@BrandonWeng BrandonWeng commented Apr 14, 2023

Describe your changes and provide context

Currently the node will only switch into blocksync mode if the whole process is restarted, this change aims to add a detection go routine and restart the node if it's behind configurable

Testing performed to validate your change

Artificially introduced latency so that the node would fall behind, and see that it self restarted node
image

image

@BrandonWeng BrandonWeng force-pushed the bweng-config-log-level branch from e1b9392 to 64b3018 Compare April 14, 2023 14:30
@BrandonWeng BrandonWeng changed the title Bweng config log level Restart node instead of panicing and exiting Apr 17, 2023
@BrandonWeng BrandonWeng marked this pull request as ready for review April 18, 2023 02:54
@BrandonWeng
Copy link
Contributor Author

Error: stat foobar/config/genesis.json: no such file or directory
Usage:
  export [flags]

Flags:
      --chain-id string              Chain ID
      --for-zero-height              Export state to start at height zero (perform preproccessing)
      --height int                   Export state from a particular height (-1 means latest height) (default -1)
  -h, --help                         help for export
      --home string                  The application home directory (default "/tmp/TestExportCmd_HomeDir378377008/001")
      --jail-allowed-addrs strings   Comma-separated list of operator addresses of jailed validators to unjail

I think the test failures are unrelated

@yzang2019
Copy link
Collaborator

The test failures seems to be new, I think we should aim to fix them or at least figure out why they are failing. If they fail due to flakyness, then that's probably fine

@BrandonWeng BrandonWeng force-pushed the bweng-config-log-level branch from 5bf81b6 to 3f8fbd6 Compare April 18, 2023 18:41
@BrandonWeng BrandonWeng force-pushed the bweng-config-log-level branch from 2bd10c0 to 8306b09 Compare April 18, 2023 19:37
Copy link
Contributor

@philipsu522 philipsu522 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The only thing I'm slightly concerned about is that if we forcefully shut down the process (which I don't think it does) in the middle of a write to the database. This could cause some integrity issues and ultimately bork a machine. Can we:

  1. make sure we pause and wait for all graceful shutdown routines
  2. let it run repeatedly on a node for a while?

@BrandonWeng
Copy link
Contributor Author

Looks good. The only thing I'm slightly concerned about is that if we forcefully shut down the process (which I don't think it does) in the middle of a write to the database. This could cause some integrity issues and ultimately bork a machine. Can we:

  1. make sure we pause and wait for all graceful shutdown routines
  2. let it run repeatedly on a node for a while?
  1. In the StartInProcess call we actually should be gracefully shutting down the process whenever a restart/sig is sent (including database connections). https://github.com/sei-protocol/sei-cosmos/pull/218/files#diff-97289af2ab66016f168df6fccac70626d125f170e5cf0841fcdc8fa624999608R497-R521
  2. Yeah I can spin up a chain with this change and let it run for a while (with a couple of the nodes with smaller instances so they fall behind)

@BrandonWeng BrandonWeng enabled auto-merge (squash) April 19, 2023 19:17
@BrandonWeng BrandonWeng merged commit 9523b30 into main Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants