
Robustness of eth2 clients to the eth1 node #1759

Closed
ethers opened this issue Apr 26, 2020 · 3 comments

Comments

@ethers
Member

ethers commented Apr 26, 2020

Would like to suggest this reminder to (all) eth2 clients: be robust against whatever the eth1 node/infrastructure does.

Here are some recent observations without calling out clients directly:

  • In one testnet, new beacon nodes couldn't sync to the testnet because the provided goerli node was overloaded with all the clients it is serving.

  • On a different client, given a running beacon node, validator, and eth1 node, when the eth1 goerli node started having issues (such as losing peers), after some time the beacon node was not able to continue functioning, and so the validator also stopped working. It's my understanding that the validator should continue unaffected, if an eth1 node goes down. This was not the case as the goerli node took down the beacon node. [To this client's credit, the validator node never had to be restarted: when the goerli node and beacon nodes are restarted and function again, the validator node resumes nicely.]

(I recall there was a pre-launch checklist of some sort [by @djrtwo] but I haven't been able to find it again. I suggest this testing be explicitly added.)

@prestonvanloon
Contributor

Chiming in on the Prysm side of things since I believe you may have experienced the above issues in our testnet.

> new beacon nodes couldn't sync to the testnet because the provided goerli node was overloaded with all the clients it is serving.

In order to determine the genesis state, the beacon node must have access to all of the deposits involved in creating it, which is why an overloaded eth1 node blocks syncing. Another idea is that we hardcode the genesis state into the application post-launch, removing that dependency entirely.
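A minimal Go sketch of the two paths described above: prefer a genesis state compiled into the binary, and only fall back to replaying eth1 deposits when none is embedded. All names and types here are illustrative, not Prysm's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Deposit and BeaconState are simplified stand-ins for the real types.
type Deposit struct{ Pubkey string }
type BeaconState struct{ ValidatorCount int }

// hardcodedGenesis models a genesis state embedded in the binary
// post-launch; nil models a build that predates the hardcoded state.
var hardcodedGenesis *BeaconState

// genesisFromDeposits replays every deposit to reconstruct the genesis
// state; this is the path that depends on a responsive eth1 node.
func genesisFromDeposits(deposits []Deposit) (*BeaconState, error) {
	if len(deposits) == 0 {
		return nil, errors.New("no deposits: eth1 node unreachable or unsynced")
	}
	return &BeaconState{ValidatorCount: len(deposits)}, nil
}

// genesisState prefers the embedded state and only falls back to
// replaying deposits when no hardcoded state is available.
func genesisState(deposits []Deposit) (*BeaconState, error) {
	if hardcodedGenesis != nil {
		return hardcodedGenesis, nil
	}
	return genesisFromDeposits(deposits)
}

func main() {
	st, err := genesisState([]Deposit{{Pubkey: "0xaa"}, {Pubkey: "0xbb"}})
	fmt.Println(st, err)
}
```

With a hardcoded state, a new beacon node never needs to query the eth1 node just to start syncing.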

> On a different client, given a running beacon node, validator, and eth1 node, when the eth1 goerli node started having issues (such as losing peers), after some time the beacon node was not able to continue functioning, and so the validator also stopped working. It's my understanding that the validator should continue unaffected, if an eth1 node goes down.

I am not sure if this was Prysm or not, but the eth1 connection only affects block proposals. The beacon node should be able to continue without an eth1 connection. In Prysm, we recently implemented a 2-second timeout when requesting eth1 information during block proposals (prysmaticlabs/prysm#5583). This helps mitigate cases where an eth1 node is slow to respond while a block proposal must be produced in a timely fashion to maximize the validator's reward. If the timeout is exceeded, a random vote is used.

Thanks

@paulhauner
Contributor

> I am not sure if this was Prysm or not, but eth1 connection only affects block proposals.

I think this was with Lighthouse, but I can't find the issue anymore. There was nothing to suggest that the eth1 node going down was linked to the beacon node losing peers, apart from the fact that one happened some time after the other. I've never observed this, nor can I figure out how it might happen, so I closed the issue.

@dapplion
Member

dapplion commented Dec 8, 2023

pre-genesis issue

@dapplion dapplion closed this as completed Dec 8, 2023