Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Web3Signer support for VC #2522

Closed
wants to merge 81 commits into from

Conversation

paulhauner
Copy link
Member

@paulhauner paulhauner commented Aug 18, 2021

Issue Addressed

Resolves #2498

Proposed Changes

Allows the VC to call out to a Web3Signer remote signer to obtain signatures.

Additional Info

Making Signing Functions async

To allow remote signing, I needed to make all the signing functions async. This caused a bit of noise where I had to convert iterators into for loops.

In duties_service.rs there was a particularly tricky case where we couldn't hold a write-lock across an await, so I had to first take a read-lock, then grab a write-lock.

Move Signing from Core Executor

Whilst implementing this feature, I noticed that we signing was happening on the core tokio executor. I suspect this was causing the executor to temporarily lock and occasionally trigger some HTTP timeouts (and potentially SQL pool timeouts, but I can't verify this). Since moving all signing into blocking tokio tasks, I noticed a distinct drop in the "atttestations_http_get" metric on a Prater node:

http_get_times

I think this graph indicates that freeing the core executor allows the VC to operate more smoothly.

Refactor TaskExecutor

I noticed that the TaskExecutor::spawn_blocking_handle function would fail to spawn tasks if it were unable to obtain handles to some metrics (this can happen if the same metric is defined twice). It seemed that a more sensible approach would be to keep spawning tasks, but without metrics. To that end, I refactored the function so that it would still function without metrics. There are no other changes made.

TODO

  • Restructure to support multiple signing methods.
  • Add calls to remote signer from VC.
  • Documentation
  • Test all endpoints
  • Test HTTPS certificate
  • Allow adding remote signer validators via the API
  • Add Altair support via 21.8.1-rc1
  • Create issue to start using latest version of web3signer. (See Web3Signer tests aren't using latest version #2570)

Notes

@paulhauner paulhauner added the work-in-progress PR is a work-in-progress label Aug 18, 2021
@paulhauner paulhauner changed the title Remote signer support for VC Web3Signer support for VC Aug 26, 2021
@paulhauner paulhauner force-pushed the remote-signer-client branch from f338192 to 43bdab7 Compare August 31, 2021 06:37
bors bot pushed a commit that referenced this pull request Sep 3, 2021
## Proposed Changes

This PR deletes all `remote_signer` code from Lighthouse, for the following reasons:

* The `remote_signer` code is unused, and we have no plans to use it now that we're moving to supporting the Web3Signer APIs: #2522
* It represents a significant maintenance burden. The HTTP API tests have been prone to platform-specific failures, and breakages due to dependency upgrades, e.g. #2400.

Although the code is deleted it remains in the Git history should we ever want to recover it. For ease of reference:

- The last commit containing remote signer code: 5a3bcd2
- The last Lighthouse version: v1.5.1
@paulhauner
Copy link
Member Author

paulhauner commented Sep 15, 2021

Thanks for the review @michaelsproul, they were great points about running the tasks in parallel! I've addressed all your comments and this is ready for review again.

You can start your review now, but before we merge I'm going to:

  • Run on Prater/Pyrmont.
  • Perform manual testing with a testnet validator.

@paulhauner paulhauner added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Sep 15, 2021
Copy link
Member

@michaelsproul michaelsproul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👌

Merge at will 🎉

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Sep 16, 2021
@paulhauner
Copy link
Member Author

Thank you! I add couple more very minor metrics, FYI.

bors r+

bors bot pushed a commit that referenced this pull request Sep 16, 2021
[EIP-3030]: https://eips.ethereum.org/EIPS/eip-3030
[Web3Signer]: https://consensys.github.io/web3signer/web3signer-eth2.html

## Issue Addressed

Resolves #2498

## Proposed Changes

Allows the VC to call out to a [Web3Signer] remote signer to obtain signatures.


## Additional Info

### Making Signing Functions `async`

To allow remote signing, I needed to make all the signing functions `async`. This caused a bit of noise where I had to convert iterators into `for` loops.

In `duties_service.rs` there was a particularly tricky case where we couldn't hold a write-lock across an `await`, so I had to first take a read-lock, then grab a write-lock.

### Move Signing from Core Executor

Whilst implementing this feature, I noticed that we signing was happening on the core tokio executor. I suspect this was causing the executor to temporarily lock and occasionally trigger some HTTP timeouts (and potentially SQL pool timeouts, but I can't verify this). Since moving all signing into blocking tokio tasks, I noticed a distinct drop in the "atttestations_http_get" metric on a Prater node:

![http_get_times](https://user-images.githubusercontent.com/6660660/132143737-82fd3836-2e7e-445b-a143-cb347783baad.png)

I think this graph indicates that freeing the core executor allows the VC to operate more smoothly.

### Refactor TaskExecutor

I noticed that the `TaskExecutor::spawn_blocking_handle` function would fail to spawn tasks if it were unable to obtain handles to some metrics (this can happen if the same metric is defined twice). It seemed that a more sensible approach would be to keep spawning tasks, but without metrics. To that end, I refactored the function so that it would still function without metrics. There are no other changes made.

## TODO

- [x] Restructure to support multiple signing methods.
- [x] Add calls to remote signer from VC.
- [x] Documentation
- [x] Test all endpoints
- [x] Test HTTPS certificate
- [x] Allow adding remote signer validators via the API
- [x] Add Altair support via [21.8.1-rc1](https://github.com/ConsenSys/web3signer/releases/tag/21.8.1-rc1)
- [x] Create issue to start using latest version of web3signer. (See #2570)

## Notes

- ~~Web3Signer doesn't yet support the Altair fork for Prater. See Consensys/web3signer#423
- ~~There is not yet a release of Web3Signer which supports Altair blocks. See Consensys/web3signer#391
@michaelsproul michaelsproul removed the waiting-on-author The reviewer has suggested changes and awaits thier implementation. label Sep 16, 2021
@bors
Copy link

bors bot commented Sep 16, 2021

@bors bors bot changed the title Web3Signer support for VC [Merged by Bors] - Web3Signer support for VC Sep 16, 2021
@bors bors bot closed this Sep 16, 2021
paulhauner added a commit to paulhauner/lighthouse that referenced this pull request Sep 21, 2021
commit a73dcb7
Author: Age Manning <[email protected]>
Date:   Fri Sep 17 04:02:31 2021 +0000

    Improved handling of IP Banning (sigp#2530)

    This PR in general improves the handling around peer banning.

    Specifically there were issues when multiple peers under a single IP connected to us after we banned the IP for poor behaviour.

    This PR should now handle these peers gracefully as well as make some improvements around how we previously disconnected and banned peers.

    The logic now goes as follows:
    - Once a peer gets banned, its gets registered with its known IP addresses
    - Once enough banned peers exist under a single IP that IP is banned
    - We retain connections with existing peers under this IP
    - Any new connections under this IP are rejected

commit 64ad2af
Author: Pawan Dhananjay <[email protected]>
Date:   Fri Sep 17 01:11:16 2021 +0000

    Subscribe to altair gossip topics 2 slots before fork (sigp#2532)

    ## Issue Addressed

    N/A

    ## Proposed Changes

    Add a fork_digest to `ForkContext` only if it is set in the config.
    Reject gossip messages on post fork topics before the fork happens.

    Edit: Instead of rejecting gossip messages on post fork topics, we now subscribe to post fork topics 2 slots before the fork.

    Co-authored-by: Age Manning <[email protected]>

commit acdcea9
Author: Age Manning <[email protected]>
Date:   Thu Sep 16 04:45:07 2021 +0000

    Update mainnet bootnodes (sigp#2594)

    Sigma Prime is transitioning our mainnet bootnodes and this PR represents the transition of our bootnodes.

    After a few releases, old boot-nodes will be deprecated.

commit 56e0615
Author: Age Manning <[email protected]>
Date:   Thu Sep 16 04:45:05 2021 +0000

    Experimental discovery (sigp#2577)

    # Description

    A few changes have been made to discovery. In particular a custom re-write of an LRU cache which previously was read/write O(N) for all our sessions ~5k, to a more reasonable hashmap-style O(1).

    Further there has been reported issues in the current discv5, so added error handling to help identify the issue has been added.

commit c5c7476
Author: Paul Hauner <[email protected]>
Date:   Thu Sep 16 03:26:33 2021 +0000

    Web3Signer support for VC (sigp#2522)

    [EIP-3030]: https://eips.ethereum.org/EIPS/eip-3030
    [Web3Signer]: https://consensys.github.io/web3signer/web3signer-eth2.html

    ## Issue Addressed

    Resolves sigp#2498

    ## Proposed Changes

    Allows the VC to call out to a [Web3Signer] remote signer to obtain signatures.

    ## Additional Info

    ### Making Signing Functions `async`

    To allow remote signing, I needed to make all the signing functions `async`. This caused a bit of noise where I had to convert iterators into `for` loops.

    In `duties_service.rs` there was a particularly tricky case where we couldn't hold a write-lock across an `await`, so I had to first take a read-lock, then grab a write-lock.

    ### Move Signing from Core Executor

    Whilst implementing this feature, I noticed that we signing was happening on the core tokio executor. I suspect this was causing the executor to temporarily lock and occasionally trigger some HTTP timeouts (and potentially SQL pool timeouts, but I can't verify this). Since moving all signing into blocking tokio tasks, I noticed a distinct drop in the "atttestations_http_get" metric on a Prater node:

    ![http_get_times](https://user-images.githubusercontent.com/6660660/132143737-82fd3836-2e7e-445b-a143-cb347783baad.png)

    I think this graph indicates that freeing the core executor allows the VC to operate more smoothly.

    ### Refactor TaskExecutor

    I noticed that the `TaskExecutor::spawn_blocking_handle` function would fail to spawn tasks if it were unable to obtain handles to some metrics (this can happen if the same metric is defined twice). It seemed that a more sensible approach would be to keep spawning tasks, but without metrics. To that end, I refactored the function so that it would still function without metrics. There are no other changes made.

    ## TODO

    - [x] Restructure to support multiple signing methods.
    - [x] Add calls to remote signer from VC.
    - [x] Documentation
    - [x] Test all endpoints
    - [x] Test HTTPS certificate
    - [x] Allow adding remote signer validators via the API
    - [x] Add Altair support via [21.8.1-rc1](https://github.com/ConsenSys/web3signer/releases/tag/21.8.1-rc1)
    - [x] Create issue to start using latest version of web3signer. (See sigp#2570)

    ## Notes

    - ~~Web3Signer doesn't yet support the Altair fork for Prater. See Consensys/web3signer#423
    - ~~There is not yet a release of Web3Signer which supports Altair blocks. See Consensys/web3signer#391

commit 58012f8
Author: Michael Sproul <[email protected]>
Date:   Wed Sep 15 00:01:18 2021 +0000

    Shutdown gracefully on panic (sigp#2596)

    ## Proposed Changes

    * Modify the `TaskExecutor` so that it spawns a "monitor" future for each future spawned by `spawn` or `spawn_blocking`. This monitor future joins the handle of the child future and shuts down the executor if it detects a panic.
    * Enable backtraces by default by setting the environment variable `RUST_BACKTRACE`.
    * Spawn the `ProductionBeaconNode` on the `TaskExecutor` so that if a panic occurs during start-up it will take down the whole process. Previously we were using a raw Tokio `spawn`, but I can't see any reason not to use the executor (perhaps someone else can).

    ## Additional Info

    I considered using [`std::panic::set_hook`](https://doc.rust-lang.org/std/panic/fn.set_hook.html) to instantiate a custom panic handler, however this doesn't allow us to send a shutdown signal because `Fn` functions can't move variables (i.e. the shutdown sender) out of their environment. This also prevents it from receiving a `Logger`.  Hence I decided to leave the panic handler untouched, but with backtraces turned on by default.

    I did a run through the code base with all the raw Tokio spawn functions disallowed by Clippy, and found only two instances where we bypass the `TaskExecutor`: the HTTP API and `InitializedValidators` in the VC. In both places we use `spawn_blocking` and handle the return value, so I figured that was OK for now.

    In terms of performance I think the overhead should be minimal. The monitor tasks will just get parked by the executor until their child resolves.

    I've checked that this covers Discv5, as the `TaskExecutor` gets injected into Discv5 here: https://github.com/sigp/lighthouse/blob/f9bba92db3468321b28ddd9010e26b359f88bafe/beacon_node/src/lib.rs#L125-L126

commit 95b1713
Author: Age Manning <[email protected]>
Date:   Tue Sep 14 08:28:35 2021 +0000

    Reduce network debug noise (sigp#2593)

    The identify network debug logs can get quite noisy and are unnecessary to print on every request/response.

    This PR reduces debug noise by only printing messages for identify messages that offer some new information.

commit 4755d4b
Author: Wink Saville <[email protected]>
Date:   Tue Sep 14 06:48:26 2021 +0000

    Update sloggers to v2.0.2 (sigp#2588)

    fixes sigp#2584

commit f9bba92
Author: Paul Hauner <[email protected]>
Date:   Mon Sep 13 23:01:19 2021 +0000

    v1.5.2 (sigp#2595)

    ## Issue Addressed

    NA

    ## Proposed Changes

    Version bump

    ## Additional Info

    Please do not `bors` without my approval, I am still testing.

commit e4ed42a
Author: Squirrel <[email protected]>
Date:   Sun Sep 12 23:55:20 2021 +0000

    Fix nightly bump num bigint (sigp#2591)

    ## Issue Addressed

    Builds again on latest nightly

    ## Proposed Changes

    Break was caused by: rust-lang/rust#88581

commit f7dd24c
Author: Mason Stallmo <[email protected]>
Date:   Sat Sep 11 23:56:16 2021 +0000

    Add quoted u64/u64_vec to SyncCommitteeSubscription (sigp#2589)

    ## Issue Addressed

    Resolves sigp#2582

    ## Proposed Changes

    Use `quoted_u64` and `quoted_u64_vec` custom serde deserializers from `eth2_serde_utils` to support the proper Eth2.0 API spec for `/eth/v1/validator/sync_committee_subscriptions`

    ## Additional Info

    N/A

commit 46cd67d
Author: Mason Stallmo <[email protected]>
Date:   Sat Sep 11 06:07:20 2021 +0000

    Case insensitive match for ForkName (sigp#2587)

    ## Issue Addressed

    sigp#2583

    ## Proposed Changes

    Case insensitive match on `fork_name` when calling `ForkName::from_str`

    ## Additional Info

    N/A
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v2.0.0 Altair on mainnet release (v2.0.0)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants