Add liveness endpoint for doppleganger protection #131

paulhauner · 2021-04-18T00:57:31Z

What

Following from #64, this PR adds a /eth/v1/validator/liveness endpoint which indicates if a validator has been observed to be "live" in some epoch. The idea is that doppleganger protection should only start if some validator has is_live: false for some number of epochs.

This PR differs from @dankrad's proposal in that it doesn't ask "give me the latest attestation", instead it asks the broader question "has the validator been observed to be live?". This has the following advantages:

It's an easier question to answer. Clients already need to store caches for observed attesters/aggregators/proposers that answer the boolean question "has this validator already been observed". Keeping a cache of old attestations for validators is more of a burden.
It's a less specific question so clients are welcome to include more "liveness" sources if they see fit. It's also more future-proof since it might also evolve to encompass future message types.

Why

This provides doppleganger protection as a first-class citizen. Given that clients already need to answer the "has this validator been observed" question for gossip message verification, this seems cheap to implement. If we add this endpoint then things like Infura can also support doppleganger protection.

I understand that Nimbus has already implemented doppleganger protection (cc @arnetheduck) and Lighthouse is in the later stages of implementing it (see #2230).

Example

Request

POST /eth/v1/validator/liveness

{
  "epoch": "1",
  "indices": [
    "1"
  ]
}

Response

{
  "data": [
    {
      "index": "1",
      "epoch": "1",
      "is_live": true
    }
  ]
}

paulhauner · 2021-04-18T01:23:54Z

I'm not sure why CI is failing, I had a look but it's not clear to me. Perhaps someone with more CI skills could nudge me in the right direction? 🙏

rolfyone · 2021-04-18T21:24:32Z

apis/validator/liveness.yaml

+          type: object
+          properties:
+            epoch:
+              $ref: "../../../beacon-node-oapi.yaml#/components/schemas/Uint64"


I think you'll find references in this file are only ../../beacon-node-oapi.yaml and that's probably the main build issue...

mpetrunic · 2021-04-19T06:59:44Z

apis/validator/liveness.yaml

+    content:
+      application/json:
+        schema:
+          title: GetAttesterDutiesBody


you should probably change this title. If somebody generates client code from this spec, this title will affect generated type/class name.

mpetrunic · 2021-04-19T07:01:12Z

apis/validator/liveness.yaml

+      content:
+        application/json:
+          schema:
+            title: GetAttesterDutiesBody


dankrad · 2021-04-20T14:33:04Z

Quick question on how is this intended to be used? It only queries for a single epoch, so I assume a validator client would have to poll a few (2-5) epochs to be certain?

Maybe it's worth adding a short sentence about this intended use.

apis/validator/liveness.yaml

Co-authored-by: realbigsean <[email protected]>

ahadda5 · 2021-05-12T06:33:18Z

@paulhauner Shouldn't this be part of the eth/v1/beacon. Ultimately you are asking the question "Who are the validators being observed to be live at this (end of) epoch?" no?
Which the beacon-chain will answer similar to other requests like ListValidators /eth/v1/beacon/states/{state_id}/validators or attestations etc..

paulhauner · 2021-05-12T06:42:30Z

@paulhauner Shouldn't this be part of the eth/v1/beacon. Ultimately you are asking the question "Who are the validators being observed to be live at this (end of) epoch?" no?
Which the beacon-chain will answer similar to other requests like ListValidators /eth/v1/beacon/states/{state_id}/validators or attestations etc..

I'm trying to determine if the validator has been seen to sign any message during the given epoch to try and make a guess about whether or not there's some other VC running the same pubkey. This is different to asking if it exists in the state, has an attestation included in a block or if it is included in the attestation pool.

ahadda5 · 2021-05-12T08:54:53Z

I'm trying to determine if the validator has been seen to sign any message during the given epoch to try and make a guess about whether or not there's some other VC running the same pubkey. This is different to asking if it exists in the state, has an attestation included in a block or if it is included in the attestation pool.

Got it. I looked at the other requests and saw their folder structure. I was confused as to where this get belongs. Thanks

…#2607) ## Issue Addressed N/A ## Proposed Changes Add a note to the Doppelganger Protection docs about how it is not interoperable until an endpoint facilitating it is standardized (ethereum/beacon-APIs#131). ## Additional Info N/A Co-authored-by: realbigsean <[email protected]>

mcdee · 2022-02-21T09:49:07Z

There's no information in here on how far back the "epoch" parameter can/should go. Is it expected that this could be called for any epoch, or for only the most recent epochs? And if the latter, what would a reasonable number of epochs be? Perhaps back to the last finalized epoch?

mpetrunic · 2022-02-21T15:46:52Z

@paulhauner could you check "Allow edits from maintainers" or update yourself? :)

arnetheduck · 2022-06-30T08:29:27Z

Thanks for raising this PR - we're in the process of adding doppelganger support to our VC, so getting this settled would be nice.

There's no information in here on how far back the "epoch" parameter can/should go.

Indeed -an easy solution here is to simply remove it - in Nimbus, we only keep track of a single "last-seen" data point, which is sufficient for doppelganger, or "liveness" in general.

is_live

This flag seems redundant - I think we can capture the essence of this API by simply returning the last epoch (or indeed slot) and let consumers decide on their liveness policy.

epoch

Possibly, we could increase granularity here to slot.

jclapis · 2022-09-19T04:43:35Z

I'll throw my support behind this one. Post-merge, Rocket Pool has seen several people shuttling nodes around to migrate to different clients with better performance, and in some situations exporting & importing the slashing DB isn't easy to do. Doppelganger detection is a common way to miss attesting on purpose in cases like this, but we had one instance where someone tried to attach a Lighthouse VC to a Nimbus BN and it failed because the routes aren't standardized.

Having a standardized liveness route so any VC with doppelganger could work with any BN would be very well received by our node operators.

paulhauner · 2022-10-19T22:43:10Z

Alrighty, I have updated this PR. My apologies for letting it linger. Thanks for @rolfyone for prodding me (several times) 🙏

There's no information in here on how far back the "epoch" parameter can/should go.

@mcdee, good point. I've added that the BN MUST support the current and previous epoch and MAY support previous epochs. Based on our implementation in Lighthouse, this seems to be the minimally useful set of information.

Indeed -an easy solution here is to simply remove it

@arnetheduck, removing epoch is a reasonable request. What it would remove is the ability to know that a validator was live in the current epoch but not the previous one. It's been a while since we implemented DG, so I'm not sure if this is an issue or not. Removing the epoch parameter makes the API slightly simpler, keeping it makes it slightly more powerful. I'm ambivalent and therefore tempted to leave it as is. I'm open to arguments.

Possibly, we could increase granularity here to slot.

The reason that I have specified epoch is since P2P validation conditions already require nodes to track which epochs they've seen attestations, however it doesn't require tracking specific slots. One of the goals of this PR was to be minimally intrusive. I'm open to switching to slots, but I think the fact that clients already have the groundwork for epochs is an appealing reason to leave it at that.

arnetheduck · 2022-10-20T21:05:46Z

I'm open to arguments.

I think without epoch, the query poses a different question: "when was the validator last active" vs "was the validator active in epoch X" - answering the latter requires book-keeping of historical "activeness" which complicates the implementation

epoch in both request and response

as specified, it's somewhat ambiguous what the epoch parameter in the answer should be - if a validator was seen to sign something in epoch 5 and a query is done for epoch 6, should the validator be considered active or not (is_live)? if not, why is the epoch both in request and response? under what circumstances would epoch in the response differ from that of the request?

current and previous epoch

under what circumstances do we need to know two epochs for doppelganger, and why two specifically? it seems sufficient to know when the validator last published something - this allows configuring different paranoia policies (number of inactive epochs) for liveness without having to do multiple (potentially racy) queries. limiting it to "previous" and "current" constrains the implementation to that number of "inactive" epochs (one-and-a-few-slots)

paulhauner · 2022-10-20T23:19:44Z

@arnetheduck, I can see two points in your previous post:

Why should we track liveness per epoch rather than "highest live epoch".
Why is the epoch returned in the response.

1. Why track liveness per epoch

Firstly, we already need to track per-epoch liveness to follow the P2P spec. For instance, if you've seen an attestation in epoch n but not n - 1 you still need to propagate the epoch n - 1 attestation. Tracking just the highest seen epoch would not permit the client to follow the P2P spec.

Secondly, we should note that the P2P spec only requires the node to track the current + previous epochs (potentially the next epoch too, if you consider clock disparity).

Given those two points, let's assume that all nodes have at least this abstract data structure in order to meet the P2P propagation conditions:

struct AttestationSeen {
  previous_epoch: Set<ValidatorIndex>,
  current_epoch: Set<ValidatorIndex>,
}

If we accept that the above struct is the minimum required struct to meet the P2P propagation conditions, then I think it's clear that the best way to query it is by epoch. I argue that querying this struct for "highest seen epoch" is strictly worse for the following reasons:

It's impossible to know the liveness status of the previous epoch if the validator was live in the current epoch.
There is no way for the server to communicate that the requested epoch is out of range. If the epoch is provided in the request then the server can say "that epoch is out of range", this reduces edge-cases when the client and server have clock sync issues and also allows the server to communicate the lowest epoch it supports. If the client asks for the highest epoch then it gets no information back from the server about their respective clock sync or which epochs the server actually tracks.

So, I argue that "highest live epoch" is:

A complication when we consider what clients are already required to track.
Provides strictly less information.

The upside of "highest live epoch" is that it can smoosh together information about the current and previous epoch to reduce the number of calls to the BN. You made a point about "racy" queries, I reject that claim because the liveness of epoch n - 1 and n are independent values. I also argue that doppelganger queries are light enough and infrequent enough that the gains of optimising functionality away from them is premature.

2. Why is the `epoch` returned in the response.

The epoch in the request is always the same as the response. I don't recall why this was added (we implemented this over a year ago). Perhaps we can remove it, I don't really mind.

Summary

Unless there's a glaring hole in my logic here, I have a rather small appetite for arguing this further. It matters little to me about what we do here, I'm sure we can just shim your approach into ours if need be. If my reasoning is sound, but you still want to optimise this API call then feel free to make a new PR for it. If that gets merged rather than this then Lighthouse will implement it.

rolfyone · 2022-10-21T06:37:00Z

I'm in line with your logic @paulhauner , thanks for the detailed description and your hackmd - I find it all really logical...

To me, there's been no discussion for a long time, so I think that we were happy previously with the logic also. Will flag this to make sure ppl read it on discord, but I'm inclined to approve and merge unless theres strong objections raised...

arnetheduck · 2022-10-23T07:28:16Z

SGTM, thanks for the rationale
Tendentially it seems better to avoid redundant data as it opens up for ambiguity and implementation bugs/complexity (ie implementations must now check that the epoch is indeed correct) for no gain - annoying, but not a big deal

paulhauner · 2022-10-23T21:48:19Z

Tendentially it seems better to avoid redundant data as it opens up for ambiguity and implementation bugs/complexity (ie implementations must now check that the epoch is indeed correct) for no gain - annoying, but not a big deal

I had a look around the other endpoints and it seems that not including the epoch in the response would be most consistent with the rest of the repo. I also think your argument makes sense. I've removed the epoch in 4bbefdb.

rkapka · 2022-10-24T09:52:43Z

Nitpick:

Why track liveness per endpoint

It should say epoch, not endpoint.

dapplion

LGTM

rolfyone

LGTM. last call for any strong objections...

ethereum#131 introduced a new liveness endpoint - the request is similar to attester duties in that we query a list of attesters for data pertaining to a particular epoch - as such, it seems reasonable to keep the two requests similar in terms of their URL/postdata structure.

#131 introduced a new liveness endpoint - the request is similar to attester duties in that we query a list of attesters for data pertaining to a particular epoch - as such, it seems reasonable to keep the two requests similar in terms of their URL/postdata structure. Co-authored-by: Paul Harris <[email protected]>

ethereum/beacon-APIs#131 introduced a new liveness endpoint - the request is similar to attester duties in that we query a list of attesters for data pertaining to a particular epoch - as such, it seems reasonable to keep the two requests similar in terms of their URL/postdata structure. Co-authored-by: Paul Harris <[email protected]>

paulhauner added 6 commits April 18, 2021 10:34

Add liveness endpoint

6dce4b6

Update description

8acbbdd

Update description

4bc0352

Add full-stop

944c89c

Fix params

88dc231

Add epoch to response

a3f2d51

paulhauner mentioned this pull request Apr 18, 2021

[Merged by Bors] - Doppelganger detection sigp/lighthouse#2230

Closed

rolfyone reviewed Apr 18, 2021

View reviewed changes

mpetrunic reviewed Apr 19, 2021

View reviewed changes

realbigsean reviewed Apr 23, 2021

View reviewed changes

apis/validator/liveness.yaml Outdated Show resolved Hide resolved

Update apis/validator/liveness.yaml

9dcd377

Co-authored-by: realbigsean <[email protected]>

realbigsean mentioned this pull request Sep 20, 2021

[Merged by Bors] - Add a note about doppelganger protection interoperability to the docs sigp/lighthouse#2607

Closed

yorickdowne mentioned this pull request Oct 6, 2021

Feature: Liveness endpoint for doppelganger protection interop Consensys/teku#4453

Closed

paulhauner added 2 commits October 20, 2022 09:19

Merge branch 'master' into liveness

05a952e

Address review comments

a522c0a

paulhauner dismissed stale reviews from ghost via a522c0a October 19, 2022 22:22

Update endpoint description

ac3a93a

Remove epoch from response

4bbefdb

dapplion approved these changes Oct 24, 2022

View reviewed changes

rolfyone approved these changes Oct 24, 2022

View reviewed changes

rolfyone merged commit 8a96147 into ethereum:master Oct 25, 2022

arnetheduck mentioned this pull request Oct 25, 2022

Implemend standardised endpoint for doppelganger detection status-im/nimbus-eth2#4256

Closed

2 tasks

arnetheduck mentioned this pull request Oct 27, 2022

Give liveness epoch same "shape" as attester duties #253

Merged

nflaig mentioned this pull request Jun 18, 2023

fix!: update validator liveness API to be spec compliant ChainSafe/lodestar#5667

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add liveness endpoint for doppleganger protection #131

Add liveness endpoint for doppleganger protection #131

paulhauner commented Apr 18, 2021 •

edited

Loading

paulhauner commented Apr 18, 2021 •

edited

Loading

rolfyone Apr 18, 2021

mpetrunic Apr 19, 2021 •

edited

Loading

mpetrunic Apr 19, 2021

dankrad commented Apr 20, 2021

ahadda5 commented May 12, 2021

paulhauner commented May 12, 2021

ahadda5 commented May 12, 2021 •

edited

Loading

mcdee commented Feb 21, 2022

mpetrunic commented Feb 21, 2022

arnetheduck commented Jun 30, 2022

jclapis commented Sep 19, 2022 •

edited

Loading

paulhauner commented Oct 19, 2022

arnetheduck commented Oct 20, 2022

paulhauner commented Oct 20, 2022 •

edited

Loading

rolfyone commented Oct 21, 2022

arnetheduck commented Oct 23, 2022

paulhauner commented Oct 23, 2022

rkapka commented Oct 24, 2022

dapplion left a comment

rolfyone left a comment

Add liveness endpoint for doppleganger protection #131

Add liveness endpoint for doppleganger protection #131

Conversation

paulhauner commented Apr 18, 2021 • edited Loading

What

Why

Example

Request

Response

paulhauner commented Apr 18, 2021 • edited Loading

rolfyone Apr 18, 2021

Choose a reason for hiding this comment

mpetrunic Apr 19, 2021 • edited Loading

Choose a reason for hiding this comment

mpetrunic Apr 19, 2021

Choose a reason for hiding this comment

dankrad commented Apr 20, 2021

ahadda5 commented May 12, 2021

paulhauner commented May 12, 2021

ahadda5 commented May 12, 2021 • edited Loading

mcdee commented Feb 21, 2022

mpetrunic commented Feb 21, 2022

arnetheduck commented Jun 30, 2022

jclapis commented Sep 19, 2022 • edited Loading

paulhauner commented Oct 19, 2022

arnetheduck commented Oct 20, 2022

paulhauner commented Oct 20, 2022 • edited Loading

1. Why track liveness per epoch

2. Why is the epoch returned in the response.

Summary

rolfyone commented Oct 21, 2022

arnetheduck commented Oct 23, 2022

paulhauner commented Oct 23, 2022

rkapka commented Oct 24, 2022

dapplion left a comment

Choose a reason for hiding this comment

rolfyone left a comment

Choose a reason for hiding this comment

paulhauner commented Apr 18, 2021 •

edited

Loading

paulhauner commented Apr 18, 2021 •

edited

Loading

mpetrunic Apr 19, 2021 •

edited

Loading

ahadda5 commented May 12, 2021 •

edited

Loading

jclapis commented Sep 19, 2022 •

edited

Loading

paulhauner commented Oct 20, 2022 •

edited

Loading

2. Why is the `epoch` returned in the response.