SSE block event post deneb #349
What's the gap between 3) and 4), in general? Can they be far enough apart that they warrant separate events? 3 and 4 could be argued as both being part of the 'import to state' phase of the process; is there a requirement out there to know the timing of item 3?
Not all blocks end up being head, so for the same block hash event 4 may never be emitted. A block could also be made head later when there's enough LMD weight, so the gap between 3 and 4 ranges from 0 to infinity.
We would have to check with all consumers of this event, because emitting at 2 means you can potentially emit invalid blocks. Not sure if this can break explorers and other uses. As of today, at least Lighthouse and Lodestar emit the
There's not a strong requirement on timing in general, but they convey different information: 2. is precise in timing but may emit invalid blocks; 3. is imprecise in timing but is guaranteed to emit only valid blocks.
Anyone using the event stream for network analysis is likely measuring client REST server implementation differences rather than any kind of network performance. We've seen this significantly skew results between clients in past studies, to the point that they were almost doing more harm than good, and I think it's misleading at best to use this as a signal unless precise timing information is added to each relevant event. "Time received" vs "my REST server published it and the SSE reader saw it, plus some more random delays in between" is a significant source of flakiness. I certainly believe that we should be drawing no conclusions whatsoever from timing information received over
From what I know (and use), generally
I wouldn't object to the events being extended with a
This would be really nice if possible 👍
From a consumer side, it would make sense to update all the existing relevant topics,
Possibly beyond the scope of this issue, but would it make sense to also publish invalid/not imported items for data collection?
I'm not sure how multiple timestamps would work. You would either be holding the event back from the event stream until all timestamps were available, or you would be sending multiple events for the same item with the different timestamps as each point was reached. I'd prefer to see each event be clearly defined in terms of when it should be timestamped, but also be emitted as soon as possible afterwards. Having multiple timestamps starts to look like tracing, and should likely be handled with something like OpenTelemetry within the individual consensus clients.
Yeah, fair. Multiple events would also be great. I guess that would also cover my question about publishing items that were received from gossip but were invalid.
It's non-trivial for us to get the timestamp that it arrived; I'm not sure how practical it would be. Our 'arrival' time is when we look at it. I'm not sure it'd be ideal to have to instrument the event as it goes through the system; currently it's immutable...
@rolfyone would the point that the item (thinking block, here) finished P2P validation and is passed on for processing be a suitable point at which the event could be generated?

As for the object being immutable, I assume that means that new event stream objects would have to be created to contain the timestamp information as well (unless you're passing JSON directly to the event API). Would adding this be significant overhead?

Regarding multiple timestamps as data is received and processed: I agree that instrumenting it in-object would be painful, which is why I thought of using telemetry, as that can emit data points to an external system. But we're wandering out of scope for this issue; perhaps @Savid could open a new issue to discuss tracing specifically?
I've got a CLI flag for triggering a block event after validation, but it's not on by default because the optimistic flag at that point is basically meaningless...
One option may be
I think the vagueness of the block event is why we currently have it 'out of order'. To figure out the optimistic flag, it's emitted after import completes, which for us means fork choice has been run (and potentially a head event emitted). If there was an explicit

It concerns me that these events are being measured, because there's nothing requiring that head events must happen after block events, and, as you say, block events are not defined super well. Parts of the block event itself (the optimistic flag) do kind of imply an implementation. I guess arguably measuring validation and head events, if we went this way, would give a fairly accurate picture, regardless of client.
I would vote to define the event strictly to be emitted after full validation (gossip, consensus, execution, and DA), plus append a precise timestamp of the received time before full validation. It would be good to collect from clients what's the earliest timestamp they can capture (before gossip validation or after).
Wouldn't it be simpler to use the delay since start of slot instead of a timestamp?
Not sure if it's simpler; it sounds mostly equivalent. For the purpose of measuring latency it is more straightforward, but it sounds less generic. Thoughts? @rolfyone @parithosh
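For context on the equivalence being discussed: given the chain's genesis time, an absolute timestamp and a delay-since-slot-start are interconvertible. A minimal sketch (SECONDS_PER_SLOT and the genesis constant match mainnet; the function names are made up for illustration):

```python
SECONDS_PER_SLOT = 12  # mainnet slot duration

def slot_start(genesis_time: int, slot: int) -> int:
    """Unix time (seconds) at which the given slot begins."""
    return genesis_time + slot * SECONDS_PER_SLOT

def delay_since_slot_start_ms(genesis_time: int, slot: int, received_ms: int) -> int:
    """Delay of an event (unix millis) relative to the start of its slot."""
    return received_ms - slot_start(genesis_time, slot) * 1000

GENESIS = 1606824023  # mainnet genesis_time

# A block for slot 100 received 1.35s into the slot:
received = (GENESIS + 100 * SECONDS_PER_SLOT) * 1000 + 1350
print(delay_since_slot_start_ms(GENESIS, 100, received))  # 1350
```

Either representation carries the same information as long as the consumer knows the genesis time; the delay form just bakes the subtraction into the producer.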
Is the key point block propagation, or is the key point processing time from client to client? What are we trying to measure? E.g. if we get
that tells us when a block got imported by the client, so we'd know when things 'have' the block in their structures etc., but it doesn't tell us that the block arrived at 0ms and took until 1100ms to get processed... If we get
then we see that the block arrived fine, took ages to validate, and then 600ms until it was processed... Or do we want something like
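The trade-off between a single import-time event and staged timestamps can be sketched with hypothetical payloads (none of these field names are spec'd anywhere; the numbers follow the 0ms/1100ms example above):

```python
# A single import-time event hides where the time went:
imported_only = {"slot": 100, "block": "0xabc", "imported_ms": 1100}

# Separate timestamps expose arrival vs validation vs import:
staged = {
    "slot": 100,
    "block": "0xabc",
    "arrived_ms": 0,      # received from gossip
    "validated_ms": 500,  # passed validation
    "imported_ms": 1100,  # imported into fork-choice
}

validation_time = staged["validated_ms"] - staged["arrived_ms"]  # 500
import_time = staged["imported_ms"] - staged["validated_ms"]     # 600
```

With only the first shape, a slow-validating client and a slow-propagating network are indistinguishable to the consumer.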
All of these events have overheads, so if we don't need them I suggest we don't add them; but if there's a good reason then we can add support. I just think we want a good idea of the problem we're solving before we look for a solution. The more we add, obviously the more it takes to implement, but if it's actually needed then it might be worth it. My caution would be that if there are a lot of listeners on these events it'll potentially have negative impacts on clients, especially if we add a ton of events that some clients need to emit synchronously or something...
I mentioned this before, but I really don't think that the events output is the right place to contain this information. It would require some sort of continuous instrumentation from the P2P layer throughout the entire stack as the block was validated and imported, and the event could not be emitted until the processing was complete. Not only would this be a potentially large burden on client implementations, but it would somewhat subvert the point of the events stream being notifiers for external processes and turn it into a pseudo-tracing system.

Capturing the timestamp at which an item is pushed to the events system is a single metric that shouldn't be hard to implement (I'm making some assumptions here, but I think they're reasonable). I'd rather that we nailed down the trigger for the block event here, and worry about telemetry elsewhere.

(I'm still voting for 2 as per the list of suggestions by @dapplion; it makes most sense to me to be the point at which we are interested in knowing that a block has been seen by the node.)
Using @dapplion's categorisation, and focusing on block diffusion analysis, having a

Other events and timestamps might be useful for other uses.
The most immediate use case is researchers measuring block propagation through networks to inform EIP-4844 params. That's the best tool they've got short of developing full libp2p clients or man-in-the-middle attacks. They perform analysis on a per-implementation basis, so each client being slightly divergent is not a problem, but each client should attempt to provide the best data possible.
Actually it would be good to just know what events different clients expose currently. |
This isn't the main purpose of the events stream, though. If we're trying to track this information then I think that this is better off coming from telemetry rather than events.
I agree completely, this isn't really a great way to measure propagation, especially if you want timely data.
Tagging ethpandaops since they are the main users of events for timing data: @samcm @parithosh |
Yeah we're probably the ones that would most benefit from timing data being added to the event stream. I agree that it's not the right place for us to derive this info - telemetry/tracing makes way more sense. But it's all that we have at the moment, and is the only option that works with all client implementations. Even though we can't compare data between clients, comparing between the same client is still useful. If we want to discourage this use-case from the events stream that's fine, but I would suggest that we try and find another unified and standardized way for clients to produce telemetry. We've seen a growing demand for this data over the last 12 months, and I expect it will continue to grow as we work on protocol upgrades. If we leave this up to clients to implement independently it'll be a nightmare for consumers 🙏
If that's what we want, maybe a block_valid event with a timestamp that validation completed would suffice. We would need to define the 'valid' term; I'd probably suggest when it has passed gossip validation. Then when gossip validation finishes you'd get an event that just has basically the block root and a timestamp...
Going through the discussions above, it sounds like we are interested in tracking:
It would be useful to understand the consumers of these events. I'd assume app layers / L2s would be interested in only seeing valid blocks that are imported (2), and (1) is useful mainly for measuring propagation, i.e. potentially for devs, ops, research, etc. If we only do (2) above, is it less useful for tracking block/blob arrivals? Are potentially invalid blocks useful? Or are we just interested in the block arrival time in general?
This was the original intention of the block event.
This is close to the original intention of the head event (with the caveat that it was a) after state had been updated and b) only if the block resulted in a new chain head). Depends a little on what you mean by "fully validated", because it appears that for some beacon nodes the validation is wrapped up in the state update function. Depending on the beacon node, "fully validated" and "state updated" may or may not be different times. If generally not, then there may be little point having 2) as a separate event.
The problem, I guess, with the block event being 'validated' is the optimistic flag, which we know for sure once it's imported; at validation time we can only report whether the head block is optimistic.
Late recap from ACDC 124 ethereum/pm#922
Does not seem like we have a well-defined versioning strategy for SSE events. The

Another point brought up in the call was adding a

Adding a timestamp property seems to invite the addition of more timestamps at different stages of the block's lifetime. One can make an argument that the post-gossip-validation one is the most valuable, but still something to consider. Similarly, if we add the

Next steps

Looking at the pros / cons I would:
The SSE block event is currently used to measure block propagation times. This data is critical to inform decisions like blob count and overall network health. In consensus clients, blocks go through a few phases:

1. Block is received from gossip or the API
2. Block passes the gossip validation conditions of the beacon_block topic
3. Block is fully validated (consensus, execution, and DA) and imported into fork-choice
4. Block becomes head
I believe both 2. and 3. are equally important events that convey different information. The API spec is vague, such that you could interpret it as having to emit the event on either 2. or 3.
beacon-APIs/apis/eventstream/index.yaml, lines 50 to 51 at 8b52d94
After Deneb, the gap between 2. and 3. will widen, since clients must wait for the arrival of all blobs. This will prevent researchers from timing block arrivals correctly.
I propose to have two separate events properly defined:
`block`: The node has imported a valid block into its fork-choice, received from P2P or API.

`block_gossip`: The node received a block from gossip that passes the gossip validity conditions of the `beacon_block` topic.

CC: @parithosh
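As a sketch of how a consumer might pair the two proposed topics to measure the gossip-to-import gap per block root. The SSE text and the `received_ms` / `imported_ms` payload fields are hypothetical, not part of this proposal; only the topic names and the `block` root field are taken from it:

```python
import json

# Hand-written sample of what such a stream might look like:
sample_stream = """\
event: block_gossip
data: {"slot": "100", "block": "0xaa", "received_ms": 300}

event: block
data: {"slot": "100", "block": "0xaa", "imported_ms": 900}

"""

def parse_sse(text):
    """Yield (event_name, payload) pairs from raw SSE text."""
    name, data = None, None
    for line in text.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
        elif line == "" and name and data is not None:
            yield name, data
            name, data = None, None

# Pair block_gossip and block events by block root:
seen_gossip = {}
for name, payload in parse_sse(sample_stream):
    root = payload["block"]
    if name == "block_gossip":
        seen_gossip[root] = payload["received_ms"]
    elif name == "block" and root in seen_gossip:
        print(root, payload["imported_ms"] - seen_gossip[root])  # 0xaa 600
```

With two well-defined events, the consumer can attribute the gap to validation plus blob arrival rather than guessing from a single, ambiguously-timed block event.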