Refactoring event indexing and simplifying the ETH events RPC flow (and fixing a bunch of known issues with it) #12116
Comments
Some initial thoughts:
Oh, and further to that last point, we really do need GC on this thing if we're going to make it a necessary add-on. Maybe then it becomes much less of an issue for people to turn on. If it GCs in time with splitstore and you only ever have less than a week's worth of events, then there's less to be concerned about. I have a 33 GB events.db that's been collecting since nv22. Those who have been running it since FEVM must have much larger databases.
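A minimal sketch of epoch-based GC for the events DB, assuming events are keyed by epoch and the retention window is expressed in epochs (30-second epochs give 2,880 per day, so a week is 20,160). The in-memory `eventStore` type and its `gcBefore` method are hypothetical stand-ins; a real implementation would delete rows from the SQLite tables in step with splitstore compaction.

```go
package main

import "sort"

// epochsPerDay assumes Filecoin's 30-second epochs (86400 / 30 = 2880).
const epochsPerDay = 2880

// eventStore is a hypothetical in-memory stand-in for the events DB:
// it maps epoch -> number of events indexed at that epoch.
type eventStore map[int64]int

// gcBefore deletes every epoch older than head-retention and returns the
// pruned epochs in ascending order (a real store would run a DELETE instead).
func (s eventStore) gcBefore(head, retention int64) []int64 {
	cutoff := head - retention
	var pruned []int64
	for epoch := range s {
		if epoch < cutoff {
			pruned = append(pruned, epoch)
		}
	}
	for _, epoch := range pruned {
		delete(s, epoch)
	}
	sort.Slice(pruned, func(i, j int) bool { return pruned[i] < pruned[j] })
	return pruned
}
```

For example, with a one-week window (`7*epochsPerDay`) and the head at epoch 40,000, everything below epoch 19,840 is pruned.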
@rvagg Any thoughts on how to implement those periodic consistency checks? In my mind, it can be as simple as: "My head is now epoch E -> so E - 900 is now final -> fetch the messages for it from the state store -> match them with what we have in the event Index -> raise an alarm if there's a mismatch". This can be a goroutine in the indexer itself.
"I want this to work with the native APIs with minimal hackiness, so I'm not a fan of "interposing" the events sub-system between subscription to, e.g., chain notify events and the client" What do you mean here? Could you elaborate a bit? Which hackiness are you referring to? I am saying that the native ETH RPC Event APIs should subscribe to the Index DB stream to listen for updates and forward them to the client (querying the DB if needed). IMO, the events subsystem still needs some way to "block" on a GetLogs call. With this design, this is just a matter of seeing an update event from the Index DB whose height is greater than the
Checking at head-900 is what I was thinking; make a big noise in the logs if there's a consistency problem. It's the kind of thing we could even remove in the future once it has helped us build confidence.
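A rough sketch of the head-900 consistency check as a loop that can run in its own goroutine inside the indexer. It assumes two hypothetical lookups, one deriving the events for an epoch from chain state and one reading them from the index, and compares only event counts for brevity; a real check would compare the events themselves. Epochs whose state is already gone are skipped rather than reported.

```go
package main

import "fmt"

// finalityEpochs mirrors the "E - 900 is now final" rule from the thread.
const finalityEpochs = 900

// eventSource is a hypothetical lookup returning how many events exist at an
// epoch; ok is false when the data is unavailable (e.g. state already pruned).
type eventSource func(epoch int64) (count int, ok bool)

// checkFinalizedEpoch compares the index with chain state at head-900.
func checkFinalizedEpoch(head int64, fromState, fromIndex eventSource) error {
	epoch := head - finalityEpochs
	if epoch < 0 {
		return nil // chain is younger than finality; nothing is final yet
	}
	want, ok := fromState(epoch)
	if !ok {
		return nil // no state to compare against, so skip rather than alarm
	}
	got, indexed := fromIndex(epoch)
	if !indexed {
		return fmt.Errorf("epoch %d: missing from event index", epoch)
	}
	if got != want {
		return fmt.Errorf("epoch %d: index has %d events, state has %d", epoch, got, want)
	}
	return nil
}

// runConsistencyChecks consumes new head epochs (e.g. fed from ChainNotify)
// and reports every mismatch it finds.
func runConsistencyChecks(heads <-chan int64, fromState, fromIndex eventSource, report func(error)) {
	for head := range heads {
		if err := checkFinalizedEpoch(head, fromState, fromIndex); err != nil {
			report(err)
		}
	}
}
```

In the indexer, `report` would be the place to "make a big noise in the logs".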
We would use both options in production if they were available. We often use two completely different code bases to track the head of a chain vs. return historical results, then use reverse proxies and some chain-aware logic to route the requests. Similarly, supporting offline historical block processing is super useful.
Dropping in some links to relevant docs so I have somewhere to record this:
Mostly interesting for the references to backfilling, because I think the way we currently do it is a bit broken. lotus-shed backfills directly into the SQLite db file that the node is using read/write, while relying on that same node for the message data it needs to do the backfills. We need some way of ensuring a single writer to the db: either by building backfilling into lotus itself, having some switch you can flick to make this work, or documenting a process (turn indexing off while you do it, turn it on when you're done [which isn't even a great answer if you think through the racy nature of that]).
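One way to guarantee a single writer, assuming backfilling is built into lotus itself, is to route both live ChainNotify updates and backfill requests through one goroutine that owns the database. The types below are hypothetical, the map stands in for the SQLite file, and the rule that a backfill never overwrites an epoch already written by the live path is an illustrative policy choice rather than existing behaviour.

```go
package main

// writeReq is a hypothetical unit of work for the single index writer: either a
// live tipset update from ChainNotify or a backfill request for an older epoch.
type writeReq struct {
	Epoch    int64
	Events   int  // number of events to record for the epoch
	Backfill bool // true if the request came from backfilling rather than ChainNotify
}

// indexWriter owns the stand-in database; only its goroutine ever writes to db.
type indexWriter struct {
	reqs chan writeReq
	done chan struct{}
	db   map[int64]int
}

func newIndexWriter() *indexWriter {
	w := &indexWriter{reqs: make(chan writeReq), done: make(chan struct{}), db: map[int64]int{}}
	go w.loop()
	return w
}

// loop applies requests strictly one at a time, so live updates and backfills never race.
func (w *indexWriter) loop() {
	defer close(w.done)
	for r := range w.reqs {
		if _, exists := w.db[r.Epoch]; exists && r.Backfill {
			continue // illustrative policy: a backfill never clobbers live data
		}
		w.db[r.Epoch] = r.Events
	}
}

// Close stops accepting requests, waits for the writer to drain and returns the db.
func (w *indexWriter) Close() map[int64]int {
	close(w.reqs)
	<-w.done
	return w.db
}
```

An external tool like lotus-shed could then trigger a backfill over RPC instead of opening the db file itself.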
Thoughts on what to do on startup to auto-backfill missing events:
Some judicious logging would be good for this too, e.g. "took 15 minutes to check 100,000 (MaxBackfillEpochsCheck) epochs for events auto-backfill, backfilled 0 of them; consider adjusting MaxBackfillEpochsCheck to a smaller number" (if the user set it large and we are doing useless checking on each startup because they got it done the first time). An alternative approach to achieve "when you first do this, go back a loooong way, but on subsequent startups only go back to finality" might be to have both a
Also to watch out for: when using splitstore you're going to run into the problem of not having state at some epoch, so we have to handle the case where the
Mostly covered by ChainIndexer; not entirely, but enough to close this out. We can open more targeted issues for items that come up as a concern.
What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.
The current Chain Notify <> Event Index <> Event Filter Management <> ETH RPC Events API flow is racy, causes missed events, is hard to reason about and has known problems such as lack of automated backfilling, returning empty events for tipsets on the canonical chain even though the tipset has events, not using the event Index as the source of truth, etc. This issue proposes a new architecture/flow for event indexing and filtering to fix all of the above.
Describe the solution you'd like
EthGetLogs, EthGetFilterChanges and EthGetFilterLogs anyways, I'd posit that we can really simplify and streamline things if these, along with the EthSubscribe API, use the event Index as the source of truth. See:
lotus/chain/events/filter/event.go, line 333 (at 6f821c3)
lotus/chain/events/filter/event.go, line 340 (at 6f821c3)
EthGetFilterLogs and EthGetFilterChanges APIs that are supposed to return the events for a given filter since the last time it was polled. The EthSubscribe and EthGetLogs APIs don't even need these buffers. For the EthGetFilterLogs and EthGetFilterChanges APIs, the way it works is that when the filter is first created, we "prefill" it with matching events from the Index DB and then rely on the buffer getting updated with events from tipset updates sent by ChainNotify. Every time the client polls these APIs, we return what we have in the buffer -> empty out the buffer -> start again (again relying solely on tipset updates; the event Index is no longer in the picture).
I'd suggest the following refactor to fix all of the above and make this code easy to reason about:
The Event Index DB becomes the source of truth (effectively addressing "Events source of truth: db or receipts" #11830)
Tipset updates (applies and reverts) coming from ChainNotify get written to a channel that the Event Index consumes from, applying event updates to the Index DB linearly, one at a time (so applies and reverts don't race) -> there is no dual write to the filter buffers
Every update applied to the Event Index DB has a monotonically increasing ID
For a tipset with no events, the Event Index is updated with an empty event entry field ("I've seen this tipset but it has no events")
The Event Index allows subscription to an event stream containing the updates it makes to the DB
On subscribing to this stream ("index stream"), a client immediately gets the latest update made to the Index DB, and from there on every subsequent Index update is published to the subscriber
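A minimal in-memory sketch of the proposed index semantics: every apply or revert gets the next monotonically increasing ID, a tipset with no events still produces an entry, and a new subscriber immediately receives the latest update followed by every later one. `eventIndex`, `update` and their methods are hypothetical; a mutex stands in for the single channel consumer described above, and the blocking send to a full subscriber buffer would need a real policy (drop, disconnect or backpressure).

```go
package main

import "sync"

// update is a hypothetical record of one change applied to the event index.
type update struct {
	ID       uint64 // monotonically increasing, assigned by the index
	Height   int64
	Reverted bool // true for a revert, false for an apply
	Events   int  // 0 still produces an entry: "seen this tipset, it has no events"
}

// eventIndex serialises applies/reverts and fans updates out to subscribers.
type eventIndex struct {
	mu     sync.Mutex
	nextID uint64
	latest *update
	subs   []chan update
}

// apply records an apply or revert and publishes it to every subscriber.
func (ix *eventIndex) apply(height int64, events int, reverted bool) update {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.nextID++
	u := update{ID: ix.nextID, Height: height, Reverted: reverted, Events: events}
	ix.latest = &u
	for _, ch := range ix.subs {
		ch <- u // blocks if a subscriber's buffer is full; see the note above
	}
	return u
}

// subscribe returns a channel that first carries the latest update (if any)
// and then every subsequent one.
func (ix *eventIndex) subscribe(buf int) <-chan update {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ch := make(chan update, buf+1) // +1 leaves room for the initial latest update
	if ix.latest != nil {
		ch <- *ix.latest
	}
	ix.subs = append(ix.subs, ch)
	return ch
}
```

Because IDs are assigned under the same lock that publishes them, every subscriber observes updates in ID order.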
Now here's how all the ETH RPC APIs work:
EthSubscribe -> subscribes to the "index stream" -> forwards these updates to the RPC client's channel
EthGetLogs -> subscribes to the "index stream" -> keeps consuming until it sees an update containing the max height requested by the user -> queries the DB for events matching the client's filter -> sends out the results
EthGetFilterLogs and EthGetFilterChanges -> filter subscribes to the "index stream" -> keeps updating its "last seen Index update ID" state
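A sketch of how EthGetFilterChanges could work from the index instead of a drained buffer, assuming each indexed event row records the ID of the update that wrote it. The filter then only tracks two cursors: the highest update ID seen on the index stream and the highest one already returned to the client. `indexedEvent`, `pollFilter` and the single emitter criterion are hypothetical, and reverted events are not handled here.

```go
package main

// indexedEvent is a hypothetical row in the event index, tagged with the ID
// of the index update that wrote it.
type indexedEvent struct {
	UpdateID uint64
	Height   int64
	Emitter  string
}

// pollFilter tracks how far a polling client has read.
type pollFilter struct {
	Emitter      string // the only filter criterion in this sketch
	lastPolledID uint64 // highest update ID already returned to the client
	lastSeenID   uint64 // highest update ID observed on the index stream
}

// observe is called for every update the filter sees on the index stream.
func (f *pollFilter) observe(updateID uint64) {
	if updateID > f.lastSeenID {
		f.lastSeenID = updateID
	}
}

// changes returns matching events written since the previous poll, reading
// them from the index rather than an in-memory buffer, then advances the cursor.
func (f *pollFilter) changes(db []indexedEvent) []indexedEvent {
	var out []indexedEvent
	for _, ev := range db {
		if ev.UpdateID > f.lastPolledID && ev.UpdateID <= f.lastSeenID && ev.Emitter == f.Emitter {
			out = append(out, ev)
		}
	}
	f.lastPolledID = f.lastSeenID
	return out
}
```

Since nothing is buffered per filter, an event can only be "missed" if it never reached the index, which the consistency check is meant to catch.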
When the Index boots up, it looks at its own "highest non-reverted tipset" -> looks at the current head of the chain as per ChainNotify -> fills in all the missing updates in between ("automated backfilling") and only then processes tipset updates and "index stream" subscriptions.
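A minimal sketch of that boot sequence, assuming the index can report the epoch of its highest non-reverted tipset: fill every epoch up to the current head first, and only then signal that live updates and subscriptions may be served. `bootIndex` and its parameters are hypothetical; a real version would also detect tipsets reverted while the node was offline and skip null rounds.

```go
package main

// bootIndex fills every epoch in (lastIndexed, head] via apply, in ascending
// order, then closes ready so subscribers and the ChainNotify consumer can start.
func bootIndex(lastIndexed, head int64, apply func(epoch int64), ready chan<- struct{}) int64 {
	filled := int64(0)
	for epoch := lastIndexed + 1; epoch <= head; epoch++ {
		apply(epoch)
		filled++
	}
	close(ready) // only now start processing tipset updates and "index stream" subscriptions
	return filled
}
```

Anything that must not observe a half-backfilled index can simply wait on `<-ready` before subscribing.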