Handoff between real-time and historical processing #216
@ailisp would like to create a SQL join query in QueryAPI, but cannot do so without Hasura foreign-key relationships. Those relationships cannot currently be created because QueryAPI processes the historical backfill in parallel with real-time blocks: foreign-key relationships impose an order of insertion, but if object X references object Y, X can be inserted before Y, resulting in a foreign-key constraint violation.
One way to solve this is for each indexer to have its own real-time block process, rather than the shared one we have now. This would make the handover much simpler, as the process could simply start at the correct block height after the historical backfill has completed.
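A minimal sketch of that per-indexer handover, using a plain thread and an atomic cancel flag in place of Coordinator's actual async tasks (all names here are hypothetical, and block processing is stubbed out):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical per-indexer streamer: one task owns both phases, so the
/// handover is simply "real-time continues at the next block height".
struct IndexerStreamer {
    cancelled: Arc<AtomicBool>,
    handle: JoinHandle<Vec<u64>>,
}

impl IndexerStreamer {
    fn start(start_block: u64, handover_block: u64) -> Self {
        let cancelled = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&cancelled);
        let handle = thread::spawn(move || {
            let mut processed = Vec::new();
            // Historical backfill: process blocks in order up to the handover.
            for height in start_block..handover_block {
                if flag.load(Ordering::Relaxed) {
                    return processed; // cancelled mid-backfill
                }
                processed.push(height); // stand-in for real block processing
            }
            // The real-time phase would pick up here, at `handover_block`,
            // leaving no gap and no out-of-order blocks.
            processed
        });
        IndexerStreamer { cancelled, handle }
    }

    /// Stop the task and return whatever was processed so far.
    fn cancel(self) -> Vec<u64> {
        self.cancelled.store(true, Ordering::Relaxed);
        self.handle.join().expect("streamer thread panicked")
    }

    /// Wait for the backfill to finish normally.
    fn wait(self) -> Vec<u64> {
        self.handle.join().expect("streamer thread panicked")
    }
}
```

Because a single task owns the whole block range, there is no window where a shared real-time process can run ahead of an in-flight backfill.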
This PR updates Coordinator such that **only one** Historical Backfill process can exist for a given Indexer Function. Currently, these processes have no awareness of each other, meaning a developer can trigger multiple simultaneous backfills that all write to the same Redis Stream. The major changes are detailed below.

## Streamer/Historical backfill tasks

This PR introduces the `historical_block_processing::Streamer` struct, which encapsulates the async task spawned to handle the Historical Backfill. I went with "Streamer" because I envision this becoming the single process that handles both historical and real-time messages for a given Indexer, as described in #216 (comment). `Streamer` is responsible for starting the `async` task, and provides a basic API for interacting with it; right now, that only includes cancelling. Cancelling a `Streamer` drops the existing future, preventing any new messages from being added, and then deletes the Redis Stream, removing all existing messages.

## Indexer Registry Mutex/Async

`indexer_registry` is the in-memory representation of the given Indexer Registry. Currently, we pass around the `MutexGuard` to mutate this data, so the lock is held for much longer than needed, blocking all other consumers from taking it. There isn't much contention for this lock, so it isn't a problem right now, but it could become one in future. Adding the `Streamer` `Mutex` made mitigating this problem trivial, so I went ahead and made some changes. I've updated the code to pass around the `Mutex` itself, rather than the guard, allowing us to hold the lock for the least amount of time possible. This required updating most methods within `indexer_registry` to `async`, so that we could take the `async` `lock`.

With the `Mutex` being used rather than the `MutexGuard`, it seemed sensible to add `indexer_registry` to `QueryAPIContext`, rather than passing it around individually.
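The lock-scoping change can be illustrated with a small sketch. This uses `std::sync::Mutex` as a stand-in for the async `tokio::sync::Mutex`, and the registry type and function names are assumptions, not the actual QueryAPI code:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

// Illustrative stand-in for the in-memory registry: function name -> code.
type IndexerRegistry = HashMap<String, String>;

// Before: callers pass the `MutexGuard` around, so the lock stays held
// across the whole call chain, blocking every other consumer.
fn register_with_guard(
    registry: &mut MutexGuard<'_, IndexerRegistry>,
    name: &str,
    code: &str,
) {
    registry.insert(name.to_string(), code.to_string());
}

// After: callers pass the `Mutex` itself, and each function takes the lock
// only for its own minimal critical section.
fn register_with_mutex(registry: &Arc<Mutex<IndexerRegistry>>, name: &str, code: &str) {
    let mut guard = registry.lock().expect("registry lock poisoned");
    guard.insert(name.to_string(), code.to_string());
    // `guard` is dropped here, releasing the lock immediately.
}
```

Sharing the `Arc<Mutex<...>>` through a context struct, rather than threading a guard through every call, is what makes the short critical sections possible.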
Description
Problem:
Currently we do not coordinate between real-time and historical processing, so indexers can run on blocks out of order.
See the Confluence page Real-time to Historical communication for architectural concerns.
Proposed Solutions:
Append real-time blocks to be processed to the processing queue of the historical thread.
After the move to Redis we could store a per-function Redis flag, `historical_backfill_in_progress`. The flag is set at the start of the historical backfill, and real-time processing skips blocks until both the flag is cleared and the historical queue is empty. Once the historical queue has been fully filled, the flag can be cleared.
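A sketch of the second proposal's gating logic, using an in-memory struct in place of Redis; the flag name mirrors the one described above, but everything else is an assumption:

```rust
use std::collections::VecDeque;

/// In-memory stand-in for the proposed per-function Redis state.
struct FunctionState {
    /// Set when a historical backfill starts, cleared once the historical
    /// queue has been fully filled.
    historical_backfill_in_progress: bool,
    /// Blocks still awaiting historical processing.
    historical_queue: VecDeque<u64>,
}

/// Real-time processing proceeds only once the flag is cleared AND the
/// historical queue has drained, preventing out-of-order block execution.
fn real_time_may_process(state: &FunctionState) -> bool {
    !state.historical_backfill_in_progress && state.historical_queue.is_empty()
}
```

Checking both conditions matters: clearing the flag alone is not enough, since queued historical blocks could still be processed after a real-time block for a greater height.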
Linked Issues
Relates to