Handoff between real-time and historical processing #216

Closed
1 task done
Tracked by #418
Ishatt opened this issue Sep 26, 2023 · 2 comments

Ishatt commented Sep 26, 2023

Description

Problem:

Currently we do not coordinate between real-time and historical processing; indexers can process blocks out of order.

See the Confluence page Real-time to Historical communication for architectural concerns.

Proposed Solutions:

Append real-time blocks to be processed to the processing queue of the historical thread.

After the move to Redis, we could store a `historical_backfill_in_progress` flag in Redis for each function. The flag is set at the start of a historical backfill, and real-time processing skips blocks until both the flag is cleared and the historical queue is empty. Once the historical queue has been fully populated, the flag can be cleared.
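The flag-and-queue check described above could be sketched as follows. This is a minimal in-memory stand-in for the proposed Redis state, assuming one flag and one queue per function; the type and method names are illustrative, and a real implementation would use Redis commands (e.g. `SET`/`DEL` for the flag, a length check on the stream) instead of local maps.

```rust
use std::collections::{HashMap, VecDeque};

/// In-memory stand-in for the proposed per-function Redis state.
#[derive(Default)]
struct CoordinatorState {
    /// `historical_backfill_in_progress` flag per function.
    backfill_in_progress: HashMap<String, bool>,
    /// Queued historical block heights per function.
    historical_queue: HashMap<String, VecDeque<u64>>,
}

impl CoordinatorState {
    /// Set the flag at the start of a historical backfill.
    fn start_backfill(&mut self, function: &str) {
        self.backfill_in_progress.insert(function.to_string(), true);
    }

    fn enqueue_historical_block(&mut self, function: &str, height: u64) {
        self.historical_queue
            .entry(function.to_string())
            .or_default()
            .push_back(height);
    }

    /// Clear the flag once every historical block has been enqueued.
    fn finish_enqueueing(&mut self, function: &str) {
        self.backfill_in_progress.insert(function.to_string(), false);
    }

    /// Real-time processing proceeds only when the flag is cleared AND the
    /// historical queue has been drained.
    fn real_time_may_process(&self, function: &str) -> bool {
        let flag = self.backfill_in_progress.get(function).copied().unwrap_or(false);
        let queued = self.historical_queue.get(function).map_or(0, |q| q.len());
        !flag && queued == 0
    }
}
```

The key property is that clearing the flag alone is not enough: real-time stays paused until the already-enqueued historical blocks have also been processed.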

Linked Issues

- Epic (9 of 9 tasks): morgsmccauley

Relates to

No tasks being tracked yet.

roshaans commented Oct 9, 2023

@ailisp would like to create a SQL join query in QueryAPI, but is unable to do so without Hasura foreign-key relationships. Those cannot be created because QueryAPI currently processes the historical backfill in parallel with real-time blocks, and creating foreign-key relationships requires imposing an order on insertions:

If object X references object Y, X can be inserted before Y, resulting in a foreign-key constraint violation.
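As an illustrative sketch only (this is not QueryAPI's actual code), one way to avoid the violation without fully serialising the two pipelines would be to buffer a "child" row until its referenced "parent" row has been inserted. The `Row`/`OrderedInserter` names are hypothetical:

```rust
use std::collections::HashSet;

/// Hypothetical row with an optional foreign-key reference to another row.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    id: u64,
    parent_ref: Option<u64>,
}

/// Defers rows whose referenced parent has not been inserted yet, instead of
/// letting the database reject them with a constraint violation.
struct OrderedInserter {
    inserted: HashSet<u64>, // ids already written to the database
    pending: Vec<Row>,      // rows waiting on their parent
}

impl OrderedInserter {
    fn new() -> Self {
        Self { inserted: HashSet::new(), pending: Vec::new() }
    }

    /// Insert a row if its parent (if any) exists; otherwise buffer it.
    /// Returns the ids actually flushed by this call, in insertion order.
    fn insert(&mut self, row: Row) -> Vec<u64> {
        let mut flushed = Vec::new();
        match row.parent_ref {
            Some(p) if !self.inserted.contains(&p) => self.pending.push(row),
            _ => {
                self.inserted.insert(row.id);
                flushed.push(row.id);
                // A newly inserted row may unblock buffered children.
                loop {
                    let ready: Vec<usize> = self.pending.iter().enumerate()
                        .filter(|(_, r)| {
                            r.parent_ref.map_or(true, |p| self.inserted.contains(&p))
                        })
                        .map(|(i, _)| i)
                        .collect();
                    if ready.is_empty() {
                        break;
                    }
                    for i in ready.into_iter().rev() {
                        let r = self.pending.remove(i);
                        self.inserted.insert(r.id);
                        flushed.push(r.id);
                    }
                }
            }
        }
        flushed
    }
}
```

Buffering trades memory for ordering, so the per-indexer handover discussed below is likely the cleaner long-term fix.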


morgsmccauley commented Oct 24, 2023

One way to solve this is for each indexer to have its own 'real-time' block process, rather than the shared one we have now. This would make the handover much simpler, as the process could simply start at the correct block height after the historical backfill has completed.
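The per-indexer handover could be sketched as a simple two-phase state machine: the process drains the historical range first, then switches to real-time at the next height. The `Phase`/`advance` names are illustrative, not the Coordinator's API:

```rust
/// Illustrative per-indexer block process: historical backfill first,
/// then real-time starting immediately after the backfilled range.
enum Phase {
    Historical { next: u64, end: u64 }, // backfill heights [next, end]
    RealTime { next: u64 },
}

/// Return the height to process now, plus the phase for the next step.
fn advance(phase: Phase) -> (u64, Phase) {
    match phase {
        Phase::Historical { next, end } if next < end => {
            (next, Phase::Historical { next: next + 1, end })
        }
        // Last historical block: hand over to real-time at end + 1.
        Phase::Historical { next, end: _ } => (next, Phase::RealTime { next: next + 1 }),
        Phase::RealTime { next } => (next, Phase::RealTime { next: next + 1 }),
    }
}
```

Because one process owns both phases, no block can be processed out of order, which also resolves the foreign-key ordering problem above.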

morgsmccauley added a commit that referenced this issue Nov 7, 2023
This PR updates Coordinator such that **only one** Historical Backfill
process can exist for a given Indexer Function. Currently, these
processes do not have awareness of each other, meaning a developer can
trigger multiple simultaneous backfills, with them all writing to the
same Redis Stream. The major changes are detailed below.

## Streamer/Historical backfill tasks
In this PR, I've introduced the `historical_block_processing::Streamer`
struct. This is used to encapsulate the async task which is spawned for
handling the Historical Backfill. I went with "Streamer", as I envision
this becoming the single process which handles both historical/real-time
messages for a given Indexer, as described in
#216 (comment).

`Streamer` is responsible for starting the `async` task, and provides a
basic API for interacting with it. Right now, that only includes
cancelling it. Cancelling a `Streamer` will drop the existing future, preventing any new messages from being added, and then delete the Redis Stream, removing all existing messages.
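The `Streamer` API shape described above could be sketched roughly as follows. Note the assumptions: the actual implementation spawns a tokio task and cancels it by dropping the future, then deletes the Redis Stream; this stdlib-only sketch uses a thread plus a shared flag for cancellation, and `delete_stream` is a stub.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Thread-based sketch of the `Streamer` API shape (not the real tokio code).
struct Streamer {
    cancelled: Arc<AtomicBool>,
    handle: Option<thread::JoinHandle<()>>,
}

impl Streamer {
    /// Spawn the backfill task.
    fn start() -> Self {
        let cancelled = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&cancelled);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Relaxed) {
                // ... fetch the next block and publish it to the stream ...
                thread::sleep(Duration::from_millis(1));
            }
        });
        Streamer { cancelled, handle: Some(handle) }
    }

    /// Stop producing messages, then clear any that were already queued.
    fn cancel(mut self) {
        self.cancelled.store(true, Ordering::Relaxed);
        if let Some(h) = self.handle.take() {
            h.join().expect("streamer thread panicked");
        }
        Self::delete_stream();
    }

    fn delete_stream() {
        // Stand-in for deleting the Redis Stream (e.g. a DEL on the key).
    }
}
```

Stopping the producer *before* deleting the stream matters: otherwise the task could re-add messages after the delete.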

## Indexer Registry Mutex/Async
`indexer_registry` is the in-memory representation of the given Indexer
Registry. Currently, we pass around the `MutexGuard` to mutate this
data, resulting in the lock being held for much longer than is needed,
blocking all other consumers from taking the lock. There isn't much
contention for this lock, so it isn't a problem right now, but could end
up being one in future.

Adding in the `Streamer` `Mutex` made mitigating this problem trivial,
so I went ahead and made some changes. I've updated the code so that we
pass around the `Mutex` itself, rather than the guard, allowing us to hold the lock for the least amount of time possible. This required
updating most methods within `indexer_registry` to `async`, so that we
could take the `async` `lock`.

With the `Mutex` being used rather than the `MutexGuard`, it seemed
sensible to add `indexer_registry` to `QueryAPIContext`, rather than
passing it round individually.
Labels: none yet
Projects: none yet
Development: no branches or pull requests
5 participants