For historical processing, coordinator does not fetch the streamer message before the IndexerRunner call. As a result, historical blocks cannot benefit from the latency savings of having coordinator cache the streamer message for runner. Instead, we want runner to prefetch streamer messages from S3 so that invocations don't have to wait for their message to be retrieved. The prefetched messages are held as promises in an array and awaited in order, so block heights are still processed in sequence.
Below is the current flow for historical processing:
1. Coordinator retrieves the timestamp of the block height that historical processing starts from, along with the current block height, and uses them to determine which days to look for index files.
2. Index files are files generated for each day an indexer is active. They contain information such as the block heights that the particular indexer function applies to. Coordinator fetches every available index file for the indexer, starting from the day that contains the starting block height.
3. Coordinator parses the block heights from each file and adds them to the historical Redis stream. This is where real-time and historical processing diverge: real-time has no index files, so it reads the streamer message from S3 to get data including the block height, and that block height is put into the real-time stream.
4. Runner reads a block height from the historical stream, pulls the streamer message from S3, parses it, and uses it for execution. As a result, each invocation takes at least 200ms. I've seen as high as 700ms in a sample of 20 invocations, so the 99th percentile may be much higher.
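The coordinator side of this flow (steps 1 to 3) can be sketched as below. This is a minimal illustration under stated assumptions, not the coordinator's actual code: `daysToScan` and `heightsToEnqueue` are hypothetical names, timestamps are assumed to be already converted to milliseconds, index files are assumed to be keyed by UTC date, and an in-memory `Map` of date to block heights stands in for the parsed index files. The returned array stands in for what would be appended to the historical Redis stream.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: lists every UTC day from the day containing startMs
// through the day containing endMs, formatted as "YYYY-MM-DD".
// Assumption: index files are keyed by UTC date.
function daysToScan(startMs: number, endMs: number): string[] {
  const days: string[] = [];
  // Truncate to midnight UTC so each step lands exactly on a day boundary.
  let cursor = Math.floor(startMs / MS_PER_DAY) * MS_PER_DAY;
  while (cursor <= endMs) {
    days.push(new Date(cursor).toISOString().slice(0, 10));
    cursor += MS_PER_DAY;
  }
  return days;
}

// Hypothetical helper: collects the block heights parsed from each day's index
// file (assumed shape: date -> heights), drops anything before startHeight, and
// returns them oldest first, ready to be appended to the historical stream.
function heightsToEnqueue(
  days: string[],
  indexFiles: Map<string, number[]>,
  startHeight: number,
): number[] {
  const heights: number[] = [];
  for (const day of days) {
    // No index file for a day means the indexer was not active that day.
    for (const height of indexFiles.get(day) ?? []) {
      if (height >= startHeight) {
        heights.push(height);
      }
    }
  }
  return heights.sort((a, b) => a - b);
}
```

For example, a start timestamp on 2024-01-30 and a current timestamp on 2024-02-02 yield four days to scan, and only heights at or above the starting height are enqueued.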
Below is the new workflow:
1. Coordinator functionality remains the same.
2. In runner, start fetching the streamer messages for the next X block heights from S3, keeping each fetch as a pending promise.
3. Load the promises into an array, which is used as a queue.
4. Delete each block height from the stream once its fetch has been successfully placed on the queue.
5. Await the first promise in the queue. When it resolves, trigger the function call and pass in the loaded streamer message.
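The runner loop described in the steps above can be sketched roughly as follows. This is a hedged illustration, not the actual runner implementation: `processStream` and `fetchStreamerMessage` are hypothetical names, a plain array of heights stands in for the Redis stream (so `shift()` plays the role of deleting the entry), and the S3 download is simulated with a timer whose latency varies by height.

```typescript
type StreamerMessage = { blockHeight: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for the S3 download; latency varies so later blocks
// can finish downloading before earlier ones.
async function fetchStreamerMessage(blockHeight: number): Promise<StreamerMessage> {
  await sleep(10 + (blockHeight % 3) * 10);
  return { blockHeight };
}

// Hypothetical runner loop. `stream` holds block heights oldest first and
// stands in for the Redis stream; `prefetchSize` is the "X" from the issue.
async function processStream(
  stream: number[],
  prefetchSize: number,
  runIndexer: (message: StreamerMessage) => Promise<void>,
): Promise<void> {
  if (prefetchSize < 1) {
    throw new Error("prefetchSize must be at least 1");
  }
  const queue: Array<Promise<StreamerMessage>> = [];
  while (stream.length > 0 || queue.length > 0) {
    // Start fetches until the queue holds prefetchSize pending promises.
    while (queue.length < prefetchSize && stream.length > 0) {
      const height = stream.shift()!; // removed from the stream once queued
      queue.push(fetchStreamerMessage(height));
    }
    // Await only the oldest fetch, so blocks run in order while the
    // remaining fetches keep downloading in the background.
    const message = await queue.shift()!;
    await runIndexer(message);
  }
}
```

Awaiting the head of the queue is what preserves block order: even if block 3 finishes downloading before block 2, block 2 is still handed to the indexer first. A real runner would also need to handle rejected fetches, since a prefetched promise that rejects before it is awaited can trigger Node's unhandled-rejection handling.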
I've also made real-time processing use the prefetch mechanism, on top of the existing caching.
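One way to layer prefetching on top of the existing cache is to make the function whose promise goes into the queue check the cache first and fall back to S3 on a miss. The sketch below assumes this shape; `getStreamerMessage`, `fetchFromS3`, and `streamerMessageCache` are hypothetical names, a `Map` stands in for the cache the coordinator populates, and the `source` field exists only to make the cache-versus-S3 path visible.

```typescript
type StreamerMessage = { blockHeight: number; source: "cache" | "s3" };

// Hypothetical stand-in for the cache coordinator fills for real-time blocks.
const streamerMessageCache = new Map<number, StreamerMessage>();

// Hypothetical stand-in for the S3 download.
async function fetchFromS3(blockHeight: number): Promise<StreamerMessage> {
  return { blockHeight, source: "s3" };
}

// Checks the cache first and falls back to S3 on a miss. The prefetch queue
// stores the returned promise, so cached and uncached blocks share one path.
async function getStreamerMessage(blockHeight: number): Promise<StreamerMessage> {
  const cached = streamerMessageCache.get(blockHeight);
  if (cached !== undefined) {
    return cached;
  }
  return fetchFromS3(blockHeight);
}
```

With this shape, a cache hit resolves almost immediately and a miss still overlaps with the indexer work already in progress.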
While an indexer function is running, the streamer messages for several later blocks are loading concurrently. On each loop iteration, the array is topped up to be as full as possible, so only a few invocations have to wait for their block instead of all of them.
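To see why keeping the array full helps, the timing can be worked out with a small deterministic model. This is an illustrative calculation under simplifying assumptions, not a measurement: `simulateWallTime` is a hypothetical helper, each block is assumed to have a fixed fetch latency and indexer run time, and Redis and parsing overhead are ignored. A window of 1 reproduces the current one-fetch-per-invocation behavior.

```typescript
// Hypothetical model: total wall time to process blocks in order when up to
// `window` fetches are in flight and the queue is topped up each iteration.
function simulateWallTime(fetchMs: number[], runMs: number, window: number): number {
  if (window < 1) {
    throw new Error("window must be at least 1");
  }
  const n = fetchMs.length;
  const iterStart: number[] = new Array(n + 1).fill(0); // start of loop iteration k
  const fetchDone: number[] = new Array(n).fill(0); // when block j's message is loaded
  let nextToFetch = 0;
  for (let k = 0; k < n; k++) {
    // Top up: start fetches so that blocks k .. k + window - 1 are in flight.
    while (nextToFetch < n && nextToFetch < k + window) {
      fetchDone[nextToFetch] = iterStart[k] + fetchMs[nextToFetch];
      nextToFetch++;
    }
    // Wait for block k's message (if still loading), then run the indexer.
    iterStart[k + 1] = Math.max(iterStart[k], fetchDone[k]) + runMs;
  }
  return iterStart[n];
}

// Four blocks, 200ms per fetch, 50ms per indexer run (illustrative numbers).
console.log(simulateWallTime([200, 200, 200, 200], 50, 1)); // 1000
console.log(simulateWallTime([200, 200, 200, 200], 50, 2)); // 550
console.log(simulateWallTime([200, 200, 200, 200], 50, 4)); // 400
```

With a window of 1 every invocation pays the full fetch latency (4 × 250ms). With a window of 4, only the first block waits for S3, and the remaining blocks are already loaded by the time their turn comes.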