Control Plane rollout plan #422

morgsmccauley · 2023-11-21T02:16:52Z

All-in-one Release

This approach is completely manual, with all indexers being migrated in one go.

Write last_published_block to Redis from Coordinator V1, allowing Coordinator V2 to "Start from interruption".
Stop Coordinator V1
Wait for the existing Redis Streams to drain
Switch Runner to 'Control Plane' mode, exposing an RPC endpoint for Coordinator V2 to connect to, and preventing Executors from being started implicitly via the Redis Streams.
Start Coordinator V2
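The "wait for existing Redis Streams to drain" step could be scripted rather than watched by hand. The following is a minimal sketch, not taken from the existing codebase: `wait_for_drain`, its parameters, and the stream keys in the usage note are all hypothetical, and the stream length is read through an injected callable so the logic can be exercised without a Redis instance.

```python
import time
from typing import Callable, Iterable


def wait_for_drain(
    stream_keys: Iterable[str],
    stream_length: Callable[[str], int],
    poll_interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every stream is empty.

    Returns True once all streams report length 0, or False if the
    timeout elapses first. All names here are illustrative assumptions.
    """
    keys = list(stream_keys)
    deadline = time.monotonic() + timeout_s
    while True:
        # Collect the streams that still hold unprocessed messages.
        pending = [key for key in keys if stream_length(key) > 0]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

With a real client, `stream_length` could be something like redis-py's `client.xlen`. Returning `False` on timeout rather than blocking forever matters here, because an indexer that has stopped consuming would otherwise hold up the whole release.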

Advantages	Disadvantages
Doesn't require any additional changes	No way to iteratively test, as Indexers are migrated all at once
	Large blast radius/potential for many things to go wrong
	Manual process
	Rollback becomes hard to achieve, as the previous infrastructure has been stopped

Staged Release with automatic stream migration

With this option we introduce a deny/allow list in Redis. Indexers can be added to this list progressively, creating a staged migration. Runner will need to be updated to provide control over existing executors, i.e. Coordinator should be able to stop executors which have been started implicitly via the Redis streams set. Runner will therefore need to use a combined StreamHandlers list.
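One way to picture the combined StreamHandlers list is a single registry keyed by stream, shared by both start paths, so that a stop request from Coordinator works regardless of how the executor was started. This is only an illustrative sketch: `StreamHandler`, `HandlerRegistry`, and the `origin` values are hypothetical names, not Runner's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StreamHandler:
    """Stand-in for an executor bound to one stream (hypothetical)."""
    stream_key: str
    origin: str  # "redis_set" (started implicitly) or "coordinator" (started via RPC)
    running: bool = True

    def stop(self) -> None:
        self.running = False


@dataclass
class HandlerRegistry:
    """Single list of handlers, whichever path started them."""
    handlers: Dict[str, StreamHandler] = field(default_factory=dict)

    def start(self, stream_key: str, origin: str) -> StreamHandler:
        # Starting is idempotent: a stream never gets two executors.
        if stream_key not in self.handlers:
            self.handlers[stream_key] = StreamHandler(stream_key, origin)
        return self.handlers[stream_key]

    def stop(self, stream_key: str) -> bool:
        # Works for implicitly and explicitly started executors alike.
        handler = self.handlers.pop(stream_key, None)
        if handler is None:
            return False
        handler.stop()
        return True
```

Keying on the stream rather than on the start path also guards against the same stream being picked up twice while an indexer is mid-migration.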

Write last_published_block to Redis from Coordinator V1
Introduce a deny list within Coordinator V1; all indexers specified within this list will be ignored, i.e. blocks will no longer be pushed to Redis for them
Introduce an allow list within Coordinator V2, containing the same set of indexers as the deny list above
For each indexer in the allow list, Coordinator V2 will do the following:
1. Remove the indexer from the streams Redis Set, preventing Runner from starting it again
2. Stop the current historical/real-time executors
3. Move all messages from the historical/real-time streams to a new single stream
4. Using the stream above, start a Block Stream from last_published_block, and start the corresponding Executor
5. Set a flag in Redis to avoid running this process again, which can also be used to track which indexers have been migrated
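The five per-indexer steps above can be sketched against an in-memory stand-in for Redis and Runner. Everything named here is an assumption made for illustration: the key formats (`:historical:stream`, `:real_time:stream`, `:block_stream`, `:migrated`), the `MigrationState` fields, and the choice to place historical messages ahead of real-time ones in the merged stream are not specified in the plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MigrationState:
    """In-memory stand-in for Redis plus Runner state (hypothetical)."""
    streams_set: Set[str] = field(default_factory=set)            # the "streams" Redis Set
    streams: Dict[str, List[str]] = field(default_factory=dict)   # stream key -> messages
    keys: Dict[str, str] = field(default_factory=dict)            # plain string keys
    running_executors: Set[str] = field(default_factory=set)
    block_streams: Dict[str, int] = field(default_factory=dict)   # stream key -> start block


def migrate_indexer(state: MigrationState, indexer: str) -> bool:
    """Run the migration once; return False if the flag is already set."""
    flag = f"{indexer}:migrated"
    if flag in state.keys:
        return False

    old_keys = [f"{indexer}:historical:stream", f"{indexer}:real_time:stream"]
    new_key = f"{indexer}:block_stream"

    # 1. Remove from the streams set so Runner does not start it again.
    state.streams_set.difference_update(old_keys)
    # 2. Stop the current historical/real-time executors.
    state.running_executors.difference_update(old_keys)
    # 3. Move all pending messages into a single new stream
    #    (historical first: an assumed ordering).
    merged = state.streams.setdefault(new_key, [])
    for key in old_keys:
        merged.extend(state.streams.pop(key, []))
    # 4. Start a Block Stream from last_published_block and its Executor.
    state.block_streams[new_key] = int(state.keys[f"{indexer}:last_published_block"])
    state.running_executors.add(new_key)
    # 5. Set the flag so the process is not repeated.
    state.keys[flag] = "1"
    return True
```

Checking the flag first makes the function safe to call on every Coordinator V2 start-up, and the set of `:migrated` keys doubles as the record of which indexers have moved.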

Advantages	Disadvantages
Fully automated process	Additional changes (which are mostly simple) required to enable this automation
Can limit the migration to a subset of indexers, allowing for iterative testing	

Staged Release with passive stream migration

This option is mostly the same as the above, but instead of moving messages ourselves, we wait for the existing streams to drain naturally before migrating to the new system. This approach is simpler, but also more error-prone.

Write last_published_block to Redis from Coordinator V1
Introduce a deny list within Coordinator V1, as above
Introduce an allow list within Coordinator V2
For each indexer within the allow list, Coordinator V2 will do the following:
1. Remove the indexer from the streams Redis Set
2. Monitor the length of the current historical/real-time Redis Streams until it reaches 0
3. Once the streams are empty, start the Block Stream from last_published_block using a new Stream, along with its corresponding Executor
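The passive variant reduces to a polling loop over the allow list. The sketch below is illustrative only: `passive_migrate`, `old_stream_keys`, the key formats, and the `max_polls` bound are assumptions, and Redis and Runner are replaced by injected callables. It also shows the failure mode of this option, since an indexer whose streams never empty stays in the pending list.

```python
from typing import Callable, Iterable, List, Set, Tuple


def old_stream_keys(indexer: str) -> List[str]:
    # Hypothetical key format for the existing streams.
    return [f"{indexer}:historical:stream", f"{indexer}:real_time:stream"]


def passive_migrate(
    allow_list: Iterable[str],
    streams_set: Set[str],
    stream_length: Callable[[str], int],
    start_block_stream: Callable[[str], None],
    max_polls: int,
    wait: Callable[[], None] = lambda: None,
) -> Tuple[List[str], List[str]]:
    """Return (migrated, still_pending) after at most max_polls checks."""
    pending = list(allow_list)
    migrated: List[str] = []

    # 1. Remove each indexer from the streams set up front.
    for indexer in pending:
        streams_set.difference_update(old_stream_keys(indexer))

    for _ in range(max_polls):
        still_pending = []
        for indexer in pending:
            # 2. Drained only when both old streams are empty.
            if all(stream_length(k) == 0 for k in old_stream_keys(indexer)):
                # 3. Start the new Block Stream and Executor.
                start_block_stream(indexer)
                migrated.append(indexer)
            else:
                still_pending.append(indexer)
        pending = still_pending
        if not pending:
            break
        wait()  # e.g. sleep between polls

    return migrated, pending
```

Returning the pending list explicitly gives operators a place to look for "broken" indexers instead of leaving them silently unmigrated.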

Advantages	Disadvantages
Mostly automated process	Additional changes required, but fewer than for the option above
Can limit the migration to a subset of indexers, allowing for iterative testing	"Broken" Indexers will never be migrated, as their Streams will not drain

morgsmccauley · 2024-01-11T01:40:35Z

Rollback/forward Strategy

With either Staged approach we will roll forward with fixes. As we control the release cadence, we can identify and fix issues in a contained manner, i.e. with our own test indexers. This allows us to build up the confidence to release to all indexers, hopefully minimising the impact.

With the All-in-one release we will need some form of rollback strategy, which becomes complex, making this approach undesirable.

morgsmccauley · 2024-01-24T03:08:11Z

To be implemented in: #520

morgsmccauley mentioned this issue Nov 21, 2023

🔷 [Epic] Indexer Lifecycle Management #396

Closed

morgsmccauley changed the title ~~Rollout Control service~~ Control Plane rollout plan Nov 21, 2023

morgsmccauley self-assigned this Jan 11, 2024

morgsmccauley mentioned this issue Jan 15, 2024

Selectively start indexers #501

Closed

morgsmccauley closed this as completed Jan 24, 2024
