Control Plane rollout plan #422

Closed
Tracked by #396
morgsmccauley opened this issue Nov 21, 2023 · 2 comments

@morgsmccauley
Collaborator

morgsmccauley commented Nov 21, 2023

All-in-one Release

This approach is completely manual, with all indexers being migrated in one go.

  1. Write last_published_block to Redis from Coordinator V1, allowing Coordinator V2 to "Start from interruption".
  2. Stop Coordinator V1
  3. Wait for the existing Redis Streams to drain
  4. Switch Runner to 'Control Plane' mode, exposing an RPC endpoint for Coordinator V2 to connect to, and preventing Executors from being started implicitly via the Redis Streams.
  5. Start Coordinator V2
Advantages
  • Doesn't require any additional changes
Disadvantages
  • No way to iteratively test as Indexers are migrated all at once
  • Large blast radius/potential for many things to go wrong
  • Manual process
  • Rollback becomes hard to achieve as the previous infrastructure has been stopped
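
Step 3 of this procedure (waiting for the existing streams to drain) is the part most easily scripted. A minimal sketch, assuming the stream length can be read through some callable: the function name, parameters, and stream keys here are hypothetical and not taken from the Coordinator or Runner code. With a real Redis client, the callable would wrap the XLEN command.

```python
import time
from typing import Callable, Iterable


def wait_for_streams_to_drain(
    stream_keys: Iterable[str],
    stream_length: Callable[[str], int],
    poll_interval_s: float = 1.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once every stream is empty, False if the timeout elapses first.

    All names are illustrative; `stream_length` stands in for a Redis XLEN call.
    """
    keys = list(stream_keys)
    waited = 0.0
    while True:
        # A stream counts as drained when it holds no pending messages.
        if all(stream_length(key) == 0 for key in keys):
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Returning False on timeout rather than blocking forever leaves the operator free to decide what to do with streams that never empty.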

Staged Release with automatic stream migration

With this option we introduce a deny/allow list in Redis. Indexers can be added to this list progressively, creating a staged migration. Runner will need to be updated to provide control over existing executors, i.e. Coordinator should be able to stop executors which have been started via the Redis streams set. Runner will therefore need to use a combined StreamHandlers list.
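
The staged approach depends on each indexer being handled by exactly one Coordinator at any time. A minimal sketch of that invariant, assuming Coordinator V1 reads the list as a deny list and Coordinator V2 reads it as an allow list; the function names and the idea of checking for conflicts up front are illustrative assumptions, not part of the plan itself.

```python
from typing import Iterable, List, Set


def v1_should_publish(indexer: str, deny_list: Set[str]) -> bool:
    """Coordinator V1 skips every indexer named in its deny list."""
    return indexer not in deny_list


def v2_should_manage(indexer: str, allow_list: Set[str]) -> bool:
    """Coordinator V2 only picks up indexers named in its allow list."""
    return indexer in allow_list


def ownership_conflicts(
    indexers: Iterable[str], deny_list: Set[str], allow_list: Set[str]
) -> List[str]:
    """Return indexers that would be handled by both Coordinators or by neither."""
    return [
        indexer
        for indexer in indexers
        if v1_should_publish(indexer, deny_list) == v2_should_manage(indexer, allow_list)
    ]
```

When both lists hold the same set of indexers, `ownership_conflicts` returns an empty list; any drift between the two lists shows up as an indexer that is either double-processed or dropped.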

  1. Write last_published_block to Redis from Coordinator V1
  2. Introduce a deny list within Coordinator V1. All indexers specified in this list will be ignored, i.e. their blocks will no longer be pushed to Redis
  3. Introduce an allow list within Coordinator V2, containing the same set of indexers as the deny list above
  4. For each indexer in the allow list, Coordinator V2 will do the following:
    1. Remove the indexer from the streams Redis Set, preventing Runner from starting it again
    2. Stop the current historical/real-time executors
    3. Move all messages from the historical/real-time streams to a new single stream
    4. Using the stream above, start a Block Stream from last_published_block, and start the corresponding Executor
    5. Set a flag in Redis to avoid running this process again, which can also be used to track which indexers have been migrated
Advantages
  • Fully automated process
  • Can limit the migration to a subset of indexers, allowing for iterative testing
Disadvantages
  • Additional changes (which are mostly simple) required to enable this automation
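
The per-indexer procedure in step 4 can be sketched as follows. This is an illustration under stated assumptions, not the Coordinator V2 implementation: `FakeRedis` is an in-memory stand-in for Redis, every key name (`streams`, `:historical:stream`, `:real_time:stream`, `:block_stream`, `:migrated`) is hypothetical, and the plan does not say in which order the old streams are merged (historical first is assumed here) or whether last_published_block itself is re-processed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class FakeRedis:
    """In-memory stand-in for the Redis structures the steps mention."""
    sets: Dict[str, Set[str]] = field(default_factory=dict)
    streams: Dict[str, List[int]] = field(default_factory=dict)
    values: Dict[str, str] = field(default_factory=dict)


def migrate_indexer(
    store: FakeRedis,
    indexer: str,
    stop_executors: Callable[[str], None],
    start_block_stream: Callable[[str, str, int], None],
    start_executor: Callable[[str, str], None],
) -> bool:
    """Migrate one allow-listed indexer; return False if it was already migrated."""
    migrated_flag = f"{indexer}:migrated"  # hypothetical key name
    if store.values.get(migrated_flag) == "1":
        return False

    old_streams = [f"{indexer}:historical:stream", f"{indexer}:real_time:stream"]
    new_stream = f"{indexer}:block_stream"

    # 4.1 Remove the indexer's streams from the set so Runner won't restart them.
    store.sets.setdefault("streams", set()).difference_update(old_streams)
    # 4.2 Stop the current historical/real-time executors.
    stop_executors(indexer)
    # 4.3 Move all pending messages into a single new stream (historical first).
    merged = store.streams.setdefault(new_stream, [])
    for old in old_streams:
        merged.extend(store.streams.pop(old, []))
    # 4.4 Start the Block Stream from last_published_block, then the Executor.
    start_block = int(store.values[f"{indexer}:last_published_block"])
    start_block_stream(indexer, new_stream, start_block)
    start_executor(indexer, new_stream)
    # 4.5 Record the migration so the process is not repeated.
    store.values[migrated_flag] = "1"
    return True
```

Setting the flag last means a failure in any earlier step leaves the indexer unmarked, so the migration would be attempted again on the next pass.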

Staged Release with passive stream migration

This option is mostly the same as the above, but instead of moving messages, we wait for the existing streams to drain naturally before migrating to the new system. This approach is simpler, but also more error-prone.

  1. Write last_published_block to Redis from Coordinator V1
  2. Introduce a deny list within Coordinator V1, as above
  3. Introduce an allow list within Coordinator V2
  4. For each indexer within the allow list, Coordinator V2 will do the following:
    1. Remove the indexer from the streams Redis Set
    2. Monitor the length of the current historical/real-time Redis Streams until it reaches 0
    3. Once the streams are empty, use a new Stream to start the Block Stream from last_published_block, along with its corresponding Executor
Advantages
  • Mostly automated process
  • Can limit the migration to a subset of indexers, allowing for iterative testing
Disadvantages
  • Additional changes required, but less compared to the above
  • "Broken" Indexers will never be migrated as their Streams will not drain
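
The passive variant of step 4 can be sketched as a check that is called repeatedly until it succeeds. This is an illustration only: the function name, the stream key names, and the `migrated` set used to avoid starting an indexer twice are assumptions, and plain Python sets and dicts stand in for Redis.

```python
from typing import Callable, Dict, Set


def try_passive_migration(
    indexer: str,
    streams_set: Set[str],
    stream_lengths: Dict[str, int],
    last_published_block: int,
    migrated: Set[str],
    start_block_stream: Callable[[str, str, int], None],
    start_executor: Callable[[str, str], None],
) -> bool:
    """Return True if the indexer is on the new system, False if still draining."""
    if indexer in migrated:
        return True

    # Hypothetical key names for the indexer's existing streams.
    old_streams = [f"{indexer}:historical:stream", f"{indexer}:real_time:stream"]

    # 4.1 Remove the indexer's streams from the set (safe to repeat).
    streams_set.difference_update(old_streams)
    # 4.2 Do nothing further until both old streams are empty.
    if any(stream_lengths.get(key, 0) > 0 for key in old_streams):
        return False
    # 4.3 Start the Block Stream on a new stream, then its Executor.
    new_stream = f"{indexer}:block_stream"
    start_block_stream(indexer, new_stream, last_published_block)
    start_executor(indexer, new_stream)
    migrated.add(indexer)
    return True
```

An indexer whose executor has stopped consuming never sees its stream length reach 0, so this function keeps returning False for it, which is the last disadvantage listed above.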
@morgsmccauley morgsmccauley changed the title Rollout Control service Control Plane rollout plan Nov 21, 2023
@morgsmccauley
Collaborator Author

Rollback/forward Strategy

With either Staged approach we will roll forward with fixes. As we control the release cadence, we can identify and fix issues in a contained manner, i.e. with our own test indexers. This allows us to build up the confidence to release to all indexers, hopefully minimising the impact.

With the All-in-one release we will need some form of rollback strategy, which becomes complex, making this approach undesirable.

@morgsmccauley
Collaborator Author

To be implemented in: #520
