
refactor: Merge sync logic to provide fine-grained life-cycle control #762

Merged
merged 32 commits into from
Jun 12, 2024

Conversation

morgsmccauley
Collaborator

@morgsmccauley morgsmccauley commented May 31, 2024

To synchronise Block Streams/Executors we compare the latest registry with the rest of the system. Components are started/stopped as needed, but we lack control over life-cycle events such as registration/deletion. As we build towards Coordinator managing provisioning, we need a way to hook into these life-cycle events, so we can provision/de-provision accordingly.

This PR refactors the current synchronisation logic to provide greater control over these life-cycle events. The existing Block Stream/Executor sync logic has been merged into a single Synchroniser struct, which now manages new/existing/deleted indexers as a whole, providing the required life-cycle hooks described above. Merging the sync logic lets us control when Block Streams/Executors are started/stopped, so we can act before and after those events, specifically to provision/de-provision. I wanted to do this refactor first to make reviewing easier; I'll follow up with the provisioning changes next.

To achieve the above, each Indexer now has its own IndexerState object in Redis. After synchronising a newly registered Indexer we write its state object, and after removing an Indexer, we delete its state object. This gives us the following states, allowing us to handle each accordingly:

  • Non-existent State/Existing Config - New Indexer
  • Existing State/Existing Config - Existing Indexer
  • Existing State/Non-existent Config - Deleted Indexer

The synchronisation logic is essentially the same, except that it is now driven off the persistent Redis state, rather than the current Block Stream/Executor state.
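The three states above can be sketched as a match on the presence of persisted state vs. registry config. This is a hypothetical illustration of the idea, not the PR's actual types (LifecycleState and classify are names I've made up):

```rust
// Hypothetical sketch: deriving an indexer's lifecycle state from the
// presence of its persisted state object and its registry config.
#[derive(Debug, PartialEq)]
enum LifecycleState {
    New,      // config exists, no persisted state yet
    Existing, // both state and config exist
    Deleted,  // state exists, config removed from registry
}

fn classify(state: Option<&str>, config: Option<&str>) -> Option<LifecycleState> {
    match (state, config) {
        (None, Some(_)) => Some(LifecycleState::New),
        (Some(_), Some(_)) => Some(LifecycleState::Existing),
        (Some(_), None) => Some(LifecycleState::Deleted),
        (None, None) => None, // nothing to synchronise
    }
}

fn main() {
    println!("{:?}", classify(None, Some("config")));      // New
    println!("{:?}", classify(Some("state"), Some("config"))); // Existing
    println!("{:?}", classify(Some("state"), None));       // Deleted
}
```

Writing the state object after a successful sync, and deleting it after removal, is what keeps this classification accurate across control-loop iterations.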

@morgsmccauley morgsmccauley changed the base branch from main to feat/data-layer-management May 31, 2024 03:50
@morgsmccauley morgsmccauley linked an issue May 31, 2024 that may be closed by this pull request
Base automatically changed from feat/data-layer-management to main June 9, 2024 21:25
@morgsmccauley morgsmccauley force-pushed the feat/coordinator-provisioning branch from cb11bad to 9c595c8 Compare June 9, 2024 21:30
@morgsmccauley morgsmccauley force-pushed the feat/coordinator-provisioning branch 2 times, most recently from b0824b0 to 3a31c15 Compare June 11, 2024 03:57
@morgsmccauley morgsmccauley force-pushed the feat/coordinator-provisioning branch from 35c2811 to d57ff58 Compare June 11, 2024 04:01
@morgsmccauley morgsmccauley changed the title Feat/coordinator provisioning refactor: Merge sync logic to provide greater life-cycle control Jun 11, 2024
@morgsmccauley morgsmccauley changed the title refactor: Merge sync logic to provide greater life-cycle control refactor: Merge sync logic to provide fine-grained life-cycle control Jun 11, 2024
.map_err(|e| {
tracing::error!(stream_id, "Failed to stop stream\n{e:?}");
});
.context(format!("Failed to stop stream: {stream_id}"))?;
Collaborator Author

Propagate the error up, don't swallow, so we can handle it in Synchroniser.

@@ -13,6 +13,25 @@ pub struct IndexerConfig {
pub created_at_block_height: u64,
}

#[cfg(test)]
impl Default for IndexerConfig {
Collaborator Author

Got sick of continuously writing this in tests - it's only available in test builds, as it shouldn't be used in release builds.
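The pattern looks roughly like this - a Default impl gated behind #[cfg(test)] so it is compiled only for test builds. The field values below are assumptions for illustration, not the PR's actual defaults:

```rust
// Struct fields mirror the diff above; the Default impl is stripped
// from release builds by the #[cfg(test)] attribute.
pub struct IndexerConfig {
    pub account_id: String,
    pub function_name: String,
    pub created_at_block_height: u64,
}

#[cfg(test)]
impl Default for IndexerConfig {
    fn default() -> Self {
        Self {
            account_id: "test.near".to_string(),       // assumed placeholder
            function_name: "test_function".to_string(), // assumed placeholder
            created_at_block_height: 0,
        }
    }
}

fn main() {
    // Outside of test builds the Default impl does not exist, so
    // configs must still be constructed explicitly in production code.
    let config = IndexerConfig {
        account_id: "morgs.near".to_string(),
        function_name: "test".to_string(),
        created_at_block_height: 0,
    };
    println!("{}/{}", config.account_id, config.function_name);
}
```

In tests this lets you write `IndexerConfig { account_id: "other.near".into(), ..Default::default() }` to override only the fields a test cares about.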

@@ -48,14 +54,65 @@ impl IndexerStateManagerImpl {
Self { redis_client }
}

pub async fn migrate(&self, registry: &IndexerRegistry) -> anyhow::Result<()> {
Collaborator Author

@morgsmccauley morgsmccauley Jun 11, 2024

This serves two purposes:

  1. Migrating from OldIndexerState to the new IndexerState, and
  2. Ensuring all state objects are added to the Redis Set (handled via set_state())

We need 2. to be able to list current IndexerStates, we'd need to scan the DB otherwise which is not performant.

This migration could leave us in a bad state, as we use the "existence" of the state object to signify its SynchronisationState, i.e. new/existing. By using the current registry to update the existing states, we may end up writing state for an Indexer which was registered but not yet synchronised (i.e. registered while Coordinator was down). We would then incorrectly treat this Indexer as "existing" rather than "new". At this point this isn't necessarily a problem, as the new/existing flows aren't too different, but it is handled in the synchroniser.sync_existing_block_stream() method, which I'll call out.
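The shape of the migration, in particular purpose 2, can be sketched with an in-memory stand-in for Redis. All names here are assumptions for illustration, not the PR's actual API:

```rust
use std::collections::{HashMap, HashSet};

// Stand-in for the Redis-backed state manager: `states` holds the
// per-indexer state objects, `index` plays the role of the Redis Set
// that makes listing cheap (no full DB scan).
struct StateStore {
    states: HashMap<String, String>,
    index: HashSet<String>,
}

impl StateStore {
    fn set_state(&mut self, key: &str, state: &str) {
        self.states.insert(key.to_string(), state.to_string());
        self.index.insert(key.to_string()); // keep the listing set in sync
    }

    fn migrate(&mut self, registry: &[&str]) {
        for key in registry {
            // Registry-driven: an indexer registered but never synchronised
            // gets state written here too, and would later look "existing".
            let migrated = self.states.get(*key).cloned().unwrap_or_default();
            self.set_state(key, &migrated);
        }
    }
}

fn main() {
    let mut store = StateStore { states: HashMap::new(), index: HashSet::new() };
    // Pre-migration: a state object written directly, missing from the set.
    store.states.insert("morgs.near/one".to_string(), "v1".to_string());
    store.migrate(&["morgs.near/one", "morgs.near/two"]);
    println!("indexed keys: {}", store.index.len());
}
```

Funnelling every write through set_state() is what guarantees the set stays consistent with the state objects.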

// FIX `IndexerConfig` does not exist after an Indexer is deleted, and we need a way to
// construct the state key without it. But, this isn't ideal as we now have two places which
// define this key - we need to consolidate these somehow.
pub fn get_state_key(&self) -> String {
Collaborator Author

As described in the comment, this isn't ideal; I'll need to re-think it, but I didn't want to bloat the current PR.

Collaborator

Should Config and State share some interface to guarantee certain fields, which can constitute the key? Such as accountID and functionName. By state key, we really mean the key value in the set right?

Collaborator Author

Yup, but we can't do exactly that in Rust.

My current thoughts are to have a common trait, probably RedisKeyProvider, with some default methods provided which generate the keys using account_id and function_name. But we can't require fields, we can only require methods, so the best we can do is require a method for each of those. That still leaves room for error, but I think it's obvious enough not to be abused, and it's better than this :)

By state key, we really mean the key value in the set right?

Yes, that's right
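The trait idea floated above could look something like this - a trait can't require fields, but it can require accessor methods and derive the key from them in a default method. The trait name, struct fields, and key format are all assumptions, not the PR's actual code:

```rust
// Hypothetical RedisKeyProvider: Config and State each supply the two
// accessors, and the default method guarantees both produce the same key.
trait RedisKeyProvider {
    fn account_id(&self) -> &str;
    fn function_name(&self) -> &str;

    fn get_state_key(&self) -> String {
        // Assumed key format, for illustration only.
        format!("{}/{}:state", self.account_id(), self.function_name())
    }
}

struct IndexerConfig { account_id: String, function_name: String }
struct IndexerState { account_id: String, function_name: String }

impl RedisKeyProvider for IndexerConfig {
    fn account_id(&self) -> &str { &self.account_id }
    fn function_name(&self) -> &str { &self.function_name }
}

impl RedisKeyProvider for IndexerState {
    fn account_id(&self) -> &str { &self.account_id }
    fn function_name(&self) -> &str { &self.function_name }
}

fn main() {
    let config = IndexerConfig { account_id: "morgs.near".into(), function_name: "test".into() };
    let state = IndexerState { account_id: "morgs.near".into(), function_name: "test".into() };
    // Both types derive the key from the same default method, so the
    // format is defined in exactly one place.
    println!("{}", config.get_state_key() == state.get_state_key());
}
```

This removes the duplicated key definition: deleting an Indexer's config no longer matters, because the state object can produce the key on its own.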

@@ -0,0 +1,1186 @@
use registry_types::StartBlock;
Collaborator Author

This file is huge, but it's mostly tests.

}

// FIX if this fails, then subsequent control loops will perpetually fail since the
// above will error with ALREADY_EXISTS
Collaborator Author

Will handle this in a follow up

if state.block_stream_synced_at.is_none() {
// NOTE: A value of `None` would suggest that `state` was created before initialisation,
// which is currently not possible, but may be in future
tracing::warn!("Existing block stream has no previous sync state, treating as new");
Collaborator Author

As mentioned above with the migration, this is the new/existing case we need to handle. The migration may end up with new indexers being treated as existing, but we'd know that's the case when block_stream_synced_at is None.

@morgsmccauley morgsmccauley marked this pull request as ready for review June 11, 2024 05:04
@morgsmccauley morgsmccauley requested a review from a team as a code owner June 11, 2024 05:04
Collaborator

@darunrs darunrs left a comment

Super cool work! I think the existing synchronization logic is more readily understood now, and block streams and executors feel more logically tied together.

state,
executor.cloned(),
block_stream.cloned(),
))
Collaborator

If a return is used in a for loop, does the for loop just continue on instead of breaking and returning? My impression here is that we should be continuing through the loop as we have more work to do. The return here is confusing me a bit regarding that.

Collaborator Author

Oh, I didn't even notice this. I'm assuming you're talking about the missing ;?

In the context of a for loop, the implicit return value is ignored, so in this case the loop will continue. If I were to add return it would break the loop.
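The distinction can be shown with a minimal example: a trailing expression in a for-loop body has its value discarded and iteration continues, whereas return exits the enclosing function, breaking the loop. These functions are illustrative only, not from the PR:

```rust
fn process_all(values: &[i32], out: &mut Vec<i32>) {
    for v in values {
        // Tail expression of the loop body (no `;`): its `()` value is
        // simply discarded and the loop moves on to the next item.
        out.push(v * 2)
    }
}

fn process_until_negative(values: &[i32], out: &mut Vec<i32>) {
    for v in values {
        if *v < 0 {
            return; // `return` exits the whole function, ending the loop
        }
        out.push(v * 2)
    }
}

fn main() {
    let mut all = Vec::new();
    process_all(&[1, 2, -3], &mut all);
    println!("{all:?}"); // every item processed

    let mut some = Vec::new();
    process_until_negative(&[1, 2, -3, 4], &mut some);
    println!("{some:?}"); // stops at the first negative
}
```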

@@ -63,27 +72,11 @@ async fn main() -> anyhow::Result<()> {

loop {
let indexer_registry = registry.fetch().await?;
indexer_state_manager.migrate(&indexer_registry).await?;
Collaborator

This only needs to be successful once right? Once it is done, we just remove it? I'm assuming you have it in the loop so you can manually resolve any problems while Coordinator is up.

Collaborator Author

@morgsmccauley morgsmccauley Jun 12, 2024

Good point, yeah, this doesn't even need to be in the loop; there isn't actually any benefit to it being there 😅 I'll move it.

Ok(())
}
)?;
tokio::try_join!(synchroniser.sync(), async {
Collaborator

It seems like all sync tasks are designed to succeed and instead just log errors. How do we enable visibility of sync errors beyond manually checking logs?

Collaborator Author

Do you mean like Grafana? We'd hook into those errors directly. Eventually I think we'd want to short-circuit the sync for that Indexer, store some state, and then act accordingly. But we're not quite there yet.

}

self.state_manager.delete_state(state).await?;

Collaborator

Is this where we would call resource clean up? Since Block Streamer is technically responsible for creating the redis block height stream, who is going to delete the redis stream? This will be even more important when we migrate to storing blocks on redis instead of heights.

Collaborator Author

Coordinator should delete the stream, just like it does when we republish the Indexer. The Redis stream is created implicitly when it is xadded to, Block Streamer doesn't explicitly create it.

@morgsmccauley morgsmccauley merged commit 0f3b9d2 into main Jun 12, 2024
4 checks passed
@morgsmccauley morgsmccauley deleted the feat/coordinator-provisioning branch June 12, 2024 02:50
Successfully merging this pull request may close these issues.

Handle provisioning within Coordinator