Add a first version of the key concepts (#218)

input-output-hk · Nov 8, 2023 · 6b4d045
1 parent e5004fd
Showing 2 changed files with 292 additions and 46 deletions.
diff --git a/doc/read-the-docs-site/architecture/key-concepts.rst b/doc/read-the-docs-site/architecture/key-concepts.rst
@@ -1,6 +1,259 @@
-Key concepts
-============
+Marconi Key concepts
+====================
 
-We'll explain concepts such as ``Indexer``, ``Coordinator``, ``Worker``, ``Transformer``, ``Preprocessor``, ``Queryable``, ``IsSync``, ``IsIndex``, ``chain-sync protocol``, etc.
+We introduce here the key concepts of Marconi, how they are related and how they
+map to the architecture.
 
-Coming eventually!
+Indexer
+-------
+
+The key component of Marconi is an ``indexer``.
+An indexer can be any type that implements the ``IsIndex`` typeclass.
+The definition of ``IsIndex`` is (very close to) this one:
+
+.. code-block:: haskell
+
+   class (Monad m) => IsIndex m event indexer where
+
+     index
+       :: (Eq (Point event))
+       => Timed (Point event) (Maybe event)
+       -> indexer event
+       -> m (indexer event)
+
+     -- …
+
+     rollback :: (Ord (Point event)) => Point event -> indexer event -> m (indexer event)
+
+Let's discuss the type variables first:
+
+- ``m`` is the monadic context of the indexer.
+- ``event`` defines the input type of the indexer. We usually store these
+  events, but an indexer can also decide to process them and store a
+  different type.
+- ``indexer`` maintains the indexer state and provides the necessary
+  information to perform the indexing operations.
+
+The two key operations that we have to deal with are ``index`` and ``rollback``.
+In a blockchain context, we call ``index`` when a new block comes in order to
+index the event that corresponds to that block. ``rollback`` is called when the
+node emits a rollback: some of the indexed blocks aren't aligned with the
+blockchain consensus and we need to unindex them.
+
+``index`` takes a ``Timed (Point event) (Maybe event)`` and the indexer that
+provides the indexing context. ``Point event`` is a type family that defines how
+we identify the point in time of the block that produces the ``event``.
+In the context of Cardano, a ``Point event`` will almost always be a
+``ChainPoint``: the hash of the block (to identify a block)
+and the slot at which the block was issued.
+The ``event`` is wrapped in a ``Maybe`` to express the possibility that a block
+may not contain any relevant information for a given indexer. It is still
+important to index a block with no relevant event to keep track of the progress
+of an indexer. We'll come back to this later.
+
+``rollback`` takes a point in time. This is the point to which we must roll back
+the indexer. It means that we must put the indexer back in the state it was in
+when we indexed the block at the given point.
+
+``IsIndex`` provides other functions such as ``indexAll`` and
+``indexAllDescending`` that allow indexing of a list of events.
+A default implementation is provided for these functions, and they can be
+overridden to provide a more efficient implementation.
+
+It also provides a ``setLastStablePoint`` method that allows the indexer to keep
+track of the last stable point progress and, if needed, to react accordingly.
+
+Coordinator and Workers
+-----------------------
+
+A common indexing scenario is to run several indexers in parallel to index
+different parts of the same event. Ideally we want to organise our indexers as
+a tree, each node doing a part of the processing before propagating the
+resulting events, and the leaves being indexers that store the relevant part of
+these events.
+
+For example, you may want to have one subtree that focuses on the block
+information, while another focuses on the transaction bodies,
+or any other split that sounds relevant for your business logic.
+
+In Marconi, the indexers can be grouped thanks to a ``Coordinator``.
+A coordinator is a special type of indexer that propagates incoming actions
+(index, rollback) to a list of indexers, and handles the lifecycle of these
+indexers.
+
+As Haskell is strongly typed, and as the indexers that we want to coordinate
+may have different types, we need a way to handle a heterogeneous list of
+indexers.
+
+There are many different ways to deal with heterogeneous lists in Haskell. For
+the coordinator, we decided to wrap the indexer in a type that hides the
+implementation details of an indexer, a ``Worker``.
+The type of a worker (slightly simplified) is the following:
+
+.. code-block:: haskell
+
+   data Worker input point = forall indexer event n.
+     ( IsIndex n event indexer
+     , Point event ~ point
+     ) =>
+     Worker
+     { workerName :: Text
+     -- ^ used to identify the worker in logs
+     , workerState :: MVar (indexer event)
+     -- ^ the indexer controlled by this worker
+     , transformInput :: Preprocessor (ExceptT IndexerError IO) point input event
+     -- ^ adapt the input event given by the coordinator to the worker type
+     , hoistError :: forall a. n a -> ExceptT IndexerError IO a
+     -- ^ adapt the monadic stack of the indexer to the one of the worker
+     }
+
+You don't need to understand this type declaration in detail. Here are the
+important bits:
+
+- We hide the indexer representation under the worker type. Any concrete indexer
+  type will work as long as it implements the ``IsIndex`` typeclass.
+- The indexer is put in an ``MVar`` to allow access to it from other threads.
+  We'll come back to it later.
+- We provide a preprocessor, which can be seen as a function that transforms
+  the input sent by the coordinator into events.
+- ``hoistError`` ensures that we know how to translate the base monad of the
+  indexer into ``ExceptT IndexerError IO`` which is the base monad of a worker.
+
+Once we have a list of workers, we can use it to create a coordinator using the
+``mkCoordinator`` function.
+This function initialises each worker, creating a dedicated thread for each
+worker, where the worker waits for incoming actions to perform and notifies the
+coordinator when each action is performed.
+
+The coordinator monitors each of the threads and, if one of the workers
+encounters an error, it will try to close all the other workers nicely.
+
+``Coordinator`` itself implements ``IsIndex`` and thus can itself be wrapped
+in a worker.
+Thanks to this, we can create a whole hierarchy of indexers that can be
+controlled from a main coordinator.
+
+
+Preprocessor
+------------
+
+We saw in the coordinator and workers section that workers take a preprocessor.
+As stated in that section, the preprocessor type can be viewed as a stateful
+function that transforms the actions sent to an indexer.
+Its type is isomorphic to:
+
+.. code-block:: haskell
+
+   StateT s m ([ProcessedInput point a] -> [ProcessedInput point b])
+
+It's the first time we encounter ``ProcessedInput`` so it is worth going through
+its definition:
+
+
+.. code-block:: haskell
+
+   data ProcessedInput point event
+     = Rollback point
+     | Index (Timed point (Maybe event))
+     | IndexAllDescending (NonEmpty (Timed point (Maybe event)))
+     | StableAt point
+     | Stop
+
+It is essentially a reified version (functions expressed as data) of most of the
+``IsIndex`` functions, plus a ``Stop`` constructor that lets us stop a worker.
+So the goal of a preprocessor is to take a list of actions that must be sent to
+a worker and to transform this list.
+It can either filter out some actions or add some actions to the list,
+based on the internal state ``s`` of the preprocessor.
+
+Transformers
+------------
+
+Indexer transformers (or, shorter, transformers) are used to alter the behaviour
+of an existing indexer.
+The name comes from the monad transformers concept, and it was chosen because
+monad and indexer transformers have similarities, as both aim at adding
+capabilities to a base implementation.
+
+A transformer carries a state and an underlying indexer.
+The typeclass instances of the transformer can then add extra logic to this
+underlying indexer.
+
+For example, the ``WithCatchup`` transformer in ``marconi-core`` prepares batches
+of events when the blocks we receive are far from the tip, to enable batch
+insertion of events.
+
+Preprocessor vs Transformers
+----------------------------
+
+There are many similarities between preprocessors and transformers,
+but there are two major differences:
+
+- The state of a ``Preprocessor`` is not exposed and can't be accessed from the
+  outside, while transformers can expose their state.
+  This makes preprocessors slightly less powerful.
+- Preprocessors don't rely on typeclass implementations. As a consequence,
+  they are easier to write and easier to compose in the general case.
+  Furthermore, they don't change the type of the indexers.
+
+Despite these differences, preprocessors and transformers are closely related
+and in most scenarios, you should be able to rewrite a preprocessor as a
+transformer or the opposite.
+
+Tracking synchronisation progress
+---------------------------------
+
+In Marconi, each indexer is independent.
+You can reuse data from another indexer in your application, add or remove
+indexers from one run to another, without compromising the other indexers.
+A consequence is that each indexer must track its synchronisation progress.
+When there's a call to ``index``, the indexer keeps track of its last
+synchronisation point.
+When there's a call to ``setLastStablePoint``, it is supposed to keep track
+of the given point as well.
+
+In many scenarios, it is useful to have access to the last sync point and last
+stable point of an indexer.
+If you need to query different indexers for example, you may want to query them
+at the same point in time, to ensure that you get consistent results.
+When you restart your indexers, you may want to access their last stable point
+to know at which block you must restart your synchronisation.
+
+To expose this information, we usually implement the ``IsSync`` typeclass.
+Implementing this typeclass is required to put an indexer in a worker.
+
+Queryable
+---------
+
+The whole point of an indexer is to expose information about the events
+it indexes.
+In Marconi, it's done through the ``Queryable`` typeclass, which has the
+following definition:
+
+.. code-block:: haskell
+
+   class Queryable m event query indexer where
+     query
+       :: (Ord (Point event))
+       => Point event
+       -> query
+       -> indexer event
+       -> m (Result query)
+
+To define a ``Queryable`` instance, you need to provide the context ``m`` in
+which it operates, the ``event`` that the indexer must handle to be able to
+answer the query, the query type and the indexer implementation that can answer
+the query.
+
+Then, you need to implement the ``query`` method, which takes a point,
+a ``query`` and an indexer to provide a ``Result query``.
+``Result`` is a type family that associates a query to its result.
+The point needed by ``query`` defines both a point that the indexer must have
+reached and the upper bound of the result we consider, if applicable.
+When you need to do several queries to different indexers, passing the same
+point to the different queries ensures that the results are consistent.
+
+In many situations, we just want access to the freshest information of an
+indexer. In these scenarios, one can use the ``queryLatest`` function.
+``queryLatest`` requires the indexer to implement both ``Queryable`` and
+``IsSync``. It gets the last sync point for you and passes it to the query.
diff --git a/marconi-core/src/Marconi/Core.hs b/marconi-core/src/Marconi/Core.hs
@@ -9,29 +9,19 @@
 {-# OPTIONS_GHC -Wno-redundant-constraints #-}
 
 {- |
- This module propose an alternative to the index implementation proposed in @Storable@.
+ @Marconi.Core@ re-exports most of the content of the sub-modules and, in most scenarios,
+ provides most of what you need to set up a chain-indexing solution.
 
- = Motivation
+ = Features
 
- The point we wanted to address are the following:
+ Marconi provides the following features out of the box:
 
-    * @Storable@ implementation is designed in a way that strongly promotes indexers
-      that rely on a mix of database and in-memory storage.
-      We try to propose a more generic design that would allow:
-
-        * full in-memory indexers
-        * indexer backed by a simple file
-        * indexer transformers, that add capability (logging, caching...) to an indexer
-        * mock indexer, for testing purpose, with predefined behaviour
-        * group of indexers, synchronised as a single indexer
-        * implement in-memory/database storage that rely on other query heuristic
-
-    * The original implementation considered the @StorablePoint@ as data that can be derived from
-      @Event@, leading to the design of synthetic events to deal with indexer that didn't index
-      enough data.
-
-    * In marconi, the original design uses a callback design to handle `MVar` modification,
-      we wanted to address this point as well.
+    * full in-memory indexers;
+    * indexers backed by a simple file;
+    * sqlite-indexers;
+    * indexer transformers, that add capability (logging, caching...) to an indexer;
+    * group of indexers, synchronised as a single indexer;
+    * mixed-indexers that allow a different type of storage for recent events and older ones.
 
  = Terminology
 
@@ -70,8 +60,7 @@
           (it allows us to opt-in for traces if we want, indexer by indexer)
         * Transform to change the input type of an indexer
 
-  Contrary to the original Marconi design,
-  indexers don't have a unique (in-memory/sqlite) implementation.
+  Indexers can have different implementations (SQLite, in-memory...).
   @SQLite@ indexers are the most common ones. For specific scenarios, you may want to combine
   them with a mixed indexer or to go for other types of indexers.
   We may consider other database backend than @SQLite@ in the future.
@@ -87,24 +76,9 @@
 
  == Define an indexer instance
 
- === Define an indexer instance for 'ListIndexer'
-
-        1. You need to define a type for @event@ (the input of your indexer).
-        As soon as it's done, define the 'Point' type instance for this event,
-        'Point' is a way to know when the Point was emitted.
-        It can be a time, a slot number, a block, whatever information that tracks when
-        an event happen.
-
-        2. It's already enough to index `Timed` of the events you defined at step one
-        of your indexer and to proceed to rollback.
-        You can already test it, creating an indexer with `listIndexer`.
-
-        3. Define a @query@ type and the corresponding 'Result' type.
-
-        4. Then, for this query you need to define the 'Queryable' instance that
-        corresponds to your indexer.
-
-        5. The 'ListIndexer' is ready.
+ At this stage, in most of the cases, you want to define an `SQLiteIndexer`,
+ which will store the incoming events into an SQLite database.
+ Other types of indexers exist and we detail some of them as well below.
 
  == Define an indexer instance for 'SQLiteIndexer'
 
@@ -143,6 +117,25 @@
         There's no helper on this one, but you probably want to query the database and
         to aggregate the query result in your @Result@ type.
 
+ === Define an indexer instance for 'ListIndexer'
+
+        1. You need to define a type for @event@ (the input of your indexer).
+        As soon as it's done, define the 'Point' type instance for this event,
+        'Point' is a way to know when the event was emitted.
+        It can be a time, a slot number, a block, whatever information that tracks when
+        an event happens.
+
+        2. It's already enough to index `Timed` of the events you defined at step one
+        of your indexer and to proceed to rollback.
+        You can already test it, creating an indexer with `listIndexer`.
+
+        3. Define a @query@ type and the corresponding 'Result' type.
+
+        4. Then, for this query you need to define the 'Queryable' instance that
+        corresponds to your indexer.
+
+        5. The 'ListIndexer' is ready.
+
  === Define an indexer instance for a 'MixedIndexer'
 
  Follow in order the steps for the creation of a 'ListIndexer' (the in-memory part)
@@ -157,7 +150,7 @@
 
  == Write a new indexer
 
-    Most user probably /don't/ want to do this.
+    Most users probably /don't/ want to do this.
 
     A good reason is to add support for another backend
     (another database or another in-memory structure)
@@ -166,9 +159,9 @@
 
         * 'IsSync'
         * 'IsIndex'
-        * 'AppendResult' (if you plan to use it as the in-memory part of a 'MixedIndexer')
         * 'Queryable'
         * 'Closeable' (if you plan to use it in a worker, and you probably plan to)
+        * 'AppendResult' (if you plan to use it as the in-memory part of a 'MixedIndexer')
 
     Best practices is to implement as much as we can 'event'/'query' agnostic
     instances of the typeclasses of these module for the new indexer.