Add a first version of the key concepts (#218)
berewt authored Nov 8, 2023
1 parent e5004fd commit 6b4d045
Showing 2 changed files with 292 additions and 46 deletions.
261 changes: 257 additions & 4 deletions doc/read-the-docs-site/architecture/key-concepts.rst
@@ -1,6 +1,259 @@
Marconi Key concepts
====================

This page introduces the key concepts of Marconi, how they relate to each other
and how they map to the architecture.

Indexer
-------

The key component of Marconi is an ``indexer``.
An indexer can be any type that implements the ``IsIndex`` typeclass.
The definition of ``IsIndex`` is (very close to) this one:

.. code-block:: haskell

   class (Monad m) => IsIndex m event indexer where
     index
       :: (Eq (Point event))
       => Timed (Point event) (Maybe event)
       -> indexer event
       -> m (indexer event)

     -- …

     rollback :: (Ord (Point event)) => Point event -> indexer event -> m (indexer event)

Let's discuss the type variables first:

- ``m`` is the monadic context of the indexer.
- ``event`` defines the input type of the indexer. We usually store these
  events, but an indexer can also process them and store a different type.
- ``indexer`` holds the indexer state and provides the information needed to
  perform the indexing operations.

The two key operations we have to deal with are ``index`` and ``rollback``.
In a blockchain context, we call ``index`` when a new block arrives, to index
the event that corresponds to that block. ``rollback`` is called when the node
emits a rollback: some of the indexed blocks are no longer on the chain
selected by consensus, and we need to unindex them.

``index`` takes a ``Timed (Point event) (Maybe event)`` and the indexer that
provides the indexing context. ``Point event`` is a type family that defines how
we identify the point in time of the block that produces the ``event``.
In the context of Cardano, a ``Point event`` will almost always be a
``ChainPoint``: the hash of the block (to identify a block)
and the slot at which the block was issued.
The ``event`` is wrapped in a ``Maybe`` to express that a block
may not contain any relevant information for a given indexer. It is still
important to index a block with no relevant event, to keep track of the
progress of an indexer. We'll come back to this later.

``rollback`` takes a point in time: the point to which we must roll back the
indexer. The indexer must then be back in the state it was in right after we
indexed the block at that point.
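
To make ``index`` and ``rollback`` concrete, here is a minimal, self-contained
sketch. It re-declares simplified copies of ``Point``, ``Timed`` and
``IsIndex`` instead of importing ``marconi-core``, and the ``TxCount`` event
and the ``ListIndexer`` type below are hypothetical toys that use plain slot
numbers as points rather than a ``ChainPoint``.

.. code-block:: haskell

   {-# LANGUAGE FlexibleContexts #-}
   {-# LANGUAGE FlexibleInstances #-}
   {-# LANGUAGE MultiParamTypeClasses #-}
   {-# LANGUAGE TypeFamilies #-}

   -- Simplified local copies of the marconi-core names (not the library's code).

   -- | How the points in time of an event type are represented.
   type family Point event

   -- | A value tagged with the point at which it was produced.
   data Timed point a = Timed {point :: point, value :: a}
     deriving (Show)

   class (Monad m) => IsIndex m event indexer where
     index
       :: (Eq (Point event))
       => Timed (Point event) (Maybe event)
       -> indexer event
       -> m (indexer event)
     rollback :: (Ord (Point event)) => Point event -> indexer event -> m (indexer event)

   -- | Hypothetical event: the number of transactions in a block.
   newtype TxCount = TxCount Int
     deriving (Show)

   -- | Hypothetical choice: slot numbers as points.
   type instance Point TxCount = Int

   -- | Hypothetical in-memory indexer: the stored events (most recent first)
   -- and the last point it was synchronised at.
   data ListIndexer event = ListIndexer
     { events :: [Timed (Point event) event]
     , lastSync :: Maybe (Point event)
     }

   instance (Monad m) => IsIndex m event ListIndexer where
     -- A block without a relevant event still moves the sync point forward.
     index (Timed p mEvent) (ListIndexer evts _) =
       pure (ListIndexer (maybe evts (\e -> Timed p e : evts) mEvent) (Just p))

     -- Forget every event after the rollback point and go back to that point.
     rollback p (ListIndexer evts _) =
       pure (ListIndexer (filter ((<= p) . point) evts) (Just p))

Indexing a block with ``Nothing`` only updates ``lastSync``: this is how the toy
indexer records progress on blocks that carry no relevant event.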

``IsIndex`` provides other functions, such as ``indexAll`` and
``indexAllDescending``, that index a list of events.
Default implementations are provided for these functions, and they can be
overridden to provide a more efficient implementation.

It also provides a ``setLastStablePoint`` method that lets the indexer keep
track of the last stable point and, if needed, react to its progress.
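
The default implementations of the batch functions can be pictured as plain
folds over ``index``. The sketch below assumes a simplified class that takes
lists (the real signatures may differ), and ``Block`` and ``SlotLog`` are
hypothetical names used only for illustration.

.. code-block:: haskell

   {-# LANGUAGE FlexibleInstances #-}
   {-# LANGUAGE MultiParamTypeClasses #-}
   {-# LANGUAGE TypeFamilies #-}

   import Control.Monad (foldM)

   type family Point event

   data Timed point a = Timed point a

   -- Simplified class (an assumed shape): only index is mandatory, the batch
   -- functions have defaults written in terms of index.
   class (Monad m) => IsIndex m event indexer where
     index :: Timed (Point event) (Maybe event) -> indexer event -> m (indexer event)

     -- Events in ascending order: index them one by one.
     indexAll :: [Timed (Point event) (Maybe event)] -> indexer event -> m (indexer event)
     indexAll evts ix = foldM (flip index) ix evts

     -- Events in descending order (most recent first): reverse them first.
     indexAllDescending :: [Timed (Point event) (Maybe event)] -> indexer event -> m (indexer event)
     indexAllDescending evts = indexAll (reverse evts)

   -- Hypothetical event type, with slot numbers as points.
   data Block = Block

   type instance Point Block = Int

   -- Hypothetical indexer that logs the slots it indexes, in indexing order.
   -- It only implements index and relies on the defaults for the rest.
   newtype SlotLog event = SlotLog [Point event]

   instance (Monad m) => IsIndex m event SlotLog where
     index (Timed p _) (SlotLog ps) = pure (SlotLog (ps ++ [p]))

An indexer backed by a database could instead override ``indexAll`` to insert
the whole list in a single batch, which is the kind of optimisation the
default leaves room for.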

Coordinator and Workers
-----------------------

A common indexing scenario is to run several indexers in parallel to index
different parts of the same event. Ideally, we want to organise our indexers as
a tree, each node doing a part of the processing before propagating the
resulting events, and the leaves being indexers that store the relevant part of
these events.

For example, you may want one subtree that focuses on the block
information, while another focuses on the transaction bodies,
or any other split that makes sense for your business logic.

In Marconi, indexers can be grouped with a ``Coordinator``.
A coordinator is a special type of indexer that propagates incoming actions
(index, rollback) to a list of indexers, and handles the lifecycle of these
indexers.

As Haskell is statically typed, and as the indexers that we want to coordinate
may have different types, we need a way to handle a heterogeneous list of
indexers.

There are many different ways to deal with heterogeneous lists in Haskell. For
the coordinator, we decided to wrap each indexer in a type that hides its
implementation details: a ``Worker``.
The type of a worker (slightly simplified) is the following:

.. code-block:: haskell

   data Worker input point = forall indexer event n.
     ( IsIndex n event indexer
     , Point event ~ point
     ) =>
     Worker
       { workerName :: Text
       -- ^ used to identify the worker in logs
       , workerState :: MVar (indexer event)
       -- ^ the indexer controlled by this worker
       , transformInput :: Preprocessor (ExceptT IndexerError IO) point input event
       -- ^ adapts the input given by the coordinator to the worker's event type
       , hoistError :: forall a. n a -> ExceptT IndexerError IO a
       -- ^ adapts the monadic stack of the indexer to the one of the worker
       }

You don't need to understand this type declaration in detail. Here are the
important bits:

- We hide the indexer representation under the worker type. Any concrete indexer
  type will work as long as it implements the ``IsIndex`` typeclass.
- The indexer is put in an ``MVar`` so that it can be accessed from other
  threads. We'll come back to this later.
- We provide a preprocessor, which can be seen as a function that transforms
  the input sent by the coordinator into events.
- ``hoistError`` ensures that we know how to translate the base monad of the
  indexer into ``ExceptT IndexerError IO``, which is the base monad of a worker.

Once we have a list of workers, we can use it to create a coordinator with the
``mkCoordinator`` function.
This function initialises each worker, creating a dedicated thread for each
worker, where the worker waits for incoming actions to perform and notifies the
coordinator when an action is performed.

The coordinator monitors each of the threads and, if one of the workers
encounters an error, it tries to close all the other workers cleanly.

``Coordinator`` itself implements ``IsIndex``, and can thus be wrapped in a
worker itself.
This lets us build a whole hierarchy of indexers that can be controlled from
a main coordinator.
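
The existential wrapper and the propagation of actions through a tree of
coordinators can be illustrated with a toy model. Everything in it is
hypothetical and heavily simplified: ``SimpleIndex`` stands in for ``IsIndex``,
a pure input-to-event function stands in for the preprocessor, and the toy
coordinator feeds its workers one after the other instead of running each of
them in a dedicated thread, with no error handling.

.. code-block:: haskell

   {-# LANGUAGE ExistentialQuantification #-}

   import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)

   -- Hypothetical stand-in for IsIndex: index an event at an Int slot.
   class SimpleIndex indexer where
     indexAt :: Int -> event -> indexer event -> IO (indexer event)

   -- Two unrelated, hypothetical indexer types.
   newtype CountIndexer event = CountIndexer Int

   newtype LastIndexer event = LastIndexer (Maybe (Int, event))

   instance SimpleIndex CountIndexer where
     indexAt _ _ (CountIndexer n) = pure (CountIndexer (n + 1))

   instance SimpleIndex LastIndexer where
     indexAt slot e _ = pure (LastIndexer (Just (slot, e)))

   -- | A worker hides its indexer and event types behind an existential, so
   -- workers built on different indexers fit in the same list.
   data Worker input
     = forall indexer event.
       (SimpleIndex indexer) =>
       Worker String (MVar (indexer event)) (input -> Maybe event)

   -- | Send one input to a worker, skipping it if it carries no relevant event.
   feedWorker :: Int -> input -> Worker input -> IO ()
   feedWorker slot input (Worker _ st toEvent) =
     case toEvent input of
       Nothing -> pure ()
       Just e -> modifyMVar_ st (indexAt slot e)

   -- | Toy coordinator: propagates each input to all its workers.
   newtype Coordinator input = Coordinator [Worker input]

   -- The coordinator is itself an indexer, so it can be wrapped in a worker.
   instance SimpleIndex Coordinator where
     indexAt slot input c@(Coordinator workers) =
       mapM_ (feedWorker slot input) workers >> pure c

   -- | Usage: a root coordinator with a counter and a nested sub-coordinator.
   demo :: IO (Int, Maybe (Int, String))
   demo = do
     countVar <- newMVar (CountIndexer 0 :: CountIndexer Int)
     lastVar <- newMVar (LastIndexer Nothing :: LastIndexer String)
     subVar <- newMVar (Coordinator [Worker "last-shown" lastVar (Just . show)])
     let root =
           Coordinator
             [ Worker "even-counter" countVar (\n -> if even n then Just n else Nothing)
             , Worker "sub-coordinator" subVar Just
             ]
     mapM_ (\(slot, n) -> indexAt slot n root) [(1, 10), (2, 11), (3, 12 :: Int)]
     CountIndexer count <- readMVar countVar
     LastIndexer lastSeen <- readMVar lastVar
     pure (count, lastSeen)

In ``demo``, the root coordinator holds a worker wrapping another coordinator,
which gives a two-level tree: every input sent to the root reaches the leaves
that find it relevant.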


Preprocessor
------------

We saw in the coordinator and workers section that workers take a preprocessor.
As stated there, a preprocessor can be viewed as a stateful function that
transforms the actions sent to an indexer.
Its type is isomorphic to:

.. code-block:: haskell

   [ProcessedInput point a] -> StateT s m [ProcessedInput point b]

It's the first time we encounter ``ProcessedInput``, so it is worth going through
its definition:


.. code-block:: haskell

   data ProcessedInput point event
     = Rollback point
     | Index (Timed point (Maybe event))
     | IndexAllDescending (NonEmpty (Timed point (Maybe event)))
     | StableAt point
     | Stop

It is a reified version (functions expressed as data) of most of the
``IsIndex`` functions, plus a ``Stop`` constructor that allows us to stop a worker.
So the goal of a preprocessor is to take a list of actions that must be sent to
a worker and to transform this list, either by filtering out some actions or by
adding some actions to the list, based on the internal state ``s`` of the
preprocessor.
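
As an illustration, the sketch below implements a hypothetical preprocessor
that drops ``Index`` actions for points it has already forwarded (as could
happen when blocks are replayed after a restart) and converts the remaining
events with a function ``f``. ``skipSeen`` is not part of ``marconi-core``: it
is written as an explicit state-passing function (state in, state out) so that
it only needs ``base``, and ``ProcessedInput`` and ``Timed`` are local copies.

.. code-block:: haskell

   import Data.List (mapAccumL)
   import Data.List.NonEmpty (NonEmpty)
   import qualified Data.List.NonEmpty as NonEmpty

   data Timed point a = Timed point a
     deriving (Eq, Show)

   -- Local copy of the ProcessedInput type above.
   data ProcessedInput point event
     = Rollback point
     | Index (Timed point (Maybe event))
     | IndexAllDescending (NonEmpty (Timed point (Maybe event)))
     | StableAt point
     | Stop
     deriving (Eq, Show)

   -- | Hypothetical preprocessor. Its state is the last point forwarded to the
   -- worker: index actions at or before that point are dropped, and events are
   -- converted with f.
   skipSeen
     :: (Ord point)
     => (a -> b)
     -> Maybe point
     -> [ProcessedInput point a]
     -> (Maybe point, [ProcessedInput point b])
   skipSeen f s0 = fmap concat . mapAccumL step s0
     where
       pointOf (Timed p _) = p
       isNew s t = maybe True (pointOf t >) s
       convert (Timed p e) = Timed p (fmap f e)
       step s input = case input of
         Index t
           | isNew s t -> (Just (pointOf t), [Index (convert t)])
           | otherwise -> (s, [])
         IndexAllDescending ts ->
           -- Most recent first: the head of the remaining events is the new state.
           case NonEmpty.nonEmpty (NonEmpty.filter (isNew s) ts) of
             Nothing -> (s, [])
             Just fresh ->
               (Just (pointOf (NonEmpty.head fresh)), [IndexAllDescending (fmap convert fresh)])
         -- After a rollback, everything after the rollback point is new again.
         Rollback p -> (Just p, [Rollback p])
         StableAt p -> (s, [StableAt p])
         Stop -> (s, [Stop])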

Transformers
------------

Indexer transformers (or, shorter, transformers) are used to alter the behaviour
of an existing indexer.
The name comes from the monad transformers concept, and it was chosen because
monad and indexer transformers are similar: both aim at adding
capabilities to a base implementation.

A transformer carries a state and an underlying indexer.
The typeclass instances of the transformer can then add extra logic to this
underlying indexer.

For example, the ``WithCatchup`` transformer in ``marconi-core`` prepares
batches of events when the blocks we receive are far from the tip, to enable
batch insertion of events.
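
The sketch below shows the general shape of a transformer, loosely inspired by
the batching idea behind ``WithCatchup``. ``Indexer``, ``WriteLog`` and
``WithBatch`` are hypothetical: instead of checking the distance to the tip,
``WithBatch`` simply flushes once a fixed number of events is buffered, and it
ignores rollbacks, which a real transformer would have to handle.

.. code-block:: haskell

   import Control.Monad (foldM)

   -- Hypothetical toy class; points are fixed to Int slots for brevity.
   class Indexer indexer where
     indexOne :: (Int, Maybe event) -> indexer event -> IO (indexer event)
     indexBatch :: [(Int, Maybe event)] -> indexer event -> IO (indexer event)
     indexBatch evts ix = foldM (flip indexOne) ix evts

   -- | Hypothetical base indexer that logs the size of each write it receives.
   newtype WriteLog event = WriteLog [Int]

   instance Indexer WriteLog where
     indexOne _ (WriteLog ws) = pure (WriteLog (ws ++ [1]))
     indexBatch evts (WriteLog ws) = pure (WriteLog (ws ++ [length evts]))

   -- | Hypothetical transformer: its own state (a batch size and a buffer)
   -- plus the underlying indexer, which only receives full batches.
   data WithBatch indexer event = WithBatch
     { batchSize :: Int
     , buffer :: [(Int, Maybe event)] -- most recent first
     , underlying :: indexer event
     }

   instance (Indexer indexer) => Indexer (WithBatch indexer) where
     indexOne e (WithBatch n buf ix)
       | length buf + 1 >= n = do
           ix' <- indexBatch (reverse (e : buf)) ix
           pure (WithBatch n [] ix')
       | otherwise = pure (WithBatch n (e : buf) ix)

Note that ``WithBatch WriteLog`` is a different type from ``WriteLog``:
wrapping an indexer in a transformer changes its type.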

Preprocessor vs Transformers
----------------------------

There are a lot of similarities between preprocessors and transformers.
There are two major differences:

- A ``Preprocessor`` state is not exposed and can't be accessed from the outside,
  while transformers can expose their state.
  This makes preprocessors slightly less powerful.
- Preprocessors don't rely on typeclass implementations. As a consequence,
  they are easier to write and easier to compose in the general case.
  Furthermore, they don't change the type of the indexers.

Despite these differences, preprocessors and transformers are closely related
and, in most scenarios, you should be able to rewrite a preprocessor as a
transformer or vice versa.

Tracking synchronisation progress
---------------------------------

In Marconi, each indexer is independent.
You can reuse data from another indexer in your application, or add or remove
indexers from one run to another, without compromising the other indexers.
A consequence is that each indexer must track its own synchronisation progress.
When ``index`` is called, the indexer keeps track of its last
synchronisation point.
When ``setLastStablePoint`` is called, it is expected to keep track
of the given point as well.

In many scenarios, it is useful to have access to the last sync point and last
stable point of an indexer.
If you need to query different indexers for example, you may want to query them
at the same point in time, to ensure that you get consistent results.
When you restart your indexers, you may want to access their last stable point
to know from which block you must restart the synchronisation.

To expose this information, we usually implement the ``IsSync`` typeclass.
Implementing this typeclass is required to put an indexer in a worker.
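
The two uses of the synchronisation information described above can be
sketched as follows. The two-method ``IsSync`` class is an assumed, simplified
shape, and ``Progress``, ``commonSyncPoint`` and ``resumePoint`` are
hypothetical helpers that only handle indexers of the same type.

.. code-block:: haskell

   {-# LANGUAGE FlexibleContexts #-}
   {-# LANGUAGE FlexibleInstances #-}
   {-# LANGUAGE MultiParamTypeClasses #-}
   {-# LANGUAGE TypeFamilies #-}

   type family Point event

   -- Simplified, assumed shape of IsSync.
   class IsSync m event indexer where
     lastSyncPoint :: indexer event -> m (Point event)
     lastStablePoint :: indexer event -> m (Point event)

   -- Hypothetical event type, with slot numbers as points.
   data Block = Block

   type instance Point Block = Int

   -- | Hypothetical indexer that only tracks its progress:
   -- last sync point, then last stable point.
   data Progress event = Progress (Point event) (Point event)

   instance (Applicative m) => IsSync m event Progress where
     lastSyncPoint (Progress synced _) = pure synced
     lastStablePoint (Progress _ stable) = pure stable

   earliest :: (Ord a) => [a] -> Maybe a
   earliest [] = Nothing
   earliest xs = Just (minimum xs)

   -- | A point that every indexer has reached: querying all of them at this
   -- point gives consistent results.
   commonSyncPoint
     :: (Monad m, Ord (Point event), IsSync m event indexer)
     => [indexer event]
     -> m (Maybe (Point event))
   commonSyncPoint ixs = earliest <$> mapM lastSyncPoint ixs

   -- | A point that no indexer can be rolled back past: a candidate for
   -- restarting the synchronisation of all of them.
   resumePoint
     :: (Monad m, Ord (Point event), IsSync m event indexer)
     => [indexer event]
     -> m (Maybe (Point event))
   resumePoint ixs = earliest <$> mapM lastStablePoint ixs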

Queryable
---------

The whole point of an indexer is to expose information about the events
it indexes.
In Marconi, this is done through the ``Queryable`` typeclass, which has the
following definition:

.. code-block:: haskell

   class Queryable m event query indexer where
     query
       :: (Ord (Point event))
       => Point event
       -> query
       -> indexer event
       -> m (Result query)

To define a ``Queryable`` instance, you need to provide the context ``m`` in
which it operates, the ``event`` that the indexer must handle to be able to
answer the query, the query type and the indexer implementation that can answer
the query.

Then, you need to implement the ``query`` method, which takes a point,
a ``query`` and an indexer to provide a ``Result query``.
``Result`` is a type family that associates a query with its result.
The point passed to ``query`` is both a point that the indexer must have
reached and, if applicable, the upper bound of the results we consider.
When you need to run several queries against different indexers, passing the
same point to each query ensures that the results are consistent.

In many situations, we just want access to the freshest information of an
indexer. In these scenarios, one can use the ``queryLatest`` function.
``queryLatest`` requires the indexer to implement both ``Queryable`` and
``IsSync``. It fetches the last sync point for you and passes it to the query.
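
Here is a sketch of a ``Queryable`` instance, and of how a function like
``queryLatest`` can be written on top of ``IsSync``. The classes are
simplified local copies; ``TxCount``, ``TotalTxs`` and ``ListIndexer`` are
hypothetical, and using ``Either String`` as the context ``m`` to report an
unreached point is an assumption of this sketch, not necessarily how
``marconi-core`` reports query errors.

.. code-block:: haskell

   {-# LANGUAGE FlexibleContexts #-}
   {-# LANGUAGE FlexibleInstances #-}
   {-# LANGUAGE MultiParamTypeClasses #-}
   {-# LANGUAGE TypeFamilies #-}

   type family Point event

   type family Result query

   -- Simplified local copies of the two typeclasses.
   class Queryable m event query indexer where
     query :: (Ord (Point event)) => Point event -> query -> indexer event -> m (Result query)

   class IsSync m event indexer where
     lastSyncPoint :: indexer event -> m (Point event)

   -- Hypothetical event: the number of transactions in a block, at an Int slot.
   newtype TxCount = TxCount Int

   type instance Point TxCount = Int

   -- Hypothetical query: the total number of transactions up to a point.
   data TotalTxs = TotalTxs

   type instance Result TotalTxs = Int

   -- | Hypothetical indexer: its last sync point and its (point, event) pairs.
   data ListIndexer event = ListIndexer (Point event) [(Point event, event)]

   instance IsSync (Either String) event ListIndexer where
     lastSyncPoint (ListIndexer synced _) = Right synced

   instance Queryable (Either String) TxCount TotalTxs ListIndexer where
     query upTo TotalTxs (ListIndexer synced evts)
       -- The indexer must have reached the requested point...
       | upTo > synced = Left "the indexer has not reached this point yet"
       -- ...which is also the upper bound of the events taken into account.
       | otherwise = Right (sum [n | (p, TxCount n) <- evts, p <= upTo])

   -- | Sketch of queryLatest: query at the indexer's own last sync point.
   queryLatest
     :: (Monad m, Ord (Point event), IsSync m event indexer, Queryable m event query indexer)
     => query
     -> indexer event
     -> m (Result query)
   queryLatest q ix = do
     p <- lastSyncPoint ix
     query p q ix
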
77 changes: 35 additions & 42 deletions marconi-core/src/Marconi/Core.hs
@@ -9,29 +9,19 @@
{-# OPTIONS_GHC -Wno-redundant-constraints #-}

{- |
@Marconi.Core@ re-exports most of the content of the sub-modules and, in most scenarios,
provides most of what you need to set up a chain-indexing solution.
= Features
Marconi provides the following features out of the box:
* full in-memory indexers;
* indexers backed by a simple file;
* SQLite indexers;
* indexer transformers, that add capabilities (logging, caching...) to an indexer;
* groups of indexers, synchronised as a single indexer;
* mixed indexers, which allow a different type of storage for recent events and older ones.
= Terminology
@@ -70,8 +60,7 @@
(it allows us to opt-in for traces if we want, indexer by indexer)
* Transform to change the input type of an indexer
Contrary to the original Marconi design,
indexers can have different implementations (SQLite, in-memory...).
@SQLite@ indexers are the most common ones. For specific scenarios, you may want to combine
them with a mixed indexer or to go for other types of indexers.
We may consider database backends other than @SQLite@ in the future.
@@ -87,24 +76,9 @@
== Define an indexer instance
In most cases, you want to define an `SQLiteIndexer`,
which stores the incoming events in an SQLite database.
Other types of indexers exist; some of them are detailed below as well.
== Define an indexer instance for 'SQLiteIndexer'
@@ -143,6 +117,25 @@
There's no helper on this one, but you probably want to query the database and
to aggregate the query result in your @Result@ type.
=== Define an indexer instance for 'ListIndexer'
1. You need to define a type for @event@ (the input of your indexer).
As soon as it's done, define the 'Point' type instance for this event.
'Point' is a way to know when the event was emitted.
It can be a time, a slot number, a block, whatever information that tracks when
an event happens.
2. This is already enough to index `Timed` values of the events you defined at step 1
and to roll back your indexer.
You can already test it by creating an indexer with `listIndexer`.
3. Define a @query@ type and the corresponding 'Result' type.
4. Then, for this query you need to define the 'Queryable' instance that
corresponds to your indexer.
5. The 'ListIndexer' is ready.
=== Define an indexer instance for a 'MixedIndexer'
Follow in order the steps for the creation of a 'ListIndexer' (the in-memory part)
@@ -157,7 +150,7 @@
== Write a new indexer
Most users probably /don't/ want to do this.
A good reason is to add support for another backend
(another database or another in-memory structure)
@@ -166,9 +159,9 @@
* 'IsSync'
* 'IsIndex'
* 'Queryable'
* 'Closeable' (if you plan to use it in a worker, and you probably plan to)
* 'AppendResult' (if you plan to use it as the in-memory part of a 'MixedIndexer')
Best practice is to implement as many 'event'/'query'-agnostic instances
of the typeclasses of this module as possible for the new indexer.
