This repository has been archived by the owner on Oct 31, 2024. It is now read-only.

Commit

chore: adding more docs to the storage
Signed-off-by: Simon Paitrault <[email protected]>
Freyskeyd committed Oct 31, 2023
1 parent 45c75a4 commit a8c4688
Showing 12 changed files with 184 additions and 544 deletions.
62 changes: 38 additions & 24 deletions crates/topos-tce-storage/README.md
@@ -21,47 +21,61 @@ As an overview, the storage layer is composed of the following stores:
<img alt="Text changing depending on mode. Light: 'So light!' Dark: 'So dark!'" src="https://github.com/topos-protocol/topos/assets/1394604/e4bd859e-2a6d-40dc-8e84-2a708aa8a2d8">
</picture>

### Usage
#### Definitions and Responsibilities

Each store represents a different kind of capability, but they all act alike and need the same kind
of configuration in order to work.
As illustrated above, multiple `stores` are exposed in the library using various `tables`.

For instance, the [`EpochValidatorsStore`](struct@epoch::EpochValidatorsStore) only needs a [`PathBuf`](struct@std::path::PathBuf)
argument to be instantiated, whereas [`FullNodeStore`](struct@fullnode::FullNodeStore) needs a few more arguments.
The difference between a `store` and a `table` is that the `table` is responsible for storing
the data while the `store` is responsible for managing the data and its access and behaviour.

The underlying mechanism of how data is stored is fairly simple: it relies heavily on [`rocksdb`] and is
described below.
Here's the list of the different stores and their responsibilities:

As an example, in order to create a new [`EpochValidatorsStore`](struct@epoch::EpochValidatorsStore) you need to provide a
path where the [`rocksdb`] database will be placed:
- The [`EpochValidatorsStore`](struct@epoch::EpochValidatorsStore) is responsible for managing the list of validators for each `epoch`.
- The [`FullNodeStore`](struct@fullnode::FullNodeStore) is responsible for managing all the persistent data, such as delivered [`Certificate`]s and associated `streams`.
- The [`IndexStore`](struct@index::IndexStore) is responsible for managing indexes in order to collect information about the broadcast and the network.
- The [`ValidatorStore`](struct@validator::ValidatorStore) is responsible for managing the pending certificates pool and all the transient and volatile data.

```rust
use std::{path::PathBuf, sync::Arc};
use epoch::EpochValidatorsStore;

// Any writable directory works; "/tmp/topos-storage" is illustrative.
let mut path = PathBuf::from("/tmp/topos-storage");
path.push("epoch");
let store: Arc<EpochValidatorsStore> = EpochValidatorsStore::new(path).unwrap();
```
For more information about a `store`, see the related doc.

Next is the list of the different tables and their responsibilities:

- The [`EpochValidatorsTables`](struct@epoch::EpochValidatorsTables) is responsible for storing the list of validators for each `epoch`.
- The [`ValidatorPerpetualTables`](struct@validator::ValidatorPerpetualTables) is responsible for storing the delivered [`Certificate`]s and all the persistent data related to the broadcast.
- The [`ValidatorPendingTables`](struct@validator::ValidatorPendingTables) is responsible for storing the pending certificates pool and all the transient and volatile data.
- The [`IndexTables`](struct@index::IndexTables) is responsible for storing indexes about the delivery of [`Certificate`] such as `target subnet stream`.
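The split between a `store` and its `tables` can be sketched with a std-only analogue. Everything below is illustrative: the real tables are `rocksdb` column families, and `IndexStoreSketch`, `SourceListTable`, and the simplified key/value types are hypothetical stand-ins, not the crate's API.

```rust
use std::collections::BTreeMap;

type SubnetId = u64;
type Position = u64;

// A `table` only stores typed key/value pairs (stand-in for a rocksdb column family).
#[derive(Default)]
struct SourceListTable {
    inner: BTreeMap<SubnetId, (String, Position)>, // CertificateId simplified to String
}

// A `store` owns tables and defines how the data is accessed and kept consistent.
#[derive(Default)]
struct IndexStoreSketch {
    source_list: SourceListTable,
}

impl IndexStoreSketch {
    // The store decides the access pattern; the table just holds the data.
    fn record_delivery(&mut self, subnet: SubnetId, cert: &str, position: Position) {
        self.source_list.inner.insert(subnet, (cert.to_string(), position));
    }

    fn head_position(&self, subnet: SubnetId) -> Option<Position> {
        self.source_list.inner.get(&subnet).map(|(_, p)| *p)
    }
}

fn main() {
    let mut store = IndexStoreSketch::default();
    store.record_delivery(1, "cert-a", 0);
    store.record_delivery(1, "cert-b", 1);
    assert_eq!(store.head_position(1), Some(1));
}
```

The point of the split: callers only ever talk to the store, so the on-disk layout of a table can change without touching any call site.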

### Special Considerations

When using the storage layer, you need to be aware of the following:
- The storage layer is using [`rocksdb`] as a backend, which means that the data is stored on disk.
- The storage layer is using [`Arc`](struct@std::sync::Arc) to share the stores between threads.
- The storage layer is using [`async_trait`](https://docs.rs/async-trait/0.1.51/async_trait/) to expose methods that need to manage locks (see [`WriteStore`](trait@store::WriteStore)).
- Some functions use [`DBBatch`](struct@rocks::db_column::DBBatch) to batch multiple writes in one transaction, but not all functions do.
- The storage layer is using [rocksdb](https://rocksdb.org/) as a backend, which means that this storage doesn't need an external service, as `rocksdb` is an embeddable key-value store.
- The storage layer is using [`Arc`](struct@std::sync::Arc) to share the stores between threads. It also means that a `store` is only instantiated once.
- Some functions batch multiple writes in one transaction, but not all of them do.
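Batching, as mentioned in the last point above, stages several writes and applies them as one unit. Here is a minimal std-only sketch of the idea under a simplified key-value model; the real code goes through rocksdb's write-batch mechanism, and `Batch`, `Db`, and `Op` are illustrative names, not the crate's API.

```rust
use std::collections::HashMap;

// Staged operations, applied together (sketch of a write-batch-style API).
enum Op {
    Put(String, String),
    Delete(String),
}

#[derive(Default)]
struct Batch {
    ops: Vec<Op>,
}

impl Batch {
    fn put(&mut self, key: &str, value: &str) {
        self.ops.push(Op::Put(key.into(), value.into()));
    }
    fn delete(&mut self, key: &str) {
        self.ops.push(Op::Delete(key.into()));
    }
}

#[derive(Default)]
struct Db {
    data: HashMap<String, String>,
}

impl Db {
    // All staged ops become visible together; callers never observe a half-applied batch.
    fn write(&mut self, batch: Batch) {
        for op in batch.ops {
            match op {
                Op::Put(k, v) => { self.data.insert(k, v); }
                Op::Delete(k) => { self.data.remove(&k); }
            }
        }
    }
}

fn main() {
    let mut db = Db::default();
    let mut batch = Batch::default();
    batch.put("cert/1", "delivered");
    batch.put("stream/1", "pos:0");
    batch.delete("pending/1");
    db.write(batch);
    assert_eq!(db.data.get("cert/1").map(String::as_str), Some("delivered"));
}
```

In this single-threaded sketch atomicity is trivial; in rocksdb it is the write batch itself that guarantees the grouped writes land together.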

### Design Philosophy

The choice of using [`rocksdb`] as a backend was made because it is a well-known and battle-tested database.
It is also very fast and efficient when it comes to writing and reading data. However, it is not the best when it comes
to composing or filtering data. This is why we have multiple stores that are used for different purposes.
The choice of using [rocksdb](https://rocksdb.org/) as a backend was made because it is a well-known and battle-tested database.
It is also very fast and efficient when it comes to writing and reading data.

Multiple `stores` and `tables` exist in order to allow administrators to deal with backups or
snapshots as they see fit. You can pick and choose which `tables` you want to back up without having to back up the whole database.

By splitting the data into dedicated tables, we define a strong separation of concerns
directly in our storage.

`RocksDB` is, however, not the best fit when it comes to composing or filtering data based on the data
itself.

For complex queries, another database like [`PostgreSQL`](https://www.postgresql.org/) or [`CockroachDB`](https://www.cockroachlabs.com/) could be used as storage for projections.
The source of truth would still be [`rocksdb`], but the projections would be stored in a relational database, allowing for more complex queries.
The source of truth would still be [rocksdb](https://rocksdb.org/), but the projections would be stored in a relational database, allowing for more complex queries.

As mentioned above, the different stores are using [`Arc`](struct@std::sync::Arc), allowing a single store to be instantiated once
and then shared between threads. This is very useful when it comes to the [`FullNodeStore`](struct@fullnode::FullNodeStore) as it is used in various places.
and then shared between threads. This is very useful when it comes to the [`FullNodeStore`](struct@fullnode::FullNodeStore) as it is used in various places but needs to provide a single entry point to the data.

It also means that the store is immutable, which is a good thing when it comes to concurrency.
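The `Arc`-based sharing works like this std-only sketch: one store instance is created, cheap handles are cloned into each thread, and reads need no extra locking because the handle itself is immutable (`StoreSketch` is an illustrative stand-in, not the crate's type).

```rust
use std::sync::Arc;
use std::thread;

// An immutable store handle; any state that must change would live behind
// its own synchronization (locks), not behind `&mut` access to the store.
struct StoreSketch {
    name: &'static str,
}

fn main() {
    let store = Arc::new(StoreSketch { name: "fullnode" });

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let store = Arc::clone(&store); // cheap pointer clone, same instance
            thread::spawn(move || format!("worker {i} sees store `{}`", store.name))
        })
        .collect();

    for h in handles {
        let line = h.join().unwrap();
        assert!(line.contains("fullnode"));
    }
}
```

Because every thread holds the same instance, caches and locks inside the store are shared too, which is exactly what a single entry point to the data requires.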

The burden of managing the locks is handled by the [`async_trait`](https://docs.rs/async-trait/0.1.51/async_trait/) crate when using the [`WriteStore`](trait@store::WriteStore).
The rest of the mutations on the data are handled by [`rocksdb`] itself.

The locks are responsible for preventing any other query from mutating the data currently being processed. For more information about the locks, see [`locking`](module@fullnode::locking).
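The actual guards live in the `fullnode::locking` module; one common shape for such per-certificate locks is a map from certificate id to its own mutex, sketched here with std types (`LockGuardsSketch` is an illustrative name, not the crate's API).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// One lock per certificate id: writers touching the same certificate serialize,
// while writers touching different certificates proceed in parallel.
#[derive(Default)]
struct LockGuardsSketch {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl LockGuardsSketch {
    fn get_lock(&self, certificate_id: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(certificate_id.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

fn main() {
    let guards = LockGuardsSketch::default();
    let a1 = guards.get_lock("cert-a");
    let a2 = guards.get_lock("cert-a");
    // Both handles point at the same underlying lock.
    assert!(Arc::ptr_eq(&a1, &a2));
    let _held = a1.lock().unwrap(); // mutation of "cert-a" is now exclusive
}
```

In the async code the same pattern would use an async-aware mutex so that waiting on a busy certificate does not block the executor thread.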

The rest of the mutations on the data are handled by [rocksdb](https://rocksdb.org/) itself.

2 changes: 1 addition & 1 deletion crates/topos-tce-storage/src/fullnode/mod.rs
@@ -24,7 +24,7 @@ use crate::{

use self::locking::LockGuards;

mod locking;
pub mod locking;

/// Store to manage FullNode data
///
9 changes: 4 additions & 5 deletions crates/topos-tce-storage/src/index/mod.rs
@@ -2,7 +2,7 @@ use std::{fs::create_dir_all, path::PathBuf};

use rocksdb::ColumnFamilyDescriptor;
use topos_core::{
types::stream::{CertificateTargetStreamPosition, Position},
types::stream::Position,
uci::{CertificateId, SubnetId},
};
use tracing::warn;
@@ -13,9 +13,8 @@ use crate::{
constants,
db::{default_options, init_with_cfs},
db_column::DBColumn,
TargetSourceListKey,
},
types::CertificateSequenceNumber,
types::{CertificateSequenceNumber, TargetSourceListColumn, TargetStreamsColumn},
};

#[allow(unused)]
@@ -24,8 +23,8 @@ pub(crate) struct IndexStore {
}

pub struct IndexTables {
pub(crate) target_streams: DBColumn<CertificateTargetStreamPosition, CertificateId>,
pub(crate) target_source_list: DBColumn<TargetSourceListKey, Position>,
pub(crate) target_streams: TargetStreamsColumn,
pub(crate) target_source_list: TargetSourceListColumn,
pub(crate) source_list: DBColumn<SubnetId, (CertificateId, Position)>,
pub(crate) source_list_per_target: DBColumn<(SubnetId, SubnetId), bool>,
}
167 changes: 41 additions & 126 deletions crates/topos-tce-storage/src/lib.rs
@@ -19,49 +19,70 @@
//! <img alt="Text changing depending on mode. Light: 'So light!' Dark: 'So dark!'" src="https://github.com/topos-protocol/topos/assets/1394604/e4bd859e-2a6d-40dc-8e84-2a708aa8a2d8">
//!</picture>
//!
//! ## Usage
//! ### Definitions and Responsibilities
//!
//! Each store represents a different kind of capability, but they all act alike and need the same kind
//! of configuration in order to work.
//! As illustrated above, multiple `stores` are exposed in the library using various `tables`.
//!
//! For instance, the [`EpochValidatorsStore`](struct@epoch::EpochValidatorsStore) only needs a [`PathBuf`](struct@std::path::PathBuf)
//! argument to be instantiated, whereas [`FullNodeStore`](struct@fullnode::FullNodeStore) needs a few more arguments.
//! The difference between a `store` and a `table` is that the `table` is responsible for storing
//! the data while the `store` is responsible for managing the data and its access and behaviour.
//!
//! The underlying mechanism of how data is stored is fairly simple: it relies heavily on [`rocksdb`] and is
//! described below.
//! Here's the list of the different stores and their responsibilities:
//!
//! - The [`EpochValidatorsStore`](struct@epoch::EpochValidatorsStore) is responsible for managing the list of validators for each `epoch`.
//! - The [`FullNodeStore`](struct@fullnode::FullNodeStore) is responsible for managing all the persistent data, such as delivered [`Certificate`]s and associated `streams`.
//! - The [`IndexStore`](struct@index::IndexStore) is responsible for managing indexes in order to collect information about the broadcast and the network.
//! - The [`ValidatorStore`](struct@validator::ValidatorStore) is responsible for managing the pending certificates pool and all the transient and volatile data.
//!
//! For more information about a `store`, see the related doc.
//!
//! Next is the list of the different tables and their responsibilities:
//!
//! - The [`EpochValidatorsTables`](struct@epoch::EpochValidatorsTables) is responsible for storing the list of validators for each `epoch`.
//! - The [`ValidatorPerpetualTables`](struct@validator::ValidatorPerpetualTables) is responsible for storing the delivered [`Certificate`]s and all the persistent data related to the broadcast.
//! - The [`ValidatorPendingTables`](struct@validator::ValidatorPendingTables) is responsible for storing the pending certificates pool and all the transient and volatile data.
//! - The [`IndexTables`](struct@index::IndexTables) is responsible for storing indexes about the delivery of [`Certificate`] such as `target subnet stream`.
//!
//! ## Special Considerations
//!
//! When using the storage layer, you need to be aware of the following:
//! - The storage layer is using [`rocksdb`] as a backend, which means that the data is stored on disk.
//! - The storage layer is using [`Arc`](struct@std::sync::Arc) to share the stores between threads.
//! - The storage layer is using [`async_trait`](https://docs.rs/async-trait/0.1.51/async_trait/) to expose methods that need to manage locks (see [`WriteStore`](trait@store::WriteStore)).
//! - Some functions use [`DBBatch`](struct@rocks::db_column::DBBatch) to batch multiple writes in one transaction, but not all functions do.
//! - The storage layer is using [rocksdb](https://rocksdb.org/) as a backend, which means that this storage doesn't need an external service, as `rocksdb` is an embeddable key-value store.
//! - The storage layer is using [`Arc`](struct@std::sync::Arc) to share the stores between threads. It also means that a `store` is only instantiated once.
//! - Some functions batch multiple writes in one transaction, but not all of them do.
//!
//! ## Design Philosophy
//!
//! The choice of using [`rocksdb`] as a backend was made because it is a well-known and battle-tested database.
//! It is also very fast and efficient when it comes to writing and reading data. However, it is not the best when it comes
//! to composing or filtering data. This is why we have multiple stores that are used for different purposes.
//! The choice of using [rocksdb](https://rocksdb.org/) as a backend was made because it is a well-known and battle-tested database.
//! It is also very fast and efficient when it comes to writing and reading data.
//!
//! Multiple `stores` and `tables` exist in order to allow administrators to deal with backups or
//! snapshots as they see fit. You can pick and choose which `tables` you want to back up without having to back up the whole database.
//!
//! By splitting the data into dedicated tables, we define a strong separation of concerns
//! directly in our storage.
//!
//! `RocksDB` is, however, not the best fit when it comes to composing or filtering data based on the data
//! itself.
//!
//! For complex queries, another database like [`PostgreSQL`](https://www.postgresql.org/) or [`CockroachDB`](https://www.cockroachlabs.com/) could be used as storage for projections.
//! The source of truth would still be [`rocksdb`], but the projections would be stored in a relational database, allowing for more complex queries.
//! The source of truth would still be [rocksdb](https://rocksdb.org/), but the projections would be stored in a relational database, allowing for more complex queries.
//!
//! As mentioned above, the different stores are using [`Arc`](struct@std::sync::Arc), allowing a single store to be instantiated once
//! and then shared between threads. This is very useful when it comes to the [`FullNodeStore`](struct@fullnode::FullNodeStore) as it is used in various places.
//! and then shared between threads. This is very useful when it comes to the [`FullNodeStore`](struct@fullnode::FullNodeStore) as it is used in various places but needs to provide a single entry point to the data.
//!
//! It also means that the store is immutable, which is a good thing when it comes to concurrency.
//!
//! The burden of managing the locks is handled by the [`async_trait`](https://docs.rs/async-trait/0.1.51/async_trait/) crate when using the [`WriteStore`](trait@store::WriteStore).
//! The rest of the mutations on the data are handled by [`rocksdb`] itself.
//!
use errors::InternalStorageError;
use rocks::iterator::ColumnIterator;
//! The locks are responsible for preventing any other query from mutating the data currently being processed. For more information about the locks, see [`locking`](module@fullnode::locking)
//!
//! The rest of the mutations on the data are handled by [rocksdb](https://rocksdb.org/) itself.
//!
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

use topos_core::{
types::stream::{CertificateSourceStreamPosition, CertificateTargetStreamPosition, Position},
uci::{Certificate, CertificateId, SubnetId},
uci::{CertificateId, SubnetId},
};

// v2
@@ -72,7 +93,6 @@ pub mod epoch;
pub mod fullnode;
pub mod index;
pub mod types;
/// Everything that is needed to participate in the protocol
pub mod validator;

// v1
@@ -128,108 +148,3 @@ pub struct SourceHead {
/// Position of the Certificate
pub position: Position,
}

/// Define possible status of a certificate
#[derive(Debug, Deserialize, Serialize)]
pub enum CertificateStatus {
Pending,
Delivered,
}

/// The `Storage` trait defines methods to interact and manage with the persistency layer
#[async_trait::async_trait]
pub trait Storage: Sync + Send + 'static {
async fn get_pending_certificate(
&self,
certificate_id: CertificateId,
) -> Result<(PendingCertificateId, Certificate), InternalStorageError>;

/// Add a pending certificate to the pool
async fn add_pending_certificate(
&self,
certificate: &Certificate,
) -> Result<PendingCertificateId, InternalStorageError>;

/// Persist the certificate with given status
async fn persist(
&self,
certificate: &Certificate,
pending_certificate_id: Option<PendingCertificateId>,
) -> Result<CertificatePositions, InternalStorageError>;

/// Update the certificate entry with new status
async fn update(
&self,
certificate_id: &CertificateId,
status: CertificateStatus,
) -> Result<(), InternalStorageError>;

/// Returns the source heads of given subnets
async fn get_source_heads(
&self,
subnets: Vec<SubnetId>,
) -> Result<Vec<crate::SourceHead>, InternalStorageError>;

/// Returns the certificate data given their id
async fn get_certificates(
&self,
certificate_ids: Vec<CertificateId>,
) -> Result<Vec<Certificate>, InternalStorageError>;

/// Returns the certificate data given its id
async fn get_certificate(
&self,
certificate_id: CertificateId,
) -> Result<Certificate, InternalStorageError>;

/// Returns the certificates emitted by a given subnet
/// Ranged by position since emitted Certificates are totally ordered
async fn get_certificates_by_source(
&self,
source_subnet_id: SubnetId,
from: Position,
limit: usize,
) -> Result<Vec<CertificateId>, InternalStorageError>;

/// Returns the certificates received by a given subnet
/// Ranged by timestamps since received Certificates are not referable by position
async fn get_certificates_by_target(
&self,
target_subnet_id: SubnetId,
source_subnet_id: SubnetId,
from: Position,
limit: usize,
) -> Result<Vec<CertificateId>, InternalStorageError>;

/// Returns all the known Certificate that are not delivered yet
async fn get_pending_certificates(
&self,
) -> Result<Vec<(PendingCertificateId, Certificate)>, InternalStorageError>;

/// Returns the next Certificate that are not delivered yet
async fn get_next_pending_certificate(
&self,
starting_at: Option<usize>,
) -> Result<Option<(PendingCertificateId, Certificate)>, InternalStorageError>;

/// Remove a certificate from pending pool
async fn remove_pending_certificate(
&self,
index: PendingCertificateId,
) -> Result<(), InternalStorageError>;

async fn get_target_stream_iterator(
&self,
target: SubnetId,
source: SubnetId,
position: Position,
) -> Result<
ColumnIterator<'_, CertificateTargetStreamPosition, CertificateId>,
InternalStorageError,
>;

async fn get_source_list_by_target(
&self,
target: SubnetId,
) -> Result<Vec<SubnetId>, InternalStorageError>;
}