diff --git a/README.md b/README.md
index 59281e86..eee61b62 100644
--- a/README.md
+++ b/README.md
@@ -54,34 +54,60 @@ _Oura_ running in `daemon` mode can be configured to use custom filters to pinpo
 
 If the available out-of-the-box features don't satisfy your particular use-case, _Oura_ can be used as a library in your Rust project to set up tailor-made pipelines. Each component (sources, filters, sinks, etc) in _Oura_ aims to be self-contained and reusable. For example, custom filters and sinks can be built while reusing the existing sources.
 
+## How it Works
+
+Oura is, in essence, just a pipeline for processing events. Each stage of the pipeline fulfills a different role:
+
+- Source Stages: in charge of pulling data from the blockchain and mapping the raw blocks into smaller, more granular events. Each event is then sent through the output port of the stage for further processing.
+- Filter Stages: receive individual events from the source stage and apply some sort of transformation to each one. The transformations applied will depend on the particular use-case, but they usually revolve around selecting relevant events and enriching them with extra information.
+- Sink Stages: receive the final events from the filter stage and submit the payload to some external system, database or service for further processing.
+
+![diagram](assets/diagram.png)
+
 ## Feature Status
 
 - Sources
   - [x] chain-sync full-block (node-to-client)
   - [ ] chain-sync headers-only (node-to-node)
   - [x] chain-sync + block-fetch (node-to-node)
-  - [ ] shared file system
 - Sinks
   - [x] Kafka topic
   - [x] Elasticsearch index / data stream
+  - [x] Rotating log files with compression
   - [ ] Redis streams
   - [ ] AWS SQS queue
   - [ ] AWS Lambda call
   - [ ] GCP PubSub
   - [x] webhook (http post)
   - [x] terminal (append-only, tail-like)
-  - [ ] TUI
+- Events / Parsers
+  - [x] block events (start, end)
+  - [x] transaction events (inputs, outputs, assets)
+  - [x] metadata events (labels, content)
+  - [x] mint events (policy, asset, quantity)
+  - [x] pool registration events
+  - [x] delegation events
+  - [x] CIP-25 metadata parser (image, files)
+  - [ ] CIP-15 metadata parser
 - Filters
-  - [x] by event type (block, tx, mint, cert, etc)
-  - [x] by asset subject (policy, name, etc)
-  - [x] by metadata keys
-  - [ ] by block property (size, tx count)
-  - [ ] by tx property (fee, has native script, has plutus script, etc)
-  - [ ] by utxo property (address, asset, amount range)
-- Enrichment
-  - [ ] policy info from metadata service
-  - [ ] input tx info from Blockfrost api
-  - [ ] address translation from ADAHandle
+  - [x] cherry pick by event type (block, tx, mint, cert, etc)
+  - [x] cherry pick by asset subject (policy, name, etc)
+  - [x] cherry pick by metadata keys
+  - [ ] cherry pick by block property (size, tx count)
+  - [ ] cherry pick by tx property (fee, has native script, has plutus script, etc)
+  - [ ] cherry pick by utxo property (address, asset, amount range)
+  - [ ] enrich events with policy info from an external metadata service
+  - [ ] enrich input tx info from the Blockfrost API
+  - [ ] enrich address descriptions using ADAHandle
+- Other
+  - [x] stateful chain cursor to recover from restarts
+  - [ ] buffer stage to hold blocks until they reach a certain depth
+
+## Known Limitations
+
+- Oura only knows how to process blocks from the Shelley era. We are working on adding support for Byron in a future release.
+- Oura reads events from minted blocks / transactions. Support for querying the mempool is planned for a future release.
+- Oura notifies about chain rollbacks as a new event. The business logic for "undoing" already-processed events is the responsibility of the consumer. We're working on adding support for a "buffer" filter stage that can hold blocks until they reach a configurable depth (number of confirmations).
 
 ## Contributing
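To make the source, filter and sink stage flow from the new "How it Works" section concrete, here is a minimal sketch of a `daemon.toml` wiring the three stage types together. It reuses the placeholder type names (`"X"`, `"Y"`) from the daemon docs skeleton and the Logs sink fields introduced in this diff; everything else is illustrative rather than a documented default.

```toml
# Sketch of a daemon.toml pipeline, assuming the skeleton shown in the
# daemon docs. "X" and "Y" are the same placeholder type names used there;
# the [sink] stanza reuses the Logs sink fields documented in this diff.

# source stage: pulls blocks and maps them into granular events
[source]
type = "X"

# filter stages: applied in sequence to every event
[[filters]]
type = "Y"

# sink stage: submits the final events to an external system
[sink]
type = "Logs"
output_path = "/var/oura/mainnet"
max_bytes_per_file = 1_000_000
```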
diff --git a/assets/diagram.png b/assets/diagram.png
new file mode 100644
index 00000000..6e974b31
Binary files /dev/null and b/assets/diagram.png differ
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md
index 99b5c59d..794eb7c7 100644
--- a/book/src/SUMMARY.md
+++ b/book/src/SUMMARY.md
@@ -9,6 +9,7 @@
 - [Usage](./usage/README.md)
   - [Watch Mode](./usage/watch.md)
   - [Daemon Mode](./usage/daemon.md)
+  - [Dump Mode](./usage/dump.md)
   - [Library](./usage/library.md)
 - [Filters](./filters/README.md)
   - [Fingerprint](./filters/fingerprint.md)
@@ -21,6 +22,7 @@
   - [Kafka](./sinks/kafka.md)
   - [Elasticsearch](./sinks/elastic.md)
   - [Webhook](./sinks/webhook.md)
+  - [Logs](./sinks/logs.md)
 - [Reference](reference/README.md)
   - [Data Dictionary](./reference/data_dictionary.md)
 - [Guides](./guides/README.md)
diff --git a/book/src/sinks/README.md b/book/src/sinks/README.md
index 81a0888e..a691023b 100644
--- a/book/src/sinks/README.md
+++ b/book/src/sinks/README.md
@@ -10,5 +10,6 @@ These are the existing sinks that are included as part of the main _Oura_ codebase:
 - [Kafka](kafka.md): a sink that sends each event into a Kafka topic.
 - [Elasticsearch](elastic.md): a sink that writes events into an Elasticsearch index or data stream.
 - [Webhook](webhook.md): a sink that outputs each event as an HTTP call to a remote endpoint.
+- [Logs](logs.md): a sink that saves events to the file system using JSONL text files.
 
 New sinks are being developed; information will be added to this documentation to reflect the updated list. Contributions and feature requests are welcome in our [Github Repo](https://github.com/txpipe/oura).
\ No newline at end of file
diff --git a/book/src/sinks/logs.md b/book/src/sinks/logs.md
new file mode 100644
index 00000000..a1c8512f
--- /dev/null
+++ b/book/src/sinks/logs.md
@@ -0,0 +1,26 @@
+# Logs
+
+A sink that saves events into the file system. Each event is JSON-encoded and appended to the end of a text file. Files are rotated once they reach a certain size. Optionally, old files can be compressed automatically once they have been rotated.
+
+## Configuration
+
+Example sink section config:
+
+```toml
+[sink]
+type = "Logs"
+output_path = "/var/oura/mainnet"
+output_format = "JSONL"
+max_bytes_per_file = 1_000_000
+max_total_files = 10
+compression = true
+```
+
+### Section: `sink`
+
+- `type`: the literal value `Logs`.
+- `output_path`: the path-like prefix for the output log files.
+- `output_format` (optional): specifies the syntax used to serialize the events. The only available option at the moment is `JSONL` (JSON + line break).
+- `max_bytes_per_file` (optional): the max number of bytes to write to a file before rotating it.
+- `max_total_files` (optional): the max number of files to keep on the file system before deleting the oldest ones.
+- `compression` (optional): a boolean indicating whether rotated files should be compressed.
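A usage note on the new Logs sink: only `type` and `output_path` are required, and with the example values above the on-disk retention is bounded at roughly `max_bytes_per_file * max_total_files`, i.e. 10 files x 1,000,000 bytes, about 10 MB of uncompressed events (less once compression kicks in). A minimal sketch that leans on the optional fields' defaults (the defaults themselves are not specified in these docs, so treat them as an assumption) would be:

```toml
# Minimal Logs sink: optional fields omitted; rotation and compression
# behavior falls back to whatever defaults the sink implements
# (those defaults are an assumption, they are not specified in these docs).
[sink]
type = "Logs"
output_path = "/var/oura/mainnet"
```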
diff --git a/book/src/usage/README.md b/book/src/usage/README.md
index 67c712c8..c7ad8c32 100644
--- a/book/src/usage/README.md
+++ b/book/src/usage/README.md
@@ -2,6 +2,6 @@
 
 _Oura_ provides three different execution modes:
 
-- [Dameon](daemon.md): a fully-configurable pipeline that runs in the background. Sources, filters and sinks can be combined to fulfil particular use-cases.
+- [Daemon](daemon.md): a fully-configurable pipeline that runs in the background. Sources, filters and sinks can be combined to fulfil particular use-cases.
 - [Watch](watch.md): to watch live block events from a node directly in the terminal. It is meant for humans; it uses colors and throttling to facilitate reading.
 - [Dump](dump.md): to dump live block events from a node into rotating log files or stdout. It uses the JSONL format for persistence of the events.
\ No newline at end of file
diff --git a/book/src/usage/daemon.md b/book/src/usage/daemon.md
index 85ff9b30..79b6c352 100644
--- a/book/src/usage/daemon.md
+++ b/book/src/usage/daemon.md
@@ -44,6 +49,11 @@ type = "Z"
 
 # custom config fields for this sink type
 foo = "123"
 bar = "789"
+
+# optional cursor settings, remove section to disable feature
+[cursor]
+type = "File"
+path = "/var/oura/cursor"
 ```
 
 ### The `source` section
@@ -58,6 +63,10 @@ This section specifies a collection of filters that are applied in sequence to e
 
 ### The `sink` section
 
 This section specifies the destination of the data. The special `type` field must always be present and contain a value matching any of the available built-in sinks. The rest of the fields in the section will depend on the selected `type`. See the [sinks](../sinks/index.md) section for a list of available options.
 
+### The `cursor` section
+
+This section specifies how to configure the "cursor" feature. A cursor is a reference to the current position of the pipeline. If the pipeline needs to restart for whatever reason and a cursor is available, the pipeline will resume reading from that point in the chain. Removing the section from the config will disable the cursor feature.
+
 ### Full Example
 
 Here's an example configuration file that uses a Node-to-Node source and outputs the events into a Kafka sink:
diff --git a/book/src/usage/library.md b/book/src/usage/library.md
index b2902985..2a699a59 100644
--- a/book/src/usage/library.md
+++ b/book/src/usage/library.md
@@ -1 +1,3 @@
 # Library
+
+Coming Soon!
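The daemon docs above reference a full Node-to-Node to Kafka example whose body falls outside this hunk; the following is a hedged sketch of how the new `[cursor]` section composes with such a pipeline. The `[source]` and `[sink]` field names (`address`, `brokers`, `topic`) are assumptions for illustration and are not defined by this diff; only the `[cursor]` stanza matches the documented feature.

```toml
# Hypothetical cursor-enabled daemon.toml; only [cursor] is documented above.
[source]
type = "N2N"                                # assumed node-to-node source type
address = ["Tcp", "relay.example.com:3001"] # hypothetical field and value

[sink]
type = "Kafka"            # Kafka sink from the Feature Status list
brokers = ["kafka:9092"]  # hypothetical field
topic = "cardano-events"  # hypothetical field

# stateful cursor so the pipeline resumes from the last processed point
[cursor]
type = "File"
path = "/var/oura/cursor"
```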