From d4ae7d84711f4d5cd049461249136651f73cb019 Mon Sep 17 00:00:00 2001
From: Shawn Reuland
Date: Mon, 10 Jul 2023 17:51:53 -0700
Subject: [PATCH] #147: add more content on filter behavior

---
 docs/run-platform-server/ingestion-filtering.mdx | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/docs/run-platform-server/ingestion-filtering.mdx b/docs/run-platform-server/ingestion-filtering.mdx
index fcd6cdebc..566b9e938 100644
--- a/docs/run-platform-server/ingestion-filtering.mdx
+++ b/docs/run-platform-server/ingestion-filtering.mdx
@@ -11,15 +11,22 @@ Ingestion Filtering enables Horizon operators to drastically reduce the storage
 
 Previously, the only way to limit data storage was by limiting the temporal range of history via rolling retention (e.g. the last 30 days). The filtering feature allows users to store a longer historical timeframe in the Horizon database for only whitelisted assets, accounts, and their related historical entities (transactions, operations, trades, etc.).
 
-For further context, running a non-filtered `full` history Horizon instance currently takes ~ 25TB of disk space (as of June 2023) with storage growing at a rate of ~ 1TB / month. As a benchmark, filtering by even 100 of the most active accounts and assets reduces storage by over 90%. For the majority of applications which are interested in an even more limited set of assets and accounts, storage savings should be well over 99%. Other benefits are reducing operating costs for maintaining storage, improved DB health metrics and query performance.
+For further context, running an unfiltered `full` history Horizon instance currently requires over 30TB of disk space (as of June 2023) with storage growing at a rate of about 1TB/month. As a benchmark, filtering by even 100 of the most active accounts and assets reduces storage by over 90%. For the majority of applications which are interested in an even more limited set of assets and accounts, storage savings should be well over 99%. Other benefits include reduced operating costs for maintaining storage and improved DB health metrics and query performance.
 
 ### How does it work:
 
-This feature operates by accepting only ledger transactions that match a filter rule when persisting the transactions and operations to historical tables in the Horizon database at ingestion time, any entries that don't match are skipped and not stored on database.
+The filtering feature operates during the ingestion process, whether **live** or over **prior historical ranges**. It tells the ingestion process to accept only incoming ledger transactions that match a filter rule; any transactions that don't match are skipped by ingestion and therefore not stored in the database.
 
-Note that this filtering applies only to historical data, and does not affect current state data stored in Horizon. However, current state data consumes a relatively small amount of the overall storage capacity.
+Some key aspects to note about filtering behavior:
 
-Filter rules can whitelist ingestion by the following supported entities:
+- Filtering applies only to ingestion of historical data in the database. It does not affect how the ingestion process maintains current state data stored in the database, which is the last known ledger entry for each unique entity within accounts, trustlines, liquidity pools, and offers. However, current state data consumes a relatively small amount of the overall storage capacity.
+- When filter rules are changed, they apply only to active ingestion processes (**live** or **historical ranges**). They don't trigger any retroactive filtering or back-filling of existing historical data in the database.
+  - If you update the filter rules to allow-list more accounts or assets, related transactions will only start to show up in historical database data from **live** ingestion after the time the filter rules are updated using the Horizon Admin API. The same applies to **historical range** ingestion: it is affected by the new filter rules only from the ledger it was processing, within its configured range, at the time the filter rules were updated.
+  - When updating filter rules with increased allow-list coverage, no historical back-filling is done automatically. You can manually backfill the history in the database by running a new **historical range** ingestion process for a past ledger range after you have updated the filter rules.
+  - If you update filter rules and reduce the allow-list coverage by removing some entities, no retroactive purging or filtering of historical data in the database is performed to match the reduced scope. Whatever data is stored in the history tables remains for the lifetime of the database or until `HISTORY_RETENTION_COUNT` is exceeded, at which point Horizon purges all historical data for all entities related to older ledgers, regardless of any filter rules.
+- Filtering does not affect the performance or throughput rate of an ingestion process; it remains consistent whether or not filter rules are present.
+
+Filter rules define allow-lists for the following supported entities:
 
 - Account id
 - Asset id (canonical)
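
The filter behavior this patch documents can be illustrated with a small model. The following is a minimal Python sketch, not Horizon's implementation: `Tx`, `FilterRules`, and `ingest` are hypothetical names, the account and asset ids are placeholders, and the rule that a transaction is kept when it touches any allow-listed entity is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tx:
    """Hypothetical stand-in for a ledger transaction."""
    ledger: int
    accounts: frozenset  # account ids the transaction touches
    assets: frozenset    # canonical asset ids the transaction touches


@dataclass
class FilterRules:
    """Hypothetical allow-lists; not Horizon's actual config schema."""
    accounts: set = field(default_factory=set)
    assets: set = field(default_factory=set)

    def matches(self, tx: Tx) -> bool:
        # Assumption: keep a transaction if it touches any allow-listed entity.
        return bool(tx.accounts & self.accounts or tx.assets & self.assets)


def ingest(ledger_txs, rules, history):
    """Store only matching transactions; rows already in history are untouched."""
    for tx in ledger_txs:
        if rules.matches(tx) and tx not in history:
            history.append(tx)


# Placeholder ids, chosen only for the example.
chain = [
    Tx(1, frozenset({"GA"}), frozenset({"USDC:GISSUER"})),
    Tx(2, frozenset({"GB"}), frozenset({"native"})),
    Tx(3, frozenset({"GB"}), frozenset({"native"})),
]

rules = FilterRules(accounts={"GA"})
history = []

ingest(chain[:2], rules, history)   # live ingestion of ledgers 1-2
rules.accounts.add("GB")            # widen the allow-list
ingest(chain[2:], rules, history)   # live ingestion continues at ledger 3
ledgers_after_update = sorted(tx.ledger for tx in history)

ingest(chain[:2], rules, history)   # manual re-ingestion of the past range
ledgers_after_backfill = sorted(tx.ledger for tx in history)

rules.accounts.discard("GA")        # narrowing the rules purges nothing
ledgers_after_narrowing = sorted(tx.ledger for tx in history)
```

In this model, widening the allow-list leaves ledger 2 missing until the past range is ingested again, and narrowing it afterwards leaves the stored history unchanged, mirroring the bullets in the patch.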