Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

horizon: history_claimable_balances is not cleared out by the reaper. #4396

Closed
2opremio opened this issue May 23, 2022 · 6 comments · Fixed by #4565
Closed

horizon: history_claimable_balances is not cleared out by the reaper. #4396

2opremio opened this issue May 23, 2022 · 6 comments · Fixed by #4565
Assignees

Comments

@2opremio
Copy link
Contributor

We should:

  1. Make sure the reaper takes care of history_claimable_balances
  2. Make sure all the other history tables are included.
  3. Add a test that ensure all the history_ tables in the DB are cleared out (in order prevent this problem from happening to tables added in the future).
@2opremio
Copy link
Contributor Author

2opremio commented Jul 12, 2022

We have three tables which aren't cleaned up when deleting a range. Those are:

  • history_claimable_balances
  • history_liquidity_pools
  • history_accounts
  • history_assets

They all pair an entity (claimable balance, liquidity pool, account) with an internal id:

# \d history_accounts
 id      | bigint                |           | not null | nextval('history_accounts_id_seq'::regclass)
 address | character varying(64) |           |          | 

# \d history_liquidity_pools
 id                | bigint |           | not null | nextval('history_liquidity_pools_id_seq'::regclass)
 liquidity_pool_id | text   |           | not null | 

# \d history_assets
 id           | integer               |           | not null | nextval('history_assets_id_seq'::regclass)
 asset_type   | character varying(64) |           | not null | 
 asset_code   | character varying(12) |           | not null | 
 asset_issuer | character varying(56) |           | not null | 

# \d history_claimable_balances
 id                   | bigint |           | not null | nextval('history_claimable_balances_id_seq'::regclass)
 claimable_balance_id | text   |           | not null | 

In theory, every time we clear out a range of data (either for reingesting or for reaping) we should also remove any orphan entries in the tables above (an orphan entry is one whose internal id isn't used anymore, e.g. for history_claimable_balances, that would be all entries which cannot be joined with history_operation_claimable_balances or history_transaction_claimable_balances).

However, the queries required to do so may be too expensive. As I see it we could either:

  1. clear out any orphan entries (by figuring out which ids are not matched in any other tables)
  2. before clearing out the other tables, figure out which ids are only used the in the ledger range to clear out (and would end up being orphans after the fact)

I think both (1) and (2) may be too expensive for large enough history_* tables

@2opremio
Copy link
Contributor Author

maybe @sydneynotthecity may have a better querying suggestion?

@sydneynotthecity
Copy link

Do you know how expensive of a query cost is too expensive?

Another option is that we could create a trigger that tracked any record deletions in the history_* tables and logged them to an events logging table. We could set it up in a way where the deleted id was logged to the events table. Then, we could delete the orphaned entries based on a join to this events table, which should be significantly cheaper than joining on the original tables.

Once the records were cleared out, we could either update an indicator in the event tables for is_deleted = TRUE or just wipe the records from the events table so that the table only retains orphaned records to be deleted.

@sydneynotthecity
Copy link

@2opremio the other point is the history_transaction_claimable_balances and history_operation_claimable_balances were missing indices for claimable_balance_id which is what was causing joins/searches on claimable_balance_id to be super slow. @sreuland added the indices as a test in #4455 and it looks like it has greatly improved query performance.

It appears that the equivalent liquidity_pools tables are also missing indices on liquidity_pool_id only. My guess is that if we added those as well, we should see enough query performance improvement to be able to identify orphaned records by query, like you originally proposed.

@2opremio
Copy link
Contributor Author

let's wait for the indices to be added and retake this after that.

@bartekn bartekn assigned bartekn and unassigned 2opremio Aug 4, 2022
bartekn added a commit that referenced this issue Aug 9, 2022
…4518)

While Horizon removes history data when `--history-retention-count` flag is set
it doesn't clear lookup historical tables. Lookup tables are `[id, key name]`
pairs that allow setting pointers to keys in historical tables, thus saving disk
space. This data can occupy a vast space on disk and is never used when old
historical data is deleted.

This commit adds code responsible for clearing orphaned rows in lookup
historical tables. Orphaned rows can appear when old data is removed by reaper.
The new code is separate from the existing reaper code (see "Alternative
solutions" below) and activates after each ledger if there are no more ledgers
to ingest in the backend. This has two advantages: it does not slow down catchup
and it works only when ingestion is idle which shouldn't affect ingestion at
all. To ensure performance is not affected, the `ReapLookupTables` method is
called with context with 5 seconds timeout which means that if it does not
finish the work in specified time it will simply be cancelled.

The solution here requires new indexes added in c2d52f0 (without it finding the
rows to delete is slow). For each lookup table, we check the number of
occurences of a given lookup ID in all the tables in which lookup table is used.
If no occurences are found, the row is removed from a lookup table. 

Rows are removed in batches of 10000 rows (can be modified in the future). The
cursor is updated when tables is processed so after next ledger ingesion the
next chunk of rows is checked. When cursor reaches the end of table it is reset
back to 0. This ensures that all the orphaned rows are removed eventually (some
rows can be skipped because new rows are added to lookup tables by ingestion and
some are removed by reaper so `offset` does not always skip to the place is
should to cover entire table).

#### Alternative solutions

While working on this I tried to implement @fons'es idea from #4396 which was
removing rows before clearing historical data which are not present in other
ranges. There is a general problem with this solution. The lookup tables are
actively used by ingestion which means that if rows are deleted while ingestion
reads a given row it can create inconsistent data. We could modify reaper to
aquire ingestion lock but if there are many ledgers to remove it can affect
ingestion.

We could also write a query that finds and removes all the orphaned rows but
it's too slow to be executed between ingestion of two consecutive ledgers.
@bartekn
Copy link
Contributor

bartekn commented Aug 9, 2022

#4518 clears orphans from history_claimable_balances and history_liquidity_pools. The last two remaining tables (history_accounts and history_assets) require indexes in other tables like horizon_operation_participants. @ire-and-curses agreed that new indexes should be add in one of the future releases after further tests. EDIT: I was able to actually use existing indexes to reap history_accounts in #4525. The remaining one is history_assets which require an index (even multicolumn index would work) in tables where it's used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants