Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon: Reap history object tables when ingestion is idle #4518

Merged
merged 9 commits into from
Aug 9, 2022

Conversation

bartekn
Copy link
Contributor

@bartekn bartekn commented Aug 7, 2022

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

This commit adds code responsible for clearing orphaned rows in lookup historical tables. Orphaned rows can appear when old data is removed by reaper. The new code is separate from the existing reaper code (see "Alternative solutions" below) and activates after each ledger if there are no more ledgers to ingest in the backend. This has two advantages: it does not slow down catchup and it works only when ingestion is idle which shouldn't affect ingestion at all. To ensure performance is not affected, the ReapLookupTables method is called with 5 seconds context timeout which means that if it does not finish the work in specified time it will simply be cancelled.

The solution here requires new indexes added in c2d52f0 (without it finding the rows to delete is slow). For each lookup table, we check the number of occurences of a given lookup ID in all the tables in which lookup table is used. If no occurences are found, the row is removed from lookup table.

Rows are removed in batches of 10000 rows (can be modified in the future). The cursor is updated when tables is processed so after next ledger ingesion the next chunk of rows is checked. When cursor reaches the end of table it is reset back to 0. This ensures that all the orphaned rows are removed eventually (some rows can be skipped because new rows are added to lookup tables by ingestion and some are removed by reaper so offset does not always skip to the place is should to cover entire table).

Close [partially] #4396.

Alternative solutions

While working on this I tried to implement @fons'es idea from #4396 which was removing rows before clearing historical data which are not present in other ranges. There is a general problem with this solution. The lookup tables are actively used by ingestion which means that if rows are deleted while ingestion reads a given row it can create inconsistent data. We could modify reaper to aquire ingestion lock but if there are many ledgers to remove it can affect ingestion.

We could also write a query that finds and removes all the orphaned rows but it's too slow to be executed between ingestion of two consecutive ledgers.

Why

While Horizon removes history data when --history-retention-count flag is set it doesn't clear lookup historical tables. Lookup tables are [id, key name] pairs that allow setting pointers to keys in historical tables, thus saving disk space. This data can occupy a vast space on disk and is never used when old historical data is deleted.

Known limitations

[TODO or N/A]

@bartekn bartekn marked this pull request as ready for review August 8, 2022 15:45
@bartekn bartekn requested a review from a team August 8, 2022 15:45
services/horizon/internal/db2/history/main.go Show resolved Hide resolved
Comment on lines +917 to +920
// In short it checks the 100 rows omiting 1000 row of history_claimable_balances
// and counts occurences of each row in corresponding history tables.
// If there are no history rows for a given id, the row in
// history_claimable_balances is removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting way to determine "age." It creates a dependence between the history tables, right? Is there a reason we don't rely on ledger close time, instead? I guess probably because history_claimable_balances doesn't have that row 😞

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really about age. The rows are sorted by id which is just a sequence integer value assigned to specific ledger object (like claimable balance). The limit ... offset listing here is just to ensure we iterate over entire table in multiple cycles.

services/horizon/internal/ingest/main.go Outdated Show resolved Hide resolved
@Shaptic Shaptic requested a review from a team August 8, 2022 16:54
@bartekn
Copy link
Contributor Author

bartekn commented Aug 9, 2022

I'm going to merge this without 👍 to start testing a new release. Please add comments and I'll open another PR with review fixes/requests.

@bartekn bartekn merged commit ee063a7 into stellar:master Aug 9, 2022
@bartekn bartekn deleted the reap-history-objects branch August 9, 2022 09:18
bartekn added a commit that referenced this pull request Sep 6, 2022
Adds the last remaining table: `history_assets` to lookup table reaper.

In #4518 the code responsible for reaping lookup tables was added but was
missing one table: `history_assets` due to lack of proper indexes. This commit
should remove all remaining data in lookup tables.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants