Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon/internal/db2/history: Mitigate deadlock when running reingest range #2373

Merged
merged 6 commits into from
Mar 10, 2020

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Mar 9, 2020

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Fixes #2370

  • Remove database functions which were previously used by legacy ingestion but now are unused
  • Sort assets before inserting rows into history_assets
  • Sort accounts before inserting rows into history_accounts
  • Ingest each ledger in reingest range in separate transaction

Why

Running horizon db reingest range multiple times in parallel sometimes causes a deadlock in Postgres where multiple connections inserting into history_accounts are blocked trying to acquire a ShareLock.

ShareLocks are row locks that are taken on each row as it is written by the transaction. ShareLock deadlocks occurs when two separate transactions try to write the same rows in opposite order.

That is why assets and accounts are sorted before insertion.

Another change in this PR which should mitigate against deadlocks is keeping transactions short lived. Previously, we reingested the entire range in just one transaction. If you have multiple reingesting processes which are running in parallel over very long ranges that increases the chances there will be a deadlock.

Known limitations

[N/A]

@cla-bot cla-bot bot added the cla: yes label Mar 9, 2020
@tamirms tamirms force-pushed the sort-assets-accounts branch from fa6c361 to 179261c Compare March 9, 2020 22:47
@tamirms tamirms force-pushed the sort-assets-accounts branch from 179261c to 0994d6a Compare March 10, 2020 09:19
@tamirms tamirms marked this pull request as ready for review March 10, 2020 11:05
@tamirms tamirms requested review from bartekn and abuiles March 10, 2020 11:05
Copy link
Contributor

@bartekn bartekn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me. I think a single thing is worth changing. By removing createTx param in ingestRange and moving the code responsible for begin/commit/rollback here we solve two potential issues:

  • Incorrect usage of ingestRange - if a DB transaction is not started in a caller function we should never insert data without a DB transaction because in case of errors the ledger data will be inserted partially,
  • It's hard to track DB transaction flow - currently DB transaction can be started in two different places and both functions need to make sure they know about one another, it was similar in FSM before where some states expected to be in a DB tx and some didn't.

Copy link
Contributor

@abuiles abuiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tamirms tamirms merged commit 85727d7 into stellar:master Mar 10, 2020
@tamirms tamirms deleted the sort-assets-accounts branch March 10, 2020 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock when running db reingest range on multiple instances
3 participants