Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon/internal/ingest: Fix deadlock in parallel ingestion #5263

Merged
merged 3 commits into from
Apr 15, 2024

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Mar 27, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

I tried running the reingest command with 2 parallel workers and ran into the following postgres deadlock error which caused the reingestion to fail:

2024-03-27 09:13:38.176 UTC [80] ERROR:  deadlock detected
2024-03-27 09:13:38.176 UTC [80] DETAIL:  Process 80 waits for ShareLock on transaction 593; blocked by process 81.
	Process 81 waits for ShareLock on transaction 592; blocked by process 80.
	Process 80: 
		WITH r AS
			(SELECT unnest($1::character varying(64)[]) /* address */)
		INSERT INTO history_accounts
			(address)
		SELECT * from r
		ON CONFLICT (address) DO NOTHING
	Process 81: 
		WITH r AS
			(SELECT unnest($1::character varying(64)[]) /* address */)
		INSERT INTO history_accounts
			(address)
		SELECT * from r
		ON CONFLICT (address) DO NOTHING
2024-03-27 09:13:38.176 UTC [80] HINT:  See server log for query details.
2024-03-27 09:13:38.176 UTC [80] CONTEXT:  while inserting index tuple (532,3) in relation "history_accounts"
2024-03-27 09:13:38.176 UTC [80] STATEMENT:  
		WITH r AS
			(SELECT unnest($1::character varying(64)[]) /* address */)
		INSERT INTO history_accounts
			(address)
		SELECT * from r
		ON CONFLICT (address) DO NOTHING

This deadlock occurs because we have two go routines which both try to insert the same addresses into the history_accounts table but in a different order. For example:

  • At time t go routine 1 tries to insert account A.
  • At time t+1 go routine 2 tries to insert account B.
  • At time t+3 go routine 2 tries to insert account A and is blocked by go routine 1 which holds the ShareLock on row A. go routine 2 cannot progress until go routine 1 commits or rolls back its transaction.
  • At time t+4 go routine 1 tries to insert account B and is blocked by go routine 2 which holds the ShareLock on row B. Now there is a deadlock because there is a cycle of dependencies.

This deadlock scenario should not be possible because we sort the account addresses before inserting into history_accounts. But, it turns out that sorting alone is not sufficient to prevent this type of deadlock because of the data flow for parallel reingestion

Let's say we want to reingest ledgers 1 - 1000 with 2 parallel workers. Horizon will spin up 2 go routines. The first go routine will ingest ledgers 1 - 500 and the second go routine will ingest ledgers 501 - 1000. Each go routine will perform the ingestion of their subrange in a single transaction.

However, each go routine will further subdivide their subrange into batches of ledgers. Each batch will be ingested separately, which means that we will insert rows into history_accounts when ingesting each batch. So even if the account address are sorted in each individual batch, that does not mean the account addresses across all batches executed within a single transaction are sorted. Therefore, the possibility of a deadlock remains.

To fix this issue I have encapsulated the ingestion of each ledger batch within a single transaction. Each go routine will no longer perform the ingestion of their subrange in a single transaction, instead the go routines will execute multiple transactions (corresponding to each ledger batch) during their ingestion of the subrange.

With the change described above, a transaction in parallel ingestion may block on another if they're inserting common addresses into history_accounts, but there should never be a deadlock. However, I noticed that performance could be improved by having more fine grained transactions. The workflow of each transaction resembles the following:

BEGIN;

-- insert into lookup tables such as history_accounts, history_claimable_balances, etc
INSERT INTO history_accounts (address)
VALUES
('GA226TSKWZ2VSCJKPTXNH2N3E3UZTUMIBKPTIGCVMBGVMOQ5633D436Z'), ..., ('GA22GNOGDBCNJVTCGRAJSJ7YYRQOBBUD5HBAXO2MQ2ZFDPFOIVAQC3SW');
...

-- insert into lookup history tables such as history_transaction_participants, history_transactions, etc
INSERT INTO history_transaction_participants (history_account_id, history_transaction_id)
VALUES (23, 218982738697388032), ...;
...
COMMIT;

First, we insert into the lookup tables and then we insert into the history tables using the ids from the lookup tables. The first part of the transaction can block if there are other transactions which insert common addresses into history_accounts, however, the second part of the transaction should never block on other concurrent transactions.

So, by splitting the workflow into two transactions we can reduce contention and allow more concurrency when inserting rows into history tables:

BEGIN; -- this tx may block on other tx which insert common rows

-- insert into lookup tables such as history_accounts, history_claimable_balances, etc
INSERT INTO history_accounts (address)
VALUES
('GA226TSKWZ2VSCJKPTXNH2N3E3UZTUMIBKPTIGCVMBGVMOQ5633D436Z'), ..., ('GA22GNOGDBCNJVTCGRAJSJ7YYRQOBBUD5HBAXO2MQ2ZFDPFOIVAQC3SW');
...

COMMIT;

BEGIN; -- this tx should not block on any other running tx

-- insert into lookup history tables such as history_transaction_participants, history_transactions, etc
INSERT INTO history_transaction_participants (history_account_id, history_transaction_id)
VALUES (23, 218982738697388032), ...;
...
COMMIT;

After implementing that change, I observed a 10-15% speed up in parallel ingestion when testing locally on my laptop using 2 workers.

I think there could be other data flow changes which could also improve performance by reducing contention in the postgres transactions. One data flow which I think could be promising is:

  1. stream ledger transactions for a large amount of ledgers in parallel
  2. accumulate all lookup table entries in loaders which are shared by all the parallel workers
  3. insert rows in lookup tables within a single tx
  4. insert rows in history tables concurrently with 1 tx per worker

We can experiment with this idea in another PR

Known limitations

If individual transactions fail it could leave the DB in a partially ingested state. This can still be avoided by ingesting without concurrency (albeit without any performance speedups that concurrency gives you). But this limitation existed even prior to this change.

@tamirms tamirms marked this pull request as ready for review March 28, 2024 09:55
@tamirms tamirms requested a review from a team March 28, 2024 10:14
@sreuland
Copy link
Contributor

I think there could be other data flow changes which could also improve performance by reducing contention in the postgres transactions

is it worth taking step back from current lookup table based on a sequential id generator and explore any other table schema that might do better in current flow? since the application flows are now getting more complex due to non-app related db concerns? One option - lookup tables use a hash of the string lookup key to generate deterministicid and corresponding batch insert builders use UPSERT on the lookup tables which may avoid the table contention issues due to INSERTs.

@sreuland
Copy link
Contributor

Is there a issue/ticket that captures this for visibility on project side? it seems like this would be part of Sustainability/Object-8 and this time/effort can be accounted for.

@tamirms
Copy link
Contributor Author

tamirms commented Mar 28, 2024

I think there could be other data flow changes which could also improve performance by reducing contention in the postgres transactions

is it worth taking step back from current lookup table based on a sequential id generator and explore any other table schema that might do better in current flow? since the application flows are now getting more complex due to non-app related db concerns? One option - lookup tables use a hash of the string lookup key to generate deterministicid and corresponding batch insert builders use UPSERT on the lookup tables which may avoid the table contention issues due to INSERTs.

the challenge with that approach is dealing with collisions and also we'd have to reingest. but it's worth thinking about. however, that falls outside the scope of this pr

@tamirms
Copy link
Contributor Author

tamirms commented Mar 28, 2024

Is there a issue/ticket that captures this for visibility on project side? it seems like this would be part of Sustainability/Object-8 and this time/effort can be accounted for.

this PR fixes a bug so it doesn't get pointed

@sreuland
Copy link
Contributor

I think there could be other data flow changes which could also improve performance by reducing contention in the postgres transactions. One data flow which I think could be promising is:

@tamirms , one more change is to skip filtered tmp processor during reingest, pr , I could merge that into here or do it separately.

Copy link
Contributor

@sreuland sreuland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tamirms tamirms enabled auto-merge (squash) April 15, 2024 16:52
@tamirms
Copy link
Contributor Author

tamirms commented Apr 15, 2024

@sreuland I think it would be easier to review that change separately. I've configured auto-merge on this PR so it should be merged soon

@tamirms tamirms merged commit 6bb4a44 into stellar:master Apr 15, 2024
23 checks passed
@sreuland
Copy link
Contributor

this PR fixes a bug so it doesn't get pointed

was there a GH issue on the bug?

@tamirms
Copy link
Contributor Author

tamirms commented Apr 15, 2024

was there a GH issue on the bug?

no

@devfed1
Copy link

devfed1 commented Apr 18, 2024

I can confirm that I encounter same error: job failed, recommended restart range: [544001, 800000]: error when processing [544001, 560000] range: error processing ledger range 557301 - 557400: Error flushing changes from processor: error during lazy loader resolution, *history.AccountLoader.Exec: exec failed: pq: deadlock detected
Screenshot 2024-04-18 at 15 29 42

Horizon version: 2.29.0-4a00bc7e1afbb895dc669853c98af8f6f6663223
go1.22.1

I need to monitor the ingest process and to restart the ingestion after failure.
I end up using the reingestion with no parallel-workers option.

@mollykarcher
Copy link
Contributor

@devfed1 this fix was released in version 2.30.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants