Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon/internal/db2/history: Implement account loader and future account ids #5015

Merged
merged 2 commits into from
Aug 14, 2023

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Aug 11, 2023

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Some of the history tables have columns referencing accounts. However, these columns do not include the text representation of an account address. Instead, they include an integer id which points to the account address in the history_accounts table. When we ingest into history tables that have account integer id columns we first have to lookup the integer id from the history_accounts table and then we can construct and insert the row into the history table.

This PR introduces the concept of an account loader which encapsulates the management of account integer ids. Using the account loader we can more easily refactor the ingestion data flow. Whenever the history processor encounters an account string in ProcessTransaction(), the processor can register the account string in the loader component and store the resulting FutureAccountID in a FastBatchInsertBuilder. We can ensure that Exec() is called on the account loader before it's called on the FastBatchInsertBuilder.

	accountLoader := history.NewAccountLoader()
	processors := buildTransactionProcessors(
		s.historyQ,
		accountLoader,
	)

       // apply all the ledgers in the batch on the processors
       for _, ledger := range ledgers {
		if err = s.runner.ApplyProcessorsOnLedger(processors, ledgerCloseMeta); err != nil {
			return err
		}
       }
	if err := s.historyQ.Begin(); err != nil {
		return errors.Wrap(err, "Error starting a transaction")
	}
	defer s.historyQ.Rollback()

       // use the loader to lookup all the accounts registered by the processors
        if err := accountLoader.Exec(s.ctx, s.historyQ); err != nil {
	        return err
        }

        // flush the rows to the db, the processors will be able to obtain the integer ids from the loaders
	if err := processors.Flush(s.ctx, s.historyQ); err != nil {
		return err
	}
	if err := s.historyQ.Commit(); err != nil {
		return errors.Wrap(err, commitErrMsg)
	}

Why

This PR is required to implement #4909

Known limitations

[N/A]

@tamirms tamirms requested a review from a team August 11, 2023 17:15
for address := range a.set {
addresses = append(addresses, address)
}
// sort entries before inserting rows to prevent deadlocks on acquiring a ShareLock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a prior issue that explains this aspect further, since ingestion is single threaded and all this happens in same db tx, trying to understand where it could happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ingestion is single threaded. however, it is possible to do parallel reingestion with multiple workers where each worker has a separate but concurrent transaction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then there is potential that same accounts could be processed from different ledger ranges of different worker threads at same time, how does sorting in that case further avoid db deadlock, I'm not suggesting to remove it, just to understand. It sticks out in the application code as performing unrelated complexity that would have expected to be mitigated at db level with tx repeatable read isolation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's say there are two workers ingesting different ledger ranges. In both the ledger ranges the workers have to insert the same set of accounts into the history_accounts table. If the workers insert the accounts in the same order they will avoid a deadlock because the worker who wins the race will acquire the lock and the other worker will block until the transaction is complete. Consider the worst case scenario if worker 1 inserts accounts A, B, C and worker 2 inserts accounts C, B, A. Let's say worker 1 is faster so it inserts accounts A and B. Then worker 2 inserts account C. When worker 1 tries to insert account C there will be a deadlock because worker 2 already has a lock on that row.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think changing the transaction isolation level could avoid the deadlock

// GetNow should only be called on values which were registered by
// GetFuture() calls. Also, Exec() must be called before any GetNow
// call can succeed.
func (a AccountLoader) GetNow(address string) (int64, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could follow the map interface and return bool for indication of presence rather than error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the loader is used correctly then the absence of an account is always an error. You should never lookup an address which you have never inserted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this so that GetNow() panics instead of returning an error


insert := 0
for _, address := range addresses {
if _, ok := a.ids[address]; ok {
Copy link
Contributor

@sreuland sreuland Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you apply this filter that skips 'set' addresses which are already resolved in ids up in the loop that builds the query on line 100, to avoid it going through db i/o?

nvm, I don't think there could be a case where set initially overlaps with any keys in ids at the start of Exec.

if insert == 0 {
return nil
}
addresses = addresses[:insert]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat, re-used the same array in place

)
}

sql := `
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be compelling at some point if we have time to consider evaluating sql code options that provide compile time type safety on sql statements, less embedded string fragments/concat, like gojet. it may not even be viable for our setup, but worth keeping in mind.

return err
}

return a.lookupKeys(ctx, q, addresses)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the AccountLoader.set be cleared at this point, since those 'future' requests have now been realized into AccountLoader.ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended that the account loader should only be used once and not be reused. I will add some code to enforce that intention.

@sreuland
Copy link
Contributor

We can ensure that Exec() is called on the account loader before it's called on the FastBatchInsertBuilder.

it looks like in example code processors would have a ref to accountLoader, so the processors.Flush could invoke accountLoader.Exec at appropriate time internally, and caller would not have to orchestrate that ordering?

Copy link
Contributor

@sreuland sreuland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great design!

@tamirms
Copy link
Contributor Author

tamirms commented Aug 14, 2023

it looks like in example code processors would have a ref to accountLoader, so the processors.Flush could invoke accountLoader.Exec at appropriate time internally, and caller would not have to orchestrate that ordering?

We will actually want to orchestrate the ordering because in parallel reingestion we want the account loader to execute in a separate db transaction from the rest of ingestion. The reason behind that is because parallel workers are likely to have their db transactions block on insertions to the history_account tables (e.g. when multiple workers try to insert the same row). So if we isolate the call to accountLoader.Exec() to a separate transaction that will reduce the time where workers are blocked due to row locks on the history_account tables.

@tamirms tamirms merged commit a630fcb into stellar:ingestion-next Aug 14, 2023
1 of 14 checks passed
@tamirms tamirms deleted the account-loader branch August 14, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants