
Ingestion performance fixes #316

Closed

wants to merge 0 commits into from

Conversation

@bartekn (Contributor) commented Feb 16, 2018

Ingestion code is making too many DB requests:

  • It inserts each effect, operation, etc. in a separate query.
  • GetCreateAccountID is executed multiple times when checking the row ID for a given AccountID.

Because of so many requests, ingesting a single transaction with 100 payment operations takes 200-500 ms. This can grow to 20 seconds per ledger, causing delays in ingestion.

This PR introduces the following changes:

  • It batches INSERTs to the database.
  • It caches the results of GetCreateAccountID in memory. EDIT: the cache is no longer used, see the comments below.

After implementing these changes, a ledger with 50 txs / 5000 operations is ingested in 3-5 seconds, and a ledger with 500 operations is ingested in 0.5 seconds.

However, this is not ready to be merged. The following needs to be addressed before merging:

  • What happens when an account is merged? We need to invalidate the cache. Fixed in 09fc13c.
  • For many effects/operations the query length can exceed the maximum number of parameters in Postgres (ERRO[0078] import session failed: exec failed: pq: got 100000 parameters but PostgreSQL only supports 65535 parameters pid=15551). Fixed in 98ddb01; a chunking sketch follows this list.
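A minimal sketch of the chunking idea behind that fix (hypothetical names, not the code from 98ddb01): split the buffered rows so a single multi-row INSERT never exceeds the 65535-parameter limit.

```go
package ingest

// maxPostgresParams is Postgres' bind-parameter limit for a single statement.
const maxPostgresParams = 65535

// chunkRows splits buffered rows so that no chunk produces more than
// maxPostgresParams placeholders when rendered as a multi-row INSERT.
func chunkRows(rows [][]interface{}, paramsPerRow int) [][][]interface{} {
	if paramsPerRow <= 0 {
		return nil
	}
	rowsPerChunk := maxPostgresParams / paramsPerRow
	if rowsPerChunk == 0 {
		rowsPerChunk = 1
	}
	var chunks [][][]interface{}
	for start := 0; start < len(rows); start += rowsPerChunk {
		end := start + rowsPerChunk
		if end > len(rows) {
			end = len(rows)
		}
		chunks = append(chunks, rows[start:end])
	}
	return chunks
}
```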

Other thoughts:

  • We need to consider how quickly the cache can grow and what the memory requirements are. EDIT: the cache is no longer used.
  • The cache makes sense when operations involve a small set of accounts. Otherwise, GetCreateAccountID may still be a bottleneck. Currently I can't find a better solution than a DB schema redesign (so accounts are identified by AccountID instead of the primary key in the accounts table). Or maybe a cache warmup on horizon init? EDIT: the cache is no longer used.
  • Something else?

@@ -192,7 +267,12 @@ func (ingest *Ingestion) Start() (err error) {
return
}

ingest.accountIDMapping = make(map[xdr.AccountId]int64)

Contributor:

As per your concerns about this exploding, how about using an LRU cache instead?
Perhaps something like https://github.com/hashicorp/golang-lru
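A minimal sketch of how such an LRU cache could wrap the account ID lookups, assuming github.com/hashicorp/golang-lru; the wrapper type, cache size, and string-address keys are illustrative, not code from this PR.

```go
package ingest

import (
	lru "github.com/hashicorp/golang-lru"
)

type accountIDCache struct {
	lru *lru.Cache
}

func newAccountIDCache(size int) (*accountIDCache, error) {
	c, err := lru.New(size) // evicts the least recently used entries beyond `size`
	if err != nil {
		return nil, err
	}
	return &accountIDCache{lru: c}, nil
}

// Get returns a cached history account ID for an address, if present.
func (c *accountIDCache) Get(address string) (int64, bool) {
	v, ok := c.lru.Get(address)
	if !ok {
		return 0, false
	}
	return v.(int64), true
}

// Add caches the ID for an address.
func (c *accountIDCache) Add(address string, id int64) {
	c.lru.Add(address, id)
}
```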

@bartekn (Contributor Author) commented Feb 16, 2018:

I think caching GetCreateAccountID results is just a temporary solution to allow us to ship this fix ASAP. The long-term solution can be one of the following:

  • More complicated code that gets all account IDs after the ledger is ingested, inserts new accounts and updates INSERT queries before executing them.
  • A DB schema redesign so getting IDs is not required (accounts are identified by their AccountID).
  • Something else?

@@ -192,7 +271,12 @@ func (ingest *Ingestion) Start() (err error) {
return
}

ingest.accountIDMapping = make(map[xdr.AccountId]int64)
Contributor Author (@bartekn):

This cache is per ingestion session. We should probably keep it between sessions.

elapsed := time.Now().Sub(start2)
fmt.Println("ingestTransaction", elapsed)
}()

Contributor Author (@bartekn):

This can be removed after tests.

@nullstyle (Contributor) commented:

Before I type out my full comments, I just want to put this here: https://godoc.org/github.com/lib/pq#hdr-Bulk_imports

We should probably look into using the above method for performing this bulk work as opposed to manually building huge sql statements with 60,000 bound parameters.
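For reference, a sketch of the lib/pq bulk-import pattern from that godoc page (pq.CopyIn inside a transaction); the table and column names here are only illustrative.

```go
package ingest

import (
	"database/sql"

	"github.com/lib/pq"
)

func copyEffects(db *sql.DB, rows [][]interface{}) error {
	txn, err := db.Begin()
	if err != nil {
		return err
	}

	stmt, err := txn.Prepare(pq.CopyIn("history_effects",
		"history_operation_id", "history_account_id", "type", "details"))
	if err != nil {
		txn.Rollback()
		return err
	}

	for _, row := range rows {
		// Each Exec buffers one row of the COPY input.
		if _, err := stmt.Exec(row...); err != nil {
			txn.Rollback()
			return err
		}
	}

	// Exec with no arguments flushes the buffered data and ends COPY mode.
	if _, err := stmt.Exec(); err != nil {
		txn.Rollback()
		return err
	}
	if err := stmt.Close(); err != nil {
		txn.Rollback()
		return err
	}
	return txn.Commit()
}
```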

@bartekn (Contributor Author) commented Feb 20, 2018

> Before I type out my full comments, I just want to put this here: https://godoc.org/github.com/lib/pq#hdr-Bulk_imports
>
> We should probably look into using the above method for performing this bulk work as opposed to manually building huge sql statements with 60,000 bound parameters.

Yeah, we should explore possible solutions, but when it comes to bulk imports with COPY ... FROM STDIN I'm afraid that it will also be slow. What I learnt from profiling the ingestion code is that there is an overhead for each query/statement sent to the DB. When ingestion is inserting a very large number of rows, the per-query overhead can add up to seconds. After checking the code of godoc.org/github.com/lib/pq, it looks like it sends each row in a separate query, so this will still add overhead time for each row. (I haven't tested it yet so I may be wrong.)

The solution proposed in this PR can add thousands of rows to the DB in a single query by using INSERT INTO ... VALUES (...), (...), ... statements. So the additional cost of executing queries stays constant as long as the number of rows is smaller than hundreds of thousands (Postgres accepts queries with up to 65k parameters).
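A minimal sketch of that batching approach, using the squirrel builder to accumulate a multi-row VALUES clause and execute it in one round trip; the table and column names are illustrative, not the PR's actual code.

```go
package ingest

import (
	"database/sql"

	sq "github.com/Masterminds/squirrel"
)

// insertEffectsBatch writes all buffered rows with a single
// INSERT INTO ... VALUES (...), (...), ... statement, so the per-query
// overhead is paid once instead of once per row.
func insertEffectsBatch(db *sql.DB, rows [][]interface{}) error {
	if len(rows) == 0 {
		return nil
	}
	builder := sq.Insert("history_effects").
		Columns("history_operation_id", "history_account_id", "type", "details").
		PlaceholderFormat(sq.Dollar)
	for _, row := range rows {
		builder = builder.Values(row...)
	}
	query, args, err := builder.ToSql()
	if err != nil {
		return err
	}
	_, err = db.Exec(query, args...)
	return err
}
```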

@bartekn (Contributor Author) commented Feb 21, 2018

OK, in e07a7d8 the in-memory cache has been removed completely and account IDs are now batch loaded/inserted once per session (check Ingestion.UpdateAccountIDs). This should work very well in any stress-test scenario because it will execute only 2 DB queries per session (whereas the cache could easily be exploited to touch the DB for each account).
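A rough sketch of the two-queries-per-session idea (hypothetical names, not the actual code from e07a7d8): one SELECT for the accounts that already exist, one multi-row INSERT ... RETURNING for the missing ones.

```go
package ingest

import (
	"database/sql"

	sq "github.com/Masterminds/squirrel"
)

// getOrCreateAccountIDs resolves address => history account ID using at most
// two statements per session.
func getOrCreateAccountIDs(db *sql.DB, addresses []string) (map[string]int64, error) {
	ids := map[string]int64{}

	// Query 1: load IDs of accounts that already exist.
	query, args, err := sq.Select("id", "address").
		From("history_accounts").
		Where(sq.Eq{"address": addresses}).
		PlaceholderFormat(sq.Dollar).
		ToSql()
	if err != nil {
		return nil, err
	}
	rows, err := db.Query(query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		var address string
		if err := rows.Scan(&id, &address); err != nil {
			return nil, err
		}
		ids[address] = id
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// Query 2: insert the missing accounts in one statement and read back
	// the generated IDs.
	insert := sq.Insert("history_accounts").
		Columns("address").
		Suffix("RETURNING id, address").
		PlaceholderFormat(sq.Dollar)
	missing := 0
	for _, a := range addresses {
		if _, ok := ids[a]; !ok {
			insert = insert.Values(a)
			missing++
		}
	}
	if missing == 0 {
		return ids, nil
	}
	query, args, err = insert.ToSql()
	if err != nil {
		return nil, err
	}
	inserted, err := db.Query(query, args...)
	if err != nil {
		return nil, err
	}
	defer inserted.Close()
	for inserted.Next() {
		var id int64
		var address string
		if err := inserted.Scan(&id, &address); err != nil {
			return nil, err
		}
		ids[address] = id
	}
	return ids, inserted.Err()
}
```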

Also improved the code for batch inserting:

  • Rows in tables that require an account ID are updated with the loaded/inserted IDs once per session (just before the batch insert).
  • Introduced a row interface in the ingest package that should largely improve the batch-insert code. Used it for the tables with the largest insert rate, but ultimately it can be used for all tables.

Tests are broken now, will check them this week.

@bartekn (Contributor Author) commented Feb 26, 2018

@nullstyle I checked the performance of COPY IN vs batch inserts and COPY IN is a winner (2x faster for 65000 rows inserted). I'm going to switch to this query tomorrow.

EDIT: Actually, query building can take some time when testing batch inserts; however, COPY IN is more elegant and a better fit for our use case (we won't have to track the number of params to be inserted).

@bartekn (Contributor Author) commented Feb 27, 2018

After changing the ingestion code to use COPY ... FROM STDIN I realized that this query has a few drawbacks:

  • It changes the connection mode to something we can call a "reading rows" mode. In this mode you can't send any SQL queries (even ROLLBACK or COMMIT) until the input is ended by calling stmt.Exec() with no arguments, which closes the input and switches back to query mode. Currently it's possible to move all COPY queries to the very end of ingestion and we're using a single thread, but I can imagine that in the future this may not be possible and we'll have to revert the code. It still requires some code changes to successfully ROLLBACK the transaction in case of errors.
  • In case of errors, COPY queries return errors that are hard to debug (like: pq: extra data after last expected column).
  • It also requires changing a lot of code for special field types like arrays. Functions like sqx.StringArray no longer work as expected.

Overall, COPY ... FROM STDIN requires developers to be super careful about how they use it, limits them to executing queries in a specific order, and can become hell if we ever decide to use goroutines in the ingestion code.

The solution present in this PR (batch inserts) is slower than COPY, but we're talking about 1 sec. vs 2 sec. execution time for ingesting 5000 operations, which is still great compared to 30 seconds before the changes.

I'm not comfortable with pushing this to production without proper, extensive testing. So I'm going to push the code I wrote today to another branch (ingest-perf-fixes-copy-stdin) in case we ever decide to explore this solution again, and now I'm going to finish working on the current solution.

Talked with @nullstyle and he agrees.

@nullstyle (Contributor) commented:

So @bartekn, should I be reviewing this PR for inclusion into master, or do you have some more work you want to do before review?

@bartekn (Contributor Author) commented Feb 27, 2018

I need to fix the tests and want to move all tables in ingest to use the row interface. But I think you can check whether it's going in the right direction.

@nullstyle (Contributor) left a review comment:

In general, I'm not happy with how much code is in motion in this PR. The general flow of ingestion was already primed for batching, and I'm skeptical that we need changes of this scale to get our performance up.

"type",
"details",
)
func (ingest *Ingestion) createInsertBuilderByTableName(name TableName) {
Contributor:

Keeping this function up to date is going to be a pain.

@@ -101,34 +143,28 @@ func (ingest *Ingestion) Ledger(
header *core.LedgerHeader,
txs int,
ops int,
) error {

sql := ingest.ledgers.Values(
Contributor:

Let's use this hunk to express another concern I have with this code:

The approach you take for reifying table rows is unnecessary and produces a maintenance headache. Instead of replacing this code with something brand new, you should replace sq.InsertBuilder with a custom type named something similar to ingest.BatchInsertBuilder.

This new type would still accept a call to Values as the original type does, but it simply wouldn't return a SQL statement, and we wouldn't run ingest.DB.Exec(sql).

Does that make sense?
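A minimal sketch of what such a BatchInsertBuilder could look like (hypothetical, not code from this PR): Values only buffers rows, and a later Exec emits a single multi-row INSERT.

```go
package ingest

import (
	"database/sql"

	sq "github.com/Masterminds/squirrel"
)

// BatchInsertBuilder buffers rows for a single table; Values does not touch
// the DB, and Exec flushes everything in one multi-row INSERT.
type BatchInsertBuilder struct {
	TableName string
	Columns   []string
	rows      [][]interface{}
}

// Values buffers one row instead of executing an INSERT immediately.
func (b *BatchInsertBuilder) Values(values ...interface{}) {
	b.rows = append(b.rows, values)
}

// Exec writes all buffered rows with a single statement.
func (b *BatchInsertBuilder) Exec(db *sql.DB) error {
	if len(b.rows) == 0 {
		return nil
	}
	builder := sq.Insert(b.TableName).
		Columns(b.Columns...).
		PlaceholderFormat(sq.Dollar)
	for _, row := range b.rows {
		builder = builder.Values(row...)
	}
	query, args, err := builder.ToSql()
	if err != nil {
		return err
	}
	_, err = db.Exec(query, args...)
	return err
}
```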

Contributor Author (@bartekn):

OK, this makes sense. But we still need some way to figure out which param is an address and needs to be replaced with an ID later. What do you think about creating a new type like:

type Address string

And we will be passing values like:

ingest.TransactionParticipantBatchInsertBuilder.Values(Address("G..."), param1, param2)

Then ingest.BatchInsertBuilder will have a method to return all of the addresses we need to get/create. Sounds good?
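A minimal sketch of the Address idea, building on the BatchInsertBuilder sketch above (hypothetical names): typed address parameters are collected from the buffered rows so they can be resolved to account IDs in one batch before the final insert.

```go
package ingest

// Address marks a parameter that holds a Stellar account address and must be
// replaced with a history account ID before the final insert.
type Address string

// Addresses returns every Address-typed parameter buffered so far, so the
// ingestion session can get/create all account IDs in one batch.
func (b *BatchInsertBuilder) Addresses() []Address {
	var addresses []Address
	for _, row := range b.rows {
		for _, value := range row {
			if a, ok := value.(Address); ok {
				addresses = append(addresses, a)
			}
		}
	}
	return addresses
}
```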

Contributor:

I'm cool with that approach. sounds good!

GetParams() []interface{}
// UpdateAccountIDs updates fields with account IDs by using provided
// address => id mapping.
UpdateAccountIDs(accounts map[string]int64)
Contributor:

I don't agree with this approach to translating addresses into the synthetic account ids used by the history system.

The cleaner approach is to add an additional phase to each round of ingestion:

  1. Collect all accounts that are involved in a ledger and get-or-create the history_accounts rows as a batch, populating an in-memory map that phase 2 can refer to
  2. the rest of ingestion

Contributor Author (@bartekn):

I see. We can do this (A) or this (B). Which one is better for you? IMHO, seems to be a little better because we won't have to update a function collecting addresses if a new table is created, but both are OK to me.

@bartekn closed this Mar 1, 2018
@bartekn force-pushed the ingest-perf-fixes branch from 41667b3 to 0009f1f on March 1, 2018 18:39
@bartekn (Contributor Author) commented Mar 1, 2018

Too much Git magic, master untouched. Please review: #338
