
Ingestion performance fixes #316

Closed

wants to merge 0 commits into from

Conversation

@bartekn (Contributor) commented Feb 16, 2018

Ingestion code is making too many DB requests:

  • It inserts each effect, operation, etc. in a separate query.
  • GetCreateAccountID is executed multiple times when checking the row ID for a given AccountID.

Because of so many requests, ingesting a single transaction with 100 payment operations takes 200-500 ms. This can grow to 20 seconds per ledger, causing delays in ingestion.

This PR introduces the following changes:

  • It batches INSERTs to the database.
  • It caches the results of GetCreateAccountID in memory. EDIT: the cache is no longer used, see the comments below.

After implementing these changes, a ledger with 50 txs / 5000 operations is ingested in 3-5 seconds, and a ledger with 500 operations is ingested in 0.5 seconds.

However, this is not ready to be merged. The following needs to be addressed before merging:

  • What happens when an account is merged? We need to invalidate the cache. Fixed in 09fc13c.
  • For many effects/operations the query length can exceed the maximum number of parameters in Postgres (ERRO[0078] import session failed: exec failed: pq: got 100000 parameters but PostgreSQL only supports 65535 parameters pid=15551). Fixed in 98ddb01; a chunking sketch follows this list.
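A minimal sketch of the chunking idea behind that fix (hypothetical names, not the code from 98ddb01): split the buffered rows so a single multi-row INSERT never exceeds the 65535-parameter limit.

```go
package ingest

// maxPostgresParams is Postgres' bind-parameter limit for a single statement.
const maxPostgresParams = 65535

// chunkRows splits buffered rows so that no chunk produces more than
// maxPostgresParams placeholders when rendered as a multi-row INSERT.
func chunkRows(rows [][]interface{}, paramsPerRow int) [][][]interface{} {
	if paramsPerRow <= 0 {
		return nil
	}
	rowsPerChunk := maxPostgresParams / paramsPerRow
	if rowsPerChunk == 0 {
		rowsPerChunk = 1
	}
	var chunks [][][]interface{}
	for start := 0; start < len(rows); start += rowsPerChunk {
		end := start + rowsPerChunk
		if end > len(rows) {
			end = len(rows)
		}
		chunks = append(chunks, rows[start:end])
	}
	return chunks
}
```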

Other thoughts:

  • We need to consider how quickly the cache can grow and what the memory requirements are. EDIT: the cache is no longer used.
  • The cache makes sense when operations involve a small set of accounts. Otherwise, GetCreateAccountID may still be a bottleneck. Currently I can't find a better solution than a DB schema redesign (so accounts are identified by AccountID instead of the primary key in the accounts table). Or maybe a cache warmup on horizon init? EDIT: the cache is no longer used.
  • Something else?

@@ -192,7 +267,12 @@ func (ingest *Ingestion) Start() (err error) {
return
}

ingest.accountIDMapping = make(map[xdr.AccountId]int64)

Contributor:

As per your concerns about this exploding, how about using an LRU cache instead?
Perhaps something like https://github.com/hashicorp/golang-lru
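A minimal sketch of how such an LRU cache could wrap the account ID lookups, assuming github.com/hashicorp/golang-lru; the wrapper type, cache size, and string-address keys are illustrative, not code from this PR.

```go
package ingest

import (
	lru "github.com/hashicorp/golang-lru"
)

type accountIDCache struct {
	lru *lru.Cache
}

func newAccountIDCache(size int) (*accountIDCache, error) {
	c, err := lru.New(size) // evicts the least recently used entries beyond `size`
	if err != nil {
		return nil, err
	}
	return &accountIDCache{lru: c}, nil
}

// Get returns a cached history account ID for an address, if present.
func (c *accountIDCache) Get(address string) (int64, bool) {
	v, ok := c.lru.Get(address)
	if !ok {
		return 0, false
	}
	return v.(int64), true
}

// Add caches the ID for an address.
func (c *accountIDCache) Add(address string, id int64) {
	c.lru.Add(address, id)
}
```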

@bartekn (Contributor Author) commented Feb 16, 2018:

I think caching GetCreateAccountID results is just a temporary solution to allow us to ship this fix ASAP. The long-term solution can be one of the following:

  • More complicated code that gets all account IDs after the ledger is ingested, inserts new accounts and updates INSERT queries before executing them.
  • A DB schema redesign so getting IDs is not required (accounts are identified by their AccountID).
  • Something else?

@@ -192,7 +271,12 @@ func (ingest *Ingestion) Start() (err error) {
return
}

ingest.accountIDMapping = make(map[xdr.AccountId]int64)
Contributor Author (@bartekn):

This cache is per ingestion session. We should probably keep it between sessions.

elapsed := time.Now().Sub(start2)
fmt.Println("ingestTransaction", elapsed)
}()

Contributor Author (@bartekn):

This can be removed after tests.

@nullstyle (Contributor) commented:

Before I type out my full comments, I just want to put this here: https://godoc.org/github.com/lib/pq#hdr-Bulk_imports

We should probably look into using the above method for performing this bulk work as opposed to manually building huge sql statements with 60,000 bound parameters.
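For reference, a sketch of the lib/pq bulk-import pattern from that godoc page (pq.CopyIn inside a transaction); the table and column names here are only illustrative.

```go
package ingest

import (
	"database/sql"

	"github.com/lib/pq"
)

func copyEffects(db *sql.DB, rows [][]interface{}) error {
	txn, err := db.Begin()
	if err != nil {
		return err
	}

	stmt, err := txn.Prepare(pq.CopyIn("history_effects",
		"history_operation_id", "history_account_id", "type", "details"))
	if err != nil {
		txn.Rollback()
		return err
	}

	for _, row := range rows {
		// Each Exec buffers one row of the COPY input.
		if _, err := stmt.Exec(row...); err != nil {
			txn.Rollback()
			return err
		}
	}

	// Exec with no arguments flushes the buffered data and ends COPY mode.
	if _, err := stmt.Exec(); err != nil {
		txn.Rollback()
		return err
	}
	if err := stmt.Close(); err != nil {
		txn.Rollback()
		return err
	}
	return txn.Commit()
}
```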

@bartekn (Contributor Author) commented Feb 20, 2018

> Before I type out my full comments, I just want to put this here: https://godoc.org/github.com/lib/pq#hdr-Bulk_imports
>
> We should probably look into using the above method for performing this bulk work as opposed to manually building huge sql statements with 60,000 bound parameters.

Yeah, we should explore possible solutions, but when it comes to bulk imports with COPY ... FROM STDIN I'm afraid that it will also be slow. What I learnt from profiling the ingestion code is that there is an overhead for each query/statement sent to the DB. When ingestion is inserting a very large number of rows, the per-query overhead can add up to seconds. After checking the code of godoc.org/github.com/lib/pq, it looks like it sends each row in a separate query, so this will still add overhead time for each row. (I haven't tested it yet so I may be wrong.)

The solution proposed in this PR can add thousands of rows to the DB in a single query by using INSERT INTO ... VALUES (...), (...), ... statements. So the additional cost of executing queries stays constant as long as the number of rows is smaller than hundreds of thousands (Postgres accepts queries with up to 65k parameters).
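A minimal sketch of that batching approach, using the squirrel builder to accumulate a multi-row VALUES clause and execute it in one round trip; the table and column names are illustrative, not the PR's actual code.

```go
package ingest

import (
	"database/sql"

	sq "github.com/Masterminds/squirrel"
)

// insertEffectsBatch writes all buffered rows with a single
// INSERT INTO ... VALUES (...), (...), ... statement, so the per-query
// overhead is paid once instead of once per row.
func insertEffectsBatch(db *sql.DB, rows [][]interface{}) error {
	if len(rows) == 0 {
		return nil
	}
	builder := sq.Insert("history_effects").
		Columns("history_operation_id", "history_account_id", "type", "details").
		PlaceholderFormat(sq.Dollar)
	for _, row := range rows {
		builder = builder.Values(row...)
	}
	query, args, err := builder.ToSql()
	if err != nil {
		return err
	}
	_, err = db.Exec(query, args...)
	return err
}
```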

@bartekn (Contributor Author) commented Feb 21, 2018

OK, in e07a7d8 the in-memory cache has been removed completely and account IDs are now batch loaded/inserted once per session (check Ingestion.UpdateAccountIDs). This should work very well in any stress-test scenario because it will execute only 2 DB queries per session (whereas the cache could easily be exploited to touch the DB for each account).
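A rough sketch of the two-queries-per-session idea (hypothetical names, not the actual code from e07a7d8): one SELECT for the accounts that already exist, one multi-row INSERT ... RETURNING for the missing ones.

```go
package ingest

import (
	"database/sql"

	sq "github.com/Masterminds/squirrel"
)

// getOrCreateAccountIDs resolves address => history account ID using at most
// two statements per session.
func getOrCreateAccountIDs(db *sql.DB, addresses []string) (map[string]int64, error) {
	ids := map[string]int64{}

	// Query 1: load IDs of accounts that already exist.
	query, args, err := sq.Select("id", "address").
		From("history_accounts").
		Where(sq.Eq{"address": addresses}).
		PlaceholderFormat(sq.Dollar).
		ToSql()
	if err != nil {
		return nil, err
	}
	rows, err := db.Query(query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		var address string
		if err := rows.Scan(&id, &address); err != nil {
			return nil, err
		}
		ids[address] = id
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}

	// Query 2: insert the missing accounts in one statement and read back
	// the generated IDs.
	insert := sq.Insert("history_accounts").
		Columns("address").
		Suffix("RETURNING id, address").
		PlaceholderFormat(sq.Dollar)
	missing := 0
	for _, a := range addresses {
		if _, ok := ids[a]; !ok {
			insert = insert.Values(a)
			missing++
		}
	}
	if missing == 0 {
		return ids, nil
	}
	query, args, err = insert.ToSql()
	if err != nil {
		return nil, err
	}
	inserted, err := db.Query(query, args...)
	if err != nil {
		return nil, err
	}
	defer inserted.Close()
	for inserted.Next() {
		var id int64
		var address string
		if err := inserted.Scan(&id, &address); err != nil {
			return nil, err
		}
		ids[address] = id
	}
	return ids, inserted.Err()
}
```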

Also improved the code for batch inserting:

  • Rows in tables that require an account ID are updated with the loaded/inserted IDs once per session (just before the batch insert).
  • Introduced a row interface in the ingest package that should largely improve the batch-insert code. Used it for the tables with the largest insert rate, but ultimately it can be used for all tables.

Tests are broken now, will check them this week.

@bartekn (Contributor Author) commented Feb 26, 2018

@nullstyle I checked the performance of COPY IN vs batch inserts and COPY IN is a winner (2x faster for 65000 rows inserted). I'm going to switch to this query tomorrow.

EDIT: Actually, query building can take some time when testing batch inserts; however, COPY IN is more elegant and a better fit for our use case (we won't have to track the number of params to be inserted).

@bartekn (Contributor Author) commented Feb 27, 2018

After changing the ingestion code to use COPY ... FROM STDIN I realized that this query has a few drawbacks:

  • It changes the connection mode to something we can call a "reading rows" mode. In this mode you can't send any SQL queries (even ROLLBACK or COMMIT) until the input is ended by calling stmt.Exec() with no arguments, which closes the input and switches back to query mode. Currently it's possible to move all COPY queries to the very end of ingestion and we're using a single thread, but I can imagine that in the future this may not be possible and we'll have to revert the code. It still requires some code changes to successfully ROLLBACK the transaction in case of errors.
  • In case of errors, COPY queries return errors that are hard to debug (like: pq: extra data after last expected column).
  • It also requires changing a lot of code for special field types like arrays. Functions like sqx.StringArray no longer work as expected.

Overall, COPY ... FROM STDIN requires developers to be super careful about how they use it, limits them to executing queries in a specific order, and can become hell if we ever decide to use goroutines in the ingestion code.

The solution present in this PR (batch inserts) is slower than COPY, but we're talking about 1 sec. vs 2 sec. execution time for ingesting 5000 operations, which is still great compared to 30 seconds before the changes.

I'm not comfortable with pushing this to production without proper, extensive testing. So I'm going to push the code I wrote today to another branch (ingest-perf-fixes-copy-stdin) in case we ever decide to explore this solution again, and now I'm going to finish working on the current solution.

Talked with @nullstyle and he agrees.

@nullstyle (Contributor) commented:

So @bartekn, should I be reviewing this PR for inclusion into master, or do you have some more work you want to do before review?

@bartekn (Contributor Author) commented Feb 27, 2018

I need to fix the tests and want to move all tables in ingest to use the row interface. But I think you can check whether it's going in the right direction.

@nullstyle (Contributor) left a review comment:

In general, I'm not happy with how much code is in motion in this PR. The general flow of ingestion was already primed for batching, and I'm skeptical that we need changes of this scale to get our performance up.

"type",
"details",
)
func (ingest *Ingestion) createInsertBuilderByTableName(name TableName) {
Contributor:

Keeping this function up to date is going to be a pain.

@@ -101,34 +143,28 @@ func (ingest *Ingestion) Ledger(
header *core.LedgerHeader,
txs int,
ops int,
) error {

sql := ingest.ledgers.Values(
Contributor:

Let's use this hunk to express another concern I have with this code:

The approach you take for reifying table rows is unnecessary and produces a maintenance headache. Instead of replacing this code with something brand new, you should replace sq.InsertBuilder with a custom type named something similar to ingest.BatchInsertBuilder.

This new type would still accept a call to Values as the original type does, but it simply wouldn't return a SQL statement, and we wouldn't run ingest.DB.Exec(sql).

Does that make sense?
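A minimal sketch of what such a BatchInsertBuilder could look like (hypothetical, not code from this PR): Values only buffers rows, and a later Exec emits a single multi-row INSERT.

```go
package ingest

import (
	"database/sql"

	sq "github.com/Masterminds/squirrel"
)

// BatchInsertBuilder buffers rows for a single table; Values does not touch
// the DB, and Exec flushes everything in one multi-row INSERT.
type BatchInsertBuilder struct {
	TableName string
	Columns   []string
	rows      [][]interface{}
}

// Values buffers one row instead of executing an INSERT immediately.
func (b *BatchInsertBuilder) Values(values ...interface{}) {
	b.rows = append(b.rows, values)
}

// Exec writes all buffered rows with a single statement.
func (b *BatchInsertBuilder) Exec(db *sql.DB) error {
	if len(b.rows) == 0 {
		return nil
	}
	builder := sq.Insert(b.TableName).
		Columns(b.Columns...).
		PlaceholderFormat(sq.Dollar)
	for _, row := range b.rows {
		builder = builder.Values(row...)
	}
	query, args, err := builder.ToSql()
	if err != nil {
		return err
	}
	_, err = db.Exec(query, args...)
	return err
}
```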

Contributor Author (@bartekn):

OK, this makes sense. But we still need some way to figure out which param is an address and needs to be replaced with an ID later. What do you think about creating a new type like:

type Address string

And we will be passing values like:

ingest.TransactionParticipantBatchInsertBuilder.Values(Address("G..."), param1, param2)

Then ingest.BatchInsertBuilder will have a method to return all of the addresses we need to get/create. Sounds good?
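A minimal sketch of the Address idea, building on the BatchInsertBuilder sketch above (hypothetical names): typed address parameters are collected from the buffered rows so they can be resolved to account IDs in one batch before the final insert.

```go
package ingest

// Address marks a parameter that holds a Stellar account address and must be
// replaced with a history account ID before the final insert.
type Address string

// Addresses returns every Address-typed parameter buffered so far, so the
// ingestion session can get/create all account IDs in one batch.
func (b *BatchInsertBuilder) Addresses() []Address {
	var addresses []Address
	for _, row := range b.rows {
		for _, value := range row {
			if a, ok := value.(Address); ok {
				addresses = append(addresses, a)
			}
		}
	}
	return addresses
}
```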

Contributor:

I'm cool with that approach. sounds good!

GetParams() []interface{}
// UpdateAccountIDs updates fields with account IDs by using provided
// address => id mapping.
UpdateAccountIDs(accounts map[string]int64)
Contributor:

I don't agree with this approach to translating addresses into the synthetic account ids used by the history system.

The cleaner approach is to add an additional phase to each round of ingestion:

  1. Collect all accounts that are involved in a ledger and get-or-create the history_accounts rows as a batch, populating an in-memory map that phase 2 can refer to
  2. the rest of ingestion

Contributor Author (@bartekn):

I see. We can do this (A) or this (B). Which one is better for you? IMHO, seems to be a little better because we won't have to update a function collecting addresses if a new table is created, but both are OK to me.

@bartekn closed this Mar 1, 2018
@bartekn force-pushed the ingest-perf-fixes branch from 41667b3 to 0009f1f on March 1, 2018 18:39
@bartekn (Contributor Author) commented Mar 1, 2018

Too much Git magic, master untouched. Please review: #338
