
services/horizon: Precompute 1m Trade Aggregation buckets #3641

Merged

Conversation

@paulbellamy (Contributor) commented May 28, 2021:

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Incremental pre-computation of the 1-minute trade-aggregation buckets via a database trigger; higher resolutions are aggregated on the fly at query time.

Why

Fixes #632

Per discussion, pre-computing seems the best option for performance and scalability, as the amount of computation is known and bounded. The concern is disk usage. Given a raw history_trades table of 4GB, a naive 1m bucket pre-computation results in a new table of approximately 1.6GB. Not outrageous, but it could grow quickly if more trading pairs become active. When generating that as a materialized view, I accidentally forgot to include the base_asset_id and counter_asset_id columns, and noticed the table was much smaller. To compress the table, this PR uses wide jsonb columns, so there is one row per year/month/asset-pair, with a values column containing that month's 1m buckets. This adds complexity and a small overhead on querying (~15%), but saves a further ~70% on disk size.
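
A rough sketch of the jsonb-packed layout described above (this schema is an assumption pieced together from the queries and index names later in this thread; the actual migration may differ):

-- Hypothetical sketch of the jsonb-packed 1m bucket table; the real migration differs.
-- One row per (year, month, base asset, counter asset); "values" holds that month's 1m buckets,
-- keyed by bucket timestamp, e.g. {"1622502000000": {"timestamp": ..., "count": ..., ...}}.
CREATE TABLE history_trades_60000 (
    year             integer NOT NULL,
    month            integer NOT NULL,
    base_asset_id    bigint  NOT NULL,
    counter_asset_id bigint  NOT NULL,
    "values"         jsonb   NOT NULL,
    UNIQUE (year, month, base_asset_id, counter_asset_id)
);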

In order to handle the offset param, the maximum resolution we could potentially pre-compute is 15 minutes. For this PR I've only pre-computed the 1-minute buckets and aggregated higher resolutions on the fly, since higher resolutions are quite quick to aggregate once the 1m buckets exist. In my past experience this has been enough, as the 1m aggregation gives higher resolutions a big boost. (TODO: benchmark this for concrete numbers.)
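
For reference, rolling 1m buckets up to a coarser resolution relies on a simple rounding expression; this is the same form used in the on-the-fly 1d query later in this thread, generalized here with an explicit offset (the VALUES row is just example data):

-- Round a bucket timestamp (ms) down to a coarser bucket for a given resolution and offset (ms).
-- With resolution_ms = 86400000 (1d) and offset_ms = 0 this matches the 1d query shown below.
SELECT div(ts - offset_ms, resolution_ms) * resolution_ms + offset_ms AS bucket
FROM (VALUES (1622502037000::bigint, 86400000::bigint, 0::bigint)) AS t(ts, resolution_ms, offset_ms);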

Initially I hoped we could just use a materialized view to do the pre-computation (to avoid changes to the Horizon ingestion code). However, refreshing a materialized view is all-or-nothing and takes ~7 minutes on my laptop. Instead, this PR uses a trigger to incrementally update the relevant buckets on insert. This implementation is still quite fragile (see Known limitations below). A non-incremental recompute of a single 1m bucket takes ~300ms; I didn't want to add that overhead to each insert, but it might be an easy way to handle UPDATE/DELETE queries.
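
A minimal sketch of what such an insert trigger could look like. The only grounded detail is the trigger name htrd_60000_insert, which shows up in the EXPLAIN ANALYZE output further down; everything else is an assumption, shown against flat per-bucket rows for brevity (the jsonb variant would upsert into the month row's values map instead), and only the count/volume maintenance is shown:

-- Hypothetical sketch of the incremental insert trigger; the actual migration differs.
-- Assumes a unique key on (timestamp, base_asset_id, counter_asset_id), as in the
-- non-jsonb EXPLAIN output later in this thread. High/low/open/close handling omitted.
CREATE OR REPLACE FUNCTION htrd_60000_insert() RETURNS trigger AS $$
DECLARE
  bucket_ts bigint := (extract(epoch FROM NEW.ledger_closed_at)::bigint * 1000 / 60000) * 60000;
BEGIN
  INSERT INTO history_trades_60000 AS b
    ("timestamp", base_asset_id, counter_asset_id, "count", base_volume, counter_volume)
  VALUES
    (bucket_ts, NEW.base_asset_id, NEW.counter_asset_id, 1, NEW.base_amount, NEW.counter_amount)
  ON CONFLICT ("timestamp", base_asset_id, counter_asset_id) DO UPDATE SET
    "count"        = b."count" + 1,
    base_volume    = b.base_volume + EXCLUDED.base_volume,
    counter_volume = b.counter_volume + EXCLUDED.counter_volume;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER htrd_60000_insert AFTER INSERT ON history_trades
  FOR EACH ROW EXECUTE PROCEDURE htrd_60000_insert();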

Known limitations

This trigger is still quite fragile: it doesn't handle UPDATE or DELETE queries, and it assumes all trades are inserted in order.

The reduced query load on history_trades means we could probably get rid of some of the now-unused indexes on that table, which would save a few gigabytes of disk.

TODO

  • fix high/low calculation
  • fix open/close calculation
  • benchmark on-the-fly higher-resolution aggregations
  • try benchmarks without jsonb as well to see if disk usage tradeoff is worth it or not.
  • handle offsets correctly in the new sql queries
  • handle reingestion (delete buckets, insert, rebuild start+end buckets)
  • try rebuilding buckets from go (instead of db triggers)
    • test this actually works
    • check this is actually fast enough to not slow down ingestion
  • testing the complicated sql aggregation queries (or pre-compute 5 and 15 minute intervals)
  • Check it works with postgres 9.6
  • Figure out if we need to create the index concurrently
  • benchmark with big data
    • 1m queries
    • 1d queries
    • ingestion
    • reingestion

Later

  • Figure out if we can drop any old indexes on the history_trades.

@paulbellamy (Contributor, Author) commented:

For reading this, I'd recommend starting with the migration, as it'll help make sense of the rest.

@bartekn (Contributor) left a comment:

It's looking good. Looking forward to some response-duration stats. My only worry is reingestion in ranges: in that case ranges can be ingested in non-increasing order, so the triggers here will calculate the values incorrectly.

I think we can fix this but it requires adding an extra step once reingestion is done:

  • We move the trigger to Go code and run it after a ledger is ingested using normal operations (not reingestion).
  • In reingestion code we add an extra step that runs once all ranges are completed which is responsible for calculating 1m resolutions.
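
For context, the bulk 1m recomputation that this extra step would run could look roughly like the sketch below. It assumes the custom first/last/max_price/min_price aggregates that appear in the benchmark queries later in this thread, and a flat (non-jsonb) bucket table; the actual query in the PR differs:

-- Sketch of a bulk 1m bucket rebuild from raw trades; the real query in the PR differs.
INSERT INTO history_trades_60000
  ("timestamp", base_asset_id, counter_asset_id, "count",
   base_volume, counter_volume, high_n, high_d, low_n, low_d,
   open_n, open_d, close_n, close_d)
SELECT
  "timestamp", base_asset_id, counter_asset_id,
  count(*),
  sum(base_amount),
  sum(counter_amount),
  (max_price(price))[1], (max_price(price))[2],
  (min_price(price))[1], (min_price(price))[2],
  (first(price))[1], (first(price))[2],
  (last(price))[1], (last(price))[2]
FROM (
  SELECT
    (extract(epoch FROM ledger_closed_at)::bigint * 1000 / 60000) * 60000 AS "timestamp",
    base_asset_id, counter_asset_id, base_amount, counter_amount,
    ARRAY[price_n, price_d] AS price
  FROM history_trades
  ORDER BY history_operation_id, "order"
) AS htrd
GROUP BY base_asset_id, counter_asset_id, "timestamp"
ORDER BY "timestamp";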

@bartekn (Contributor) commented Jun 2, 2021:

An idea to fix open and close when reingesting out of order: we can add two new columns, open_ledger_seq and close_ledger_seq, that indicate the sequence number of the ledger that set the current open/close price. If a new ledger in a given bucket appears and it's before open_ledger_seq, then we update the open price; otherwise open is not updated. We update the close price in a similar way.

Please note that the open/close price will still be incorrect if the bucket was ingested partially.
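
A sketch of how the open_ledger_seq/close_ledger_seq idea above could be expressed as an upsert. The two proposed columns come from the comment above; the rest of the column names, and the bind parameters ($1 ... $6), are assumptions for illustration:

-- Hypothetical upsert illustrating the open_ledger_seq / close_ledger_seq idea.
-- Bind parameters: $1=bucket timestamp, $2=base asset, $3=counter asset,
-- $4=price_n, $5=price_d, $6=ledger sequence of the new trade.
INSERT INTO history_trades_60000 AS b
  ("timestamp", base_asset_id, counter_asset_id,
   open_n, open_d, open_ledger_seq, close_n, close_d, close_ledger_seq)
VALUES
  ($1, $2, $3, $4, $5, $6, $4, $5, $6)
ON CONFLICT ("timestamp", base_asset_id, counter_asset_id) DO UPDATE SET
  -- Only move "open" to an earlier ledger and "close" to a later ledger.
  open_n  = CASE WHEN EXCLUDED.open_ledger_seq  < b.open_ledger_seq  THEN EXCLUDED.open_n  ELSE b.open_n  END,
  open_d  = CASE WHEN EXCLUDED.open_ledger_seq  < b.open_ledger_seq  THEN EXCLUDED.open_d  ELSE b.open_d  END,
  open_ledger_seq  = LEAST(b.open_ledger_seq, EXCLUDED.open_ledger_seq),
  close_n = CASE WHEN EXCLUDED.close_ledger_seq > b.close_ledger_seq THEN EXCLUDED.close_n ELSE b.close_n END,
  close_d = CASE WHEN EXCLUDED.close_ledger_seq > b.close_ledger_seq THEN EXCLUDED.close_d ELSE b.close_d END,
  close_ledger_seq = GREATEST(b.close_ledger_seq, EXCLUDED.close_ledger_seq);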

@paulbellamy (Contributor, Author) commented:

> In reingestion code we add an extra step that runs once all ranges are completed which is responsible for calculating 1m resolutions.

This seems like a good plan. We actually only need to completely recalculate the first and last buckets, as those are the only ones that could be partially ingested.

@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 3e88881 to c1fe1e5 on June 10, 2021 11:26
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 93e3b55 to 1ac4160 on June 10, 2021 17:14
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 1ac4160 to 0c002ac on June 10, 2021 17:15
@paulbellamy (Contributor, Author) commented Jun 10, 2021:

OK, so benchmarking on my local laptop. First, let's find the most-traded asset pairs, to set a worst-case bound...

Top asset-pairs query
postgres=# select base_asset_id, counter_asset_id, count(*) from history_trades group by base_asset_id, counter_asset_id order by count desc limit 10;
 base_asset_id | counter_asset_id |  count
---------------+------------------+---------
             2 |              149 | 2819451
             2 |         17581803 | 1660065
             2 |         27869101 | 1424528
             2 |             2351 |  761635
             2 |             1519 |  616861
             2 |            40569 |  604979
             2 |            28007 |  590901
      20772946 |         20772947 |  562536
             2 |         46998801 |  555386
             2 |          1649457 |  449990

So base_asset_id 2 / counter_asset_id 149 is the biggest pair (with ~70% more trades than the second).

Pre-computed 1m buckets query: ~900ms
postgres=# EXPLAIN ANALYZE SELECT
  (j.value->>'timestamp')::bigint as timestamp,
  (j.value->>'count')::integer as count,
  (j.value->>'base_volume')::bigint as base_volume,
  (j.value->>'counter_volume')::bigint as counter_volume,
  ARRAY[(j.value->>'high_n')::bigint, (j.value->>'high_d')::bigint] as high,
  ARRAY[(j.value->>'low_n')::bigint, (j.value->>'low_d')::bigint] as low,
  ARRAY[(j.value->>'open_n')::bigint, (j.value->>'open_d')::bigint] as open,
  ARRAY[(j.value->>'close_n')::bigint, (j.value->>'close_d')::bigint] as close
FROM
  history_trades_60000 h, jsonb_each(h.values) j
WHERE h.base_asset_id = '2'
  AND h.counter_asset_id = '149'
ORDER BY timestamp DESC LIMIT 200;


                                                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=957.86..958.36 rows=200 width=156) (actual time=910.185..910.196 rows=200 loops=1)
   ->  Sort  (cost=957.86..958.61 rows=300 width=156) (actual time=910.183..910.188 rows=200 loops=1)
         Sort Key: (((j.value ->> 'timestamp'::text))::bigint) DESC
         Sort Method: top-N heapsort  Memory: 131kB
         ->  Nested Loop  (cost=0.29..945.51 rows=300 width=156) (actual time=0.356..848.402 rows=371658 loops=1)
               ->  Index Scan using history_trades_60000_year_month_base_asset_id_counter_asset_key on history_trades_60000 h  (cost=0.29..912.51 rows=3 width=618) (actual time=0.099..0.607 rows=39 loops=1)
                     Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '149'::bigint))
               ->  Function Scan on jsonb_each j  (cost=0.00..1.00 rows=100 width=32) (actual time=5.995..7.002 rows=9530 loops=39)
 Planning Time: 0.192 ms
 Execution Time: 910.512 ms
(10 rows)

Time: 911.161 ms
On-the-fly 1d buckets query: ~1300ms
postgres=# EXPLAIN ANALYZE SELECT
  bucket as timestamp,
  sum("count") as count,
  sum(base_volume) as base_volume,
  sum(counter_volume) as counter_volume,
  sum(counter_volume)/sum(base_volume) as avg,
  (max_price(high))[1] as high_n,
  (max_price(high))[2] as high_d,
  (min_price(low))[1] as low_n,
  (min_price(low))[2] as low_d,
  (first(open))[1] as open_n,
  (first(open))[2] as open_d,
  (last(close))[1] as close_n,
  (last(close))[2] as close_d
  FROM (
    SELECT
      div(((j.value->>'timestamp')::bigint - 0), 86400000)*86400000 + 0 as bucket,
      (j.value->>'timestamp')::bigint as timestamp,
      (j.value->>'count')::integer as count,
      (j.value->>'base_volume')::bigint as base_volume,
      (j.value->>'counter_volume')::bigint as counter_volume,
      ARRAY[(j.value->>'high_n')::bigint, (j.value->>'high_d')::bigint] as high,
      ARRAY[(j.value->>'low_n')::bigint, (j.value->>'low_d')::bigint] as low,
      ARRAY[(j.value->>'open_n')::bigint, (j.value->>'open_d')::bigint] as open,
      ARRAY[(j.value->>'close_n')::bigint, (j.value->>'close_d')::bigint] as close
    FROM
      history_trades_60000 h, jsonb_each(h.values) j
    WHERE h.base_asset_id = '2'
      AND h.counter_asset_id = '149'
    ORDER BY timestamp ASC
  ) as htrd
  GROUP BY bucket ORDER BY timestamp DESC LIMIT 200;

                                                                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=979.95..1622.70 rows=200 width=296) (actual time=1218.986..1223.731 rows=200 loops=1)
   ->  GroupAggregate  (cost=979.95..1622.70 rows=200 width=296) (actual time=1218.984..1223.723 rows=200 loops=1)
         Group Key: htrd.bucket
         ->  Sort  (cost=979.95..980.70 rows=300 width=180) (actual time=1218.870..1219.016 rows=1148 loops=1)
               Sort Key: htrd.bucket DESC
               Sort Method: external merge  Disk: 69184kB
               ->  Subquery Scan on htrd  (cost=963.86..967.61 rows=300 width=180) (actual time=1032.856..1084.236 rows=371658 loops=1)
                     ->  Sort  (cost=963.86..964.61 rows=300 width=188) (actual time=1032.854..1057.811 rows=371658 loops=1)
                           Sort Key: (((j.value ->> 'timestamp'::text))::bigint)
                           Sort Method: external merge  Disk: 74928kB
                           ->  Nested Loop  (cost=0.29..951.51 rows=300 width=188) (actual time=0.309..901.815 rows=371658 loops=1)
                                 ->  Index Scan using history_trades_60000_year_month_base_asset_id_counter_asset_key on history_trades_60000 h  (cost=0.29..912.51 rows=3 width=618) (actual time=0.060..0.521 rows=39 loops=1)
                                       Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '149'::bigint))
                                 ->  Function Scan on jsonb_each j  (cost=0.00..1.00 rows=100 width=32) (actual time=5.279..6.170 rows=9530 loops=39)
 Planning Time: 0.385 ms
 Execution Time: 1279.927 ms
(16 rows)

Thoughts

Keep in mind these are worst cases (loading all of history for the most-traded asset pair). It seems like most of the time is spent unwinding the jsonb. Intuitively I feel like that could be reduced, as it seems the limit param isn't being propagated down as far as it could be. Maybe I'm missing an index I had before? 🤔

For comparison with the "normal" case, with counter asset 15937061, the 1m and 1d buckets take 200ms and 400ms respectively.

It also might be worth doing a benchmark without the jsonb storage. It ~doubles the data storage, but might make the query times much happier.

Update: Benchmarked without the jsonb storage.

Non-jsonb Pre-computed 1m buckets query: ~30ms
postgres=#
-- raw non-jsonb 1m query
EXPLAIN ANALYZE SELECT
  timestamp,
  count,
  base_volume,
  counter_volume,
  ARRAY[high_n, high_d] as high,
  ARRAY[low_n, low_d] as low,
  ARRAY[open_n, open_d] as open,
  ARRAY[close_n, close_d] as close
FROM
  history_trades_60000 h
WHERE h.base_asset_id = '2'
  AND h.counter_asset_id = '149'
ORDER BY timestamp DESC LIMIT 200;

                                                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..697.19 rows=200 width=156) (actual time=1.156..25.029 rows=200 loops=1)
   ->  Index Scan Backward using history_trades_60000_timestamp_base_asset_id_counter_asset__key on history_trades_60000 h  (cost=0.56..834139.28 rows=239478 width=156) (actual time=1.154..25.005 rows=200 loops=1)
         Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '149'::bigint))
 Planning Time: 0.217 ms
 Execution Time: 25.075 ms
(5 rows)
Non-jsonb On-the-fly 1d buckets query: ~1000ms
postgres=# EXPLAIN ANALYZE SELECT
  timestamp,
  sum("count") as count,
  sum(base_volume) as base_volume,
  sum(counter_volume) as counter_volume,
  sum(counter_volume)/sum(base_volume) as avg,
  (max_price(high))[1] as high_n,
  (max_price(high))[2] as high_d,
  (min_price(low))[1] as low_n,
  (min_price(low))[2] as low_d,
  (first(open))[1] as open_n,
  (first(open))[2] as open_d,
  (last(close))[1] as close_n,
  (last(close))[2] as close_d
  FROM (
    SELECT
      timestamp,
      count,
      base_volume,
      counter_volume,
      ARRAY[high_n, high_d] as high,
      ARRAY[low_n, low_d] as low,
      ARRAY[open_n, open_d] as open,
      ARRAY[close_n, close_d] as close
    FROM
      history_trades_60000 h
    WHERE h.base_asset_id = '2'
      AND h.counter_asset_id = '149'
    ORDER BY timestamp ASC
  ) as htrd
  GROUP BY timestamp ORDER BY timestamp DESC LIMIT 200;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=446157.44..446587.44 rows=200 width=272) (actual time=950.483..950.731 rows=200 loops=1)
   ->  GroupAggregate  (cost=446157.44..961035.14 rows=239478 width=272) (actual time=950.483..950.724 rows=200 loops=1)
         Group Key: h."timestamp"
         ->  Sort  (cost=446157.44..446756.14 rows=239478 width=156) (actual time=950.462..950.508 rows=201 loops=1)
               Sort Key: h."timestamp" DESC
               Sort Method: external merge  Disk: 69184kB
               ->  Sort  (cost=402940.67..403539.36 rows=239478 width=156) (actual time=832.959..857.091 rows=371658 loops=1)
                     Sort Key: h."timestamp"
                     Sort Method: external merge  Disk: 69184kB
                     ->  Seq Scan on history_trades_60000 h  (cost=0.00..362717.36 rows=239478 width=156) (actual time=0.051..672.880 rows=371658 loops=1)
                           Filter: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '149'::bigint))
                           Rows Removed by Filter: 9872212
 Planning Time: 0.228 ms
 Execution Time: 986.947 ms
Non-jsonb inserts: 1-15ms
postgres=# explain analyze INSERT into history_trades
  (history_operation_id, "order",        ledger_closed_at,      offer_id,
   base_account_id,      base_asset_id,  base_amount,           counter_account_id,
   counter_asset_id,     counter_amount, base_is_seller,        price_n,
   price_d,              base_offer_id,  counter_offer_id)
  values
  (152527926612222986,   0,              '2021-05-21 15:16:36', 572368382,
   2329565490,           2,              2490,                  263763996,
   17076589,             45272,          false,                 200,
   11,                   4764213935039610881,                   572368382);

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Insert on history_trades  (cost=0.00..0.01 rows=1 width=109) (actual time=10.135..10.136 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=109) (actual time=0.002..0.003 rows=1 loops=1)
 Planning Time: 0.062 ms
 Trigger htrd_60000_insert: time=5.723 calls=1
 Execution Time: 15.962 ms
(5 rows)

postgres=# explain analyze INSERT into history_trades
  (history_operation_id, "order",        ledger_closed_at,      offer_id,
   base_account_id,      base_asset_id,  base_amount,           counter_account_id,
   counter_asset_id,     counter_amount, base_is_seller,        price_n,
   price_d,              base_offer_id,  counter_offer_id)
  values
  (152527926612222987,   0,              '2021-05-21 15:16:37', 572368382,
   2329565490,           2,              2490,                  263763996,
   17076589,             45272,          false,                 200,
   11,                   4764213935039610881,                   572368382);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Insert on history_trades  (cost=0.00..0.01 rows=1 width=109) (actual time=0.167..0.168 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=109) (actual time=0.003..0.003 rows=1 loops=1)
 Planning Time: 0.045 ms
 Trigger htrd_60000_insert: time=0.848 calls=1
 Execution Time: 1.120 ms
(5 rows)

So, expectedly, the raw non-jsonb retrieval is blazing fast. Interestingly, the on-the-fly 1d aggregation is still quite slow, since it is dominated by the GroupAggregate work. We could use the same kind of trigger to pre-compute the 5 and 15-minute buckets as well, which would probably help, but that adds ~1300MB. Basically, we need to decide if the extra disk usage (1600MB vs 500MB with jsonb) is worth the increased query speed, or find another way to reduce the disk usage (maybe a side-table for the asset pair, or tuning the field types to reduce row width).
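
For illustration, the asset-pair side-table mentioned in passing above might look like the sketch below; it would let the bucket rows carry a single small pair id instead of two bigint asset ids. This table does not exist in the PR; the name and columns are assumptions:

-- Hypothetical asset-pair lookup table (not part of this PR) to shrink bucket row width.
CREATE TABLE history_trade_pairs (
    id               serial PRIMARY KEY,
    base_asset_id    bigint NOT NULL,
    counter_asset_id bigint NOT NULL,
    UNIQUE (base_asset_id, counter_asset_id)
);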

@paulbellamy (Contributor, Author) commented:

Thinking about this more, and benchmarking a bit more, I think we should not do the jsonb thing for now. The faster query times will take more load off the DB, and hard-drives are cheap(ish). Let's opt for the ~1600MB disk usage, and fastest query times.

Because all the data is available, if/when disk usage later becomes an issue we can reconsider a move to jsonb (via db migration, no reingest needed), with more knowledge about the usage patterns.

@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from b129046 to 2cb5733 on June 16, 2021 18:37
Paul Bellamy added 2 commits June 16, 2021 19:39
- Just calls the ~same thing once-per-insert-batch from go
- Unknowns
  - Does it actually work?
  - Is the "in" query for buckets efficient, or would a range be better?
  - Is full-rebuild of buckets efficient, or should we do incremental?
    Seems safer if it works.
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 2cb5733 to e5c3966 on June 16, 2021 18:39
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 9a011d2 to 6e35397 on June 25, 2021 15:33
@@ -753,6 +760,11 @@ func (h reingestHistoryRangeState) run(s *system) (transition, error) {
}
}

err = s.historyQ.RebuildTradeAggregationBuckets(s.ctx, h.fromLedger, h.toLedger)
@paulbellamy (Contributor, Author) commented:

Note, running this once at the end of the reingestion means that the aggregations will be out-of-date until the reingestion finishes.

@paulbellamy (Contributor, Author) commented:

We could decide we're fine with that, or do some detection and rebuild it after each minute chunk. Running it once at the end is way more efficient though, as it can just load the whole range of trades at once (instead of loading each minute's).

@paulbellamy (Contributor, Author) commented Jun 29, 2021:

Per benchmarking with @jacekn, it seems the previous slowness was caused by a missing index for the DeleteRangeAll call. If we're concerned about consistency during reingestion, we can retry rebuilding the buckets after each ledger and time that.

@paulbellamy changed the title from "[WIP] services/horizon: Precompute 1m Trade Aggregation buckets" to "services/horizon: Precompute 1m Trade Aggregation buckets" on Jun 28, 2021
@paulbellamy marked this pull request as ready for review on June 28, 2021 16:23
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 5229099 to cdd97b4 on June 29, 2021 11:39
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from cdd97b4 to 25ac86c on June 29, 2021 11:44
@bartekn (Contributor) left a comment:

OK, it looks good to me! I added one major comment about the delete part of RebuildTradeAggregationBuckets and a few minor comments. I know you checked the speed of the new approach (which looks really good!) but have you checked the correctness? I think what we could do is: deploy your branch to stg, rebuild the buckets, then grab as many /trade_aggregation requests from the access log as possible and use horizon-cmp to check that the stable release returns the same data.

go.list (review thread resolved)
services/horizon/internal/db2/history/main.go (review thread resolved; outdated)
FROM htrd
GROUP by base_asset_id, counter_asset_id, timestamp
ORDER BY timestamp
);
@bartekn (Contributor) commented:

I wonder if we should remove the recalculation from the DB migration. Not everyone is using trade aggregations, and not everyone can afford 9 minutes of downtime (which I guess can be even slower on lower-spec machines/disks). What about leaving a query that creates the table here, but only recalculating when someone explicitly runs horizon db reingest ...? There are obviously disadvantages to this, so I'd like to discuss it first.

@paulbellamy (Contributor, Author) commented:

Would this actually need to cause downtime? I guess so, as the table locks would conflict with ingestion... 🤔

@paulbellamy (Contributor, Author) commented:

After actually deploying this and using it, I think I agree. Having the migrations take a long time is a pain, as is maintaining the query in 2 places. Let's remove it from the migration.

services/horizon/internal/db2/history/trade_aggregation.go (review thread resolved; outdated)
services/horizon/internal/db2/history/trade_aggregation.go (review thread resolved; outdated)
services/horizon/internal/ingest/fsm.go (review thread resolved)
Comment on lines 249 to 252
_, err := q.Exec(ctx, sq.Delete("history_trades_60000").Where(
    sq.GtOrEq{"open_ledger": fromLedger},
    sq.Lt{"open_ledger": toLedger},
))
@bartekn (Contributor) commented:

I think something is wrong here, or I misunderstand what history_trades_60000 holds. Given that history_trades_60000 holds one-minute trade aggregation buckets, check this chart:

---|-------------------|----------
   O      |            C      |
          |                   |
          X                   Y

The line at the top is one of the already-aggregated buckets in the DB; it starts at ledger O (open) and ends at C (close). Then we run a reingest of ledgers X-Y. What we want is to delete the buckets starting at O and at C, and I think the highlighted code doesn't do that.

@paulbellamy (Contributor, Author) commented Jul 2, 2021:

You are right, yeah. We actually need to find the first and last bucket timestamps covered by the given ledger range, not filter on the raw ledger values; otherwise we would only include partial data when rebuilding. Nice catch, thanks.
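
A sketch of the corrected cleanup: resolve the reingested ledger range to bucket timestamps first, then drop every bucket in that timestamp range so it can be rebuilt in full. How trades are matched to the ledger range is elided; the real query in the PR (built with squirrel in Go) differs:

-- Hypothetical sketch; the actual Go/squirrel implementation in the PR differs.
WITH bounds AS (
  SELECT
    min((extract(epoch FROM ledger_closed_at)::bigint * 1000 / 60000) * 60000) AS first_ts,
    max((extract(epoch FROM ledger_closed_at)::bigint * 1000 / 60000) * 60000) AS last_ts
  FROM history_trades
  WHERE TRUE  -- placeholder: restrict to trades in the reingested ledger range [fromLedger, toLedger]
)
DELETE FROM history_trades_60000 h
USING bounds b
WHERE h."timestamp" BETWEEN b.first_ts AND b.last_ts;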

services/horizon/internal/db2/history/trade_aggregation.go (review thread resolved; outdated)
services/horizon/internal/db2/history/trade_aggregation.go (review thread resolved; outdated)
services/horizon/internal/db2/history/trade_aggregation.go (review thread resolved; outdated)
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from 05d750f to d3b4695 on July 2, 2021 15:52
@paulbellamy force-pushed the paulb/trade_aggregation_precompute branch from d3b4695 to 41cf9fe on July 2, 2021 16:29
@paulbellamy (Contributor, Author) commented:

Note: when ingesting, this failed with:

error rebuilding trade aggregations: could not rebuild trade aggregation bucket: exec failed: pq: duplicate key value violates unique constraint "history_trades_60000_pkey"

So, we are not clearing out the correct old buckets.

@paulbellamy merged commit d336609 into stellar:master on Jul 6, 2021
@paulbellamy deleted the paulb/trade_aggregation_precompute branch on July 6, 2021 12:12
Linked issue: Optimize trade aggregations query (#632)