Optimize trade aggregations query #632

Closed
bartekn opened this issue Sep 5, 2018 · 24 comments · Fixed by #3641
Labels: horizon, performance (issues aimed at improving performance)

@bartekn
Contributor

bartekn commented Sep 5, 2018

No description provided.

@bartekn bartekn added the horizon label Sep 5, 2018
@bartekn bartekn added this to the Horizon v0.15.0 milestone Sep 5, 2018
@bartekn bartekn modified the milestones: Horizon v0.15.0, Horizon v0.14.x - patch releases Sep 6, 2018
@howardtw
Contributor

@bartekn Since the description is empty here, I wonder whether we log the actual slow queries. I want to see the EXPLAIN ANALYZE output, but reconstructing the resolution, offset, and time range is not straightforward. I can see that we have a b-tree index to apply the time filter, and I will need to identify the other bottlenecks of the trade aggregation queries.

@bartekn
Contributor Author

bartekn commented Jan 14, 2019

Yes, we do log slow queries. Here's an example:

2019-01-14 13:01:32 UTC:172.17.20.113(55478):stellar@horizon:[79213]:LOG: duration: 1353.979 ms execute <unnamed>: SELECT timestamp, count(*) as count, sum(base_amount) as base_volume, sum(counter_amount) as counter_volume, sum(counter_amount)/sum(base_amount) as avg, max_price(price) as high, min_price(price) as low, first(price) as open, last(price) as close FROM (SELECT div((cast((extract(epoch from ledger_closed_at) * 1000 ) as bigint) - 0), 900000)*900000 + 0 as timestamp, history_operation_id, "order", counter_asset_id as base_asset_id, counter_amount as base_amount, base_asset_id as counter_asset_id, base_amount as counter_amount, ARRAY[price_d, price_n] as price FROM history_trades WHERE counter_asset_id = $1 AND base_asset_id = $2 AND ledger_closed_at >= $3 AND ledger_closed_at < $4 ORDER BY history_operation_id , "order") AS htrd GROUP BY timestamp ORDER BY timestamp desc LIMIT 200
2019-01-14 13:01:32 UTC:172.17.20.113(55478):stellar@horizon:[79213]:DETAIL: parameters: $1 = '7', $2 = '2', $3 = '1970-01-18 13:00:00', $4 = '2018-11-16 08:30:00'
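
(For context, entries like the one above come from PostgreSQL's slow-query logging. A minimal sketch of enabling it, assuming superuser access; the 1-second threshold is illustrative:)

-- Log every statement that runs for longer than 1000 ms, then reload the config.
ALTER SYSTEM SET log_min_duration_statement = 1000;
SELECT pg_reload_conf();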

EXPLAIN ANALYZE of the logged query:

horizon=> explain analyze SELECT timestamp, count(*) as count, sum(base_amount) as base_volume, sum(counter_amount) as counter_volume, sum(counter_amount)/sum(base_amount) as avg, max_price(price) as high, min_price(price) as low, first(price) as open, last(price) as close FROM (SELECT div((cast((extract(epoch from ledger_closed_at) * 1000 ) as bigint) - 0), 900000)*900000 + 0 as timestamp, history_operation_id, "order", counter_asset_id as base_asset_id, counter_amount as base_amount, base_asset_id as counter_asset_id, base_amount as counter_amount, ARRAY[price_d, price_n] as price FROM history_trades WHERE counter_asset_id = '7' AND base_asset_id = '2' AND ledger_closed_at >= '1970-01-18 13:00:00' AND ledger_closed_at < '2018-11-16 08:30:00' ORDER BY history_operation_id , "order") AS htrd GROUP BY timestamp ORDER BY timestamp desc LIMIT 200;
                                                                                                                           QUERY PLAN                                                                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=113625.23..113625.73 rows=200 width=80) (actual time=1408.475..1408.593 rows=200 loops=1)
   ->  Sort  (cost=113625.23..113625.73 rows=200 width=80) (actual time=1408.474..1408.515 rows=200 loops=1)
         Sort Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * 1000::double precision))::bigint - 0))::numeric, 900000::numeric) * 900000::numeric) + 0::numeric))
         Sort Method: top-N heapsort  Memory: 78kB
         ->  HashAggregate  (cost=113613.08..113617.58 rows=200 width=80) (actual time=1377.139..1400.161 rows=14238 loops=1)
               Group Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * 1000::double precision))::bigint - 0))::numeric, 900000::numeric) * 900000::numeric) + 0::numeric))
               ->  Sort  (cost=47576.34..47729.56 rows=61287 width=68) (actual time=223.914..249.145 rows=91856 loops=1)
                     Sort Key: history_trades.history_operation_id, history_trades."order"
                     Sort Method: quicksort  Memory: 15990kB
                     ->  Index Scan using htrd_pair_time_lookup on history_trades  (cost=0.56..42703.01 rows=61287 width=68) (actual time=0.034..153.567 rows=91856 loops=1)
                           Index Cond: ((base_asset_id = 2::bigint) AND (counter_asset_id = 7::bigint) AND (ledger_closed_at >= '1970-01-18 13:00:00'::timestamp without time zone) AND (ledger_closed_at < '2018-11-16 08:30:00'::timestamp without time zone))
 Planning time: 0.306 ms
 Execution time: 1410.295 ms
(13 rows)
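
For reference, the timestamp bucketing in the inner query above reduces to

  bucket = div(t - offset, resolution) * resolution + offset

where t is the trade's ledger close time in epoch milliseconds, div is integer division, and here resolution = 900000 (15 minutes) and offset = 0. For example, a trade at t = 1537100000000 lands in bucket div(1537100000000, 900000) * 900000 = 1537099200000.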

@howardtw
Contributor

@bartekn it is actually not terribly bad, as I don't see any Seq Scan in the above plan. How do you think we can make trade aggregations better?

@bartekn
Contributor Author

bartekn commented Jan 15, 2019

One thing that caught my eye here is that the time period is very large - basically aggregating data since the first ledger - and the index scan over this range is taking most of the execution time. Given that the history_trades table is huge, maybe that's something we should fix first, e.g. disallow querying periods larger than 1 month?

I will send you the slow query log from the last few days to check whether the query is slow only for larger periods.

@tomerweller
Contributor

Trade aggregation can take a lot of time because it essentially scans every trade in the given time range. When we originally built it we limited the bucket resolutions to a few values, so that in the future we could optimize by caching buckets at these resolutions (maybe by utilizing materialized views). Maybe the future is now?

If we disallow long time ranges, it's going to break pretty much every trading client out there. Maybe we can introduce this limit as a configuration value and start warning users that, in the future, the value on the public instance will be set to a month.

@howardtw
Contributor

Do we still limit the bucket resolutions to a few values, or did we relax that constraint at some point in the past? Caching the buckets might be challenging, as we would have to update the cache constantly depending on the resolution, if I understand it correctly.

@howardtw
Contributor

@bartekn after talking to @tomerweller, we think one possible approach is to have a database table or a materialized view store the trade aggregations in one-hour buckets from the beginning, since the offset is expressed in whole hours. For requests with a resolution greater than or equal to an hour, we would calculate the result from this new table without limiting the period. For requests with a resolution of less than an hour, we would calculate the result from the original table but limit the period to within a day. What do you think about this approach?
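
A rough sketch of what such an hourly pre-aggregation could look like (the view name and shape are hypothetical; max_price, min_price, first and last are the custom aggregates the existing query already uses):

-- Hypothetical hourly pre-aggregation: one row per (pair, hour bucket).
CREATE MATERIALIZED VIEW htrd_agg_hourly AS
SELECT
  base_asset_id,
  counter_asset_id,
  div(cast(extract(epoch from ledger_closed_at) * 1000 as bigint), 3600000) * 3600000 AS timestamp,
  count(*) AS count,
  sum(base_amount) AS base_volume,
  sum(counter_amount) AS counter_volume,
  sum(counter_amount) / sum(base_amount) AS avg,
  max_price(ARRAY[price_n, price_d]) AS high,
  min_price(ARRAY[price_n, price_d]) AS low,
  first(ARRAY[price_n, price_d] ORDER BY history_operation_id, "order") AS open,
  last(ARRAY[price_n, price_d] ORDER BY history_operation_id, "order") AS close
FROM history_trades
GROUP BY 1, 2, 3;  -- pair plus hour bucket

Since the offset is restricted to whole hours, any coarser resolution (with any allowed offset) could then be served by re-aggregating these hourly rows.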

@bartekn
Contributor Author

bartekn commented Jan 16, 2019

Overall it seems good but:

  • Before starting this we should check if it really solves the problem, e.g. create a materialized view and run a query on it.
  • I'm wondering if having more trade data (e.g. 5 vs. 2 years) will make 1-hour buckets slow again.
  • It looks like this solution will require more disk space in the DB. Also, the data will need to be updated from time to time, and updating many materialized views can take some time and DB CPU.

In the context of rate limiting, I'm wondering if allowing full scans on a huge table like history_trades at a cost equal to 1 (each request decrements the allowed number of requests in a given time range by one) should be possible at all. Even if we limit the maximum period to 1 month, applications can still request the previous month in a separate query (this could even be a simultaneous request). TBH I'm still not sure if it's the right approach, but it's much easier to implement and the updates required on the client side are not that complicated.

@tomerweller
Contributor

Before starting this we should check if it really solves the problem, e.g. create a materialized view and run a query on it.

Agreed. Let's get some numbers.

I'm wondering if having more trade data (e.g. 5 vs. 2 years) will make 1-hour buckets slow again.

It definitely might. In this case we know the exact rate of data growth (per pair), so we can test it.

It looks like this solution will require more disk space in the DB. Also, the data will need to be updated from time to time, and updating many materialized views can take some time and DB CPU.

In theory you should have exactly one update per hour, containing as many rows as there are pairs that traded in that hour.
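
For example (assuming the hypothetical htrd_agg_hourly view sketched above, and pg_cron as one possible scheduler; the job name and schedule are illustrative):

-- REFRESH ... CONCURRENTLY requires a unique index on the view.
CREATE UNIQUE INDEX ON htrd_agg_hourly (base_asset_id, counter_asset_id, timestamp);

-- Rebuild the view shortly after every hour closes.
SELECT cron.schedule('refresh-trade-agg', '5 * * * *',
  'REFRESH MATERIALIZED VIEW CONCURRENTLY htrd_agg_hourly');

Note that a plain REFRESH recomputes the whole view rather than appending only the newly closed hour; an incrementally maintained table (as proposed later in this thread) avoids that.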

@howardtw
Contributor

Even if we limit the maximum period to 1 month, applications can still request the previous month in a separate query (this could even be a simultaneous request).

With the hourly bucket optimization, we don't have to worry about limiting the period to be under a month on the client side anymore, right?

@bartekn
Contributor Author

bartekn commented Jan 17, 2019

With the hourly bucket optimization, we don't have to worry about limiting the period to be under a month on the client side anymore, right?

Yeah, I was thinking about a solution for API clients in case we don't implement bucket optimization and just limit requests to a 1-month period.

@tomquisel tomquisel removed this from the Horizon patch releases milestone Jan 24, 2019
@tomquisel
Contributor

It sounds like this is a fair bit of work, so I'd say it's lower priority.

@poliha
Contributor

poliha commented Aug 14, 2019

I looked at this a bit more; see my findings and proposed solution below.

According to the Horizon logs over a period of 7 days, there were over 2.3M requests to the /trade_aggregations endpoint of horizon.stellar.org. Of those, 99.8% had response times under 2 seconds. Of the remaining 0.2%, 14 requests took over 30 seconds. The good news is that most of the time the API is used on the happy path and everything is fine. But it also means there is room for misuse: the proportions could easily be reversed, with far more requests being delayed.

An example of a delayed request is shown below; this request took 68 seconds and probably timed out:

/trade_aggregations?base_asset_type=native&counter_asset_type=credit_alphanum4&counter_asset_code=DRA&counter_asset_issuer=GCJKSAQECBGSLPQWAU7ME4LVQVZ6IDCNUA5NVTPPCUWZWBN5UBFMXZ53&start_time=0&end_time=0&resolution=86400000&order=desc&limit=200

This translates into the SQL query:

SELECT timestamp, count(*) as count, sum(base_amount) as base_volume, sum(counter_amount) as counter_volume, sum(counter_amount)/sum(base_amount) as avg, max_price(price) as high, min_price(price) as low, first(price) as open, last(price) as close FROM (SELECT div((cast((extract(epoch from ledger_closed_at) * 1000 ) as bigint) - 0), 86400000)*86400000 + 0 as timestamp, history_operation_id, "order", base_asset_id, base_amount, counter_asset_id, counter_amount, ARRAY[price_n, price_d] as price FROM history_trades WHERE base_asset_id = '2' AND counter_asset_id = '5833' AND ledger_closed_at >= '1970-01-01 00:00:00' ORDER BY history_operation_id , "order") AS htrd GROUP BY timestamp ORDER BY timestamp desc LIMIT 200

When analyzed, we get this

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=2833991.45..2833991.95 rows=200 width=80) (actual time=44213.773..44213.893 rows=200 loops=1)
   ->  Sort  (cost=2833991.45..2833991.95 rows=200 width=80) (actual time=44213.771..44213.816 rows=200 loops=1)
         Sort Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric)) DESC
         Sort Method: top-N heapsort  Memory: 78kB
         ->  HashAggregate  (cost=2833979.30..2833983.80 rows=200 width=80) (actual time=44212.371..44213.110 rows=516 loops=1)
               Group Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric))
               ->  Sort  (cost=596769.28..601960.03 rows=2076297 width=52) (actual time=6214.336..7484.817 rows=2687984 loops=1)
                     Sort Key: history_trades.history_operation_id, history_trades."order"
                     Sort Method: external sort  Disk: 249592kB
                     ->  Index Scan using htrd_pair_time_lookup on history_trades  (cost=0.56..297799.79 rows=2076297 width=52) (actual time=0.061..3635.642 rows=2687984 loops=1)
                           Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '5833'::bigint) AND (ledger_closed_at >= '1970-01-01 00:00:00'::timestamp without time zone))
 Planning time: 0.612 ms
 Execution time: 44250.765 ms
(13 rows)

A 45-second execution time is definitely not acceptable.

Precompute Trade Aggregation Buckets

One of the suggestions mentioned earlier was pre-computing the trade aggregation buckets. I think that is a good idea, so I decided to test it out. I precomputed the buckets for the asset pair above at a 1-hour resolution and saved them as a materialized view. With that, the slow query above can be converted to:

SELECT timestamp, sum(count) as count, sum(base_volume) as base_volume, sum(counter_volume) as counter_volume,  sum(avg)/count(*) as avg, max_price(high) as high, min_price(low) as low, first(open) as open, last(close) as close FROM(SELECT div((cast(timestamp as bigint) - 0), 86400000) * 86400000 + 0 as timestamp, count, base_volume, counter_volume, avg, high, low, open, close FROM trdagg_view WHERE base_asset_id = '2' AND counter_asset_id = '5833' AND timestamp >= '1468800000') as htrd GROUP BY timestamp ORDER BY timestamp desc LIMIT 200;

Analyzing this gives

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1498.62..1710.12 rows=200 width=196) (actual time=18.271..75.610 rows=200 loops=1)
   ->  GroupAggregate  (cost=1498.62..13543.55 rows=11390 width=196) (actual time=18.269..75.516 rows=200 loops=1)
         Group Key: (((div((((trdagg_view."timestamp")::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric))
         ->  Sort  (cost=1498.62..1527.10 rows=11390 width=196) (actual time=17.968..19.284 rows=4432 loops=1)
               Sort Key: (((div((((trdagg_view."timestamp")::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric)) DESC
               Sort Method: quicksort  Memory: 3358kB
               ->  Seq Scan on trdagg_view  (cost=0.00..731.19 rows=11390 width=196) (actual time=0.017..12.141 rows=11391 loops=1)
                     Filter: (("timestamp" >= '1468800000'::numeric) AND (base_asset_id = '2'::bigint) AND (counter_asset_id = '5833'::bigint))
 Planning time: 0.131 ms
 Execution time: 75.709 ms
(10 rows)

⚡ Alright, precomputing the buckets definitely looks like the way to go, but there are some issues. Let us look at some numbers:

Currently, there are 25,070 assets in the history_assets table. With Stellar, each of those assets can potentially form a trading pair with every other asset; this gives us about 628,454,760 asset pairs. Multiply that by 24 (hourly buckets per day) and that is a lot of data.

Thankfully (at the moment), not all of those asset pairs are in play. Looking at the history_trades table, there are 2,683 distinct asset pairs. This gives 64,392 precomputed entries per day. Following this, the maximum number of entries per day to be precomputed is N * 24, where N is the number of distinct asset pairs. Note that this assumes that all asset pairs are active every day, which is not necessarily the case.

Bounded time range

Another solution that I have considered is bounding the time range to a multiple of the specified resolution. Given that Horizon will only respond with a maximum of 200 records, we should limit start_time to a minimum of end_time - (200 * r), where r is the resolution. The next link in the response will then be updated with corresponding start and end times in intervals of 200r. For cases where end_time is not defined, we can set it to the maximum ledger_closed_at value for the given asset pair.
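
A minimal sketch of that clamping, assuming the request's limit, resolution r, start_time and end_time are all expressed in milliseconds (the helper query for the default end_time is illustrative):

-- When end_time is not supplied, default it to the pair's last trade.
SELECT cast(extract(epoch from max(ledger_closed_at)) * 1000 as bigint) AS default_end_ms
FROM history_trades
WHERE base_asset_id = $1 AND counter_asset_id = $2;

-- Then clamp the lower bound so at most `limit` (<= 200) buckets are ever scanned:
--   start_time := greatest(start_time, end_time - limit * r)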

If we do this, the slow query above will be represented as

SELECT timestamp, count(*) as count, sum(base_amount) as base_volume, sum(counter_amount) as counter_volume, sum(counter_amount)/sum(base_amount) as avg, max_price(price) as high, min_price(price) as low, first(price) as open, last(price) as close FROM (SELECT div((cast((extract(epoch from ledger_closed_at) * 1000 ) as bigint) - 0), 86400000)*86400000 + 0 as timestamp, history_operation_id, "order", base_asset_id, base_amount, counter_asset_id, counter_amount, ARRAY[price_n, price_d] as price FROM history_trades WHERE base_asset_id = '2' AND counter_asset_id = '5833' AND ledger_closed_at >= '2019-01-26 00:00:00' ORDER BY history_operation_id , "order") AS htrd GROUP BY timestamp ORDER BY timestamp desc LIMIT 200

When we analyze this, we get

QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1053711.98..1053712.48 rows=200 width=80) (actual time=2530.585..2530.708 rows=200 loops=1)
   ->  Sort  (cost=1053711.98..1053712.48 rows=200 width=80) (actual time=2530.583..2530.632 rows=200 loops=1)
         Sort Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric)) DESC
         Sort Method: quicksort  Memory: 78kB
         ->  HashAggregate  (cost=1053699.84..1053704.34 rows=200 width=80) (actual time=2530.109..2530.385 rows=201 loops=1)
               Group Key: (((div(((((date_part('epoch'::text, history_trades.ledger_closed_at) * '1000'::double precision))::bigint - 0))::numeric, '86400000'::numeric) * '86400000'::numeric) + '0'::numeric))
               ->  Sort  (cost=307102.24..308834.49 rows=692898 width=52) (actual time=369.907..435.424 rows=151092 loops=1)
                     Sort Key: history_trades.history_operation_id, history_trades."order"
                     Sort Method: external sort  Disk: 14032kB
                     ->  Index Scan using htrd_pair_time_lookup on history_trades  (cost=0.56..226349.23 rows=692898 width=52) (actual time=0.038..229.375 rows=151092 loops=1)
                           Index Cond: ((base_asset_id = '2'::bigint) AND (counter_asset_id = '5833'::bigint) AND (ledger_closed_at >= '2019-01-26 00:00:00'::timestamp without time zone))
 Planning time: 0.202 ms
 Execution time: 2533.212 ms
(13 rows)

2.5 seconds is better than 45 seconds, so this has some potential as well.

Proposed Solution

My proposal for optimizing the trade_aggregation endpoint is that we approach this in stages.

  • Stage 0: Implement the bounded time range strategy above and monitor how it improves requests. I am going with this solution first because it introduces a significant gain with minimal overhead. The changes needed are only to the codebase, and from a user's perspective the response structure stays the same. It also has the added advantage of not requiring extra storage.

  • Stage 1: After monitoring the changes in Stage 0, should we feel the need to further improve this endpoint, we can implement the precomputed records strategy. This will involve creating a new table, history_trade_aggregations, that will hold trade aggregation records for asset pairs at an hourly resolution. Requests with resolutions greater than or equal to an hour will use this table; otherwise the current query can be used. Given that this feature requires extra resources, it should be accompanied by 2 flags:

    • TRADE_AGGREGATION_HISTORY: boolean to indicate if trade aggregations should be precomputed.
    • TRADE_AGGREGATION_HISTORY_RANGE: the amount of history that should be saved: 1 month, 3 months, 6 months, 1 year?

I'd love to hear some feedback on this. My plan is to implement Stage 0 in the next sprint.

@tomerweller
Contributor

Wow! Great find with the time bounding approach.

That nested query completely ignores paging right now and brings in the entire time range's worth of trades for every page. Whoever wrote that monstrosity... 🤷‍♂️

I agree about Stage 0. Given the performance gains from that, I think precomputing buckets is pretty aggressive. Let's put it on hold until evidence suggests otherwise.

@ire-and-curses
Member

Yikes! It's kinda amazing that this query isn't doing this already. Totally agree we should do that ASAP and then re-measure.

I'd expect time bounding to handle insane cases like asking for 1-minute summaries of an asset from genesis to the present day. I'd like to see the performance of some large time-bounded 1-week-resolution queries, though. There may still be work to do there.

@bartekn
Contributor Author

bartekn commented Jul 27, 2020

Added bounded time range in #2856.

bartekn added a commit that referenced this issue Jul 30, 2020
…des aggregation query (#2856)

This commit limits the time range in the trade aggregations query to a
maximum number of buckets, determined by the `limit` param.

In #632 (comment) it was suggested that the query in trade aggregations
needs to prepare buckets for the entire history if start and end dates are
not provided. This can be fixed because we limit the number of buckets
returned to a user. The code in this commit adjusts the start time and end
time to limit the range to the maximum range visible to the API user.

@bartekn
Contributor Author

bartekn commented Jul 30, 2020

I implemented the Bounded Range solution in #2856 and I'm currently checking it in staging. There's a small improvement in p99 response time and a much better one in p95, but it's still above 1s for many queries. After #2869, which improves the performance of /trades, /trade_aggregations is the single route that responds in seconds instead of milliseconds. Fixing this will help us build better metrics and alerting and improve Horizon cluster health overall.

I think we should spend more time on this next sprint and implement the buckets solution. It will be much simpler than it would have been a couple of months ago:

  • We have a new ingestion system, so we really just need a new processor on the ingestion side.
  • We have the reingestion cluster, so we can check the storage requirements for the new data the day after reingestion is started.

To make sure users can upgrade without having to reingest their DBs, we can add the new code behind a feature flag.

Moving to Horizon 1.8.0 milestone temporarily. To be discussed in Horizon Team meeting.

@bartekn bartekn modified the milestones: Horizon 1.7.0, Horizon 1.8.0 Jul 30, 2020
@bartekn bartekn added the performance issues aimed at improving performance label Aug 5, 2020
@bartekn
Contributor Author

bartekn commented Aug 5, 2020

We had to revert the Bounded Range solution in #2893. It doesn't work for markets with some buckets missing within a given time range and limit (more in the revert PR description).

Again, we should probably implement the precomputed buckets idea. This issue should not occur with that solution.

@ire-and-curses
Member

Damn, that's disappointing. Since the issue is now not even partially resolved, let's prioritise this now. Moved to the top of the backlog.

@bartekn bartekn modified the milestones: Horizon 1.8.0, Horizon 1.9.0 Aug 26, 2020
@abuiles abuiles modified the milestones: Horizon 1.9.0, Horizon 1.10.0 Oct 2, 2020
@bartekn bartekn removed this from the Horizon 1.10.0 milestone Oct 21, 2020
@paulbellamy
Contributor

Side note: reading back through this, the explain line saying "Sort Method: external sort Disk: 14032kB" seems slightly odd. Doesn't the worker have 14MB of free memory to quicksort those?

Intuitively, I'd say we should do bounded ranges (more flexible, and lower disk use), and do multiple queries to fix the limit issues with #2893.

We could possibly combine that with table partitioning (a rough sketch of the time-based option follows the list below). This would need some experimentation and analysis of the table/index sizes and access patterns, but we could partition either on:

  • ledger_closed_at
    • this would be better if we typically query across all assets within a time range
  • max(base_asset_id, counter_asset_id)
    • this would be better if we typically query for a specific asset-pair across all time
  • both
    • Suspect this might be too much for now.
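
A rough sketch of the time-based option, assuming PostgreSQL 11+ declarative partitioning and a hypothetical rebuilt table (the column list is simplified; the real history_trades schema has more columns and constraints):

-- Hypothetical: range-partition trades by close time so time-bounded queries
-- only touch a few partitions.
CREATE TABLE history_trades_partitioned (
  history_operation_id bigint    NOT NULL,
  "order"              integer   NOT NULL,
  base_asset_id        bigint    NOT NULL,
  counter_asset_id     bigint    NOT NULL,
  base_amount          bigint    NOT NULL,
  counter_amount       bigint    NOT NULL,
  price_n              bigint,
  price_d              bigint,
  ledger_closed_at     timestamp NOT NULL
) PARTITION BY RANGE (ledger_closed_at);

CREATE TABLE history_trades_2021_q2 PARTITION OF history_trades_partitioned
  FOR VALUES FROM ('2021-04-01') TO ('2021-07-01');

CREATE INDEX ON history_trades_partitioned
  (base_asset_id, counter_asset_id, ledger_closed_at);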

@paulbellamy
Contributor

paulbellamy commented May 19, 2021

Per the discussion in the team meeting today, I think this will be easier if we change some of the trade aggregation endpoint semantics, particularly around start_time, end_time, and offset.

Proposed API Param changes:

- start_time (optional) long
+ start_time (required) long
    The lower time boundary represented as milliseconds since epoch.

- end_time   (optional) long
+ end_time   (required) long
    The upper time boundary represented as milliseconds since epoch.

- resolution (optional) long
+ resolution (required) long
    The segment duration represented as milliseconds. Supported values are 1 minute (60000), 5 minutes (300000), 15 minutes (900000), 1 hour (3600000), 1 day (86400000) and 1 week (604800000).

  offset     (optional) long
-  Segments can be offset using this parameter. Expressed in milliseconds. Can only be used if the resolution is greater than 1 hour. Value must be in whole hours, less than the provided resolution, and less than 24 hours.
+  Segments can be offset using this parameter. Expressed in milliseconds. Value must be in multiples of 15 minutes, less than the provided resolution, and less than 24 hours. Used to adjust queries for a timezone offset.

  order      (optional)
    A designation of the order in which records should appear. Options include asc(ascending) or desc (descending). If this argument isn’t set, it defaults to asc.

  limit      (optional)
    The total number of records returned. The limit can range from 1 to 200 - an upper limit that is hardcoded in Horizon for performance reasons. If this argument isn’t designated, it defaults to 10.

  ... asset pair type/issuer/code ...

The offset param is used to adjust the query for timezone offsets. I was hoping we could get rid of it, but unfortunately I don't think we can. E.g., for daily buckets where midnight is in UTC-7, the summary needs to be calculated differently than for UTC+0. However, it's a bit lacking, in that it currently has to be a multiple of an hour (which doesn't cover all timezones). We should change the minimum offset multiple from 1h to 15m. Unfortunately this actually makes the API more flexible, but it will help us account for all timezones. I figure we should get this in early, so we get the caching/precomputing work right.


Range/Resolution Tradeoffs

As discussed, we should enforce range/resolution tradeoffs, the idea being that for a given resolution there's a maximum end_time - start_time. Given the maximum limit value is 200, I'd suggest starting with that. IMO it seems more than enough for most charts, and charts should lazy-load more data as you scroll/zoom anyway.

Resolution            Max Range      Max Buckets/Query
1 minute (60000)      200 minutes    200
5 minutes (300000)    1000 minutes   200
15 minutes (900000)   50 hours       200
1 hour (3600000)      200 hours      200
1 day (86400000)      200 days       200
1 week (604800000)    1400 days      200

In essence:

// Reject requests whose time range spans more than the 200-bucket maximum.
if (end_time - start_time) > (resolution * 200) {
  w.WriteHeader(http.StatusBadRequest)
  return
}

Given most of our requests are for 1min and 15min resolutions, this should mean most requests are constrained to smaller sections of the database, making partitioning/sharding more effective.

@tomerweller
Contributor

tomerweller commented May 20, 2021

if (end_time - start_time) > (resolution * 200) {
  w.WriteHeader(http.StatusBadRequest)
  return
}

This contract change breaks clients that give a duration that can contain multiple pages of results and depend on the paging links.

Maybe it's worth taking a look at public horizon logs to see what percentage of requests this will break?

@tomerweller
Contributor

tomerweller commented May 20, 2021

It seems like we need some more information on the paging strategy of trading clients to make an informed decision. The suggestion above by @paulbellamy has the potential to introduce breakage in paging (or not). And, to the best of my understanding, #2856 by @bartekn was reverted because it effectively changed the paging contract (more on this below).

This is a bit of a tragedy because trade aggregations don't really need opinionated server-side paging. The combination of parameters gives clients all the freedom in the world to page as they please. However, when creating this endpoint we adhered to the same HAL response structure as the rest of the API. I'll add this to my upcoming book, Crimes Committed in the Name of Generics.

Revisiting #2893, which reverted #2856: it seems the issue with time bounding is that the results are sparse (empty buckets are not included), which breaks paging logic. Paging logic assumes that len(results) < page_limit means there are no more results and the client shouldn't page further.

What if we restore #2856 and solve this problem by artificially un-sparsing the results, adding 0-buckets where there are no buckets? This should restore the paging logic. It is very inefficient for rarely traded pairs, but maybe that's not the end of the world. We should also probably not serve 0-buckets from before the existence of a given orderbook.

This is an implicit contract change, as clients have never had to deal with empty buckets, but it's fairly non-offensive and we can check with the major trading clients whether it breaks them. My bet is that it won't introduce breakage.
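
A sketch of the un-sparsing idea for one pair at the 15-minute resolution (illustrative only; the bucket arithmetic mirrors the existing query, $1/$2 are bucket-aligned time bounds in milliseconds, $3/$4 are the asset ids, and the session time zone is assumed to match ledger_closed_at):

-- generate_series yields every bucket in the range, so empty buckets come back
-- as zero rows instead of being omitted, keeping page sizes predictable.
SELECT timestamp,
       coalesce(t.count, 0)          AS count,
       coalesce(t.base_volume, 0)    AS base_volume,
       coalesce(t.counter_volume, 0) AS counter_volume
FROM generate_series($1::bigint, $2::bigint - 1, 900000) AS b(timestamp)
LEFT JOIN (
  SELECT cast(div(cast(extract(epoch from ledger_closed_at) * 1000 as bigint), 900000) * 900000 as bigint) AS timestamp,
         count(*)            AS count,
         sum(base_amount)    AS base_volume,
         sum(counter_amount) AS counter_volume
  FROM history_trades
  WHERE base_asset_id = $3
    AND counter_asset_id = $4
    AND ledger_closed_at >= to_timestamp($1 / 1000.0)
    AND ledger_closed_at <  to_timestamp($2 / 1000.0)
  GROUP BY 1
) t USING (timestamp)
ORDER BY timestamp DESC
LIMIT 200;

The price columns (open/high/low/close) are left out here because the thread leaves open what a 0-bucket's prices should be; the point is only that joining against a generated bucket series restores dense results.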

@paulbellamy paulbellamy self-assigned this May 26, 2021
@paulbellamy
Contributor

I've had an idea for how to reduce the disk size of pre-computed buckets, which would allow us to make this more efficient without changing the API. Just seeing how feasible it is before committing to that, though.
