Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused indexes from horizon schema #5061

Closed
24 tasks
tamirms opened this issue Sep 21, 2023 · 8 comments
Closed
24 tasks

Remove unused indexes from horizon schema #5061

tamirms opened this issue Sep 21, 2023 · 8 comments
Assignees
Labels
horizon performance issues aimed at improving performance

Comments

@tamirms
Copy link
Contributor

tamirms commented Sep 21, 2023

While investigating #5059 , I discovered that the history_operations_on_type was not used by Horizon despite being present in the db schema for several years. I used the following query from https://pgdash.io/blog/finding-unused-indexes-in-postgresql.html to find all the indexes in the horizon schema and the number of times each index has been scanned:

select schemaname || '.' || indexrelname as index, idx_scan from pg_stat_user_indexes;

The output from that query indicates that there are other indexes which are also unused:

index                                  |   idx_scan    
------------------------------------------------------------------------+---------------
 public.accounts_data_pkey                                              |             0
 public.accounts_pkey                                                   |    1031025647
 public.accounts_signers_pkey                                           |        108852
 public.claimable_balance_claimants_pkey                                |             0
 public.claimable_balances_pkey                                         |     854140668
 public.exp_asset_stats_pkey                                            |      25320968
 public.gorp_migrations_pkey                                            |             0
 public.history_assets_asset_code_asset_type_asset_issuer_key           |     374988189
 public.history_assets_pkey                                             |   51865712229
 public.history_trades_60000_pkey                                       |       9023515
 public.key_value_store_pkey                                            |    2560632613
 public.liquidity_pools_pkey                                            |      12866984
 public.offers_pkey                                                     |     487719082
 public.trust_lines_pkey                                                |             0
 public.accounts_by_sponsor                                             |          1336
 public.accounts_data_account_id_name                                   |     549970728
 public.accounts_data_by_sponsor                                        |          1605
 public.accounts_home_domain                                            |             0
 public.accounts_inflation_destination                                  |             0
 public.accounts_signers_by_sponsor                                     |          1330
 public.asset_by_issuer                                                 |             0
 public.best_offer                                                      |    1122713103
 public.by_hash                                                         |     799480837
 public.by_inner_hash                                                   |     799480721
 public.by_ledger                                                       |    1095649867
 public.claimabable_balances_by_claimants                               |             0
 public.claimable_balance_claimants_by_destination_last_modified_ledger |      32094806
 public.claimable_balances_by_asset_last_modified_ledger_id             |       3789899
 public.claimable_balances_by_last_modified_ledger_and_id               |       9862204
 public.claimable_balances_by_sponsor_last_modified_ledger_id           |       1022056
 public.exp_asset_stats_by_issuer                                       |      95818931
 public.hist_e_by_order                                                 |    1156816169
 public.hist_e_id                                                       |     581811297
 public.hist_op_p_id                                                    |     440664077
 public.hist_tx_p_id                                                    | 1403816565469
 public.hop_by_hoid                                                     |    1011542798
 public.hs_ledger_by_id                                                 |     197524618
 public.hs_transaction_by_id                                            |  172410300570
 public.htp_by_htid                                                     |     231611894
 public.htrd_agg_bucket_lookup                                          |             0
 public.htrd_agg_counter_asset                                          |       7757630
 public.htrd_agg_open_ledger_toid                                       |             0
 public.htrd_agg_timestamp_brin                                         |             0
 public.htrd_by_base_account                                            |     887941186
 public.htrd_by_base_liquidity_pool_id                                  |     911107636
 public.htrd_by_base_offer                                              |       1466333
 public.htrd_by_counter_account                                         |     884081479
 public.htrd_by_counter_asset                                           |     568049275
 public.htrd_by_counter_liquidity_pool_id                               |     911091947
 public.htrd_by_counter_offer                                           |       1253628
 public.htrd_by_trade_type                                              |             0
 public.htrd_pair_pid                                                   |     722001608
 public.htrd_pair_time_lookup                                           |             0
 public.htrd_pid                                                        |    1312271374
 public.htrd_time_lookup                                                |             0
 public.index_history_accounts_on_address                               |     817814164
 public.index_history_accounts_on_id                                    |  917548674819
 public.index_history_claimable_balances_on_claimable_balance_id        |       2951043
 public.index_history_claimable_balances_on_id                          |             0
 public.index_history_effects_on_type                                   |             0
 public.index_history_ledgers_on_closed_at                              |             0
 public.index_history_ledgers_on_importer_version                       |             0
 public.index_history_ledgers_on_ledger_hash                            |             0
 public.index_history_ledgers_on_previous_ledger_hash                   |             0
 public.index_history_ledgers_on_sequence                               |   24368570193
 public.index_history_liquidity_pools_on_id                             |   47176367317
 public.index_history_liquidity_pools_on_liquidity_pool_id              |        381594
 public.index_history_operation_claimable_balances_on_id                |       2936823
 public.index_history_operation_claimable_balances_on_ids               |             0
 public.index_history_operation_claimable_balances_on_operation_id      |       8008497
 public.index_history_operation_liquidity_pools_on_id                   |         50979
 public.index_history_operation_liquidity_pools_on_ids                  |        284906
 public.index_history_operation_liquidity_pools_on_operation_id         |        747589
 public.index_history_operations_on_id                                  |  109216678419
 public.index_history_operations_on_transaction_id                      |     799062041
 public.index_history_operations_on_type                                |             0
 public.index_history_transaction_claimable_balances_on_id              |         14202
 public.index_history_transaction_claimable_balances_on_ids             |             0
 public.index_history_transaction_claimable_balances_on_transaction_id  |         16445
 public.index_history_transaction_liquidity_pools_on_id                 |            42
 public.index_history_transaction_liquidity_pools_on_ids                |           620
 public.index_history_transaction_liquidity_pools_on_transaction_id     |           684
 public.liquidity_pools_by_asset_reserves                               |       3372047
 public.live_liquidity_pools                                            |             0
 public.live_offers                                                     |             0
 public.offers_by_buying_asset                                          |         73782
 public.offers_by_last_modified_ledger                                  |      37813497
 public.offers_by_seller                                                |     134091106
 public.offers_by_selling_asset                                         |             0
 public.offers_by_sponsor                                               |             3
 public.signers_by_account                                              |     550101786
 public.trade_effects_by_order_book                                     |             0
 public.trust_lines_by_account_id                                       |     557948030
 public.trust_lines_by_issuer                                           |             0
 public.trust_lines_by_liquidity_pool_id                                |        151938
 public.trust_lines_by_sponsor                                          |          1500
 public.trust_lines_by_type_code_issuer_account                         |        996720
(97 rows)

If we ignore all the primary key indexes, the indexes with 0 scans are likely to be unused. We should investigate all the indexes below and determine if they can be removed:

  • accounts_home_domain
  • accounts_inflation_destination
  • asset_by_issuer
  • claimabable_balances_by_claimants
  • htrd_agg_bucket_lookup
  • htrd_agg_open_ledger_toid
  • htrd_agg_timestamp_brin
  • htrd_by_trade_type
  • htrd_pair_time_lookup
  • htrd_time_lookup
  • index_history_claimable_balances_on_id
  • index_history_effects_on_type
  • index_history_ledgers_on_closed_at
  • index_history_ledgers_on_importer_version
  • index_history_ledgers_on_ledger_hash
  • index_history_ledgers_on_previous_ledger_hash
  • index_history_operation_claimable_balances_on_ids
  • index_history_operations_on_type
  • index_history_transaction_claimable_balances_on_ids
  • live_liquidity_pools
  • live_offers
  • offers_by_selling_asset
  • trade_effects_by_order_book
  • trust_lines_by_issuer
@mollykarcher mollykarcher added this to the Soroban Pubnet Release milestone Sep 21, 2023
@mollykarcher mollykarcher added the performance issues aimed at improving performance label Sep 26, 2023
@mollykarcher mollykarcher moved this from Backlog to Next Sprint Proposal in Platform Scrum Sep 26, 2023
@urvisavla urvisavla moved this from Next Sprint Proposal to In Progress in Platform Scrum Sep 26, 2023
@urvisavla
Copy link
Contributor

urvisavla commented Sep 26, 2023

Update:

history_effects

  • trade_effects_by_order_book:

    • Size: 405 GB
    • Index Column: btree (((details ->> 'sold_asset_type'::text)), ((details ->> 'sold_asset_code'::text)), ((details ->> 'sold_asset_issuer'::text)), ((details ->> 'bought_asset_type'::text)), ((details ->> 'bought_asset_code'::text)), ((details ->> 'bought_asset_issuer'::text))) WHERE (type = 33);
    • Usage: Not used. Effect type 33 is Trade, it may have been used very early on but now we have a dedicated table history_trades for storing trades.
  • index_history_effects_on_type:

    • Size: 286 GB
    • Index Column: type
    • Usage: Not used. /effects endpoint does not filter by 'type'.

history_transaction_claimable_balances

  • index_history_transaction_claimable_balances_on_ids:
    • Size: 46 GB
    • Index Columns: history_transaction_id, history_claimable_balance_id
    • Usage: Not used by /claimable_balances endpoint. The end point uses history_claimable_balance_id index when querying for transactions for a claimable balance.

history_operation_claimable_balances

  • index_history_operation_claimable_balances_on_ids:
    • Size: 44 GB
    • Index Columns: history_operation_id, history_claimable_balance_id
    • Usage: Not used by /claimable_balances endpoint. The end point uses history_claimable_balance_id index when querying for operations for a claimable balance.

history_claimable_balances

  • index_history_claimable_balances_on_id:
    • Size: 12 GB
    • Index Column: id
    • Usage: Not used. Not sure why?

history_ledgers

  • index_history_ledgers_on_ledger_hash:

    • Size: 6.56 GB
    • Index Column: ledger_hash
    • Usage: Not used. The /ledgers endpoint uses ledger sequence number to search for a particular ledger, not ledger hash.
  • index_history_ledgers_on_previous_ledger_hash:

    • Size: 6.56 GB
    • Index Column: previous_ledger_hash
    • Usage: Not used. The /ledgers endpoint uses ledger sequence number to search for a particular ledger previous ledger hash.
  • index_history_ledgers_on_importer_version:

    • Size: 1.1 GB
    • Index Column: importer_version
    • Usage: Not used. No endpoints or queries (in the code) that filter by importer_version are seen.
  • index_history_ledgers_on_closed_at:

    • Size: 1 GB
    • Index Column: closed_at
    • Usage: Not used. No endpoints or queries (in the code) that filter by closed_at are seen. Ledger sequence number is used for sorting.

history_assets

  • asset_by_issuer:
    • Size: 13 MB
    • Index Column: asset_issuer
    • Usage: Not used. Filtering by assets is always on the combination of asset_code, asset_type and asset_issuer and uses the compound index

offers

  • offers_by_selling_asset:

    • Size: 119 MB
    • Index Column: selling_asset, deleted
    • Usage: It could be used by /offers endpoint when filtering by selling_asset (type, code, issuer). Not sure why it’s not being used.
  • live_offers:

    • Size: 205 MB
    • Index Column: deleted, last_modified_ledger
    • Usage: Should be used when compacting offers.

claimable_balances

  • claimabable_balances_by_claimants:
    • Size: 330 MB
    • Index Column: gin (claimants jsonb_path_ops)
    • Usage: Not Used. We have a dedicated table claimable_balance_claimants table for filtering by claimants.

trust_lines

  • trust_lines_by_issuer:
    • Size: 1.9 GB
    • Index Column: asset_issuer
    • Usage: Not used. Asset filter is typically in asset_code:asset_issuer format. So when querying trust_lines to get a list of assets for an account via the /accounts endpoint, trust_lines_by_type_code_issuer_account index is used instead.

accounts

  • accounts_inflation_destination:

    • Size: 200 MB
    • Index Column: inflation_destination
    • Usage: Not used. Is inflation relevant any more?
  • accounts_home_domain:

    • Size: 171 MB
    • Index Column: home_domain
    • Usage: Not used. Not sure about the original use case for this.

liquidity_pools

  • live_liquidity_pools:
    • Size: 9.6 MB
    • Index Column: deleted, last_modified_ledger
    • Usage: Should be used during periodic in memory order book update.

history_operations

  • index_history_operations_on_type:
    • Size: 333 GB
    • Index Column: type
    • Usage: Not used. /operations does not filter by 'type'.

history_trades

  • htrd_by_trade_type:

    • Size: 111 GB
    • Index Column: trade_type, history_operation_id, order
    • Usage: Should be used by /trades endpoint when filtered by trade type (eg trade_type=liquidity_pool)?
  • htrd_time_lookup:

    • Size: 44 GB
    • Index Column: ledger_closed_at
    • Usage: Not used. This index predates the aggregate table history_trade_60000, which aggregates data by the minute using the htrd_agg_bucket_lookup index.
  • htrd_agg_bucket_lookup:

  • htrd_pair_time_lookup:

    • Size: 118 GB
    • Index Column: base_asset_id, counter_asset_id, ledger_closed_at
    • Usage: : Not used. This index predates the aggregate table history_trade_60000, for which this index was added.

history_trades_60000

  • htrd_agg_open_ledger_toid:

    • Size: 112 GB
    • Index Column: open_ledger_toid
    • Usage: I suspect it might be used during reingesting history
  • htrd_agg_timestamp_brin:

    • Size: 2.8 MB
    • Index Column: timestamp
    • Usage: Used by /trade_aggregations endpoint.

@tamirms
Copy link
Contributor Author

tamirms commented Sep 27, 2023

@urvisavla you're right that live_offers and live_liquidity_pools should be used by the compaction routine. I checked the index scan stats on the rw db and both live_offers and live_liquidity_pools have non-zero values. So we can leave those two indexes alone. The htrd_agg_bucket_lookup is also used by the rw db so we allow that index to remain as well.

Regarding offers_by_selling_asset, could you try querying the /offers endpoint on staging with a selling_asset filter to see if it increases the scan count? It could be that we receive no traffic which uses that particular filter.

Regarding htrd_by_trade_type, you can also try query the /trades endpoint with trade_type=liquidity_pool to see if the scan count increases.

@urvisavla
Copy link
Contributor

select schemaname || '.' || indexrelname as index, idx_scan from pg_stat_user_indexes;

Result of running this query on the production RW database:

                                 index                                  |   idx_scan   
------------------------------------------------------------------------+--------------
 public.accounts_data_pkey                                              |   1329897616
 public.accounts_pkey                                                   | 118868532332
 public.accounts_signers_pkey                                           |      1768486
 public.claimable_balance_claimants_pkey                                | 155858761671
 public.claimable_balances_pkey                                         | 155606785996
 public.exp_asset_stats_pkey                                            |    128820560
 public.gorp_migrations_pkey                                            |           26
 public.history_assets_asset_code_asset_type_asset_issuer_key           |    594624967
 public.history_assets_pkey                                             |            0
 public.history_trades_60000_pkey                                       |            0
 public.key_value_store_pkey                                            |    255588367
 public.liquidity_pools_pkey                                            |    160088952
 public.offers_pkey                                                     |  17871233078
 public.trust_lines_pkey                                                | 347912011754
 public.accounts_by_sponsor                                             |            0
 public.accounts_data_account_id_name                                   |            0
 public.accounts_data_by_sponsor                                        |        15860
 public.accounts_home_domain                                            |            0
 public.accounts_inflation_destination                                  |            0
 public.accounts_signers_by_sponsor                                     |            0
 public.asset_by_issuer                                                 |            0
 public.best_offer                                                      |            0
 public.by_hash                                                         |            0
 public.by_inner_hash                                                   |            0
 public.by_ledger                                                       |    115728247
 public.claimabable_balances_by_claimants                               |            0
 public.claimable_balance_claimants_by_destination_last_modified_ledger |            0
 public.claimable_balances_by_asset_last_modified_ledger_id             |            0
 public.claimable_balances_by_last_modified_ledger_and_id               |            0
 public.claimable_balances_by_sponsor_last_modified_ledger_id           |            0
 public.exp_asset_stats_by_issuer                                       |            0
 public.hist_e_by_order                                                 |            0
 public.hist_e_id                                                       |            0
 public.hist_op_p_id                                                    |            0
 public.hist_tx_p_id                                                    |            0
 public.hop_by_hoid                                                     |            0
 public.hs_ledger_by_id                                                 |     18791724
 public.hs_transaction_by_id                                            |            0
 public.htp_by_htid                                                     |            0
 public.htrd_agg_bucket_lookup                                          |      9395862
 public.htrd_agg_counter_asset                                          |            0
 public.htrd_agg_open_ledger_toid                                       |            0
 public.htrd_agg_timestamp_brin                                         |      3131954
 public.htrd_by_base_account                                            |            0
 public.htrd_by_base_liquidity_pool_id                                  |            0
 public.htrd_by_base_offer                                              |            0
 public.htrd_by_counter_account                                         |            0
 public.htrd_by_counter_asset                                           |            0
 public.htrd_by_counter_liquidity_pool_id                               |            0
 public.htrd_by_counter_offer                                           |            0
 public.htrd_by_trade_type                                              |            0
 public.htrd_pair_pid                                                   |            0
 public.htrd_pair_time_lookup                                           |            0
 public.htrd_pid                                                        |            0
 public.htrd_time_lookup                                                |            0
 public.index_history_accounts_on_address                               |   4217607930
 public.index_history_accounts_on_id                                    |            0
 public.index_history_claimable_balances_on_claimable_balance_id        |   1017040408
 public.index_history_claimable_balances_on_id                          |            0
 public.index_history_effects_on_type                                   |            0
 public.index_history_ledgers_on_closed_at                              |            0
 public.index_history_ledgers_on_importer_version                       |            0
 public.index_history_ledgers_on_ledger_hash                            |            0
 public.index_history_ledgers_on_previous_ledger_hash                   |            0
 public.index_history_ledgers_on_sequence                               |    379493592
 public.index_history_liquidity_pools_on_id                             |            0
 public.index_history_liquidity_pools_on_liquidity_pool_id              |    158713310
 public.index_history_operation_claimable_balances_on_id                |            0
 public.index_history_operation_claimable_balances_on_ids               |            0
 public.index_history_operation_claimable_balances_on_operation_id      |            0
 public.index_history_operation_liquidity_pools_on_id                   |            0
 public.index_history_operation_liquidity_pools_on_ids                  |            0
 public.index_history_operation_liquidity_pools_on_operation_id         |            0
 public.index_history_operations_on_id                                  |            0
 public.index_history_operations_on_transaction_id                      |            0
 public.index_history_operations_on_type                                |            0
 public.index_history_transaction_claimable_balances_on_id              |            0
 public.index_history_transaction_claimable_balances_on_ids             |            0
 public.index_history_transaction_claimable_balances_on_transaction_id  |            0
 public.index_history_transaction_liquidity_pools_on_id                 |            0
 public.index_history_transaction_liquidity_pools_on_ids                |            0
 public.index_history_transaction_liquidity_pools_on_transaction_id     |            0
 public.liquidity_pools_by_asset_reserves                               |            0
 public.live_liquidity_pools                                            |      3161895
 public.live_offers                                                     |      3161895
 public.offers_by_buying_asset                                          |            0
 public.offers_by_last_modified_ledger                                  |     28157121
 public.offers_by_seller                                                |            0
 public.offers_by_selling_asset                                         |            0
 public.offers_by_sponsor                                               |            0
 public.signers_by_account                                              | 117792685728
 public.trade_effects_by_order_book                                     |            0
 public.trust_lines_by_account_id                                       |        24443
 public.trust_lines_by_issuer                                           |            0
 public.trust_lines_by_liquidity_pool_id                                |            0
 public.trust_lines_by_sponsor                                          |            0
 public.trust_lines_by_type_code_issuer_account                         |            0
(97 rows)

@urvisavla
Copy link
Contributor

Regarding offers_by_selling_asset, could you try querying the /offers endpoint on staging with a selling_asset filter to see if it increases the scan count? It could be that we receive no traffic which uses that particular filter.

Regarding htrd_by_trade_type, you can also try query the /trades endpoint with trade_type=liquidity_pool to see if the scan count increases.

Ran the following queries on the staging and verified that the index count is increasing proportionally for both offers_by_selling_asset and htrd_by_trade_type. The change in index count was observed on the RW database, but I expected it to use the RO database (?).

curl “http://localhost:8000/offers?selling_asset_issuer=GDW3CNKSP5AOTDQ2YCKNGC6L65CE4JDX3JS5BV427OB54HCF2J4PUEVG&selling_asset_type=credit_alphanum4&selling_asset_code=CHP”
curl "localhost:8000/trades?trade_type=liquidity_pool"

@tamirms
Copy link
Contributor Author

tamirms commented Sep 28, 2023

The change in index count was observed on the RW database, but I expected it to use the RO database (?).

I see you were sending the requests to localhost. If you ran those curl commands on an ingesting node then the queries got executed by the RW database. But if you were to run those curl commands on non-ingesting nodes it should only query the RO database.

@urvisavla
Copy link
Contributor

urvisavla commented Sep 28, 2023

After investigation the following indexes are found to be unused.

  • accounts_home_domain
  • accounts_inflation_destination
  • asset_by_issuer
  • claimabable_balances_by_claimants
    - [ ] htrd_pair_time_lookup**
  • htrd_time_lookup
    - [ ] index_history_claimable_balances_on_id**
  • index_history_effects_on_type
  • index_history_ledgers_on_closed_at
  • index_history_ledgers_on_importer_version
  • index_history_ledgers_on_ledger_hash
  • index_history_ledgers_on_previous_ledger_hash
  • index_history_operation_claimable_balances_on_ids
  • index_history_operations_on_type
  • index_history_transaction_claimable_balances_on_ids
  • trade_effects_by_order_book
  • trust_lines_by_issuer

**used by reap routine

@urvisavla
Copy link
Contributor

Deployed a build on staging with migration to drop indices on Friday, 10/06. Here are the observations:

  1. The deploy job took approximately 40 minutes.
  2. Free storage space on the DB increased ~1.3 TB. Spikes in DB CPU utilization reduced and stabilized at around 15%.
  3. Notable improvement was seen in the LedgersProcessor ingestion processor, with the duration reducing noticeably from ~4ms to ~2ms . No other impact on any other ingestion processor or overall ingestion was observed.
  4. No noticeable change in API throughput, requests and responses. Additionally, there was no noticable change in top slow query routes.

So far the deployment had no significant impact on API performance (reads), resulted in reduced disk space and CPU load on the DB instance and showed no significant impact on ingestion (writes). Suggest to keep it running for another week or so for further observation.

Screenshot 2023-10-09 at 11 22 16 AM Screenshot 2023-10-09 at 11 21 00 AM Screenshot 2023-10-09 at 10 39 31 AM Screenshot 2023-10-09 at 3 04 12 PM

@urvisavla
Copy link
Contributor

Update:

Following are some of the graphs showing improvements in ingestion processors' run duration. Also observed reduction in 5xx errors. No significant change in avg ledger ingestion duration.

10/2: Started traffic mirroring
10/6: Indices were dropped

EffectProcessor
EffectProcessor

LedgersProcessor
LedgersProcessor

ClaimableBalanceChangeProcessor
ClaimableBalanceChangeProcessor

TradeProcessor
TradeProcessor

5XX Response
500 Errors

Average Ledger Ingestion Duration
Average Ledger Ingestion Duration

@urvisavla urvisavla moved this from In Progress to Done in Platform Scrum Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
horizon performance issues aimed at improving performance
Projects
Status: Done
Development

No branches or pull requests

3 participants