Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon/internal/db2/schema: Remove unnecessary tables and indexes #2419

Merged

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Mar 26, 2020

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Close #2405

The asset_stats table is not used anymore because it has been replaced by exp_asset_stats.

index_history_transactions_on_id is not necessary because it is identical to hs_transaction_by_id

index_history_ledgers_on_id is not necessary because it is identical to hs_ledger_by_id

asset_by_code is not necessary because there is a compound index history_assets_asset_code_asset_type_asset_issuer_key
on UNIQUE(asset_code, asset_type, asset_issuer) which can be used on queries involving the asset_code column.

for the same reasons, exp_asset_stats_by_code can also be removed

Why

These duplicate indexes don't improve query performance but do inhibit write performance. Removing the indexes will reduce storage and improve database write performance.

Known limitations

N/A

@cla-bot cla-bot bot added the cla: yes label Mar 26, 2020
@tamirms tamirms linked an issue Mar 26, 2020 that may be closed by this pull request
@tamirms tamirms force-pushed the remove-duplicate-indexes branch 4 times, most recently from 7642617 to 99bfab5 Compare March 26, 2020 11:04
@2opremio
Copy link
Contributor

2opremio commented Mar 26, 2020

These duplicate indexes don't improve query performance but do inhibit write performance. Removing the indexes will reduce storage and improve database write performance.

Out of curiosity, do you have an authoritative source on redundant indexes causing performance problems? I have been looking it up and couldn't find anything. Intuitively, I don't see why Postgres would optimize them out, but maybe that's the case.

@tamirms
Copy link
Contributor Author

tamirms commented Mar 26, 2020

@2opremio I couldn't find anything authoritative except for the "Duplicate indexes" section in https://wiki.postgresql.org/wiki/Index_Maintenance

However, I did run a SQL query on the horizon staging db to get index size and usage statistics. Here is the output:

 schemaname |            tablename             |                       indexname                       |  num_rows   | table_size | index_size | unique | number_of_scans |  tuples_read  | tuples_fetched 
------------+----------------------------------+-------------------------------------------------------+-------------+------------+------------+--------+-----------------+---------------+----------------
 public     | exp_asset_stats                  | exp_asset_stats_pkey                                  |        9502 | 1416 kB    | 1560 kB    | Y      |          161001 |        449735 |         160990
 public     | exp_asset_stats                  | exp_asset_stats_by_code                               |        9502 | 1416 kB    | 408 kB     | N      |         3588799 |     569239833 |      568999115
 public     | history_assets                   | history_assets_asset_code_asset_type_asset_issuer_key |       26816 | 4920 kB    | 3840 kB    | Y      |        28632933 |      28632910 |       28632910
 public     | history_assets                   | asset_by_code                                         |       26816 | 4920 kB    | 792 kB     | N      |        15349015 |      25577560 |       25577560
 public     | history_ledgers                  | hs_ledger_by_id                                       | 2.67591e+07 | 20 GB      | 619 MB     | Y      |               0 |             0 |              0
 public     | history_ledgers                  | index_history_ledgers_on_id                           | 2.67591e+07 | 20 GB      | 619 MB     | Y      |       116066364 |     235943505 |      234242133
 public     | history_transactions             | hs_transaction_by_id                                  | 3.31465e+08 | 579 GB     | 7233 MB    | Y      |             102 |           102 |            102
 public     | history_transactions             | index_history_transactions_on_id                      | 3.31465e+08 | 579 GB     | 7233 MB    | Y      |    365093787345 |  150133108513 |   150015537260

If you look at the pairs (hs_ledger_by_id, index_history_transactions_on_id) and (hs_transaction_by_id, index_history_transactions_on_id) you'll find that the two indexes take up the same exact amount of space. Presumably, there must be time spent writing to the duplicate indexes, otherwise the size of the indexes would be 0. If we were to remove them, then we wouldn't need to write to those indexes anymore

@2opremio
Copy link
Contributor

That makes sense. Thanks for the explanation . I wonder why hs_transaction_by_id gets to be used instead of having a 0 counter.

Anyways, LGTM.

The asset_stats table is not used anymore because it has been replaced by exp_asset_stats.

index_history_transactions_on_id is not necessary because it is identical to hs_transaction_by_id

index_history_ledgers_on_id is not necessary because it is identical to hs_ledger_by_id

asset_by_code is not necessary because there is a compound index history_assets_asset_code_asset_type_asset_issuer_key
on `UNIQUE(asset_code, asset_type, asset_issuer)` which can be used on queries involving the `asset_code` column.

For the same reason, exp_asset_stats_by_code can also be removed
@tamirms tamirms force-pushed the remove-duplicate-indexes branch from 96daebe to c4ad431 Compare March 27, 2020 14:06
@tamirms tamirms merged commit eb72937 into stellar:release-horizon-v1.1.0 Mar 27, 2020
@tamirms tamirms deleted the remove-duplicate-indexes branch March 27, 2020 14:13
@ire-and-curses ire-and-curses modified the milestone: Horizon 1.1.0 Apr 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove duplicate indexes
3 participants