This repository has been archived by the owner on Jul 9, 2021. It is now read-only.

Optimize SQL queries in pull_missing_blocks #1458

Merged
albrow merged 2 commits into development from fix/missing-blocks-query-optimization
Jan 7, 2019

Conversation

@albrow albrow commented Dec 18, 2018

Description

This PR optimizes the query for finding missing blocks. The old implementation is bogging down our SQL server and occasionally causing queries to take over one hour! In a full-ish database with many events and blocks, the new implementation is at least 1000x faster.

Testing instructions

Types of changes

Checklist:

  • Prefix PR title with [WIP] if necessary.
  • Add tests to cover changes as needed.
  • Update documentation as needed.
  • Add new entries to the relevant CHANGELOG.jsons.

@albrow albrow requested review from wakkadojo and xianny December 18, 2018 22:43
@@ -14,56 +14,56 @@ import { handleError, INFURA_ROOT_URL } from '../utils';
 // Number of blocks to save at once.
 const BATCH_SAVE_SIZE = 1000;
 // Maximum number of requests to send at once.
-const MAX_CONCURRENCY = 10;
+const MAX_CONCURRENCY = 20;

@albrow albrow Dec 18, 2018

I also increased the maximum number of concurrent requests. I've been monitoring memory and CPU usage and this should speed up the script without taxing resources too much. We can adjust further in the future.
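
For context on what this constant bounds, here is a minimal sketch of a concurrency-limited mapper. The helper name `mapWithConcurrency` is hypothetical, and this is not necessarily how the script applies `MAX_CONCURRENCY`; it only illustrates the idea of keeping at most N requests in flight at once.

```typescript
// Value taken from the diff above.
const MAX_CONCURRENCY = 20;

// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight, and returns results in input order.
async function mapWithConcurrency<T, R>(
    items: T[],
    limit: number,
    worker: (item: T) => Promise<R>,
): Promise<R[]> {
    if (limit < 1) {
        throw new Error('limit must be at least 1');
    }
    const results: R[] = new Array(items.length);
    let nextIndex = 0;
    // Each runner repeatedly claims the next unclaimed index until none remain.
    async function runner(): Promise<void> {
        while (nextIndex < items.length) {
            const i = nextIndex++;
            results[i] = await worker(items[i]);
        }
    }
    const runners: Array<Promise<void>> = [];
    const runnerCount = Math.min(limit, items.length);
    for (let r = 0; r < runnerCount; r++) {
        runners.push(runner());
    }
    await Promise.all(runners);
    return results;
}
```

A caller would pass the list of block numbers, `MAX_CONCURRENCY` as the limit, and a function that fetches one block. Raising the limit trades more memory and open connections for shorter wall-clock time, which matches the monitoring concern in the comment.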

coveralls commented Dec 18, 2018

Coverage increased (+0.02%) to 85.312% when pulling 1d25925 on fix/missing-blocks-query-optimization into b8f3fa9 on development.

@xianny xianny left a comment

Nice! Getting rid of the WHERE ... NOT IN clause should speed things up a lot.

Couple questions:

  • Do we expect an insignificant amount of overlap in block numbers across different events?
  • If not, did you also try the original UNION query with the new JOIN ... NOT NULL clause and find that removing the UNION sped things up significantly? I would expect the WHERE ... NOT IN clause to be the primary slowdown, not the UNION, but I don't know offhand.

If yes to one or both, looks good to merge for me 👍
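
To make the two query shapes under discussion concrete, here is a rough sketch of how each might be built. The `raw.blocks` table and the `block_number` / `number` columns are assumptions for illustration; only the two event table names appear in this thread. The "JOIN ... NOT NULL" shorthand is read here as the usual anti-join form (LEFT JOIN, then keep rows where the joined side IS NULL), which is an interpretation rather than the exact query in the PR.

```typescript
// Event table names mentioned in this thread; restricting to a fixed list
// avoids interpolating arbitrary strings into SQL.
const EVENT_TABLES = ['exchange_fill_events', 'exchange_cancel_events'] as const;
type EventTable = typeof EVENT_TABLES[number];

// Old shape (assumed): filter with a NOT IN subquery over all saved blocks.
// This can be very slow on large tables, depending on the query planner.
function buildNotInQuery(eventTable: EventTable): string {
    return [
        `SELECT DISTINCT block_number FROM raw.${eventTable}`,
        `WHERE block_number NOT IN (SELECT number FROM raw.blocks)`,
    ].join(' ');
}

// New shape (assumed): anti-join. Event rows with no matching block come
// back from the LEFT JOIN with NULL in the blocks columns.
function buildAntiJoinQuery(eventTable: EventTable): string {
    const events = `raw.${eventTable}`;
    return [
        `SELECT DISTINCT ${events}.block_number`,
        `FROM ${events}`,
        `LEFT JOIN raw.blocks ON ${events}.block_number = raw.blocks.number`,
        `WHERE raw.blocks.number IS NULL`,
    ].join(' ');
}
```

Both builders take a single table name, which reflects the per-table approach in this PR as opposed to a UNION across all event tables.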

albrow commented Jan 7, 2019

@xianny

> Do we expect an insignificant amount of overlap in block numbers across different events?

We expect overlap between block numbers across different events. However, since we save any newly found blocks before checking for missing events in the next table in the list, this doesn't result in any duplicated work.

In other words, if the same block is missing in both raw.exchange_fill_events and raw.exchange_cancel_events, the block number will only be returned once from any query. Data for that block will also only be retrieved and saved once.
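
The save-before-next-table behaviour described above can be sketched with in-memory stand-ins. Everything here is hypothetical scaffolding (the `pullMissingBlocks` function, the `Set` used in place of the blocks table, and the sample block numbers); the real script queries the database and fetches blocks over the network.

```typescript
// Maps an event table name to the block numbers its rows reference.
type EventTables = Record<string, number[]>;

// Processes tables one at a time. `savedBlocks` stands in for the blocks
// table and is updated before the next table is checked.
function pullMissingBlocks(eventTables: EventTables, savedBlocks: Set<number>): number[] {
    const fetched: number[] = [];
    for (const tableName of Object.keys(eventTables)) {
        // Distinct block numbers in this table that have no saved block yet.
        const distinct = Array.from(new Set(eventTables[tableName]));
        const missing = distinct.filter(n => !savedBlocks.has(n));
        for (const blockNumber of missing) {
            fetched.push(blockNumber); // stands in for one network fetch
            savedBlocks.add(blockNumber); // saved before the next table is queried
        }
    }
    return fetched;
}

// Block 101 is missing for both tables but is only fetched once.
const fetchedBlocks = pullMissingBlocks(
    { exchange_fill_events: [100, 101, 101], exchange_cancel_events: [101, 102] },
    new Set([100]),
);
console.log(fetchedBlocks); // [ 101, 102 ]
```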

> If not, did you also try the original UNION query with the new JOIN ... NOT NULL clause and find that removing the UNION sped things up significantly? I would expect the WHERE ... NOT IN clause to be the primary slowdown, not the UNION, but I don't know offhand.

I just tested locally: adding UNION back into the query (while keeping the JOIN ... NOT NULL) increases query time by 100-200ms (about 40-80%). I vote that we keep the query the way it is now.

xianny commented Jan 7, 2019

Makes sense to me :)

@albrow albrow force-pushed the fix/missing-blocks-query-optimization branch from 7889a03 to 1d25925 Compare January 7, 2019 22:49
@albrow albrow merged commit 7dda953 into development Jan 7, 2019
@albrow albrow deleted the fix/missing-blocks-query-optimization branch January 7, 2019 23:04