Optimize SQL queries in pull_missing_blocks #1458

albrow · 2018-12-18T22:41:51Z

Description

This PR optimizes the query for finding missing blocks. The old implementation was bogging down our SQL server, with queries occasionally taking over an hour. In a fairly full database with many events and blocks, the new implementation is at least 1000x faster.

Testing instructions

Types of changes

Checklist:

Prefix PR title with [WIP] if necessary.
Add tests to cover changes as needed.
Update documentation as needed.
Add new entries to the relevant CHANGELOG.jsons.

albrow · 2018-12-18T22:46:18Z

packages/pipeline/src/scripts/pull_missing_blocks.ts

@@ -14,56 +14,56 @@ import { handleError, INFURA_ROOT_URL } from '../utils';
 // Number of blocks to save at once.
 const BATCH_SAVE_SIZE = 1000;
 // Maximum number of requests to send at once.
-const MAX_CONCURRENCY = 10;
+const MAX_CONCURRENCY = 20;


I also increased the maximum number of concurrent requests. I've been monitoring memory and CPU usage and this should speed up the script without taxing resources too much. We can adjust further in the future.
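For readers unfamiliar with the pattern, here is a minimal sketch of how a cap like MAX_CONCURRENCY can bound in-flight requests. The helper name `mapWithConcurrency` and its shape are hypothetical, chosen for illustration; this is not necessarily how pull_missing_blocks.ts applies the constant.

```typescript
// Hypothetical helper (not from the PR): run `worker` over `items`
// with at most `limit` calls in flight at any moment.
async function mapWithConcurrency<T, R>(
    items: T[],
    limit: number,
    worker: (item: T) => Promise<R>,
): Promise<R[]> {
    if (limit < 1) {
        throw new Error('limit must be at least 1');
    }
    const results: R[] = new Array<R>(items.length);
    let nextIndex = 0;
    // Each runner claims the next unprocessed index until none remain.
    async function runner(): Promise<void> {
        while (nextIndex < items.length) {
            const i = nextIndex++;
            results[i] = await worker(items[i]);
        }
    }
    const runners: Promise<void>[] = [];
    const runnerCount = Math.min(limit, items.length);
    for (let r = 0; r < runnerCount; r++) {
        runners.push(runner());
    }
    await Promise.all(runners);
    return results;
}
```

Raising the limit from 10 to 20 in a scheme like this doubles the number of simultaneous requests, which is why memory and CPU usage are worth watching after the change.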

coveralls · 2018-12-18T22:53:31Z

Coverage increased (+0.02%) to 85.312% when pulling 1d25925 on fix/missing-blocks-query-optimization into b8f3fa9 on development.

xianny

Nice! Getting rid of the WHERE.. NOT IN clause should speed things up a lot.

Couple questions:

Do we expect an insignificant amount of overlap in block numbers across different events?
If not, did you also try the original UNION query with the new JOIN... NOT NULL clause and find that removing the UNION sped things up significantly? I would expect the WHERE... NOT IN clause to be the primary slowdown, not the UNION, but I don't know offhand.

If yes to one or both, looks good to merge for me 👍

albrow · 2019-01-07T21:28:16Z

@xianny

Do we expect an insignificant amount of overlap in block numbers across different events?

We do expect overlap in block numbers across different events. However, since we save any newly found blocks before checking the next table in the list for missing blocks, this doesn't result in any duplicated work.

In other words, if the same block is missing in both raw.exchange_fill_events and raw.exchange_cancel_events, the block number will only be returned once from any query. Data for that block will also only be retrieved and saved once.
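The reasoning above can be sketched with in-memory stand-ins for the tables. Everything here (`EventsTable`, `findMissingBlocks`, `pullMissingBlocks`, and the sample block numbers) is hypothetical and only illustrates why saving between tables avoids duplicate work; it is not the script's actual code.

```typescript
// Hypothetical in-memory stand-in for an events table.
type EventsTable = { name: string; blockNumbers: number[] };

// Distinct block numbers referenced by `table` that are not yet saved.
function findMissingBlocks(table: EventsTable, savedBlocks: Set<number>): number[] {
    const missing: number[] = [];
    const seen = new Set<number>();
    for (const n of table.blockNumbers) {
        if (!savedBlocks.has(n) && !seen.has(n)) {
            seen.add(n);
            missing.push(n);
        }
    }
    return missing;
}

// Process tables one at a time, saving each newly found block before
// moving on, so a later table never reports the same block again.
function pullMissingBlocks(tables: EventsTable[], savedBlocks: Set<number>): number[] {
    const fetched: number[] = [];
    for (const table of tables) {
        for (const n of findMissingBlocks(table, savedBlocks)) {
            fetched.push(n); // stands in for retrieving the block
            savedBlocks.add(n); // stands in for saving it
        }
    }
    return fetched;
}

const fills: EventsTable = { name: 'raw.exchange_fill_events', blockNumbers: [10, 11, 11, 12] };
const cancels: EventsTable = { name: 'raw.exchange_cancel_events', blockNumbers: [11, 12, 13] };
const fetchedBlocks = pullMissingBlocks([fills, cancels], new Set<number>([10]));
```

In this made-up data, blocks 11 and 12 are missing from both tables but are fetched only while processing the first one; the second table contributes only block 13.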

If not, did you also try the original UNION query with the new JOIN... NOT NULL clause and find that removing the UNION sped things up significantly? I would expect the WHERE... NOT IN clause to be the primary slowdown, not the UNION, but I don't know offhand.

I just tested locally and adding UNION back into the query (while keeping the JOIN.. NOT NULL) makes it 100-200ms slower (about 40-80%). I vote that we keep the query the way it is now.
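For context, the two query shapes being compared in this thread look roughly like the sketch below. This is an illustration under assumptions, not the query from the PR: the `raw.blocks` table, its `number` column, the `block_number` column, and both builder functions are hypothetical names. Note that the usual anti-join form filters on the joined column being NULL.

```typescript
// Hypothetical builders; only raw.exchange_fill_events appears in this
// thread. raw.blocks, number and block_number are assumed names.
const IDENTIFIER = /^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?$/;

// Identifiers cannot be bound as query parameters, so validate them
// before interpolating into SQL text.
function assertIdentifier(name: string): string {
    if (!IDENTIFIER.test(name)) {
        throw new Error('invalid SQL identifier: ' + name);
    }
    return name;
}

// Old shape: filter with a NOT IN subquery over every saved block.
function notInQuery(table: string, column: string): string {
    const t = assertIdentifier(table);
    const c = assertIdentifier(column);
    return `SELECT DISTINCT ${c} FROM ${t} WHERE ${c} NOT IN (SELECT number FROM raw.blocks)`;
}

// New shape: anti-join. Keep only event rows with no matching block.
function antiJoinQuery(table: string, column: string): string {
    const t = assertIdentifier(table);
    const c = assertIdentifier(column);
    return (
        `SELECT DISTINCT ${t}.${c} FROM ${t} ` +
        `LEFT JOIN raw.blocks ON ${t}.${c} = raw.blocks.number ` +
        `WHERE raw.blocks.number IS NULL`
    );
}

const exampleQuery = antiJoinQuery('raw.exchange_fill_events', 'block_number');
```

Running one such query per events table, rather than a UNION across all of them, matches the approach described above. The two shapes are also not strictly equivalent: NOT IN returns no rows at all if the subquery yields a NULL, while the anti-join does not have that behavior.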

xianny · 2019-01-07T21:39:21Z

Makes sense to me :)

albrow requested review from wakkadojo and xianny December 18, 2018 22:43

xianny reviewed Dec 19, 2018

albrow added 2 commits January 7, 2019 14:46

Optimize SQL queries in pull_missing_blocks

b5d6653

Update comment in pull_missing_blocks

1d25925

albrow force-pushed the fix/missing-blocks-query-optimization branch from 7889a03 to 1d25925 Compare January 7, 2019 22:49

albrow merged commit 7dda953 into development Jan 7, 2019

albrow deleted the fix/missing-blocks-query-optimization branch January 7, 2019 23:04
