MTG-979 Fix collection verification check for API requests #329

Open
wants to merge 10 commits into
base: new-main

Contributor

@n00m4d n00m4d commented Nov 29, 2024

What

This PR fixes the collection verification check applied when selecting keys from Postgres.

It also adds a test case that selects different types of assets, with and without collections.

@@ -303,7 +303,8 @@ fn add_filter_clause<'a>(
}

if !options.show_unverified_collections {
query_builder.push(" AND assets_v3.ast_is_collection_verified = true");
// if there is no collection for asset it doesn't mean that it's unverified

Contributor

But it also doesn't mean that it's verified 🙃
But as I understand it, other providers use the same logic, so these are just thoughts out loud.

Contributor Author

The point of the change is that if an asset has no collection, we cannot filter it by the collection_verified field, because the asset has no such field at all.
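The diff above is cut off at the commented line, so the exact clause the PR adds is not visible here. As a rough sketch only, the relaxed filter could look like the following, assuming the condition matches the `IS NULL OR ... = true` form used in the benchmark queries later in this thread. The `Options` struct and `push_verification_filter` are hypothetical stand-ins, and a plain `String` replaces the real query builder.

```rust
/// Hypothetical stand-in for the options passed to `add_filter_clause`.
struct Options {
    show_unverified_collections: bool,
}

/// Appends the collection-verification condition to an existing WHERE clause.
/// A plain `String` is used here instead of the real query builder.
fn push_verification_filter(query: &mut String, options: &Options) {
    if !options.show_unverified_collections {
        // NULL means the asset has no collection, so those rows are kept
        // alongside the rows whose collection is verified.
        query.push_str(
            " AND (assets_v3.ast_is_collection_verified IS NULL \
             OR assets_v3.ast_is_collection_verified = true)",
        );
    }
}
```

With `show_unverified_collections` set to `true` the query is left unchanged; otherwise only rows with an explicitly unverified collection are excluded.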

@n00m4d n00m4d requested a review from StanChe December 3, 2024 15:57
Contributor Author

n00m4d commented Dec 12, 2024

Also added a new index in this PR: CREATE INDEX assets_v3_is_collection_verified ON assets_v3 (ast_is_collection_verified) WHERE ast_is_collection_verified IS NULL OR ast_is_collection_verified = TRUE;

Contributor Author

n00m4d commented Dec 12, 2024

Tested on a DB with 3639661 assets: 622453 assets with collection_verified null, 2375833 assets with collection_verified true, and 641375 assets with collection_verified false.

Results with the old indexes

req: explain analyze select * from assets_v3 where ast_collection = decode('0000000000000020000000000000000000000000000000000000000000000000', 'hex') AND (assets_v3.ast_is_collection_verified is null or assets_v3.ast_is_collection_verified = true);

response:

Bitmap Heap Scan on assets_v3  (cost=39762.56..158334.94 rows=1214006 width=249) (actual time=205.336..1671.179 rows=1733174 loops=1)
  Recheck Cond: (((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND (ast_is_collection_verified IS NULL)) OR ((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND ast_is_collection_verified))
  Filter: ((ast_is_collection_verified IS NULL) OR ast_is_collection_verified)
  Heap Blocks: exact=94635
  ->  BitmapOr  (cost=39762.56..39762.56 rows=1407070 width=0) (actual time=177.062..177.064 rows=0 loops=1)
        ->  Bitmap Index Scan on assets_v3_collection_is_collection_verified  (cost=0.00..8285.66 rows=297723 width=0) (actual time=0.120..0.120 rows=0 loops=1)
              Index Cond: ((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND (ast_is_collection_verified IS NULL))
        ->  Bitmap Index Scan on assets_v3_collection_is_collection_verified  (cost=0.00..30869.90 rows=1109347 width=0) (actual time=176.934..176.934 rows=1733274 loops=1)
              Index Cond: ((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND (ast_is_collection_verified = true))
Planning Time: 1.566 ms
JIT:
  Functions: 6
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.087 ms, Inlining 0.000 ms, Optimization 0.564 ms, Emission 11.645 ms, Total 13.295 ms
Execution Time: 1752.515 ms

After a few runs, the execution time drops to 756.960 ms.

old request: explain analyze select * from assets_v3 where ast_collection = decode('0000000000000020000000000000000000000000000000000000000000000000', 'hex') AND assets_v3.ast_is_collection_verified = true;

response:

Bitmap Heap Scan on assets_v3  (cost=31147.24..145998.07 rows=1109347 width=249) (actual time=199.773..1092.169 rows=1733174 loops=1)
  Recheck Cond: (ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea)
  Filter: ast_is_collection_verified
  Heap Blocks: exact=94635
  ->  Bitmap Index Scan on assets_v3_collection_is_collection_verified  (cost=0.00..30869.90 rows=1109347 width=0) (actual time=177.246..177.247 rows=1733274 loops=1)
        Index Cond: ((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND (ast_is_collection_verified = true))
Planning Time: 0.347 ms
JIT:
  Functions: 6
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.901 ms, Inlining 0.000 ms, Optimization 0.471 ms, Emission 6.167 ms, Total 7.540 ms
Execution Time: 1164.026 ms

request without collection: explain analyze select * from assets_v3 where (assets_v3.ast_is_collection_verified is null or assets_v3.ast_is_collection_verified = true);

response:

Seq Scan on assets_v3  (cost=10000000000.00..10000137390.11 rows=2583542 width=249) (actual time=88.593..712.277 rows=2998286 loops=1)
  Filter: ((ast_is_collection_verified IS NULL) OR ast_is_collection_verified)
  Rows Removed by Filter: 641375
Planning Time: 0.112 ms
JIT:
  Functions: 4
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.588 ms, Inlining 2.056 ms, Optimization 56.029 ms, Emission 30.491 ms, Total 89.164 ms
Execution Time: 837.692 ms

The worst time I observed here during testing was 2000 ms.

Results with the new index added

The new index: CREATE INDEX assets_v3_is_collection_verified ON assets_v3 (ast_is_collection_verified) WHERE ast_is_collection_verified IS NULL OR ast_is_collection_verified = TRUE;

req: explain analyze select * from assets_v3 where ast_collection = decode('0000000000000020000000000000000000000000000000000000000000000000', 'hex') AND (assets_v3.ast_is_collection_verified is null or assets_v3.ast_is_collection_verified = true);

response:

Bitmap Heap Scan on assets_v3  (cost=27129.94..152770.80 rows=1213690 width=249) (actual time=238.529..1055.155 rows=1733174 loops=1)
  Recheck Cond: (((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND (ast_is_collection_verified IS NULL)) OR ast_is_collection_verified)
  Filter: (((ast_is_collection_verified IS NULL) OR ast_is_collection_verified) AND (ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea))
  Rows Removed by Filter: 642659
  Heap Blocks: exact=100980
  ->  BitmapOr  (cost=27129.94..27129.94 rows=1972549 width=0) (actual time=208.381..208.382 rows=0 loops=1)
        ->  Bitmap Index Scan on assets_v3_collection_is_collection_verified  (cost=0.00..8284.88 rows=297645 width=0) (actual time=0.028..0.028 rows=0 loops=1)
              Index Cond: ((ast_collection = '\x0000000000000020000000000000000000000000000000000000000000000000'::bytea) AND (ast_is_collection_verified IS NULL))
        ->  Bitmap Index Scan on assets_v3_is_collection_verified  (cost=0.00..18238.21 rows=1674904 width=0) (actual time=208.352..208.352 rows=2375833 loops=1)
              Index Cond: (ast_is_collection_verified = true)
Planning Time: 43.805 ms
JIT:
  Functions: 6
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 38.129 ms, Inlining 0.000 ms, Optimization 1.171 ms, Emission 9.186 ms, Total 48.486 ms
Execution Time: 1156.996 ms

req: explain analyze select * from assets_v3 where (assets_v3.ast_is_collection_verified is null or assets_v3.ast_is_collection_verified = true);

response:

Bitmap Heap Scan on assets_v3  (cost=22312.49..149125.17 rows=2582868 width=249) (actual time=124.625..1032.834 rows=2998286 loops=1)
  Recheck Cond: ((ast_is_collection_verified IS NULL) OR ast_is_collection_verified)
  Heap Blocks: exact=100981
  ->  Bitmap Index Scan on assets_v3_is_collection_verified  (cost=0.00..21666.77 rows=2582868 width=0) (actual time=92.910..92.910 rows=2998286 loops=1)
Planning Time: 0.857 ms
JIT:
  Functions: 4
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.344 ms, Inlining 0.000 ms, Optimization 0.611 ms, Emission 7.144 ms, Total 9.099 ms
Execution Time: 1154.588 ms

The worst time I observed here was 1500 ms.

The result here is worse than for the same request above because I ran SET enable_seqscan = off; to check whether PG would use the indexes. Since the DB the tests ran on is not that big, a seq scan is faster than using the indexes, but that can change as the DB grows.

@n00m4d n00m4d requested a review from StanChe December 12, 2024 11:12
Contributor

StanChe commented Dec 12, 2024

Please rebase on new-main.
I don't see the index being dropped and recreated on dump sync; please add that.

@n00m4d n00m4d changed the base branch from feature/MTG-868-slots-storage to new-main December 12, 2024 11:31
Contributor

StanChe commented Dec 13, 2024

Are we trying to build a universal filter, or is the request analyzed in the comment above the primary use case? For a universal filter, the new index doesn't seem to be a good choice: it indexes the bool value while filtering on that same value, so it just keeps a separate list of rows whose value is true or null. That can be valid in some cases, for example when only 10% of records match the filter, but in the general case, when roughly half the records are filtered, Postgres will tend to use a full scan simply because using the index is not efficient.
If what we're optimizing is that particular request with collection and flag, then we should build a different index: ON assets_v3 (ast_collection) WHERE ast_is_collection_verified IS NULL OR ast_is_collection_verified = TRUE;
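
To put a number on the selectivity argument, the row counts reported for the test DB earlier in this thread can be plugged in. This is only a rough sketch of the arithmetic: the function and constant names are illustrative, and the planner's actual choice depends on more than this single ratio.

```rust
/// Fraction of rows a predicate keeps, given matching and total row counts.
fn selectivity(matching: u64, total: u64) -> f64 {
    matching as f64 / total as f64
}

// Row counts reported for the test DB earlier in this thread.
const NULL_ROWS: u64 = 622_453;
const TRUE_ROWS: u64 = 2_375_833;
const FALSE_ROWS: u64 = 641_375;

/// Share of the table covered by the partial index (NULL or TRUE rows).
fn partial_index_coverage() -> f64 {
    selectivity(NULL_ROWS + TRUE_ROWS, NULL_ROWS + TRUE_ROWS + FALSE_ROWS)
}
```

On that data the partial index covers roughly 82% of the table, far from the 10% case where such an index would clearly pay off.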

But given that we already have the following index:

("assets_v3_collection_is_collection_verified", "assets_v3(ast_collection, ast_is_collection_verified) WHERE ast_collection IS NOT NULL"),

you could change the query condition to AND ast_is_collection_verified IS DISTINCT FROM FALSE.
With the current indexes, this changes the execution time as follows (for a big collection on a prod DB):

explain analyze select * from assets_v3 where ast_collection = decode('a9a5c4c08128819570c32023cb72ca9b1b7327a3646d52cbd4ab33a09a627584', 'hex') AND (assets_v3.ast_is_collection_verified is null or assets_v3.ast_is_collection_verified = true);

"Index Scan using assets_v3_collection_is_collection_verified on assets_v3  (cost=0.57..4648476.79 rows=1033793 width=251) (actual time=149.644..226873.204 rows=1200434 loops=1)"
"  Index Cond: (ast_collection = '\xa9a5c4c08128819570c32023cb72ca9b1b7327a3646d52cbd4ab33a09a627584'::bytea)"
"  Filter: ((ast_is_collection_verified IS NULL) OR ast_is_collection_verified)"
"Planning Time: 0.500 ms"
"JIT:"
"  Functions: 6"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 1.382 ms, Inlining 62.228 ms, Optimization 55.802 ms, Emission 31.288 ms, Total 150.701 ms"
"Execution Time: 227095.659 ms"

to

explain analyze select * from assets_v3 where ast_collection = decode('a9a5c4c08128819570c32023cb72ca9b1b7327a3646d52cbd4ab33a09a627584', 'hex') AND ast_is_collection_verified IS DISTINCT FROM FALSE;


"Index Scan using assets_v3_collection_is_collection_verified on assets_v3  (cost=0.57..4651888.10 rows=1175411 width=251) (actual time=103.503..9603.134 rows=1200434 loops=1)"
"  Index Cond: (ast_collection = '\xa9a5c4c08128819570c32023cb72ca9b1b7327a3646d52cbd4ab33a09a627584'::bytea)"
"  Filter: (ast_is_collection_verified IS DISTINCT FROM false)"
"Planning Time: 0.618 ms"
"JIT:"
"  Functions: 6"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 2.866 ms, Inlining 14.397 ms, Optimization 59.276 ms, Emission 29.749 ms, Total 106.287 ms"
"Execution Time: 9644.402 ms"
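
Both plans above return the same number of rows, which is expected because the two conditions accept exactly the same values of a nullable boolean. A minimal sketch of that equivalence, modelling SQL NULL as `None` (the type alias and function names are illustrative, not taken from the codebase):

```rust
/// Models a nullable boolean column: `None` stands for SQL NULL.
type NullableBool = Option<bool>;

/// Mirrors `x IS NULL OR x = TRUE`.
fn is_null_or_true(x: NullableBool) -> bool {
    x.is_none() || x == Some(true)
}

/// Mirrors `x IS DISTINCT FROM FALSE`, a NULL-safe inequality
/// that always yields true or false, never NULL.
fn is_distinct_from_false(x: NullableBool) -> bool {
    x != Some(false)
}

/// Checks that both predicates agree on all three possible values.
fn predicates_agree() -> bool {
    [None, Some(true), Some(false)]
        .iter()
        .all(|&x| is_null_or_true(x) == is_distinct_from_false(x))
}
```

This only shows that the result sets match; it says nothing about why the second form ran faster in the measurements above.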

@@ -231,6 +232,7 @@ impl PgClient {
("assets_v3_specification_asset_class", "assets_v3 (ast_specification_asset_class) WHERE ast_specification_asset_class IS NOT NULL AND ast_specification_asset_class <> 'unknown'::specification_asset_class"),
("assets_v3_specification_version", "assets_v3 (ast_specification_version) WHERE ast_specification_version <> 'v1'::specification_versions"),
("assets_v3_supply", "assets_v3(ast_supply) WHERE ast_supply IS NOT NULL"),
("assets_v3_is_collection_verified", "assets_v3(ast_is_collection_verified) WHERE ast_is_collection_verified IS NULL OR ast_is_collection_verified = TRUE"),

Contributor

We've dropped the migration that adds the new index. Is it still needed here?

Contributor

@StanChe StanChe left a comment

👍
