Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: allow filter EVM tokens by name #460

Merged
merged 1 commit into from
Jun 27, 2023
Merged

Conversation

pro-wh
Copy link
Collaborator

@pro-wh pro-wh commented Jun 24, 2023

tested locally on ~500 blocks from Emerald. very few tokens though


CREATE EXTENSION pg_trgm;

CREATE INDEX ix_evm_tokens ON chain.evm_tokens USING GIST (token_name gist_trgm_ops);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.postgresql.org/docs/current/pgtrgm.html

can someone check if this is right

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh the index name was not good. fixed to be more descriptive

@@ -462,10 +462,11 @@ const (
WHERE
(tokens.runtime = $1) AND
($2::oasis_addr IS NULL OR tokens.token_address = $2::oasis_addr) AND
($3::text IS NULL OR tokens.token_name ILIKE '%' || $3 || '%') AND
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name input is not escaped, searches with % and _ will do special things

Copy link
Member

@ptrus ptrus Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue also mentioned filtering by ticker (symbol in the db).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh let me add that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added same stuff for symbol


BEGIN;

CREATE EXTENSION pg_trgm;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this breaks our wipe db script 💀

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some code to drop extensions too 💀 💀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also just go with CREATE EXTENSION IF NOT EXISTS pg_trgm, and avoid the wiping. I'd prefer that, actually, for simplicity. It's extremely unlikely that we'll want to redefine the extension at some point :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I think it would also not be irresponsible to just omit the index. We have 2k EVM tokens in mainnet emerald. Even at 10k, scanning through their short names should be very fast.

Edit: I tried in prod mainnet now, with and without the index. The index does get used. Make of the time difference what you will:

                                                                                                                                                                                                                                                                                 [0/21466]
oasisindexer=> explain analyze select count(*) from chain.evm_tokens where token_name ilike '%eth%';                                                                                                                                                                                      
                                                 QUERY PLAN                                                                                  
-------------------------------------------------------------------------------------------------------------                                
 Aggregate  (cost=68.17..68.18 rows=1 width=8) (actual time=1.491..1.493 rows=1 loops=1)                                                     
   ->  Seq Scan on evm_tokens  (cost=0.00..68.04 rows=53 width=0) (actual time=0.016..1.481 rows=58 loops=1)                                 
         Filter: (token_name ~~* '%eth%'::text)                                                                                              
         Rows Removed by Filter: 1865                        
 Planning Time: 0.325 ms                                              
 Execution Time: 1.544 ms         
(6 rows)                                                 
                                                                      
oasisindexer=> CREATE EXTENSION pg_trgm;                                                                                                     
CREATE EXTENSION                                                                                                                                                                                                                                                                          
oasisindexer=>                                                        
oasisindexer=> CREATE INDEX ix_evm_tokens_name ON chain.evm_tokens USING GIST (token_name gist_trgm_ops);                                    
CREATE INDEX                                                                                                                                                                                                                                                                              
oasisindexer=> CREATE INDEX ix_evm_tokens_symbol ON chain.evm_tokens USING GIST (symbol gist_trgm_ops);                                      
CREATE INDEX                             
oasisindexer=> explain analyze select count(*) from chain.evm_tokens where token_name ilike '%eth%';                                                                                                                                                                                      
                                                            QUERY PLAN                                                                       
-----------------------------------------------------------------------------------------------------------------------------------          
 Aggregate  (cost=51.69..51.70 rows=1 width=8) (actual time=0.455..0.456 rows=1 loops=1)                                                                                                                                                                                                  
   ->  Bitmap Heap Scan on evm_tokens  (cost=4.56..51.56 rows=53 width=0) (actual time=0.360..0.447 rows=58 loops=1)                                                                                                                                                                      
         Recheck Cond: (token_name ~~* '%eth%'::text)                                                                                                                                                                                                                                     
         Heap Blocks: exact=30                                 
         ->  Bitmap Index Scan on ix_evm_tokens_name  (cost=0.00..4.54 rows=53 width=0) (actual time=0.344..0.345 rows=58 loops=1)                                                                                                                                                        
               Index Cond: (token_name ~~* '%eth%'::text)
 Planning Time: 0.538 ms                            
 Execution Time: 0.651 ms   

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on 58 entries?

well I'll move this to a separate PR, let's hold off until it becomes slower

this query is going to run ~on keystroke, so

Copy link
Collaborator Author

@pro-wh pro-wh Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh but

{"args":[],"caller":"client.go:60","db":"indexer","err":"ERROR: cannot drop type gtrgm because extension pg_trgm requires it (SQLSTATE 2BP01)","level":"error","module":"postgres","msg":"Query","pid":188,"sql":"DROP TYPE public.gtrgm CASCADE;","time":"158.785µs","ts":"2023-06-27T18:11:29.493112027Z"}

with the current wipe code

Copy link
Contributor

@mitjat mitjat Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can manually exclude gtrgm from the list of types to be wiped? It's a little hairy but oh well, the whole Wipe() concept is vaguely a hack. I checked just now and it seems to be the only type created by the extension.

on 58 entries?
58 that match '%eth%', but ~2k entries total (Rows Removed by Filter: 1865)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then there's a bunch of functions that it depends on

@pro-wh pro-wh force-pushed the pro-wh/feature/tokenname branch from 7bda4f6 to 78e65b1 Compare June 26, 2023 23:29
if err != nil {
return fmt.Errorf("failed to list extensions: %w", err)
}
defer rows.Close()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use multiple defer rows.Close() in this function, reassigning rows 💀 💀 💀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defer captures enough context that it's all OK? Example: https://go.dev/play/p/6g8xQemALDi

If not, please do fix 🙏

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct, pointer receiver
https://go.dev/play/p/fWui7UAvo7N -> 2 2 ❌

pointer to struct, pointer receiver
https://go.dev/play/p/FNsKXdCQ7Fs -> 2 1 ✔️

interface with pointer to struct, pointer receiver
https://go.dev/play/p/Nywd7MMYyvd -> 2 1 ✔️

interface with struct, struct receiver
https://go.dev/play/p/E4f6lT8zO5o -> 2 1 ✔️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this stuff is too... it's like the phrase "sensitive to initial conditions" that you might say of a physical system. I'm just gonna wrap these in iifes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although I'm going to get rid of this and go with the create extension if not exists way

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm there's a listIndexerTables in its own method, so I'll go with methods

return err
}
// This one was here when I got here.
if extension == "plpgsql" {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never heard of this extension before, so not touching it??? 💀 💀 💀 💀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is even an extension, technically. It's the de-facto default procedural language for writing functions in postgres. And yes, good call not to remove, I've used it in ad-hoc ways before, and we might use it later.

WDYT about avoiding this if and just doing SELECT ... WHERE extname != 'plpgsql' instead? It's how most other filtering in Wipe() is done.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok

although let's go with the create extension if not exists

@pro-wh pro-wh changed the base branch from pro-wh/feature/holders10 to main June 26, 2023 23:32
@pro-wh
Copy link
Collaborator Author

pro-wh commented Jun 26, 2023

rebased on main

name: name
schema:
type: string
description: A filter on the name, the name must contain this value as a substring.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh this should be updated to say that the symbol can match too

Copy link
Contributor

@mitjat mitjat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have some ideas for simplifying, see what you think.

return err
}
// This one was here when I got here.
if extension == "plpgsql" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is even an extension, technically. It's the de-facto default procedural language for writing functions in postgres. And yes, good call not to remove, I've used it in ad-hoc ways before, and we might use it later.

WDYT about avoiding this if and just doing SELECT ... WHERE extname != 'plpgsql' instead? It's how most other filtering in Wipe() is done.

if err != nil {
return fmt.Errorf("failed to list extensions: %w", err)
}
defer rows.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defer captures enough context that it's all OK? Example: https://go.dev/play/p/6g8xQemALDi

If not, please do fix 🙏


BEGIN;

CREATE EXTENSION pg_trgm;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also just go with CREATE EXTENSION IF NOT EXISTS pg_trgm, and avoid the wiping. I'd prefer that, actually, for simplicity. It's extremely unlikely that we'll want to redefine the extension at some point :)


BEGIN;

CREATE EXTENSION pg_trgm;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I think it would also not be irresponsible to just omit the index. We have 2k EVM tokens in mainnet emerald. Even at 10k, scanning through their short names should be very fast.

Edit: I tried in prod mainnet now, with and without the index. The index does get used. Make of the time difference what you will:

                                                                                                                                                                                                                                                                                 [0/21466]
oasisindexer=> explain analyze select count(*) from chain.evm_tokens where token_name ilike '%eth%';                                                                                                                                                                                      
                                                 QUERY PLAN                                                                                  
-------------------------------------------------------------------------------------------------------------                                
 Aggregate  (cost=68.17..68.18 rows=1 width=8) (actual time=1.491..1.493 rows=1 loops=1)                                                     
   ->  Seq Scan on evm_tokens  (cost=0.00..68.04 rows=53 width=0) (actual time=0.016..1.481 rows=58 loops=1)                                 
         Filter: (token_name ~~* '%eth%'::text)                                                                                              
         Rows Removed by Filter: 1865                        
 Planning Time: 0.325 ms                                              
 Execution Time: 1.544 ms         
(6 rows)                                                 
                                                                      
oasisindexer=> CREATE EXTENSION pg_trgm;                                                                                                     
CREATE EXTENSION                                                                                                                                                                                                                                                                          
oasisindexer=>                                                        
oasisindexer=> CREATE INDEX ix_evm_tokens_name ON chain.evm_tokens USING GIST (token_name gist_trgm_ops);                                    
CREATE INDEX                                                                                                                                                                                                                                                                              
oasisindexer=> CREATE INDEX ix_evm_tokens_symbol ON chain.evm_tokens USING GIST (symbol gist_trgm_ops);                                      
CREATE INDEX                             
oasisindexer=> explain analyze select count(*) from chain.evm_tokens where token_name ilike '%eth%';                                                                                                                                                                                      
                                                            QUERY PLAN                                                                       
-----------------------------------------------------------------------------------------------------------------------------------          
 Aggregate  (cost=51.69..51.70 rows=1 width=8) (actual time=0.455..0.456 rows=1 loops=1)                                                                                                                                                                                                  
   ->  Bitmap Heap Scan on evm_tokens  (cost=4.56..51.56 rows=53 width=0) (actual time=0.360..0.447 rows=58 loops=1)                                                                                                                                                                      
         Recheck Cond: (token_name ~~* '%eth%'::text)                                                                                                                                                                                                                                     
         Heap Blocks: exact=30                                 
         ->  Bitmap Index Scan on ix_evm_tokens_name  (cost=0.00..4.54 rows=53 width=0) (actual time=0.344..0.345 rows=58 loops=1)                                                                                                                                                        
               Index Cond: (token_name ~~* '%eth%'::text)
 Planning Time: 0.538 ms                            
 Execution Time: 0.651 ms   

@pro-wh pro-wh force-pushed the pro-wh/feature/tokenname branch from 78e65b1 to f29b2db Compare June 27, 2023 21:03
@pro-wh pro-wh merged commit beeaf1c into main Jun 27, 2023
@pro-wh pro-wh deleted the pro-wh/feature/tokenname branch June 27, 2023 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants