Create addresses.info
#6308
Conversation
hey @jeff-dude, I'm trying to build this spell that will have one line per address with high-level info. The goal is to easily get aggregated data for any EVM address, but this would also be super useful for filtering other spells and making them more efficient (i.e. filtering each address by first tx). This is just for Ethereum for now, but I want to make it for all EVM chains and have a crosschain version of the spell. In order to build it in Spellbook, I use the
At first glance, looks like it could be how you use timestamps in the incremental phase. For the source filter, you consistently filter on; I also notice you left join to
dbt_subprojects/daily_spellbook/macros/models/sector/addresses/addresses_info.sql
FROM executed_txs
LEFT JOIN is_contract USING (address)
LEFT JOIN transfers USING (address)
LEFT JOIN {{ source('addresses_events_'~blockchain, 'first_funded_by')}} ffb USING (address)
Important to note that with this left join setup we will only have records for addresses that have executed transactions (EOAs).
If you want to include smart contracts and accounts, we should change the join type.
Ah thanks, I swapped it so first_funded_by is the FROM table, with the rest being joined onto it and executed_txs + transfers as FULL OUTER JOINs.
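(Roughly, the revised join setup described above could look like the sketch below; this is an illustration based on the comment, not the exact diff.)

FROM {{ source('addresses_events_'~blockchain, 'first_funded_by')}} ffb
FULL OUTER JOIN executed_txs USING (address)
FULL OUTER JOIN transfers USING (address)
LEFT JOIN is_contract USING (address)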
dbt_subprojects/daily_spellbook/macros/models/sector/addresses/addresses_info.sql
FROM executed_txs
LEFT JOIN is_contract USING (address)
LEFT JOIN {{ source('addresses_events_'~blockchain, 'first_funded_by')}} ffb USING (address)
LEFT JOIN transfers USING (address)
same comment on the join type used for including all the addresses
Ah thanks, I swapped it so first_funded_by is the FROM table, with the rest being joined onto it and executed_txs + transfers as FULL OUTER JOINs.
dbt_subprojects/daily_spellbook/models/addresses/addresses_info.sql
file_format = 'delta',
incremental_strategy = 'merge',
unique_key = ['address'],
merge_update_columns = ['executed_tx_count', 'max_nonce', 'is_smart_contract', 'namespace', 'name', 'first_funded_by', 'first_funded_by_block_time', 'tokens_received_count', 'tokens_received_tx_count', 'tokens_sent_count', 'tokens_sent_tx_count', 'first_transfer_block_time', 'last_transfer_block_time', 'first_received_block_number', 'last_received_block_number', 'first_sent_block_number', 'last_sent_block_number', 'received_volume_usd', 'sent_volume_usd', 'first_tx_block_time', 'last_tx_block_time', 'first_tx_block_number', 'last_tx_block_number', 'last_seen', 'last_seen_block'],
What's the reason for specifying the columns? This looks to be all the columns.
If there are some columns you want to exclude from the incremental merge update, you can specify merge_exclude_columns instead.
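(As a minimal sketch of that alternative; the excluded column names below are purely illustrative, not part of this PR.)

-- sketch only: keep the listed columns out of the merge update, let everything else be replaced
{{ config(
    materialized = 'incremental',
    file_format = 'delta',
    incremental_strategy = 'merge',
    unique_key = ['address'],
    merge_exclude_columns = ['first_funded_by', 'first_funded_by_block_time']
) }}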
It didn't work until those columns were added. They are specified here as the columns to be replaced. This is the first time I make (and see) a spell of this type; afaik all other incremental spells only work as append with no replace component, unlike here where this appears to be necessary.
That's very strange behavior; normal incremental spells definitely work by replacing the incremental results.
I'll have a look at this through the CI.
I have removed these statements and everything runs fine from what I can tell.
@hildobby I have pushed some changes:
- removed the merge_update_columns, they don't seem needed
- replaced greatest with a combo of array_max and filter to allow it to handle null values
- moved everything into the new addresses sector, this deserves its own spot
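(For context, the difference in Trino-style SQL looks roughly like the lines below; the column names are placeholders.)

-- greatest() returns NULL as soon as any argument is NULL
greatest(last_tx_block_number, last_transfer_block_number)
-- array_max over a null-filtered array ignores the NULLs instead
array_max(filter(array[last_tx_block_number, last_transfer_block_number], x -> x is not null))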
This PR creates addresses.info with aggregated high-level information for all EVM addresses, with chain-specific spells culminating into a crosschain one. I have thought of use cases for all addresses in there; it can be used:
- to easily find a subset of addresses based on some heuristics (see the sketch after this list)
- to join on as a filter to only query addresses in the time ranges they appeared in, potentially making queries and downstream spells more efficient (for example I think I can finally optimise Create attacks.address_poisoning #5995 into a more optimised runtime)
- to do some address segmentation by the chains it shows up on, whether it's a contract on any chain, where/when it was first funded, etc.
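(As a quick illustration of the filtering use case; the column names are taken from the config shown earlier, but the exact schema and usage may differ.)

-- sketch: grab a subset of EOAs by when they first transacted
select address
from addresses.info
where is_smart_contract = false
  and first_tx_block_time >= timestamp '2024-01-01'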
I'll also be creating addresses.index using it, a spell that solely contains address and index, where index is a BIGINT (or INT256 if there's more addresses than I anticipate) created incrementally based on when an address first appeared anywhere. This can then be used to reference addresses compactly, e.g. a single address becomes 1039 and the set sent to those distinct addresses becomes [10, 212, 1334].
For the incremental updates, the chain-specific macro constantly fetches the last block_number for every address (across txs and token transfers) so it can easily incrementally update based on those with no missing data. For the crosschain version, a map is created for every address which holds this data for all chains where the address appeared; those maps are then updated on incremental runs with a line that ensures only the chains with new data get overwritten.
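(A rough sketch of that idea in Trino-style SQL; the CTE and column names here are assumptions for illustration, not the actual spell code.)

-- per-chain map built for the current incremental batch
select
    address
    , map(array['{{ blockchain }}'], array[max(block_number)]) as last_block_by_chain
from new_activity
group by 1

-- merge on incremental runs: keys from the new batch overwrite the existing map,
-- so only chains with new data get replaced
select
    address
    , map_concat(
        coalesce(existing.last_block_by_chain, map())
        , coalesce(batch.last_block_by_chain, map())
    ) as last_block_by_chain
from existing
full outer join batch using (address)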
I think the spell is now ready. This PR will have most chains (and the crosschain spell) deactivated in prod so it can first build for Ethereum and a couple of others; then I'll do some follow-up PRs for the other chains and end with the crosschain spell.
@0xRobin lmk if you spot anything missing here, I've looked through extensively but a second pair of eyes might surface stuff I missed!