Andrew7234/validator staking history #703
Conversation
Force-pushed from b6f042b to c18469c.
Force-pushed from c18469c to fc99ca1.
Looks mostly good, a couple of minor comments!

We should double-check whether there's any other historical validator data the frontend needs at this time, or whether this is it.
```sql
CREATE INDEX ix_validator_balance_history_id_epoch ON chain.validator_balance_history (id, epoch);

ALTER TABLE chain.epochs
    ADD COLUMN validators base64_ed25519_pubkey[];
```
Suggested change:

```diff
-    ADD COLUMN validators base64_ed25519_pubkey[];
+    ADD COLUMN active_validators base64_ed25519_pubkey[];
```
Also, is this missing the code change that inserts this? (Or do I just not see it?)
```go
func (p *processor) ProcessItem(ctx context.Context, batch *storage.QueryBatch, epoch *Epoch) error {
	p.logger.Info("downloading validator balances", "epoch", epoch)
	validators, err := p.source.GetValidators(ctx, epoch.StartHeight)
```
This ends up calling oasis-core's `GetValidators`, right? Which returns just the active validators for the epoch starting at that height.

I think we maybe don't want just these, but rather to go over all known consensus accounts, since the escrow balance (and the debonding balance) of any account can change.

So if a validator drops out of the validator set but still has stake being added/removed over time, we will never update its state.
If we update this to track all accounts (not just the epoch's "validators"), then we could also use this historical table for things like "account birth/created_at", which is also a request from the frontend team.

If it is not feasible to go over all accounts for every epoch, then we need to think a bit more about how to approach the "account" stats.
I expect that grabbing all accounts at every epoch is untenable. We'd probably need `StateToGenesis`? Or something else that will be similarly slow, since staking data is a huge part of the `StateToGenesis` output.

To run this in real time, we'd need `StateToGenesis` to be faster than 60 min (= 1 epoch). Which it is, even if only when we are somewhat careful about the node's hardware (disks). But to run this in fast-sync, even a 10-minute response would mean we need ~0.5 years to process 3 years of blocks. We could get around that by using multiple nodes, but I'm not sure it's worth it.

Also, downloading so much data would put a strain on (and possibly force us to rethink/redesign) the DB, caches, etc. It's a lot of data. Very conservatively: (1e6 accounts) * (32,000 epochs) * (100 bytes per account snapshot at a given epoch) = 3.2 TB, but easily a few times more.
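A back-of-the-envelope check of those two estimates, using the rough figures above (assumed, not measured):

```go
package main

import "fmt"

func main() {
	// Rough assumptions from the discussion above, not measurements.
	const (
		accounts       = 1_000_000 // ~1e6 consensus accounts
		epochs         = 32_000    // ~3 years of 1-hour epochs
		bytesPerSnap   = 100       // per-account snapshot at a given epoch
		minutesPerCall = 10        // optimistic StateToGenesis response time
	)

	// Storage if we snapshot every account at every epoch.
	tb := float64(accounts) * epochs * bytesPerSnap / 1e12
	fmt.Printf("~%.1f TB of snapshots\n", tb) // ~3.2 TB

	// Wall-clock time to replay all epochs in fast-sync with a single node.
	years := float64(epochs) * minutesPerCall / 60 / 24 / 365
	fmt.Printf("~%.1f years of sequential StateToGenesis calls\n", years) // ~0.6 years
}
```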
I believe we limited ourselves to just validators because it's more important that we be able to show the history of a validator (it builds their credibility) than the history of a "normal" user.

But point well taken: if somebody stops being an active validator and later becomes one again, their interim history matters. Andy, how many accounts would we need to touch if we queried every account that was ever a validator? Maybe that's a reasonable compromise.

If this is slow to run or complicated to implement, I strongly suggest we go with what we already have in this PR, at least for now, and thus potentially report partial histories for validators. It's much better than nothing, it's zero additional effort, and it's launchable.
> if we queried every account that was ever a validator

I think this should be feasible. The validators don't fluctuate a lot. I believe there are 226 such validators on mainnet (based on oasisscan; I think they track validators in a similar way to this).
Thanks all for the feedback! Agreed that querying all accounts that were ever a validator is a good compromise.

One potential complication: having a batch size > 1 can cause issues when a new validator appears. E.g., if we're processing epochs 1-20 in parallel but epoch 3 introduces a new validator, the worker processing epochs 4-20 won't know about the new validator.

For a given epoch N, we want to fetch info for all the validators fetched for epoch N-1, `union`-ed with the set of active validators for epoch N. This sequential dependency would require us to always run this analyzer with a batch size of 1 so we can build up the list of known validators. Compute-wise this should be feasible given that there are ~35k epochs, but I'll need to do some testing to confirm. Another caveat is that we won't have staked-balance history for an account before it becomes a validator. IMO this is fine, since we won't be displaying any deep history, and it won't hinder staking-reward calculations when we add those in the future.

We could instead hardcode the 226 validators and have the analyzer fetch the staked balance for each one up to some recent block, and then proceed sequentially (batch_size=1) from there onwards.

Another approach would be to treat each (epoch, validator) pair as the work item instead of just (epoch). I need to do more work to see how complicated the item accounting becomes, but maybe it'll produce a nicer solution.
Updated to have the analyzer process the epochs sequentially from a given `start_height`, taking the union of {current validators} and {validators from the prior epoch}. The start height is expected to match the consensus analyzer start height, but it could be adjusted for a faster reindex. This lets us track the continuous history of each validator from the epoch it first becomes a validator, even if it later drops out of the validator set.

Processing epochs sequentially does not end up constraining performance, since each epoch already makes >100 node requests for the validator balances.
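A minimal sketch of that recurrence, with hypothetical type and helper names (not the analyzer's actual code):

```go
package validatorhistory

// EntityID stands in for a validator's entity identifier.
type EntityID string

// BalanceRow is one per-validator, per-epoch snapshot destined for
// chain.validator_balance_history. Column names here are illustrative.
type BalanceRow struct {
	ID        EntityID
	Epoch     uint64
	Escrow    uint64
	Debonding uint64
}

// snapshotEpoch implements tracked(N) = tracked(N-1) ∪ active(N), so an
// entity stays tracked even after it drops out of the active validator set.
func snapshotEpoch(
	epoch uint64,
	tracked map[EntityID]struct{}, // carried over from epoch N-1
	active []EntityID, // active validators at epoch N's start height
	fetchBalances func(EntityID) (escrow, debonding uint64, err error),
) ([]BalanceRow, error) {
	// Grow the tracked set with this epoch's active validators.
	for _, id := range active {
		tracked[id] = struct{}{}
	}
	// Snapshot balances for every tracked validator, active or not.
	rows := make([]BalanceRow, 0, len(tracked))
	for id := range tracked {
		escrow, debonding, err := fetchBalances(id)
		if err != nil {
			return nil, err
		}
		rows = append(rows, BalanceRow{ID: id, Epoch: epoch, Escrow: escrow, Debonding: debonding})
	}
	return rows, nil
}
```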
Looks good, thank you Andy! Obviously still some TODOs, which I tried to call out in comments, but at a high level this LGTM. And for what it's worth, it's exactly the design I arrived at when brainstorming this with Jan (at which point I had forgotten that you already have an in-flight PR with this).
```go
	return nil
}

func (p *processor) QueueLength(ctx context.Context) (int, error) {
```
TODO: Expose this in staging mainnet Grafana. It needs to be added in several places there because our metrics are named awkwardly 😬. (It would be smoother sailing if the analyzer name were just one of the attributes of a single global `queue_length` metric.)
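For illustration, the kind of single labeled metric this suggests, sketched with prometheus/client_golang (metric and label names are made up):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// One global gauge with the analyzer name as a label, instead of a separately
// named metric per analyzer. Sketch only; names are illustrative.
var queueLength = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "queue_length",
		Help: "Number of items the analyzer still has to process.",
	},
	[]string{"analyzer"},
)

func init() {
	prometheus.MustRegister(queueLength)
}

// Each analyzer then reports into the same metric:
//   queueLength.WithLabelValues("validator_staking_history").Set(float64(n))
```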
Force-pushed from 5129c9f to 565b179.
Force-pushed from 565b179 to c89e957.
Force-pushed from e678ff6 to f6c69cc.
Force-pushed from 9512aad to db66469.
Commits:
- add validatorBalances analyzer
- changelog
- Update analyzer/validatorbalances/validatorbalances.go (Co-authored-by: mitjat <[email protected]>)
- address comments
- query entity address instead of node
- add startHeight + validator history tracking
- e2e_regression damask updates
- in progress eden e2e_cache
- e2e_regression eden updates
- post-rebase e2e cache changes
Force-pushed from db66469 to 8edbb44.
Tracks validators' active and debonding escrow balances for each epoch.

TODO: add API endpoint.
Alternative considered: we could fetch validator escrow balances and process them in the consensus block analyzer. That ended up being messy, since we only want to track validator balances at the start of each epoch, not at every block; during fast sync this would require creating another `todo_updates` table to track per-block validator escrow balances and then taking only the first row from each epoch during `FinalizeFastSync`. It would also require us to fetch and temporarily store a lot of extraneous data. If anyone feels strongly about it, let me know.
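For concreteness, a sketch of the epoch-collapsing step that this rejected alternative would have needed during `FinalizeFastSync`; every table and column name below is illustrative, not the actual schema:

```go
package validatorhistory

// Hypothetical: during fast sync, per-block escrow rows would pile up in a
// scratch todo_updates table; FinalizeFastSync would then keep only the first
// row of each epoch. Table/column names are illustrative.
const collapseToEpochStart = `
	INSERT INTO chain.validator_balance_history (id, epoch, escrow, debonding)
	SELECT DISTINCT ON (v.id, v.epoch)
	       v.id, v.epoch, v.escrow, v.debonding
	FROM todo_updates.validator_balances v
	ORDER BY v.id, v.epoch, v.height  -- first block within each epoch wins
`
```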