Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid overflow when computing rent distribution #12112

Merged
merged 9 commits into from
Oct 1, 2020

Conversation

ryoqun
Copy link
Contributor

@ryoqun ryoqun commented Sep 8, 2020

Problem

rent calculation overflows.

Summary of Changes

Fix it. Also, make any native token distribution code a bit talkative...

  • write more details
  • add tests....

Context

Found via bank capitalization, which in turn is used for staking rewards calculation.

This bug has been introduced at #7396

let validator_rent_shares = validator_stakes
.into_iter()
.map(|(pubkey, staked)| {
let rent_share =
(((staked * rent_to_be_distributed) as f64) / (total_staked as f64)) as u64;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my debug log excerpt:

distributed rent: 120525
staked(374999998287840) / total_staked(2021686897690030)
[1] pry(main)> 120525 * 374999998287840
=> 45196874793641916000
[2] pry(main)> 2 ** 64
=> 18446744073709551616

;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 8, 2020

Alternatively, I think rent distribution just should go to collector_id? Why not?

Current implmentation is complex (thus caused this bug). Why does it distribute over all of stacked votes (=~ validators) at every slot, first of all? It costs quite much computing resource....

@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 11, 2020

Alternatively, I think rent distribution just should go to collector_id? Why not?

Current implmentation is complex (thus caused this bug). Why does it distribute over all of stacked votes (=~ validators) at every slot, first of all? It costs quite much computing resource....

@t-nelson Wdyt? Am I missing something? In other words, why collected lamports is treated differentally?: transaction fee (to the leader atm) vs rent fee (over the all staked validator set).

@t-nelson
Copy link
Contributor

@t-nelson Wdyt? Am I missing something? In other words, why collected lamports is treated differentally?: transaction fee (to the leader atm) vs rent fee (over the all staked validator set).

If I had to guess, it's because rent is only collected once per epoch and thus validators occurring earlier in the leader schedule will get a disproportionate reward. So we can't rely on it pretty much averaging out over the course of the epoch, like we do for fees.

Perhaps @rob-solana has some insight to share on the matter?

@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 14, 2020

@t-nelson Wdyt? Am I missing something? In other words, why collected lamports is treated differentally?: transaction fee (to the leader atm) vs rent fee (over the all staked validator set).

If I had to guess, it's because rent is only collected once per epoch and thus validators occurring earlier in the leader schedule will get a disproportionate reward. So we can't rely on it pretty much averaging out over the course of the epoch, like we do for fees.

Perhaps @rob-solana has some insight to share on the matter?

If I had to guess, it's because rent is only collected once per epoch and thus validators occurring earlier in the leader schedule will get a disproportionate reward. So we can't rely on it pretty much averaging out over the course of the epoch, like we do for fees.

Oh, this is good point. Thanks for the input! So, even if the leader schedule is uniformly distributed even for some short period at the start of epoch, could this be a concern? I guess so. Maybe, because this is easy to grind the leader schedule?

If this is true, I'll just fix the overflow.

CC: @rwalker-com

@rwalker-com
Copy link
Contributor

just fix the overflow

the rationale for distributing rent according to stake is because fees are already distributed by stake

@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 15, 2020

@rwalker-com Thanks for confirming!

So, just last question so that I can leave a good comment for the rationale for future reference.

Rents and fees should be distributed by stake alike at each slot.
Rents are done so naively at the end of each slot. Fees are done
so via `collector_id` to indirectly realize the same outcome
(`collector_id` comes from staked-weighted leader schedule).

Rents can't use `collector_id` to simplify things, because actual
rent collections occur much fewer compared to fees and also
much more frequently at the start of an epoch than the rest of it,
resulting in a bad approximation if `collector_id` is used.

Is my understanding correct?

@rwalker-com
Copy link
Contributor

rwalker-com commented Sep 15, 2020

I don't understand "naively" in the description. Stakes cannot change during an epoch.

collector_id is an unfortunate naming artifact, IMHO. Collector is always the leader. The way we pay fees incentivizes throughput.

Rent is the cost of doing business, everybody has to hold (or have access to) the same list of accounts, so we pay according to stake, which is a rough proxy for value to the network.

@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 15, 2020

@rob-solana Thanks again for answering!

I don't understand "naively" in the description. Stakes cannot change during an epoch.

"naively" here could be replaced with "faithfully" or "strictly proportional to the validator's stake weights". I know stakes cannot change during an epoch. I wanted to contrast the difference of rent's distribution compared to fee's one.

The way we pay fees incentivizes throughput.

Rent is the cost of doing business, everybody has to hold (or have access to) the same list of accounts, so we pay according to stake, which is a rough proxy for value to the network.

Thanks for explaining. I'm getting clue. :)

So, I have another question from new perspective:

Should rent be paid according to stake, without considering staked validator's uptime at all? That's because rent should be rewarded for the storage resource utilization cost. And fees should be rewarded for the computing resource utilization cost. These resource utilization costs should separately be rewarded and thus the distribution methods differ between rents and fees.

It's a bit extreme, but stakers can earn rent rewards even if they delegate to a validator which has been delinquent for a whole epoch. Is it ok?

This means they can earn some small lamports without risk of slashing. (assuming we don't slash for downtime; or is it planned to slash? CC: @carllin). Anyway, this behavior is very irrational. And much could be earned by delegating to a healthy validator (with slight risk of slashing).

@rwalker-com
Copy link
Contributor

So, I have another question from new perspective:

Should rent be paid according to stake, without considering staked validator's uptime at all? That's because rent should be rewarded for the storage resource utilization cost. And fees should be rewarded for the computing resource utilization cost. These resource utilization costs should separately be rewarded and thus the distribution methods differ between rents and fees.

It's a bit extreme, but stakers can earn rent rewards even if they delegate to a validator which has been delinquent for a whole epoch. Is it ok?

This means they can earn some small lamports without risk of slashing. (assuming we don't slash for downtime; or is it planned to slash? CC: @carllin). Anyway, this behavior is very irrational. And much could be earned by delegating to a healthy validator (with slight risk of slashing).

Stakers are not paid with rent, only the validator is paid. But you're right: there is leakage there that could be addressed with an uptime requirement (maybe scale rent by votes/slots during the epoch).

@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 16, 2020

@rwalker-com Thanks for further clarification! I'll document what I've learnt later as comments while finishing this pr up.

@stale
Copy link

stale bot commented Sep 24, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Sep 24, 2020
@stale stale bot removed the stale [bot only] Added to stale content; results in auto-close after a week. label Sep 28, 2020
@codecov
Copy link

codecov bot commented Sep 28, 2020

Codecov Report

Merging #12112 into master will increase coverage by 0.0%.
The diff coverage is 96.2%.

@@           Coverage Diff           @@
##           master   #12112   +/-   ##
=======================================
  Coverage    82.1%    82.1%           
=======================================
  Files         358      358           
  Lines       83580    83629   +49     
=======================================
+ Hits        68630    68671   +41     
- Misses      14950    14958    +8     

@ryoqun ryoqun removed the v1.2 label Sep 28, 2020
Comment on lines +1253 to +1307
// Distribute collected transaction fees for this slot to collector_id (= current leader).
//
// Each validator is incentivized to process more transactions to earn more transaction fees.
// Transaction fees are rewarded for the computing resource utilization cost, directly
// proportional to their actual processing power.
//
// collector_id is rotated according to stake-weighted leader schedule. So the opportunity of
// earning transaction fees are fairly distributed by stake. And missing the opportunity
// (not producing a block as a leader) earns nothing. So, being online is incentivized as a
// form of transaction fees as well.
//
// On the other hand, rent fees are distributed under slightly different philosophy, while
// still being stake-weighted.
// Ref: distribute_rent_to_validators
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwalker-com I've added comments based on our discussion some days ago. Does this make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

// - rent fee doesn't need to be incentivized for throughput unlike transaction fees
// - leader schedule could be manipulated to locate certain validators more often at the
// start of epoch to unfairly earn more rent for the epoch.
// Ref: collect_fees
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwalker-com I've added comments based on our discussion some days ago. Does this make sense?

@ryoqun ryoqun marked this pull request as ready for review September 29, 2020 09:27
@ryoqun ryoqun requested a review from t-nelson September 29, 2020 09:27
@ryoqun
Copy link
Contributor Author

ryoqun commented Sep 29, 2020

@t-nelson I think this is finally ready for review! Could you review this in your spare time? :)

@@ -29,6 +29,10 @@ pub mod bpf_loader2_program {
solana_sdk::declare_id!("DFBnrgThdzH4W6wZ12uGPoWcMnvfZj11EHnxHcVxLPhD");
}

pub mod no_overflow_rent_distribution {
solana_sdk::declare_id!("4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC: @mvines I have this private key.

const VALIDATOR_STAKE: u64 = 374_999_998_287_840;

let validator_pubkey = Pubkey::new_rand();
let bootstrap_validator_stake_lamports = VALIDATOR_STAKE;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: don't need this variable.

t-nelson
t-nelson previously approved these changes Sep 29, 2020
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's give Rob a day or so to approve the comments, then in it goes!

if enforce_fix {
assert_eq!(leftover_lamports, 0);
} else if leftover_lamports != 0 {
warn!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but something to think about for the future. Perhaps we should be storing capitalization change events like this with more persistence?

Copy link
Contributor Author

@ryoqun ryoqun Sep 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson Well, once enfoced_fix is true, non-zero leftover_lamports is a very hard error:
https://github.com/solana-labs/solana/pull/12112/files#diff-a7549f152920d85fb44e6a784b8e5e1fR2469
Also, capitalization is now periodically checked when creating a new snapshot #11927

You have a point for alerting by events, but we'll have the most hardest one (dead of cluster) if ever this rent distribution has more leftover or bad capitalization in general.

This lethal nature is justified because this is a possible indication of attempted unauthorized transfer of tokens. (ie security breach, or frankly a theft)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure! We'll probably notice the cluster halting soon enough 😉

Comment on lines 2387 to 2388
// - leader schedule could be manipulated to locate certain validators more often at the
// start of epoch to unfairly earn more rent for the epoch.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these 2 lines can just be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! thanks for the review!

@mergify mergify bot dismissed t-nelson’s stale review September 30, 2020 15:48

Pull request has been modified.

@ryoqun ryoqun merged commit e3773d9 into solana-labs:master Oct 1, 2020
mergify bot pushed a commit that referenced this pull request Oct 1, 2020
* Avoid overflow when computing rent distribution

* Use assert_eq!....

* Fix tests

* Add test

* Use FeatureSet

* Add comments

* Address review comments

* Tweak a bit.

* Fix fmt

(cherry picked from commit e3773d9)

# Conflicts:
#	runtime/src/bank.rs
#	runtime/src/feature_set.rs
mergify bot added a commit that referenced this pull request Oct 2, 2020
* Avoid overflow when computing rent distribution (#12112)

* Avoid overflow when computing rent distribution

* Use assert_eq!....

* Fix tests

* Add test

* Use FeatureSet

* Add comments

* Address review comments

* Tweak a bit.

* Fix fmt

(cherry picked from commit e3773d9)

# Conflicts:
#	runtime/src/bank.rs
#	runtime/src/feature_set.rs

* Fix conflict

Co-authored-by: Ryo Onodera <[email protected]>
@ryoqun
Copy link
Contributor Author

ryoqun commented Oct 26, 2020

We're about to enable this on testnet.

Note that this feature flag also now contains a fix not to store 0-lamport rent distributions. This is also the reason this activation was held for a bit. (#12804) (backport: #12811)

@ryoqun
Copy link
Contributor Author

ryoqun commented Nov 20, 2020

ryoqun commented 25 days ago

We're about to enable this on testnet.

Note that this feature flag also now contains a fix not to store 0-lamport rent distributions. This is also the reason this activation was held for a bit. (#12804) (backport: #12811)

well, due to my recnet inflation work, this just enabled only recently....

@ryoqun
Copy link
Contributor Author

ryoqun commented Nov 20, 2020

I'm going to enable this on mainnet-beta. I believe this is quite safe although it's expected to be activated on weekend:

  • Already on testnet with no issue so far
  • Actually, there is no rent distribution on mainnet (for that matter, testnet too at the moment). nevertheless, we'll effectively enable this bug fix on the order of testnet => mainnet-beta, indirectly via pico_inflation

@ryoqun
Copy link
Contributor Author

ryoqun commented Nov 20, 2020

I'm going to enable this on mainnet-beta. I believe this is quite safe although it's expected to be activated on weekend:

* Already on testnet with no issue so far

* Actually, there is no rent distribution on mainnet (for that matter, testnet too at the moment). nevertheless, we'll effectively enable this bug fix on the order of testnet => mainnet-beta, indirectly via `pico_inflation`

Well, I changed my mind. I'll do this early next week.

@ryoqun
Copy link
Contributor Author

ryoqun commented Nov 24, 2020

just enabled on mb, finally!:

$ target/release/solana --keypair ~/solana-mainnet-beta.json epoch-info

Block height: 48467373
Slot: 51087332
Epoch: 118
Epoch Slot Range: [50976000..51408000)
Epoch Completed Percent: 25.771%
Epoch Completed Slots: 111332/432000 (320668 remaining)
Epoch Completed Time: 12h 22m 12s/2days (1day 11h 37m 47s remaining)

$ target/release/solana --keypair ~/solana-mainnet-beta.json feature status
Feature                                      Description                              Status
MoqiU1vryuCGQSxFKA1SZ316JdLEFFhoAu6cKUNk7dN  pubkey log syscall                       inactive
SECCKV5UVUsr8sTVSVAzULjdm87r7mLPaqH2FGZjevR  inflation kill switch                    inactive
YCKSgA6XmjtkQrHBQjpyNrX6EMhJPcYcLWMVgWn36iv  max program call depth 64                active since slot 41904000
3h1BQWPDS5veRsq6mDBWruEpgPxRJkfwGexg5iiQ9mYg consistent recentblockhashes sysvar      active since slot 41040000
4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz no overflow rent distribution            inactive
5RzEHTnf6D7JPZCvwEzjM19kzBsyjSU3HoMfXaQmVgnZ ping-pong packet check #12794            inactive
DFBnrgThdzH4W6wZ12uGPoWcMnvfZj11EHnxHcVxLPhD bpf_loader2 program                      active since slot 41040000
E3PHP7w8kB7np3CTQ1qQ2tW3KCtjRSXBQgW9vM2mWv2Y secp256k1 program                        active since slot 41040000
E5JiFDQCwyC6QfT9REFyMpfK2mHcmv1GUDySU1Ue7TYv spl-token multisig fix                   active since slot 41040000
EdM9xggY5y7AhNMskRG8NgGMnaP4JFNsWi8ZZtyT1af5 max invoke call depth 4                  active since slot 41904000
EnvhHCLvg55P7PDtbvR1NwuTuAeodqpusV3MR5QEK8gs instructions sysvar                      active since slot 41040000
FtjnuAtJTWwX3Kx9m24LduNEhzaGuuPfDW6e14SX2Fy5 rent fixes (#10206, #10468, #11342)      active since slot 46224000
GaBtBJvmS4Arjj5W1NmFcyvPjsHN38UGYDq2MDwbs9Qu pico-inflation                           inactive
Gvd9gGJZDHGMNf1b3jkxrfBQSR5etrfTQSBNKCvLSFJN solana_stake_program v2                  inactive
HxvjqDSiF5sYdSYuCXsUnS8UeAoWsMT9iGoFP8pgV1mB compute budget balancing                 active since slot 41904000

$  ./target/release/solana --keypair ~/solana-mainnet-beta.json feature activate ~/work/solana-no-overflow-rent-distribution-feature.json
Activating no overflow rent distribution (4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz)



$ target/release/solana --keypair ~/solana-mainnet-beta.json feature status
Feature                                      Description                              Status
MoqiU1vryuCGQSxFKA1SZ316JdLEFFhoAu6cKUNk7dN  pubkey log syscall                       inactive
SECCKV5UVUsr8sTVSVAzULjdm87r7mLPaqH2FGZjevR  inflation kill switch                    inactive
YCKSgA6XmjtkQrHBQjpyNrX6EMhJPcYcLWMVgWn36iv  max program call depth 64                active since slot 41904000
3h1BQWPDS5veRsq6mDBWruEpgPxRJkfwGexg5iiQ9mYg consistent recentblockhashes sysvar      active since slot 41040000
4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz no overflow rent distribution            activation pending
5RzEHTnf6D7JPZCvwEzjM19kzBsyjSU3HoMfXaQmVgnZ ping-pong packet check #12794            inactive
DFBnrgThdzH4W6wZ12uGPoWcMnvfZj11EHnxHcVxLPhD bpf_loader2 program                      active since slot 41040000
E3PHP7w8kB7np3CTQ1qQ2tW3KCtjRSXBQgW9vM2mWv2Y secp256k1 program                        active since slot 41040000
E5JiFDQCwyC6QfT9REFyMpfK2mHcmv1GUDySU1Ue7TYv spl-token multisig fix                   active since slot 41040000
EdM9xggY5y7AhNMskRG8NgGMnaP4JFNsWi8ZZtyT1af5 max invoke call depth 4                  active since slot 41904000
EnvhHCLvg55P7PDtbvR1NwuTuAeodqpusV3MR5QEK8gs instructions sysvar                      active since slot 41040000
FtjnuAtJTWwX3Kx9m24LduNEhzaGuuPfDW6e14SX2Fy5 rent fixes (#10206, #10468, #11342)      active since slot 46224000
GaBtBJvmS4Arjj5W1NmFcyvPjsHN38UGYDq2MDwbs9Qu pico-inflation                           inactive
Gvd9gGJZDHGMNf1b3jkxrfBQSR5etrfTQSBNKCvLSFJN solana_stake_program v2                  inactive
HxvjqDSiF5sYdSYuCXsUnS8UeAoWsMT9iGoFP8pgV1mB compute budget balancing                 active since slot 41904000

$ ./target/release/solana confirm -v 3vswiAUPWfP5Nn9dscXsWSXvNupiwuGN2vSWXJs2JXYoCfMHf7FKc4Ep3FcXBUzHsE7NoTFUJUYxGiMfoyfsy7sy
RPC URL: https://api.mainnet-beta.solana.com
Default Signer Path: /home/ryoqun/.config/solana/id.json

Transaction executed in slot 51087649:
  Recent Blockhash: AW9KXe7EzggP2J6G8qVwQBPC3ZfhF4f91YEfjnNZa1Ss
  Signature 0: 3vswiAUPWfP5Nn9dscXsWSXvNupiwuGN2vSWXJs2JXYoCfMHf7FKc4Ep3FcXBUzHsE7NoTFUJUYxGiMfoyfsy7sy
  Signature 1: 2Xa1Qi9x2wMJLPdPNYx8R7qKkTCLabwn26qoXbMGcEam96guEJZo2BPdtbkhpQWHff1YZqCKsjgSo6mS3W2K5sLQ
  MessageHeader { num_required_signatures: 2, num_readonly_signed_accounts: 0, num_readonly_unsigned_accounts: 1 }
  Account 0: GMpQzT92L96tY4AdxTgbBBcLWGdNFJ7WyQhKww3LooKS
  Account 1: 4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz
  Account 2: 11111111111111111111111111111111
  Instruction 0
    Program: 11111111111111111111111111111111 (2)
    Account 0: GMpQzT92L96tY4AdxTgbBBcLWGdNFJ7WyQhKww3LooKS (0)
    Account 1: 4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz (1)
    Transfer { lamports: 953520 }
  Instruction 1
    Program: 11111111111111111111111111111111 (2)
    Account 0: 4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz (1)
    Allocate { space: 9 }
  Instruction 2
    Program: 11111111111111111111111111111111 (2)
    Account 0: 4kpdyrcj5jS47CZb2oJGfVxjYbsMm2Kx97gFyZrxxwXz (1)
    Assign { owner: Feature111111111111111111111111111111111111 }
  Status: Ok
    Fee: ◎0.00001
    Account 0 balance: ◎0.99903648 -> ◎0.99807296
    Account 1 balance: ◎0 -> ◎0.00095352
    Account 2 balance: ◎0.000000001

Confirmed

https://explorer.solana.com/tx/3vswiAUPWfP5Nn9dscXsWSXvNupiwuGN2vSWXJs2JXYoCfMHf7FKc4Ep3FcXBUzHsE7NoTFUJUYxGiMfoyfsy7sy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants