[ProjectTracking] Chunk validator rewards #11900

Open
12 of 21 tasks
Longarithm opened this issue Aug 7, 2024 · 4 comments
Assignees
Labels
A-chain Area: Chain, client & related A-stateless-validation Area: stateless validation Near Core T-core Team: issues relevant to the core team

Comments

@Longarithm
Member

Longarithm commented Aug 7, 2024

Goal

Make chunk validator kickouts and rewards depend fairly on the number of endorsements each validator actually produced, not merely on whether the chunk was included.

References

Design Doc

Roadmap

With approximate timelines

  • Implement EpochConfigStore
  • Implement and test Easy mode for new protocol version with low thresholds (2w, also depends on experience)
  • Validate the fix in the environment with multiple clients. Consider:
    • Add command to replay block headers from mainnet, run rewards/kickouts calculation logic, and print the differences to validate the fix.
    • Add Nayduck test for kicking out offline validators.
    • TestLoop with simulated delays in chunk application. Much faster and more flexible, but never used for simulations before. Testnet may not help because it doesn’t have a heavy load. 2w. Better to start preparing in advance.
    • Start using EpochConfigStore in TestLoop to bypass the hard-coded epoch configuration. @Longarithm
  • If the fix doesn’t work, e.g. results in too many kickouts of honest CVs:
    • Try different ideas, e.g. make BP wait until 90% of chunk endorsements are received. But then one needs to determine the exact percentage and its impact on block production time and TPS.
      Also, BP can specifically wait for small CVs; however, the exact logic is unclear.
    • Iterate on these ideas as well.
      2w
  • If simple fixes don’t work, implement Hard mode and iterate on it as well (3w)
@Longarithm Longarithm added A-chain Area: Chain, client & related T-core Team: issues relevant to the core team Near Core A-stateless-validation Area: stateless validation labels Aug 7, 2024
@Longarithm
Member Author

Notes after meeting today:

  • Tayfun - to look into e2e impl, primarily to find what is needed to write a tool to analyse the new algo on mainnet endorsement data
  • Alex - to look into needed TestLoop improvements to test new algo + simulate kickouts with synthetic delays

github-merge-queue bot pushed a commit that referenced this issue Aug 20, 2024
…y headers (#11940)

Tracking issue: #11900.
Roadmap document:
[Link](https://docs.google.com/document/d/1VJ6BPnZJMGQXZ56RdOUmwJMRH-BoPIfNQs94AegohsU/edit#heading=h.rd80jvftbqxx)

This PR includes the following changes:
- Introduce a new `ProtocolFeature::ChunkEndorsementsInBlockHeader` to
enable new changes in this and upcoming PRs.
- Introduce `ChunkEndorsementBitmap` to implement bitmap representation
of endorsements. See the comments for more information.
- Add the bitmap to BlockHeaderInfo and BlockInfo (add version V3).
- Update the `EpochInfoAggregator` to use the chunk endorsement bitmap,
if it exists, instead of the chunk production stats to aggregate
endorsement stats. These stats are used for computing rewards and
kickouts later in the EpochManager.
- Add ReplayHeaders command to simulate the new method in `mainnet`. The
command replays the block headers from the chain store and updates the
EpochManager and compares the validator infos (stats, rewards, and
kickouts) from the original EpochManager and the replayed EpochManager.
Note that instead of adding a new command, we repurpose ReplayChain
command, which seems to be doing a similar operation but it was very
primitive before, so we decided to expand it. The new code updates the
`BlockInfos` if the new feature is enabled to inject the endorsements
bitmap before calling the EpochManager.

The following are not included and left for future PRs:
- Adding the bitmap to the BlockHeader (currently it always returns None
for the chunk endorsements bitmap).
- Adding/updating unittests to use the new endorsement bitmap.
- Integration tests.
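
For readers unfamiliar with the bitmap idea described above, here is a minimal sketch of how endorsements could be packed one bit per chunk validator per shard. It is not the nearcore `ChunkEndorsementBitmap`: the type `EndorsementBitmapSketch`, its methods, and the bit ordering (least significant bit first, validators in assignment order) are all assumptions made for illustration.

```rust
/// Hypothetical sketch of a per-block endorsement bitmap (not the nearcore type).
/// One packed bit vector per shard; bit `i` is set when the `i`-th chunk
/// validator assigned to that shard endorsed the chunk.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EndorsementBitmapSketch {
    /// Outer index: shard. Inner bytes: packed bits, least significant bit first.
    per_shard: Vec<Vec<u8>>,
}

impl EndorsementBitmapSketch {
    /// Packs one slice of booleans per shard into bytes.
    pub fn from_endorsements(per_shard: &[Vec<bool>]) -> Self {
        let packed = per_shard
            .iter()
            .map(|flags| {
                // Round the bit count up to a whole number of bytes.
                let mut bytes = vec![0u8; (flags.len() + 7) / 8];
                for (i, &endorsed) in flags.iter().enumerate() {
                    if endorsed {
                        bytes[i / 8] |= 1u8 << (i % 8);
                    }
                }
                bytes
            })
            .collect();
        Self { per_shard: packed }
    }

    /// Returns whether validator `index` of `shard` endorsed the chunk.
    /// Out-of-range shards or indices read as "not endorsed".
    pub fn is_endorsed(&self, shard: usize, index: usize) -> bool {
        self.per_shard
            .get(shard)
            .and_then(|bytes| bytes.get(index / 8))
            .map_or(false, |byte| (*byte & (1u8 << (index % 8))) != 0)
    }

    /// Counts the endorsements recorded for `shard` (0 for an unknown shard).
    pub fn count(&self, shard: usize) -> u32 {
        self.per_shard
            .get(shard)
            .map_or(0, |bytes| bytes.iter().map(|b| b.count_ones()).sum::<u32>())
    }
}
```

An aggregator could then add `count(shard)` style per-validator tallies across blocks of an epoch, which is the kind of statistic the PR describes feeding into rewards and kickouts.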
@tayfunelmas
Contributor

I did some simulation of mainnet epochs (last 5 epochs) using the easy mode algorithm. Did not change the kickout threshold (80%) or rewards rate. Results are in this document, where diffs are between the original run of the network and the simulated run.

@walnut-the-cat
Contributor

walnut-the-cat commented Sep 3, 2024

Aug 30th report

  • Removed BlockHeaderInfo to simplify the future changes (#11971).
  • Fixed the issue that the logic for calculating validators exempted from kickout does not consider the chunk endorsement rate (so it almost never kicks them out) (#11982). Note that this will be packaged into the same protocol feature as other changes in this category.
  • Started implementing the part where we add the chunk endorsement bitmap to the BlockHeader (previously added to BlockInfo). Mostly test fixes left (#12024).
  • Identified a plan to recover the diff in validator rewards if we start using the chunk endorsement ratio (instead of the chunk production ratio). Will implement this next and identify the optimal values for min/max online ratios and kickout ratios.
    • Detailed proposal and discussion can be found here

github-merge-queue bot pushed a commit that referenced this issue Sep 6, 2024
…kouts (#12048)

We have been using the `neard replay-headers` command to simulate
validator rewards and kickouts computations for past epochs to check the
difference between original and replayed runs.

Submitting the new version of the command we have been using for
simulating the reward/kickout computations for chunk validators in the
context of #11900.

This version no longer assumes that the original and replayed values are
the same and prints the diffs in kickouts and rewards (stakes).
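
To illustrate the kind of comparison described above, here is a rough sketch of diffing an original and a replayed epoch outcome. It is not the code behind `neard replay-headers`: the types `EpochOutcomeSketch` and `OutcomeDiff`, the function `diff_outcomes`, and the choice to treat a missing stake as 0 are assumptions for illustration only.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-epoch summary; names are illustrative, not nearcore types.
#[derive(Debug, Clone, Default)]
pub struct EpochOutcomeSketch {
    /// Account id -> stake after rewards were applied.
    pub stakes: BTreeMap<String, u128>,
    /// Accounts kicked out at the end of the epoch.
    pub kicked_out: BTreeSet<String>,
}

/// Differences between an original run and a replayed run.
#[derive(Debug, PartialEq, Eq)]
pub struct OutcomeDiff {
    /// (account, original stake, replayed stake) for every account whose stake differs.
    pub stake_changes: Vec<(String, u128, u128)>,
    /// Accounts kicked out only in the original run.
    pub kicked_only_in_original: Vec<String>,
    /// Accounts kicked out only in the replayed run.
    pub kicked_only_in_replayed: Vec<String>,
}

/// Compares two outcomes. Assumption: an account missing from one side has stake 0 there.
pub fn diff_outcomes(original: &EpochOutcomeSketch, replayed: &EpochOutcomeSketch) -> OutcomeDiff {
    let mut stake_changes = Vec::new();
    for (account, &before) in &original.stakes {
        let after = replayed.stakes.get(account).copied().unwrap_or(0);
        if before != after {
            stake_changes.push((account.clone(), before, after));
        }
    }
    // Accounts that appear only in the replayed run.
    for (account, &after) in &replayed.stakes {
        if !original.stakes.contains_key(account) && after != 0 {
            stake_changes.push((account.clone(), 0, after));
        }
    }
    OutcomeDiff {
        stake_changes,
        kicked_only_in_original: original
            .kicked_out
            .difference(&replayed.kicked_out)
            .cloned()
            .collect(),
        kicked_only_in_replayed: replayed
            .kicked_out
            .difference(&original.kicked_out)
            .cloned()
            .collect(),
    }
}
```

Using ordered maps and sets keeps the printed diff stable between runs, which makes it easier to compare results across epochs.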
github-merge-queue bot pushed a commit that referenced this issue Sep 9, 2024
Issue: #11900.

This PR introduces a cutoff threshold for chunk endorsement ratio. We
use the same kickout threshold as the cutoff threshold. It is currently
80, but we will make it 70 when stabilizing this feature.

If the endorsement ratio is less than the cutoff ratio, it is treated as
0; otherwise it is treated as 1 when computing the average uptime ratio
(which covers block production, chunk production, and endorsement).

For this, we introduce a new struct `ValidatorOnlineThresholds` to
contain `online_min_threshold` and `online_max_threshold` as well as
`endorsement_cutoff_threshold` (initialized to the kickout threshold=70
if feature `ChunkEndorsementsInBlockHeader` is enabled).
`ValidatorOnlineThresholds` is initialized from `EpochConfig`. We pass
this struct to `calculate_rewards`, which then calls
`get_validator_online_ratio` to apply the cutoff if present.

We simulated the last 5 epochs on mainnet; the results are in
[this
doc](https://docs.google.com/document/d/1GsYl0591CA8MyIjgjJ8_4NTP0cEv48BKSsKihGgROHc/edit).
Only the chunk validators with very low endorsement ratio are kicked out
and do not get any reward.

Testing: We added some unittests for the basic logic. We will later add
integration tests for the full behavior.

---------

Co-authored-by: Aleksandr Logunov <[email protected]>
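
As a rough illustration of the cutoff rule described in this commit message, the sketch below maps the endorsement ratio to 0 or 1 and then averages it with the block and chunk production ratios. It is not the nearcore `get_validator_online_ratio`: the types `OnlineThresholdsSketch` and `UptimeStatsSketch`, the function `online_ratio`, and in particular the unweighted mean over the roles a validator actually had are assumptions made for illustration.

```rust
/// Hypothetical thresholds; only the cutoff is modelled here.
pub struct OnlineThresholdsSketch {
    /// Cutoff as a percentage (e.g. 80). `None` means no cutoff is applied.
    pub endorsement_cutoff_percent: Option<u8>,
}

/// Hypothetical per-validator stats, each as (produced, expected).
pub struct UptimeStatsSketch {
    pub blocks: (u64, u64),
    pub chunks: (u64, u64),
    pub endorsements: (u64, u64),
}

/// Ratio for one role, or `None` when nothing was expected for that role.
fn ratio((produced, expected): (u64, u64)) -> Option<f64> {
    if expected == 0 {
        None
    } else {
        Some(produced as f64 / expected as f64)
    }
}

/// Endorsement contribution: 0 below the cutoff, 1 at or above it,
/// and the raw ratio when no cutoff is configured.
fn endorsement_part(stats: &UptimeStatsSketch, thresholds: &OnlineThresholdsSketch) -> Option<f64> {
    let (produced, expected) = stats.endorsements;
    let raw = ratio(stats.endorsements)?;
    Some(match thresholds.endorsement_cutoff_percent {
        // Integer comparison avoids floating-point rounding at the boundary.
        Some(cutoff) => {
            let below = u128::from(produced) * 100 < u128::from(cutoff) * u128::from(expected);
            if below { 0.0 } else { 1.0 }
        }
        None => raw,
    })
}

/// Assumption: the overall ratio is the plain mean over the roles the validator had.
pub fn online_ratio(stats: &UptimeStatsSketch, thresholds: &OnlineThresholdsSketch) -> f64 {
    let parts: Vec<f64> = vec![
        ratio(stats.blocks),
        ratio(stats.chunks),
        endorsement_part(stats, thresholds),
    ]
    .into_iter()
    .flatten()
    .collect();
    if parts.is_empty() {
        0.0
    } else {
        parts.iter().sum::<f64>() / parts.len() as f64
    }
}
```

Under these assumptions, a chunk-validator-only node with 79 of 100 endorsements and a cutoff of 80 gets an online ratio of 0, while the same node with 80 of 100 gets 1.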
github-merge-queue bot pushed a commit that referenced this issue Sep 12, 2024
…HeaderView and genesis header (#12087)

The test `rpc_hash.py` revealed two problems for the changes previously
done for task #11900.
1) `BlockHeaderView` was not updated to represent `BlockHeaderV5`, so
added chunk endorsements bitmap there (we add the inner vector instead
of exposing `ChunkEndorsementBitmap` struct there).
2) Genesis block header was not updated to represent `BlockHeaderV5`, so
added an empty bitmap to the genesis block header.

Then we updated the test `rpc_hash.py` to check for nightly version
producing `BlockHeaderV5` with endorsements bitmap as well.

TODO: We noticed that there are 3 places that we generate `BlockHeader`
versions using `BlockHeader::new()` function (when producing blocks, for
genesis block, and from view to header conversion). Added a TODO to
consolidate them instead of needing to update separate places.

Also fix test_inflation integration test by updating the validator
reward multiplier.
@walnut-the-cat
Contributor

Sept 2-6

  • Discussions on how to make the chunk endorsement ratio contribute to rewards, ending up with a simple algorithm that uses a cutoff threshold.
  • Implemented changes for various parts of chunk validator rewards, including adding bitmap to BlockHeaderV5 (#12024), unblock moving feature to nightly (#12043), introducing new endorsement ratio cutoff threshold (#12047), and update tools to run experiments (#12048).
  • Experiments on mainnet historical data with the overall algorithm using received chunk endorsements ratio for deciding on kickout and rewards.
  • Experimented with a somewhat complex implementation of min/max ratios for endorsements, but dropped it after discussions (#12034).

Sept 9-13

  • Moved the validator rewards feature to Nightly by fixing tests (#12065). Fixed Nayduck tests broken by moving the feature to Nightly (#12077, #12087).
  • Identified an issue with sorting chunk validators with same uptime ratio and prepared a change to alleviate the problem (#12092).
  • Prepared PR to stabilize the feature for production (#12089).

github-merge-queue bot pushed a commit that referenced this issue Oct 14, 2024
We add a simple integration test for checking that offline nodes are
kicked out properly. We kill a block+chunk producer and a
chunk-validator only node. Assert that these two nodes are kicked out
due to no block and endorsement production and not included in the next
validators.

This is part of the testing tasks for #11900.
3 participants