feat!: introduce epochs #1691

Merged
merged 9 commits into from
Mar 12, 2024

@insumity (Contributor) commented Mar 11, 2024

Description

Closes: everything under #1516 (excluding #1651, which will be completed at a later stage).

Note that the changes in this PR have already been reviewed in previous PRs (#1660, #1668, #1676, #1672, #1684) that introduced those updates to the feat/epochs branch.

This PR introduces epochs. The high-level idea behind the implementation is that we keep a slice of validators ([]ConsumerValidator) that validate during the current epoch. Each ConsumerValidator contains the power the validator had on the consumer chain at the beginning of the epoch, as well as the consumer public key the validator had set at the beginning of the epoch. At the boundaries of an epoch, we use the DiffValidators function to compute the ValidatorUpdates that need to be sent to a consumer chain. DiffValidators computes these updates by checking whether the power and/or the consumer public key have been modified since the last epoch.

This PR slightly modifies most of the integration and E2E tests, because those tests usually advanced the provider chain by one block to verify that VSCPackets were sent. With epochs, advancing by one block is no longer enough; the tests must advance by an entire epoch to verify that VSCPackets are sent. Additionally, note that we increased the signed_blocks_window parameter of the consumer chains from 15 to 20, while min_signed_per_window remains 0.5, and correspondingly we increased invokeDowntimeSlash to wait 11 (> 20 * 0.5) blocks instead of 10 to guarantee that a slash is triggered.
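The downtime arithmetic above can be checked with a short Go sketch. It assumes the Cosmos SDK x/slashing rule that a validator is slashed for downtime once its missed-block counter exceeds signed_blocks_window - min_signed_per_window * signed_blocks_window, and it ignores the additional requirement that the validator has been active for at least one full window. min_signed_per_window is passed as a fraction to avoid floating-point rounding, and the helper name minMissedToJail is hypothetical.

```go
package main

import "fmt"

// minMissedToJail returns the smallest number of missed blocks (within one
// signing window) that exceeds the allowed maximum. It assumes
// window*minSignedNum/minSignedDen is a whole number (true for 20 * 1/2 = 10).
func minMissedToJail(window, minSignedNum, minSignedDen int64) int64 {
	minSigned := window * minSignedNum / minSignedDen // blocks that must be signed
	maxMissed := window - minSigned                   // blocks that may be missed
	return maxMissed + 1                              // first count that exceeds it
}

func main() {
	// signed_blocks_window = 20, min_signed_per_window = 0.5 (as 1/2)
	fmt.Println(minMissedToJail(20, 1, 2)) // 11
}
```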

For reviewers, the main focus should be the code in validator_set_update.go, specifically the DiffValidators function and the ComputeNextEpochValidators method, and how both are used in relay.go. Another important component is how the consumer genesis is created in proposal.go.
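To illustrate how these pieces fit together at an epoch boundary, here is a simplified, hypothetical Go sketch. It assumes that epochs end at heights that are multiples of BlocksPerEpoch and that a validator without an assigned consumer key reuses its provider key on the consumer chain. The names computeNextEpochValidators, isEpochBoundary, and BondedValidator are illustrative and are not the actual signatures in validator_set_update.go or relay.go. Per the description, the resulting set would then be diffed against the set stored at the previous boundary via DiffValidators.

```go
package main

import "fmt"

// ConsumerValidator is a hypothetical, simplified stand-in for types.ConsumerValidator.
type ConsumerValidator struct {
	ProviderConsAddr  string
	Power             int64
	ConsumerPublicKey string
}

// BondedValidator is a hypothetical, simplified view of a bonded provider validator.
type BondedValidator struct {
	ConsAddr    string
	Power       int64
	ProviderKey string
}

// computeNextEpochValidators builds the consumer validator set for the next epoch,
// using a validator's assigned consumer key if present and its provider key otherwise.
func computeNextEpochValidators(bonded []BondedValidator, assignedKeys map[string]string) []ConsumerValidator {
	next := make([]ConsumerValidator, 0, len(bonded))
	for _, v := range bonded {
		key, ok := assignedKeys[v.ConsAddr]
		if !ok {
			key = v.ProviderKey // assumption: no assigned key means the provider key is reused
		}
		next = append(next, ConsumerValidator{ProviderConsAddr: v.ConsAddr, Power: v.Power, ConsumerPublicKey: key})
	}
	return next
}

// isEpochBoundary reports whether the block at height ends an epoch, assuming
// epochs are aligned to multiples of blocksPerEpoch.
func isEpochBoundary(height, blocksPerEpoch int64) bool {
	return height%blocksPerEpoch == 0
}

func main() {
	const blocksPerEpoch = 5
	bonded := []BondedValidator{{"valA", 10, "provKeyA"}, {"valB", 5, "provKeyB"}}
	assigned := map[string]string{"valB": "consKeyB"}

	for h := int64(1); h <= 12; h++ {
		if !isEpochBoundary(h, blocksPerEpoch) {
			continue // mid-epoch: no validator updates are computed
		}
		next := computeNextEpochValidators(bonded, assigned)
		fmt.Printf("height %d: next epoch validators %v\n", h, next)
	}
}
```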

This PR also deprecates any KeyAssignmentReplacement-related code.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable, and please add links to any relevant follow-up issues.

I have...

  • Included the correct type prefix in the PR title
  • Added ! to the type prefix if the change is state-machine breaking
  • Confirmed this PR does not introduce changes requiring state migrations, OR migration code has been added to consumer and/or provider modules
  • Targeted the correct branch (see PR Targeting)
  • Provided a link to the relevant issue or specification
  • Followed the guidelines for building SDK modules
  • Included the necessary unit and integration tests
  • Added a changelog entry to CHANGELOG.md
  • Included comments for documenting Go code
  • Updated the relevant documentation or specification
  • Reviewed "Files changed" and left comments if necessary
  • Confirmed all CI checks have passed
  • If this PR is library API breaking, bump the go.mod version string of the repo, and follow through on a new major release

Reviewers Checklist

All items are required. Please add a note if the item is not applicable, and please add your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if the change is state-machine breaking
  • confirmed this PR does not introduce changes requiring state migrations, OR confirmed migration code has been added to consumer and/or provider modules
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage

```go
if err != nil {
	// this should never happen and might not be recoverable because without the public key
	// we cannot generate a validator update
	panic(fmt.Errorf("could not retrieve validator's (%+v) public key: %w", val, err))
}
```

CodeQL code scanning warning: Panic in BeginBlock or EndBlock consensus methods. Possible panics in BeginBlock- or EndBlock-related consensus methods could cause a chain halt.

```go
store := ctx.KVStore(k.storeKey)
// marshal the ConsumerValidator before persisting it in the store
bz, err := validator.Marshal()
if err != nil {
	panic(fmt.Errorf("failed to marshal ConsumerValidator: %w", err))
}
```

CodeQL code scanning warning: Panic in BeginBlock or EndBlock consensus methods. Possible panics in BeginBlock- or EndBlock-related consensus methods could cause a chain halt.

```go
// decode the ConsumerValidator stored at the current iterator position
var validator types.ConsumerValidator
if err := validator.Unmarshal(iterator.Value()); err != nil {
	panic(fmt.Errorf("failed to unmarshal ConsumerValidator: %w", err))
}
```

CodeQL code scanning warning: Panic in BeginBlock or EndBlock consensus methods. Possible panics in BeginBlock- or EndBlock-related consensus methods could cause a chain halt.

@github-actions bot added the C:Testing, C:x/consumer, C:x/provider, C:Build, C:Docs, and C:ADR labels (assigned automatically by the PR labeler) Mar 11, 2024
@github-actions bot added the C:CI label Mar 11, 2024
insumity and others added 4 commits March 11, 2024 13:35
* modified ADR to capture the epoch design
* cleanup ./changelog entries

* rebase

* fix!: Validation of SlashAcks fails due to marshaling to Bech32  (backport #1570) (#1577)

fix!: Validation of SlashAcks fails due to marshaling to Bech32  (#1570)

* add different Bech32Prefix for consumer and provider

* separate app encoding and params

* remove ConsumerValPubKey from ValidatorConfig

* update addresses in tests

* make SlashAcks consistent across chains

* add comments for clarity

* Regenerate traces

* Fix argument order

* set bech32prefix for provider to cosmos

* add changelog entries

* add consumer-double-downtime e2e test

* update nightly-e2e workflow

* fix typo

* add consumer-double-downtime to testConfigs

* remove changes on provider

* skip invalid SlashAcks

* seal the config

* clear the outstanding downtime flag for new vals

* add info on upgrading to v4.0.0

* fix upgrade handler

* fix changeover e2e test

* Update tests/e2e/config.go

Co-authored-by: Philip Offtermatt <[email protected]>

* Update tests/e2e/config.go

Co-authored-by: Philip Offtermatt <[email protected]>

* add AccountPrefix to ChainConfig

* fix docstrings

* update AccountAddressPrefix in app.go

* fix consumer-misb e2e test

---------

Co-authored-by: Philip Offtermatt <[email protected]>
Co-authored-by: Simon Noetzlin <[email protected]>
Co-authored-by: Philip Offtermatt <[email protected]>
(cherry picked from commit 8604692)

Co-authored-by: Marius Poke <[email protected]>

* docs: update changelog for v4.0.0 (#1578)

update changelog

* docs: prepare for v4.0.0 (#1581)

* unclog build

* update release notes

* update release date

* added proto declaration

* temp commit

* temp commit

* more changes

* first commit

* add param and fix tests

* reduce epoch size for e2e

* clean up

* mbt fix

* fix diff bug

* cleaning up

* cleaning up

* cleaning up

* cleaning up

* cleaning up

* cleaning up

* added more tests

* more fixes

* nit fixes

* cleaning up

* increase downtime by one block

* fix logs

* took into account Marius' comments

* tiny fixes

* Update x/ccv/provider/keeper/params.go

Co-authored-by: Simon Noetzlin <[email protected]>

* use Bech32 addresses as keys for maps

* refactor nextBlocks(epoch) to nextEpoch

* fixed comment

* Remove new block creation during consumer chain setup

* Revert "Remove new block creation during consumer chain setup"

This reverts commit 85a52b7.

* added simple param test

* added upper bound and addressed a comment

* Add another edge case for diffing

* used smarter solution (based on Philip's comment) for diffing validators

* refactor!: remove key-assignment replacements (#1672)

* initial commit

* removed KeyAssignmentReplacementsKey

* refactor: simplify key-assignment logic (#1684)

* fixed typo: depreciated to deprecated

---------

Co-authored-by: Marius Poke <[email protected]>

* add the epoch param in the docs

---------

Co-authored-by: mpoke <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Simon Noetzlin <[email protected]>
Co-authored-by: Philip Offtermatt <[email protected]>

* Start adding epochs

* Adjust tests for epochs

* Use invariant script instead of handwriting Makefile

* Fix key assignment valset invariant

* Add better run_invariants script

* Start adding epochs from trace into driver

* Remove new block creation during consumer chain setup

* Adjust model for epochs

* Take into account comments

* Revert changes to actions.go

* Revert changes to x/

* Remove unused listMul

* Advance time by epochLength instead of 1 second

* Indent condition and clarify EndProviderEpoch

---------

Co-authored-by: mpoke <[email protected]>
Co-authored-by: insumity <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Simon Noetzlin <[email protected]>

```diff
@@ -465,7 +466,7 @@ func CompatibilityTestConfig(providerVersion, consumerVersion string) TestConfig
 ".app_state.provider.params.slash_meter_replenish_fraction = \"1.0\" | " + // This disables slash packet throttling
 ".app_state.provider.params.slash_meter_replenish_period = \"3s\"",
 }
-} else if semver.Compare(providerVersion, "v4.0.0") < 0 {
+} else if semver.Compare(providerVersion, "v4.0.0") <= 0 {
```

@insumity (Contributor Author) commented Mar 11, 2024:

This was changed from `<` to `<=` because otherwise the compatibility test for v4.0.0 would get the default test configuration, which contains the blocks_per_epoch parameter, and would fail with:

```
startChain: Error: failed to validate genesis state: failed to unmarshal provider genesis state: unknown field "blocks_per_epoch" in types.Params
```
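The effect of switching from `<` to `<=` can be seen with a small self-contained Go sketch. compareVersions is a hypothetical, simplified stand-in for semver.Compare (it only understands vMAJOR.MINOR.PATCH strings, with no pre-release or build metadata), needsPreEpochConfig is an illustrative name rather than a function in the test suite, and the sample versions are arbitrary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "vMAJOR.MINOR.PATCH" into three integers.
func parseVersion(v string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			panic(fmt.Sprintf("unsupported version %q", v))
		}
		out[i] = n
	}
	return out
}

// compareVersions returns -1, 0 or +1, like semver.Compare for plain versions.
func compareVersions(v, w string) int {
	a, b := parseVersion(v), parseVersion(w)
	for i := range a {
		switch {
		case a[i] < b[i]:
			return -1
		case a[i] > b[i]:
			return 1
		}
	}
	return 0
}

// needsPreEpochConfig reports whether a provider at providerVersion should get a
// test config without blocks_per_epoch; inclusive selects "<=" instead of "<".
func needsPreEpochConfig(providerVersion string, inclusive bool) bool {
	c := compareVersions(providerVersion, "v4.0.0")
	if inclusive {
		return c <= 0
	}
	return c < 0
}

func main() {
	for _, v := range []string{"v3.2.0", "v4.0.0", "v4.1.0"} {
		fmt.Printf("%s: with < %v, with <= %v\n", v, needsPreEpochConfig(v, false), needsPreEpochConfig(v, true))
	}
}
```

Only v4.0.0 itself changes behavior: with `<` it falls through to the default configuration containing blocks_per_epoch, while with `<=` it gets the configuration without that parameter.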

@insumity insumity marked this pull request as ready for review March 11, 2024 13:21
@insumity insumity requested a review from a team as a code owner March 11, 2024 13:21
@insumity insumity marked this pull request as draft March 11, 2024 13:24
@insumity insumity marked this pull request as ready for review March 11, 2024 14:20
@p-offtermatt (Contributor) left a comment:

Some nits, and one concern: Wasn't there a plan to get rid of the maximum value of BlocksPerEpoch? Is there a reason to keep the (currently very conservative) upper limit?

docs/docs/adrs/adr-014-epochs.md (outdated, resolved)

### BlocksPerEpoch
`BlocksPerEpoch` exists on the provider for **ICS versions >= 3.3.0** (introduced by the implementation of [ADR-014](../adrs/adr-014-epochs.md))
and corresponds to the number of blocks that constitute an epoch. This param is set to 600 by default and cannot exceed 1200.
Contributor:

Wasn't there a plan to get rid of the maximum or make it much larger?

Contributor Author:

I'm in favor of keeping this upper limit. It adds an extra check that, in the best case, is never needed and, in the worst case, prevents a bogus parameter value from being set.
Also, I do not believe that it is a very conservative upper limit. If the time needed to commit a block drops below 6 seconds, nothing bad happens besides sending more VSCPackets. However, if the time to commit blocks increases (e.g., to 1 minute), then 1200 blocks correspond to 20 hours. If we add to this a potential multi-hour upgrade delay, we might start having "unbonding pausing" issues with Neutron, which has an unbonding period of 20 days.

We could get rid of this upper limit if we also looked at the BFT times when deciding when to send a VSCPacket, but this is not something we currently do, so I feel keeping an upper limit here is justified.

Contributor:

IMO, it seems silly that if governance wants the parameter to exceed 1200 blocks, for some reason we can't think of yet, there would first have to be an upgrade to Gaia itself before it could be done. Keep in mind that any param change requires a governance vote anyway. I would recommend giving clear guidance on appropriate values for the param in the docs instead of having the cap.

Contributor Author:

I just removed the upper limit.
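For reference, a hypothetical Go sketch of validating BlocksPerEpoch both with the 1200-block cap discussed in this thread and without it (as after its removal). The function name and error messages are illustrative and not the actual code in x/ccv/provider/keeper/params.go or the provider types; the sketch assumes the value must at least be positive.

```go
package main

import (
	"errors"
	"fmt"
)

// validateBlocksPerEpoch checks a candidate BlocksPerEpoch value.
// A maxBlocks of 0 disables the upper bound.
func validateBlocksPerEpoch(blocksPerEpoch, maxBlocks int64) error {
	if blocksPerEpoch <= 0 {
		return errors.New("blocks per epoch must be positive")
	}
	if maxBlocks > 0 && blocksPerEpoch > maxBlocks {
		return fmt.Errorf("blocks per epoch %d exceeds the maximum of %d", blocksPerEpoch, maxBlocks)
	}
	return nil
}

func main() {
	fmt.Println(validateBlocksPerEpoch(600, 1200))  // default value, within the cap
	fmt.Println(validateBlocksPerEpoch(1500, 1200)) // rejected by the cap
	fmt.Println(validateBlocksPerEpoch(1500, 0))    // accepted once the cap is removed
}
```

Dropping the cap, as agreed above, leaves only the positivity check; guidance on sensible values then lives in the docs rather than in the state machine.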

docs/docs/adrs/adr-014-epochs.md (outdated, resolved)
tests/integration/expired_client.go (resolved)
@sainoe (Contributor) left a comment:

LGTM

@insumity insumity added this pull request to the merge queue Mar 12, 2024
Merged via the queue into main with commit 5ce20ef Mar 12, 2024
32 checks passed
insumity added a commit that referenced this pull request Mar 12, 2024
* docs: modify epochs ADR to capture latest design (#1668)

* modified ADR to capture the epoch design

* feat!: introduce epochs (#1660)

* test: Add epochs to MBT (#1676)

* added changelogs

* rebase and fix compatibility test

* Update docs/docs/adrs/adr-014-epochs.md

Co-authored-by: Philip Offtermatt <[email protected]>

* Update docs/docs/adrs/adr-014-epochs.md

Co-authored-by: Philip Offtermatt <[email protected]>

* nit change in test

* removed blocks per epoch upper limit

---------

Co-authored-by: mpoke <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Simon Noetzlin <[email protected]>
Co-authored-by: Philip Offtermatt <[email protected]>
Co-authored-by: Philip Offtermatt <[email protected]>

3 participants