-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reconfiguration with dkg #10328
reconfiguration with dkg #10328
Conversation
@@ -365,7 +365,7 @@ module aptos_framework::genesis { | |||
validator.consensus_pubkey, | |||
validator.proof_of_possession, | |||
); | |||
stake::update_network_and_fullnode_addresses( | |||
stake::force_update_network_and_fullnode_addresses( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this for genesis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some smoke tests set this flag (disable validator set change) which made genesis failed...
Alternatives discussedWe need 2 modes of reconfiguration.
The key behavior changes to implement.
This PR starts with the last bullet item. |
aptos-move/framework/aptos-framework/sources/configs/consensus_config.move
Show resolved
Hide resolved
writer, | ||
"aptos_governance::reconfigure(&framework_signer);" | ||
); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this approach consensus_config::set
no longer calls reconfigure
, I think the proposal script should call reconfigure
explicitly. (Same below for some other configs)
@zekun000 @movekevin @sherry-x could you confirm if these release builder updates are also needed?
gas_schedule::on_new_epoch(account); | ||
std::version::on_new_epoch(account); | ||
features::on_new_epoch(account); | ||
// TODO: complete the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the list should contain on-chain states that may be accessed in rust directly, and here are what I have identified. Anything I missed?
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
system_addresses::assert_aptos_framework(account); | ||
assert!(vector::length(&config) > 0, error::invalid_argument(EINVALID_CONFIG)); | ||
std::config_for_next_epoch::upsert<ConsensusConfig>(account, ConsensusConfig {config}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today, consensus_config::set()
calls reconfiguration::reconfigure()
.
But in our new design, consensus_config::set(X)
writes X
to a buffer; then at the end of DKG, reconfiguration calls a new func consensus_config::on_new_epoch()
to apply X
.
This means that we can't control consensus_config::set()
behavior with a feature flag, otherwise compile fails with a circular dependency.
But I think if we also update aptos-release-builder
accordingly to append a aptos_governance::reconfigure()
in the proposal script, things sound be fine?
This comment has been minimized.
This comment has been minimized.
let features = config_for_next_epoch::extract<Features>(); | ||
*borrow_global_mut<Features>(@std) = features; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::features::Features
is a critical on-chain state that i think has to have a pending change buffer (and therefore a on_new_epoch
function).
What's special is, it's defined in move-stdlib
package (while reconfiguration.move
and all the other configs I identified so far are in move-framework
package). for all other configs' on_new_epoch()
, we can protect it by setting scope to public(friend)
and make reconfiguration.move
a friend.
But for Features, this protection is not available.
The workaround I'm thinking.
- Leave this function
public
. - Introduce a flag struct
ExtractPermit
. - Any system txn that may invoke
reconfigure
should put0x1::ExtractPermit
before the invocation, and remove it after. - Inside
config_for_next_epoch::extract
, do the work only if the0x1::ExtractPermit
exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential alternative:
- Have a clone of
reconfigure()
that takes a signer:reconfigure_v2(account: &signer)
. - Update all reconfigure callers to use
reconfigure_v2
. - Have
features::on_new_epoch(account: &signer)
and assert the signer is 0x0/0x1.
struct UpsertLock has copy, drop, key { | ||
seq_num: u64, | ||
locked: bool, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from the requested behavior: when a slow reconfigure is in progress, on-chain config changes should be rejected.
Ideally we define a struct T
to represent the lock and put a 0x1::T
when starting DKG and remove it when finishing. But reconfigure can be triggered by either 0x1 (proposal) or 0x0 (epoch expiry), and 0x0 can't put things under 0x1. 2 solutions i can imagine.
- Define
struct SomeLock { locked: bool }
; run a initialization script to put0x1::SomeLock
. Later both 0x0 and 0x1 can read/write thelocked
field. - (implemented currently in the PR) Have 0x0 and 0x1 maintain a struct under their own address. Each maintain a
seqnum
so we know who is the last write. When reading the lock state, read from the last write.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
❌ Forge suite
|
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
This issue is stale because it has been open 45 days with no activity. Remove the |
Context
To generate on-chain randomness, the current validators will together run a DKG protocol right before starting a new epoch. The DKG protocol needs the exact validator set (address + voting power) to be determined for the new epoch, and can take a while. During a DKG, care must be taken for validator set changes and reconfiguration requests.
Already done
None.
In this PR
RECONFIGURE_WITH_DKG
.ConsensusConfig
ExecutionConfig
GasScheduleV2
Features
Version
NewEpochEvent
.Next steps
aptos-release-builder
to slow reconfiguration and on-chain config behavior change.