
Improve epoch processing within state transition #1043

Closed · JustinDrake opened this issue May 4, 2019 · 10 comments

@JustinDrake (Contributor) commented May 4, 2019

We currently have option A (see #1018):

```python
def state_transition(state: BeaconState, block: BeaconBlock) -> BeaconState:
    assert state.slot < block.slot
    while state.slot < block.slot:
        process_slot(state)
        if (state.slot + 1) % SLOTS_PER_EPOCH == 0:
            process_epoch(state)
        state.slot += 1
    process_block(state, block)
    return state
```

I suggest option B:

```python
def state_transition(state: BeaconState, block: BeaconBlock) -> BeaconState:
    assert state.slot < block.slot
    while state.slot < block.slot:
        process_slot(state)
        state.slot += 1
        if state.slot == block.slot:
            process_block(state, block)
        if (state.slot + 1) % SLOTS_PER_EPOCH == 0:
            process_epoch(state)
    return state
```

Option B seems to be an improvement over option A. In option B the current epoch is naturally processed at the end of the current epoch (as opposed to the start of the next epoch).

Option B is conceptually cleaner because there's no "mixing" of different epochs (which leads to confusion such as #946). It also means that the state root at the end of an epoch is the one corresponding to the end of that epoch's processing. Postponing epoch processing to the next epoch instead produces Frankenstein state roots on both sides of the epoch boundary: one is missing an epoch transition, and the other mixes processing from different epochs.
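To make the ordering difference concrete, here's a small runnable sketch (toy constant and stand-in processing steps, not spec code) that logs what each option does when advancing to a block:

```python
# Toy trace of the two orderings; SLOTS_PER_EPOCH is a toy value and the
# "processing" steps are just log entries, not real spec processing.
SLOTS_PER_EPOCH = 4

def trace(option: str, start_slot: int, block_slot: int) -> list:
    """Return the sequence of processing steps each option performs."""
    log = []
    slot = start_slot
    if option == "A":
        while slot < block_slot:
            log.append(("slot", slot))
            if (slot + 1) % SLOTS_PER_EPOCH == 0:
                log.append(("epoch", slot))
            slot += 1
        log.append(("block", block_slot))
    else:  # option B
        while slot < block_slot:
            log.append(("slot", slot))
            slot += 1
            if slot == block_slot:
                log.append(("block", block_slot))
            if (slot + 1) % SLOTS_PER_EPOCH == 0:
                log.append(("epoch", slot))
    return log

# For a block at the last slot of an epoch (slot 3 with SLOTS_PER_EPOCH = 4),
# option A leaves epoch processing to the next transition, while option B
# runs it right after processing the block:
print(trace("A", 0, 3)[-1])  # ('block', 3)
print(trace("B", 0, 3)[-1])  # ('epoch', 3)
```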

This restructure also means that state.slot += 1 can be folded into process_slot(state), which reduces the moving parts to just process_slot, process_block and process_epoch. cc @dankrad, @djrtwo
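A minimal sketch of that restructure (hypothetical, not spec code: the State type and stub bodies are toy stand-ins), with the slot increment living inside process_slot so the transition is built from exactly three pieces:

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 4  # toy value for illustration

@dataclass
class State:
    slot: int = 0

def process_slot(state: State) -> None:
    # per-slot bookkeeping (e.g. root caching) would go here;
    # the slot increment is now part of slot processing
    state.slot += 1

def process_block(state: State, block_slot: int) -> None:
    pass  # block header and operations would be processed here

def process_epoch(state: State) -> None:
    pass  # justification, rewards, registry updates would go here

def state_transition(state: State, block_slot: int) -> State:
    assert state.slot < block_slot
    while state.slot < block_slot:
        process_slot(state)
        if state.slot == block_slot:
            process_block(state, block_slot)
        if (state.slot + 1) % SLOTS_PER_EPOCH == 0:
            process_epoch(state)
    return state
```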

@djrtwo (Contributor) commented May 4, 2019

This was the origin of my one error/kludge in attestation voting. The problem here is that a committee/attester of the last slot in an epoch would have to vote on the justification data of the next epoch

@djrtwo (Contributor) commented May 4, 2019

cc: @arnetheduck

@JustinDrake (Contributor, Author) commented May 4, 2019

> a committee/attester of the last slot in an epoch would have to vote on the justification data of the next epoch

Trying to wrap my head around the problem 😂 By "vote" I'm assuming you are referring to an AttestationData object. By "justification data" I'm assuming you are referring to previous_justified_epoch, current_justified_epoch, previous_justified_root, current_justified_root.

In option B, when process_block is called on the last block, process_epoch has not yet been called. This means that the justification data in state (at the time process_block is called) is still for the current epoch, not the next. Hence the FFG vote (source_epoch, source_root, target_epoch, target_root) within the vote made by an attester of an epoch's last slot is not affected by having process_epoch at the end of that last slot.

@dankrad (Contributor) commented May 4, 2019

> This was the origin of my one error/kludge in attestation voting. The problem here is that a committee/attester of the last slot in an epoch would have to vote on the justification data of the next epoch

Hmm, I fail to see this -- semantically the two approaches Justin suggests feel like they should be equivalent; it is only a matter of when process_epoch is run?

The only difference would be that the state object at slot SLOTS_PER_EPOCH - 1 (mod epoch) would or would not yet have had the epoch processed. This might change what a validator has to do to compute blocks, but as the next block would already be in the next epoch, option B still seems like the more natural one.

Assuming this is correct, I would agree with Justin that Option B seems a bit more natural and elegant, and also makes the interpretation of state.slot clearer.

@djrtwo (Contributor) commented May 4, 2019

When a validator performs their duty wrt the beacon chain currently, they simply perform the state transition to the post state at the slot they are assigned, pull data from that state, put it in an attestation, and sign it. The semantics here are clear ("attest to the slot" -- perform the state transition to the slot and attest to what the state says) and are currently working very well. To move the epoch transition to the end of the last slot in the epoch rather than the beginning of the first slot in the next epoch, we break this for one slot in each epoch.

There are two options if you do this --

  1. Break up the state transition function artificially for this one exceptional slot per epoch. A validator assigned to the last slot in an epoch would need to run the state transition to the slot in question but abort early (prior to the epoch portion) to gather the requisite data. This feels inelegant and like a hack, and it breaks the semantics of "a validator attests to the post state of the assigned slot". In fact, the data they attest to corresponds neither to the state root that will exist in historical roots for that slot nor to the state root in the block to which they are attesting. This mismatch is ill-advised, as the ability to make proofs about validator activity against past state roots becomes clunky, requiring exceptional rules for certain slots.
  2. We can go back to the kludge in which the validators assigned to the last slot in an epoch just attest to data (justified roots, etc.) from the start of the next epoch. This too feels like a hack. In fact, we used to do this and we had a kludge in place for the justified_root issue, but if my memory serves me correctly, a couple of other issues arose and that was my main reason to change the ordering to the current one.

Both of these options are inelegant and require exceptional rules in various places, whereas the current method is clean, with no exceptions to the simple semantics. I'm pretty against changing this.

@JustinDrake (Contributor, Author) commented

Oh boy, there must be a subtle detail I'm missing! 😂

> When a validator performs their duty wrt the beacon chain currently, they simply perform the state transition to the post state at the slot they are assigned, pull data from that state, put it in an attestation, and sign it.

Let's be super explicit. Let's assume there is a beacon block at a given slot (i.e. not a skip slot). When you say "pull data from that state" I'm assuming you mean specifically "calculate hash_tree_root(state) and check block.state_root == hash_tree_root(state), where state is the post-state". And by "put it in an attestation", you mean specifically "set attestation_data.beacon_block_root = signed_root(block)".

> it breaks the semantics of "a validator attests to the post state of the assigned slot"

Assuming the above explicit interpretation is correct, I think validators respect these semantics in option B. Specifically, they are attesting to state where hash_tree_root(state) == block.state_root and signed_root(block) == attestation_data.beacon_block_root. Validators always attest to the post state of their assigned slot, by definition, including on the last slot.

> A validator assigned to the last slot in an epoch would need to run the state transition to the slot in question but abort early (prior to the epoch portion) to gather the requisite data

Why is aborting early necessary? I'd argue we want the state root corresponding to attestation_data.beacon_block_root for the last slot to reflect the epoch transition, and hence be more up-to-date.

@djrtwo (Contributor) commented May 5, 2019

> Oh boy, there must be a subtle detail I'm missing!

I think the detail you're missing is that the following AttestationData fields all correspond to the post-state of the assigned slot. The fields are just broken out separately for better accounting purposes and for more general-purpose use across short-range forks (e.g. target doesn't match, but source does, so the attestation can still be included on chain and partially rewarded). I firmly believe that all of these fields should be read from the same BeaconState and that they should be within the assigned epoch.

```python
{
    # LMD GHOST vote
    'beacon_block_root': 'bytes32',

    # FFG vote
    'source_epoch': 'uint64',
    'source_root': 'bytes32',
    'target_epoch': 'uint64',
    'target_root': 'bytes32',
}
```

(For the last slot in the epoch) If you change to the method you've described then beacon_block_root is that of the full transition through the slot, whereas source and target are whatever they happened to be right before the process_epoch portion of the state transition. In fact, the source and target of the attestation likely do not correspond to the data contained within the state root embedded in the beacon_block_root because that state root is within the next epoch.
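A self-contained sketch of the invariant argued for here (hypothetical: ToyState and the helpers are toy stand-ins, though the field names follow the 0.x AttestationData and SLOTS_PER_EPOCH = 64 matches the spec at the time) — every attestation field is read from one and the same state:

```python
from dataclasses import dataclass

SLOTS_PER_EPOCH = 64  # phase 0 value at the time

@dataclass
class ToyState:
    slot: int
    current_justified_epoch: int
    current_justified_root: bytes

def slot_to_epoch(slot: int) -> int:
    return slot // SLOTS_PER_EPOCH

def build_attestation_data(state: ToyState,
                           beacon_block_root: bytes,
                           target_root: bytes) -> dict:
    # All fields are pulled from the same `state`. Under option B, the
    # post-state of an epoch's last slot already reflects epoch processing,
    # so source/target would no longer match the attester's assigned epoch.
    return {
        "beacon_block_root": beacon_block_root,         # LMD GHOST vote
        "source_epoch": state.current_justified_epoch,  # FFG source
        "source_root": state.current_justified_root,
        "target_epoch": slot_to_epoch(state.slot),      # FFG target
        "target_root": target_root,
    }
```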

@arnetheduck (Contributor) commented

Just to play around with this a bit more, I ran the two options through our state sim. To make what @djrtwo is pointing out more concrete, here's a dump of the state under options A and B at slot 132:

options.tar.gz

Slot 132 is interesting because we can see attestations from slots 126, 127 and 128 - right around the epoch boundary. Slots 126 and 127 belong to epoch 1, while 128 is the start of epoch 2, thus in previous_epoch_attestations, we'll see attestations for 126 and 127 while 128 will be in current_epoch_attestations.
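The epoch membership in this example follows from integer division, assuming SLOTS_PER_EPOCH = 64 (the spec value at the time) and genesis-relative slot numbers:

```python
SLOTS_PER_EPOCH = 64  # phase 0 value in the 0.5.x spec

def slot_to_epoch(slot: int) -> int:
    return slot // SLOTS_PER_EPOCH

print([(s, slot_to_epoch(s)) for s in (126, 127, 128)])
# [(126, 1), (127, 1), (128, 2)]
```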

Under option A (assuming perfect attestation performance), all attestations in previous_epoch_attestations consider source_epoch to be 0, while under option B the attestation for slot 127 will already have gone through epoch processing, which bumps source_epoch to 1.

The screenshot is from 0.5.1, have fun with GENESIS_SLOT==2**32 :)

[Screenshot from 2019-05-06 12-58-28]

Another (very minor) difference is the following:

To validate that a block is valid, one must apply it to the state. In option B we first apply the block and then do epoch processing, meaning that if the block is invalid, one must roll back the state, apply an empty block, and then apply epoch processing to get past the slot where the invalid block was proposed.

Which is "more" clean is anyone's guess :)

One more possible consideration/convenience: for all slots of an epoch, should the fields that are touched only by epoch processing stay constant? Under this assumption, option A wins. Consider for example balances: under option A, I run the state transition to get to the state for a particular slot, and the balances change only if the epoch changes between state and state+1. Under option B, the last slot of the epoch will see adjusted balances.

How does this affect talking/reasoning about slashings etc.? What does it mean that a validator was slashed in epoch 1? Should that validator be marked as slashed in all slots of epoch 1 or just the last one? If we say that someone was slashed during epoch 1, with option A that means we must advance the state to a state that belongs to epoch 2 to see them as slashed, whereas under option B they'll already be slashed in state 127 (but not 126, both belonging to the same epoch).

@arnetheduck (Contributor) commented

Another consideration: if I call the helper function get_crosslink_committee on the state I received after applying block 127, will that differ from calling get_crosslink_committee for state 126 under option B? That would mean one of two things: either the committee changes for the last slot of the epoch, or I must rewrite get_crosslink_committee to account for this quirk when used outside of the state transition function (because epoch processing might change the active validator indices, which cascades into the committees).

@JustinDrake (Contributor, Author) commented Aug 20, 2019

Still not grokking the tradeoffs, but I think this issue can be closed for now, especially post-freeze 😂
