Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0204: Slashable event verification #204

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

AshwinSekar
Copy link
Contributor

@AshwinSekar AshwinSekar commented Nov 26, 2024

Proposal to deploy a program that records and verifies slashable events on chain

This proposal does not modify any stakes or rewards, the program will only verify and record infractions.

At the moment we only verify duplicate block violations, but a future SIMD will add voting violations as well

@AshwinSekar AshwinSekar changed the title SIMD-0XXX: Slashable event verification SIMD-0204: Slashable event verification Nov 26, 2024
@AshwinSekar AshwinSekar force-pushed the slashing-program branch 4 times, most recently from 2d2ee08 to 339aa9a Compare November 26, 2024 18:27
Copy link
Contributor

@joncinque joncinque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great work! Most of my comments are minor nits for clarification


`DuplicateBlockProof` requires 1 account:

0. `proof_account`, expected to be previously intiialized with the proof data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit typo

Suggested change
0. `proof_account`, expected to be previously intiialized with the proof data.
0. `proof_account`, expected to be previously initialized with the proof data.

Comment on lines +78 to +79
The first 4 bytes correspond to the length of the first shred, and the 4 bytes
after that shred correspond to the length of the second shred.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the other fields, it's probably worth specifying that these are unaligned four-byte little-endian unsigned ints

Comment on lines +44 to +46
On the epoch boundary where the `create_slashing_program` feature flag is first
activated the following behavior will be executed in the first block for the new
epoch:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth calling out that these steps are the only things that nodes need to implement at the moment, and to have the program details under a separate heading for clarity.

I almost wonder if we should separate this into two SIMDs: one for just the program upgrade portion, and another for the on-chain duplicate block check. I don't feel that strongly though, so let's keep it as is for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said this is the only protocol change, the remaining bits are just how the program works.
It felt weird to have a SIMD with only the upgrading logic, a reader of the SIMD shouldn't give approval without at least reading the program spec.

If we wanted to keep it simpler we could have this SIMD just be the upgrade bit, and the remaining explanation of the program as part of the spl docs.

Comment on lines +52 to +54
2. Verify that the program account
`8sT74BE7sanh4iT84EyVUL8b77cVruLHXGjvTyJ4GwCe` has a verified build hash of
`<FILL IN AFTER IMPLEMENTATION>` [\[1\]](#notes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't typically do this when upgrading a program via feature gate, but I like it a lot! We should definitely start

Comment on lines +56 to +57
3. Copy the contents of `8sT74BE7sanh4iT84EyVUL8b77cVruLHXGjvTyJ4GwCe` into
`S1ashing11111111111111111111111111111111111`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you intend for S1ashing11111111111111111111111111111111111 to be owned by the upgradeable loader or loader-v4? If it's the former, we should also move the program-data account from the old address to the PDA for S1ashing11111111111111111111111111111111111, to ensure that the program behaves like any other BPF program. cc @buffalojoec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with the loaders so i'll defer to your wisdom :), just let me know which one makes the most sense and I can include it here.

Copy link
Contributor

@joncinque joncinque Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the default loader (bpf-loader-upgradeable) for now. We can amend it if loader-v4 becomes the default sooner than this.

the `Clock` sysvar. We want to ensure that these accounts do not get closed before
they are observed by indexers and dashboards.

Otherwise we close the `report_account` and credit the `lamports` to `destination`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be super clear and specify that closing entails setting the owner to the system program and reallocating the account to 0 bytes

- `violation_type` is not `0x00` (corresponds to `DuplicateBlock` violation)
- Deriving the pda using `pubkey`, `slot`, and `ViolationType::DuplicateBlock`
as outlined above does not result in the adddress of `report_account`
- `report_account` is not writeable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably don't need this check -- the runtime will do this for us when we try to write to it. On the flip side, we probably want to check that it's owned by the slashing program

Comment on lines 227 to 228
They can accomplish this via the `CloseProofReport` instruction which requires
2 accounts:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like it needs one more account -- the reporter to sign. But I wonder if we can allow this to be totally permissionless by having the reporter provide the lamport destination when they create the report. That way we're sure slash reports are always cleaned up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that approach, can definitely see people forgetting to close their reports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just send it back to the initial reporter? Or is it worth carrying around a separate address as the destination

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep them separate -- it gives extra flexibility and clarity at the cost of an extra 32 bytes, which shouldn't be too bad, especially considering the accounts will get cleaned up

Comment on lines 237 to 241
- `violation_type`, a one byte value acting as the violation type discriminator
- `slot`, an unaligned eight-byte little-endian unsigned integer indicating the
slot which was reported
- `pubkey`, an unaligned 32 byte array representing the public key of the node
which was reported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these parameters actually needed? I think we'll be OK without them

Comment on lines +251 to +253
- `report_account.epoch + 3` is greater than the current epoch reported from
the `Clock` sysvar. We want to ensure that these accounts do not get closed before
they are observed by indexers and dashboards.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other rationale for this number, like association to the leader schedule or time for stake / vote accounts to get penalized? It's ok if there isn't, but it would be good to spell it out.

Comment on lines +122 to +124
- Both shreds specify the same index and shred type, however their payloads
differ
- Both shreds specify the same FEC set, however their merkle roots differ
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will there be a feature gate to deprecate / drop all shreds except chained merkles? that would simplify this check to just the comparing signature (memcmp 64 bytes vs. 1228) which could add up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we will eventually drop all unchained merkle shreds once it's fully rolled out anza-xyz/agave#2149

Which check are you referring to? Or do you mean that we just compare the signatures instead of the merkle roots? That could work since we already validate the merkle root -> signature as part of the sigverify section.

- Both shreds specify the same FEC set, however their merkle roots differ
- Both shreds specify the same FEC set and are coding shreds, however their
erasure configs conflict
- At least one shred is a coding shred, and its erasure meta indicates an FEC set
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in particular, the shred with the lower FEC set index has to be the coding shred right? and the other can be data or coding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's correct I'll update the language

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants