Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime node slashing (ADR 0005) #3640

Merged
merged 3 commits into from
Feb 10, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .changelog/3640.breaking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
go/roothash: Support slashing runtime compute nodes

Runtimes can configure whether compute nodes should be slashed for submitting
incorrect results or equivocating.
14 changes: 11 additions & 3 deletions docs/adr/0005-runtime-compute-slashing.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,14 @@ type RuntimeStakingParameters struct {

// Slashing are the per-runtime misbehavior slashing parameters.
Slashing map[staking.SlashReason]staking.Slash `json:"slashing,omitempty"`

// RewardSlashEquvocationRuntimePercent is the percentage of the reward obtained when slashing
// for equivocation that is transferred to the runtime's account.
RewardSlashEquvocationRuntimePercent uint8 `json:"reward_equivocation,omitempty"`

// RewardSlashBadResultsRuntimePercent is the percentage of the reward obtained when slashing
// for incorrect results that is transferred to the runtime's account.
RewardSlashBadResultsRuntimePercent uint8 `json:"reward_bad_results,omitempty"`
}
```

Expand Down Expand Up @@ -175,13 +183,13 @@ Any slashing instructions related to freezing nodes are currently ignored.
This proposal introduces/updates the following consensus state in the roothash
module:

- **List of past valid evidence (`0x23`)**
- **List of past valid evidence (`0x24`)**
kostko marked this conversation as resolved.
Show resolved Hide resolved

A hash uniquely identifying the evidence is stored for each successfully
processed evidence that has not yet expired using the following key format:

```
0x23 <H(runtime-id) (hash.Hash)> <round (uint64)> <evidence-hash (hash.Hash)>
0x24 <H(runtime-id) (hash.Hash)> <round (uint64)> <evidence-hash (hash.Hash)>
```

The value is empty as we only need to detect duplicate evidence.
Expand Down Expand Up @@ -293,7 +301,7 @@ performed to verify evidence validity:
- Public keys of signers of both commitments are compared. If they are not the
same, the evidence is invalid.

- Signatures of both proposed batches are compared. If either is invalid, the
- Signatures of both proposed batches are validated. If either is invalid, the
evidence is invalid.

- Otherwise the evidence is valid.
Expand Down
74 changes: 74 additions & 0 deletions go/consensus/tendermint/apps/roothash/roothash.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import (
beacon "github.com/oasisprotocol/oasis-core/go/beacon/api"
"github.com/oasisprotocol/oasis-core/go/common"
"github.com/oasisprotocol/oasis-core/go/common/cbor"
"github.com/oasisprotocol/oasis-core/go/common/crypto/signature"
"github.com/oasisprotocol/oasis-core/go/common/logging"
"github.com/oasisprotocol/oasis-core/go/consensus/api/transaction"
tmapi "github.com/oasisprotocol/oasis-core/go/consensus/tendermint/api"
Expand Down Expand Up @@ -132,6 +133,20 @@ func (app *rootHashApplication) onCommitteeChanged(ctx *tmapi.Context, state *ro
return fmt.Errorf("failed to fetch runtime state: %w", err)
}

// Expire past evidence of runtime node misbehaviour.
if rtState.CurrentBlock != nil {
if round := rtState.CurrentBlock.Header.Round; round > params.MaxEvidenceAge {
ctx.Logger().Debug("removing expired runtime evidence",
"runtime", rt.ID,
"round", round,
"max_evidence_age", params.MaxEvidenceAge,
)
if err = state.RemoveExpiredEvidence(ctx, rt.ID, round-params.MaxEvidenceAge); err != nil {
return fmt.Errorf("failed to remove expired runtime evidence: %s %w", rt.ID, err)
}
}
}

// Since the runtime is in the list of active runtimes in the registry we
// can safely clear the suspended flag.
rtState.Suspended = false
Expand Down Expand Up @@ -359,6 +374,13 @@ func (app *rootHashApplication) ExecuteTx(ctx *tmapi.Context, tx *transaction.Tr
}

return app.executorProposerTimeout(ctx, state, &xc)
case roothash.MethodEvidence:
var ev roothash.Evidence
if err := cbor.Unmarshal(tx.Body, &ev); err != nil {
return err
}

return app.submitEvidence(ctx, state, &ev)
default:
return roothash.ErrInvalidArgument
}
Expand Down Expand Up @@ -527,6 +549,58 @@ func (app *rootHashApplication) tryFinalizeExecutorCommits(
return fmt.Errorf("failed to process runtime messages: %w", err)
}

// If there was a discrepancy, slash nodes for incorrect results if configured.
if rtState.ExecutorPool.Discrepancy {
ctx.Logger().Debug("executor pool discrepancy",
"slashing", runtime.Staking.Slashing,
)
if penalty, ok := rtState.Runtime.Staking.Slashing[staking.SlashRuntimeIncorrectResults]; ok && !penalty.Amount.IsZero() {
commitments := rtState.ExecutorPool.ExecuteCommitments
var (
// Worker nodes that submitted incorrect results get slashed.
workersIncorrectResults []signature.PublicKey
// Backup worker nodes that resolved the discrepancy get rewarded.
backupCorrectResults []signature.PublicKey
)
for _, n := range rtState.ExecutorPool.Committee.Members {
c, ok := commitments[n.PublicKey]
if !ok || c.IsIndicatingFailure() {
continue
}
kostko marked this conversation as resolved.
Show resolved Hide resolved
switch n.Role {
case scheduler.RoleBackupWorker:
// Backup workers that resolved the discrepancy.
if !commit.MostlyEqual(c) {
continue
}
ctx.Logger().Debug("backup worker resolved the discrepancy",
"pubkey", n.PublicKey,
)
backupCorrectResults = append(backupCorrectResults, n.PublicKey)
case scheduler.RoleWorker:
// Workers that caused the discrepancy.
if commit.MostlyEqual(c) {
continue
}
ctx.Logger().Debug("worker caused the discrepancy",
"pubkey", n.PublicKey,
)
workersIncorrectResults = append(workersIncorrectResults, n.PublicKey)
}
}
// Slash for incorrect results.
if err = onRuntimeIncorrectResults(
ctx,
workersIncorrectResults,
backupCorrectResults,
runtime,
&penalty.Amount,
); err != nil {
return fmt.Errorf("failed to slash for incorrect results: %w", err)
}
}
}

// Generate the final block.
blk := block.NewEmptyBlock(rtState.CurrentBlock, uint64(ctx.Now().Unix()), block.Normal)
blk.Header.IORoot = *hdr.IORoot
Expand Down
204 changes: 204 additions & 0 deletions go/consensus/tendermint/apps/roothash/slashing.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
package roothash

import (
"fmt"

"github.com/oasisprotocol/oasis-core/go/common"
"github.com/oasisprotocol/oasis-core/go/common/crypto/signature"
"github.com/oasisprotocol/oasis-core/go/common/quantity"
abciAPI "github.com/oasisprotocol/oasis-core/go/consensus/tendermint/api"
registryState "github.com/oasisprotocol/oasis-core/go/consensus/tendermint/apps/registry/state"
stakingState "github.com/oasisprotocol/oasis-core/go/consensus/tendermint/apps/staking/state"
registry "github.com/oasisprotocol/oasis-core/go/registry/api"
roothash "github.com/oasisprotocol/oasis-core/go/roothash/api"
staking "github.com/oasisprotocol/oasis-core/go/staking/api"
)

func onEvidenceRuntimeEquivocation(
ctx *abciAPI.Context,
pk signature.PublicKey,
runtime *registry.Runtime,
penaltyAmount *quantity.Quantity,
) error {
regState := registryState.NewMutableState(ctx.State())
stakeState := stakingState.NewMutableState(ctx.State())

node, err := regState.Node(ctx, pk)
if err != nil {
// Node might not exist anymore (old evidence). Or the submitted evidence
// could be for a non-existing node (submitting "fake" but valid evidence).
ctx.Logger().Error("failed to get runtime node by signature public key",
"public_key", pk,
"err", err,
)
return fmt.Errorf("tendermint/roothash: failed to get node by id %s: %w", pk, roothash.ErrInvalidEvidence)
}

// Slash runtime node entity.
entityAddr := staking.NewAddress(node.EntityID)
totalSlashed, err := stakeState.SlashEscrow(ctx, entityAddr, penaltyAmount)
if err != nil {
return fmt.Errorf("tendermint/roothash: error slashing account %s: %w", entityAddr, err)
}
// Since evidence can be submitted for past rounds, the node can be out of stake.
if totalSlashed.IsZero() {
ctx.Logger().Warn("nothing to slash from entity for runtime equivocation",
"penalty", penaltyAmount,
"addr", entityAddr,
)
return nil
}

// If the caller is a node, distribute slashed funds to the controlling entity instead of the
// caller directly.
rewardAddr := ctx.CallerAddress()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if people frontrun transactions like these, to take the reward for themselves

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a valid concern. Incentivizing evidence submission seems like it introduces more problems than it solves :-)

callerNode, err := regState.Node(ctx, ctx.TxSigner())
switch err {
case nil:
// Caller is a node, replace reward address with its controlling entity.
rewardAddr = staking.NewAddress(callerNode.EntityID)
case registry.ErrNoSuchNode:
// Not a node, reward the caller directly.
default:
return fmt.Errorf("tendermint/roothash: failed to lookup node: %w", err)
}

// Distribute slashed funds to runtime and caller.
runtimePercentage := uint64(runtime.Staking.RewardSlashEquvocationRuntimePercent)
return distributeSlashedFunds(ctx, totalSlashed, runtimePercentage, runtime.ID, []staking.Address{rewardAddr})
}

func onRuntimeIncorrectResults(
ctx *abciAPI.Context,
discrepancyCausers []signature.PublicKey,
discrepancyResolvers []signature.PublicKey,
runtime *registry.Runtime,
penaltyAmount *quantity.Quantity,
) error {
regState := registryState.NewMutableState(ctx.State())
stakeState := stakingState.NewMutableState(ctx.State())

var totalSlashed quantity.Quantity
for _, pk := range discrepancyCausers {
// Lookup the node entity.
node, err := regState.Node(ctx, pk)
if err == registry.ErrNoSuchNode {
ctx.Logger().Error("runtime node not found by commitment signature public key",
"public_key", pk,
)
continue
}
if err != nil {
ctx.Logger().Error("failed to get runtime node by commitment signature public key",
"public_key", pk,
"err", err,
)
return fmt.Errorf("tendermint/roothash: getting node %s: %w", pk, err)
}
entityAddr := staking.NewAddress(node.EntityID)

// Slash entity.
slashed, err := stakeState.SlashEscrow(ctx, entityAddr, penaltyAmount)
if err != nil {
return fmt.Errorf("tendermint/roothash: error slashing account %s: %w", entityAddr, err)
}
if err = totalSlashed.Add(slashed); err != nil {
return fmt.Errorf("tendermint/roothash: totalSlashed.Add(slashed): %w", err)
}
ctx.Logger().Debug("runtime node entity slashed for incorrect results",
"slashed", slashed,
"total_slashed", totalSlashed,
"addr", entityAddr,
)
}

// It can happen that nothing was slashed as nodes could be out of stake.
// A node can be out of stake as stake claims are only checked on epoch transitions
// and a node can be slashed multiple times per epoch.
// This should not fail the round, as otherwise a single node without stake could
// cause round failures until it is removed from the committee (on the next epoch transition).
if totalSlashed.IsZero() {
// Nothing more to do in this case.
return nil
}

// Determine who the backup workers' entities to reward are.
var rewardEntities []staking.Address
for _, pk := range discrepancyResolvers {
node, err := regState.Node(ctx, pk)
if err == registry.ErrNoSuchNode {
ctx.Logger().Error("runtime node not found by commitment signature public key",
"public_key", pk,
)
continue
}
if err != nil {
ctx.Logger().Error("failed to get runtime node by commitment signature public key",
"public_key", pk,
"err", err,
)
return fmt.Errorf("tendermint/roothash: getting node %s: %w", pk, err)
}
rewardEntities = append(rewardEntities, staking.NewAddress(node.EntityID))
}

// Distribute slashed funds to runtime and backup workers' entities.
runtimePercentage := uint64(runtime.Staking.RewardSlashBadResultsRuntimePercent)
return distributeSlashedFunds(ctx, &totalSlashed, runtimePercentage, runtime.ID, rewardEntities)
}

func distributeSlashedFunds(
ctx *abciAPI.Context,
totalSlashed *quantity.Quantity,
runtimePercentage uint64,
runtimeID common.Namespace,
otherAddresses []staking.Address,
) error {
stakeState := stakingState.NewMutableState(ctx.State())

// Runtime account reward.
runtimeAccReward := totalSlashed.Clone()
if err := runtimeAccReward.Mul(quantity.NewFromUint64(runtimePercentage)); err != nil {
return fmt.Errorf("tendermint/roothash: runtimeAccReward.Mul: %w", err)
}
if err := runtimeAccReward.Quo(quantity.NewFromUint64(uint64(100))); err != nil {
return fmt.Errorf("tendermint/roothash: runtimeAccReward.Quo(100): %w", err)
}
runtimeAddr := staking.NewRuntimeAddress(runtimeID)
if _, err := stakeState.TransferFromCommon(ctx, runtimeAddr, runtimeAccReward, false); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently these rewards ignore the commission schedule and the whole reward is split between all delegators of the entity, is this fine or should the owning entity be able to take commission?

Copy link
Member Author

@ptrus ptrus Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note: this comment should be in line 193, here we transfer to the runtime account anyway)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, actually currently the delegators get nothing, it is treated the same as if the entity just increased its self-delegation. Do we want the delegators to get a cut as well?

Copy link
Member Author

@ptrus ptrus Feb 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right yeah i miss-checked that. I think that probably yes? (since if the node is slashed, delegators do lose proportionately). But i'm not sure if it's currently worth the additional complexity. Maybe a good future improvement. I would at least add some notes that runtime slashing rewards are not rewarded to delegators.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I changed the code to take all delegators and commission into account.

return fmt.Errorf("tendermint/roothash: failed transferring reward to %s: %w", runtimeAddr, err)
}
ctx.Logger().Debug("runtime account awarded slashed funds",
"reward", runtimeAccReward,
"total_slashed", totalSlashed,
"runtime_addr", runtimeAddr,
)

if len(otherAddresses) == 0 {
// Nothing more to do.
ctx.Logger().Debug("no other accounts to reward")
return nil
}

// (totalSlashed - runtimeAccReward) / n_reward_entities
otherReward := totalSlashed.Clone()
if err := otherReward.Sub(runtimeAccReward); err != nil {
return fmt.Errorf("tendermint/roothash: remainingReward.Sub(runtimeAccReward): %w", err)
}
if err := otherReward.Quo(quantity.NewFromUint64(uint64(len(otherAddresses)))); err != nil {
return fmt.Errorf("tendermint/roothash: remainingReward.Quo(len(discrepancyResolvers)): %w", err)
}

for _, addr := range otherAddresses {
if _, err := stakeState.TransferFromCommon(ctx, addr, otherReward, true); err != nil {
return fmt.Errorf("tendermint/roothash: failed transferring reward to %s: %w", addr, err)
}
ctx.Logger().Debug("account awarded slashed funds",
"reward", otherReward,
"total_slashed", totalSlashed,
"address", addr,
)
}

return nil
}
Loading