Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consensus should resolve stakes on work completion #314

Closed
willscott opened this issue May 30, 2018 · 19 comments
Closed

Consensus should resolve stakes on work completion #314

willscott opened this issue May 30, 2018 · 19 comments
Labels
c:registry Category: entity/node/runtime registry service p:2 Priority: desired feature

Comments

@willscott
Copy link
Contributor

As a follow-on to #238, one of the steps in task finalization is that consensus needs to resolve the escrow staked by the compute nodes on the commitee

@willscott willscott added p:1 Priority: core feature c:registry Category: entity/node/runtime registry service labels May 30, 2018
@ryscheng ryscheng added this to the Version A: MVP features + Testnet milestone May 31, 2018
@ryscheng
Copy link
Contributor

ryscheng commented Jun 1, 2018

Scoping into this issue, the consensus layer needs to punish and evict misbehaving actors.

@willscott
Copy link
Contributor Author

The bulk of the friction here is going to be getting this working in solidity.

The relevant contracts are:

@abukosek
Copy link
Contributor

OK, so here's my current (limited) understanding of how things work and how to approach this:

  • Consensus contract must call things in the Stake contract.
  • Cross-contract calls are just normal function calls (according to issue Cross Contract Calls RFC #344), so I can just import "./Stake.sol"; and somehow use the Stake methods in there.
  • Escrows are allocated elsewhere (registry?) when a worker wants to join a computation group.
  • Consensus contract should keep track of which workers are "bad", so that it can punish them (take their escrow) at the end of the round (when it reaches the Finalized state). I can implement this by adding two bool[] arrays into the Round struct (one for the normal workers and one for the backup workers) and then if there's a discrepancy, I should somehow check to see which nodes were wrong and set their bool. Then at the end of the round, all workers that have misbehaved lose a part (or all?) of their escrow.
  • To do that, the Consensus contract should somehow be able to get the escrow ID of each worker from its Ethereum address (how?) and use that to call takeAndReleaseEscrow.

Should I just call listActiveEscrows with the Ethereum address of the worker and pick one of the returned escrows (which one?)?

Also, what happens if nobody misbehaved -- do I release each worker's escrow in full or not touch them at all? (My guess is that I should release them in full.)

@Yawning
Copy link
Contributor

Yawning commented Jun 27, 2018

Cross-contract calls are just normal function calls (according to issue #344), so I can just import "./Stake.sol"; and somehow use the Stake methods in there.

contract Foo {
  Bar bar_contract;

  constructor(address _bar_addr) {
    bar_contract = Bar(_bar_addr);
  }

  function do_things() {
    bar_contact.some_method();
  }
}

Consensus contract should keep track of which workers are "bad", so that it can punish them (take their escrow) at the end of the round (when it reaches the Finalized state). I can implement this by adding two bool[] arrays into the Round struct (one for the normal workers and one for the backup workers) and then if there's a discrepancy, I should somehow check to see which nodes were wrong and set their bool. Then at the end of the round, all workers that have misbehaved lose a part (or all?) of their escrow.

Adding the state variable to Commitment is probably "better".

@abukosek
Copy link
Contributor

Thanks! Now I just need to figure out the escrow ID stuff and I can get started :)

@willscott
Copy link
Contributor Author

@abukosek - do you want to take a stab at refactoring stake to work for this use case, now that we've got something a bit more concrete that we need to deal with.

  • We should consider adding the consensus contract address as a parameter specified in the constructor as a source of transactions able to resolve escrows.
  • things are probably simpler with 2 states for escrows, rather than an arbitrary number of escrow id accts: amount staked, and amount committed for being on a committee.

cc @bennetyee - are you opposed to seeing a refactor of stake that's more special case but easier to work with based on these design changes?

@abukosek
Copy link
Contributor

abukosek commented Jul 2, 2018

OK, I can try, but not sure if I'll be able to get all this done by the end of the week (if it's just the Solidity code that needs changing, then that might be doable, but if I also need to touch web3 stuff then I'm pretty sure it's not going to be finished this week). I'll make a WIP PR for the Consensus contract today.

@willscott
Copy link
Contributor Author

Would you be up for starting with the stake side? I think consensus is in more flux than stake at the moment, but a simplified stake will also be useful in having the registry validate stake, and when it comes time to wiring it up to consensus

@abukosek
Copy link
Contributor

abukosek commented Jul 2, 2018

Sure, no problem, I'll start working on it :)

@bennetyee
Copy link
Contributor

@willscott not opposed. so as i understand it entity running compute nodes has to give consensus permission to escrow some max amount, and consensus has to keep track of which node has what amount in jeopardy so that if one node fails (exhibits discrepant behavior) that amount and not some other is forfeit; in the current design, the escrow_id helps to keep track of that amount, but if the consensus contract wants to keep track, that's ok. the current behavior is that escrow closure is when consensus takes some amount that is forfeit, and the rest reverts to the stakeholder (entity); and what is proposed here is that an escrow account is subject to a potentially long sequence of forfeitures without closing, right?

@abukosek
Copy link
Contributor

abukosek commented Jul 3, 2018

I think so, yes.

The way I understand it, the new stake interface should keep track of only one escrow amount per ethereum address.
The changes would be minimal, I think:

  • the Stake contract would be modified to not have an array of escrows per ethereum address, but just one (so it's basically just a mapping of ethereum_address -> (total_stake, amount_escrowed)),
  • then listActiveEscrows{Get,Iterator} could be replaced with a simple fetchEscrowAmount (taking an ethereum address of the owner) and
  • the takeAndReleaseEscrow would be split into two functions takeEscrow and releaseEscrow (both taking an ethereum address of the escrow's owner and takeEscrow also having an extra parameter to specify the amount to take),
  • but also the Stake contract would need to have two hardcoded ethereum addresses -- one that can call takeEscrow (the Consensus contract) and one that can call releaseEscrow (the registry).

Then the normal usage would go something like this:

  • compute node wants to join a computation group, so it talks to the registry
  • registry needs to allocate escrow for the compute node, so it calls the Stake contract
  • Stake contract adds the specified escrow amount to the current escrow amount of the owner of the compute node (ethereum_address_of_compute_node)
  • at the beginning of each round (when registering a new committee), the Consensus contract checks that each compute node has enough escrow to participate (using fetchEscrowAmount for each node)
    • if a node doesn't have enough escrow, the Consensus contract calls the registry to kick that node out and ignores it for the rest of the round
      • if a node is kicked off, its escrow is released (by the registry) and needs to re-register if it wants to participate again
  • compute node does its stuff
  • at the end of the round, if the compute node behaved well, its escrow stays as it is (nothing happens), but if it misbehaved, a part of its escrow is taken with takeEscrow(ethereum_address_of_compute_node, amount) (XXX: It's not currently clear to me who specifies the amount to take -- will this be a hardcoded value for now or what? I'm currently imagining this as a fixed penalty for misbehaving and when a node registers the allocated escrow should be a multiple of this penalty.)
  • if a compute node wants to leave the computation group, it talks to the registry again and the registry releases its escrow with releaseEscrow(ethereum_address_of_compute_node) (this can currently only be done when the node isn't actively participating in a round, i.e. between committees).

@willscott @bennetyee: Does this seem OK so far?

We could also implement this with the current Stake implementation (I'm not against that at all), but the registry would have to keep track of the escrow IDs and it's not clear to me how that could be done -- in the Consensus contract, the only thing we've got are ethereum addresses of the worker nodes and we need to somehow get the correct escrow IDs just from those. We could maybe have one escrow per round and have the registry keep track of the current round, then report the correct escrow ID to the Consensus, but I don't know how hard that would be to implement in the registry. We could also just take the first escrow ID that the node has and take that when it misbehaves. In any case, it would be a bit more complicated. 🤷‍♂️

In the simplified implementation, there's just a single pot of escrow per compute node and as long as the node behaves, nothing happens, but when it misbehaves, the amount of escrow is reduced until it's no longer sufficient for the node to participate in a computation group. When the node decides to leave or is kicked off, the remaining escrow is released.

Disclaimer: I'm still new to all this, so please point out if I'm wrong or if things can't possibly work the way I currently imagine them to :)

One of the issues I see with this simplified design that the existing Stake design doesn't have is that once we start supporting multiple computation groups and if a node wants to be in more that one group at once (is this possible?), the single pot of escrow would be problematic -- e.g. what if it's penalized in one group, but is well-behaved in the other group, then it runs out of escrow and is effectively kicked out of both groups. Maybe that's not such a big problem, since if it's misbehaving in one group, it's probably not a good node? 🤷‍♂️

@willscott
Copy link
Contributor Author

💯 This all sounds good so far.

@bennetyee
Copy link
Contributor

The issue with hardcoded addresses is that we are deciding on a deployment order. The addresses could be hardcoded or be constructor parameters, but in both cases imply that we will deploy the consensus and registry first. We could also add in a notion of owner to allow an owner to (monotonically) set these two addresses before the contract is publicly usable, which would decouple deployment order from the design.

@bennetyee
Copy link
Contributor

bennetyee commented Jul 3, 2018

Wrt keeping track of escrow ids, I had imagined the work flow to be that the entity who runs many compute nodes would use a helper contract or metamask to allocate stake on a per-machine basis, and to pass the escrow id values to the registry on compute node registration, along with the machine's public key and other ancillary information. I don't think that the entity would want to allow their compute nodes access to their stake account private key -- there should be set-up scripts that make it easy to configure a compute node (docker image?), set up/specify an ssh key to control the compute node, have the compute node generate its own keypair, etc, that the entity would run from a highly secure "controller" machine, where dashboards etc might also be present to monitor the entity's fleet, and this might be where the entity's stake account controlling private key be kept.

@willscott
Copy link
Contributor Author

I think given the limitations of evm/solidity, and wrangling the various separate contracts that don't have full state available to them, less state in that regard is probably the right way to go for this first pass.

@bennetyee
Copy link
Contributor

The other aspect of having separate escrow account ids versus a big pot is that there is better fault isolation from simple programming errors. Entities are more likely to believe that if one machine were compromised, say, then only the stake associated with the escrow account for that machine would be forfeit, whereas more code audits (of the consensus contract) are likely to be needed in the big-escrow-pot model.

@willscott
Copy link
Contributor Author

willscott commented Jul 3, 2018 via email

abukosek added a commit that referenced this issue Jul 5, 2018
abukosek added a commit that referenced this issue Jul 5, 2018
@abukosek abukosek mentioned this issue Jul 5, 2018
4 tasks
abukosek added a commit that referenced this issue Jul 6, 2018
abukosek added a commit that referenced this issue Jul 6, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 6, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 6, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 6, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 9, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 10, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 10, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 10, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 10, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 10, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 11, 2018
See issue #314.
abukosek added a commit that referenced this issue Jul 12, 2018
See issue #314.
@willscott willscott added p:2 Priority: desired feature and removed p:1 Priority: core feature labels Jul 15, 2018
@ryscheng ryscheng removed the pending label Jul 24, 2018
@ryscheng
Copy link
Contributor

Now that we've adjusted our scope a little, let's deprioritize until after #310

@Yawning
Copy link
Contributor

Yawning commented Jul 1, 2019

While this still needs to happen, this issue and most of the design discussion refers to when we were flirting with Etherem, and newer issues supersede this.

@Yawning Yawning closed this as completed Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c:registry Category: entity/node/runtime registry service p:2 Priority: desired feature
Projects
None yet
Development

No branches or pull requests

5 participants