Rationale against burning pledge collateral for storage faults #407

anorth · 2019-07-26T05:11:07Z · edited by pooja

I believe it is desirable for us to develop proof and storage market mechanisms that avoid penalising pledge collateral in the case of plausibly accidental storage failures by miners. I think a consensus around this is now emerging (e.g. in #403), but I'm writing this up so we have something to refer to in the future when/if this is questioned.

I am assuming that pledge collateral represents a significant financial lock-up: in the early life of the network, and depending on the FIL price, potentially much higher than the cost of hardware for a given storage capacity. Having this at risk in the case of operational failures, which are inevitable, represents a very large risk to a prospective miner, perhaps enough to make participation too risky a financial proposition.

A miner may fail to prove storage of one or more sectors for a wide array of real-world reasons, from drive and machine failures, down to cosmic-ray bit flips, and up to datacentre network and power outages and Filecoin network partitions. All of these events will happen to some miners.

In the case of permanent loss, e.g. of a hard drive, if the cost of that loss is too high, miners will replicate their data to mitigate the risk of losing more than the value of that drive. This will force storage prices up, and the amount of proven storage down, from where they might otherwise be.

At large scale, the risk becomes more about getting a PoSt message onto the network, even when all storage remains available. A cut datacentre network cable or a power outage could easily take a datacentre offline for a whole proving period. The idea that this could cost the miner more than the capital cost of that datacentre, despite it actually maintaining petabytes of committed storage, is a bit outlandish. This too could drive very complex cross-DC replication strategies that suffer diseconomies of scale.

From a market perspective, we'd prefer storage clients to make choices about reliability vs. cost of storage (through replication and/or negotiated storage collateral (#386)), but if miners have strong non-market reasons to offer only extremely high reliability, then other options will disappear from the market.

Burning pledge collateral for faults introduces a very sharp edge between a miner deciding ahead of time to decommission a sector and doing so involuntarily due to a hardware failure. It also strengthens short-term incentives to censor PoSts from other miners, especially those arriving close to the deadline. If the penalty is too extreme, it might even incentivise physical sabotage of competing miners' operations. It's a sharp divergence from the familiar mental model of PoW mining, where the worst outcome of an operational failure is lost block rewards, with no potential to lose more than the up-front capital investment in a single moment.

The chain can't tell the difference between malicious and non-malicious failures to prove storage, but treating all failures as malicious and penalising them heavily makes participation extremely risky for miners. They will either need to invest heavily in expensive security, replication and disaster-recovery mechanisms (raising both prices and barriers to entry) or simply not participate; neither is a good outcome. Miners need to know that pledge collateral is safe: they will never lose it unless they deliberately misbehave. This is doubly true if we expect large FIL holders to lend FIL to miner operators as stake.

There remains an unresolved desire to incentivise long-term stability of committed storage, especially when it is holding client data. I don't think penalisation of pledge collateral achieves that: it only introduces a one-proving-period delay for a miner who wants to declare a sector done. IMO, market-based storage deal collateral should be the primary incentive here, though I could be convinced we need more.

sternhenri · 2019-07-27T00:41:40Z

There is a lot above, and it may be worth a conversation to cover it point by point, but here is a quick answer based on my own understanding of the protocol, under the tutelage of @whyrusleeping and @dignifiedquire.

All your points about miner strategies for avoiding pledge faults are well taken, though I would push back by saying that many strategies can be employed to reduce to negligible levels the risk that certain black-swan events (a cut power cable, etc.) pose to miners. The risk I am most sensitive to is higher storage prices due to overly strong replication, though most professional storage today already does this, through physical replicas, erasure coding, etc. I believe we should expect some miners to rationally want this; it may in fact be desirable (they might signal their confidence by posting higher storage collateral, which is a good signal to clients).
My current view of the design allows for separation of concerns between pledge and storage collateral, enabling pledge to be slashed only for provable consensus faults by miners not following the protocol.
That said, I may be a bit more nuanced with regard to the invariant that pledge should never be touched unless there is absolute certainty of misbehavior. I think that is highly desirable, but depending on incentive models, I could see a world in which probabilistic slashing would be acceptable. Again, this is not the case today, and I don't foresee it happening lightly or without extensive testing, but it could. Far better to avoid it (and I believe we do).

As is, a miner's consensus power is cut immediately after a storage fault, and their pledge collateral cut (with an ability to recover at first); however, the pledge collateral itself is only tied to provable faults dealing with Nothing-at-Stake, though the ramifications of #403 and the surrounding conversations may still need to be finalized.

zixuanzh · 2020-01-24T22:16:54Z

Done: pledge collateral is only slashed for consensus faults. Miners instead pay a TemporaryFault fee for storage faults.

This was referenced Jul 26, 2019

Can a sector be re-committed after done? #408 (Open)

Simpler faults and recovery if penalties are reduced #409 (Open)

anorth mentioned this issue Aug 4, 2019

Storage collateral lock-up and recovery #386 (Closed)

mhammersley mentioned this issue Aug 22, 2019

Resolve TODOs around PoSt and fault handling #449 (Closed)

zixuanzh added S: storage-market labels Sep 12, 2019

zixuanzh closed this as completed Jan 24, 2020
