Context Free Sectors #496
-
Thanks @ZenGround0, this is well motivated. I think that opt-in CommD on chain strikes a pretty reasonable balance, at least for a first step that would be easy to implement today. This does have the potential to add to the size of chain state, but we ultimately need to either aggregate (like #240) or move most sector state off-chain. In either case, I would expect committing all sectors' CommDs to add only a very marginal cost. This is a decent stepping stone. It will force us to solve the sector state size problem a bit sooner, but we need to do that at some point anyway.
One wrinkle here: while we can store CommD on chain, I don't think we should expose it directly to other actors via an API (#401), because we won't be able to support that API after making either of the scale improvements mentioned above. I think we can instead support an API that verifies a proof, of some form, that a sector's CommD is what a caller claims.
-
Some additional ideas worth sharing around this. A good point brought up by @nicola is the following. Paraphrasing the argument:
I generally agree with this part. Paraphrasing:
I agree with some of this. In particular, almost all of the use cases motivating context free sectors above suffer from probably undesirable abstraction breaking. However, I disagree on two points.
For point 1, consider that in the filesystem case the OS mounts and unmounts the filesystem on startup and shutdown. It is possible to imagine Filecoin-based filesystem infrastructure "shutting down" and "starting up" by migrating between smart contracts. If the first contract enabling files over raw sector storage is in a bad state, it might be better to "remount" by interacting with raw sector storage directly.
For point 2, imagine a market actor (the filesystem in our analogy) whose data state is off chain or compactly aggregated. Incorporating raw data CommDs into the compressed state will need to happen alongside a proof of market state that only the market actor's off-chain party can provide. Push-based ActivateDeals will not be able to complete the activation. At best, the market actor will need to stage CommPs from activation in on-chain state and then do a provable update based on them. The natural interface here is to initiate the activation (mounting, in our analogy) from the market actor, providing a proof that a) the miner actor tracks CommD, and b) the market actor also tracks CommD.
The method of activation has direct scaling implications. Push-based ActivateDeals will always incur the transfer of CommD/CommPs on chain, whereas context free activation can scale with improvements in proofs. Closely related to off-chain market actor state is "other chain" market actor state, so there are potential blockchain bridging implications too. If you want to "mount" sector data directly on a separate chain instead of in a Filecoin actor, then the context free method should be strictly easier to support. This assumes that asynchronous, proof-based methods are easier to port to other chains than a synchronous VM message method.
-
I would love to see a simplified, layered, decoupled Filecoin protocol design that is easier to evolve; that's why I am engaged in the #725 discussion. The context-free sectors proposal is a great example of decoupling sectors and markets at a lower level, and this is a direction I strongly support. I have an immature idea regarding the layers and the logical boundary of each layer, shown in the picture below. If we only consider storage:
The question here is whether the pieceInfo list is too heavy to keep on chain for scaling. At least it is lighter than the current design, which has a heavy market/deal component in the core but almost no support for upper-layer apps. This can also be changed later when we have subnets; we could even move storage proving to a (few) subnet(s) to save root-net bandwidth.
-
Definition and Motivation
Today Filecoin sector data has a fixed context of use. Sectors are linked explicitly to deals in the builtin market actor. The existing proposal for opening up this interface to user actors maintains this property. Sectors notify user contracts of verifiable data stored by the sector once, at activation, and then forget the data they verifiably store. In the current conception, further interactions may exist between miner and data user contracts on sector events like faults, extensions, data updates, terminations, etc. These interactions happen with respect to the original context of use; for example, today the builtin market is notified when a deal's sector is terminated.
Since sectors are pointers to verifiable content addressed data they are always potentially useful beyond the original context of use. I'm defining sectors that can be proven to store data outside of their commitment event as "context free".
Note that this is mostly orthogonal to #298: this proposal is about adding context free sectors, not reverting the current conception. Both ways of operating should work together with essentially no interference.
Motivating use cases
Context free sectors are useful whenever a user contract has a reason to accept proof of data at some point later than commitment.
Contract starts after sector
A data DAO may come online with the purpose of incentivizing the continued storage of certain data by rewarding sectors containing this data. Context free sectors are able to claim this reward even if they were committed before this contract existed.
Contract uses network for caching
A contract facilitating compute over verified data may pay out to a sector containing data that provably satisfies some computation. This contract could accept data already sealed into a sector rather than wait for a new sector to be sealed with the output. Doing so would likely speed up compute.
Amortized cost savings
This is general but repair is a nice concrete instance.
Repair protocols may dynamically pay out to an existing sector in the event another sector carrying some of its data has faulted. It's possible that eager activation and tracking is less efficient than lazy payouts for very low probability faults.
BadBit Insurance is another concrete use case.
Insurers are oracles that are paid upfront by storage providers, adjudicate whether hashes are bad bits, and pay out to miners when their terminated sectors contain bad hashes. Insurers will prefer to operate with a lower on-chain footprint. Instead of being notified of every miner deal and tracking data explicitly, they could just track operators at the actor level and lazily accept proofs of inclusion of bad hashes, for lower amortized costs.
Discovery
This is general but repair is a nice concrete instance.
Repair protocols get utility out of any sector that contains faulted pieces, even if that sector was committed in a context unaware of, or unwilling to pay upfront commitment costs to support, the repair protocol. Searching for matching data without opt-in is a challenge. One solution is a discovery service that tracks network piece inputs off chain and posts miner/sector indexes to a discovery contract, which pays out to both the caller and the miner after verifying a proof that the piece is included in the sector.
Implementing context free sectors
There are many paths to do this. The current thinking appears to be that this should be left up to markets. Here I explore the tradeoffs of several solutions.
CommD in SectorInfo
The simplest solution is to store CommD in every non-CC sector info, while CC sectors record CBOR null. Then add a "CommD(sectorNumber abi.SectorNumber)" method to the miner actor which returns the CommD to prove against. This method will fail if the sector has expired.
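A minimal Go sketch of the miner-side getter, for illustration only. SectorOnChainInfo here is a trimmed, hypothetical stand-in for the real sector info; SectorNumber, ChainEpoch and Commitment are local types standing in for abi.SectorNumber, abi.ChainEpoch and a CID; and the current epoch is passed explicitly where an actor would read it from the runtime. Treating "expired" as now >= Expiration is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins so the sketch compiles on its own (hypothetical).
type SectorNumber uint64
type ChainEpoch int64
type Commitment [32]byte // stands in for a CID-encoded 32-byte commitment

// SectorOnChainInfo is a heavily trimmed, hypothetical sector info.
// CommD is nil for CC sectors (CBOR null on chain).
type SectorOnChainInfo struct {
	SectorNumber SectorNumber
	SealedCID    Commitment  // CommR
	CommD        *Commitment // nil for committed-capacity sectors
	Expiration   ChainEpoch
}

var (
	ErrNotFound = errors.New("sector not found")
	ErrExpired  = errors.New("sector expired")
	ErrCC       = errors.New("sector is CC and has no CommD")
)

// MinerState keeps sector infos in a map instead of an AMT.
type MinerState struct {
	Sectors map[SectorNumber]SectorOnChainInfo
}

// CommD returns the unsealed commitment to prove against, failing if the
// sector is unknown, expired at epoch now, or a CC sector.
func (st *MinerState) CommD(n SectorNumber, now ChainEpoch) (Commitment, error) {
	info, ok := st.Sectors[n]
	if !ok {
		return Commitment{}, ErrNotFound
	}
	if now >= info.Expiration {
		return Commitment{}, ErrExpired
	}
	if info.CommD == nil {
		return Commitment{}, ErrCC
	}
	return *info.CommD, nil
}

func main() {
	d := Commitment{1}
	st := &MinerState{Sectors: map[SectorNumber]SectorOnChainInfo{
		1: {SectorNumber: 1, CommD: &d, Expiration: 100},
		2: {SectorNumber: 2, Expiration: 100}, // CC sector
	}}
	c, err := st.CommD(1, 50)
	fmt.Println(c == d, err) // true <nil>
	_, err = st.CommD(2, 50)
	fmt.Println(err)
	_, err = st.CommD(1, 100)
	fmt.Println(err)
}
```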
Storing CommD for every data sector is also the most expensive option for the network. Anecdotally, the CommR data in the sector info is the biggest contributor to total Filecoin state tree size (need data to confirm and be precise).
For: most sectors are CC today, so this is not currently a drastic state tree size difference (need data to confirm and be precise). Even if no sectors were CC, this change would not increase state size by more than a factor of 2. It halves the lead time the network has to prepare for failure under load as state size grows. It does not create a new problem but accelerates an existing one in a bounded (2x) way.
Against: in the best case where the network is filled with useful data this proposal effectively doubles state tree size. This is a large price to pay especially since most sectors will never need to be used context free.
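The factor-of-2 bound can be written out under one assumption: CommD and CommR have the same encoded size c (both are 32-byte commitments). Let s_i be the encoded size of sector info i, so s_i >= c because it contains CommR; let D be the set of data sectors, C the set of CC sectors, and epsilon the size of a CBOR null (epsilon <= c).

```latex
S = \sum_{i} s_i, \qquad
S' = \sum_{i \in D} (s_i + c) + \sum_{i \in C} (s_i + \varepsilon)
\le \sum_{i} 2 s_i = 2S, \qquad
S' - S = |D|\,c + |C|\,\varepsilon .
```

The growth term stays small while D is small, and S' approaches 2S only when every sector carries data and CommR dominates each s_i.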
Opt-out variant
Similar to above
Opt-in variant
CommD is optional in non-CC sector infos. In the "worst" case, where everyone opts in, this performs the same as CommD in SectorInfo.
For: most storage providers will keep the default of not opting in, so this scales better in practice. The network and operators are more likely to pay the cost only for sectors that are actually going to be used context free, since the parties planned at commitment to make them available this way. You could argue that context free use cases are very unlikely to apply to the average sector, and that context free sectors are unlikely to be used unless the client or provider planned for it.
Against: most storage providers will use the default. The point of context free sectors is that data usage is difficult for the committing parties to predict. You could argue that the value of context free sectors lies in the network effect of all sectors being provable this way. From this standpoint, if there is actually value in context free usage, this proposal would miss out on most of it and we would never know.
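For the opt-in variant, a Go sketch of what commitment could look like, assuming a hypothetical StoreCommD flag on the prove-commit parameters (no such parameter exists today). The byte estimate counts only the raw 32-byte commitment and ignores CID and CBOR framing.

```go
package main

import "fmt"

type Commitment [32]byte

// CommitParams is a hypothetical slice of prove-commit parameters with an
// opt-in flag; it is not an existing builtin-actors type.
type CommitParams struct {
	SectorNumber uint64
	UnsealedCID  *Commitment // nil for CC sectors
	StoreCommD   bool        // opt in to recording CommD on chain
}

type SectorInfo struct {
	SectorNumber uint64
	CommD        *Commitment // recorded only when opted in
}

// commitSector builds the on-chain info, dropping CommD unless the provider opted in.
func commitSector(p CommitParams) SectorInfo {
	info := SectorInfo{SectorNumber: p.SectorNumber}
	if p.StoreCommD && p.UnsealedCID != nil {
		d := *p.UnsealedCID // copy so state does not alias caller memory
		info.CommD = &d
	}
	return info
}

// extraStateBytes counts 32 bytes per sector that recorded a CommD.
func extraStateBytes(infos []SectorInfo) int {
	n := 0
	for _, s := range infos {
		if s.CommD != nil {
			n += len(*s.CommD)
		}
	}
	return n
}

func main() {
	d := Commitment{9}
	infos := []SectorInfo{
		commitSector(CommitParams{SectorNumber: 1, UnsealedCID: &d, StoreCommD: true}),
		commitSector(CommitParams{SectorNumber: 2, UnsealedCID: &d}), // default: not opted in
		commitSector(CommitParams{SectorNumber: 3}),                  // CC sector
	}
	fmt.Println(extraStateBytes(infos)) // 32
}
```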
StoreCommD standard interface
This is the same as above, except storage of CommD is delegated to another contract that is called explicitly during commitment alongside calls to user "market" actors. Such contracts could define a standard of use with two methods:
StoreCommD(commD, minerID, sectorNumber)
ProveData(inclusionProof, minerID, sectorNumber)
Miner actors could either define CommD storage policies themselves or delegate CommD storage to the client-specified piece manifest. Either the opt-in or the required variant of the above proposal could be implemented this way.
For: separation of concerns, modular contracts
Against: a bit more expensive, and detecting bad sector status (termination / fault) alongside a proof now requires a separate call to the storage provider.
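A Go sketch of the StoreCommD/ProveData standard, assuming ProveData takes a binary Merkle inclusion proof from a leaf (for example a piece commitment) up to CommD. hashPair mirrors the SHA-256 truncated to 254 bits used in Filecoin piece trees, but the sketch skips Fr32 padding and CID encoding and omits caller authentication in StoreCommD. CommDStore, InclusionProof and the key layout are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

type Node [32]byte

// hashPair hashes two children with SHA-256 and clears the two most
// significant bits of the last byte (254-bit truncation).
func hashPair(l, r Node) Node {
	h := sha256.New()
	h.Write(l[:])
	h.Write(r[:])
	var out Node
	copy(out[:], h.Sum(nil))
	out[31] &= 0x3f
	return out
}

// InclusionProof claims Leaf sits at position Index among nodes at its
// level; Path lists sibling hashes from that level up to the root.
type InclusionProof struct {
	Leaf  Node
	Index uint64
	Path  []Node
}

// Root recomputes the root implied by the proof.
func (p InclusionProof) Root() Node {
	cur, idx := p.Leaf, p.Index
	for _, sib := range p.Path {
		if idx%2 == 0 {
			cur = hashPair(cur, sib) // current node is a left child
		} else {
			cur = hashPair(sib, cur) // current node is a right child
		}
		idx /= 2
	}
	return cur
}

type key struct{ Miner, Sector uint64 }

// CommDStore is a hypothetical standard contract holding sector CommDs.
type CommDStore struct{ commDs map[key]Node }

func NewCommDStore() *CommDStore { return &CommDStore{commDs: map[key]Node{}} }

// StoreCommD records a sector's CommD (a real contract would check the caller is the miner).
func (s *CommDStore) StoreCommD(commD Node, minerID, sectorNumber uint64) {
	s.commDs[key{minerID, sectorNumber}] = commD
}

// ProveData checks the inclusion proof against the stored CommD only;
// it does not know whether the sector is faulty or terminated.
func (s *CommDStore) ProveData(proof InclusionProof, minerID, sectorNumber uint64) error {
	commD, ok := s.commDs[key{minerID, sectorNumber}]
	if !ok {
		return errors.New("no CommD recorded for sector")
	}
	if proof.Root() != commD {
		return errors.New("inclusion proof does not match CommD")
	}
	return nil
}

func main() {
	leaves := []Node{{1}, {2}, {3}, {4}}
	l01, l23 := hashPair(leaves[0], leaves[1]), hashPair(leaves[2], leaves[3])
	commD := hashPair(l01, l23)

	store := NewCommDStore()
	store.StoreCommD(commD, 1000, 7)

	// Prove leaf 2: sibling is leaf 3, then the left subtree root.
	proof := InclusionProof{Leaf: leaves[2], Index: 2, Path: []Node{leaves[3], l01}}
	fmt.Println(store.ProveData(proof, 1000, 7)) // <nil>
}
```

Note that ProveData never consults the miner actor, which is exactly the Against point above: a consumer that cares about faults or terminations still has to query sector status separately.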
StoreCommD delegation to market
Like the above, but instead of miner actors doing this explicitly as part of the programmable market interface, add CommD to SectorContentAddedParams and delegate CommD storage to the market.
For: simplifies the main data notification interface, separation of concerns.
Against: the provider has only transitive control over, and state about, its sector ID to CommD mapping. The miner has to trust both the market and the CommD mapping contract to store CommDs. This makes sending to arbitrary untrusted markets less desirable in some cases (e.g. bad bit insurance: as a miner you can't get a payout from a market that doesn't use a standard sector ID to CommD mapping contract, and you can't know whether it does).
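A Go sketch of the trust problem in market delegation. The CommD field on SectorContentAddedParams is the proposed addition; the other fields, the Market interface and both market types are hypothetical. A market that silently drops CommD returns the same result as a compliant one, so the miner only has transitive knowledge of its mapping.

```go
package main

import "fmt"

type Commitment [32]byte

// SectorContentAddedParams: hypothetical shape; CommD is the proposed new field.
type SectorContentAddedParams struct {
	MinerID      uint64
	SectorNumber uint64
	Pieces       []Commitment // piece commitments placed in the sector
	CommD        Commitment   // new: unsealed sector commitment
}

// Market is the callback a miner invokes at commitment.
type Market interface {
	SectorContentAdded(p SectorContentAddedParams) error
}

type sectorKey struct{ Miner, Sector uint64 }

// CommDMapping stands in for a shared sector ID -> CommD mapping contract.
type CommDMapping map[sectorKey]Commitment

// forwardingMarket records CommD in the shared mapping, as a standard would expect.
type forwardingMarket struct{ mapping CommDMapping }

func (m forwardingMarket) SectorContentAdded(p SectorContentAddedParams) error {
	m.mapping[sectorKey{p.MinerID, p.SectorNumber}] = p.CommD
	return nil
}

// silentMarket accepts the notification but never records CommD.
type silentMarket struct{}

func (silentMarket) SectorContentAdded(SectorContentAddedParams) error { return nil }

func main() {
	mapping := CommDMapping{}
	markets := []Market{forwardingMarket{mapping}, silentMarket{}}
	for i, m := range markets {
		p := SectorContentAddedParams{MinerID: 1000, SectorNumber: uint64(i + 1), CommD: Commitment{1}}
		fmt.Println(m.SectorContentAdded(p)) // <nil> for both: indistinguishable to the miner
	}
	_, ok1 := mapping[sectorKey{1000, 1}]
	_, ok2 := mapping[sectorKey{1000, 2}]
	fmt.Println(ok1, ok2) // true false
}
```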
ProveSectorAgain method and PoRep Scavengers
This is a departure from the other proposals. We could drop all storage of CommD from anywhere on chain and still satisfy ContextFreeSectors assuming there is someone offchain who cares about proving a sector's CommD. The tradeoff is much greater proving cost, an offchain proving component and annoying constraints on vm syscalls.
In this proposal there exists an off-chain PoRep "scavenger". It gathers PoReps, CommDs and their inputs from the past. When someone wants to use a context free sector, the PoRep scavenger provides them with the inputs needed to prove that CommR corresponds to CommD. The miner actor has a method ProveSectorAgain that takes in a PoRep, CommD and the other arguments, checks that the randomness is valid (this is annoying), and validates that CommD is safe to trust.
This is annoying for a few reasons. Randomness can go back arbitrarily far, which puts constraints on the VM that I understand we would like to limit (i.e. we'd like to say you can't check randomness past 10 finalities or something). One solution is a smart contract, called through a public endpoint, that reads randomness from the header and records it on chain. It needs to be called frequently enough not to fall behind the VM's syscall limit. The proving contract then consults this trusted contract to validate the PoRep scavenger's inputs.
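A Go sketch of ProveSectorAgain together with the randomness-recording contract. Assumptions: the lookback limit is 10 finalities of 900 epochs (the text only says "10 finalities or something"); the randomness syscall and PoRep verification are passed in as functions because their real signatures are out of scope; and only one randomness value is checked, whereas a real PoRep also consumes an interactive seed that would be validated the same way.

```go
package main

import (
	"errors"
	"fmt"
)

type ChainEpoch int64
type Commitment [32]byte
type Randomness [32]byte

// MaxLookback is a hypothetical VM limit: 10 finalities of 900 epochs.
const MaxLookback ChainEpoch = 10 * 900

// RandomnessRecorder captures chain randomness before it leaves the lookback window.
type RandomnessRecorder struct{ byEpoch map[ChainEpoch]Randomness }

func NewRandomnessRecorder() *RandomnessRecorder {
	return &RandomnessRecorder{byEpoch: map[ChainEpoch]Randomness{}}
}

// Record stores randomness for epoch; readChain stands in for the syscall,
// which only works within MaxLookback of now.
func (r *RandomnessRecorder) Record(epoch, now ChainEpoch, readChain func(ChainEpoch) Randomness) error {
	if now-epoch > MaxLookback {
		return errors.New("epoch beyond VM randomness lookback")
	}
	r.byEpoch[epoch] = readChain(epoch)
	return nil
}

// PoRepInputs are the values a scavenger supplies for an old sector.
type PoRepInputs struct {
	CommR, CommD   Commitment
	SealEpoch      ChainEpoch
	SealRandomness Randomness
	Proof          []byte
}

// VerifyPoRep abstracts proof verification (hypothetical).
type VerifyPoRep func(PoRepInputs) bool

// ProveSectorAgain trusts CommD only if CommR matches the sector, the seal
// randomness matches what was recorded, and the PoRep verifies.
func ProveSectorAgain(in PoRepInputs, onChainCommR Commitment, rec *RandomnessRecorder, verify VerifyPoRep) (Commitment, error) {
	if in.CommR != onChainCommR {
		return Commitment{}, errors.New("CommR does not match sector info")
	}
	want, ok := rec.byEpoch[in.SealEpoch]
	if !ok || want != in.SealRandomness {
		return Commitment{}, errors.New("seal randomness not recorded or mismatched")
	}
	if !verify(in) {
		return Commitment{}, errors.New("invalid PoRep")
	}
	return in.CommD, nil
}

func main() {
	chain := func(e ChainEpoch) Randomness { return Randomness{byte(e)} }
	rec := NewRandomnessRecorder()
	fmt.Println(rec.Record(100, 5000, chain))  // <nil>: within lookback
	fmt.Println(rec.Record(100, 20000, chain)) // too far back
	in := PoRepInputs{CommR: Commitment{1}, CommD: Commitment{2}, SealEpoch: 100, SealRandomness: chain(100)}
	alwaysValid := func(PoRepInputs) bool { return true } // stub verifier
	d, err := ProveSectorAgain(in, Commitment{1}, rec, alwaysValid)
	fmt.Println(d == Commitment{2}, err) // true <nil>
}
```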
More annoying is the interaction with aggregate PoRep. I confirmed with @nikkolasg that SnarkPack proofs cannot efficiently prove a subset of their constituent proofs. So any sector originally added through AggregateProveCommit incurs the overhead of its entire batch of sector commitments. Different aggregation constructions would have different properties, so with a new proof scaling system (much more feasible to implement than a different PoRep) we could achieve efficient subproofs. Alternatively, the PoRep scavenger could work with storage providers to grab the raw PoReps underlying their original SnarkPack proofs. On the other side of things, IIRC (needs confirmation) the PoRep scavenger could use SnarkPack to aggregate individual ProveCommits with only public inputs, which would be a good cost saving enabled by SnarkPack.
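The batch overhead can be made concrete with a toy cost model in Go that counts how many individual PoReps must be re-verified to reprove a set of sectors. It assumes, per the point above, that a SnarkPack aggregate cannot prove a subset, so touching any sector in an aggregated batch costs the whole batch; the batch names and sizes in the example are made up.

```go
package main

import "fmt"

// reverifyCost counts individual PoReps to re-check for the wanted sectors.
// Sectors in batchOf were committed via an aggregate and cost their whole
// batch (counted once per batch); other sectors cost one proof each.
func reverifyCost(wanted []uint64, batchOf map[uint64]string, batchSize map[string]int) int {
	seen := map[string]bool{}
	cost := 0
	for _, s := range wanted {
		b, aggregated := batchOf[s]
		if !aggregated {
			cost++ // individually proven sector
			continue
		}
		if !seen[b] {
			seen[b] = true
			cost += batchSize[b]
		}
	}
	return cost
}

func main() {
	batchOf := map[uint64]string{1: "A", 2: "A", 3: "B"} // sector 4 was not aggregated
	batchSize := map[string]int{"A": 64, "B": 4}
	fmt.Println(reverifyCost([]uint64{1}, batchOf, batchSize))       // 64
	fmt.Println(reverifyCost([]uint64{1, 2, 4}, batchOf, batchSize)) // 65
}
```

Reproving one sector from batch A costs as much as reproving all of it, which is why efficient subproofs or access to the raw PoReps matter for this approach.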
For: you only incur cost when you want context free sectors and all sectors are context free by default. If we could scale up porep batching and service large batches of requests then the PoRep scavenger might be able to scale.
Against: naively, using a context free sector is very expensive, which limits applications. It requires on- and off-chain infrastructure and starts wanting new PoRep aggregation constructions for more efficiency.
Context free metadata
In the same way that proof of data can be made context free, decoupling other properties of the sector from the commitment context would allow more verifiable uses of context free sectors. For example, if a miner actor enforces a blanket policy ensuring that it will not mutate a sector for the next X epochs, then context free users of the sector have a verifiable guarantee that they can rely on the data not being updated, without having registered a SectorContentUpdated callback with a miner.
My opinion
Since I wrote this it's clear I think there is something to this. I like the simplicity of storing every CommD but am skeptical about the cost tradeoff. I think at least opt-in CommD storage directly from the miner at commitment is a must to enable applications that can't rely on untrusted markets recording the mapping. It would be cool to get a PoRep scavenging system cost efficient and this seems maybe doable.