Context Free Sectors #496

ZenGround0 · 2022-10-20T23:01:44Z

ZenGround0
Oct 20, 2022
Collaborator

Definition and Motivation

Today Filecoin sector data has a fixed context of use. Sectors are linked explicitly to deals in the builtin market actor. The existing proposal for opening up this interface to user actors maintains this property. Sectors notify user contracts of verifiable data stored by the sector once at activation and then forget the data they verifiably store. In the current conception, further interactions may exist between miner and data user contracts on sector events like faults, extensions, data updates, terminations, etc. These interactions happen with respect to the original context of use; for example, today the builtin market is notified when a deal's sector is terminated.

Since sectors are pointers to verifiable content addressed data they are always potentially useful beyond the original context of use. I'm defining sectors that can be proven to store data outside of their commitment event as "context free".

Note this is mostly orthogonal to #298. This proposal is about adding context free sectors, not reverting the current conception. Both ways of operating should work together with essentially no interference.

Motivating use cases

Context free sectors are useful whenever a user contract has a reason to accept proof of data at some point later than commitment.

Contract starts after sector

A data DAO may come online with the purpose of incentivizing the continued storage of certain data by rewarding sectors containing this data. Context free sectors are able to claim this reward even if they were committed before this contract existed.

Contract uses network for caching

A contract facilitating compute over verified data may pay out to a sector containing data that provably satisfies some computation. This contract could accept data already sealed into a sector rather than wait for a new sector to be sealed with the output. Doing so would likely be useful to speed up compute.

Amortized cost savings

This is general but repair is a nice concrete instance.

Repair protocols may dynamically pay out to an existing sector in the event another sector carrying some of its data has faulted. It's possible that eager activation and tracking is less efficient than lazy payouts for very low probability faults.

BadBit Insurance is another concrete use case.

Insurers are oracles which are paid upfront by storage providers, adjudicate whether hashes are bad bits, and pay out to miners when their terminated sectors contain bad hashes. Insurers will prefer to operate with a lower on chain footprint. Instead of being notified of every miner deal and tracking data explicitly, they could just track operators at the actor level and lazily accept proof of inclusion of bad hashes for lower amortized costs.

Discovery

This is general but repair is a nice concrete instance.

Repair protocols get utility out of any sector that contains faulted pieces, even if these sectors are committed in contexts unaware of or unwilling to pay up front commitment costs to support the repair protocol. Searching for matching data without opt in is a challenge. One solution is to use a discovery service that tracks network piece inputs offchain and posts miner/sector indexes to a discovery contract that pays out to both the caller and the miner after proving that the piece is included in the sector.

Implementing context free sectors

There are many paths to do this. The current thinking appears to be that this should be left up to markets. Here I explore the tradeoffs of solutions.

CommD in SectorInfo

The simplest solution is to store CommD in every non-CC sector info, while CC sectors record CBOR null. Then add a "CommD(sectorNumber abi.SectorNumber)" method to the miner actor which returns the CommD for proving against. This method will fail if the sector has expired.
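A minimal sketch of what this lookup could look like, written in Python rather than actor code. The `SectorInfo` fields, the `MinerState` class and the `SectorError` type are hypothetical names for illustration, not the real miner actor state. Treating a CC sector lookup as an error, and treating a sector as expired once the current epoch reaches its expiration, are both assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class SectorError(Exception):
    """Raised when a CommD lookup cannot be served."""


@dataclass
class SectorInfo:
    sealed_cid: str               # CommR
    expiration: int               # epoch at which the sector expires
    unsealed_cid: Optional[str]   # CommD; None models CBOR null for CC sectors


@dataclass
class MinerState:
    # Hypothetical stand-in for the miner actor's sector table.
    sectors: Dict[int, SectorInfo] = field(default_factory=dict)

    def comm_d(self, sector_number: int, current_epoch: int) -> str:
        """Return the CommD of a live, non-CC sector, or fail."""
        info = self.sectors.get(sector_number)
        if info is None:
            raise SectorError(f"sector {sector_number} not found")
        if current_epoch >= info.expiration:
            # Assumption: expiry is checked against the current epoch.
            raise SectorError(f"sector {sector_number} has expired")
        if info.unsealed_cid is None:
            # Assumption: CC sectors have nothing to prove against.
            raise SectorError(f"sector {sector_number} is CC and has no CommD")
        return info.unsealed_cid
```

A caller would take the returned CommD and check a piece inclusion proof against it in its own contract logic.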

This is also the most expensive for the network. Anecdotally the CommR data in the sector info is the biggest contributor to total filecoin state tree size (need data to confirm and be precise).

For: most sectors are CC today, so this is not currently a drastic state tree size diff (need data to confirm and be precise). Even if most sectors were non-CC, this change would not impact state scaling by more than a factor of 2. This proposal at worst halves the lead time the network has to prepare for failure under load as state size grows. It does not create a new problem but accelerates an existing one in a bounded (2x) way.

Against: in the best case where the network is filled with useful data this proposal effectively doubles state tree size. This is a large price to pay especially since most sectors will never need to be used context free.
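The 2x bound can be written out as a back-of-the-envelope model. All symbols are assumptions introduced for illustration: N sectors, a fraction f of them non-CC (0 ≤ f ≤ 1), c bytes per stored commitment (CommR and CommD are assumed to be the same size), and o ≥ 0 bytes of other per-sector info.

```latex
\begin{aligned}
S_{\text{before}} &= N\,(c + o) \\
S_{\text{after}}  &= N\,(c + o) + f\,N\,c \\
\frac{S_{\text{after}}}{S_{\text{before}}} &= 1 + \frac{f\,c}{c + o} \;\le\; 1 + f \;\le\; 2
\end{aligned}
```

The ratio only approaches 2 when nearly every sector carries data (f close to 1) and CommR dominates the sector info (o small relative to c), which is the "best case" described above.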

Opt-out variant

Similar to above

Opt-in variant

CommD is optional in non-CC sector infos. In the "worst" case where everyone opts in, this performs the same as CommD in SectorInfo.

For: Most storage providers will use the default of opt-out so this scales better in practice. The network and operators are more likely to pay the cost for sectors that are actually going to be used context free since they planned to make them available like this at commitment. You could argue that context free use cases are very unlikely to apply to the average sector and context free sectors are not likely to be used unless the client or provider has planned it to be so.

Against: Most storage providers will use the default. The point of context free sectors is that data usage is difficult to predict by the committing parties. You could argue that the value of context free sectors is in the network effect of all sectors being provable this way. From this standpoint, if there is actually value in context free usage, this proposal would miss out on most of the value and we would never know.

StoreCommD standard interface

This is the same as above, except storage of CommD is delegated to another contract that is called during commitment explicitly alongside calls to user "market" actors. Such contracts could define a standard of use. Two methods would exist: StoreCommD(commD, minerID, sectorNumber) and ProveData(inclusionProof, minerID, sectorNumber). Miner actors could either define commD storage policies themselves or delegate commD storage to the client specified piece manifest.

Either the opt-in or the required variant of the above proposal could be implemented this way.

For: separation of concerns, modular contracts

Against: a bit more expensive; the pattern of detecting bad sector status (termination / fault) alongside proof now requires a separate call to the storage provider.
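A rough sketch of the two methods, in Python for readability. The method names mirror StoreCommD and ProveData from the proposal; everything else is an assumption: the `InclusionProof` shape, the `CommDStore` class, and a plain SHA-256 binary Merkle tree over a power-of-two number of leaves standing in for Filecoin's actual piece commitment tree.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


def h(left: bytes, right: bytes) -> bytes:
    # Simplification: plain SHA-256, not Filecoin's truncated variant.
    return hashlib.sha256(left + right).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary tree over a power-of-two number of leaves."""
    level = list(leaves)
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_path(leaves: List[bytes], index: int) -> List[bytes]:
    """Sibling hashes from leaf `index` up to (not including) the root."""
    path, level = [], list(leaves)
    while len(level) > 1:
        path.append(level[index ^ 1])
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path


@dataclass
class InclusionProof:
    leaf: bytes        # commitment of the piece being proven
    index: int         # position of the leaf in the tree
    path: List[bytes]  # sibling hashes, bottom to top


class CommDStore:
    """Hypothetical standalone contract holding (miner, sector) -> CommD."""

    def __init__(self) -> None:
        self._comm_ds: Dict[Tuple[int, int], bytes] = {}

    def store_comm_d(self, comm_d: bytes, miner_id: int, sector_number: int) -> None:
        # A real contract would also authenticate the caller as the miner.
        self._comm_ds[(miner_id, sector_number)] = comm_d

    def prove_data(self, proof: InclusionProof, miner_id: int, sector_number: int) -> bool:
        comm_d = self._comm_ds.get((miner_id, sector_number))
        if comm_d is None:
            return False
        node, index = proof.leaf, proof.index
        for sibling in proof.path:
            node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
            index //= 2
        return node == comm_d
```

Note that `prove_data` here says nothing about whether the sector is still live or faulted, which is the separate call to the storage provider mentioned under "Against".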

StoreCommD delegation to market

Like the above, but instead of miner actors doing this explicitly as part of the programmable market interface, add CommD to SectorContentAddedParams and delegate CommD storage to the market.

For: simplify main data notification interface, separation of concerns

Against: provider has only transitive control and state about its sectorID to commD mapping. Miner has to trust both market and commD mapping contract for storage of commDs. Makes sending to arbitrary untrusted markets less desirable in some cases (e.g. bad bit insurance: as a miner you can't get a payout on a market that doesn't use a standard sector ID to commD mapping contract, and you can't know whether it does).

ProveSectorAgain method and PoRep Scavengers

This is a departure from the other proposals. We could drop all storage of CommD from anywhere on chain and still satisfy ContextFreeSectors assuming there is someone offchain who cares about proving a sector's CommD. The tradeoff is much greater proving cost, an offchain proving component and annoying constraints on vm syscalls.

In this proposal there exists a PoRep "scavenger" offchain. They gather PoReps, CommDs and their inputs from the past. When someone wants to use a context free sector, the PoRep scavenger provides them with the inputs needed to prove CommR corresponds to CommD. The miner actor has a method ProveSectorAgain that takes a PoRep and CommD as arguments, checks the randomness is valid (this is annoying), and validates that CommD is safe to trust.

This is annoying for a few reasons. Randomness can go back arbitrarily far which puts constraints on the vm that I understand we would like to limit (i.e. we'd like to say you can't check randomness past 10 finalities or something). One solution is to run a smart contract through a public endpoint that reads randomness from the header and records it onchain. It needs to be called frequently enough to not fall behind the vm's syscall limit. This trusted contract is then consulted by the proving contract to validate the PoRep scavenger's inputs.
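The recording contract described above might look roughly like this. It is a sketch under assumptions: `RandomnessRecorder`, `check_seal_randomness`, the `get_randomness` callable (standing in for the VM's randomness syscall) and the `max_lookback` parameter are all hypothetical names, and access control is left out.

```python
from typing import Callable, Dict, Optional


class RandomnessRecorder:
    """Copies chain randomness into contract state before the VM forgets it."""

    def __init__(self, get_randomness: Callable[[int], bytes], max_lookback: int) -> None:
        self._get_randomness = get_randomness  # stand-in for a VM syscall
        self._max_lookback = max_lookback      # assumed syscall limit, in epochs
        self._recorded: Dict[int, bytes] = {}

    def record(self, epoch: int, current_epoch: int) -> None:
        """Public endpoint: anyone may call this to persist one epoch's randomness."""
        if epoch > current_epoch:
            raise ValueError("cannot record a future epoch")
        if current_epoch - epoch > self._max_lookback:
            raise ValueError("epoch is beyond the syscall lookback limit")
        self._recorded[epoch] = self._get_randomness(epoch)

    def lookup(self, epoch: int) -> Optional[bytes]:
        return self._recorded.get(epoch)


def check_seal_randomness(recorder: RandomnessRecorder, epoch: int, claimed: bytes) -> bool:
    """What a proving contract would ask before trusting scavenger-supplied inputs."""
    recorded = recorder.lookup(epoch)
    return recorded is not None and recorded == claimed
```

If `record` is not called at least once per lookback window, the gap can never be filled afterwards, which is why the text says it must be called frequently enough.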

More annoying is the interaction with Aggregate PoRep. I confirmed with @nikkolasg that snark pack proofs cannot efficiently prove a subset of their constituent proofs. So any sectors originally added through AggregateProveCommit incur the overhead of their entire batch of sector commitments. Different aggregation constructions would have different properties, so with a new proof scaling system (much more feasible to implement than a different PoRep) we could achieve efficient subproofs. Also, maybe the PoRep scavenger could work with storage providers to grab the raw PoReps that went into their original snark pack proof. On the other side of things, IIRC (needs confirmation) the PoRep scavenger could use snark pack to aggregate individual ProveCommits using only public inputs, which would be a good cost saving from snark pack.

For: you only incur cost when you want context free sectors, and all sectors are context free by default. If we could scale up PoRep batching and service large batches of requests, then the PoRep scavenger might be able to scale.

Against: naively very expensive to use a context free sector, which limits applications. Requires on and offchain infrastructure and starts wanting new PoRep aggregation constructions for more efficiency.

Context free metadata

In the same way that proof of data can be made context free, decoupling other properties of the sector from the commitment context would allow for more verifiable uses of context free sectors. For example, if a miner actor enforces a blanket policy ensuring that it will not mutate a sector for the next X epochs, then context free users of the sector have a verifiable guarantee that the data will not be updated, without having registered a SectorContentUpdated callback with the miner.
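One way to picture such a blanket policy is sketched below. `MinerPolicy` and its methods are hypothetical names, not an existing actor API. The miner can only extend its no-update window, and any contract can read the window without having registered a callback.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MinerPolicy:
    immutable_until: int = 0  # no sector data update is allowed before this epoch
    sector_data: Dict[int, str] = field(default_factory=dict)  # sector -> CommD

    def commit_no_update_window(self, current_epoch: int, epochs: int) -> None:
        """Promise not to mutate any sector for the next `epochs` epochs."""
        # max() means the promise can be extended but never shortened.
        self.immutable_until = max(self.immutable_until, current_epoch + epochs)

    def update_sector_data(self, sector_number: int, new_comm_d: str, current_epoch: int) -> None:
        if current_epoch < self.immutable_until:
            raise PermissionError("sector data is frozen by the miner's policy")
        self.sector_data[sector_number] = new_comm_d

    def data_stable_through(self, epoch: int) -> bool:
        """Read-only check a context free user can rely on."""
        return epoch < self.immutable_until
```

A context free user would call `data_stable_through` with the last epoch it cares about before deciding to pay out against the sector's current CommD.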

My opinion

Since I wrote this it's clear I think there is something to this. I like the simplicity of storing every CommD but am skeptical about the cost tradeoff. I think at least opt-in CommD storage directly from the miner at commitment is a must to enable applications that can't rely on untrusted markets recording the mapping. It would be cool to get a PoRep scavenging system cost efficient and this seems maybe doable.

anorth · 2022-10-27T16:49:48Z

anorth
Oct 27, 2022
Maintainer

Thanks @ZenGround0, this is well motivated.

I think that opt-in CommD on chain strikes a pretty reasonable balance, at least for a first step that would be easy to implement today. This does have potential to add to the size of chain state, but:

- not too much in the near future, unless something fairly dramatic changes in growth of data onboarding;
- SPs will be paying gas for the additional space they opt in to, so if priced correctly there'll be utility in exchange for the cost.

We ultimately need to either aggregate (like #240) or move most sector state off-chain. In either case, I would expect that to result in very marginal cost to committing all sectors' CommDs. This is a decent stepping stone. It'll cause us to need to solve the sector state size problem a bit sooner, but we need to do that at some point anyway.

One wrinkle here: while we can store CommD on chain, I don't think we should expose it directly to other actors via an API #401, because we won't be able to support that API after making either of the scale improvements mentioned above. I think we can instead support an API that can verify a proof of some form that a sector's CommD is as a caller claims.
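To illustrate why a verify-style API can outlive a change in storage layout, here is a sketch with two interchangeable backends. Everything in it is an assumption: the class names, the `verify_comm_d` signature, and the use of a SHA-256 Merkle root over (sector number, CommD) pairs as the "aggregated" form. The thread does not specify what proof form would be used.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Proof = Tuple[int, List[bytes]]  # (leaf index, sibling hashes bottom to top)


def leaf_hash(sector_number: int, comm_d: bytes) -> bytes:
    return hashlib.sha256(sector_number.to_bytes(8, "big") + comm_d).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def root_and_path(leaves: List[bytes], index: int) -> Tuple[bytes, List[bytes]]:
    """Off-chain helper: Merkle root and sibling path (power-of-two leaf count)."""
    path, level = [], list(leaves)
    while len(level) > 1:
        path.append(level[index ^ 1])
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], path


class DirectMiner:
    """CommDs stored per sector on chain, as in the opt-in proposal."""

    def __init__(self, comm_ds: Dict[int, bytes]) -> None:
        self._comm_ds = dict(comm_ds)

    def verify_comm_d(self, sector_number: int, claimed: bytes,
                      proof: Optional[Proof] = None) -> bool:
        return self._comm_ds.get(sector_number) == claimed


class AggregatedMiner:
    """Only one root over all (sector, CommD) pairs is kept on chain."""

    def __init__(self, root: bytes) -> None:
        self._root = root

    def verify_comm_d(self, sector_number: int, claimed: bytes,
                      proof: Optional[Proof] = None) -> bool:
        if proof is None:
            return False
        index, path = proof
        node = leaf_hash(sector_number, claimed)
        for sibling in path:
            node = node_hash(node, sibling) if index % 2 == 0 else node_hash(sibling, node)
            index //= 2
        return node == self._root
```

A getter that returns CommD could only be served by the first backend, whereas callers written against `verify_comm_d` would keep working if the miner moved to the second, provided they can obtain a proof off chain.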

ZenGround0 · 2022-10-27T19:09:40Z

ZenGround0
Oct 27, 2022
Collaborator Author

Some additional ideas worth sharing around this.

A good point brought up by @nicola is the following:

Paraphrasing the argument:

Sector storage is raw data. Other actors will be in charge of organizing and presenting this data in a readable format for separation of concerns. For example, one logical unit of readable data may be spread across sectors. A useful analogy is a filesystem. Sector storage in miner actors is something like a hard disk. Actors like today's market actor would become analogous to filesystems. Sector activation is something like a filesystem mount. Only when mounted does data become available for other contracts in user space.

I generally agree with this part

Paraphrasing:

We should not allow reading of raw sector storage by arbitrary contracts. Other contracts should be operating at a higher level of abstraction (i.e. the filesystem). OSes force reading disk data through a filesystem after a filesystem has been mounted. Context free sectors are breaking abstractions that shouldn't be broken.

I agree with some of this. In particular, most of the use cases motivating context free sectors above probably suffer from undesirable abstraction breaking. However, I disagree on two new points:

1. There are cases where we would want to preserve the abstraction boundary but still use context free sectors.
2. If we scale market actors in the ways @anorth is talking about scaling miner actors above, then any use of the raw sector storage requires something like context free use.

For 1 consider that in the filesystem case the OS mounts and unmounts the filesystem on startup and shutdown. It is possible to imagine Filecoin based filesystem infrastructure "shutting down" and "starting up" through a process of migrating between smart contracts. If the first contract enabling files over raw sector storage is in a bad state it might be better to "remount" by interacting with raw sector storage directly.

For 2 imagine a market (filesystem in our analogy) actor with its data state off chain or aggregated compactly. Incorporation of raw data CommD into the compressed state will need to happen alongside a proof of market state that only the market actor's offchain party can provide. Push based ActivateDeals will not be able to complete the activation. At best the market actor will need to stage CommPs from activation in on chain state and then do a provable update based on them. The natural interface here is to initiate the activation (mounting in our analogy) from the market actor, providing a proof that a) the miner actor tracks CommD, and b) the market actor also tracks CommD.

The method of activation has direct scaling implications. Push based ActivateDeals will always incur the transfer of CommD/CommPs on chain. ContextFree activation can scale with improvements in proofs.

Similar to off chain market actor state is "other chain" market actor state, so there are potentially blockchain network bridging implications. If you want to "mount" sector data directly on a separate chain instead of in a Filecoin actor, then the context free method should be strictly easier to support. This is assuming that asynchronous proof based methods are easier to port to other chains than a synchronous vm message method.

1 reply

anorth Oct 29, 2022
Maintainer

Thanks @ZenGround0. I also agree with much of @nicola's premise, but not the conclusions.

Yes, other (i.e. non-builtin) actors should be responsible for organising and presenting raw data for separation of concerns. If we implement anything but raw storage in the miner actor, we will not separate concerns and will restrict the space of possibilities for data organisation. Market-like actors can implement filesystem-like abstractions, more convenient for application use. But those actors need access to the raw storage in order to do so!

We can talk about "should" use recommended design patterns, but we would constrain innovation if we mandated them. Just as there are legitimate reasons to use direct block device access on a traditional computer (e.g. when implementing a database), there will be important reasons to access raw storage in Filecoin sectors.

steven004 · 2023-07-28T10:29:35Z

steven004
Jul 28, 2023
Collaborator

I would love to see a simplified, layered, decoupled, easier-to-evolve design of the Filecoin protocol; that's why I am engaged in the #725 discussion. The context-free sectors proposal is a great example of decoupling sectors and markets at a lower level. This is really a direction I strongly support.

I have an immature idea regarding the layers and the logical boundary of each layer, shown in the picture below:

If we only consider the storage:

1. In each SP's local storage, the sector is a container unit that accommodates arbitrary data, and of course she needs to submit proofs of the data stored, as defined in PoRep and PoSt. At this level, from a storage point of view, we could even normalize CC sectors and sectors with deals by adding an air-piece to fill blanks in a sector. Remember, an air-piece is still a piece. At this level, no one cares what data is saved.
2. In SectorOnChainInfo, there could be two options: (1) sector number, CommR + type, and a pieceInfo (CID, size) list for the sector; (2) almost the same as (1), but moving the sector-pieces mapping off-chain or to a higher level, and adding CommD if needed for proving.
3. All market and deal related information could be in user space; this will largely facilitate deal and market innovation.

The question here is whether the pieceInfo list is too heavy to be on chain for scaling. At least this is lighter than the current design, which has a heavy market/deal in the core but almost no support for upper-layer apps. However, this can be changed later: when we have subnets, we could even move storage proving to a (few) subnet(s) to save root-net bandwidth.

1 reply

anorth Jul 30, 2023
Maintainer

moving the sector-pieces mapping off-chain or high-level

#730 moves the sector/piece mapping out of the miner actor, into markets. New market actors could then take this mapping out to an L2/subnet or similar. I think we're generally headed in the same direction you are pointing.
