From the Filecoin Plus dataset eligibility rules:
Stored data should be readily retrievable on the network and this can be regularly verified (though the use of manual or automated verification that includes retrieving data from various miners over the course of the DataCap allocation timeframe). At this time all LDNs may have full retrievability, but it is not required. Each project should specify what portion of the data is retrievable and provide justification. From there notaries can decide during the due diligence phases if the client’s application is justifiable and can agree to sign it or not.
Currently, notaries perform manual retrieval attempts using tools such as lotus client retrieve. However, as more retrieval protocols are introduced, e.g. bitswap and http, due diligence may become a technical challenge for notaries. Meanwhile, the on-chain message (namely PublishStorageDeals) does not include the information necessary to perform retrieval, for two reasons:
1. The pieceCid field only works for HTTP retrieval through the booster-http module, which is not a hard requirement for storage providers. Even when clients store data with storage providers that run booster-http, there is no guarantee that the downloaded CAR file complies with UnixFS IPLD and can be extracted into a folder/file structure.
2. Historically, lotus client has put the dataCid (also called payloadCid or rootCid) in the label field of the message when making deals. Notaries now rely on this field to perform retrieval, but it should not be relied on because:
a. it is a free-form string field that is not verifiable
b. clients should feel free to put whatever value they like into this field, e.g. a description of the deal
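To illustrate why the label field is a weak signal, here is a minimal sketch of the kind of check a retriever could run before attempting retrieval. The function name and the regular expressions are my own assumptions, not part of any existing tool. They only test whether a label has the shape of a CID (CIDv0 in base58btc, or CIDv1 in lowercase base32 with a sha2-256 sized hash or longer); they do not prove that the CID is the root of the deal's CAR file.

```python
import re

# Heuristic shape checks only (assumption: labels use the two most common
# string encodings). A real tool should decode the CID with a proper library.
_CIDV0 = re.compile(r"^Qm[1-9A-HJ-NP-Za-km-z]{44}$")  # base58btc, 46 chars total
_CIDV1_BASE32 = re.compile(r"^b[a-z2-7]{58,}$")       # multibase 'b' + base32 lower

def label_looks_like_cid(label: str) -> bool:
    """Return True if the deal label has the shape of a CID string.

    A True result does not verify anything: the label is a free string,
    so the value may still be unrelated to the stored data.
    """
    label = label.strip()
    return bool(_CIDV0.match(label) or _CIDV1_BASE32.match(label))

if __name__ == "__main__":
    for label in ["Qm" + "a" * 44, "b" + "a" * 58, "weekly backup, part 3"]:
        print(repr(label), label_looks_like_cid(label))
```

A label that fails this check cannot be used for retrieval at all, and one that passes still has to be confirmed by actually retrieving the data, which is why an explicit application field is more dependable.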
I would like to propose a change to the LDN application fields so that it is easy for notaries and automated tools (retrievers) to perform retrieval testing and sampling. The changes below apply only to new LDNs, and only if the client checks the checkbox confirming that the dataset is retrievable by anyone. If the LDN indicates that the dataset should not be retrievable by anyone, the changes below do not apply.
Change existing field: the checkbox to confirm the dataset is retrievable by anyone -> confirm the dataset is retrievable and discoverable by anyone. The expectation is to have the dataset index published to IPNI.
Additional field: support fast retrieval - checkbox, default to true. If checked, all storage providers are expected to retain an unsealed copy and serve fast retrieval. Otherwise, all storage providers are expected to either serve fast retrieval or have an unseal worker available to unseal the data and serve retrieval via graphsync.
Additional field: supported retrieval protocols - multiple selections: bitswap, http, graphsync. This lets retrievers know how to perform retrieval testing.
Additional field: confirm whether label is used to store dataCid - checkbox, default to true. If checked, all PublishStorageDeals messages are expected to have the label field set to the rootCid of the deal CAR, so retrievers can rely on this field to perform retrieval.
Additional field: if not using label as dataCid, please provide a list of CIDs for this dataset or a URL from which the list can be downloaded - depending on the preparation tool, this could be a single rootCid that can be retrieved to get the CIDs of all its links, a list of CIDs corresponding to the rootCid of each deal CAR file, or a list of CIDs corresponding to each individual file or folder.
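To make the proposed fields concrete, here is a rough sketch of how an automated tool might model and sanity-check them. All names (RetrievalFields, validate, the attribute names) are hypothetical and chosen for illustration. The rule that at least one protocol must be selected is my own assumption rather than something stated in the proposal.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three protocols offered as selections in the proposal.
PROTOCOLS = {"bitswap", "http", "graphsync"}

@dataclass
class RetrievalFields:
    """Hypothetical model of the proposed LDN application fields."""
    retrievable_and_discoverable: bool         # changed existing checkbox
    fast_retrieval: bool = True                # checkbox, default true
    protocols: List[str] = field(default_factory=list)
    label_is_data_cid: bool = True             # checkbox, default true
    cid_list: List[str] = field(default_factory=list)
    cid_list_url: Optional[str] = None

def validate(f: RetrievalFields) -> List[str]:
    """Return a list of problems; an empty list means the fields are consistent."""
    # The extra fields only apply when the dataset is declared retrievable.
    if not f.retrievable_and_discoverable:
        return []
    problems = []
    if not f.protocols:
        # Assumption: a retriever needs at least one protocol to test.
        problems.append("select at least one retrieval protocol")
    unknown = sorted(set(f.protocols) - PROTOCOLS)
    if unknown:
        problems.append("unknown protocols: " + ", ".join(unknown))
    # Without the label convention, the retriever needs CIDs from elsewhere.
    if not f.label_is_data_cid and not (f.cid_list or f.cid_list_url):
        problems.append("provide a CID list or a URL to download it")
    return problems

if __name__ == "__main__":
    app = RetrievalFields(True, protocols=["http"], label_is_data_cid=False)
    print(validate(app))
```

The last two fields are linked: a retriever either trusts the label as the rootCid or needs an explicit CID list, so an application that unchecks the label box and supplies no list leaves nothing to test against.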
Final note: this proposal does not suggest any requirement for dataset retrievability, e.g. a minimum retrieval success rate or download speed. Nor does it suggest that all datasets have to be retrievable. It merely adds the necessary fields to the application so that retrievers know how content can be retrieved, where applicable.