Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Retrieval By Any CID, Not Just Root #166

Merged
merged 3 commits into from
Mar 24, 2020

Conversation

hannahhoward
Copy link
Collaborator

@hannahhoward hannahhoward commented Mar 19, 2020

Goals

Provide the option for storage providers to maintain references to all CIDs in a payload and their corresponding sectors, so that the provider can serve retrievals for any CID in the payload, not just root.

Implementation

  • Add configuration options to the provider constructor for optional configurations
  • Add an option to "Enable Universal Retrieval" - this means allowing retrieval deals that do not just start at the root of the payload, but rather any CID
  • Plumb the CAR file writers configurable hooks up through PieceIO
  • Write a block recorder hook that uses the data from the CAR writer to record block locations, and write that to a temporary piece metadata file
  • Add a function to record this metadata while generating CommP
  • Modify provider code for commP generation to record metadata based on whether universal retrieval is enabled
  • Transfer metadata if present to the PieceStore
  • Generally improve deletion of temp files in the filestore -- there were several getting left around
  • Also cleanup failure code to include temp file cleanup
  • Add a TestFileStore to support filestore testing
  • Refactor ProviderDealEnvironment to have less functions, provide direct access to filestore & piecestore

For Discussion

  • Why record a second temp file? Well, the data needs to be persisted, and it seemed like the best way to keep it around while not consuming resources on deal steps it's not involved in. I though about putting the metadata into the deal itself -- but this mean it'd get loaded off disk on every deal step, which felt less than idea
  • There's a lot of complicated interface plumbing here, and I wonder if there's a better way to do all this down the line, but for now, this is as straightforward as I could make it.

fixes #61

@rvagg
Copy link
Member

rvagg commented Mar 23, 2020

Any security-related implications worth considering here? I'm thinking mainly of DoS style attacks where I make deals with you to store my CAR files where the blocks are small enough to just make unique CIDs, so your metadata storage takes up nearly as much space as the data itself. Would that be a reasonable attack scenario against a miner (what are the incentives for this?) and how would miners react against such attacks--would it mean that they would (could?) turn off indexing and we'd end up with fewer indexing miners for retrieval?
I'm sure the economics of such an "attack" are difficult to model but I reckon there are others that have more of a clue than I about it to think it through.

@hannahhoward
Copy link
Collaborator Author

@rvagg this PR is as much technical proof-of-concept as working system to be frank. It's something neither implementation (lotus or gfc) will likely turn on by default. There are actually many many product features to work out before it becomes widespread and something we make easy to enable. For example, how do I know as a client which miners support indexing? Especially if I want to make a deal with one that does. How do I know my miner is actually gonna build the index (I don't). Part of the system design is to intentionally separate storage and retrieval markets, so we need to figure out how this index is created and maintained, and by whom. Anyway, this is all to say, we needed to show it could be done, but making the feature usable at a product level is still a separate chunk of work. (in fairness, this is not spelled out in the PR description, and we've only been reminded of this as we began to socialize this new feature outside this team). Anyway though, rather than try to get it all right, we'd like to just implement the feature as is as a base for ongoing discussions.

Copy link
Contributor

@ingar ingar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! One question: if a Provider turns on universal retrieval at some point in the future, do we just error out if we don't have the metadata for their retrieval?

@rvagg
Copy link
Member

rvagg commented Mar 23, 2020

@hannahhoward sorry, not intending to indicate I have a blocking objection, just interested in the general discussion and 👍 to the path forward of just getting something implemented as part of the path forward.

add functions to hook into car writing and output block information
Provide an option to record locations of all CIDs in a piece, so that we can support retrieval not
only of root cid but all cids inside of a piece

Also cleanup file store maintainence so temporary files are always
deleted

Also build better filestore and piecestore test infrastructure
@hannahhoward hannahhoward force-pushed the feat/universal-retrieval branch from 2e01f97 to 1d267a4 Compare March 24, 2020 17:45
@hannahhoward
Copy link
Collaborator Author

@rvagg no it's totally valid, we've realized during discussions that there are a lot of product questions surrounding this feature, and that we need to escalate those concerns to the wider team :)

@hannahhoward hannahhoward merged commit 5824df7 into master Mar 24, 2020
@hannahhoward hannahhoward deleted the feat/universal-retrieval branch April 30, 2020 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filecoin user can retrieve based on any CID, not just root
3 participants