Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requestor pausing probably needs a rearchitect, or should be removed entirely #160

Open
hannahhoward opened this issue Mar 9, 2021 · 0 comments
Labels
effort/weeks Estimated to take multiple weeks exp/wizard Extensive knowledge (implications, ramifications) required P2 Medium: Good to have, but can wait until someone steps up

Comments

@hannahhoward
Copy link
Collaborator

hannahhoward commented Mar 9, 2021

This module supports pausing and resuming in a number of ways:

  • A responder can pause by calling PauseResponse in an OutgoingBlockHook
  • A responder can pause by calling PauseResponse directly on the Graphsync instance
  • A responder can resume by calling UnpauseResponse in a RequestUpdatedHook
  • A responder can resume by calling UnpauseResponse directly on the Graphsync Instance
  • A requestor can pause by calling PauseRequest on the Graphsync instance
  • A requestor can unpause by calling UnpauseRequest on the Graphsync instance

Of these, the requestor pause/unpause is by far the most complicated to implement, and produces unpredictable behavior.

Graphsync is designed to operate in an untrusted environment, and as such, responders can't simply accept commands from requestors to pause at any time (I could DDOS a respondering by simply telling them to respond to requests I kept pausing till they held too much memory for all my requests)

I explored a number of ways to implement this, and eventually settled on a requestor dealing with pause/unpause by simply cancelling the request and sending it again with a do-no-send-cids extension.

There are a number of problems with this:

  1. It adds significant complexity to the requestor implementation, as we have to track all CIDS sent.
  2. It leads to unpredictable behavior -- as soon as we pause on the requestor side, we stop recording the CIDs we receive. However, the responder may send more CIDs before it gets the cancel message. This means upon unpause, those CIDs are sent a second time. This leads to On resume graphsync occasionally enqueues the same outgoing block twice for a single file #158.
  3. The PauseRequest/UnpauseRequest do something very different at the protocol level than what they're named. Where the pausing on the responder has a protocol level responder code, pausing in the requestor is not a concept covered by the protocol. The methods implement a Cancel/Restart more than a pause/unpause (but still with the same request ID -- so the calling module doesn't know that we've actually done this inside the module -- oy).
  4. This was all done before go-data-transfer had implemented restarts. Since go-data-transfer implements restarts by tracking CIDs and using do-not-send-cids, this means we're tracking CIDs twice (go-graphsync doesn't track CIDs to disk, and it should combine it's internal set with an external set passed by go-data-transfer correctly, but it's still kind of bizarre behavior)

Pause/Unpause is part of the protocol on the responder side. There is a response code that indicates a response has been paused, and a mechanism for the client to ask the responder to unpause. It makes sense to support pause/unpause on the responder side.

However, I think that pause/unpause for the requestor should not be part of go-graphsync. We should enable primitives to do do this via higher level code:

  • Cancelling requests (already supported by cancelling the context on the call to Request)
  • Sending arbitrary extensions in request updates - i.e.SendExtensionData(RequestID, ...ExtensionData)
  • Possibly add the ability to pause a response as well as unpause a response in a RequestUpdatedHook

This enables a few ways you might implement in go-data-transfer:

  • Implement Pause/Unpause for the requesting side via sending an extension through graphsync and reading it in a request updated hook and pausing there... meaning ultimately the responder pauses/unpauses
  • Implement Pause/Unpause via simply communicating on data transfer protocol and then pausing by calling PauseResponse on the graphsync instance directly.

Note: this implementations may still require a fair amount of complexity, as any pause initiated on the requesting side must account for the responding side sending more data before it receives the pause request.

Alternatively, we can try to develop a pause /unpause request at the protocol level in go-graphsync so that we can at least more clearly define expected behavior.

@hannahhoward hannahhoward added the need/triage Needs initial labeling and prioritization label Mar 9, 2021
@hannahhoward hannahhoward added effort/weeks Estimated to take multiple weeks exp/wizard Extensive knowledge (implications, ramifications) required P2 Medium: Good to have, but can wait until someone steps up and removed need/triage Needs initial labeling and prioritization labels Sep 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/weeks Estimated to take multiple weeks exp/wizard Extensive knowledge (implications, ramifications) required P2 Medium: Good to have, but can wait until someone steps up
Projects
None yet
Development

No branches or pull requests

1 participant