Goals
The transport interface has several issues:
Use of Libp2p Protocol vs transport
Currently, the primary manager class for go-data-transfer communicates data in two different ways:
- The Transport interface
- The data transfer libp2p protocol
This split exists because the original transport for go-data-transfer was GraphSync: anything GraphSync could do went through Transport, while anything GraphSync couldn't do was sent through the libp2p protocol.
This means the transport interface is very GraphSync-specific and incomplete. What we would like is for data transfer's main manager to know nothing about libp2p and networks. It should be an abstract state manager that simply tracks how transfers progress and pauses them at appropriate times. Anything that goes over the wire should go through the transport interface.
That means that, since GraphSync can't send every message related to data transfer, it should use go-data-transfer's libp2p protocol where needed -- but inside the implementation of the GraphSync transport.
Multiple Transports and the libp2p protocol
One potential issue arises if multiple transports intend to use the data transfer libp2p protocol: which transport handles an incoming message on the protocol?
There are two ways to address this:
- Put data transport information in the message format
- Give each transport its own sub-protocol (e.g. "/data-transfer/1.3.0/bitswap")
My current thinking is to use the second. It provides a simple way to do transport negotiation (if you respond to the sub-protocol, you support the transport), and I prefer not to put transport information in the message format (given part of data transfer's "special sauce" is keeping the message format agnostic).
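A minimal sketch of the second option, assuming each transport's protocol ID is simply the base data transfer protocol ID with the transport name appended, as in the "/data-transfer/1.3.0/bitswap" example. The helper names `subProtocolID` and `transportFor` and the `registered` map are hypothetical and not part of go-data-transfer; real registration would go through libp2p stream handlers, which are omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// baseProtocol is the prefix from the example sub-protocol in the text.
const baseProtocol = "/data-transfer/1.3.0"

// subProtocolID builds a per-transport protocol ID such as
// "/data-transfer/1.3.0/bitswap". The helper itself is hypothetical.
func subProtocolID(transport string) string {
	return baseProtocol + "/" + transport
}

// transportFor recovers the transport name from an incoming protocol ID.
// It reports false when the ID is not a data transfer sub-protocol.
func transportFor(protocolID string) (string, bool) {
	prefix := baseProtocol + "/"
	if !strings.HasPrefix(protocolID, prefix) {
		return "", false
	}
	name := strings.TrimPrefix(protocolID, prefix)
	if name == "" || strings.Contains(name, "/") {
		return "", false
	}
	return name, true
}

func main() {
	// Hypothetical set of transports registered on this node.
	registered := map[string]bool{"graphsync": true}

	for _, id := range []string{subProtocolID("graphsync"), subProtocolID("bitswap")} {
		name, _ := transportFor(id)
		fmt.Printf("%s -> transport %q, supported locally: %v\n", id, name, registered[name])
	}
}
```

With this layout, an incoming message is routed by protocol ID alone, so the message format carries no transport field.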
Supporting different capabilities in transports
Should go-data-transfer require a transport to do every possible thing go-data-transfer likes to do? Or allow a transport to support only basic actions with other operations -- restarting, pausing/resuming, sending additional messages directly, setting data limits, etc -- being optional?
If some operations are optional, how do we negotiate a voucher format / exchange protocol together with a transport? For example, if I need to do paid Filecoin retrieval, my transport had better support data limits, pause, and resume. If I'm just doing free retrieval or storage, restarts alone will probably suffice.
The proposed interface below takes a shot at this, but I'm not sure it's complete.
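One way the capability question could be checked locally is sketched here, assuming the `TransportCapabilities` struct from the proposed interface. The `Satisfies` method and the two requirement sets are hypothetical illustrations of the paid and free cases described above, not an existing API, and this does not address how the two peers would negotiate.

```go
package main

import "fmt"

// TransportCapabilities mirrors the struct in the proposed interface.
type TransportCapabilities struct {
	CanSendMessages  bool
	CanRestart       bool
	CanPauseResume   bool
	CanLimitTransfer bool
}

// Satisfies reports whether c offers every capability that required asks for.
// This method is a hypothetical helper, not part of the proposal.
func (c TransportCapabilities) Satisfies(required TransportCapabilities) bool {
	return (!required.CanSendMessages || c.CanSendMessages) &&
		(!required.CanRestart || c.CanRestart) &&
		(!required.CanPauseResume || c.CanPauseResume) &&
		(!required.CanLimitTransfer || c.CanLimitTransfer)
}

// Hypothetical requirement sets based on the two cases in the text.
var (
	paidRetrieval = TransportCapabilities{CanPauseResume: true, CanLimitTransfer: true}
	freeTransfer  = TransportCapabilities{CanRestart: true}
)

func main() {
	// A transport that only supports restarts.
	basic := TransportCapabilities{CanRestart: true}

	fmt.Println("basic transport usable for paid retrieval:", basic.Satisfies(paidRetrieval))
	fmt.Println("basic transport usable for free transfer:", basic.Satisfies(freeTransfer))
}
```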
Removing error returns on event handlers
Error returns for event handlers do one of three things:
- For OnRequestReceived, they can close the channel if present, or pause/resume it if you return datatransfer.ErrPause / ErrResume
- For OnResponseReceived, they can close the channel if present
- For everything else, the transport is supposed to "log the error"
I think this is a bit silly. RequestReceived and ResponseReceived should have specific actions you can take, while for all other events, if an error occurs, data transfer itself should log it.
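A rough sketch of what a handler looks like once it takes actions instead of returning an error. `ResponseActions` is reduced to the `CloseChannel` method from the proposed interface; the `Request` struct, its `VoucherValid` field, and the `recordingActions` test double are hypothetical stand-ins.

```go
package main

import "fmt"

// ResponseActions is reduced here to the CloseActions trait from the proposal.
type ResponseActions interface {
	CloseChannel() error
}

// Request is a hypothetical stand-in for the data transfer request message.
type Request struct {
	VoucherValid bool
}

// recordingActions is a test double that records whether the channel was closed.
type recordingActions struct{ closed bool }

func (r *recordingActions) CloseChannel() error {
	r.closed = true
	return nil
}

// onRequestReceived rejects a request by calling an action rather than by
// returning an error for the transport to interpret.
func onRequestReceived(msg Request, actions ResponseActions) {
	if msg.VoucherValid {
		return
	}
	if err := actions.CloseChannel(); err != nil {
		// Data transfer logs the failure itself; nothing is returned to the transport.
		fmt.Println("close failed:", err)
	}
}

func main() {
	rejected := &recordingActions{}
	onRequestReceived(Request{VoucherValid: false}, rejected)
	fmt.Println("invalid request closed channel:", rejected.closed)

	accepted := &recordingActions{}
	onRequestReceived(Request{VoucherValid: true}, accepted)
	fmt.Println("valid request closed channel:", accepted.closed)
}
```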
Cleaning up the transport sequence for events on a transport
One ongoing complaint about data transfer is not knowing where the remote is with accepting your request. The new transport lays out a clear chain of events:
Initiator:
1. Calls OpenChannel
2. OnChannelOpened fires when the request is actually sent to the responder (since GraphSync now queues outgoing requests)
3. OnResponseReceived fires when the responder accepts
4. OnTransferInitiated fires when data starts sending/receiving
New states: Requested -> Opened -> Accepted -> Ongoing
Responder:
1. OnRequestReceived fires when an incoming request is received
2. OnTransferInitiated fires when data starts sending/receiving
New states: Opened -> Accepted -> Ongoing
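The initiator sequence can be sketched as a small transition table. This is an illustration only: the `State` and `Event` types and the `next` function are hypothetical, and the mapping of each event to a state change is my reading of the list above, not something taken from the go-data-transfer state machine. The responder side is left out because the text does not say which event moves it from Opened to Accepted.

```go
package main

import "fmt"

// State is a hypothetical type for the channel states named in the text.
type State string

const (
	Requested State = "Requested"
	Opened    State = "Opened"
	Accepted  State = "Accepted"
	Ongoing   State = "Ongoing"
)

// Event names the transport events that drive the initiator forward.
type Event string

const (
	ChannelOpened     Event = "OnChannelOpened"
	ResponseReceived  Event = "OnResponseReceived"
	TransferInitiated Event = "OnTransferInitiated"
)

type transition struct {
	from State
	on   Event
}

// initiatorTransitions encodes Requested -> Opened -> Accepted -> Ongoing.
var initiatorTransitions = map[transition]State{
	{Requested, ChannelOpened}:    Opened,
	{Opened, ResponseReceived}:    Accepted,
	{Accepted, TransferInitiated}: Ongoing,
}

// next returns the state reached from s on event e, or an error if the
// event is out of sequence.
func next(s State, e Event) (State, error) {
	to, ok := initiatorTransitions[transition{s, e}]
	if !ok {
		return s, fmt.Errorf("event %s is not valid in state %s", e, s)
	}
	return to, nil
}

func main() {
	state := Requested
	for _, e := range []Event{ChannelOpened, ResponseReceived, TransferInitiated} {
		to, err := next(state, e)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", state, e, to)
		state = to
	}
}
```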
What about OnTransferQueued?
This event really exists because of a weakness in GraphSync: validation is only performed at the point the transfer starts sending/receiving. This is something GraphSync needs to fix. We should validate requests immediately on receipt, then queue them if they are accepted, rather than throwing them in the queue even though they might get rejected. I propose we fix this.
Is OnTransferInitiated necessary?
Arguably, OnTransferInitiated can be inferred from the moment the Queued/Sent/Received totals go above zero. I'm not sure how I feel about that.
Data Limits To The Transport
This proposal suggests moving data limit handling into the transport. That doesn't imply the protocol needs to handle it directly: only that go-data-transfer itself doesn't handle knowing when a data limit is hit. Moreover, as a follow-on to #308, I think we should make hitting a data limit a separate set of events from Pause/Resume. They really are different states: in one, I'm stuck until I get more data; in the other, one side or the other decided to pause.
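A minimal sketch of a transport-side limiter that keeps "limit reached" and "paused" as separate flags. The `limitedChannel` type and its methods are hypothetical; only the names SetDataLimit and OnDataLimitReached come from the proposed interface, and the real signatures there differ (this sketch drops the error returns and channel ID for brevity).

```go
package main

import "fmt"

// limitedChannel is a hypothetical per-channel record kept inside a transport.
type limitedChannel struct {
	limit        uint64 // 0 means no limit has been set
	transferred  uint64
	limitReached bool // stuck until the limit is raised
	paused       bool // one side explicitly chose to pause
}

// SetDataLimit sets a new limit and clears the limit-reached state if the
// new limit is above what has already been transferred.
func (c *limitedChannel) SetDataLimit(limit uint64) {
	c.limit = limit
	if limit == 0 || c.transferred < limit {
		c.limitReached = false
	}
}

// record adds n transferred bytes and calls onDataLimitReached the first
// time the total reaches the limit.
func (c *limitedChannel) record(n uint64, onDataLimitReached func()) {
	c.transferred += n
	if c.limit > 0 && c.transferred >= c.limit && !c.limitReached {
		c.limitReached = true
		onDataLimitReached()
	}
}

// canSend is false if either condition holds, but the two are tracked apart.
func (c *limitedChannel) canSend() bool {
	return !c.paused && !c.limitReached
}

func main() {
	ch := &limitedChannel{}
	ch.SetDataLimit(100)
	notify := func() { fmt.Println("OnDataLimitReached") }

	ch.record(60, notify)
	ch.record(60, notify)
	fmt.Println("limit reached:", ch.limitReached, "paused:", ch.paused)

	ch.SetDataLimit(200)
	fmt.Println("can send after raising limit:", ch.canSend())
}
```

Keeping the flags apart means raising the limit never undoes an explicit pause, and resuming never bypasses a limit.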
Uniqueness and data progress to the transport
Similar to data limits, I think we should have the transport figure out when progress is actually made, rather than dispatch events about data sending/receiving that aren't actually new progress.
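As a sketch of transport-side progress tracking, the tracker below reports a block's real size the first time its index is seen and zero on repeats, so the manager never counts the same block twice. The `progressTracker` type is hypothetical, and it assumes the `index` passed to the data events identifies a block's position in the transfer uniquely, which the proposal does not state.

```go
package main

import "fmt"

// progressTracker is a hypothetical helper a transport could keep per channel.
type progressTracker struct {
	seen  map[int64]bool
	total uint64
}

func newProgressTracker() *progressTracker {
	return &progressTracker{seen: make(map[int64]bool)}
}

// observe returns the size to report for the block at index: its real size
// the first time the index is seen, and zero for any repeat.
func (p *progressTracker) observe(index int64, size uint64) uint64 {
	if p.seen[index] {
		return 0
	}
	p.seen[index] = true
	p.total += size
	return size
}

func main() {
	p := newProgressTracker()
	fmt.Println("first sight of block 1:", p.observe(1, 10))
	fmt.Println("repeat of block 1:", p.observe(1, 10))
	fmt.Println("first sight of block 2:", p.observe(2, 5))
	fmt.Println("total new progress:", p.total)
}
```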
Proposed Interface
The proposed new transport interface is as follows:
// EventsHandler are semantic data transfer events that happen as a result of graphsync hooks
type EventsHandler interface {
	// OnChannelOpened is called at the point the transport begins processing the
	// request (prior to that it may simply be queued) -- only applies to initiator
	OnChannelOpened(chid ChannelID)

	// OnTransferInitiated is called at the point the transport actually begins sending/receiving data
	OnTransferInitiated(chid ChannelID)

	// OnResponseReceived is called when we receive a response to a request
	// Parameters:
	// - chid - channel id of the channel that received the response
	// - msg - response received
	// - actions - RequestActions are actions that can be taken on the channel
	OnResponseReceived(chid ChannelID, msg Response, actions RequestActions)

	// OnRequestReceived is called when we receive a new request to send data
	// for the given channel ID
	// Parameters:
	// - chid - channel id of the channel that received the request
	// - msg - request received
	// - actions - ResponseActions are actions that can be taken on the channel
	OnRequestReceived(chid ChannelID, msg Request, actions ResponseActions)

	// OnTransferCompleted is called when we finish transferring data for the given channel ID
	OnTransferCompleted(chid ChannelID, err error)

	// OnTransferCancelled is called when a request we opened (with the given channel ID) to
	// receive data is cancelled by us.
	OnTransferCancelled(chid ChannelID, err error)

	// OnMessageSendError is called when a network error occurs while sending a message
	OnMessageSendError(chid ChannelID, err error)

	// OnSendDataError is called when a network error occurs sending data
	// at the transport layer
	OnSendDataError(chid ChannelID, err error)

	// OnReceiveDataError is called when a network error occurs receiving data
	// at the transport layer
	OnReceiveDataError(chid ChannelID, err error)

	// OnDataLimitReached is called when a channel hits a previously set data limit
	OnDataLimitReached(chid ChannelID, err error)

	// OnContextAugment allows the transport to attach data transfer tracing information
	// to its local context, in order to create a hierarchical trace
	OnContextAugment(chid ChannelID) func(context.Context) context.Context

	// OnDataQueued is called when data is queued for sending for the given channel ID
	OnDataQueued(chid ChannelID, link ipld.Link, size uint64, index int64)

	// OnDataReceived is called when we receive data for the given channel ID
	OnDataReceived(chid ChannelID, link ipld.Link, size uint64, index int64)

	// OnDataSent is called when we send data for the given channel ID
	OnDataSent(chid ChannelID, link ipld.Link, size uint64, index int64)
}

/*
Transport defines the interface for a transport layer for data
transfer. Where the data transfer manager will coordinate setting up push and
pull requests, persistence, validation, etc, the transport layer is responsible for moving
data back and forth, and may be medium specific. For example, some transports
may have the ability to pause and resume requests, while others may not.
Some may dispatch data update events, while others may only support message
events. Some transport layers may opt to use the actual data transfer network
protocols directly while others may be able to encode messages in their own
data protocol.

Transport is the minimum interface that must be satisfied to serve as a datatransfer
transport layer. Transports must be able to open (open is always called by the receiving peer)
and close channels, and set an event handler. Beyond that, the additional actions you can take
with a transport are entirely based on the ChannelActions interface, which may have different
traits exposing different actions. What capabilities are present in ChannelActions is determined
by the Capabilities method.
*/
type Transport interface {
	Capabilities() TransportCapabilities

	// OpenChannel opens a channel on a given transport to move data back and forth
	OpenChannel(
		ctx context.Context,
		channel Channel,
		msg Message,
	) error

	WithChannel(ctx context.Context, chid ChannelID, actions func(ChannelActions) error) error

	// SetEventsHandler sets the handler for events on channels
	SetEventsHandler(events EventsHandler) error

	// CleanupChannel removes any associated data on a closed channel
	CleanupChannel(chid ChannelID)
}

// TransportCapabilities describes additional capabilities supported by ChannelActions
type TransportCapabilities struct {
	// CanSendMessages indicates the channel can send messages in ResponseActions and ChannelActions
	CanSendMessages bool
	// CanRestart indicates ChannelActions will support RestartActions
	CanRestart bool
	// CanPauseResume indicates all actions interfaces support PauseActions
	CanPauseResume bool
	// CanLimitTransfer indicates all actions will support LimitActions
	CanLimitTransfer bool
}

// ChannelActions are default actions that can be taken on a channel
type ChannelActions interface {
	CloseActions
}

// RequestActions are default actions that can be taken on a channel
type RequestActions interface {
	CloseActions
	SendMessageActions
}

// ResponseActions are default actions that can be taken on a channel
type ResponseActions interface {
	CloseActions
}

// CloseActions is a trait that allows closing a channel
type CloseActions interface {
	// CloseChannel closes this channel and effectively closes it to further
	// action
	CloseChannel() error
}

// SendMessageActions is a trait that allows sending messages
// directly
type SendMessageActions interface {
	// SendMessage sends an arbitrary message over a transport
	SendMessage(message Message) error
}

// RestartActions is a trait that allows restarting a channel
type RestartActions interface {
	// RestartChannel restarts a channel
	RestartChannel(channel ChannelState)
}

// PauseActions is a trait that allows pausing and resuming a channel
type PauseActions interface {
	// PauseChannel pauses the given channel
	PauseChannel() error
	// ResumeChannel resumes the given channel
	ResumeChannel() error
}

// LimitActions is a trait that allows setting limits on how much data can transfer on a channel
type LimitActions interface {
	// SetDataLimit tells a transport to pause a channel once the given amount of data is transferred
	SetDataLimit(uint64) error
}
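The interface above does not say how a caller reaches an optional trait from the `ChannelActions` value handed to `WithChannel`. One possibility, sketched here purely as an assumption, is a Go type assertion guarded by the matching capability. The interfaces are trimmed copies of the ones above; `pauseIfSupported`, `errPauseUnsupported`, and the two fake action types are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed copies of the traits from the proposed interface.
type CloseActions interface{ CloseChannel() error }

type PauseActions interface {
	PauseChannel() error
	ResumeChannel() error
}

type ChannelActions interface{ CloseActions }

var errPauseUnsupported = errors.New("transport does not support pause/resume")

// pauseIfSupported has the shape of a callback passed to WithChannel. It
// assumes optional traits are discovered by type assertion.
func pauseIfSupported(actions ChannelActions) error {
	pauser, ok := actions.(PauseActions)
	if !ok {
		return errPauseUnsupported
	}
	return pauser.PauseChannel()
}

// basicActions is a fake that only supports closing.
type basicActions struct{}

func (basicActions) CloseChannel() error { return nil }

// pausableActions is a fake that also implements PauseActions.
type pausableActions struct {
	basicActions
	paused bool
}

func (p *pausableActions) PauseChannel() error  { p.paused = true; return nil }
func (p *pausableActions) ResumeChannel() error { p.paused = false; return nil }

func main() {
	fmt.Println("basic transport:", pauseIfSupported(basicActions{}))

	p := &pausableActions{}
	fmt.Println("pausable transport:", pauseIfSupported(p), "paused:", p.paused)
}
```

A caller would normally check `Capabilities().CanPauseResume` first, so the assertion failing would indicate a transport whose reported capabilities and actions disagree.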
Feel free to reach out to @aschmahmann for feedback as to where we are headed.
Note one additional change: I have removed the "unique" boolean on the progress events. A non-unique progress event should simply pass a block size of zero (and let data transfer's logic determine that no progress has occurred).
I also changed the name of OnChannelCompleted to OnTransferCompleted to reflect that sometimes, after the transfer itself is finished, there is still more work to do.