Improve Transaction Service robustness by periodically resending messages during negotiation #2137

philipr-za · 2020-08-17T11:16:16Z

Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). There was an initial hope that we would be able to get the reliability of Store and Forward to be 100%. But, we have realized that due to the unpredictable app shutdowns that occur so often in the mobile apps and the long timeouts and other issues that Tor presents that it is actually not possible to be 100% sure that a given message made it to the counter-party.

We did start to explore a strategy of resending incomplete transaction negotiation messages, but it was during the time where we were experiencing a lot of networking flooding. So we decided to hold back on that strategy to rather try focus on indirect communication reliability. We have come to the conclusion that there are some unsolvable issues with the current communications stack that we just have to live with. We have also solved the bugs with message propogation that resulted in the previous flooding. Based on these conclusions we propose to implement an automatic transaction message resending strategy to improve the robustness of the transaction negotiation process with the limitations of the SAF messaging in mind.

Below we summarize the messages sent between Sender and Receiver during a Transaction negotiation:

Message	Sender		Receiver
1	Sends `SenderPartialTransaction` message	`>`
2		`<`	Sends `RecipientSignedMessage`
	Completes the transaction and can now start the broadcast protocol
3	Sends `TransactionFinalizedMessage`	`>`
			Once received the Receiver can now start the broadcast and chain monitoring protocols

So the strategy I propose is that a wallet will respond to a repeated message from the counterparty and when expecting a message a wallet will periodically resend the last message from the negotiation to illicit the response. For example, once Message 1 is sent if the Sender has not received Message 2 it will periodically resend Message 1 in order to illict Message 2 from the counter party. Once the Receiver has received Message 1 they will send Message 2 but until they receive Message 3 they will periodically resend Message 2. The resending will be governed by the following rules:

The party will resend when the wallet starts up if it has been more than 60 minutes since the last time a message was sent.
If the wallet has been open continuously it will resend the message every 60 minutes.
If the transaction is not completed after 3 days it is cancelled.

Currently, the mobile wallet frontends cancel the transaction after 3 days. I propose that this cancellation be done by the backend (with the period being configurable) so that this behaviour is consistent between mobile apps and the desktop clients. @kukabi and @Jasonvdb do you agree with this change?

Informing the counter-party of a manual cancellation has been on the cards for a while but with this new resend strategy it will be even more important in order to reduce unnecessary traffic on the network. So part of this implementation will be to implement a new CancelTransaction message that can be used to inform the counterparty of a manual cancellation and also as a response when a counterparty resend a message for a transaction that is already cancelled.

One last element of this proposal are some rules about how often a response will be expected. This is to reduce the potential for DoS by an attacker. A party will only respond to resent messages once every 5 minutes per transaction. This should be plenty of time when messages are only being resent every 60 minutes.

Thoughts @sdbondi and @mikethetike?

The text was updated successfully, but these errors were encountered:

Jasonvdb · 2020-08-17T11:20:05Z

@kukabi and @Jasonvdb do you agree with this change?

Yeah makes sense to me, as long as the function to manually cancel is still exposed.

stringhandler · 2020-08-18T07:46:42Z

I really like this. I think it will help quite a lot

kukabi · 2020-08-19T11:53:23Z

Sorry for the late response. I agree with @Jasonvdb.

@kukabi and @Jasonvdb do you agree with this change?

Yeah makes sense to me, as long as the function to manually cancel is still exposed.

sdbondi · 2020-08-20T09:21:31Z

I agree with the proposed strategy, specifically due to the unpredictable availability of mobile connectivity/background processes, that some robustness has to be built into the transaction protocol.

Without knowing every detail of the implementation, these are the issues that I see with the current protocol that we should try to solve:

If the receiver rejects the transaction (or has some internal error), the sender has absolutely no way to know this. Perhaps in addition to a RecipientSignedMessage there should be a "transaction rejected(reason=internal_error)" counterpart.
When the receiver is "Waiting for the sender to complete the transaction", there is no robustness against a message not arriving or the sender just not sending the message in the first place (but broadcasting it anyway) - I think querying the blockchain for this info after some long timeout is the only choice we have (RPC should help with BN comms). The main thing is that there should be some resolution to the transaction, it should not just wait indefinitely for a fabled message to arrive.
Cancellation causes messages to "get stuck" - I agree with the "Cancelled transaction" curtesy message as proposed. I think this should be the same rejection message i.e "transaction rejected(reason=cancelled)". Any rejection message cannot be enforced as part of the protocol, so the ability to query the status (or just resend the protocol message and get a response) is important - or else, give up if the txn is not finalized, otherwise check if the transaction has been broadcast).

We have come to the conclusion that there are some unsolvable issues with the current communications stack that we just have to live with.

I'm not sure I would have phrased it in this way. Perhaps I should expand on the reasons why "guaranteed delivery" is not implemented. TLDR; it is at best not easy at worst not possible. Nor IMO is it desirable especially for p2p.

Not easy in the sense that to do this generically (for all kinds of messages in all situations) does not scale (some form of ack for every message?), and introduces excessive complexity (per message deliverability parameters, edge cases, support for NACK? etc). Moreover, an ack does not guarantee deliverability anyway but tells you that the peer received the message (something that a request/response protocol would be better at).

Not desirable because:

Aforementioned complexity (bug/attack surface)
Performance tradeoff: bandwidth, more messages (ACK or otherwise) to process, blocked because waiting for an ack
Resource usage: managing 1000s (unbounded) message states and keeping message bodies in memory for resending
Different domain-level messages have different semantics - e.g. (1) message broadcasting doesn't care that each and every peer got the message. (2) A ping message sent directly to a peer does not need to be reliably delivered and so shouldn't pay the price for it.

We are not trying to be more reliable than the underlying transport (TCP)

My take on this is that comms should enable the user to create protocols, and reliability guarantees should be designed and built into a protocol as needed.

philipr-za · 2020-08-20T09:37:07Z

I'm not sure I would have phrased it in this way.

Ok I agree that perhaps unsolvable was the wrong term, I meant it in the context of the initial statement:

There was an initial hope that we would be able to get the reliability of Store and Forward to be 100%

I agree with the reasons it is not feasible/desirable to achieve a 100% reliable SAF protocol.

The SAF transport is quite different to a direct TCP transport though so should just keep that in mind when building domain-level protocols using these different transports, which we are using in conjunction in many places in the domain layers.

I think querying the blockchain for this info after some long timeout is the only choice we have

Currently we cancel after 3 days but you are right that the receiver could at that stage do one final blockchain check to see if their output somehow ended up there without them receiving the Finalized message before cancellation.

sdbondi · 2020-08-20T09:44:39Z

💯 Looking forward to seeing these improvements! 🚀

Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation. In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days). A large portion of the PR are tests so its not as big as it looks.

…he wallet Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in #2137 it is not tenable. This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation. In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days). A large portion of the PR is tests and some code reorganization so its not as big as it looks.

philipr-za added C-proposal Before becoming formal RFCs, new ideas are written up as proposals and discussed in the community Wallet backend labels Aug 17, 2020

philipr-za mentioned this issue Aug 24, 2020

Check if a pending inbound transaction Output is in the chain before cancelling #2159

Closed

philipr-za mentioned this issue Aug 25, 2020

Implement transaction negotiation message resending in the wallet #2161

Merged

12 tasks

stringhandler added this to the Base node v0.6 milestone Sep 7, 2020

stringhandler closed this as completed in #2161 Sep 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Transaction Service robustness by periodically resending messages during negotiation #2137

Improve Transaction Service robustness by periodically resending messages during negotiation #2137

philipr-za commented Aug 17, 2020

Jasonvdb commented Aug 17, 2020

stringhandler commented Aug 18, 2020

kukabi commented Aug 19, 2020

sdbondi commented Aug 20, 2020 •

edited

Loading

philipr-za commented Aug 20, 2020 •

edited

Loading

sdbondi commented Aug 20, 2020

Improve Transaction Service robustness by periodically resending messages during negotiation #2137

Improve Transaction Service robustness by periodically resending messages during negotiation #2137

Comments

philipr-za commented Aug 17, 2020

Jasonvdb commented Aug 17, 2020

stringhandler commented Aug 18, 2020

kukabi commented Aug 19, 2020

sdbondi commented Aug 20, 2020 • edited Loading

philipr-za commented Aug 20, 2020 • edited Loading

sdbondi commented Aug 20, 2020

sdbondi commented Aug 20, 2020 •

edited

Loading

philipr-za commented Aug 20, 2020 •

edited

Loading