Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Transaction Service robustness by periodically resending messages during negotiation #2137

Closed
philipr-za opened this issue Aug 17, 2020 · 6 comments · Fixed by #2161
Labels
C-proposal Before becoming formal RFCs, new ideas are written up as proposals and discussed in the community

Comments

@philipr-za
Copy link
Contributor

Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). There was an initial hope that we would be able to get the reliability of Store and Forward to be 100%. But, we have realized that due to the unpredictable app shutdowns that occur so often in the mobile apps and the long timeouts and other issues that Tor presents that it is actually not possible to be 100% sure that a given message made it to the counter-party.

We did start to explore a strategy of resending incomplete transaction negotiation messages, but it was during the time where we were experiencing a lot of networking flooding. So we decided to hold back on that strategy to rather try focus on indirect communication reliability. We have come to the conclusion that there are some unsolvable issues with the current communications stack that we just have to live with. We have also solved the bugs with message propogation that resulted in the previous flooding. Based on these conclusions we propose to implement an automatic transaction message resending strategy to improve the robustness of the transaction negotiation process with the limitations of the SAF messaging in mind.

Below we summarize the messages sent between Sender and Receiver during a Transaction negotiation:

Message Sender Receiver
1 Sends SenderPartialTransaction message >
2 < Sends RecipientSignedMessage
Completes the transaction and can now start the broadcast protocol
3 Sends TransactionFinalizedMessage >
Once received the Receiver can now start the broadcast and chain monitoring protocols

So the strategy I propose is that a wallet will respond to a repeated message from the counterparty and when expecting a message a wallet will periodically resend the last message from the negotiation to illicit the response. For example, once Message 1 is sent if the Sender has not received Message 2 it will periodically resend Message 1 in order to illict Message 2 from the counter party. Once the Receiver has received Message 1 they will send Message 2 but until they receive Message 3 they will periodically resend Message 2. The resending will be governed by the following rules:

  1. The party will resend when the wallet starts up if it has been more than 60 minutes since the last time a message was sent.
  2. If the wallet has been open continuously it will resend the message every 60 minutes.
  3. If the transaction is not completed after 3 days it is cancelled.

Currently, the mobile wallet frontends cancel the transaction after 3 days. I propose that this cancellation be done by the backend (with the period being configurable) so that this behaviour is consistent between mobile apps and the desktop clients. @kukabi and @Jasonvdb do you agree with this change?

Informing the counter-party of a manual cancellation has been on the cards for a while but with this new resend strategy it will be even more important in order to reduce unnecessary traffic on the network. So part of this implementation will be to implement a new CancelTransaction message that can be used to inform the counterparty of a manual cancellation and also as a response when a counterparty resend a message for a transaction that is already cancelled.

One last element of this proposal are some rules about how often a response will be expected. This is to reduce the potential for DoS by an attacker. A party will only respond to resent messages once every 5 minutes per transaction. This should be plenty of time when messages are only being resent every 60 minutes.

Thoughts @sdbondi and @mikethetike?

@philipr-za philipr-za added C-proposal Before becoming formal RFCs, new ideas are written up as proposals and discussed in the community Wallet backend labels Aug 17, 2020
@Jasonvdb
Copy link

@kukabi and @Jasonvdb do you agree with this change?

Yeah makes sense to me, as long as the function to manually cancel is still exposed.

@stringhandler
Copy link
Collaborator

I really like this. I think it will help quite a lot

@kukabi
Copy link
Contributor

kukabi commented Aug 19, 2020

Sorry for the late response. I agree with @Jasonvdb.

@kukabi and @Jasonvdb do you agree with this change?

Yeah makes sense to me, as long as the function to manually cancel is still exposed.

@sdbondi
Copy link
Member

sdbondi commented Aug 20, 2020

I agree with the proposed strategy, specifically due to the unpredictable availability of mobile connectivity/background processes, that some robustness has to be built into the transaction protocol.

Without knowing every detail of the implementation, these are the issues that I see with the current protocol that we should try to solve:

  1. If the receiver rejects the transaction (or has some internal error), the sender has absolutely no way to know this. Perhaps in addition to a RecipientSignedMessage there should be a "transaction rejected(reason=internal_error)" counterpart.
  2. When the receiver is "Waiting for the sender to complete the transaction", there is no robustness against a message not arriving or the sender just not sending the message in the first place (but broadcasting it anyway) - I think querying the blockchain for this info after some long timeout is the only choice we have (RPC should help with BN comms). The main thing is that there should be some resolution to the transaction, it should not just wait indefinitely for a fabled message to arrive.
  3. Cancellation causes messages to "get stuck" - I agree with the "Cancelled transaction" curtesy message as proposed. I think this should be the same rejection message i.e "transaction rejected(reason=cancelled)". Any rejection message cannot be enforced as part of the protocol, so the ability to query the status (or just resend the protocol message and get a response) is important - or else, give up if the txn is not finalized, otherwise check if the transaction has been broadcast).

We have come to the conclusion that there are some unsolvable issues with the current communications stack that we just have to live with.

I'm not sure I would have phrased it in this way. Perhaps I should expand on the reasons why "guaranteed delivery" is not implemented. TLDR; it is at best not easy at worst not possible. Nor IMO is it desirable especially for p2p.

Not easy in the sense that to do this generically (for all kinds of messages in all situations) does not scale (some form of ack for every message?), and introduces excessive complexity (per message deliverability parameters, edge cases, support for NACK? etc). Moreover, an ack does not guarantee deliverability anyway but tells you that the peer received the message (something that a request/response protocol would be better at).

Not desirable because:

  1. Aforementioned complexity (bug/attack surface)
  2. Performance tradeoff: bandwidth, more messages (ACK or otherwise) to process, blocked because waiting for an ack
  3. Resource usage: managing 1000s (unbounded) message states and keeping message bodies in memory for resending
  4. Different domain-level messages have different semantics - e.g. (1) message broadcasting doesn't care that each and every peer got the message. (2) A ping message sent directly to a peer does not need to be reliably delivered and so shouldn't pay the price for it.

We are not trying to be more reliable than the underlying transport (TCP)

My take on this is that comms should enable the user to create protocols, and reliability guarantees should be designed and built into a protocol as needed.

@philipr-za
Copy link
Contributor Author

philipr-za commented Aug 20, 2020

I'm not sure I would have phrased it in this way.

Ok I agree that perhaps unsolvable was the wrong term, I meant it in the context of the initial statement:

There was an initial hope that we would be able to get the reliability of Store and Forward to be 100%

I agree with the reasons it is not feasible/desirable to achieve a 100% reliable SAF protocol.

The SAF transport is quite different to a direct TCP transport though so should just keep that in mind when building domain-level protocols using these different transports, which we are using in conjunction in many places in the domain layers.

I think querying the blockchain for this info after some long timeout is the only choice we have

Currently we cancel after 3 days but you are right that the receiver could at that stage do one final blockchain check to see if their output somehow ended up there without them receiving the Finalized message before cancellation.

@sdbondi
Copy link
Member

sdbondi commented Aug 20, 2020

💯 Looking forward to seeing these improvements! 🚀

philipr-za added a commit to philipr-za/tari that referenced this issue Aug 25, 2020
Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. 

This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. 

The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days).

A large portion of the PR are tests so its not as big as it looks.
philipr-za added a commit to philipr-za/tari that referenced this issue Aug 26, 2020
Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. 

This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. 

The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days).

A large portion of the PR are tests so its not as big as it looks.
philipr-za added a commit to philipr-za/tari that referenced this issue Aug 27, 2020
Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. 

This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. 

The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days).

A large portion of the PR are tests so its not as big as it looks.
philipr-za added a commit to philipr-za/tari that referenced this issue Aug 31, 2020
Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. 

This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. 

The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days).

A large portion of the PR are tests so its not as big as it looks.
@stringhandler stringhandler added this to the Base node v0.6 milestone Sep 7, 2020
philipr-za added a commit to philipr-za/tari that referenced this issue Sep 10, 2020
Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. 

This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. 

The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days).

A large portion of the PR are tests so its not as big as it looks.
philipr-za added a commit to philipr-za/tari that referenced this issue Sep 15, 2020
Currently the logic for sending transactions and negotiating them between parties is not very robust. Each stage in the negotiation is sent only once. When sending we first try to send the message directly and if that fails, or discovery needs to occur, the message is sent indirectly via Store and Forward (SAF). We hoped to get SAF 100% reliable but due to reasons laid out in tari-project#2137 it is not tenable. 

This PR adds in the logic to periodically resends the initial Transaction Sender Message and the Transaction Reply in order to illicit the next step in the negotiation protocol from the counterparty. The PR also adds the logic to respond to repeated messages but with a cool down period in which repeats will be ignored to protect against DoS attacks. 

The PR also adds in a new comms level message to inform the counterparty if a transaction protocol has been cancelled. Initially this is sent if the Sender cancels an in progress transaction so that the receiver knows to stop expecting it. This message is also sent if a Transaction Reply is received for a cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction will be cancelled if it hasn’t resolved after a long timeout period (Default is being set to 3 days).

A large portion of the PR are tests so its not as big as it looks.
stringhandler added a commit that referenced this issue Sep 15, 2020
…he wallet

Currently the logic for sending transactions and negotiating them between
parties is not very robust. Each stage in the negotiation is sent only once.
When sending we first try to send the message directly and if that fails, or
discovery needs to occur, the message is sent indirectly via Store and Forward
(SAF). We hoped to get SAF 100% reliable but due to reasons laid out in #2137
it is not tenable.

This PR adds in the logic to periodically resends the initial Transaction
Sender Message and the Transaction Reply in order to illicit the next step in
the negotiation protocol from the counterparty. The PR also adds the logic to
respond to repeated messages but with a cool down period in which repeats will
be ignored to protect against DoS attacks.

The PR also adds in a new comms level message to inform the counterparty if a
transaction protocol has been cancelled. Initially this is sent if the Sender
cancels an in progress transaction so that the receiver knows to stop expecting
it. This message is also sent if a Transaction Reply is received for a
cancelled message to let the Received know about the cancellation.

In order to stop a wallet resending these message indefinitely a transaction
will be cancelled if it hasn’t resolved after a long timeout period (Default is
being set to 3 days).

A large portion of the PR is tests and some code reorganization so its not as
big as it looks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-proposal Before becoming formal RFCs, new ideas are written up as proposals and discussed in the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants