Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit test for lost CRMP ack messages #7291

Merged
merged 2 commits into from
Jun 3, 2021

Conversation

pan-apple
Copy link
Contributor

Problem

Need a test where the ack to a received message is dropped. It's expected that message reliability protocol will kick in, and do retransmits until it gets an ack from the receiver (or the retransmit count is exhausted).

Change overview

This PR adds a test where 2 ack messages are dropped. It's expected that the 2nd retry (1 original + 2 retry) will cause the ack to be received by the sender.

However, this test uncovered an issue. The test is currently failing with the following logs.

[1622585846951] [0xe1277] CHIP: [IN] Secure message was encrypted: Msg ID 4
[1622585846951] [0xe1277] CHIP: [IN] Sending msg from 0x000000000001E306 to 0x0000000006A11E3D at utc time: 72271140 msec
[1622585846951] [0xe1277] CHIP: [IN] Sending secure msg on generic transport
[1622585846951] [0xe1277] CHIP: [IN] Secure transport received message destined to fabric 1, node 0x0000000006A11E3D. Key ID 2
[1622585846951] [0xe1277] CHIP: [EM] Received message of type 1 and protocolId 65536 on exchange 44255
[1622585846951] [0xe1277] CHIP: [EM] ec id: 44255, Delegate: 0xea022210
[1622585846951] [0xe1277] CHIP: [IN] Secure msg send status No Error
[1622585847019] [0xe1277] CHIP: [IN] Sending msg from 0x000000000001E306 to 0x0000000006A11E3D at utc time: 72271209 msec
[1622585847019] [0xe1277] CHIP: [IN] Sending secure msg on generic transport
[1622585847019] [0xe1277] CHIP: [IN] Message counter verify failed, err = 4047
[1622585847019] [0xe1277] CHIP: [EM] Error receiving message from UDP:127.0.0.1:11097: Error 4047 (0x00000FCF)
[1622585847019] [0xe1277] CHIP: [IN] Secure msg send status No Error
[1622585847019] [0xe1277] CHIP: [EM] Retransmit MsgId:00000004 Send Cnt 1
[1622585847085] [0xe1277] CHIP: [IN] Sending msg from 0x000000000001E306 to 0x0000000006A11E3D at utc time: 72271274 msec
[1622585847085] [0xe1277] CHIP: [IN] Sending secure msg on generic transport
[1622585847085] [0xe1277] CHIP: [IN] Message counter verify failed, err = 4047
[1622585847085] [0xe1277] CHIP: [EM] Error receiving message from UDP:127.0.0.1:11097: Error 4047 (0x00000FCF)
[1622585847085] [0xe1277] CHIP: [IN] Secure msg send status No Error
[1622585847085] [0xe1277] CHIP: [EM] Retransmit MsgId:00000004 Send Cnt 2
../src/messaging/tests/TestReliableMessageProtocol.cpp:559: assertion failed: "rm->TestGetCountRetransTable() == 0"

The issue seems to be related to message counter check.

            if (mSynced.mWindow.test(offset))
            {
                return CHIP_ERROR_INVALID_ARGUMENT; // duplicated, in window
            }

The code above return failure for the retransmitted message, as it considers it as a duplicate message.

Testing

CheckResendApplicationMessageWithLostAcks unit test is supposed to test this condition.

@@ -95,6 +95,10 @@ class MockAppDelegate : public ExchangeDelegate
System::PacketBufferHandle && buffer) override
{
IsOnMessageReceivedCalled = true;
if (mDropAckResponse)
{
ec->GetReliableMessageContext()->SetAckPending(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this is enough. ReliableMessageContext has three functions keeping track of acks:

  1. HasPeerRequestedAck()
  2. IsAckPending()
  3. GetPendingPeerAckId()

For standalone acks, it looks like HasAckPending() is used. For piggybacked acks, HasPeerRequestedAck() is checked. I guess in this case there is no actual response so the piggybacking does not come in? But it's still not clear to me why HasPeerRequestedAck() even exists and why its use in ExchangeMessageDispatch::SendMessage is correct (instead of testing IsAckPending() there).

In particular, once HasPeerRequestedAck() starts testing true it never goes back to false afaict... That seems pretty dubious.

If none of that has to do with this PR, please just let me know and I'll file a followup.

Copy link
Contributor Author

@pan-apple pan-apple Jun 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a follow up will be useful for HasPeerRequestedAck. This test is relying on standalone ack.

My thinking is that, if it's a piggyback ack, the sender of ack will probably request an ack as well (for the message on which ack is being piggybacked). So, if the ack is lost, so is the message on which it was piggybacked. And, that might trigger a retransmit from the ack sender. The overall test case might be trickier, but still doable. Need more analysis if the missing piggybacked ack itself can cause this kind of a deadlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #7339 on my comments above.

@woody-apple
Copy link
Contributor

@todo
Copy link

todo bot commented Jun 2, 2021

- Enable test for lost CRMP ack messages

// TODO - Enable test for lost CRMP ack messages
// The following check is commented out because of https://github.com/project-chip/connectedhomeip/issues/7292
// NL_TEST_ASSERT(inSuite, rm->TestGetCountRetransTable() == 0);
NL_TEST_ASSERT(inSuite, mockReceiver.IsOnMessageReceivedCalled);
mockReceiver.mTestSuite = nullptr;
err = ctx.GetExchangeManager().UnregisterUnsolicitedMessageHandlerForType(Echo::MsgType::EchoRequest);
NL_TEST_ASSERT(inSuite, err == CHIP_NO_ERROR);
rm->ClearRetransTable(rc);


This comment was generated by todo based on a TODO comment in 0adb96d in #7291. cc @pan-apple.

@pan-apple
Copy link
Contributor Author

rebased

@woody-apple
Copy link
Contributor

@andy31415 andy31415 merged commit 575f99d into project-chip:master Jun 3, 2021
@pan-apple pan-apple deleted the ack-drop branch June 3, 2021 16:30
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
* Unit test for lost CRMP ack messages

* disable the check in the test for time being
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants