Unit test for lost CRMP ack messages #7291

pan-apple · 2021-06-01T22:24:25Z

Problem

We need a test in which the ack to a received message is dropped. The message reliability protocol is expected to kick in and retransmit until the sender gets an ack from the receiver (or the retransmit count is exhausted).

Change overview

This PR adds a test in which 2 ack messages are dropped. It's expected that the 2nd retry (the 3rd transmission overall: 1 original + 2 retries) will cause the ack to be received by the sender.

However, this test uncovered an issue. The test is currently failing with the following logs.

[1622585846951] [0xe1277] CHIP: [IN] Secure message was encrypted: Msg ID 4
[1622585846951] [0xe1277] CHIP: [IN] Sending msg from 0x000000000001E306 to 0x0000000006A11E3D at utc time: 72271140 msec
[1622585846951] [0xe1277] CHIP: [IN] Sending secure msg on generic transport
[1622585846951] [0xe1277] CHIP: [IN] Secure transport received message destined to fabric 1, node 0x0000000006A11E3D. Key ID 2
[1622585846951] [0xe1277] CHIP: [EM] Received message of type 1 and protocolId 65536 on exchange 44255
[1622585846951] [0xe1277] CHIP: [EM] ec id: 44255, Delegate: 0xea022210
[1622585846951] [0xe1277] CHIP: [IN] Secure msg send status No Error
[1622585847019] [0xe1277] CHIP: [IN] Sending msg from 0x000000000001E306 to 0x0000000006A11E3D at utc time: 72271209 msec
[1622585847019] [0xe1277] CHIP: [IN] Sending secure msg on generic transport
[1622585847019] [0xe1277] CHIP: [IN] Message counter verify failed, err = 4047
[1622585847019] [0xe1277] CHIP: [EM] Error receiving message from UDP:127.0.0.1:11097: Error 4047 (0x00000FCF)
[1622585847019] [0xe1277] CHIP: [IN] Secure msg send status No Error
[1622585847019] [0xe1277] CHIP: [EM] Retransmit MsgId:00000004 Send Cnt 1
[1622585847085] [0xe1277] CHIP: [IN] Sending msg from 0x000000000001E306 to 0x0000000006A11E3D at utc time: 72271274 msec
[1622585847085] [0xe1277] CHIP: [IN] Sending secure msg on generic transport
[1622585847085] [0xe1277] CHIP: [IN] Message counter verify failed, err = 4047
[1622585847085] [0xe1277] CHIP: [EM] Error receiving message from UDP:127.0.0.1:11097: Error 4047 (0x00000FCF)
[1622585847085] [0xe1277] CHIP: [IN] Secure msg send status No Error
[1622585847085] [0xe1277] CHIP: [EM] Retransmit MsgId:00000004 Send Cnt 2
../src/messaging/tests/TestReliableMessageProtocol.cpp:559: assertion failed: "rm->TestGetCountRetransTable() == 0"

The issue seems to be related to the message counter check.

            if (mSynced.mWindow.test(offset))
            {
                return CHIP_ERROR_INVALID_ARGUMENT; // duplicated, in window
            }

The code above returns a failure for the retransmitted message because it treats it as a duplicate.

Testing

The CheckResendApplicationMessageWithLostAcks unit test is intended to cover this condition.

bzbarsky-apple · 2021-06-01T23:08:20Z

src/messaging/tests/TestReliableMessageProtocol.cpp

@@ -95,6 +95,10 @@ class MockAppDelegate : public ExchangeDelegate
                           System::PacketBufferHandle && buffer) override
    {
        IsOnMessageReceivedCalled = true;
+        if (mDropAckResponse)
+        {
+            ec->GetReliableMessageContext()->SetAckPending(false);


It's not clear to me why this is enough. ReliableMessageContext has three functions keeping track of acks:

HasPeerRequestedAck()

IsAckPending()

GetPendingPeerAckId()

For standalone acks, it looks like IsAckPending() is used. For piggybacked acks, HasPeerRequestedAck() is checked. I guess in this case there is no actual response, so piggybacking does not come into play? But it's still not clear to me why HasPeerRequestedAck() even exists and why its use in ExchangeMessageDispatch::SendMessage is correct (instead of testing IsAckPending() there).

In particular, once HasPeerRequestedAck() starts testing true it never goes back to false afaict... That seems pretty dubious.

If none of that has to do with this PR, please just let me know and I'll file a followup.

I think a follow-up for HasPeerRequestedAck would be useful. This test relies on a standalone ack.

My thinking is that if the ack is piggybacked, its sender will probably request an ack as well (for the message carrying the piggybacked ack). So if the ack is lost, so is the message it was piggybacked on, and that might trigger a retransmit from the ack sender. The overall test case might be trickier, but it is still doable. More analysis is needed on whether a missing piggybacked ack can itself cause this kind of deadlock.

Filed #7339 on my comments above.

woody-apple · 2021-06-02T05:12:42Z

@andy31415 @Damian-Nordic @LuDuda ?

todo · 2021-06-02T19:49:15Z

- Enable test for lost CRMP ack messages

connectedhomeip/src/messaging/tests/TestReliableMessageProtocol.cpp

Lines 560 to 570 in 0adb96d

    
    // TODO - Enable test for lost CRMP ack messages
    // The following check is commented out because of https://github.com/project-chip/connectedhomeip/issues/7292
    //    NL_TEST_ASSERT(inSuite, rm->TestGetCountRetransTable() == 0);
    NL_TEST_ASSERT(inSuite, mockReceiver.IsOnMessageReceivedCalled);
    mockReceiver.mTestSuite = nullptr;
    err = ctx.GetExchangeManager().UnregisterUnsolicitedMessageHandlerForType(Echo::MsgType::EchoRequest);
    NL_TEST_ASSERT(inSuite, err == CHIP_NO_ERROR);
    rm->ClearRetransTable(rc);

This comment was generated by todo based on a `TODO` comment in `0adb96d` in #7291. cc @pan-apple.

pan-apple · 2021-06-02T19:50:30Z

rebased

woody-apple · 2021-06-03T03:55:12Z

@saurabhst @Damian-Nordic @andy31415?

* Unit test for lost CRMP ack messages
* disable the check in the test for time being

pullapprove bot requested review from andy31415, bzbarsky-apple, chrisdecenzo, Damian-Nordic, hawk248, jepenven-silabs and msandstedt June 1, 2021 22:29

pullapprove bot added the review - pending label Jun 1, 2021

pan-apple mentioned this pull request Jun 1, 2021

CRMP/MRP is unable to recover if ack is lost #7292

Closed

bzbarsky-apple approved these changes Jun 1, 2021

msandstedt approved these changes Jun 2, 2021

woody-apple approved these changes Jun 2, 2021

pan-apple added 2 commits June 2, 2021 12:50

Unit test for lost CRMP ack messages

877fcc9

disable the check in the test for time being

7352961

pan-apple force-pushed the ack-drop branch from 0adb96d to 7352961 Compare June 2, 2021 19:50

saurabhst approved these changes Jun 3, 2021

andy31415 approved these changes Jun 3, 2021

pullapprove bot added review - approved and removed review - pending labels Jun 3, 2021

andy31415 merged commit 575f99d into project-chip:master Jun 3, 2021

todo bot mentioned this pull request Jun 3, 2021

- Enable test for lost CRMP ack messages #7358

Closed

pan-apple deleted the ack-drop branch June 3, 2021 16:30

nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021

Unit test for lost CRMP ack messages (project-chip#7291)

4e8f30a

* Unit test for lost CRMP ack messages
* disable the check in the test for time being
