Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chip-tool pairing ble-wifi command fails #6978

Closed
arunbharadwaj opened this issue May 19, 2021 · 24 comments
Closed

chip-tool pairing ble-wifi command fails #6978

arunbharadwaj opened this issue May 19, 2021 · 24 comments
Assignees

Comments

@arunbharadwaj
Copy link
Contributor

arunbharadwaj commented May 19, 2021

Problem

When I run chip-tool pairing ble-wifi SSID PASSWORD 112233 20202021 3840 pairing fails to complete successfully

Steps to reproduce:

  • Flash all-clusters-app on esp32
  • Run above command for chip-tool on a Linux laptop
  • Attached logs from both esp32 console and Linux chip-tool

chiptool.may18.txt
esp32.may18.txt

Proposed Solution

@vivien-apple @bzbarsky-apple @pan-apple @dhrishi @andy31415

@bzbarsky-apple
Copy link
Contributor

This is failing in message counter verification on the esp32 side, as @pan-apple points out:

E (17539) chip[IN]: Message counter verify failed, err = 4000047

Presumably partially fallout from #5389. It sounds like the controller is losing track of its counter whole continuing to use the same PASE session, which is of course not an OK thing to do.

@andy31415
Copy link
Contributor

@kghost - for the message counter implementation

@kghost
Copy link
Contributor

kghost commented May 19, 2021

Let me check.

@kghost kghost self-assigned this May 19, 2021
@andy31415
Copy link
Contributor

Reproduced failure in pairing locally (after rebuilding docker image for vscode). I get a " <0x1b>[0;31mE (21838) chip[EM]: Accept FAILED, err = CHIP Error 4000047 (0x003D092F): Invalid argument<0x1b>[0m␍␊" which is very unhelpful:

  • there is no 'Accept' but rather this is 'OnReceive'
  • Invalid argument tells me nothing about what actually went wrong. Will have to build a devkitc and attach a debugger probably.

@kghost
Copy link
Contributor

kghost commented May 19, 2021

I have reproduced the problem, let me fix it.

@bzbarsky-apple
Copy link
Contributor

To be clear, I think @pan-apple was looking into this, and in particular into making sure the controller side doesn't lose track of the counter state for its PASE session.

@andy31415
Copy link
Contributor

We also need to add unit tests for PeerMesssageCounter.h (and more ... but if we look at message counter synchronization, we need unit tests as well)

@kghost
Copy link
Contributor

kghost commented May 19, 2021

The problem is that the client (chip-tool) is creating a new connection, but the counter in the new connection has been reset to 1. so the server won't accept any message from the client.

@dhrishi
Copy link
Contributor

dhrishi commented May 20, 2021

I am also seeing the exact same issue on the latest CHIP code in all-clusters-app for Rendezvous BLE, Wi-Fi as well as Bypass.

@arunbharadwaj
Copy link
Contributor Author

Is there any fix (can be unmerged) that I can apply to get myself unblocked on this issue?

@sagar-apple
Copy link
Contributor

sagar-apple commented May 20, 2021

There seem to be multiple issues. I have changes that fix the counter sync issue in a local brnach but even after that the network provisioning commands fail because BLE is timing out.

@bzbarsky-apple
Copy link
Contributor

With a workaround for #7012 (comment out the Close call in ExchangeManager::OnConnectionExpired) and the fix from #7041, BLE provisioning seems to work for me on Mac + m5stack.

@arunbharadwaj is that good enough to unblock you, perhaps?

@dhrishi
Copy link
Contributor

dhrishi commented May 24, 2021

Hi @bzbarsky-apple I tried with the latest commit + change of commenting Close call as mentioned above + cherry-picking the 2 commits in 7041, I see an issue during sending onoff command. Am I missing anything? The exact same command worked during TE2.

Device side:

E (78695) chip[IN]: Message counter verify failed, err = 4000047
E (78695) chip[IN]: Secure transport received message destined to node ID (0x000000000002104E)
I (78695) chip[EM]: Received message of type 8 and protocolId 327680
V (78705) chip[EM]: ec id: 39023, Delegate: 0x3ffc9ac8
V (78715) chip[DMG]: ICR moving to [Initialize]
V (78715) chip[ZCL]: Received Cluster Command: Cluster=6 Command=0 Endpoint=0
I (78725) chip[ZCL]: On/Off set value: 0 0
I (78725) chip[ZCL]: ERR: reading on/off 86
V (78735) chip[DMG]: ICR moving to [AddCommand]
V (78735) chip[IN]: Secure message was encrypted: Msg ID 6
I (78745) chip[IN]: Sending msg from 0x000000000002104E to 0x000000000001B669 at utc time: 78070 msec
I (78755) chip[IN]: Sending secure msg on generic transport
I (78755) chip[IN]: Secure msg send status No Error
V (78765) chip[DMG]: ICR moving to [   Sending]
V (78765) chip[DMG]: ICR moving to [Uninitiali]
E (78775) chip[EM]: Accept FAILED, err = CHIP Error 4000047 (0x003D092F): Invalid argument
E (78785) chip[IN]: Secure transport received message destined to node ID (0x000000000002104E)
I (78795) chip[EM]: Received message of type 16 and protocolId 0

Python device controller side:

chip-device-ctrl > zcl OnOff Off 135246 0 0
CHIP:IN: Sending msg from 0x000000000001B669 to 0x000000000002104E at utc time: 158012828 msec
CHIP:IN: Sending secure msg on generic transport
CHIP:IN: Secure msg send status No Error
CHIP:IN: Secure transport received message destined to node ID (0x000000000001B669)
CHIP:EM: Received message of type 9 and protocolId 327680
SetCommandIndexStatus commandHandle=1 commandIndex=1
CHIP:ZCL: DefaultResponse:
CHIP:ZCL:   Transaction: 0x7fa06d2ee1b0
CHIP:ZCL:   status: EMBER_ZCL_STATUS_FAILURE (0x01)
CHIP:IN: Sending msg from 0x000000000001B669 to 0x000000000002104E at utc time: 158012976 msec
CHIP:IN: Sending secure msg on generic transport
CHIP:IN: Secure msg send status No Error
CHIP:ZCL: DefaultResponse:
CHIP:ZCL:   Transaction: 0x7fa06d2ee1b0
CHIP:ZCL:   status: EMBER_ZCL_STATUS_SUCCESS (0x00)
Received command status response:
Container: 
    ProtocolId = 5
    ProtocolCode = 134
    EndpointId = 1
    ClusterId = 6
    CommandId = 0
    CommandIndex = 1

@bzbarsky-apple
Copy link
Contributor

E (78695) chip[IN]: Message counter verify failed, err = 4000047

in the device side log is probably the next thing that's broken. @sagar-apple I seem to recall you were working on this; is there a WIP thing you can point people to?

@sagar-apple
Copy link
Contributor

E (78695) chip[IN]: Message counter verify failed, err = 4000047

in the device side log is probably the next thing that's broken. @sagar-apple I seem to recall you were working on this; is there a WIP thing you can point people to?

Sorry, I've been away these past few days. My WIP changes are here sagar-apple@1cb9812. These changes are basically just storing the last local and peer counter values when a CHIPDevice is destroyed. The next time the a connection is reestablished I go and update the counter values. I was running into more issues after that though and so I never cleaned up and pushed my changes out.

I think at this point, I can probably put the counter fix up and revisit the remaining failures after.

@kghost
Copy link
Contributor

kghost commented May 25, 2021

@sagar-apple Just have a short conversation with @erjiaqing , Can you try #7041 over your message counter persistent change, and try it again to check if it is fixed.

@arunbharadwaj
Copy link
Contributor Author

@kghost, I cherry-picked @sagar-apple 's commit from his tree (1cb98120) and #7041 on top of it. Pairing seems to work. After pairing, tried onoff toggle. The board led seems to toggle but the esp32 logs point to some failures related to spake2+ handshake and message counter verify failure. I have attached the logs here. Please take a look.
sagar-onoff-toggle.chip-tool.txt
sagar-onoff-toggle.esp32.txt
sagar-ble-pairing.chip-tool.txt
sagar-ble-pairing.esp32.txt

@sagar-apple
Copy link
Contributor

I tried #7041 with #7106 and I'm basically still seeing the same network provisioning failures. The exchange we use to send the add network command is being closed by a "race" between us closing the connection because we released the device at the end of pairing + sending out this new add network command immediately after.

This is a snippet from the controller/commissioner.

[1621970877243] [0x3c46c72] CHIP: [EM] Received message of type 9 and protocolId 327680 on exchange 44252
[1621970877243] [0x3c46c72] CHIP: [DMG] InvokeCommand =
[1621970877243] [0x3c46c72] CHIP: [DMG] {
[1621970877243] [0x3c46c72] CHIP: [DMG] 	CommandList =
[1621970877243] [0x3c46c72] CHIP: [DMG] 	[
[1621970877243] [0x3c46c72] CHIP: [DMG] 		CommandDataElement =
[1621970877243] [0x3c46c72] CHIP: [DMG] 		{
[1621970877243] [0x3c46c72] CHIP: [DMG] 			CommandPath =
[1621970877243] [0x3c46c72] CHIP: [DMG] 			{
[1621970877243] [0x3c46c72] CHIP: [DMG] 				EndpointId = 0x1,
[1621970877243] [0x3c46c72] CHIP: [DMG] 				ClusterId = 0x3f,
[1621970877243] [0x3c46c72] CHIP: [DMG] 				CommandId = 0x0,
[1621970877243] [0x3c46c72] CHIP: [DMG] 			},
[1621970877243] [0x3c46c72] CHIP: [DMG] 			
[1621970877243] [0x3c46c72] CHIP: [DMG] 			StatusElement =
[1621970877243] [0x3c46c72] CHIP: [DMG] 			{
[1621970877243] [0x3c46c72] CHIP: [DMG] 				GeneralCode = 0x0,
[1621970877243] [0x3c46c72] CHIP: [DMG] 				ProtocolId = 0x5,
[1621970877243] [0x3c46c72] CHIP: [DMG] 				protocolCode = 0x0,
[1621970877243] [0x3c46c72] CHIP: [DMG] 			},
[1621970877243] [0x3c46c72] CHIP: [DMG] 			
[1621970877243] [0x3c46c72] CHIP: [DMG] 		},
[1621970877243] [0x3c46c72] CHIP: [DMG] 		
[1621970877243] [0x3c46c72] CHIP: [DMG] 	],
[1621970877243] [0x3c46c72] CHIP: [DMG] 	
[1621970877243] [0x3c46c72] CHIP: [DMG] }
[1621970877243] [0x3c46c72] CHIP: [ZCL] DefaultResponse:
[1621970877243] [0x3c46c72] CHIP: [ZCL]   Transaction: 0x10325f2f0
[1621970877243] [0x3c46c72] CHIP: [ZCL]   status: EMBER_ZCL_STATUS_SUCCESS (0x00)
[1621970877243] [0x3c46c72] CHIP: [CTL] Device confirmed that it has received the root certificate
[1621970877243] [0x3c46c72] CHIP: [CTL] Operational credentials provisioned on device 0x7fca30088170
[1621970877245] [0x3c46c72] CHIP: [TOO] Secure Pairing Success
[1621970877245] [0x3c46c72] CHIP: [TOO] Pairing Success
[1621970877245] [0x3c46c72] CHIP: [DMG] ICR moving to [Initialize]
[1621970877245] [0x3c46c72] CHIP: [DMG] ICR moving to [AddCommand]
[1621970877245] [0x3c46c72] CHIP: [IN] Connection from 'BLE' expired
[1621970877245] [0x3c46c72] CHIP: [CTL] OnConnectionExpired was called for unknown device, ignoring it.
[1621970877245] [0x3c46c72] CHIP: [IN] New pairing for device 0x0000000000bc5c01, key 0!!
[1621970877245] [0x3c46c72] CHIP: [IN] Secure message was encrypted: Msg ID 4
[1621970877245] [0x3c46c72] CHIP: [IN] Sending msg from 0x000000000001B669 to 0x0000000000BC5C01 at utc time: 6204503996 msec
[1621970877245] [0x3c46c72] CHIP: [IN] Sending secure msg on generic transport
[1621970877245] [0x3c46c72] CHIP: [IN] Secure msg send status No Error
[1621970877245] [0x3c46c72] CHIP: [DMG] ICR moving to [   Sending]
[1621970877245] [0x3c46c72] CHIP: [DMG] ICR moving to [Uninitiali]
[1621970877498] [0x3c46c71] CHIP: [IN] Secure transport received message destined to node ID (0x000000000001B669)
[1621970877498] [0x3c46c71] CHIP: [EM] Received message of type 9 and protocolId 327680 on exchange 44253
[1621970877498] [0x3c46c71] CHIP: [EM] OnMessageReceived failed, err = CHIP Error 4112 (0x00001010): Unsolicited msg with originator bit clear

@bzbarsky-apple
Copy link
Contributor

@sagar-apple That sounds like #7012 yes?

@sagar-apple
Copy link
Contributor

@sagar-apple That sounds like #7012 yes?

Yup yup that's it.

@dhrishi
Copy link
Contributor

dhrishi commented May 26, 2021

@sagar-apple @bzbarsky-apple Even though we are seeing errors on the device console related to ACCEPT failed and message counter verification failed, the controller side shows EMBER_ZCL_STATUS_FAILURE followed by EMBER_ZCL_STATUS_SUCCESS. Does that indicate success for cluster control?

chip-device-ctrl > zcl OnOff Off 135246 0 0
CHIP:IN: Sending msg from 0x000000000001B669 to 0x000000000002104E at utc time: 68480254 msec
CHIP:IN: Sending secure msg on generic transport
CHIP:IN: Secure msg send status No Error
CHIP:IN: Secure transport received message destined to node ID (0x000000000001B669)
CHIP:EM: Received message of type 9 and protocolId 327680
SetCommandIndexStatus commandHandle=1 commandIndex=1
CHIP:ZCL: DefaultResponse:
CHIP:ZCL:   Transaction: 0x7fd32a459318
CHIP:ZCL:   status: EMBER_ZCL_STATUS_FAILURE (0x01)
CHIP:IN: Sending msg from 0x000000000001B669 to 0x000000000002104E at utc time: 68480520 msec
CHIP:IN: Sending secure msg on generic transport
CHIP:IN: Secure msg send status No Error
CHIP:ZCL: DefaultResponse:
CHIP:ZCL:   Transaction: 0x7fd32a459318
CHIP:ZCL:   status: EMBER_ZCL_STATUS_SUCCESS (0x00)
Received command status response:
Container: 
    ProtocolId = 0
    ProtocolCode = 65535
    EndpointId = 0
    ClusterId = 6
    CommandId = 0
    CommandIndex = 1

@bzbarsky-apple
Copy link
Contributor

@dhrishi There are two messages being sent; the first one ends up with a failure response, the second with a success response. It's hard to say what those mean without knowing what messages are sent (so either logging that on the controller side or seeing the corresponding device-side log).

@vivien-apple
Copy link
Contributor

@sagar-apple @bzbarsky-apple Even though we are seeing errors on the device console related to ACCEPT failed and message counter verification failed, the controller side shows EMBER_ZCL_STATUS_FAILURE followed by EMBER_ZCL_STATUS_SUCCESS. Does that indicate success for cluster control?

chip-device-ctrl > zcl OnOff Off 135246 0 0
CHIP:IN: Sending msg from 0x000000000001B669 to 0x000000000002104E at utc time: 68480254 msec
CHIP:IN: Sending secure msg on generic transport
CHIP:IN: Secure msg send status No Error
CHIP:IN: Secure transport received message destined to node ID (0x000000000001B669)
CHIP:EM: Received message of type 9 and protocolId 327680
SetCommandIndexStatus commandHandle=1 commandIndex=1
CHIP:ZCL: DefaultResponse:
CHIP:ZCL:   Transaction: 0x7fd32a459318
CHIP:ZCL:   status: EMBER_ZCL_STATUS_FAILURE (0x01)
CHIP:IN: Sending msg from 0x000000000001B669 to 0x000000000002104E at utc time: 68480520 msec
CHIP:IN: Sending secure msg on generic transport
CHIP:IN: Secure msg send status No Error
CHIP:ZCL: DefaultResponse:
CHIP:ZCL:   Transaction: 0x7fd32a459318
CHIP:ZCL:   status: EMBER_ZCL_STATUS_SUCCESS (0x00)
Received command status response:
Container: 
    ProtocolId = 0
    ProtocolCode = 65535
    EndpointId = 0
    ClusterId = 6
    CommandId = 0
    CommandIndex = 1

The second response is an issue with IM. In theory you can safely ignore it. I think @erjiaqing has a patch to solve that.

@woody-apple
Copy link
Contributor

We're in progress of reviewing older issues. Is this still an issue given how old the SDK this was tested against? Please re-open if so. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants