M5 board (all clusters app) resets after removing fabric - abort() was called at PC 0x40136ac5 on core 0 #16729

kean-apple · 2022-03-28T17:39:46Z

M5 board (all clusters app) resets after removing fabric - abort() was called at PC 0x40136ac5 on core 0

SHA: 125e73c

Steps:

1. Pair the M5 board using iOS chiptool and verify that it can be controlled.
2. From iOS chiptool, use the unpair option to remove the fabric from the M5.

In step 2, the M5 reboots consistently (3 out of 3 times).

E (51631) chip[ZCL]: OpCreds: RemoveFabric
E (51671) chip[DIS]: Fabric (1) deleted. Calling OnFabricDeletedFromStorage
E (51681) chip[SVR]: Retrieved from server storage: f/1/g
E (51681) chip[SVR]: Retrieved from server storage: f/1/k/0
E (51691) chip[SVR]: Retrieved from server storage: f/1/g
E (51701) chip[SVR]: Retrieved from server storage: f/1/k/0
E (51721) chip[SVR]: Retrieved from server storage: f/t
E (51721) chip[SVR]: Retrieved from server storage: f/1/g
E (51731) chip[SVR]: Retrieved from server storage: f/t
E (51731) chip[SVR]: Retrieved from server storage: f/1/g
E (51761) chip[DMG]: AccessControl: removing fabric 1
E (51791) chip[ZCL]: OpCreds: Fabric 0x1 was deleted from fabric storage.
E (51791) chip[ZCL]: OpCreds: Call to fabricListChanged
E (51801) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aab
E (51811) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aac
E (51821) chip[EVL]: LogEvent event number: 0x0000000000010002 priority: 1, endpoint id: 0x0 cluster id: 0x0000_0028 event id: 0x2 Sys timestamp: 0x000000000000C712
E (51841) chip[DIS]: Updating services using commissioning mode 0
E (51841) chip[DIS]: CHIP minimal mDNS started advertising.
E (51851) chip[DIS]: Broadcasting mDns reply for query from FE80::8EAA:B5FF:FE80:4B4C
E (51861) chip[DIS]: Broadcasting mDns reply for query from FDB4:E776:C3DE:4C6E:8EAA:B5FF:FE80:4B4C
E (51861) chip[DIS]: Broadcasting mDns reply for query from 192.168.7.145
E (51881) chip[ZCL]: OpCreds: Call to fabricListChanged
E (51881) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aad
E (51891) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aae
E (51891) chip[DMG]: ICR moving to [AddingComm]
E (51901) chip[DMG]: ICR moving to [AddedComma]
E (51901) chip[DMG]: Decreasing reference count for CommandHandler, remaining 0
E (51921) chip[EM]: Piggybacking Ack for MessageCounter:5804051 on exchange: 4443r

abort() was called at PC 0x40136ac5 on core 0
0x40136ac5: chip::SessionManager::PrepareMessage(chip::SessionHandle const&, chip::PayloadHeader&, chip::System::PacketBufferHandle&&, chip::EncryptedPacketBufferHandle&) at /Users/keanlim/connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/lib/support/CodeUtils.h:489
(inlined by) chip::SessionManager::PrepareMessage(chip::SessionHandle const&, chip::PayloadHeader&, chip::System::PacketBufferHandle&&, chip::EncryptedPacketBufferHandle&) at /Users/keanlim/connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/transport/SessionManager.cpp:207

Backtrace:0x40081e26:0x3ffe6f50 0x400923a9:0x3ffe6f70 0x40098ac6:0x3ffe6f90 0x40136ac5:0x3ffe7000 0x40133ab6:0x3ffe70e0 0x4013382a:0x3ffe7130 0x40124143:0x3ffe7180 0x40124187:0x3ffe71b0 0x401241c1:0x3ffe71d0 0x401247c4:0x3ffe71f0 0x4012627e:0x3ffe7220 0x401268e9:0x3ffe7250 0x40133951:0x3ffe7280 0x40133e17:0x3ffe72c0 0x40136ddd:0x3ffe7330 0x40137454:0x3ffe73a0 0x40137579:0x3ffe7410 0x401c79b7:0x3ffe7480 0x4014a9b5:0x3ffe74d0 0x4014aaf5:0x3ffe7520 0x401391a1:0x3ffe7550 0x40139476:0x3ffe7570 0x4013949d:0x3ffe75e0 0x40095631:0x3ffe7600
0x40081e26: panic_abort at /Users/keanlim/tools/esp-idf/components/esp_system/panic.c:402


kean-apple · 2022-03-28T17:40:02Z

m5-board-reset-removing-fabric2-chiptool-log.txt
M5-board-reset-removing-fabric2-M5-logs.txt

bzbarsky-apple · 2022-03-29T15:56:08Z

Duplicate of #16748 (which has a clearer description of what the problem is).

Applications crash when RemoveFabric is invoked on the accessing fabric: while replying to RemoveFabric, SessionManager::PrepareMessage accesses prior fabric data that has already been deleted. The crash was waiting on a full fix of project-chip#16748, but that issue has a much bigger scope. We can fix the crash with a suggestion made by @turon (project-chip#16748 (comment)): keep the *local node ID* in the SecureSession so that SessionManager does not look back at the FabricTable whenever it prepares a CASE message for which the fabric may be gone. This is a root-cause fix for that specific crash, but it does not address the other aspects of project-chip#16748, which relate to cleanly handling fabric removal edge cases.

Issue project-chip#16748
Fixes project-chip#17579
Fixes project-chip#17680
Fixes project-chip#16729

This PR does the following:

- Add local node ID to the SecureSession and fix all associated plumbing
- Use the local node ID for nonce generation in PrepareMessage rather than looking up the fabric table (which may no longer hold the fabric that has that prior node ID)
- Improve CASE session establishment logging
- Fix the tests needed
- Fix bad comments in TestPairingSession tests

Testing done:

- Added a YAML test (TestSelfFabricRemoval.yaml) for this case
- Validated it failed before code fixes with the previously seen crash
- Validated that it passes with the new fixes
- Added necessary tests to TestPairingSession for new methods
- Unit tests pass
- Cert tests pass
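
The idea behind the fix can be sketched roughly as follows. This is a minimal illustration, not the actual connectedhomeip code: `FabricTable`, `SecureSession`, `BuildNonce`, `PrepareNonceViaFabricTable`, and `PrepareNonceFromSession` are hypothetical stand-ins, and the 13-byte nonce layout (security flags, message counter, source node ID) is a simplified assumption. The backtrace points into CodeUtils.h, so the original code presumably aborted on a failed check; the sketch returns an empty optional instead so the difference can be observed safely.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using NodeId      = uint64_t;
using FabricIndex = uint8_t;
using Nonce       = std::array<uint8_t, 13>;

// Hypothetical stand-in for the fabric table: fabric index -> local node ID.
struct FabricTable
{
    std::map<FabricIndex, NodeId> localNodeIds;

    std::optional<NodeId> FindLocalNodeId(FabricIndex index) const
    {
        auto it = localNodeIds.find(index);
        if (it == localNodeIds.end())
            return std::nullopt;
        return it->second;
    }

    void Remove(FabricIndex index) { localNodeIds.erase(index); }
};

// Hypothetical stand-in for a secure session. The local node ID is copied
// into the session when it is established, so it outlives the fabric entry.
struct SecureSession
{
    FabricIndex fabricIndex;
    NodeId localNodeId;
};

// Assumed nonce layout: flags (1 byte), counter (4 bytes LE), node ID (8 bytes LE).
Nonce BuildNonce(uint8_t securityFlags, uint32_t messageCounter, NodeId sourceNodeId)
{
    Nonce nonce{};
    nonce[0] = securityFlags;
    for (int i = 0; i < 4; ++i)
        nonce[1 + i] = static_cast<uint8_t>(messageCounter >> (8 * i));
    for (int i = 0; i < 8; ++i)
        nonce[5 + i] = static_cast<uint8_t>(sourceNodeId >> (8 * i));
    return nonce;
}

// Old approach: look the node ID up in the fabric table on every message.
// Yields no nonce once the fabric has been removed.
std::optional<Nonce> PrepareNonceViaFabricTable(const FabricTable & table, const SecureSession & session,
                                                uint8_t securityFlags, uint32_t messageCounter)
{
    std::optional<NodeId> nodeId = table.FindLocalNodeId(session.fabricIndex);
    if (!nodeId.has_value())
        return std::nullopt;
    return BuildNonce(securityFlags, messageCounter, *nodeId);
}

// New approach: use the node ID stored in the session; no table lookup needed.
Nonce PrepareNonceFromSession(const SecureSession & session, uint8_t securityFlags, uint32_t messageCounter)
{
    return BuildNonce(securityFlags, messageCounter, session.localNodeId);
}
```

With this shape, removing fabric 1 from the table and then preparing the RemoveFabric reply on a session that was established on fabric 1 still succeeds through `PrepareNonceFromSession`, whereas `PrepareNonceViaFabricTable` has nothing left to read.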

* Add TestSelfFabricRemoval.yaml test
* Fix crash on removal of accessing fabric
* Restyled by whitespace
* Restyled by clang-format
* Regen ZAP after comment
* Address review comments
* Restyled by clang-format
* Reorder one argument used in test-only code
* Restyled by clang-format

Co-authored-by: Restyled.io

kean-apple added the smoke test label Mar 29, 2022

bzbarsky-apple closed this as completed Mar 29, 2022

tcarmelveilleux mentioned this issue Apr 27, 2022

Fix crash on removal of accessing fabric #17815

Merged
