M5 board (all clusters app) resets after removing fabric - abort() was called at PC 0x40136ac5 on core 0 #16729
Duplicate of #16748, which has a clearer description of the problem.
tcarmelveilleux added a commit to tcarmelveilleux/connectedhomeip that referenced this issue on Apr 27, 2022
When RemoveFabric is invoked over a session on the fabric being removed (the accessing fabric), applications crash while replying to RemoveFabric, because SessionManager::PrepareMessage accesses fabric data that has already been deleted. Fixing this crash was waiting on a full fix of project-chip#16748, but that issue has a much bigger scope.

We can fix the crash directly with a suggestion made by @turon (project-chip#16748 (comment)): keep the *local node ID* in the SecureSession, so that SessionManager does not need to look back at the FabricTable when preparing a CASE message whose fabric may be gone. This is a root-cause fix for that crash, but it does not address the other aspects of project-chip#16748, which relate to cleanly handling all fabric-removal edge cases.

Issue project-chip#16748
Fixes project-chip#17579
Fixes project-chip#17680
Fixes project-chip#16729

This PR does the following:
- Add local node ID to the SecureSession and fix all associated plumbing
- Use the local node ID for nonce generation in PrepareMessage rather than looking up the fabric table (which may no longer hold the fabric that had that node ID)
- Improve CASE session establishment logging
- Fix the affected tests
- Fix incorrect comments in TestPairingSession tests

Testing done:
- Added a YAML test (TestSelfFabricRemoval.yaml) for this case
- Validated that it failed with the previously seen crash before the code fixes
- Validated that it passes with the new fixes
- Added necessary tests to TestPairingSession for new methods
- Unit tests pass
- Cert tests pass
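The fix described above can be sketched as a small, self-contained model. This is a hypothetical illustration, not the connectedhomeip code: `FabricTable`, `SecureSession`, `BuildNonce`, `PrepareNonceViaFabricLookup`, and `PrepareNonceFromSession` are simplified stand-ins invented here, and the 13-byte nonce layout (1-byte security flags, 4-byte little-endian message counter, 8-byte little-endian source node ID) is our reading of the Matter specification's AES-CCM nonce. The sketch contrasts looking up the local node ID in the fabric table on every send, which fails once the fabric has been removed, with caching that ID in the session when it is established.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical, simplified stand-ins; not the real SDK types.
using FabricIndex = uint8_t;
using NodeId      = uint64_t;

// Minimal fabric table: fabric index -> this device's node ID on that fabric.
class FabricTable {
public:
    void Add(FabricIndex index, NodeId localNodeId) { mFabrics[index] = localNodeId; }
    void Remove(FabricIndex index) { mFabrics.erase(index); }

    std::optional<NodeId> FindLocalNodeId(FabricIndex index) const {
        auto it = mFabrics.find(index);
        if (it == mFabrics.end()) {
            return std::nullopt; // fabric is gone (e.g. after RemoveFabric)
        }
        return it->second;
    }

private:
    std::map<FabricIndex, NodeId> mFabrics;
};

// Session that caches the local node ID at establishment time, so later
// sends do not depend on the fabric still being in the table.
class SecureSession {
public:
    SecureSession(FabricIndex fabric, NodeId localNodeId) : mFabric(fabric), mLocalNodeId(localNodeId) {}
    FabricIndex GetFabricIndex() const { return mFabric; }
    NodeId GetLocalNodeId() const { return mLocalNodeId; }

private:
    FabricIndex mFabric;
    NodeId mLocalNodeId;
};

// Assumed layout: flags (1) || message counter (4, LE) || source node ID (8, LE).
using Nonce = std::array<uint8_t, 13>;

Nonce BuildNonce(uint8_t securityFlags, uint32_t messageCounter, NodeId sourceNodeId) {
    Nonce nonce{};
    nonce[0] = securityFlags;
    for (int i = 0; i < 4; ++i) {
        nonce[1 + i] = static_cast<uint8_t>(messageCounter >> (8 * i));
    }
    for (int i = 0; i < 8; ++i) {
        nonce[5 + i] = static_cast<uint8_t>(sourceNodeId >> (8 * i));
    }
    return nonce;
}

// Old pattern (sketch): look up the fabric for every outgoing message.
// Returns nullopt once the fabric has been removed, which is the situation
// in which the reported crash occurred.
std::optional<Nonce> PrepareNonceViaFabricLookup(const FabricTable & table, const SecureSession & session,
                                                 uint8_t flags, uint32_t counter) {
    std::optional<NodeId> localNodeId = table.FindLocalNodeId(session.GetFabricIndex());
    if (!localNodeId) {
        return std::nullopt;
    }
    return BuildNonce(flags, counter, *localNodeId);
}

// New pattern (sketch): use the node ID cached in the session.
Nonce PrepareNonceFromSession(const SecureSession & session, uint8_t flags, uint32_t counter) {
    return BuildNonce(flags, counter, session.GetLocalNodeId());
}
```

Both paths produce the same nonce while the fabric exists; only the cached path still works after the fabric is removed, which is what allows the RemoveFabric response to be sent. Nothing in this sketch invalidates the session when its fabric goes away; as the commit message notes, that broader cleanup remains tracked in #16748.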
tcarmelveilleux added a commit that referenced this issue on Apr 28, 2022
* Add TestSelfFabricRemoval.yaml test
* Fix crash on removal of accessing fabric
* Restyled by whitespace
* Restyled by clang-format
* Regen ZAP after comment
* Address review comments
* Restyled by clang-format
* Reorder one argument used in test-only code
* Restyled by clang-format

Co-authored-by: Restyled.io <[email protected]>
SHA: 125e73c
Steps:
In step 2, the M5 reboots consistently (3 out of 3 times).
E (51631) chip[ZCL]: OpCreds: RemoveFabric
E (51671) chip[DIS]: Fabric (1) deleted. Calling OnFabricDeletedFromStorage
E (51681) chip[SVR]: Retrieved from server storage: f/1/g
E (51681) chip[SVR]: Retrieved from server storage: f/1/k/0
E (51691) chip[SVR]: Retrieved from server storage: f/1/g
E (51701) chip[SVR]: Retrieved from server storage: f/1/k/0
E (51721) chip[SVR]: Retrieved from server storage: f/t
E (51721) chip[SVR]: Retrieved from server storage: f/1/g
E (51731) chip[SVR]: Retrieved from server storage: f/t
E (51731) chip[SVR]: Retrieved from server storage: f/1/g
E (51761) chip[DMG]: AccessControl: removing fabric 1
E (51791) chip[ZCL]: OpCreds: Fabric 0x1 was deleted from fabric storage.
E (51791) chip[ZCL]: OpCreds: Call to fabricListChanged
E (51801) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aab
E (51811) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aac
E (51821) chip[EVL]: LogEvent event number: 0x0000000000010002 priority: 1, endpoint id: 0x0 cluster id: 0x0000_0028 event id: 0x2 Sys timestamp: 0x000000000000C712
E (51841) chip[DIS]: Updating services using commissioning mode 0
E (51841) chip[DIS]: CHIP minimal mDNS started advertising.
E (51851) chip[DIS]: Broadcasting mDns reply for query from FE80::8EAA:B5FF:FE80:4B4C
E (51861) chip[DIS]: Broadcasting mDns reply for query from FDB4:E776:C3DE:4C6E:8EAA:B5FF:FE80:4B4C
E (51861) chip[DIS]: Broadcasting mDns reply for query from 192.168.7.145
E (51881) chip[ZCL]: OpCreds: Call to fabricListChanged
E (51881) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aad
E (51891) chip[DMG]: Endpoint 0, Cluster 0x0000_003E update version to 316b3aae
E (51891) chip[DMG]: ICR moving to [AddingComm]
E (51901) chip[DMG]: ICR moving to [AddedComma]
E (51901) chip[DMG]: Decreasing reference count for CommandHandler, remaining 0
E (51921) chip[EM]: Piggybacking Ack for MessageCounter:5804051 on exchange: 4443r
abort() was called at PC 0x40136ac5 on core 0
0x40136ac5: chip::SessionManager::PrepareMessage(chip::SessionHandle const&, chip::PayloadHeader&, chip::System::PacketBufferHandle&&, chip::EncryptedPacketBufferHandle&) at /Users/keanlim/connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/lib/support/CodeUtils.h:489
(inlined by) chip::SessionManager::PrepareMessage(chip::SessionHandle const&, chip::PayloadHeader&, chip::System::PacketBufferHandle&&, chip::EncryptedPacketBufferHandle&) at /Users/keanlim/connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/transport/SessionManager.cpp:207
Backtrace:0x40081e26:0x3ffe6f50 0x400923a9:0x3ffe6f70 0x40098ac6:0x3ffe6f90 0x40136ac5:0x3ffe7000 0x40133ab6:0x3ffe70e0 0x4013382a:0x3ffe7130 0x40124143:0x3ffe7180 0x40124187:0x3ffe71b0 0x401241c1:0x3ffe71d0 0x401247c4:0x3ffe71f0 0x4012627e:0x3ffe7220 0x401268e9:0x3ffe7250 0x40133951:0x3ffe7280 0x40133e17:0x3ffe72c0 0x40136ddd:0x3ffe7330 0x40137454:0x3ffe73a0 0x40137579:0x3ffe7410 0x401c79b7:0x3ffe7480 0x4014a9b5:0x3ffe74d0 0x4014aaf5:0x3ffe7520 0x401391a1:0x3ffe7550 0x40139476:0x3ffe7570 0x4013949d:0x3ffe75e0 0x40095631:0x3ffe7600
0x40081e26: panic_abort at /Users/keanlim/tools/esp-idf/components/esp_system/panic.c:402