RemoveFabric command crashes server when removing accessing fabric #16748
Comments
I'd lean toward storing the node id redundantly in the session context. That way we can support these "lingering session" use cases elegantly and decouple them from configuration updates that occur over the session. |
Note that the same assertion fails when the controller is subscribed to Leave/ShutDown events and the factory reset is triggered. |
Discussed with Boris, we will achieve
Step 3-6 will be deferred to the next schedule |
I can verify I hit this too while testing something. :-) |
@kghost I agree with your solution, but I think you will need to make sure that the LEAVE event is generated prior to removing ACLs for the removed fabric. Otherwise, generating the report containing the LEAVE event will fail the permission check, right? |
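The ordering concern in the comment above can be illustrated with a toy model. This is only a sketch under the comment's own assumption that the report carrying the LEAVE event is access-checked against the removed fabric's ACLs. ToyNode, ReportLeaveEvent, RemoveAcls and both RemoveFabric variants are hypothetical names, not Matter SDK APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in types; none of this is the real Matter SDK.
using FabricIndex = std::uint8_t;

struct ToyNode
{
    std::set<FabricIndex> fabricsWithAcls;    // fabrics that still have ACL entries
    std::vector<std::string> deliveredEvents; // events that passed the access check

    // Assumption: an event report is only delivered while the fabric has ACLs.
    bool ReportLeaveEvent(FabricIndex fabric)
    {
        if (fabricsWithAcls.count(fabric) == 0)
        {
            return false; // permission check fails, the event is dropped
        }
        deliveredEvents.push_back("Leave");
        return true;
    }

    void RemoveAcls(FabricIndex fabric) { fabricsWithAcls.erase(fabric); }
};

// Generates the LEAVE event first, then removes the ACLs.
bool RemoveFabricLeaveFirst(ToyNode & node, FabricIndex fabric)
{
    const bool reported = node.ReportLeaveEvent(fabric);
    node.RemoveAcls(fabric);
    return reported;
}

// Removes the ACLs first, so the later event report fails the check.
bool RemoveFabricAclsFirst(ToyNode & node, FabricIndex fabric)
{
    node.RemoveAcls(fabric);
    return node.ReportLeaveEvent(fabric);
}
```

In this model only the LEAVE-first ordering delivers the event. Whether the real access check behaves this way is exactly the open question the comment raises.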
Because of an access to prior fabric data that is now deleted, in SessionManager::PrepareMessage, while trying to reply to RemoveFabric, applications crash when RemoveFabric is done on the accessing fabric.

This crash was awaiting full fix of project-chip#16748 to be fixed, but that issue is much bigger scope.

We can actually fix the crash with a suggestion made by @turon (project-chip#16748 (comment)) to keep the *local node ID* in the SecureSession so that SessionManager does not try to look-back at the FabricTable whenever preparing a CASE message where the fabric may be gone.

This is a root cause fix for that very crash, but does not address the other aspects of project-chip#16748 which relate to completely cleanly handling fabric removal edge cases.

Issue project-chip#16748
Fixes project-chip#17579
Fixes project-chip#17680
Fixes project-chip#16729

This PR does the following:
- Add local node ID to the SecureSession and fix all associated plumbing
- Use the local node ID for nonce generation in PrepareMessage rather than looking-up the fabric table (which may no longer hold the fabric that has that prior node ID)
- Improve CASE session establishment logging
- Fix the tests needed
- Fix bad comments in TestPairingSession tests

Testing done:
- Added a YAML test (TestSelfFabricRemoval.yaml) for this case
  - Validated it failed before code fixes with the previously seen crash.
  - Validated that it passes with the new fixes
- Added necessary tests to TestPairingSession for new methods
- Unit tests pass
- Cert tests pass
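To show why nonce generation needs a node ID at all, here is a minimal sketch of a nonce builder. It assumes a 13-byte nonce laid out as security flags (1 byte), message counter (4 bytes, little-endian) and source node ID (8 bytes, little-endian). That layout is stated here as an assumption, and BuildNonceSketch is a hypothetical helper, not the SDK function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed nonce length: 1 (flags) + 4 (counter) + 8 (source node ID) bytes.
constexpr std::size_t kNonceLength = 13;
using Nonce = std::array<std::uint8_t, kNonceLength>;

// Hypothetical helper: packs the three fields in little-endian byte order.
Nonce BuildNonceSketch(std::uint8_t securityFlags, std::uint32_t messageCounter, std::uint64_t sourceNodeId)
{
    Nonce nonce{};
    std::size_t pos = 0;

    nonce[pos++] = securityFlags;

    // Message counter, least significant byte first.
    for (int i = 0; i < 4; ++i)
    {
        nonce[pos++] = static_cast<std::uint8_t>(messageCounter >> (8 * i));
    }

    // Source node ID, least significant byte first. With the fix described
    // above, this value would come from the session rather than a fabric lookup.
    for (int i = 0; i < 8; ++i)
    {
        nonce[pos++] = static_cast<std::uint8_t>(sourceNodeId >> (8 * i));
    }

    return nonce;
}
```

Because the source node ID is an input to every outgoing message's nonce, the sender must still be able to produce it for the RemoveFabric response, even though the fabric entry is already gone.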
* Add TestSelfFabricRemoval.yaml test
* Fix crash on removal of accessing fabric
* Restyled by whitespace
* Restyled by clang-format
* Regen ZAP after comment
* Address review comments
* Restyled by clang-format
* Reorder one argument used in test-only code
* Restyled by clang-format

Co-authored-by: Restyled.io <[email protected]>
Fixed in #17815. |
Problem
After #16098 we crash if we RemoveFabric and pass it the fabric index of the fabric the command is running on.
This happens because of code in SessionManager::PrepareMessage that runs when sending the response message for the RemoveFabric command. The VerifyOrDie in that code dies, because fabric is in fact null by this point.
Proposed Solution
We need to either fix this code to get the node id from somewhere else (e.g. store it in the session instead of getting it from the fabric table every time) or change how exactly RemoveFabric works so that it manages to send the message before doing the actual fabric removal.... but then we may have situations where we claim to have removed the fabric when we actually have not.
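The first option can be sketched with a toy model. ToyFabricTable, ToySession and the two SourceNodeId helpers below are hypothetical stand-ins for illustration only, not the SDK's FabricTable or SecureSession.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in types; none of this is the real Matter SDK.
using NodeId      = std::uint64_t;
using FabricIndex = std::uint8_t;

// Toy fabric table: maps a fabric index to our local node ID on that fabric.
class ToyFabricTable
{
public:
    void Add(FabricIndex index, NodeId localNodeId) { mLocalNodeIds[index] = localNodeId; }
    void Remove(FabricIndex index) { mLocalNodeIds.erase(index); }

    // Returns false once the fabric has been removed.
    bool FindLocalNodeId(FabricIndex index, NodeId & out) const
    {
        auto it = mLocalNodeIds.find(index);
        if (it == mLocalNodeIds.end())
        {
            return false;
        }
        out = it->second;
        return true;
    }

private:
    std::map<FabricIndex, NodeId> mLocalNodeIds;
};

// Toy session that keeps its own copy of the local node ID,
// captured when the session was established.
struct ToySession
{
    FabricIndex fabricIndex;
    NodeId localNodeId;
};

// Lookup at send time: fails if the fabric was removed in the meantime.
bool SourceNodeIdFromTable(const ToyFabricTable & table, const ToySession & session, NodeId & out)
{
    return table.FindLocalNodeId(session.fabricIndex, out);
}

// Cached copy: still available after the fabric is gone.
NodeId SourceNodeIdFromSession(const ToySession & session)
{
    return session.localNodeId;
}
```

In this model, removing the fabric makes the table lookup fail, which corresponds to the null fabric described above, while the session's cached copy still lets the response be prepared.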
@kghost @tcarmelveilleux @turon