CHIP will crash when session expired during waiting for ACK #15987

Closed
erjiaqing opened this issue Mar 9, 2022 · 3 comments · Fixed by #17796

Comments

@erjiaqing
Contributor

Problem

CI Crash:
https://github.com/project-chip/connectedhomeip/actions/runs/1950122299

Backtrace:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f59f90b4859 in __GI_abort () at abort.c:79
#2  0x0000565452d53dc2 in chipAbort() () at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/lib/support/CodeUtils.h:489
#3  0x0000565452d673aa in chip::Optional<chip::ReferenceCountedHandle<chip::Transport::Session> >::Value() const & (this=0x565452f7df00 <chip::Server::sServer+6208>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/lib/core/Optional.h:170
#4  0x0000565452d8ae04 in chip::SessionHolder::Get() const (this=0x565452f7dee8 <chip::Server::sServer+6184>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/transport/SessionHolder.h:54
#5  0x0000565452d8ae74 in chip::Messaging::ExchangeContext::GetSessionHandle() const (this=0x565452f7dea8 <chip::Server::sServer+6120>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/messaging/ExchangeContext.h:160
#6  0x0000565452eb74cb in chip::Messaging::ReliableMessageMgr::<lambda(auto:5*)>::operator()<chip::Messaging::ReliableMessageMgr::RetransTableEntry>(chip::Messaging::ReliableMessageMgr::RetransTableEntry *) const (__closure=0x7ffc303f4880, entry=0x565452f7db10 <chip::Server::sServer+5200>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/messaging/ReliableMessageMgr.cpp:149
#7  0x0000565452eb7ba0 in chip::internal::LambdaProxy<chip::Messaging::ReliableMessageMgr::RetransTableEntry, chip::Messaging::ReliableMessageMgr::ExecuteActions()::<lambda(auto:5*)> >::Call(void *, void *) (context=0x7ffc303f4880, target=0x565452f7db10 <chip::Server::sServer+5200>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/lib/support/Pool.h:126
#8  0x0000565452e306c4 in chip::internal::StaticAllocatorBitmap::ForEachActiveObjectInner(void*, chip::Loop (*)(void*, void*))
    (this=0x565452f7dab8 <chip::Server::sServer+5112>, context=0x7ffc303f4880, lambda=0x565452eb7b79 <chip::internal::LambdaProxy<chip::Messaging::ReliableMessageMgr::RetransTableEntry, chip::Messaging::ReliableMessageMgr::ExecuteActions()::<lambda(auto:5*)> >::Call(void *, void *)>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/lib/support/Pool.cpp:97
#9  0x0000565452eb75b3 in chip::BitMapObjectPool<chip::Messaging::ReliableMessageMgr::RetransTableEntry, 24>::ForEachActiveObject<chip::Messaging::ReliableMessageMgr::ExecuteActions()::<lambda(auto:5*)> >(chip::Messaging::ReliableMessageMgr::<lambda(auto:5*)> &&) (this=0x565452f7dab8 <chip::Server::sServer+5112>, function=...)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/lib/support/Pool.h:257
#10 0x0000565452eb6737 in chip::Messaging::ReliableMessageMgr::ExecuteActions() (this=0x565452f7daa8 <chip::Server::sServer+5096>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/messaging/ReliableMessageMgr.cpp:120
#11 0x0000565452eb67cc in chip::Messaging::ReliableMessageMgr::Timeout(chip::System::Layer*, void*)
    (aSystemLayer=0x565452f78de0 <chip::DeviceLayer::SystemLayerImpl()::gSystemLayerImpl>, aAppState=0x565452f7daa8 <chip::Server::sServer+5096>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/messaging/ReliableMessageMgr.cpp:170
#12 0x0000565452f1a8b5 in chip::System::TimerData::Callback::Invoke() const (this=0x7ffc303f4960)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/system/SystemTimer.h:61
#13 0x0000565452f1c0a6 in chip::System::TimerPool<chip::System::TimerList::Node>::Invoke(chip::System::TimerList::Node*)
    (this=0x565452f793e8 <chip::DeviceLayer::SystemLayerImpl()::gSystemLayerImpl+1544>, timer=0x565454b4fc80)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/system/SystemTimer.h:224
#14 0x0000565452f1bc31 in chip::System::LayerImplSelect::HandleEvents() (this=0x565452f78de0 <chip::DeviceLayer::SystemLayerImpl()::gSystemLayerImpl>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/system/SystemLayerImplSelect.cpp:391
#15 0x0000565452e7fe56 in chip::DeviceLayer::Internal::GenericPlatformManagerImpl_POSIX<chip::DeviceLayer::PlatformManagerImpl>::_RunEventLoop()
    (this=0x565452f79810 <chip::DeviceLayer::PlatformManagerImpl::sInstance+16>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/include/platform/internal/GenericPlatformManagerImpl_POSIX.cpp:181
#16 0x0000565452d5afa8 in chip::DeviceLayer::PlatformManager::RunEventLoop() (this=0x565452f79800 <chip::DeviceLayer::PlatformManagerImpl::sInstance>)
    at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/src/include/platform/PlatformManager.h:352
#17 0x0000565452d5bfc1 in ChipLinuxAppMainLoop() () at ../../examples/all-clusters-app/linux/third_party/connectedhomeip/examples/platform/linux/AppMain.cpp:605
#18 0x0000565452d53efb in main(int, char**) (argc=1, argv=0x7ffc303f4bf8) at ../../examples/all-clusters-app/linux/main.cpp:26


@erjiaqing erjiaqing changed the title CHIP Server Crash CHIP will crash when SessionExpired during waiting for ACK Mar 9, 2022
@erjiaqing erjiaqing changed the title CHIP will crash when SessionExpired during waiting for ACK CHIP will crash when session expired during waiting for ACK Mar 9, 2022
@kghost
Contributor

kghost commented Mar 9, 2022

This is triggered by closing the exchange multiple times.

The first close is a graceful close by the application, which does not clear the retrans table. Then, when the OnSessionClosed event arrives, the exchange sees that it is already closed and skips purging the retrans table, but it should purge the retrans table.

Possible solution:
Split the closed state into two states: Closed and Aborted.

  • Closed: the exchange is shut down gracefully. No more sends, no more receives, but packets in the retrans table will still be sent.
  • Aborted: the exchange is shut down completely. No more actions.
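
The two-state idea above can be sketched roughly as follows. This is only an illustration under assumptions, not the connectedhomeip code: `ExchangeState`, `Exchange`, `RetransEntry`, `RetransTable`, `hasSession`, and this version of `ExecuteActions` are hypothetical names made up for the sketch. The point it tries to show is that a graceful close leaves the retrans entries in place, while an abort (for example from a session-closed event) purges them even if the exchange was already closed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model; none of these types are the real CHIP classes.
enum class ExchangeState { Open, Closed, Aborted };

struct Exchange;

struct RetransEntry {
    Exchange* exchange;  // non-owning pointer to the exchange that sent the message
};

struct RetransTable {
    std::vector<RetransEntry> entries;

    // Remove every pending retransmission that belongs to `ec`.
    void Purge(const Exchange* ec) {
        entries.erase(std::remove_if(entries.begin(), entries.end(),
                                     [ec](const RetransEntry& e) { return e.exchange == ec; }),
                      entries.end());
    }
};

struct Exchange {
    ExchangeState state = ExchangeState::Open;
    bool hasSession = true;  // stand-in for a session holder that can become empty
    RetransTable* table;

    explicit Exchange(RetransTable* t) : table(t) {}

    // Graceful close: no new sends or receives, pending retransmissions stay queued.
    void Close() {
        if (state == ExchangeState::Open) state = ExchangeState::Closed;
    }

    // Abort: valid from Open or Closed, and always purges the retrans table.
    void Abort() {
        if (state == ExchangeState::Aborted) return;
        state = ExchangeState::Aborted;
        table->Purge(this);
    }

    // The session went away: anything still queued can no longer be sent.
    void OnSessionClosed() {
        hasSession = false;
        Abort();
    }
};

// Retransmission timer tick: returns how many entries it would resend.
std::size_t ExecuteActions(const RetransTable& table) {
    std::size_t resent = 0;
    for (const RetransEntry& e : table.entries) {
        // With the Closed/Aborted split, no entry should outlive its session.
        assert(e.exchange->hasSession);
        ++resent;
    }
    return resent;
}
```

In this sketch, calling `Close()` and then `OnSessionClosed()` on the same exchange still empties the table, which is the case the single closed state skips.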

@Kxuan
Contributor

Kxuan commented Mar 15, 2022

Same crash on ESP32.

backtrace:

I (327990) app-devicecallbacks: PostAttributeChangeCallback - Cluster ID: '0x0300', EndPoint ID: '0x01', Attribute ID: '0x0007'
I (328010) Light-Main: channel-1 44 => 44
I (328010) app-devicecallbacks: Current free heap: 107256
I (328020) chip[ZCL]: Color Temperature 189

abort() was called at PC 0x400e1089 on core 0
0x400e1089: chip::SessionHolder::Get() const at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/lib/support/CodeUtils.h:477
 (inlined by) chip::Optional<chip::ReferenceCountedHandle<chip::Transport::Session> >::Value() const & at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/lib/core/Optional.h:170
 (inlined by) chip::SessionHolder::Get() const at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/transport/SessionHolder.h:54



Backtrace:0x400819ca:0x3ffbd7400x40090799:0x3ffbd760 0x40096c1e:0x3ffbd780 0x400e1089:0x3ffbd7f0 0x400f5770:0x3ffbd810 0x401a79a9:0x3ffbd850 0x400f53a2:0x3ffbd880 0x400f54de:0x3ffbd8c0 0x400f6cb1:0x3ffbd8e0 0x400f6d14:0x3ffbd900 0x400fd252:0x3ffbd920 0x400d713d:0x3ffbd990 0x401ac84c:0x3ffbd9d0 0x40093841:0x3ffbd9f0
0x400819ca: panic_abort at /data/work/oppo/toolchains/esp-idf/components/esp_system/panic.c:402

0x40090799: esp_system_abort at /data/work/oppo/toolchains/esp-idf/components/esp_system/esp_system.c:128

0x40096c1e: abort at /data/work/oppo/toolchains/esp-idf/components/newlib/abort.c:46

0x400e1089: chip::SessionHolder::Get() const at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/lib/support/CodeUtils.h:477
 (inlined by) chip::Optional<chip::ReferenceCountedHandle<chip::Transport::Session> >::Value() const & at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/lib/core/Optional.h:170
 (inlined by) chip::SessionHolder::Get() const at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/transport/SessionHolder.h:54

0x400f5770: chip::Messaging::ExchangeContext::GetSessionHandle() const at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/messaging/ExchangeContext.h:163
 (inlined by) operator()<chip::Messaging::ReliableMessageMgr::RetransTableEntry> at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/messaging/ReliableMessageMgr.cpp:149
 (inlined by) Call at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/lib/support/Pool.h:126

0x401a79a9: chip::internal::StaticAllocatorBitmap::ForEachActiveObjectInner(void*, chip::Loop (*)(void*, void*)) at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/lib/support/Pool.cpp:97

0x400f53a2: chip::Messaging::ReliableMessageMgr::ExecuteActions() at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/lib/support/Pool.h:257
 (inlined by) chip::Messaging::ReliableMessageMgr::ExecuteActions() at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/messaging/ReliableMessageMgr.cpp:120

0x400f54de: chip::Messaging::ReliableMessageMgr::Timeout(chip::System::Layer*, void*) at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/messaging/ReliableMessageMgr.cpp:170 (discriminator 3)

0x400f6cb1: chip::System::TimerData::Callback::Invoke() const at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/system/SystemTimer.h:61
 (inlined by) chip::System::TimerPool<chip::System::TimerList::Node>::Invoke(chip::System::TimerList::Node*) at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/system/SystemTimer.h:224

0x400f6d14: chip::System::LayerImplLwIP::HandlePlatformTimer() at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/system/SystemLayerImplLwIP.cpp:139

0x400fd252: chip::DeviceLayer::Internal::GenericPlatformManagerImpl_FreeRTOS<chip::DeviceLayer::PlatformManagerImpl>::_RunEventLoop() at /data/work/oppo/matter/oppo-lamp/build/esp-idf/chip/../../../third_party/connectedhomeip/config/esp32/third_party/connectedhomeip/src/include/platform/internal/GenericPlatformManagerImpl_FreeRTOS.cpp:178

0x400d713d: chip::DeviceLayer::PlatformManager::RunEventLoop() at /data/work/oppo/matter/oppo-lamp/third_party/connectedhomeip/src/include/platform/PlatformManager.h:351
 (inlined by) app_main at /data/work/oppo/matter/oppo-lamp/main/main.cpp:146

0x401ac84c: main_task at /data/work/oppo/toolchains/esp-idf/components/freertos/port/port_common.c:129

0x40093841: vPortTaskWrapper at /data/work/oppo/toolchains/esp-idf/components/freertos/port/xtensa/port.c:131

@turon turon added the crash label Apr 5, 2022
@kghost
Contributor

kghost commented Apr 26, 2022

This should be fixed as a side effect of #16882 @erjiaqing
