Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

M5 ESP32 (all clusters app) : no resource for Read interaction when attempting to read from iOS chip tool app #14899

Closed
kean-apple opened this issue Feb 8, 2022 · 16 comments

Comments

@kean-apple
Copy link

kean-apple commented Feb 8, 2022

Problem

Hitting no resource for Read interaction errors when attempting to read Temp sensor on ESP32 M5 board using iOS chip tool app. This causes the iOS chip tool app to not display a value.

SHA: 2f785f4

Step to reproduce:

  1. Build iOS chip tool app
  2. In chiptool app, Pair M5 ESP32 board with all clusters app running ESP IDF v4.4
  3. Verify able to toggle light in chip tool app via Light on/off cluster option -> see LED toggle on M5
  4. Go to Temperature sensor, verify able to read temp value from M5 board. Manually change temp value on M5 board eg. devices->thermometer-> external->Thermometer->temperature->+/- . Hit refresh button on chip tool app and check it reflects new value

In step 4, I'm seeing this 50% of times I see follow errors and app doesn't reflect new value
I (100566) chip[EM]: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:5070081 on exchange 46738r
I (102716) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:5070082 on exchange 46739r
I (102726) chip[IM]: no resource for Read interaction
I (102736) chip[IN]: Prepared secure message 0x3ffca0a4 to 0xFEBB776F12FC11DA (1) of type 0x1 and protocolId (0, 1) on exchange 46739r with MessageCounter:4339786.
I (102746) chip[IN]: Sending encrypted msg 0x3ffca0a4 with MessageCounter:4339786 to 0xFEBB776F12FC11DA (1) at monotonic time: 101973 msec
I (102816) chip[EM]: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:5070083 on exchange 46739r

To recover, I need to reset the accessory and pair again

@kean-apple
Copy link
Author

@bzbarsky-apple
Copy link
Contributor

The default setup is that we can support up to 4 reads at once. See CHIP_IM_MAX_NUM_READ_HANDLER.

Now why you're ending up in a state where there are apparently that many in flight at once... I do see quite a number of reads in the log. Depending on what's being read we could just have a bunch of them not completed, or there could be another deeper issue here.

Enabling more verbose logging on the server so we could see what requests it's actually getting, and on the client so we can see what responses it got back, might be helpful...

@woody-apple
Copy link
Contributor

@bzbarsky-apple I assume we need some feedback, or queue here to handle a limit? (That said, when we hook this up to cached reads, that likely will help... short of not handling a large number in flight).

@bzbarsky-apple
Copy link
Contributor

The server is sending a RESOURCE_EXHAUSTED response, I would assume, but we don't usefully map that to a non-generic error in the Darwin framework right now.

That said, in this case it's really not clear to me why we end up in the bad state... Need more detailed logs.

@woody-apple
Copy link
Contributor

@kean-apple ?

@kean-apple
Copy link
Author

will need to find a way to enable more verbose logging on ESP32

@bzbarsky-apple
Copy link
Contributor

Need to compile so that ESP_LOGI and ESP_LOGV are not no-ops.

Or just change src/platform/ESP32/Logging.cpp to use ESP_LOGE for the Progress and Detail categories as a quick hack.

@kean-apple
Copy link
Author

@bzbarsky-apple I used the quick hack you mentioned..
Attached are the M5 debug logs
(2012295) all-clusters-app: Display awake but will switch off automatically in 30 seconds
E (2013715) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:10836690 on exchange 51371r
E (2013715) chip[EM]: Handling via exchange: 51371r, Delegate: 0x3ffcf4c8
E (2013725) chip[IM]: Received Read request
E (2013725) chip[DMG]: IM RH moving to [GeneratingReports]
E (2013925) chip[EM]: Sending Standalone Ack for MessageCounter:10836690 on exchange 51371r
E (2013925) chip[IN]: Prepared secure message 0x3ffe8694 to 0xDBB3E4A04234F04F (1) of type 0x10 and protocolId (0, 0) on exchange 51371r with MessageCounter:5542237.
E (2013945) chip[IN]: Sending encrypted msg 0x3ffe8694 with MessageCounter:5542237 to 0xDBB3E4A04234F04F (1) at monotonic time: 2013099 msec
E (2015045) chip[DIS]: Broadcasting mDns reply for query from 192.168.7.226
E (2023845) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:10836691 on exchange 51372r
E (2023855) chip[EM]: Handling via exchange: 51372r, Delegate: 0x3ffcf4c8
E (2023865) chip[IM]: Received Read request
E (2023865) chip[IM]: no resource for Read interaction
E (2023875) chip[EM]: Piggybacking Ack for MessageCounter:10836691 on exchange: 51372r
E (2023875) chip[IN]: Prepared secure message 0x3ffca5dc to 0xDBB3E4A04234F04F (1) of type 0x1 and protocolId (0, 1) on exchange 51372r with MessageCounter:5542238.
E (2023895) chip[IN]: Sending encrypted msg 0x3ffca5dc with MessageCounter:5542238 to 0xDBB3E4A04234F04F (1) at monotonic time: 2023049 msec
E (2023975) chip[EM]: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:10836692 on exchange 51372r
E (2023985) chip[EM]: Found matching exchange: 51372r, Delegate: 0x0
E (2023995) chip[EM]: Rxd Ack; Removing MessageCounter:5542238 from Retrans Table on exchange 51372r
E (2024005) chip[EM]: Remove
Uploading no read resoure-debug-logs-m5.txt…
d CHIP MessageCounter:5542238 from RetransTable on exchange 51372r
no-read-resouce-chiptool.txt

Tested with SHA: cfab82d

@z-michel
Copy link

z-michel commented Feb 10, 2022

I also see a very similar situation with my M5Stack when toggling the OnOff state via the M5stack display so decided to investigate more. Our controller is reading an attribute on the Basic Cluster periodically and that works fine until I toggle the state of the OnOff attribute using the buttons on the M5 stack itself (Device->Light Bulb->Endpoint: 1->Cluster: OnOff -> Attribute: OnOff -> Toggle). After that, all subsequent reads fail because it seems like something is borked in src/app/reporting/Engine.cpp. I'll provide my logs below, including some custom ones I threw in there. It looks like after the Switch state is toggled, mRunScheduled is never set back to false and all subsequent read requests fail here. Reads will continue to "silently" fail this way until the CHIP_IM_MAX_NUM_READ_HANDLER limit of 4 reads is hit and then we'll start seeing the "No resources for Read interaction" log message start appearing. I don't know this code very well so I could be off but that's certainly what appears to be happening.

Logs from doing this:
------- SUCCESSFUL READ -----------
E (59631) chip[EM]: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:6038030 on exchange 39839r
E (59631) chip[EM]: Found matching exchange: 39839r, Delegate: 0x0
E (59641) chip[EM]: Rxd Ack; Removing MessageCounter:9293456 from Retrans Table on exchange 39839r
E (59651) chip[EM]: Removed CHIP MessageCounter:9293456 from RetransTable on exchange 39839r
E (69521) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:6038031 on exchange 39840r
E (69521) chip[EM]: Handling via exchange: 39840r, Delegate: 0x3ffcf318
E (69531) chip[IM]: Received Read request
E (69531) chip[DMG]: IM RH moving to [GeneratingReports]
E (69541) chip[DMG]: ----- PRE SCHEDULE RUN -------
**E (69541) chip[DMG]: LayerImplLwIP::ScheduleWork, 1
E (69551) chip[DMG]: LayerImplLwIP::ScheduleWork, 2
E (69561) chip[DMG]: LayerImplLwIP::ScheduleWork, 3
E (69561) chip[DMG]: mRunScheduled!**
E (69571) chip[DMG]: ----- POST SCHEDULE RUN -------
E (69571) chip[DMG]: <RE:Run> Cluster 28, Attribute 1 is dirty
E (69581) chip[DMG]: Reading attribute: Cluster=0x0000_0028 Endpoint=0 AttributeId=0x0000_0001 (expanded=0)
E (69581) chip[DMG]: AccessControl: checking f=1 a=c s=0x73A5D360049A40D8 t= c=0x0000_0028 e=0 p=v
E (69601) chip[DMG]: AccessControl: overriding DENY (for now)
E (69611) chip[DMG]: <RE> Sending report (payload has 47 bytes)...
E (69611) chip[EM]: Piggybacking Ack for MessageCounter:6038031 on exchange: 39840r
E (69621) chip[IN]: Prepared secure message 0x3ffca074 to 0x73A5D360049A40D8 (1)  of type 0x5 and protocolId (0, 1) on exchange 39840r with MessageCounter:9293457.
E (69641) chip[IN]: Sending encrypted msg 0x3ffca074 with MessageCounter:9293457 to 0x73A5D360049A40D8 (1) at monotonic time: 68868 msec
E (69651) chip[DMG]: <RE> OnReportConfirm: NumReports = 0
E (69651) chip[DMG]: <RE> ReportsInFlight = 0 with readHandler 0, RE has no more messages
E (69661) chip[DMG]: IM RH moving to [AwaitingDestruction]
E (69671) chip[EM]: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:6038032 on exchange 39840r
E (69681) chip[EM]: Found matching exchange: 39840r, Delegate: 0x0
E (69681) chip[EM]: Rxd Ack; Removing MessageCounter:9293457 from Retrans Table on exchange 39840r
E (69691) chip[EM]: Removed CHIP MessageCounter:9293457 from RetransTable on exchange 39840r
I (73641) ROUTE_HOOK: Received RIO
I (73641) ROUTE_HOOK: prefix FD11:22:: lifetime 1800

------- Toggling the OnOff State via M5 Stack Display --------

**I (73851) all-clusters-app: Opening device list
I (74751) all-clusters-app: Opening device 1
I (75441) all-clusters-app: Opening endpoint 0
I (75981) all-clusters-app: Opening cluster 0
I (76471) all-clusters-app: Opening attribute 0
E (77001) chip[DMG]: Endpoint 1, Cluster 0x0000_0006 update version to 17dc5521
E (77001) chip[DMG]: LayerImplLwIP::ScheduleWork, 1
E (77001) chip[DMG]: LayerImplLwIP::ScheduleWork, 2
E (77011) chip[DMG]: LayerImplLwIP::ScheduleWork, 3
E (77011) chip[DMG]: mRunScheduled!
I (77021) app-devicecallbacks: PostAttributeChangeCallback - Cluster ID: '0x0006', EndPoint ID: '0x01', Attribute ID: '0x0000'**
I (77041) app-devicecallbacks: Current free heap: 64556


--------FAILED READ 1-----------
E (79551) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:6038033 on exchange 39841r
E (79551) chip[EM]: Handling via exchange: 39841r, Delegate: 0x3ffcf318
E (79561) chip[IM]: Received Read request
E (79571) chip[DMG]: IM RH moving to [GeneratingReports]
E (79581) chip[DMG]: ----- PRE SCHEDULE RUN -------
**E (79581) chip[DMG]: mRunScheduled already True!**
E (79581) chip[DMG]: ----- POST SCHEDULE RUN -------
E (79761) chip[EM]: Sending Standalone Ack for MessageCounter:6038033 on exchange 39841r
E (79771) chip[IN]: Prepared secure message 0x3ffe87b4 to 0x73A5D360049A40D8 (1)  of type 0x10 and protocolId (0, 0) on exchange 39841r with MessageCounter:9293458.
E (79781) chip[IN]: Sending encrypted msg 0x3ffe87b4 with MessageCounter:9293458 to 0x73A5D360049A40D8 (1) at monotonic time: 79010 msec

--------FAILED READ 2-----------
E (89581) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:6038034 on exchange 39842r
E (89591) chip[EM]: Handling via exchange: 39842r, Delegate: 0x3ffcf318
E (89601) chip[IM]: Received Read request
E (89601) chip[DMG]: IM RH moving to [GeneratingReports]
E (89611) chip[DMG]: ----- PRE SCHEDULE RUN -------
**E (89611) chip[DMG]: mRunScheduled already True!**
E (89611) chip[DMG]: ----- POST SCHEDULE RUN -------
E (89801) chip[EM]: Sending Standalone Ack for MessageCounter:6038034 on exchange 39842r
E (89801) chip[IN]: Prepared secure message 0x3ffe87b4 to 0x73A5D360049A40D8 (1)  of type 0x10 and protocolId (0, 0) on exchange 39842r with MessageCounter:9293459.
E (89811) chip[IN]: Sending encrypted msg 0x3ffe87b4 with MessageCounter:9293459 to 0x73A5D360049A40D8 (1) at monotonic time: 89045 msec

--------FAILED READ 3-----------
E (99621) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:6038035 on exchange 39843r
E (99621) chip[EM]: Handling via exchange: 39843r, Delegate: 0x3ffcf318
E (99631) chip[IM]: Received Read request
E (99631) chip[DMG]: IM RH moving to [GeneratingReports]
E (99641) chip[DMG]: ----- PRE SCHEDULE RUN -------
**E (99651) chip[DMG]: mRunScheduled already True!**
E (99651) chip[DMG]: ----- POST SCHEDULE RUN -------
E (99831) chip[EM]: Sending Standalone Ack for MessageCounter:6038035 on exchange 39843r
E (99831) chip[IN]: Prepared secure message 0x3ffe87b4 to 0x73A5D360049A40D8 (1)  of type 0x10 and protocolId (0, 0) on exchange 39843r with MessageCounter:9293460.
E (99851) chip[IN]: Sending encrypted msg 0x3ffe87b4 with MessageCounter:9293460 to 0x73A5D360049A40D8 (1) at monotonic time: 99078 msec
E (105561) chip[DIS]: Broadcasting mDns reply for query from 192.168.1.30


--------FAILED READ 4-----------
E (109691) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:6038036 on exchange 39844r
E (109691) chip[EM]: Handling via exchange: 39844r, Delegate: 0x3ffcf318
E (109711) chip[IM]: Received Read request
E (109711) chip[DMG]: IM RH moving to [GeneratingReports]
E (109721) chip[DMG]: ----- PRE SCHEDULE RUN -------
**E (109721) chip[DMG]: mRunScheduled already True!**
E (109721) chip[DMG]: ----- POST SCHEDULE RUN -------
E (109911) chip[EM]: Sending Standalone Ack for MessageCounter:6038036 on exchange 39844r
E (109911) chip[IN]: Prepared secure message 0x3ffe87b4 to 0x73A5D360049A40D8 (1)  of type 0x10 and protocolId (0, 0) on exchange 39844r with MessageCounter:9293461.
E (109921) chip[IN]: Sending encrypted msg 0x3ffe87b4 with MessageCounter:9293461 to 0x73A5D360049A40D8 (1) at monotonic time: 109155 msec
E (110061) chip[DIS]: Broadcasting mDns reply for query from 192.168.1.30


--------FAILED READ 5 (RESOURCE EXHAUSTION)-----------
E (119521) chip[EM]: Received message of type 0x2 with protocolId (0, 1) and MessageCounter:6038037 on exchange 39845r
E (119521) chip[EM]: Handling via exchange: 39845r, Delegate: 0x3ffcf318
E (119531) chip[IM]: Received Read request
**E (119531) chip[IM]: no resource for Read interaction**
E (119541) chip[EM]: Piggybacking Ack for MessageCounter:6038037 on exchange: 39845r
E (119551) chip[IN]: Prepared secure message 0x3ffca074 to 0x73A5D360049A40D8 (1)  of type 0x1 and protocolId (0, 1) on exchange 39845r with MessageCounter:9293462.
E (119561) chip[IN]: Sending encrypted msg 0x3ffca074 with MessageCounter:9293462 to 0x73A5D360049A40D8 (1) at monotonic time: 118792 msec
E (119591) chip[EM]: Received message of type 0x10 with protocolId (0, 0) and MessageCounter:6038038 on exchange 39845r
E (119591) chip[EM]: Found matching exchange: 39845r, Delegate: 0x0
E (119601) chip[EM]: Rxd Ack; Removing MessageCounter:9293462 from Retrans Table on exchange 39845r
E (119611) chip[EM]: Removed CHIP MessageCounter:9293462 from RetransTable on exchange 39845r
E

@bzbarsky-apple
Copy link
Contributor

Attached are the M5 debug logs

It looks like they did not quite succeed at attaching?

@z-michel Trying your situation now.

@z-michel
Copy link

z-michel commented Feb 11, 2022

@bzbarsky-apple FYI, I should have also mentioned that our Controller's SDK (and therefore the corresponding M5 stack's SDK) is based off SHA from about a week ago: 4267a9b

@bzbarsky-apple
Copy link
Contributor

@z-michel I suspect you are running into what I just filed as #15065....

@kean-apple
Copy link
Author

in the past 2 days this seems to be working better..not sure what fixed it

@dhrishi
Copy link
Contributor

dhrishi commented Feb 16, 2022

@kean-apple Boris added the fix #15066 (merged) for the issue he mentioned above. Probably that fixed it? Closing this for now. Feel free to re-open in case you see it again

@dhrishi dhrishi closed this as completed Feb 16, 2022
@kean-apple
Copy link
Author

kean-apple commented Feb 16, 2022

Agreed. Not seeing this issue anymore

@bzbarsky-apple
Copy link
Contributor

Looking at the original steps here, it does look like we could have ended up in the racy state, mis-set the boolean, then never completed any of our reads until we ran out of ReadHandlers.... So yes, fixing the race could have fixed this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants