Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.0.9 - Enviro Urban - Exception while uploading - Caused board to hang fro 2hrs #119

Open
dave-ct opened this issue Nov 26, 2022 · 76 comments
Labels
bug Something isn't working

Comments

@dave-ct
Copy link
Contributor

dave-ct commented Nov 26, 2022

@ZodiusInfuser have tried 0.0.9 on the Urban today (also on 2 x enviro indoors which dont seem to have any issues so far). On the Urban it worked for about 4hrs then stopped. Looking at the logs an error occured while uploading vi mqtt which then tried to go back to sleep but looks like it failed (last entry at 15:20) as the red LED was flashing probably for about two hours and this was at about 17:00. When I bought it in took a picture before disconnecting the battery as also seems the white LED is partially on.

Log Files

2022-11-26 15:05:16 [info     / 118kB] > uploading cached readings to MQTT broker: atmos.local
2022-11-26 15:05:16 [info     / 110kB]   - uploaded 2022-11-26T14_55_09Z.json
2022-11-26 15:05:17 [info     / 103kB]   - uploaded 2022-11-26T15_00_09Z.json
2022-11-26 15:05:17 [info     /  97kB]   - uploaded 2022-11-26T15_05_09Z.json
2022-11-26 15:05:17 [info     /  95kB] > going to sleep
2022-11-26 15:05:17 [debug    /  93kB]   - clearing and disabling previous alarm
2022-11-26 15:05:17 [info     /  90kB]   - setting alarm to wake at 15:10pm
2022-11-26 15:05:17 [info     /  88kB]   - shutting down
2022-11-26 15:10:03 [debug    / 115kB] > performing startup
2022-11-26 15:10:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-11-26 15:10:03 [debug    / 121kB]   - turn on activity led
2022-11-26 15:10:03 [debug    / 119kB] > 102 blocks free out of 212
2022-11-26 15:10:03 [debug    / 116kB] > taking new reading
2022-11-26 15:10:03 [info     / 112kB]   - seconds since last reading: 300
2022-11-26 15:10:03 [debug    / 109kB]   - starting sensor
2022-11-26 15:10:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-11-26 15:10:08 [debug    /  88kB]   - taking pms5003i reading
2022-11-26 15:10:08 [debug    / 121kB]   - taking microphone reading
2022-11-26 15:10:09 [debug    / 122kB] > caching reading for upload
2022-11-26 15:10:09 [info     / 117kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-11-26 15:10:09 [info     / 115kB] > going to sleep
2022-11-26 15:10:09 [debug    / 112kB]   - clearing and disabling previous alarm
2022-11-26 15:10:09 [info     / 110kB]   - setting alarm to wake at 15:15pm
2022-11-26 15:10:10 [info     /  94kB]   - shutting down
2022-11-26 15:15:03 [debug    / 115kB] > performing startup
2022-11-26 15:15:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-11-26 15:15:03 [debug    / 121kB]   - turn on activity led
2022-11-26 15:15:03 [debug    / 119kB] > 103 blocks free out of 212
2022-11-26 15:15:03 [debug    / 116kB] > taking new reading
2022-11-26 15:15:03 [info     / 112kB]   - seconds since last reading: 300
2022-11-26 15:15:03 [debug    / 109kB]   - starting sensor
2022-11-26 15:15:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-11-26 15:15:09 [debug    /  88kB]   - taking pms5003i reading
2022-11-26 15:15:09 [debug    / 121kB]   - taking microphone reading
2022-11-26 15:15:09 [debug    / 120kB] > caching reading for upload
2022-11-26 15:15:09 [info     / 115kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-11-26 15:15:09 [info     / 113kB] > going to sleep
2022-11-26 15:15:09 [debug    / 111kB]   - clearing and disabling previous alarm
2022-11-26 15:15:10 [info     / 109kB]   - setting alarm to wake at 15:20pm
2022-11-26 15:15:10 [info     / 107kB]   - shutting down
2022-11-26 15:20:03 [debug    / 115kB] > performing startup
2022-11-26 15:20:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-11-26 15:20:03 [debug    / 121kB]   - turn on activity led
2022-11-26 15:20:03 [debug    / 119kB] > 102 blocks free out of 212
2022-11-26 15:20:03 [debug    / 116kB] > taking new reading
2022-11-26 15:20:03 [info     / 112kB]   - seconds since last reading: 300
2022-11-26 15:20:03 [debug    / 109kB]   - starting sensor
2022-11-26 15:20:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-11-26 15:20:08 [debug    /  88kB]   - taking pms5003i reading
2022-11-26 15:20:08 [debug    / 121kB]   - taking microphone reading
2022-11-26 15:20:09 [debug    / 121kB] > caching reading for upload
2022-11-26 15:20:09 [info     / 116kB] > 3 cache file(s) need uploading
2022-11-26 15:20:09 [info     / 113kB] > connecting to wifi network 'VM2878914'
2022-11-26 15:20:15 [info     /  91kB]   - ip address:  192.168.0.73
2022-11-26 15:20:16 [info     / 118kB] > uploading cached readings to MQTT broker: atmos.local
2022-11-26 15:20:23 [debug    /  88kB]   - an exception occurred when uploading
2022-11-26 15:20:23 [error    /  86kB]   ! failed to upload '2022-11-26T15_10_09Z.json' to mqtt
2022-11-26 15:20:23 [error    /  84kB] ! reading upload failed
2022-11-26 15:20:23 [info     / 121kB] > going to sleep
2000-01-01 00:00:04 [debug    / 115kB] > performing startup
2000-01-01 00:00:04 [info     / 123kB]   - wake reason: usb_powered
2000-01-01 00:00:04 [debug    / 121kB]   - turn on activity led
2000-01-01 00:00:04 [info     / 119kB] > clock not set, synchronise from ntp server

Photo of board LEDs when I bought it in (red LED was flashing and white LED was part lit).

IMG_3167

@ZodiusInfuser
Copy link
Member

Thank you for raising this. It looks like it got stuck during the 2022-11-26 15:20:23 [info / 121kB] > going to sleep phase. As such, those two LEDs stayed on which probably drained your battery, hence the logs once you plugged USB in having the default time.

Why it got stuck during that phase, I do not know yet. I'll fire my own Enviro Urban up and see if I can reproduce.

@dave-ct
Copy link
Contributor Author

dave-ct commented Nov 27, 2022

Ah, the logs time reset to zero as I unplugged the battery before connecting to USB to view the logs.

@ZodiusInfuser
Copy link
Member

I had my own Enviro Urban lock up overnight (solid white light, but no red light). Unfortunately I needed it to be logging again so just reset it back to life. Hopefully the issue with appear again.

@ZodiusInfuser
Copy link
Member

Just to update that I am still investigating this. It's a hard one to diagnose, as it's so infrequent, even when logging every minute.

Best I can gather though, it's not an issue with the board going to sleep, but rather the board waking up. We do some shenanigans during bootup before Micropython kicks in, and somewhere in that is the issue. Likely in this file, if you're curious https://github.com/pimoroni/pimoroni-pico/blob/main/micropython/_board/picow_enviro/wakeup_gpio.patch

@TechWilk
Copy link

TechWilk commented Nov 30, 2022

I too had a solid white light on enviro indoor a day or two back, but after power cycling several hours later (having spotted the issue) the board has been working fine since.
I didn't get around to saving the logs before they rolled over, but sounds related.

I'm using a custom webhook rather than mqtt, if that matters.

@ZodiusInfuser
Copy link
Member

@TechWilk Thanks for the confirmation. I have not experienced this in Enviro Indoor myself yet, which has been running fine for over a week. Only issue I had was the RTC slowly going out of sync, causing an issue with Adafruit.io uploads.

This makes me wonder if it's the underlying micropython version at fault, as my Enviro Indoor and Weather are using v1.19.7 as their base, with the v0.0.9 enviro firmware onboard. Whereas my Enviro Grow and Urban are v1.19.10 with v0.0.9. 🤔

@ZodiusInfuser ZodiusInfuser added the bug Something isn't working label Dec 1, 2022
@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 3, 2022

Have been to a little experiment (2 x urbans and 2 indoors) all running at readings every 2 minutes and uploading evey 3 readings. I wanted to see how the readings varied from them when they are all in the same conditions. After about 3 hrs one urban has experienced the issue.

IMG-3183

Again has the red light and white LED on. Note this is a differnet urban to the original issue that was raised. To get it to work again had to remove power and the seems OK again. Will continue to monitor.

IMG-3192

They all running indentical 0.0.9 https://github.com/pimoroni/enviro/releases/download/v0.0.9/pimoroni-picow_enviro-v1.19.10-micropython-v0.0.9.uf2

The only thing differnet to v0.0.9 is I am sending over secure mqtt #122 on these ones but that should not have any impact.

Logs File see events at 14:48:

2022-12-03 14:42:03 [debug    / 115kB] > performing startup
2022-12-03 14:42:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:42:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:42:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-03 14:42:03 [debug    / 116kB] > taking new reading
2022-12-03 14:42:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:42:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:42:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:42:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-03 14:42:09 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:42:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:42:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-03 14:42:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-03 14:42:14 [info     /  94kB]   - ip address:  192.168.0.238
2022-12-03 14:42:14 [info     / 118kB] > uploading cached readings to MQTT broker: <removed>
2022-12-03 14:42:23 [info     / 103kB]   - uploaded 2022-12-03T14_38_09Z.json
2022-12-03 14:42:32 [info     / 113kB]   - uploaded 2022-12-03T14_40_09Z.json
2022-12-03 14:42:42 [info     / 102kB]   - uploaded 2022-12-03T14_42_09Z.json
2022-12-03 14:42:42 [info     / 100kB] > going to sleep
2022-12-03 14:42:42 [debug    /  97kB]   - clearing and disabling previous alarm
2022-12-03 14:42:42 [info     /  95kB]   - setting alarm to wake at 14:44pm
2022-12-03 14:42:42 [info     /  93kB]   - shutting down
2022-12-03 14:44:03 [debug    / 115kB] > performing startup
2022-12-03 14:44:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:44:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:44:03 [debug    / 118kB] > 100 blocks free out of 212
2022-12-03 14:44:03 [debug    / 116kB] > taking new reading
2022-12-03 14:44:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:44:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:44:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:44:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-03 14:44:09 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:44:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:44:09 [info     / 114kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-03 14:44:09 [info     / 112kB] > going to sleep
2022-12-03 14:44:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-03 14:44:09 [info     / 107kB]   - setting alarm to wake at 14:46pm
2022-12-03 14:44:10 [info     / 105kB]   - shutting down
2022-12-03 14:46:03 [debug    / 115kB] > performing startup
2022-12-03 14:46:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:46:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:46:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-03 14:46:03 [debug    / 116kB] > taking new reading
2022-12-03 14:46:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:46:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:46:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:46:08 [debug    / 123kB]   - taking pms5003i reading
2022-12-03 14:46:08 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:46:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:46:09 [info     / 114kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-03 14:46:09 [info     / 112kB] > going to sleep
2022-12-03 14:46:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-03 14:46:09 [info     / 108kB]   - setting alarm to wake at 14:48pm
2022-12-03 14:46:09 [info     / 105kB]   - shutting down
2022-12-03 14:48:03 [debug    / 115kB] > performing startup
2022-12-03 14:48:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-03 14:48:03 [debug    / 121kB]   - turn on activity led
2022-12-03 14:48:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-03 14:48:03 [debug    / 116kB] > taking new reading
2022-12-03 14:48:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-03 14:48:03 [debug    / 109kB]   - starting sensor
2022-12-03 14:48:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-03 14:48:08 [debug    / 123kB]   - taking pms5003i reading
2022-12-03 14:48:08 [debug    / 120kB]   - taking microphone reading
2022-12-03 14:48:09 [debug    / 119kB] > caching reading for upload
2022-12-03 14:48:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-03 14:48:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-03 14:48:15 [info     /  90kB]   - ip address:  192.168.0.238
2022-12-03 14:48:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-03 14:48:34 [debug    /  75kB]   - an exception occurred when uploading
2022-12-03 14:48:35 [error    /  73kB]   ! failed to upload '2022-12-03T14_44_09Z.json' to mqtt
2022-12-03 14:48:35 [error    /  71kB] ! reading upload failed
2022-12-03 14:48:35 [info     /  69kB] > going to sleep

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 4, 2022

This moring had 1 x indoor and 1 x urban in a failed state with the red and white light on.

IMG-3195

For the Urban it looks like once one of the uploads failed it stopped, from the logs doe snot look like it even tried to go back to sleep, see entries at 09:08. AT 10:25 was when I removed the power and plugged in to USB to get logs:

 2022-12-04 09:06:09 [info     / 112kB] > going to sleep
2022-12-04 09:06:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-04 09:06:09 [info     / 107kB]   - setting alarm to wake at 09:08am
2022-12-04 09:06:10 [info     / 105kB]   - shutting down
2022-12-04 09:08:03 [debug    / 115kB] > performing startup
2022-12-04 09:08:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 09:08:03 [debug    / 121kB]   - turn on activity led
2022-12-04 09:08:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-04 09:08:03 [debug    / 116kB] > taking new reading
2022-12-04 09:08:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-04 09:08:03 [debug    / 109kB]   - starting sensor
2022-12-04 09:08:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-04 09:08:08 [debug    / 123kB]   - taking pms5003i reading
2022-12-04 09:08:08 [debug    / 120kB]   - taking microphone reading
2022-12-04 09:08:09 [debug    / 119kB] > caching reading for upload
2022-12-04 09:08:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-04 09:08:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-04 09:08:15 [info     /  90kB]   - ip address:  192.168.0.238
2022-12-04 09:08:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-04 09:08:21 [info     / 109kB]   - uploaded 2022-12-04T09_04_09Z.json
2022-12-04 09:08:44 [info     / 106kB]   - uploaded 2022-12-04T09_06_09Z.json
2022-12-04 09:08:58 [debug    /  96kB]   - an exception occurred when uploading
2022-12-04 09:08:58 [error    /  94kB]   ! failed to upload '2022-12-04T09_08_09Z.json' to mqtt
2022-12-04 09:08:58 [error    /  92kB] ! reading upload failed
2022-12-04 10:25:33 [debug    / 115kB] > performing startup
2022-12-04 10:25:33 [info     / 123kB]   - wake reason: usb_powered
2022-12-04 10:25:33 [debug    / 121kB]   - turn on activity led
2022-12-04 10:25:33 [debug    / 118kB] > 99 blocks free out of 212 

For the Indoor the logs are slightly differnet but again had to be powered off to get the logs and kick it back into life. See time 06:00 but this one did appear to try to go to sleep after a uplaod failure, the entries after that are when I powered off and plugged back into USB to get the logs.

2022-12-04 05:58:03 [debug    / 115kB] > performing startup
2022-12-04 05:58:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 05:58:03 [debug    / 121kB]   - turn on activity led
2022-12-04 05:58:03 [debug    / 119kB] > 99 blocks free out of 212
2022-12-04 05:58:03 [debug    / 117kB] > taking new reading
2022-12-04 05:58:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-04 05:58:04 [debug    / 107kB] > caching reading for upload
2022-12-04 05:58:04 [info     / 102kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-04 05:58:04 [info     / 100kB] > going to sleep
2022-12-04 05:58:04 [debug    /  98kB]   - clearing and disabling previous alarm
2022-12-04 05:58:04 [info     /  95kB]   - setting alarm to wake at 06:00am
2022-12-04 05:58:04 [info     /  93kB]   - shutting down
2022-12-04 06:00:03 [debug    / 115kB] > performing startup
2022-12-04 06:00:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 06:00:03 [debug    / 121kB]   - turn on activity led
2022-12-04 06:00:03 [debug    / 119kB] > 99 blocks free out of 212
2022-12-04 06:00:03 [debug    / 117kB] > taking new reading
2022-12-04 06:00:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-04 06:00:04 [debug    / 107kB] > caching reading for upload
2022-12-04 06:00:04 [info     / 102kB] > 3 cache file(s) need uploading
2022-12-04 06:00:04 [info     / 100kB] > connecting to wifi network '<removed>'
2022-12-04 06:00:10 [info     /  96kB]   - ip address:  192.168.0.181
2022-12-04 06:00:11 [info     / 119kB] > uploading cached readings to MQTT broker: <removed>
2022-12-04 06:00:18 [debug    /  88kB]   - an exception occurred when uploading
2022-12-04 06:00:18 [error    /  86kB]   ! failed to upload '2022-12-04T05_56_04Z.json' to mqtt
2022-12-04 06:00:18 [error    /  84kB] ! reading upload failed
2022-12-04 06:00:18 [info     /  82kB] > going to sleep
2022-12-04 06:00:18 [debug    /  80kB]   - clearing and disabling previous alarm
2022-12-04 06:00:18 [info     /  78kB]   - setting alarm to wake at 06:02am
2000-01-01 00:00:05 [debug    / 115kB] > performing startup
2000-01-01 00:00:05 [info     / 123kB]   - wake reason: usb_powered
2000-01-01 00:00:05 [debug    / 121kB]   - turn on activity led

The common theme seems to be that it only occurs after an upload failure. So perhaps might be able to simulate by disabling the remote destination (turn mqtt f) and see if it forces the error. On the logs for both also never shows a "shutting down" down entry so so not sure if it due to not waking up or it is just not shutting down properly after a upload failure.

Note that until these failures they had been uploading fine taking readings evey 2 minutes and uploading every 3 readings.

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

Indoor & urban had the same issue again today:

IMG-3207

For the Indoor this may be down to the battery level this time which was showing about 4% with my power reader but volatage should still be in tolerance. Urban had 84% battery so dont think be a reason this time,.

IMG-3208

Indoor Logs, see 05:55. Last log entry is when i disconnected battery and reconnected on USB to get logs.

2022-12-05 05:52:03 [debug    / 115kB] > performing startup
2022-12-05 05:52:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 05:52:03 [debug    / 121kB]   - turn on activity led
2022-12-05 05:52:03 [debug    / 119kB] > 99 blocks free out of 212
2022-12-05 05:52:03 [debug    / 117kB] > taking new reading
2022-12-05 05:52:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 05:52:04 [debug    / 107kB] > caching reading for upload
2022-12-05 05:52:04 [info     / 102kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-05 05:52:04 [info     / 100kB] > going to sleep
2022-12-05 05:52:04 [debug    /  98kB]   - clearing and disabling previous alarm
2022-12-05 05:52:04 [info     /  96kB]   - setting alarm to wake at 05:54am
2022-12-05 05:52:04 [info     /  94kB]   - shutting down
2022-12-05 05:54:03 [debug    / 115kB] > performing startup
2022-12-05 05:54:03 [info     / 108kB]   - wake reason: rtc_alarm
2022-12-05 05:54:03 [debug    / 106kB]   - turn on activity led
2022-12-05 05:54:03 [debug    / 104kB] > 100 blocks free out of 212
2022-12-05 05:54:03 [debug    / 102kB] > taking new reading
2022-12-05 05:54:04 [info     /  97kB]   - seconds since last reading: 121
2022-12-05 05:54:04 [debug    /  92kB] > caching reading for upload
2022-12-05 05:54:04 [info     / 122kB] > 3 cache file(s) need uploading
2022-12-05 05:54:04 [info     / 120kB] > connecting to wifi network <removed>
2022-12-05 05:54:10 [info     /  97kB]   - ip address:  192.168.0.181
2022-12-05 05:54:11 [info     / 119kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 05:54:16 [info     / 109kB]   - uploaded 2022-12-05T05_50_04Z.json
2022-12-05 05:55:48 [debug    /  50kB]   - an exception occurred when uploading
2022-12-05 05:55:48 [error    /  48kB]   ! failed to upload '2022-12-05T05_52_04Z.json' to mqtt
2022-12-05 05:55:48 [error    /  46kB] ! reading upload failed
2022-12-05 05:55:48 [info     /  44kB] > going to sleep
2022-12-05 05:55:48 [debug    /  42kB]   - clearing and disabling previous alarm
2000-01-01 00:00:02 [debug    / 115kB] > performing startup

Urban logs, see 06:48. Have included the previosu logs so can see it worked before. Last log entry is when i disconnected battery and reconnected on USB to get logs.

2022-12-05 06:42:03 [debug    / 115kB] > performing startup
2022-12-05 06:42:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:42:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:42:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 06:42:03 [debug    / 116kB] > taking new reading
2022-12-05 06:42:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 06:42:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:42:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 06:42:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 06:42:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 06:42:09 [debug    / 119kB] > caching reading for upload
2022-12-05 06:42:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 06:42:09 [info     / 112kB] > connecting to wifi network <removed>
2022-12-05 06:42:15 [info     /  89kB]   - ip address:  192.168.0.236
2022-12-05 06:42:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 06:42:24 [info     / 101kB]   - uploaded 2022-12-05T06_38_09Z.json
2022-12-05 06:42:29 [info     / 111kB]   - uploaded 2022-12-05T06_40_10Z.json
2022-12-05 06:42:33 [info     / 109kB]   - uploaded 2022-12-05T06_42_09Z.json
2022-12-05 06:42:33 [info     / 107kB] > going to sleep
2022-12-05 06:42:33 [debug    / 105kB]   - clearing and disabling previous alarm
2022-12-05 06:42:34 [info     / 102kB]   - setting alarm to wake at 06:44am
2022-12-05 06:42:34 [info     / 100kB]   - shutting down
2022-12-05 06:44:03 [debug    / 115kB] > performing startup
2022-12-05 06:44:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:44:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:44:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 06:44:03 [debug    / 116kB] > taking new reading
2022-12-05 06:44:03 [info     / 111kB]   - seconds since last reading: 120
2022-12-05 06:44:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:44:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 06:44:09 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 06:44:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 06:44:09 [debug    / 119kB] > caching reading for upload
2022-12-05 06:44:09 [info     / 114kB] > 1 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-05 06:44:09 [info     / 112kB] > going to sleep
2022-12-05 06:44:09 [debug    / 110kB]   - clearing and disabling previous alarm
2022-12-05 06:44:09 [info     / 108kB]   - setting alarm to wake at 06:46am
2022-12-05 06:44:10 [info     / 105kB]   - shutting down
2022-12-05 06:46:03 [debug    / 115kB] > performing startup
2022-12-05 06:46:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:46:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:46:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 06:46:03 [debug    / 116kB] > taking new reading
2022-12-05 06:46:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 06:46:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:46:04 [debug    /  92kB]   - wait 5 seconds for airflow
2022-12-05 06:46:09 [debug    / 106kB]   - taking pms5003i reading
2022-12-05 06:46:09 [debug    / 104kB]   - taking microphone reading
2022-12-05 06:46:09 [debug    / 102kB] > caching reading for upload
2022-12-05 06:46:09 [info     /  97kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-05 06:46:10 [info     /  95kB] > going to sleep
2022-12-05 06:46:10 [debug    /  93kB]   - clearing and disabling previous alarm
2022-12-05 06:46:10 [info     /  90kB]   - setting alarm to wake at 06:48am
2022-12-05 06:46:10 [info     /  88kB]   - shutting down
2022-12-05 06:48:03 [debug    / 115kB] > performing startup
2022-12-05 06:48:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 06:48:03 [debug    / 121kB]   - turn on activity led
2022-12-05 06:48:03 [debug    / 118kB] > 100 blocks free out of 212
2022-12-05 06:48:03 [debug    / 116kB] > taking new reading
2022-12-05 06:48:03 [info     / 111kB]   - seconds since last reading: 120
2022-12-05 06:48:03 [debug    / 109kB]   - starting sensor
2022-12-05 06:48:03 [debug    / 106kB]   - wait 5 seconds for airflow
2022-12-05 06:48:09 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 06:48:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 06:48:09 [debug    / 119kB] > caching reading for upload
2022-12-05 06:48:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 06:48:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-05 06:48:15 [info     /  89kB]   - ip address:  192.168.0.236
2022-12-05 06:48:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 06:48:35 [debug    /  74kB]   - an exception occurred when uploading
2022-12-05 06:48:35 [error    /  72kB]   ! failed to upload '2022-12-05T06_44_09Z.json' to mqtt
2022-12-05 06:48:35 [error    /  70kB] ! reading upload failed
2000-01-01 00:00:04 [debug    / 115kB] > performing startup

@ZodiusInfuser
Copy link
Member

Hi @dave-ct, Thank you so much for this thorough testing!

I can clearly see there being some correlation between the mqtt upload failing and the boards locking up, and it does seem to be during shutdown rather than startup (meaning this may be is a different issue to the one I and others have reported, which is definitely during startup).

I will have a look in the code and see if there is something specific to do with uploads failing that would cause this kind of lock-up. Hopefully I can reproduce without setting up an mqtt server.

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

@ZodiusInfuser if there is any extra logging/tests you want me to try let me know. Both Urbans have failed today just now with the same issue on after a upload failure, see another example below at 11:24 and then connected to USB at 12:04. .

2022-12-05 11:24:03 [debug    / 115kB] > performing startup
2022-12-05 11:24:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 11:24:03 [debug    / 121kB]   - turn on activity led
2022-12-05 11:24:03 [debug    / 118kB] > 100 blocks free out of 212
2022-12-05 11:24:03 [debug    / 116kB] > taking new reading
2022-12-05 11:24:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 11:24:03 [debug    / 109kB]   - starting sensor
2022-12-05 11:24:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 11:24:09 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 11:24:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 11:24:09 [debug    / 119kB] > caching reading for upload
2022-12-05 11:24:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 11:24:09 [info     / 111kB] > connecting to wifi network '<removed>'
2022-12-05 11:24:15 [info     /  89kB]   - ip address:  192.168.0.236
2022-12-05 11:24:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 11:24:27 [info     / 101kB]   - uploaded 2022-12-05T11_20_09Z.json
2022-12-05 11:24:39 [debug    / 100kB]   - an exception occurred when uploading
2022-12-05 11:24:39 [error    /  98kB]   ! failed to upload '2022-12-05T11_22_09Z.json' to mqtt
2022-12-05 11:24:39 [error    /  96kB] ! reading upload failed
2022-12-05 11:24:39 [info     /  94kB] > going to sleep
2022-12-05 11:24:39 [debug    /  92kB]   - clearing and disabling previous alarm
2022-12-05 12:04:27 [debug    / 115kB] > performing startup

@ZodiusInfuser
Copy link
Member

@dave-ct to help with debugging, could you update the local mqtt.py file on your boards from this:

from enviro import logging
from enviro.constants import UPLOAD_SUCCESS, UPLOAD_FAILED
from enviro.mqttsimple import MQTTClient
import ujson
import config

def log_destination():
  logging.info(f"> uploading cached readings to MQTT broker: {config.mqtt_broker_address}")

def upload_reading(reading):
  server = config.mqtt_broker_address
  username = config.mqtt_broker_username
  password = config.mqtt_broker_password
  nickname = reading["nickname"]

  try:
    # attempt to publish reading
    mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60)
    mqtt_client.connect()
    mqtt_client.publish(f"enviro/{nickname}", ujson.dumps(reading), retain=True)
    mqtt_client.disconnect()
    return UPLOAD_SUCCESS
  except:
    logging.debug(f"  - an exception occurred when uploading")

  return UPLOAD_FAILED

to this:

from enviro import logging
from enviro.constants import UPLOAD_SUCCESS, UPLOAD_FAILED
from enviro.mqttsimple import MQTTClient
import ujson
import config

def log_destination():
  logging.info(f"> uploading cached readings to MQTT broker: {config.mqtt_broker_address}")

def upload_reading(reading):
  server = config.mqtt_broker_address
  username = config.mqtt_broker_username
  password = config.mqtt_broker_password
  nickname = reading["nickname"]

  try:
    # attempt to publish reading
    mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60)
    mqtt_client.connect()
    mqtt_client.publish(f"enviro/{nickname}", ujson.dumps(reading), retain=True)
    mqtt_client.disconnect()
    return UPLOAD_SUCCESS
  except Exception as exc:
    import sys, io
    buf = io.StringIO()
    sys.print_exception(exc, buf)
    logging.debug(f"  - an exception occurred when uploading.", buf.getvalue())

  return UPLOAD_FAILED

This will log the actual exception that occurs when the MQTT broker fails.

Meanwhile I have set up an Enviro Urban to log and send data to a MQTT broker that doesn't exist, to see if I can get some reproduction. This always causes an exception to occur (red flashing light), but is perhaps not the same exception you are experiencing, hence the need for extra logging.

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

@ZodiusInfuser Just wanted to confirm, its is nothing to do with the secure mqtt I have setup as my urban which is physically outside also failed today and this just uses the default mqtt destination with no changes. Seems like the same issue at 23:05.

2022-12-04 23:00:03 [debug    / 115kB] > performing startup
2022-12-04 23:00:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 23:00:03 [debug    / 121kB]   - turn on activity led
2022-12-04 23:00:03 [debug    / 119kB] > 100 blocks free out of 212
2022-12-04 23:00:03 [debug    / 116kB] > taking new reading
2022-12-04 23:00:03 [info     / 112kB]   - seconds since last reading: 300
2022-12-04 23:00:03 [debug    / 109kB]   - starting sensor
2022-12-04 23:00:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-04 23:00:08 [debug    /  88kB]   - taking pms5003i reading
2022-12-04 23:00:08 [debug    / 121kB]   - taking microphone reading
2022-12-04 23:00:09 [debug    / 122kB] > caching reading for upload
2022-12-04 23:00:09 [info     / 117kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2022-12-04 23:00:09 [info     / 115kB] > going to sleep
2022-12-04 23:00:09 [debug    / 112kB]   - clearing and disabling previous alarm
2022-12-04 23:00:09 [info     / 110kB]   - setting alarm to wake at 23:05pm
2022-12-04 23:00:09 [info     / 108kB]   - shutting down
2022-12-04 23:05:03 [debug    / 115kB] > performing startup
2022-12-04 23:05:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-04 23:05:03 [debug    / 121kB]   - turn on activity led
2022-12-04 23:05:03 [debug    / 119kB] > 100 blocks free out of 212
2022-12-04 23:05:03 [debug    / 116kB] > taking new reading
2022-12-04 23:05:03 [info     / 112kB]   - seconds since last reading: 300
2022-12-04 23:05:03 [debug    / 109kB]   - starting sensor
2022-12-04 23:05:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-04 23:05:08 [debug    /  88kB]   - taking pms5003i reading
2022-12-04 23:05:08 [debug    / 121kB]   - taking microphone reading
2022-12-04 23:05:09 [debug    / 121kB] > caching reading for upload
2022-12-04 23:05:09 [info     / 116kB] > 3 cache file(s) need uploading
2022-12-04 23:05:09 [info     / 114kB] > connecting to wifi network '<removed>'
2022-12-04 23:05:15 [info     /  91kB]   - ip address:  192.168.0.73
2022-12-04 23:05:16 [info     / 118kB] > uploading cached readings to MQTT broker: <removed>
2022-12-04 23:05:35 [debug    /  85kB]   - an exception occurred when uploading
2022-12-04 23:05:35 [error    /  83kB]   ! failed to upload '2022-12-04T22_55_09Z.json' to mqtt
2022-12-04 23:05:35 [error    /  81kB] ! reading upload failed
2022-12-05 12:18:12 [debug    / 115kB] > performing startup
2022-12-05 12:18:12 [info     / 123kB]   - wake reason: usb_powered
2022-12-05 12:18:12 [debug    / 121kB]   - turn on activity led
2022-12-05 12:18:12 [debug    / 118kB] > 101 blocks free out of 212
2022-12-05 12:18:12 [debug    / 116kB] > taking new reading
2022-12-05 12:18:12 [info     / 112kB]   - seconds since last reading: 47589
2022-12-05 12:18:12 [debug    / 109kB]   - starting sensor
2022-12-05 12:18:12 [debug    / 107kB]   - wait 5 seconds for airflow

I will add the extra logging you mentioned above on the urbans and indoors to see if I can gte more details. Altough sometimes it seems lik eit gets a bit further than othe rother times based on the log messages.

@ZodiusInfuser
Copy link
Member

Thanks for that. The fact it fails at slightly different points after the exception could be a red herring, in that it's either still the exception at fault, or it's not the exception at all but some other factor. If the latter, then I have no idea how to begin diagnosing that! 😢

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

@ZodiusInfuser as these four boards all need ot use secure mqtt have modfied it slightly but should captcher the results as per the logging you suggested.

This is the mqtt.py I am using with the extra logging added in.

from enviro import logging
from enviro.constants import UPLOAD_SUCCESS, UPLOAD_FAILED
from enviro.mqttsimple import MQTTClient
import ujson
import config

def log_destination():
  logging.info(f"> uploading cached readings to MQTT broker: {config.mqtt_broker_address}")

def upload_reading(reading):
  server = config.mqtt_broker_address
  username = config.mqtt_broker_username
  password = config.mqtt_broker_password
  nickname = reading["nickname"]
  
  try:
    # attempt to publish reading
    
    ##### SSL Change Start
    # two options, with or without SSL
    if config.mqtt_broker_ca_file:
      f = open("ca.crt")
      ssl_data = f.read()
      f.close()
      mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60,
                               ssl=True, ssl_params={'cert': ssl_data})
      print('Using SSL')
    else:
    ##### SSL Change finish so do what the non SSL version does instead
      mqtt_client = MQTTClient(reading["uid"], server, user=username, password=password, keepalive=60)
      print('not ssl')
    mqtt_client.connect()
    mqtt_client.publish('envirotest', ujson.dumps(reading), retain=True)
    mqtt_client.disconnect()
    return UPLOAD_SUCCESS

  except Exception as exc:
    import sys, io
    buf = io.StringIO()
    sys.print_exception(exc, buf)
    logging.debug(f"  - an exception occurred when uploading.", buf.getvalue())

  return UPLOAD_FAILED

@ZodiusInfuser
Copy link
Member

Ah of course, thank you. By the way, for that SSL change, if you have tested it and are confident it works, could you raise a Pull Request for it please? There's a few things that may need tweaking, but that can be discussed in the PR

@ZodiusInfuser
Copy link
Member

I've just had the solid light issue (with flashing red led) occur. Best I can tell from the logs it's the same startup issue I was already aware of as there were going to sleep messages, whereas yous logs clearly show that not happening.

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

@ZodiusInfuser Just got a failure on an urban, did work a few time before hand:

2022-12-05 13:06:03 [debug    / 115kB] > performing startup
2022-12-05 13:06:03 [info     / 123kB]   - wake reason: rtc_alarm
2022-12-05 13:06:03 [debug    / 121kB]   - turn on activity led
2022-12-05 13:06:03 [debug    / 118kB] > 99 blocks free out of 212
2022-12-05 13:06:03 [debug    / 116kB] > taking new reading
2022-12-05 13:06:03 [info     / 112kB]   - seconds since last reading: 120
2022-12-05 13:06:03 [debug    / 109kB]   - starting sensor
2022-12-05 13:06:03 [debug    / 107kB]   - wait 5 seconds for airflow
2022-12-05 13:06:08 [debug    / 122kB]   - taking pms5003i reading
2022-12-05 13:06:09 [debug    / 120kB]   - taking microphone reading
2022-12-05 13:06:09 [debug    / 119kB] > caching reading for upload
2022-12-05 13:06:09 [info     / 114kB] > 3 cache file(s) need uploading
2022-12-05 13:06:09 [info     / 112kB] > connecting to wifi network '<removed>'
2022-12-05 13:06:15 [info     /  90kB]   - ip address:  192.168.0.238
2022-12-05 13:06:16 [info     / 117kB] > uploading cached readings to MQTT broker: <removed>
2022-12-05 13:06:35 [debug    /  73kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 32, in upload_reading
  File "enviro/mqttsimple.py", line 70, in connect
OSError: [Errno 103] ECONNABORTED

2022-12-05 13:06:35 [error    /  70kB]   ! failed to upload '2022-12-05T13_02_09Z.json' to mqtt
2022-12-05 13:06:35 [error    / 121kB] ! reading upload failed

Is it possible to do a build with the micropython version from 0.0.8 with the 0.0.9 firmware. Might proove if its the updated micropython version.

@ZodiusInfuser
Copy link
Member

ZodiusInfuser commented Dec 5, 2022

Thanks for that. I'll have a dig and see what that error means.

In the meantime, yes you can try different micropython versions. Easiest way is to install the v0.0.8 micropython using this https://github.com/pimoroni/enviro/releases/download/v0.0.8/enviro-v0.0.8.uf2

Then put the v0.0.9 FW on using this https://github.com/pimoroni/enviro/releases/download/v0.0.9/enviro-v0.0.9-filesystem-only.uf2.

Save a local copy of your config and mqtt files first though

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

Have updated 2 x Urbans and 2 x indoors to the 0.0.8 miropython with the 0.0.9 Firmware using the above method, restored the saved config/mqtt/ca.crt files fro each one and will monitor and see how they get on......

@ZodiusInfuser
Copy link
Member

Thanks @dave-ct. It will be interesting to hear.

By the way, I noticed that in the v0.0.9 mqttsimple.py file there used to be a timeout set but it has since been commented out. It could be worth re-adding that as a second step.

    def connect(self, clean_session=True):
    #def connect(self, clean_session=True, timeout=30): # TODO this was added to 0.0.8
        self.sock = socket.socket()
        #self.sock.settimeout(timeout) # TODO this was added to 0.0.8
        addr = socket.getaddrinfo(self.server, self.port)[0][-1]

This may resolve the ECONNABORTED exception, which the internet suggests is caused by the server rejecting the connection, or at least change it into an ECONNTIMEDOUT exception.

In either case, I should update the logic for mqtt.py so it checks for that specific exception and performs the same "reattempt upload" that I added to adafruit.io.

@ZodiusInfuser
Copy link
Member

@dave-ct I just wanted to let you know that I too tested with v0.0.8's Micropython and the v0.0.9 firmware, but still had the issue occur. Also this is the first time I have your exact issue of needing to remove power to the board in order to fix things.

I have left a board at the office in this error state so I can diagnose it further, but let me explain some of my understanding thus far. Apologies, it's rather long as I've kind of used this to air my own thoughts on what is going on.


When Enviro is on battery and not doing anything, it is in (lets call it) "super sleep", where the Pico W literally has it's 3.3V power supply disabled. For all intents and purposes it is offline in this mode, with the only things getting power on Enviro being the RTC and user button. When either the RTC triggers its alarm, the user button is poked, or USB power is provided, the Pico W receives 3.3V and starts booting, turning the white LED on, setting HOLD_VSYS_EN to high keep the board alive, and disabling the RTC_ALARM (and this red flashing light). When Enviro it wants to go to back to "super sleep" it sets HOLD_VSYS_EN to low, and as long as there's no external source preventing it (such as USB power), it's 3.3V power gets deactivated.

The situation you (and others) are experiencing is a lockup during boot (we'll get to possible reasons later), sometime after the white LED is enabled but before Micropython has fully loaded, where the white LED would start pulsing. This point is likely before the RTC_ALARM has been reset, as indicated by the red light still flashing on your boards. That means that 3.3V power is constantly provided to the Pico W. This is significant because normally on Enviro, when reset is pressed whilst the board is awake, it will cause the board to turn off, requiring a "Poke" to start it running again. What we are now both experiencing is the reset button just resetting the Pico W. Normally this would be fine, but the white LED just turns on and nothing happens. This should not be the case with a board reset, so there must be something persistently stalling the boot process of the RP2040.

Incidentally this explains why you need to remove power from the board, as that is the only way to fully reset the state of everything on the board.

For Enviro Urban the things this could be are: the RTC, the BME280, the analog microphone, or the Pico W's WiFi chip itself.
I'm yet to rule any of these options, but the fact you are getting upload failed exceptions right before these boot lockup's could be an indication of the WiFi chip getting into a weird state that either causes an immediate lockup (as some of your logs show) or causes the lock-up during boot. Note, this is irrespective of if you use the WiFi in your program, as I've had enviro's that locally log also lock-up.

If it is indeed the WiFi causing these lock-ups, then that's a much deeper problem to solve that ventures outside of the Enviro hardware and software and over to Raspberry Pi themselves.


So that's where we're at... I wish I had an easy answer for you, but it's either one of those other internal components of Enviro that is causing the issue or the Pico W's WiFi itself. I will delve deeper over the next few days, and see if there's been any general Pico W reports that corroborate this theory.

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 5, 2022

@ZodiusInfuser , thanks for the detail and efforts to find the solution. Since using v0.0.8's Micropython and the v0.0.9 firmware I have had no issues so far on any of the devices, even when there have been upload failures it recoveres itself when it wakes up next. I will monitor over the next couple of days as previosuly with the v0.0.9 Micropython is was happening regularly.

@ZodiusInfuser
Copy link
Member

Interesting. Perhaps my method of flashing my boards wasn't optimal. After all, I too have had an Enviro Weather and Indoor running for a few weeks now and they had v0.0.8's Micropython with the v0.0.9 (pre-release) Firmware on 🤔

Actually. Let me try it properly and get the system to generate a build with the right MP (v1.19.7) and v0.0.9 firmware 😆 : https://github.com/pimoroni/enviro/actions/runs/3624876714

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 6, 2022

No issues this morning, will also try the system build above later tonight to make sure.

@ZodiusInfuser
Copy link
Member

Thanks for the update. I've loaded an Enviro grow up with the firmware, so will leave that running through the work day.
If it is indeed the difference between 1.19.7 and 1.19.10 that's to blame then that at least gives me a direction to explore with my debugging.

@ZodiusInfuser
Copy link
Member

Putting this here for my own reference. It looks like there is nothing of note different between v1.19.7 and v1.19.10 that would explain the lock-ups 😕 pimoroni/pimoroni-pico@1.19.7...v1.19.10

@dave-ct
Copy link
Contributor Author

dave-ct commented Dec 6, 2022

@ZodiusInfuser could there be any changes in the upstream version of micropython that the Pimornoi build uses. Not sure how it works but does it incoporate the offical micropython release from the 18 June (1.19.1) or does it grab latest updates to core micropython or one of the nightly builds which might have some changes in it?

@ZodiusInfuser
Copy link
Member

ZodiusInfuser commented Dec 6, 2022

That was the first thing I looked for, but the commit code we're using hasn't changed between our release versions:
https://github.com/pimoroni/pimoroni-pico/blob/b1e8ed086400cfe8f471fd9e35577cfc3b12d4e9/.github/workflows/micropython-picow.yml#L10

From what I recall, 1.19.1 predates the Pico W, so we are using a newer build (but not the absolute latest). We would prefer to use a stable numbered release, but there has yet to be one. The last time we updated was for our 1.19.6.


An update on my Enviro Grow. It has locked up several times today. Both times pressing the reset button recovered it from the state, so whatever happened with the Urban is seriously bad.

I tried looking at the Urban with a scope to see if there was any merit to the theory of WiFi comms locking things up, but did not see any activity on the lines. So instead I tried scoping one of the Pico W's flash chip lines and it consistently fails at the same point.

Normal Enviro during boot:
image

Locked Enviro Urban during boot:
image

Don't know how useful this information is yet, so I'll need to correlate this data with different points in the code. Or try and get in touch with someone at Pi to help as this is starting to get beyond me :sad:

@Julia7676
Copy link

The technique of recovering from a processor hang using a timer implemented# in the PIO section of the Pico appears to work. The PIO code uses a 32 bit count down timer which negates HOLD_VSYS_EN_PIN when the count reaches zero. If the code has not reached the end of sleep within 4 minutes the power removed. Note I duplicated the code to set the RTC alarm to just before arming the watchdog.

# This extract inserted in __init__.py just before def startup():

# define DELAYOFF used to ensure enviro will power down even if it hangs
# ----------------------------------------------------------------------
from machine import Pin
from rp2 import PIO, StateMachine, asm_pio

@asm_pio(sideset_init=PIO.OUT_HIGH)
def delayoff_prog():   
    label('d_loop')
    jmp(y_dec, 'd_loop') [1]
    label('done')
    jmp('done').side(0)

class DELAYOFF:
    def __init__(self, pin, delay, sm_id=0):
        delay_ms=int(delay*1000)
        self._sm = StateMachine(sm_id, delayoff_prog, freq=2000, sideset_base=Pin(pin))
        self._sm.put(delay_ms)
        self._sm.exec('pull()')
        self._sm.exec("mov(y, osr)") #load max count into y
        self._sm.active(1)
        logging.debug(f'> delayoff set on gpio{pin:} for {delay_ms:} ms')

def arm_watchdog():
  # set default alarm now in case processor hangs.  Normally ths is overwritten by sleep()

  if helpers.file_exists("watchdog_live.txt"):
    os.remove("watchdog_live.txt")
    logging.warn("> * * Processor recovered by watchdog * *")
  
  # this code extracted from sleep TODO make into routine shared by both -----------------
  dt = rtc.datetime()
  hour, minute = dt[3:5]
  
  # calculate how many minutes into the day we are
  minute = math.floor(minute / config.reading_frequency) * config.reading_frequency
  minute += config.reading_frequency

  while minute >= 60:      
    minute -= 60
    hour += 1
  if hour >= 24:
    hour -= 24
  ampm = "am" if hour < 12 else "pm"

  logging.info(f"  - setting default alarm to wake at {hour:02}:{minute:02}{ampm}")

  # sleep until next scheduled reading
  rtc.set_alarm(0, minute, hour)
  rtc.enable_alarm_interrupt(True)
  #------------------------------------------------------------------end copied from sleep

  # power will be pulled if frame does not complete in 4 Miniutes
  delayoff = DELAYOFF(HOLD_VSYS_EN_PIN, 4*1000) 
  with open("watchdog_live.txt", "w") as hangfile:
    hangfile.write("")
    
def startup():

#- lines not shown... insert call to arm_watchdog after lines shown

  # log the wake reason
  logging.info("  - wake reason:", wake_reason_name(reason))
  arm_watchdog()
  

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 18, 2023

@Julia7676 Thanks for sharing, will give it a test

@ZodiusInfuser
Copy link
Member

That's brilliant @Julia7676! I'm embarressed to say I never thought of using the PIO as a watchdog! I was too focused on using the system watchdog, but the timeout wasn't really long enough to be useful. I even tried ways of feeding the watchdog within the micropython C code itself, like in between it executing each line of user Python code, but that's not how their system is written.

Just to get my understanding correct, this code initialises the watchdog during startup and sets the RTC in case code never reaches that part of the sleep() function? I also see you use the same text file trick as I do, to record that an issue did occur :)

Regarding the comment about moving the RTC setting to a common function, note that there is this incoming pull request that modifies some of the code surrounding the RTC that may impact your logic: https://github.com/pimoroni/enviro/pull/132/files
I should really get that merged, as I think it's been tested to work now, right @dave-ct?

One possible extension of this, that could benefit users running their Enviro's off USB, would be to have the PIO trigger an interrupt (assuming Python lets you create an interrupt handler). That function could then call reset(). Sadly that also assumes that the main core is not in a state that prevents it from processing the interrupt so perhaps we don't worry about the USB use-case for now 🤔

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 22, 2023

@ZodiusInfuser I had not tested all the RTC parts as did not have issues with that (only the mqtt bits we did). But have updated 2 x Urbans, 2 x Indoors and 1 x weather with the code form https://github.com/pimoroni/enviro/pull/132/files
and have also added in @Julia7676 code for the PIO, will leave to test over the next few days and see if any issue pop up

@Julia7676
Copy link

I have been experimenting with the watchdog this past week. I am recovering about 1 to 2 hangs a day. I increased the number of hangs by generating a lot of WIFI traffic after each upload.

I should of have mentioned that the watchdog_live file needs to be deleted just before normal shutdown

rtc.enable_alarm_interrupt(True)

if helpers.file_exists("watchdog_live.txt"):
    os.remove("watchdog_live.txt")

# disable the vsys hold, causing us to turn off
logging.info("  - shutting down")
hold_vsys_en_pin.init(Pin.IN)

There is small issue that will prevent the watchdog being effective when the RTC chip has it's time changed such as the first power up from cold. This probably can be ignored but might catch you out when testing. One way of testing is to remove the hold_vsys_en_pin.init(Pin.IN) line from sleep() while on USB power and then watching the vsys_en_pin go low.

I have also been looking at reinstating battery monitoring. (It is a very good way of crashing the processor!) Can some one point me at the issue # so I can report my progress. After a lot of false starts I am close to a possible solution.

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 23, 2023

@Julia7676 thanks for the pointer, that may be why when one of them froze it did not recover. Have updated and will see how they get on.

@ZodiusInfuser
Copy link
Member

Hi @Julia7676, thank you for your experimenting. Any chance you can raise a pull-request with your watchdog in? That way it will be easier for people to grab and test, and for code changes to be tracked. In the meantime I will merge the adafruit-io-fixes branch, to avoid later conflicts.

Regarding battery monitoring, there is no specific issue listed for the battery monitoring being disabled (because it was disabled and noted by the last release). You are more than welcome to create one though.

My attempt to resolve it has been to inject ADC reading within Micropython itself before the WiFi gets initialised. This seemed to work for me when I tested locally, but getting our CI system to generate a usable customer build is proving to be a challenge:
pimoroni/pimoroni-pico@main...patch/enviro_wakeup_battery

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 23, 2023

Have given it ago with the follwoing init.py but still getting hangs.
The logs says delayoff set on gpio2 for 4000000 ms , is that correct? See below for log file.

# keep the power rail alive by holding VSYS_EN high as early as possible
# ===========================================================================
from enviro.constants import *
from machine import Pin
hold_vsys_en_pin = Pin(HOLD_VSYS_EN_PIN, Pin.OUT, value=True)

# detect board model based on devices on the i2c bus and pin state
# ===========================================================================
from pimoroni_i2c import PimoroniI2C
i2c = PimoroniI2C(I2C_SDA_PIN, I2C_SCL_PIN, 100000)
i2c_devices = i2c.scan()
model = None
if 56 in i2c_devices: # 56 = colour / light sensor and only present on Indoor
  model = "indoor"
elif 35 in i2c_devices: # 35 = ltr-599 on grow & weather
  pump3_pin = Pin(12, Pin.IN, Pin.PULL_UP)
  model = "grow" if pump3_pin.value() == False else "weather"    
  pump3_pin.init(pull=None)
else:    
  model = "urban" # otherwise it's urban..

# return the module that implements this board type
def get_board():
  if model == "indoor":
    import enviro.boards.indoor as board
  if model == "grow":
    import enviro.boards.grow as board
  if model == "weather":
    import enviro.boards.weather as board
  if model == "urban":
    import enviro.boards.urban as board
  return board
  
# set up the activity led
# ===========================================================================
from machine import PWM, Timer
import math
activity_led_pwm = PWM(Pin(ACTIVITY_LED_PIN))
activity_led_pwm.freq(1000)
activity_led_pwm.duty_u16(0)

# set the brightness of the activity led
def activity_led(brightness):
  brightness = max(0, min(100, brightness)) # clamp to range
  # gamma correct the brightness (gamma 2.8)
  value = int(pow(brightness / 100.0, 2.8) * 65535.0 + 0.5)
  activity_led_pwm.duty_u16(value)
  
activity_led_timer = Timer(-1)
activity_led_pulse_speed_hz = 1
def activity_led_callback(t):
  # updates the activity led brightness based on a sinusoid seeded by the current time
  brightness = (math.sin(time.ticks_ms() * math.pi * 2 / (1000 / activity_led_pulse_speed_hz)) * 40) + 60
  value = int(pow(brightness / 100.0, 2.8) * 65535.0 + 0.5)
  activity_led_pwm.duty_u16(value)

# set the activity led into pulsing mode
def pulse_activity_led(speed_hz = 1):
  global activity_led_timer, activity_led_pulse_speed_hz
  activity_led_pulse_speed_hz = speed_hz
  activity_led_timer.deinit()
  activity_led_timer.init(period=50, mode=Timer.PERIODIC, callback=activity_led_callback)

# turn off the activity led and disable any pulsing animation that's running
def stop_activity_led():
  global activity_led_timer
  activity_led_timer.deinit()
  activity_led_pwm.duty_u16(0)

# check whether device needs provisioning
# ===========================================================================
import time
from phew import logging
button_pin = Pin(BUTTON_PIN, Pin.IN, Pin.PULL_DOWN)
needs_provisioning = False
start = time.time()
while button_pin.value(): # button held for 3 seconds go into provisioning
  if time.time() - start > 3:
    needs_provisioning = True
    break

try:
  import config # fails to import (missing/corrupt) go into provisioning
  if not config.provisioned: # provisioned flag not set go into provisioning
    needs_provisioning = True
except Exception as e:
  logging.error("> missing or corrupt config.py", e)
  needs_provisioning = True

if needs_provisioning:
  logging.info("> entering provisioning mode")
  import enviro.provisioning
  # control never returns to here, provisioning takes over completely

# all the other imports, so many shiny modules
import machine, sys, os, ujson
from machine import RTC, ADC
import phew
from pcf85063a import PCF85063A
import enviro.helpers as helpers

# read the state of vbus to know if we were woken up by USB
vbus_present = Pin("WL_GPIO2", Pin.IN).value()

#BUG Temporarily disabling battery reading, as it seems to cause issues when connected to Thonny
"""
# read battery voltage - we have to toggle the wifi chip select
# pin to take the reading - this is probably not ideal but doesn't
# seem to cause issues. there is no obvious way to shut down the
# wifi for a while properly to do this (wlan.disonnect() and
# wlan.active(False) both seem to mess things up big style..)
old_state = Pin(WIFI_CS_PIN).value()
Pin(WIFI_CS_PIN, Pin.OUT, value=True)
sample_count = 10
battery_voltage = 0
for i in range(0, sample_count):
  battery_voltage += (ADC(29).read_u16() * 3.3 / 65535) * 3
battery_voltage /= sample_count
battery_voltage = round(battery_voltage, 3)
Pin(WIFI_CS_PIN).value(old_state)
"""

# set up the button, external trigger, and rtc alarm pins
rtc_alarm_pin = Pin(RTC_ALARM_PIN, Pin.IN, Pin.PULL_DOWN)
# BUG This should only be set up for Enviro Camera
# external_trigger_pin = Pin(EXTERNAL_INTERRUPT_PIN, Pin.IN, Pin.PULL_DOWN)

# intialise the pcf85063a real time clock chip
rtc = PCF85063A(i2c)
i2c.writeto_mem(0x51, 0x00, b'\x00') # ensure rtc is running (this should be default?)
rtc.enable_timer_interrupt(False)

t = rtc.datetime()
# BUG ERRNO 22, EINVAL, when date read from RTC is invalid for the pico's RTC.
RTC().datetime((t[0], t[1], t[2], t[6], t[3], t[4], t[5], 0)) # synch PR2040 rtc too

# jazz up that console! toot toot!
print("       ___            ___            ___          ___          ___            ___       ")
print("      /  /\          /__/\          /__/\        /  /\        /  /\          /  /\      ")
print("     /  /:/_         \  \:\         \  \:\      /  /:/       /  /::\        /  /::\     ")
print("    /  /:/ /\         \  \:\         \  \:\    /  /:/       /  /:/\:\      /  /:/\:\    ")
print("   /  /:/ /:/_    _____\__\:\    ___  \  \:\  /__/::\      /  /:/~/:/     /  /:/  \:\   ")
print("  /__/:/ /:/ /\  /__/::::::::\  /___\  \__\:\ \__\/\:\__  /__/:/ /:/___  /__/:/ \__\:\  ")
print("  \  \:\/:/ /:/  \  \:\~~~__\/  \  \:\ |  |:|    \  \:\/\ \  \:\/:::::/  \  \:\ /  /:/  ")
print("   \  \::/ /:/    \  \:\         \  \:\|  |:|     \__\::/  \  \::/~~~`    \  \:\  /:/   ")
print("    \  \:\/:/      \  \:\         \  \:\__|:|     /  /:/    \  \:\         \  \:\/:/    ")
print("     \  \::/        \  \:\         \  \::::/     /__/:/      \  \:\         \  \::/     ")
print("      \__\/          \__\/          `~~~~~`      \__\/        \__\/          \__\/      ")
print("")
print("    -  --  ---- -----=--==--===  hey enviro, let's go!  ===--==--=----- ----  --  -     ")
print("")


import network # TODO this was removed from 0.0.8
def connect_to_wifi():
  """ TODO what it was changed to
  if phew.is_connected_to_wifi():
    logging.info(f"> already connected to wifi")
    return True
  """

  wifi_ssid = config.wifi_ssid
  wifi_password = config.wifi_password

  logging.info(f"> connecting to wifi network '{wifi_ssid}'")
  """ TODO what it was changed to
  ip = phew.connect_to_wifi(wifi_ssid, wifi_password, timeout_seconds=30)

  if not ip:
    logging.error(f"! failed to connect to wireless network {wifi_ssid}")
    return False

  logging.info("  - ip address: ", ip)
  """
  import rp2
  rp2.country("GB") 
  wlan = network.WLAN(network.STA_IF)
  wlan.active(True)
  wlan.connect(wifi_ssid, wifi_password)

  start = time.ticks_ms()
  while time.ticks_diff(time.ticks_ms(), start) < 30000:
    if wlan.status() < 0 or wlan.status() >= 3:
      break
    time.sleep(0.5)

  seconds_to_connect = int(time.ticks_diff(time.ticks_ms(), start) / 1000)

  if wlan.status() != 3:
    logging.error(f"! failed to connect to wireless network {wifi_ssid}")
    return False

  # a slow connection time will drain the battery faster and may
  # indicate a poor quality connection
  if seconds_to_connect > 5:
    logging.warn("  - took", seconds_to_connect, "seconds to connect to wifi")

  ip_address = wlan.ifconfig()[0]
  logging.info("  - ip address: ", ip_address)

  return True

# log the error, blink the warning led, and go back to sleep
def halt(message):
  logging.error(message)
  warn_led(WARN_LED_BLINK)
  sleep()

# log the exception, blink the warning led, and go back to sleep
def exception(exc):
  import sys, io
  buf = io.StringIO()
  sys.print_exception(exc, buf)
  logging.exception("! " + buf.getvalue())
  warn_led(WARN_LED_BLINK)
  sleep()

# returns True if we've used up 90% of the internal filesystem
def low_disk_space():
  if not phew.remote_mount: # os.statvfs doesn't exist on remote mounts
    return (os.statvfs(".")[3] / os.statvfs(".")[2]) < 0.1   
  return False

# returns True if the rtc clock has been set recently 
def is_clock_set():
  # is the year on or before 2020?
  if rtc.datetime()[0] <= 2020:
    return False

  if helpers.file_exists("sync_time.txt"):
    now_str = helpers.datetime_string()
    now = helpers.timestamp(now_str)

    time_entries = []
    with open("sync_time.txt", "r") as timefile:
      time_entries = timefile.read().split("\n")

    # read the first line from the time file
    sync = now
    for entry in time_entries:
      if entry:
        sync = helpers.timestamp(entry)
        break

    seconds_since_sync = now - sync
    if seconds_since_sync >= 0:  # there's the rare chance of having a newer sync time than what the RTC reports
      try:
        if seconds_since_sync < (config.resync_frequency * 60 * 60):
          return True

        logging.info(f"  - rtc has not been synched for {config.resync_frequency} hour(s)")
      except AttributeError:
        return True

  return False

# connect to wifi and attempt to fetch the current time from an ntp server
def sync_clock_from_ntp():
  from phew import ntp
  if not connect_to_wifi():
    return False
  #TODO Fetch only does one attempt. Can also optionally set Pico RTC (do we want this?)
  timestamp = ntp.fetch()
  if not timestamp:
    logging.error("  - failed to fetch time from ntp server")
    return False  

  # fixes an issue where sometimes the RTC would not pick up the new time
  i2c.writeto_mem(0x51, 0x00, b'\x10') # reset the rtc so we can change the time
  rtc.datetime(timestamp) # set the time on the rtc chip
  i2c.writeto_mem(0x51, 0x00, b'\x00') # ensure rtc is running
  rtc.enable_timer_interrupt(False)

  # read back the RTC time to confirm it was updated successfully
  dt = rtc.datetime()
  if dt != timestamp[0:7]:
    logging.error("  - failed to update rtc")
    if helpers.file_exists("sync_time.txt"):
      os.remove("sync_time.txt")
    return False

  logging.info("  - rtc synched")
  
  # write out the sync time log
  with open("sync_time.txt", "w") as syncfile:
    syncfile.write("{0:04d}-{1:02d}-{2:02d}T{3:02d}:{4:02d}:{5:02d}Z".format(*timestamp))  

  return True

# set the state of the warning led (off, on, blinking)
def warn_led(state):
  if state == WARN_LED_OFF:
    rtc.set_clock_output(PCF85063A.CLOCK_OUT_OFF)
  elif state == WARN_LED_ON:
    rtc.set_clock_output(PCF85063A.CLOCK_OUT_1024HZ)
  elif state == WARN_LED_BLINK:
    rtc.set_clock_output(PCF85063A.CLOCK_OUT_1HZ)
    
# the pcf85063a defaults to 32KHz clock output so need to explicitly turn off
warn_led(WARN_LED_OFF)


# returns the reason the board woke up from deep sleep
def get_wake_reason():
  import wakeup
  
  wake_reason = None
  if wakeup.get_gpio_state() & (1 << BUTTON_PIN):
    wake_reason = WAKE_REASON_BUTTON_PRESS
  elif wakeup.get_gpio_state() & (1 << RTC_ALARM_PIN):
    wake_reason = WAKE_REASON_RTC_ALARM
  # TODO Temporarily removing this as false reporting on non-camera boards
  #elif not external_trigger_pin.value():
  #  wake_reason = WAKE_REASON_EXTERNAL_TRIGGER
  elif vbus_present:
    wake_reason = WAKE_REASON_USB_POWERED
  return wake_reason

# convert a wake reason into it's name
def wake_reason_name(wake_reason):
  names = {
    None: "unknown",
    WAKE_REASON_PROVISION: "provisioning",
    WAKE_REASON_BUTTON_PRESS: "button",
    WAKE_REASON_RTC_ALARM: "rtc_alarm",
    WAKE_REASON_EXTERNAL_TRIGGER: "external_trigger",
    WAKE_REASON_RAIN_TRIGGER: "rain_sensor",
    WAKE_REASON_USB_POWERED: "usb_powered"
  }
  return names.get(wake_reason)

# get the readings from the on board sensors
def get_sensor_readings():
  seconds_since_last = 0
  now_str = helpers.datetime_string()
  if helpers.file_exists("last_time.txt"):
    now = helpers.timestamp(now_str)

    time_entries = []
    with open("last_time.txt", "r") as timefile:
      time_entries = timefile.read().split("\n")

    # read the first line from the time file
    last = now
    for entry in time_entries:
      if entry:
        last = helpers.timestamp(entry)
        break

    seconds_since_last = now - last
    logging.info(f"  - seconds since last reading: {seconds_since_last}")


  readings = get_board().get_sensor_readings(seconds_since_last)
  # readings["voltage"] = 0.0 # battery_voltage #Temporarily removed until issue is fixed

  # write out the last time log
  with open("last_time.txt", "w") as timefile:
    timefile.write(now_str)  

  return readings

# save the provided readings into a todays readings data file
def save_reading(readings):
  # open todays reading file and save readings
  helpers.mkdir_safe("readings")
  readings_filename = f"readings/{helpers.datetime_file_string()}.txt"
  new_file = not helpers.file_exists(readings_filename)
  with open(readings_filename, "a") as f:
    if new_file:
      # new readings file so write out column headings first
      f.write("timestamp," + ",".join(readings.keys()) + "\r\n")

    # write sensor data
    row = [helpers.datetime_string()]
    for key in readings.keys():
      row.append(str(readings[key]))
    f.write(",".join(row) + "\r\n")


# save the provided readings into a cache file for future uploading
def cache_upload(readings):
  payload = {
    "nickname": config.nickname,
    "timestamp": helpers.datetime_string(),
    "readings": readings,
    "model": model,
    "uid": helpers.uid()
  }

  uploads_filename = f"uploads/{helpers.datetime_file_string()}.json"
  helpers.mkdir_safe("uploads")
  with open(uploads_filename, "w") as upload_file:
    #json.dump(payload, upload_file) # TODO what it was changed to
    upload_file.write(ujson.dumps(payload))

# return the number of cached results waiting to be uploaded
def cached_upload_count():
  try:
    return len(os.listdir("uploads"))
  except OSError:
    return 0

# returns True if we have more cached uploads than our config allows
def is_upload_needed():
  return cached_upload_count() >= config.upload_frequency

# upload cached readings to the configured destination
def upload_readings():
  if not connect_to_wifi():
    logging.error(f"  - cannot upload readings, wifi connection failed")
    return False

  destination = config.destination
  try:
    exec(f"import enviro.destinations.{destination}")
    destination_module = sys.modules[f"enviro.destinations.{destination}"]
    destination_module.log_destination()

    for cache_file in os.ilistdir("uploads"):
      try:
        with open(f"uploads/{cache_file[0]}", "r") as upload_file:
          status = destination_module.upload_reading(ujson.load(upload_file))
          if status == UPLOAD_SUCCESS:
            os.remove(f"uploads/{cache_file[0]}")
            logging.info(f"  - uploaded {cache_file[0]}")
          elif status == UPLOAD_RATE_LIMITED:
            # write out that we want to attempt a reupload
            with open("reattempt_upload.txt", "w") as attemptfile:
              attemptfile.write("")

            logging.info(f"  - cannot upload '{cache_file[0]}' - rate limited")
            sleep(1)
          elif status == UPLOAD_LOST_SYNC:
            # remove the sync time file to trigger a resync on next boot
            if helpers.file_exists("sync_time.txt"):
              os.remove("sync_time.txt")
             
            # write out that we want to attempt a reupload
            with open("reattempt_upload.txt", "w") as attemptfile:
              attemptfile.write("")

            logging.info(f"  - cannot upload '{cache_file[0]}' - rtc has become out of sync")
            sleep(1)
          elif status == UPLOAD_SKIP_FILE:
            logging.error(f"  ! cannot upload '{cache_file[0]}' to {destination}. Skipping file")
            warn_led(WARN_LED_BLINK)
            continue
          else:
            logging.error(f"  ! failed to upload '{cache_file[0]}' to {destination}")
            return False

      except OSError:
        logging.error(f"  ! failed to open '{cache_file[0]}'")
        return False

      except KeyError:
        logging.error(f"  ! skipping '{cache_file[0]}' as it is missing data. It was likely created by an older version of the enviro firmware")
        
  except ImportError:
    logging.error(f"! cannot find destination {destination}")
    return False

  return True




# define DELAYOFF used to ensure enviro will power down even if it hangs
# ----------------------------------------------------------------------
from machine import Pin
from rp2 import PIO, StateMachine, asm_pio

@asm_pio(sideset_init=PIO.OUT_HIGH)
def delayoff_prog():   
    label('d_loop')
    jmp(y_dec, 'd_loop') [1]
    label('done')
    jmp('done').side(0)

class DELAYOFF:
    def __init__(self, pin, delay, sm_id=0):
        delay_ms=int(delay*1000)
        self._sm = StateMachine(sm_id, delayoff_prog, freq=2000, sideset_base=Pin(pin))
        self._sm.put(delay_ms)
        self._sm.exec('pull()')
        self._sm.exec("mov(y, osr)") #load max count into y
        self._sm.active(1)
        logging.debug(f'> delayoff set on gpio{pin:} for {delay_ms:} ms')

def arm_watchdog():
  # set default alarm now in case processor hangs.  Normally ths is overwritten by sleep()

  if helpers.file_exists("watchdog_live.txt"):
    os.remove("watchdog_live.txt")
    logging.warn("> * * Processor recovered by watchdog * *")
  
  # this code extracted from sleep TODO make into routine shared by both -----------------
  dt = rtc.datetime()
  hour, minute = dt[3:5]
  
  # calculate how many minutes into the day we are
  minute = math.floor(minute / config.reading_frequency) * config.reading_frequency
  minute += config.reading_frequency

  while minute >= 60:      
    minute -= 60
    hour += 1
  if hour >= 24:
    hour -= 24
  ampm = "am" if hour < 12 else "pm"

  logging.info(f"  - setting default alarm to wake at {hour:02}:{minute:02}{ampm}")

  # sleep until next scheduled reading
  rtc.set_alarm(0, minute, hour)
  rtc.enable_alarm_interrupt(True)
  #------------------------------------------------------------------end copied from sleep

  # power will be pulled if frame does not complete in 4 Miniutes
  delayoff = DELAYOFF(HOLD_VSYS_EN_PIN, 4*1000) 
  with open("watchdog_live.txt", "w") as hangfile:
    hangfile.write("")


def startup():
  import sys

  # write startup info into log file
  logging.info("> performing startup")
  logging.debug(f"  - running Enviro {ENVIRO_VERSION}, {sys.version.split('; ')[1]}")

  # get the reason we were woken up
  reason = get_wake_reason()

  # give each board a chance to perform any startup it needs
  # ===========================================================================
  board = get_board()
  if hasattr(board, "startup"):
    continue_startup = board.startup(reason)
    # put the board back to sleep if the startup doesn't need to continue
    # and the RTC has not triggered since we were awoken
    if not continue_startup and not rtc.read_alarm_flag():
      logging.debug("  - wake reason: trigger")
      sleep()

  # log the wake reason
  logging.info("  - wake reason:", wake_reason_name(reason))
  arm_watchdog()

  # also immediately turn on the LED to indicate that we're doing something
  logging.debug("  - turn on activity led")
  pulse_activity_led(0.5)

  # see if we were woken to attempt a reupload
  if helpers.file_exists("reattempt_upload.txt"):
    upload_count = cached_upload_count()
    if upload_count == 0:
      os.remove("reattempt_upload.txt")
      return

    logging.info(f"> {upload_count} cache file(s) still to upload")
    if not upload_readings():
      halt("! reading upload failed")

    os.remove("reattempt_upload.txt")

    # if it was the RTC that woke us, go to sleep until our next scheduled reading
    # otherwise continue with taking new readings etc
    # Note, this *may* result in a missed reading
    if reason == WAKE_REASON_RTC_ALARM:
      sleep()

def sleep(time_override=None):
  if time_override is not None:
    logging.info(f"> going to sleep for {time_override} minute(s)")
  else:
    logging.info("> going to sleep")

  # make sure the rtc flags are cleared before going back to sleep
  logging.debug("  - clearing and disabling previous alarm")
  rtc.clear_timer_flag() # TODO this was removed from 0.0.8
  rtc.clear_alarm_flag()

  # set alarm to wake us up for next reading
  dt = rtc.datetime()
  hour, minute, second = dt[3:6]

  # calculate how many minutes into the day we are
  if time_override is not None:
    minute += time_override
  else:
    # if the time is very close to the end of the minute, advance to the next minute
    # this aims to fix the edge case where the board goes to sleep right as the RTC triggers, thus never waking up
    if second > 55:
      minute += 1
    minute = math.floor(minute / config.reading_frequency) * config.reading_frequency
    minute += config.reading_frequency

  while minute >= 60:      
    minute -= 60
    hour += 1
  if hour >= 24:
    hour -= 24
  ampm = "am" if hour < 12 else "pm"

  logging.info(f"  - setting alarm to wake at {hour:02}:{minute:02}{ampm}")

  # sleep until next scheduled reading
  rtc.set_alarm(0, minute, hour)
  rtc.enable_alarm_interrupt(True)
  
  # delete watchdog file
  if helpers.file_exists("watchdog_live.txt"):
    os.remove("watchdog_live.txt")

  # disable the vsys hold, causing us to turn off
  logging.info("  - shutting down")
  hold_vsys_en_pin.init(Pin.IN)

  # if we're still awake it means power is coming from the USB port in which
  # case we can't (and don't need to) sleep.
  stop_activity_led()

  # if running via mpremote/pyboard.py with a remote mount then we can't
  # reset the board so just exist
  if phew.remote_mount:
    sys.exit()

  # we'll wait here until the rtc timer triggers and then reset the board
  logging.debug("  - on usb power (so can't shutdown). Halt and wait for alarm or user reset instead")
  board = get_board()
  while not rtc.read_alarm_flag():
    if hasattr(board, "check_trigger"):
      board.check_trigger()

    #time.sleep(0.25)

    if button_pin.value(): # allow button to force reset
      break

  logging.debug("  - reset")

  # reset the board
  machine.reset()

The logs shows as follows

2023-01-23 11:58:04 [info     / 111kB] > performing startup
2023-01-23 11:58:04 [debug    / 122kB]   - running Enviro 0.0.9, MicroPython 9dfabcd-dirty on 2022-08-10
2023-01-23 11:58:04 [info     / 116kB]   - wake reason: rtc_alarm
2023-01-23 11:58:04 [info     / 113kB]   - setting default alarm to wake at 12:00pm
2023-01-23 11:58:04 [debug    / 110kB] > delayoff set on gpio2 for 4000000 ms
2023-01-23 11:58:04 [debug    / 107kB]   - turn on activity led
2023-01-23 11:58:05 [debug    / 103kB] > 99 blocks free out of 212
2023-01-23 11:58:05 [debug    / 101kB] > taking new reading
2023-01-23 11:58:05 [info     /  96kB]   - seconds since last reading: 241
2023-01-23 11:58:05 [debug    /  93kB]   - starting sensor
2023-01-23 11:58:05 [debug    /  91kB]   - wait 5 seconds for airflow
2023-01-23 11:58:10 [debug    / 101kB]   - taking pms5003i reading
2023-01-23 11:58:10 [debug    /  98kB]   - taking microphone reading
2023-01-23 11:58:11 [debug    /  91kB] > caching reading for upload
2023-01-23 11:58:11 [info     /  86kB] > 2 cache file(s) not being uploaded. Waiting until there are 3 file(s)
2023-01-23 11:58:11 [info     /  84kB] > going to sleep
2023-01-23 11:58:11 [debug    /  82kB]   - clearing and disabling previous alarm
2023-01-23 11:58:11 [info     /  80kB]   - setting alarm to wake at 12:00pm
2023-01-23 11:58:11 [info     /  77kB]   - shutting down
2023-01-23 12:00:04 [info     / 111kB] > performing startup
2023-01-23 12:00:04 [debug    / 122kB]   - running Enviro 0.0.9, MicroPython 9dfabcd-dirty on 2022-08-10
2023-01-23 12:00:04 [info     / 116kB]   - wake reason: rtc_alarm
2023-01-23 12:00:04 [info     / 113kB]   - setting default alarm to wake at 12:02pm
2023-01-23 12:00:04 [debug    / 110kB] > delayoff set on gpio2 for 4000000 ms
2023-01-23 12:00:04 [debug    / 107kB]   - turn on activity led
2023-01-23 12:00:04 [debug    / 103kB] > 98 blocks free out of 212
2023-01-23 12:00:04 [debug    / 101kB] > taking new reading
2023-01-23 12:00:04 [info     /  96kB]   - seconds since last reading: 119
2023-01-23 12:00:05 [debug    /  93kB]   - starting sensor
2023-01-23 12:00:05 [debug    /  91kB]   - wait 5 seconds for airflow
2023-01-23 12:00:10 [debug    / 101kB]   - taking pms5003i reading
2023-01-23 12:00:10 [debug    /  98kB]   - taking microphone reading
2023-01-23 12:00:10 [debug    /  92kB] > caching reading for upload
2023-01-23 12:00:11 [info     /  87kB] > 3 cache file(s) need uploading
2023-01-23 12:00:11 [info     /  84kB] > connecting to wifi network 'XXXXXXX'
2023-01-23 12:00:20 [warning  /  98kB]   - took 8 seconds to connect to wifi
2023-01-23 12:00:20 [info     /  96kB]   - ip address:  192.168.0.236
2023-01-23 12:00:21 [info     / 115kB] > uploading cached readings to MQTT broker: XXXXXXX
2023-01-23 12:00:29 [info     /  95kB]   - uploaded 2023-01-23T11_54_11Z.json
2023-01-23 12:02:06 [debug    /  48kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 34, in upload_reading
  File "enviro/mqttsimple.py", line 110, in connect
IndexError: bytes index out of range

2023-01-23 12:02:06 [error    /  73kB]   ! failed to upload '2023-01-23T11_58_11Z.json' to mqtt
2023-01-23 12:02:06 [error    /  71kB] ! reading upload failed
2023-01-23 12:02:06 [info     /  69kB] > going to sleep
2023-01-23 12:02:07 [debug    /  66kB]   - clearing and disabling previous alarm
2023-01-23 12:02:07 [info     /  64kB]   - setting alarm to wake at 12:04pm

@Julia7676
Copy link

Hi @dave-ct, Just used PYCHARM to compare your code with mine, took me a little while to spot that you had put the watchdog stuff in as your code is verbatim with mine in that part of the file. After pondering why it was not working I realised that you are correct in questioning the 4000000ms delay. this equates to a little over 1hr 6 minutes. The 2nd value in the DELAYOFF constructor is the required delay is seconds. This needs to be more than the expected execution time but must not be so long that it happens after the alarm. I was think 4 * 60, but my finger typed 4 * 1000. Sorry. With that delay value, the watchdog had no chance of working. Did my testing with a delay of 90 seconds and a reading interval of 2 minutes.

The simple fix is to replace 4 * 1000 with 4 * 60 but if you know how long it is to the alarm the delay valve should be just less than this by a safety margin to ensure they occur in the correct order. My thinking is that If you can get the RTC code to work out seconds to he alarm alarm you can give the maximum time for execution without compromising the operation of the watchdog.

My job now is to go a retrieve my live weather enviro and change it so that it works.

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 23, 2023

@Julia7676 thanks for confirming and the explanation.
One edge case I have exereinced is that if you have a wifi outage then you just keep building up readings, once wifi re-establishes and starts the upload it can take many minutes, I had this happen today and took about 20 minutes to upload the 100's of readings. Not sure how it can take that scenario into account.
But for noraml operation could just use config.reading_frequency to calcuate the delay but to make it slightly less. Will try: delayoff = DELAYOFF(HOLD_VSYS_EN_PIN, config.reading_frequency*45)

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 23, 2023

@ZodiusInfuser have created a pull request with the code from @Julia7676, might need some further testing but will allow others to test before it is merged. Pull request #144.
@Julia7676 have mentioned it is your contribution in the description, but feel free to do another one then this one can be ignored.

@Julia7676
Copy link

@dave-ct Yes long updates after outage are possible. A design descision is needed to decide what should happen. We could allow the destination module to request an extension to the watchdog and RTC or force the destination module to yeald to allow the processor to shut down so that the next scheduled reading is not delayed.

@Julia7676
Copy link

@dave-ct Another option would be to set the contingency alarm to say plus 30 miniutes and the watchdog to 29 miniutes. Normally the contingency alarm would be over ridden by sleep(). Down side of this is recover from processor hangs will take a bit longer.

My prefered fix would be to add a break to the loop in upload_readings(). This could limit the number of readings per Go or test to see how close we are to the next alarm.

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 23, 2023

@Julia7676 I think the simplest solution is just add two setting to the config file for each timer so each user can tune to their own preferences. So far my external weather and urban ones have been going strong for about 1-2 months (readings every 10 minutes and uploads every 3) and only the ones inside, that are more frequent readings, have the issue. The main aim for me is that I don’t want to retrieve the weather enviro from the top of a pole every time it freezes and would be happy if it takes 30 minutes to recover and not take readings. But I see you preferred approach is much more elegant.

@Julia7676
Copy link

@dave-ct In that case I would put two advanced setting in the config file. The contingency alarm set at 30 (miniutes) and the watchdog delay set at 29*60. with an option of say zero to disable the watchdog.

It would make sense to change the number sent to delay of to be in minutes vis

class DELAYOFF:
    def __init__(self, pin, delay, sm_id=0):
        delay_ms = int(delay  * 60 * 1000)
        self._sm = StateMachine(sm_id, delayoff_prog, freq=2000, sideset_base=Pin(pin)) 

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 24, 2023

@Julia7676 @ZodiusInfuser have made changes so you can configure the watchdog timer in the config file and also disable if needed, set to default of 20 minutes on config file or can set to 0 to disable. Instead of two settings for the alarm and watchdog delay, calacuted the alarm by adding 1 minute to the watchdog timer. 7aa4858

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 26, 2023

@Julia7676 @ZodiusInfuser I have been testing the new PIO watchdog code for a few days and tonight after a failure it look slike the delayoff did not correct the issue. It was an enviroUrban and after and upload issue it hung with the falshing red light and sold white light for about 3hrs even though the delayoff was set for 5 minutes and should have kicked in. Checked the logs and even though the watchdog was set it did not remove power. The logs are below. This has only happened on the urban device so far. Any thoughts on this or could there be a mistake in the code in th e pull requests for PIO i used?

2023-01-26 19:04:11 [info     /  74kB]   - shutting down
2023-01-26 19:06:04 [info     / 122kB] > performing startup
2023-01-26 19:06:04 [debug    / 120kB]   - running Enviro 0.0.9, MicroPython 9dfabcd-dirty on 2022-08-10
2023-01-26 19:06:04 [info     / 114kB]   - wake reason: rtc_alarm
2023-01-26 19:06:04 [info     / 111kB]   - setting default alarm to wake at 19:12pm
2023-01-26 19:06:04 [debug    / 108kB] > delayoff set on gpio2 for 300000 ms
2023-01-26 19:06:04 [debug    / 105kB]   - turn on activity led
2023-01-26 19:06:04 [debug    / 101kB] > 98 blocks free out of 212
2023-01-26 19:06:04 [debug    /  98kB] > taking new reading
2023-01-26 19:06:04 [info     /  94kB]   - seconds since last reading: 119
2023-01-26 19:06:05 [debug    /  91kB]   - starting sensor
2023-01-26 19:06:05 [debug    / 118kB]   - wait 5 seconds for airflow
2023-01-26 19:06:10 [debug    /  98kB]   - taking pms5003i reading
2023-01-26 19:06:10 [debug    /  96kB]   - taking microphone reading
2023-01-26 19:06:11 [debug    /  89kB] > caching reading for upload
2023-01-26 19:06:11 [info     / 114kB] > 3 cache file(s) need uploading
2023-01-26 19:06:11 [info     / 111kB] > connecting to wifi network 'XXXXXXXX'
2023-01-26 19:06:17 [info     /  89kB]   - ip address:  XXXXXXXXXX
2023-01-26 19:06:17 [info     / 114kB] > uploading cached readings to MQTT broker: XXXXXXXXXXX
2023-01-26 19:06:36 [info     /  92kB]   - uploaded 2023-01-26T19_02_11Z.json
2023-01-26 19:08:10 [debug    /  78kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 34, in upload_reading
  File "enviro/mqttsimple.py", line 74, in connect
OSError: (-29312, 'MBEDTLS_ERR_SSL_CONN_EOF')

2023-01-26 19:08:10 [error    /  75kB]   ! failed to upload '2023-01-26T19_04_11Z.json' to mqtt
2023-01-26 19:08:10 [error    /  73kB] ! reading upload failed
2023-01-26 19:08:10 [info     /  70kB] > going to sleep
2023-01-26 19:08:10 [debug    /  68kB]   - clearing and disabling previous alarm
2023-01-26 19:08:10 [info     /  66kB]   - setting alarm to wake at 19:10pm
****below is 3 hours later when I disconnected from battery and plugged in to USB to get Logs******
2000-01-01 00:00:04 [info     / 122kB] > performing startup
2000-01-01 00:00:04 [debug    / 120kB]   - running Enviro 0.0.9, MicroPython 9dfabcd-dirty on 2022-08-10
2000-01-01 00:00:04 [info     / 114kB]   - wake reason: usb_powered
2000-01-01 00:00:04 [info     / 112kB]   - setting default alarm to wake at 00:06am

@dave-ct
Copy link
Contributor Author

dave-ct commented Jan 29, 2023

Have had the same issue happen on another urban device plus indoor and weather version. Logs show they all hang when setting the RTC wake alarm after an upload error. They are all seem to be doing setting default alarm to wake at XXXXX and delayoff set on gpio2 for 600000 ms OK prior the issue.

Urban Logs

2023-01-27 07:26:26 [info     /  96kB]   - uploaded 2023-01-27T07_22_11Z.json
2023-01-27 07:26:32 [info     / 105kB]   - uploaded 2023-01-27T07_24_11Z.json
2023-01-27 07:28:15 [debug    /  49kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 34, in upload_reading
  File "enviro/mqttsimple.py", line 110, in connect
IndexError: bytes index out of range

2023-01-27 07:28:15 [error    /  74kB]   ! failed to upload '2023-01-27T07_26_11Z.json' to mqtt
2023-01-27 07:28:15 [error    /  72kB] ! reading upload failed
2023-01-27 07:28:15 [info     /  70kB] > going to sleep
2023-01-27 07:28:15 [debug    /  68kB]   - clearing and disabling previous alarm
2023-01-27 07:28:15 [info     /  65kB]   - setting alarm to wake at 07:30am

Weather Logs

2023-01-27 06:10:28 [info     /  76kB]   - uploaded 2023-01-27T06_06_06Z.json
2023-01-27 06:10:34 [info     / 104kB]   - uploaded 2023-01-27T06_08_06Z.json
2023-01-27 06:12:08 [debug    /  47kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 34, in upload_reading
  File "enviro/mqttsimple.py", line 110, in connect
IndexError: bytes index out of range

2023-01-27 06:12:09 [error    /  71kB]   ! failed to upload '2023-01-27T06_10_06Z.json' to mqtt
2023-01-27 06:12:09 [error    /  70kB] ! reading upload failed
2023-01-27 06:12:09 [info     /  67kB] > going to sleep
2023-01-27 06:12:09 [debug    /  65kB]   - clearing and disabling previous alarm
2023-01-27 06:12:09 [info     /  63kB]   - setting alarm to wake at 06:14am

Indoor Logs

2023-01-28 18:18:29 [debug    /  82kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 34, in upload_reading
  File "enviro/mqttsimple.py", line 70, in connect
OSError: [Errno 103] ECONNABORTED

2023-01-28 18:18:29 [debug    /  79kB]   - an exception occurred when disconnecting mqtt client. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 46, in upload_reading
  File "enviro/mqttsimple.py", line 116, in disconnect
OSError: [Errno 103] ECONNABORTED

2023-01-28 18:18:29 [error    /  76kB]   ! failed to upload '2023-01-28T18_12_05Z.json' to mqtt
2023-01-28 18:18:29 [error    /  74kB] ! reading upload failed
2023-01-28 18:18:29 [info     /  72kB] > going to sleep
2023-01-28 18:18:29 [debug    /  70kB]   - clearing and disabling previous alarm
2023-01-28 18:18:29 [info     /  67kB]   - setting alarm to wake at 18:20pm

@lornova
Copy link

lornova commented Feb 12, 2023

Hi, I'm having huge issues with my Enviro Urban, too. I'm currently running 2 Enviro Indoors flawlessly, while the Enviro Urban works for a while, then it stops, usually remaining with the red led or the pulsing white. But in my case the log ends with the log "connecting to wifi network":

2023-02-12 19:15:09 [info     / 108kB] > connecting to wifi network '*******'
2023-02-12 19:15:15 [info     /  86kB]   - ip address:  10.0.0.172
2023-02-12 19:15:15 [info     / 118kB] > uploading cached readings to Adafruit.io: *******
2023-02-12 19:15:17 [info     /  94kB]   - uploaded 2023-02-12T18_36_39Z.json
2023-02-12 19:15:19 [info     / 114kB]   - cannot upload '2023-02-12T18_45_09Z.json' - rate limited
2023-02-12 19:15:19 [info     / 111kB] > going to sleep for 1 minute(s)
2023-02-12 19:15:19 [debug    / 109kB]   - clearing and disabling previous alarm
2023-02-12 19:15:19 [info     / 107kB]   - setting alarm to wake at 19:16pm
2023-02-12 19:15:19 [info     / 105kB]   - shutting down
2023-02-12 19:16:03 [info     / 113kB] > performing startup
2023-02-12 19:16:03 [debug    / 111kB]   - running Enviro 0.0.9, MicroPython 699477d-dirty on 2023-01-16
2023-02-12 19:16:03 [info     / 121kB]   - wake reason: rtc_alarm
2023-02-12 19:16:03 [debug    / 119kB]   - turn on activity led
2023-02-12 19:16:03 [info     / 117kB] > 3 cache file(s) still to upload
2023-02-12 19:16:03 [info     / 114kB] > connecting to wifi network '*******'
2023-02-12 19:16:07 [error    /  99kB] ! failed to connect to wireless network *******
2023-02-12 19:16:07 [error    /  97kB]   - cannot upload readings, wifi connection failed
2023-02-12 19:16:07 [error    /  95kB] ! reading upload failed
2023-02-12 19:16:07 [info     /  93kB] > going to sleep
2023-02-12 19:16:07 [debug    / 109kB]   - clearing and disabling previous alarm
2023-02-12 19:16:07 [info     / 106kB]   - setting alarm to wake at 19:30pm
2023-02-12 19:16:08 [info     / 104kB]   - shutting down
2000-01-01 00:00:03 [info     / 113kB] > performing startup
2000-01-01 00:00:03 [debug    / 111kB]   - running Enviro 0.0.9, MicroPython 699477d-dirty on 2023-01-16
2000-01-01 00:00:03 [info     / 121kB]   - wake reason: usb_powered
2000-01-01 00:00:03 [debug    / 119kB]   - turn on activity led
2000-01-01 00:00:03 [info     / 117kB] > 3 cache file(s) still to upload
2000-01-01 00:00:03 [info     / 115kB] > connecting to wifi network '*******'

I've tried with new batteries, with a different wifi router and network, I tried the stock 0.0.9 firmware and also the latest master branch with the latest MicroPython... but the issue is still there...

@ZodiusInfuser
Copy link
Member

HI @lornova. Lockups are sadly an ongoing issue with Enviro, that seem to stem from the Pico's WiFi in some form. Such lockups can either occur during Micropython boot-up (for which there is currently no fix) or during Enviro's shutdown.

It is unclear from your log which of the two you are experiencing, but lockups following a WiFi issue do tend to be shutdown lockups. Also the white LED pulsing is another sign of it being a shutdown lockup. The fact it only happens for your Urban and not indoor is puzzling.

One thing you can try, is to grab the code from this Pull Request (#144). It aims to deal with shutdown lockups by using the Pico's PIO to trigger a board reset (when on battery) after certain time. If this lets your Urban keep working, then that confirms it was a shutdown lockup. If not then it is a startup lock-up, which I cannot offer a solution for at this time.

@lornova
Copy link

lornova commented Feb 12, 2023

Thank you, I'll try with that patch.

It is unclear from your log which of the two you are experiencing, but lockups following a WiFi issue do tend to be shutdown lockups.

I believe that I'm experiencing both lookups, as sometimes the board doesn't boot at all and just has the red led as soon as I power it. I also sometimes have issues when connecting to the PC, with the board not being recognized as an USB device (I've seen that in another issue, I increased the sleep at startup and I'll keep monitoring the issue).

The fact it only happens for your Urban and not indoor is puzzling.

I have kept my Urban on the windowsill, inside the Stevenson screen, for a couple of week, and back then I had almost no lockups. Then it suddenly stopped working and my fear was that it broke due to the dew. Then I used flash_nuke.uf2 and started from scratch (I had corrupted time in the RTC and I had to manually fix it via the interactive Python interpreter).

Since then I fear placing the board outside...

@MrDrem
Copy link

MrDrem commented Feb 14, 2023

I seem to be seeing this issue with my Urban now. The Weather and Indoor don't seem to be affected. Power cycling fixes it. I'll try pulling the fix from #144 and see if it helps.

I'm taking readings every 5 minutes, and sending every 6 readings.

2023-02-11 11:45:03 [debug / 115kB] > performing startup
2023-02-11 11:45:03 [info / 123kB] - wake reason: rtc_alarm
2023-02-11 11:45:03 [debug / 107kB] - turn on activity led
2023-02-11 11:45:03 [debug / 104kB] > 101 blocks free out of 212
2023-02-11 11:45:03 [debug / 102kB] > taking new reading
2023-02-11 11:45:03 [info / 98kB] - seconds since last reading: 300
2023-02-11 11:45:04 [debug / 95kB] - starting sensor
2023-02-11 11:45:04 [debug / 93kB] - wait 5 seconds for airflow
2023-02-11 11:45:09 [debug / 106kB] - taking pms5003i reading
2023-02-11 11:45:09 [debug / 104kB] - taking microphone reading
2023-02-11 11:45:09 [debug / 103kB] > caching reading for upload
2023-02-11 11:45:10 [info / 97kB] > 6 cache file(s) need uploading
2023-02-11 11:45:10 [info / 95kB] > connecting to wifi network 'Celestia'
2023-02-11 11:45:17 [warning / 116kB] - took 6 seconds to connect to wifi
2023-02-11 11:45:17 [info / 114kB] - ip address: 192.168.1.213
2023-02-11 11:45:18 [info / 107kB] > uploading cached readings to Influxdb bucket: home
2023-02-11 11:45:18 [debug / 94kB] - an exception occurred when uploading
2023-02-11 11:45:18 [error / 92kB] ! failed to upload '2023-02-11T11_20_09Z.json' to influxdb
2023-02-11 11:45:18 [error / 89kB] ! reading upload failed
2023-02-11 11:45:18 [info / 123kB] > going to sleep
2023-02-11 11:45:18 [debug / 120kB] - clearing and disabling previous alarm
2023-02-11 11:45:18 [info / 118kB] - setting alarm to wake at 11:50am
2000-01-01 00:00:03 [debug / 115kB] > performing startup

@sjefferson99
Copy link
Contributor

I ran the PR #144 on my weather board that has been experiencing the crash where it fails to wake up and hangs with the solid white light and no log entries after going to sleep as mentioned in #87. This watchdog failed to help there and the logs looked identical, the RTC was correct time, so the RP2040 just appears to be in a state where it can't be awoken and a reset is needed, I'm unsure if the watchdog fired or if that was also hung. I flashed the board to micropython 1.19.17 beforehand.

I ran the board for a couple of days successfully on USB, but moving the board out the weather station I got a couple of crashes with I think a red light flashing or solid, requiring power reset where the last log is about failing to upload to influxdb (local database server on the LAN). I am hoping #144 mitigates that while the cause is hunted down, I will try it outside again with the watchdog running and see if the watchdog catches it.

@pur1337
Copy link

pur1337 commented Sep 14, 2023

Late to the party but wanted to at least chime in on what I am seeing. I too am getting the solid white light. I have to cut power, turn it back on and hit the poke before it works right again.

Sometimes I get my data, sometimes it's just gone. I see it getting posted in the log, but checking the local server log I can see the POST request never made it to the server.

Now I have to reset it pretty much every day. From what I have read, this is failure to wake up, so I will try having it run continuously. I've never gotten an error in the log.

2023-09-10 21:45:03 [info / 138kB] > performing startup
2023-09-10 21:45:03 [debug / 135kB] - running Enviro 0.0.9, MicroPython 856e08b1931b88271816a2f60648f6ff332235b2, enviro v1.20.4 on 2023-08-04
2023-09-10 21:45:03 [info / 129kB] - wake reason: rtc_alarm
2023-09-10 21:45:03 [debug / 127kB] - turn on activity led
2023-09-10 21:45:03 [debug / 123kB] > 98 blocks free out of 212
2023-09-10 21:45:03 [debug / 106kB] > taking new reading
2023-09-10 21:45:03 [info / 133kB] - seconds since last reading: 900
2023-09-10 21:45:04 [debug / 131kB] - starting sensor
2023-09-10 21:45:04 [debug / 128kB] - wait 5 seconds for airflow
2023-09-10 21:45:09 [debug / 109kB] - taking pms5003i reading
2023-09-10 21:45:09 [debug / 107kB] - taking microphone reading
2023-09-10 21:45:09 [debug / 137kB] > caching reading for upload
2023-09-10 21:45:09 [info / 132kB] > 2 cache file(s) not being uploaded. Waiting until there are 5 file(s)
2023-09-10 21:45:09 [info / 130kB] > going to sleep
2023-09-10 21:45:10 [debug / 128kB] - clearing and disabling previous alarm
2023-09-10 21:45:10 [info / 126kB] - setting alarm to wake at 22:00pm
2023-09-10 21:45:10 [info / 124kB] - shutting down
2023-09-10 23:23:09 [info / 138kB] > performing startup
2023-09-10 23:23:09 [debug / 135kB] - running Enviro 0.0.9, MicroPython 856e08b1931b88271816a2f60648f6ff332235b2, enviro v1.20.4 on 2023-08-04
2023-09-10 23:23:09 [info / 129kB] - wake reason: rtc_alarm
2023-09-10 23:23:09 [debug / 127kB] - turn on activity led
2023-09-10 23:23:09 [debug / 123kB] > 98 blocks free out of 212
2023-09-10 23:23:09 [debug / 120kB] > taking new reading
2023-09-10 23:23:09 [info / 116kB] - seconds since last reading: 5886
2023-09-10 23:23:09 [debug / 113kB] - starting sensor
2023-09-10 23:23:09 [debug / 111kB] - wait 5 seconds for airflow
2023-09-10 23:23:14 [debug / 124kB] - taking pms5003i reading
2023-09-10 23:23:15 [debug / 122kB] - taking microphone reading
2023-09-10 23:23:15 [debug / 103kB] > caching reading for upload
2023-09-10 23:23:15 [info / 98kB] > 3 cache file(s) not being uploaded. Waiting until there are 5 file(s)
2023-09-10 23:23:15 [info / 96kB] > going to sleep
2023-09-10 23:23:15 [debug / 94kB] - clearing and disabling previous alarm
2023-09-10 23:23:15 [info / 92kB] - setting alarm to wake at 23:30pm
2023-09-10 23:23:15 [info / 137kB] - shutting down
2023-09-10 23:23:16 [debug / 135kB] - on usb power (so can't shutdown). Halt and wait for alarm or user reset instead

2023-09-13 17:50:03 [debug / 120kB] > taking new reading
2023-09-13 17:50:03 [info / 116kB] - seconds since last reading: 300
2023-09-13 17:50:03 [debug / 113kB] - starting sensor
2023-09-13 17:50:03 [debug / 111kB] - wait 5 seconds for airflow
2023-09-13 17:50:08 [debug / 124kB] - taking pms5003i reading
2023-09-13 17:50:08 [debug / 122kB] - taking microphone reading
2023-09-13 17:50:09 [debug / 103kB] > caching reading for upload
2023-09-13 17:50:09 [info / 98kB] > 3 cache file(s) not being uploaded. Waiting until there are 5 file(s)
2023-09-13 17:50:09 [info / 96kB] > going to sleep
2023-09-13 17:50:09 [debug / 94kB] - clearing and disabling previous alarm
2023-09-13 17:50:09 [info / 91kB] - setting alarm to wake at 17:55pm
2023-09-13 17:50:10 [info / 135kB] - shutting down
2000-01-01 00:00:04 [info / 138kB] > performing startup
2000-01-01 00:00:04 [debug / 135kB] - running Enviro 0.0.9, MicroPython 856e08b1931b88271816a2f60648f6ff332235b2, enviro v1.20.4 on 2023-08-04
2000-01-01 00:00:04 [info / 129kB] - wake reason: usb_powered
2000-01-01 00:00:04 [debug / 127kB] - turn on activity led
2000-01-01 00:00:04 [info / 126kB] > clock not set, synchronise from ntp server
2000-01-01 00:00:04 [info / 123kB] > connecting to wifi network '--'
2000-01-01 00:00:07 [info / 110kB] - ip address: 192.168.1.40
2023-09-13 19:05:29 [info / 107kB] - rtc synched
2023-09-13 19:05:29 [debug / 103kB] > 98 blocks free out of 212
2023-09-13 19:05:29 [debug / 136kB] > taking new reading
2023-09-13 19:05:29 [info / 131kB] - seconds since last reading: 4526
2023-09-13 19:05:29 [debug / 128kB] - starting sensor
2023-09-13 19:05:29 [debug / 126kB] - wait 5 seconds for airflow
2023-09-13 19:05:34 [debug / 107kB] - taking pms5003i reading
2023-09-13 19:05:34 [debug / 105kB] - taking microphone reading
2023-09-13 19:05:35 [debug / 135kB] > caching reading for upload
2023-09-13 19:05:35 [info / 129kB] > 4 cache file(s) not being uploaded. Waiting until there are 5 file(s)
2023-09-13 19:05:35 [info / 127kB] > going to sleep
2023-09-13 19:05:35 [debug / 125kB] - clearing and disabling previous alarm
2023-09-13 19:05:35 [info / 123kB] - setting alarm to wake at 19:10pm
2023-09-13 19:05:35 [info / 120kB] - shutting down
2023-09-13 19:05:35 [debug / 118kB] - on usb power (so can't shutdown). Halt and wait for alarm or user reset instead

@sjefferson99
Copy link
Contributor

I fixed nearly all of my issues by applying fixes to the wifi handling presented by another user. I captured these into PR #199 and have been running stable since then. Worth a try to see if your issues is caused by the same?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

10 participants