Random WDT after periodically connecting and disconnecting Wifi. #6172
The first non-WDT crash with a stack trace just appeared, and it may be useful. It crashes in malloc:
The following may be of interest. With what might be a similar issue, my sketches get hardware watchdog reboots. If I add a "delay(2000)" immediately after "WiFi.mode(WIFI_OFF)", the hardware watchdog always triggers during this delay. Without the "delay(2000)", it triggers a dozen or more program steps after "WiFi.mode(WIFI_OFF)" and the subsequent "WiFi.forceSleepBegin()". I reconnect over SSL roughly every 8 minutes (the SSL data exchanges always succeed and look the same as far as I can tell), and the crashes are intermittent: sometimes 3 in an hour, sometimes none for 24 hours. Free heap does not appear to drop below about 21000 bytes, and free stack does not appear to drop below 1360 bytes. [I am using an ESP8266 D1 mini with the ESP8266 Arduino core from GitHub (as of 31May19), BearSSL and the lwIP variant "v2 Lower Memory".] I have not found a solution yet.
@Rob58329, thanks for the information.
@JiriBilek: My original ESP8266 sketch used the ESP8266 Arduino core from GitHub as of about 18 months ago (i.e. with axTLS and lwIP v1.4, or perhaps earlier), and my ESP8266 units ran for more than 3 months without issue. Annoyingly, I have not yet been able to work out exactly which version of the GitHub core I was using then. If the same sketch is compiled with the current (31May19) GitHub core, or with any GitHub version from the last couple of months, it produces the intermittent hardware watchdog crashes described above. I get the same crashes with the current core (31May19) whether I use axTLS or BearSSL (for which the sketch needed slight modification), and whether I use lwIP v1.4 or lwIP v2. The current core does compile my sketches to use slightly more RAM at run time than 18 months ago, but I currently don't think this is the issue.
What's interesting here is that you're turning off WiFi without ever closing the SSL connection. Would you be able to check whether the same thing happens if you add a …? It's possible that when you come back the next time, or when the WiFi power-off event happens, some part of lwIP closes everything while the client still holds a pointer to something that is no longer valid. Then, when you reconnect, the first thing the client does is try to ::close() itself using this pointer, and boom: memory corruption (i.e. a crash, WDT, whatever).
Thanks for the idea, but I am getting a compilation error: 'class BearSSL::WiFiClientSecure' has no member named 'close'
Oops, it's …
I see. I am stopping the client in …
There goes the easy bit. :( Can you make the WiFiClientSecure a local variable in your send routine? If that works reliably, it means there is a data-lifetime issue in the client that needs looking at. If it fails too, something very strange is going on.
I replaced … It still crashed. I don't think this is related to WiFiClientSecure at all; at this point I think something in the SDK blob is going wrong.
I replaced WiFiClientSecure with a plain WiFiClient. Even the plain WiFiClient shows the same crash in the waiting portion, so it is not even HTTPS-related. Either the WiFi power-down/power-up causes the crashes, or lwIP does (perhaps by doing something invalid while closing a TCP connection buffer?).
Using the base WiFiClient and only calling WiFiClient.connect() and WiFiClient.stop() in the loop (with all data transmission removed) also results in occasional WDTs.
Thanks. The fact that it is not related to SSL makes it even more annoying.
It's not in lwIP either; it's in the core blob. I commented out everything related to WiFiClient:
I still got a WDT overnight:
There was another issue with the same problem, but I can't seem to find it now. The WDT is handled by the RTC block, so I don't think you can get any information on where it happened; to the CPU it looks like a simple reset.
Just to be clear, the actual guts of the WiFi power-off/power-on code are in the Espressif binary-only blob, so there is no way to debug it and nothing we can do about it here.
@earlephilhower, thanks a lot for your effort. I changed the title of this issue because the old one is misleading now.
Can you try changing … and see if it fixes the issue?
Yesterday, I compiled the original test sketch with an old core (2.5.0 dev, commit 641c5cd) and BearSSL. It ran for more than 18 hours without a problem. This confirms that the problem is not in the SSL library.
@tbdltee: I tried what you suggested. Unfortunately, the WDTs and even the exceptions are not gone, although they appear much less frequently (approximately once every 2 hours). The exception stack I received:
@JiriBilek, generally speaking, I used to have the WDT-reset situation, and I found that I need to call the ESP functions in a strict order; since then, a WDT reset has never happened to me again. Here is my general code:
3. In the sequence that turns WiFi off (in your case, the wifiDisconnect() function), a delay(1); must be added. I don't know why, but the ESP does not seem to work without it. See if it helps.
@tbdltee: Thanks for the information, but unfortunately the proposed changes didn't work either. The device still hits a WDT roughly every 1.5 hours. The relevant part of the code:
@JiriBilek, thanks for the update.
I have this problem too, and I spent ages trying everything. The issue is intermittent, and my 'solution' was simply to comment out my WiFi.disconnect(); line; after that it runs perfectly stably. I have not tested the effect on power consumption. Note that the crash site varies and occurs some lines after the disconnect call. As I tinkered, the problem got worse, not better, and removing that line made all the difference. I can't explain it, but it is a simple thing to try and report back on. PaulS. 12TA.
@JiriBilek: I note that [earlephilhower] said above that the guts of "WiFi.mode(WIFI_OFF)" and similar commands are actually in the Espressif SDK. Also, the latest esp8266 GitHub core for the Arduino IDE (e.g. 31May19) has a "Generic ESP8266 Module" board with an option to select one of 3 SDK versions: "nonos-sdk 2.2.1 (legacy)=v2.1.0-10-g509eae8", "2.2.2-190313 (testing)=v2.2.1-61-gc7b580c" and "sdk pre-3 (known issues)=v2.2.0-28-g89920dc". For my current sketch, which has the issue of the "Hardware Watchdog being intermittently triggered by (shortly after) WiFi.mode(WIFI_OFF)":
So, since earlier GitHub versions using “sdk pre-3” work fine, I do not think the SDK version is causing my hardware watchdog crashes. Instead, I note that:
Update (28Jul19):
Specifically, I edited the file “libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp” and commented out the 2 relevant lines so that they read:
This solved my intermittent hardware WDT crashes. I have now been running 8 sensors that connect to a remote server approximately every 10 minutes for over 4 weeks on the 31May19 GitHub core without a single WDT crash. I hope this information will be useful to someone.
@Rob58329: I think I am using the legacy core in all my tests, but I'm not sure because I don't have my computer with me right now. I'll be back in July.
I could reproduce the bug, and it vanished with a call to … Sketch (same as above, without any form of WiFiClient, as @earlephilhower did), with the fix:
Sorry, my only experience with PIO is through the Atom IDE for building Marlin. However, you can run GDB manually and get the same information; that's what the ESPExceptionDecoder does anyway. A CLI utility might be handy if there really is no way to debug stack traces with PIO. As for OOM: a failed malloc/new doesn't actually crash the machine, but with debugging enabled the core logs the address of the caller. If you later use the returned pointer (which is 0) without checking it, the resulting crash is then easier to debug.
Example GDB session for one of the failing mallocs:
And, of course, only the xtensa-*gdb distributed with the Arduino core can be used, because your native gdb on Linux or Mac only understands x86 instructions.
The failure to allocate 620 bytes happened when running this macro in my code:
typedef std::shared_ptr<ControllerSettingsStruct> ControllerSettingsStruct_ptr_type;
#define MakeControllerSettings(T) ControllerSettingsStruct_ptr_type ControllerSettingsStruct_ptr(new ControllerSettingsStruct());\
ControllerSettingsStruct& T = *ControllerSettingsStruct_ptr;
This is called from:
MakeControllerSettings(ControllerSettings);
LoadControllerSettings(event->ControllerIndex, ControllerSettings);
That may be what is happening now, with the core debug strings active.
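The macro above dereferences the result of `new` without checking it. That is exactly the pattern the OOM debug output is meant to make easier to diagnose. Below is a minimal standard-C++ sketch of a checked variant. It assumes `new (std::nothrow)` is acceptable in place of plain `new`. The struct body, `makeControllerSettings` and `loadSettingsSafely` are hypothetical stand-ins, not ESPEasy's actual code. The shared_ptr control block is still allocated separately, so this does not cover every out-of-memory path.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <new>

// Hypothetical stand-in: only the size (620 bytes, as in the failed allocation) matters here.
struct ControllerSettingsStruct {
  char data[620] = {};
};

typedef std::shared_ptr<ControllerSettingsStruct> ControllerSettingsStruct_ptr_type;

// Hypothetical replacement for the macro. std::nothrow makes a failed
// allocation yield nullptr, so the caller can test the pointer before use
// instead of dereferencing address 0.
ControllerSettingsStruct_ptr_type makeControllerSettings() {
  return ControllerSettingsStruct_ptr_type(new (std::nothrow) ControllerSettingsStruct());
}

// Hypothetical caller: bails out instead of binding a reference to *nullptr.
bool loadSettingsSafely() {
  ControllerSettingsStruct_ptr_type settings = makeControllerSettings();
  if (!settings) {
    std::puts("out of memory: ControllerSettingsStruct not allocated");
    return false;
  }
  ControllerSettingsStruct& ControllerSettings = *settings;
  // A call such as LoadControllerSettings(index, ControllerSettings) would go here.
  ControllerSettings.data[0] = 1;
  return true;
}
```

The design choice is to make the failure visible at the allocation site. The later crash then no longer has to be traced back through a reference bound to address 0.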
I really don't get it; this is really frustrating. The only things that changed are:
What if you enable core & OOM debug again? (Heisenberg effect)
I was curious whether OOM debug would tell me anything in my setup. It may be that you are chasing another bug: I ran my node for a day and no OOM messages appeared. There were WDTs as usual, though.
The OOM was happening on my system with core debug enabled. @d-a-v I did enable CORE and OOM debug last night and was still not able to connect, so I was tracking down some of the other changes to find something that makes it reproducible. I stopped at 2am for obvious reasons :) To be sure I am not missing anything else, I do a clean build for every attempt. It may therefore take a while to track everything down, even with a binary search over the checked-out files.
The workarhack I was using for the failing connection was to …
I do have the 1000 ms delay, but what do you mean by "after a timeout"?
(finished editing # 1) (that's not python :)
I removed all code related to WIFI_OFF. After that, a very small custom build (including only a few ESPeasy plugins) was able to connect, but the same code in a "normal build" (just more plugins) cannot connect to WiFi anymore. (The fewest reboots before it finally succeeded was 55.) Tomorrow I will strip out all the fancy WiFi-related code and just use something based on the WiFiMulti class. I will also make a pull request to support hidden SSIDs and to allow turning WiFi off between reconnects.
I also mentioned it here: platformio/platform-espressif8266#166 (comment). Just to be sure, since it often results in WiFi connect issues.
Should not be needed.
No, they execute in SYS.
Yes, the entire call tree needs to be in IRAM, which is why ISRs should be kept simple and isolated.
It depends on the callback. Functions that execute in CONT are the most relaxed. Functions that execute in SYS, such as Ticker callbacks and the WiFi events, can't call delay(), yield(), blocking functions, etc., and are subject to stricter timing requirements than CONT.
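One common way to follow these rules is to have the SYS-context callback only record that an event happened. The slow work is then done later from loop(), which runs in CONT. Below is a minimal standard-C++ sketch of that deferral pattern. The handler and helper names are hypothetical. `std::atomic<bool>` is used to keep the sketch portable; many Arduino sketches use a plain `volatile bool` flag instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Set by the event callback, cleared by loop().
std::atomic<bool> g_disconnectPending{false};

// Hypothetical WiFi-event handler (SYS context): it must return quickly,
// so it only raises a flag and never calls delay(), yield() or blocking code.
void onWifiDisconnected() {
  g_disconnectPending.store(true);
}

// Hypothetical helper called from loop() (CONT context), where delay(),
// yield() and blocking network calls are allowed.
// Returns true if a pending event was handled.
bool serviceWifiEvents() {
  if (!g_disconnectPending.exchange(false)) {
    return false;  // nothing happened since the last call
  }
  std::puts("handling disconnect from loop()");
  // Reconnect logic, delay(), etc. would go here on the device.
  return true;
}
```

exchange(false) reads and clears the flag in one step. An event raised between two loop() iterations is therefore handled exactly once.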
Check.
Here's an attempt to work on a common basis: #6356
Just to confirm: @d-a-v's commit of 5Sep19, “Experimental: add new WiFi (pseudo) modes: WIFI_SHUTDOWN & WIFI_RESUME #6356”, fixes my issue of “intermittent Hardware WDT crashes after "WiFi.mode(WIFI_OFF)"”. (I have now been running the GitHub version of 6Sep19 for nearly 3 weeks without a crash!) Many thanks!
Haven't been here for a while, sorry.
Basic Infos
Platform
Settings in IDE
Problem Description
I am observing unstable behavior of the ESP8266 when turning WiFi on and off frequently.
The example sketch is taken from my code and greatly simplified. The point is to run the ESP8266 with WiFi turned off (for low power consumption), accumulate data, and send it to a server periodically. Here the period is 30 seconds; in reality it is, of course, much longer.
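The accumulate-then-send cycle described above can be sketched as follows. This is only an illustration, not the MCVE sketch from this issue. The class `SampleBatcher` is hypothetical, and `nowMs` stands in for the value of `millis()` so the logic can be compiled and tested off-device. The period check uses unsigned subtraction, which stays correct when the 32-bit millisecond counter wraps around.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: samples are buffered while WiFi is off
// and handed over for sending once per period.
class SampleBatcher {
 public:
  explicit SampleBatcher(uint32_t periodMs) : periodMs_(periodMs) {}

  void add(float sample) { samples_.push_back(sample); }

  // nowMs plays the role of millis(). Unsigned subtraction stays correct
  // when the 32-bit counter wraps around (about every 49.7 days).
  bool dueToSend(uint32_t nowMs) const {
    return static_cast<uint32_t>(nowMs - lastSendMs_) >= periodMs_;
  }

  // Returns the accumulated batch and restarts the period.
  std::vector<float> takeBatch(uint32_t nowMs) {
    lastSendMs_ = nowMs;
    std::vector<float> out;
    out.swap(samples_);
    return out;
  }

 private:
  uint32_t periodMs_;
  uint32_t lastSendMs_ = 0;
  std::vector<float> samples_;
};
```

On the device, loop() would call add() for each reading. When dueToSend(millis()) is true, it would turn WiFi on, send the batch returned by takeBatch(), and turn WiFi off again.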
With axTLS the sketch easily runs overnight (e.g. 12 hours without a reset; I didn't test longer), while with BearSSL it generally does not survive 15 minutes.
If you uncomment the prints in loop(), you can see that the WDT fires outside the sketch: the last character printed is always '-'. In this case, use PuTTY as the serial monitor. The Arduino serial monitor does not interpret the BS (backspace) character, and you will get a full screen of dots and hyphens :)
I tried adding delays in the wifiDisconnect function, but there was no change.
I know your time is precious, so I tried to be as specific as I could. I am lost in debugging a WDT outside the loop function, though. Do you have any idea what to test? I hope the sketch is OK.
The same WiFi-disconnect mechanism has been running for months without an issue in my devices based on v2.5.0 dated 8-2018 (commit 641c5cd) with axTLS.
MCVE Sketch
Debug Messages
Note that the 404 response is OK; we are using the www.example.com server for testing.