Random WDT after periodically connecting and disconnecting Wifi. #6172

Closed
5 tasks done
JiriBilek opened this issue Jun 1, 2019 · 98 comments
Labels: waiting for feedback (Waiting on additional info. If it's not received, the issue may be closed.)

Comments

@JiriBilek
Contributor

JiriBilek commented Jun 1, 2019

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • [n/a] If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: NodeMCU 1.0 or ESP-12E
  • Core Version: 2.5.1, git 2.6.0-dev #455583b from 5-30-2019
  • Development Env: Arduino IDE
  • Operating System: Windows

Settings in IDE

  • Module: NodeMCU or Generic ESP8266 Module
  • Flash Mode: qio
  • Flash Size: 4MB (1MB SPIFFS)
  • lwip Variant: v2 Lower Memory
  • Reset Method: nodemcu
  • Flash Frequency: 40 MHz
  • CPU Frequency: 80 MHz or 160 MHz
  • Upload Using: SERIAL
  • Upload Speed: 921600
  • SSL Support: either all or basic SSL ciphers

Problem Description

I am observing unstable behavior of the ESP8266 when I turn WiFi on and off frequently.
The example sketch is taken from my code and greatly simplified. The point is to run the ESP8266 with WiFi turned off (low power consumption), accumulate data, and send it to a server periodically. Here the period is 30 seconds; in reality it is much longer, of course.
With axTLS the sketch easily runs overnight (e.g., 12 hours without a reset; I didn't test longer). With BearSSL it generally does not survive 15 minutes.
If you uncomment the prints in loop(), you can see that the WDT fires outside the sketch: the last character printed is always '-'. In this case, use PuTTY as the serial monitor; the Arduino serial monitor does not interpret the BS (backspace) character, and you will get a full screen of dots and hyphens :)
I tried adding delays in the wifiDisconnect function, but nothing changed.

I know your time is precious, so I tried to be as specific as I could. I am lost when it comes to debugging a WDT that fires outside the loop function, though.

Do you have any idea what to test? I hope the sketch is OK.
The same WiFi-disconnecting mechanism has been running for months without an issue on my devices, which are based on v2.5.0 dated 8-2018 (commit 641c5cd) with axTLS.

MCVE Sketch

//#define USING_AXTLS

#include <ESP8266WiFi.h>
#ifdef USING_AXTLS
    #include "WiFiClientSecureAxTLS.h"
    using namespace axTLS;
#endif

#define SSID "***"
#define PASSWORD "***"

static uint32_t MILLIS_TO_WAKE_UP = 30*1000;  // wake up after 30 seconds
const char* SERVERNAME = "www.example.com";
uint16_t PORT = 443;

#ifdef USING_AXTLS
    axTLS::WiFiClientSecure wifiClient;
#else
    BearSSL::WiFiClientSecure wifiClient;
#endif

uint32_t lastSleep = 0;  // millis of the last time when forced to sleep

void setup() {
    WiFi.setAutoConnect(false);
    Serial.begin(115200);

    pinMode(2, OUTPUT);
    digitalWrite(2, LOW);  // Start indicator

    // Wait for GPIO0 down as a start condition (we want to stop here after a wdt reset)
/*    Serial.println(F("\nConnect GPIO0 to GND to start"));
    pinMode(0, INPUT_PULLUP);
    while (digitalRead(0) == HIGH)
        delay(100);
*/
    pinMode(2, INPUT);  // Back to default

    Serial.print(ESP.getSdkVersion());
#ifdef USING_AXTLS
    Serial.println(F(", axTLS"));
#else
    Serial.println(F(", BearSSL"));
#endif

    wifiConnect();
    wifiSend(3);
    wifiDisconnect();
}

void loop() {
    //Serial.print("\x08.");
    
    if (millis() - lastSleep > MILLIS_TO_WAKE_UP) {
        wifiConnect();
        wifiSend(2);
        wifiDisconnect();
    }
    
    //Serial.print("\x08-");
}

void wifiConnect(void) {
    WiFi.mode(WIFI_STA);
    delay(100);

    WiFi.begin(SSID, PASSWORD);
    Serial.printf_P(PSTR("Connecting to %s "), SSID);

    while (WiFi.status() == WL_DISCONNECTED) {
        Serial.write('.');
        delay(500);
    }
    Serial.println();

    if (WiFi.status() == WL_CONNECTED) {
        Serial.printf_P(PSTR("WiFi connected (RSSI %d), IP address: %s, "), WiFi.RSSI(), WiFi.localIP().toString().c_str());
        Serial.printf_P(PSTR("mem: %d\r\n"), ESP.getFreeHeap());

        if (time(nullptr) < 100000000)
            readTime();
    }
}

void wifiDisconnect(void) {
    // Disconnecting wifi
    Serial.print(F("Disconnecting client"));
    wifiClient.stop();

    Serial.print(F(", wifi"));
    WiFi.disconnect();
    WiFi.mode(WIFI_OFF);
    delay(100);  // FIXME

    Serial.println(F(", sleeping"));
    WiFi.forceSleepBegin();  // turn off ESP8266 RF
    delay(100);  // FIXME

    lastSleep = millis();
}

boolean wifiSend(int8_t status) {
    // Check the wifi
    if (WiFi.status() != WL_CONNECTED) {
        Serial.println(F("[WiFi] Not connected to AP"));
        return false;
    }

#ifndef USING_AXTLS
    wifiClient.setInsecure();  // for testing ok
#endif    

    if (wifiClient.connect(SERVERNAME, PORT)) {
        Serial.println(F("[WiFi] Connected to server"));
    }
    else {
        Serial.println(F("[WiFi] Connection to server failed"));
        return false;
    }
    
    if (wifiClient.connected()) {
        // GET /test HTTP/1.1 (HTTP header lines end with CRLF)
        wifiClient.printf_P(PSTR("GET /test HTTP/1.1\r\nHost: %s\r\n\r\n"), SERVERNAME);
        
        Serial.print(F("[WiFi] Data sent, waiting for response ... "));

        // Wait max 5 seconds for server response (unsigned, like millis())
        uint32_t m = millis();
        while (millis() - m < 5000 && !wifiClient.available()) {
            delay(100);
        }

        // Read the response header
        Serial.println();
        while (wifiClient.connected()) {
            String line = wifiClient.readStringUntil('\n');
            Serial.println(line);
            if (line == "\r") {
//                Serial.println("headers received");
                break;
            }
            yield();
        }
        
        // Read and discard the data
        while (wifiClient.available() && wifiClient.connected()) {
            String line = wifiClient.readStringUntil('\n');
            yield();
        }
    }
    return true;
}

void readTime(void) {
    if (WiFi.status() != WL_CONNECTED) {
        return;
    }

    Serial.print(F("Setting time using SNTP "));

    configTime(1 * 3600, 0, "tik.cesnet.cz", "pool.ntp.org");

    // Read time, wait 5 seconds
    uint32_t m = millis();
    time_t now = time(nullptr);
    while (now < 100000000 && millis() - m < 5000) {
        delay(100);
        Serial.write('.');
        now = time(nullptr);
    }
    Serial.println();

    if (now < 100000000) {
        Serial.println(F("Time was not set."));
    }
    else {
        Serial.print(F("Current time: "));
        Serial.println(ctime(&now));
    }
}

Debug Messages

Note that the 404 response is OK; we are using the www.example.com server for testing.

Connecting to BILNet ..........
WiFi connected (RSSI -52), IP address: 192.168.1.102, mem: 44784
[WiFi] Connected to server
[WiFi] Data sent, waiting for response ... 
HTTP/1.1 404 Not Found
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 01 Jun 2019 09:14:29 GMT
Expires: Sat, 08 Jun 2019 09:14:29 GMT
Last-Modified: Tue, 28 May 2019 06:46:04 GMT
Server: ECS (dcb/7EA6)
Vary: Accept-Encoding
X-Cache: 404-HIT
Content-Length: 1270

Disconnecting client, wifi, sleeping
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
vffffffff
~ld
2.2.1(cfd48f3), BearSSL

@JiriBilek
Contributor Author

JiriBilek commented Jun 1, 2019

The first non-WDT crash with a stack trace appeared just now; perhaps it is useful. It crashes in malloc:

Exception (2):
epc1=0x3ffeec3c epc2=0x00000000 epc3=0x00000000 excvaddr=0x3ffeec3c depc=0x00000000

Exception 2: InstructionFetchError: Processor internal physical address or data error during instruction fetch
Decoding 85 results
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x4010458c: lmacProcessAckTimeout at ?? line ?
0x4020f763: new_linkoutput at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 235
0x4020fb54: ethernet_output at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 312
0x40105159: ets_timer_disarm at ?? line ?
0x402054c3: loop_task(ETSEventTag*) at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266/core_esp8266_main.cpp line 140
0x40216d7f: etharp_raw at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1161
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x40211264: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 233
0x40216f7a: etharp_request at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1202
0x4020fba8: do_memp_malloc_pool at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 254
0x40216fe4: etharp_tmr at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 203
0x40211264: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 233
0x40211264: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 233
0x40211274: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 243
0x4020fc0e: memp_free at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 447
0x4021140c: sys_check_timeouts at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 390
0x4023b434: ets_timer_handler_isr at ?? line ?
0x4023b441: ets_timer_handler_isr at ?? line ?
0x4023b486: ets_timer_handler_isr at ?? line ?
0x402054c3: loop_task(ETSEventTag*) at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266/core_esp8266_main.cpp line 140
0x40104980: call_user_start_local at ?? line ?
0x40104986: call_user_start_local at ?? line ?
0x4010000d: call_user_start at ?? line ?
0x4024a9e8: node_remove_from_list at ?? line ?
0x401026ee: wDev_ProcessFiq at ?? line ?
0x4021b5c7: sha2small_out at /home/earle/Arduino/hardware/esp8266com/esp8266/tools/sdk/ssl/bearssl/src/hash/sha2small.c line 249
0x4024a8b0: node_remove_from_list at ?? line ?
0x4021b608: br_sha256_out at /home/earle/Arduino/hardware/esp8266com/esp8266/tools/sdk/ssl/bearssl/src/hash/sha2small.c line 305
0x4024a8b0: node_remove_from_list at ?? line ?
0x40221f55: br_hmac_out at /home/earle/Arduino/hardware/esp8266com/esp8266/tools/sdk/ssl/bearssl/src/mac/hmac.c line 120
0x4024a8b0: node_remove_from_list at ?? line ?
0x4023545b: pp_attach at ?? line ?
0x402354aa: pp_attach at ?? line ?
0x402355b6: pp_attach at ?? line ?
0x4023545b: pp_attach at ?? line ?
0x402354aa: pp_attach at ?? line ?
0x402355b6: pp_attach at ?? line ?
0x40101482: pp_post at ?? line ?
0x40234567: ppTxPkt at ?? line ?
0x40227947: ieee80211_output_pbuf at ?? line ?
0x40104eff: wdt_feed at ?? line ?
0x40101482: pp_post at ?? line ?
0x40104877: lmacRxDone at ?? line ?
0x40102199: trc_NeedRTS at ?? line ?
0x40101482: pp_post at ?? line ?
0x4010236a: trc_NeedRTS at ?? line ?
0x4020f763: new_linkoutput at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 235
0x401027aa: wDev_ProcessFiq at ?? line ?
0x40102544: wDev_ProcessFiq at ?? line ?
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x40101482: pp_post at ?? line ?
0x40100daf: pp_soft_wdt_feed_local at ?? line ?
0x4010065c: _umm_free at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1304
0x40102522: wDev_MacTim1Arm at ?? line ?
0x40102586: wDev_ProcessFiq at ?? line ?
0x4020f4e1: glue2esp_linkoutput at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/glue-esp/lwip-esp.c line 299
0x402133f6: pbuf_free_LWIP2 at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 786 (discriminator 1)
0x40102544: wDev_ProcessFiq at ?? line ?
0x402187d4: mem_malloc at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c line 210
0x40249f70: node_remove_from_list at ?? line ?
0x40105159: ets_timer_disarm at ?? line ?
0x40105159: ets_timer_disarm at ?? line ?
0x4010065c: _umm_free at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1304
0x40100aa4: free at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1764
0x401001c0: millis at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266/core_esp8266_wiring.cpp line 186
0x40205540: esp_yield at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266/core_esp8266_main.cpp line 97
0x40205561: esp_schedule at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266/core_esp8266_main.cpp line 102
0x402055f9: loop_wrapper() at D:\Programy\arduino\hardware\esp8266com-git_version\esp8266\cores\esp8266/core_esp8266_main.cpp line 134

@Rob58329

Rob58329 commented Jun 4, 2019

The below info might be of interest:

With what might be a similar issue, my sketches get hardware watchdog reboots. If I add a "delay(2000)" immediately after "WiFi.mode(WIFI_OFF)", the hardware watchdog always triggers during this delay. Without this "delay(2000)", however, the hardware watchdog triggers some dozen or more program steps after "WiFi.mode(WIFI_OFF)" and the subsequent "WiFi.forceSleepBegin()".

I reconnect over SSL every 8 minutes or so (and my SSL data exchanges always succeed and look the same as far as I can tell), and the subsequent crashes are intermittent (sometimes 3 in an hour, sometimes none for 24 hours).

My free-Heap does not appear to drop below approx 21000 bytes, and my free-Stack does not appear to drop below 1360 bytes.

[I am using an ESP8266 D1 mini with the "ESP8266 Arduino Github software" (as of 31May19) with "BearSSL" and "lwIP variant v2 lower memory".]

I have not found a solution yet.

@JiriBilek
Contributor Author

@Rob58329 , thanks for the information.
Have you tried using axTLS? My devices with the old core and axTLS work quite reliably (they connect to WiFi once every 10 minutes), but merely switching from axTLS to BearSSL brings the issue.

@Rob58329

Rob58329 commented Jun 5, 2019

@JiriBilek: My original ESP8266 sketch used the “ESP8266 Arduino Github Software core” from about 18 months ago (i.e., with axTLS and lwIP v1.4, or perhaps earlier), and my ESP8266 units ran for 3+ months without issue. Annoyingly, I have not yet been able to work out exactly which version of the Github software I was using 18 months ago...

But if the same sketch is compiled with the current (v31May19) “ESP8266 Arduino Github Software” (or, in fact, with any of the Github versions from the last couple of months), it generates the intermittent hardware watchdog crashes detailed above. I still get the same crashes if I use the current Github software (v31May19) with axTLS or BearSSL (which the sketch needed slight modification for), and with lwIP v1.4 or lwIP v2. I note that the current Github core compiles my sketches to use a bit more RAM when running (vs. 18 months ago), but I currently don't think this is the issue.

@earlephilhower
Collaborator

What's interesting here is that you're turning off WiFi without ever closing the SSL connection.

Would you be able to check if the same thing happens if you add a wifiClient.close() at the end of your send routine?

It's possible that when you come back the next time, or when the WiFi power-off event happens, some part of LWIP closes everything, but the client still holds a pointer to something that's no longer valid. Then, when you reconnect, the first thing the client will do is try to ::close() itself using this pointer and, boom, memory corruption (== crash, WDT, whatever).

@earlephilhower earlephilhower self-assigned this Jun 5, 2019
@JiriBilek
Contributor Author

JiriBilek commented Jun 5, 2019

Thanks for the idea, but I am getting a compilation error: 'class BearSSL::WiFiClientSecure' has no member named 'close'
I can't find a close() function in WiFiClientSecure, WiFiClient, or Client.

@earlephilhower
Collaborator

Oops, I meant to type stop, not close.

@JiriBilek
Contributor Author

I see. I am stopping the client in wifiDisconnect(). It is called immediately after wifiSend().

@earlephilhower
Collaborator

There goes the easy bit. :( Can you make the WiFiClientSecure a local variable in your send routine? If that works reliably then it means there is a data lifetime issue in the client that needs looking at. If it fails, too, then there's something very strange going on.

@earlephilhower
Collaborator

I replaced wifiClient with a temporary object that was newed at the end of wifiConnect() and deleted immediately after its stop() call. So there was zero possibility of the object using something invalidated after WiFi was turned off.

It still crashed. I don't think this is related in any way to the WiFiClientSecure. I think it's something in the SDK blob at this point going weird.

@d-a-v, @devyte, anything suspicious in the code here?

@earlephilhower
Collaborator

I replaced WiFiClientSecure with plain WiFiClient.

Even plain WiFiClient has the same crash in the waiting portion, so it's not even HTTPS-related. Either WiFi power-down/up causes the crashes or LWIP does (maybe by trying something that's not valid when closing a TCP connection buffer, or something like that?)

Disconnecting client, wifi, sleeping
Connecting to NOBABIES ...........
WiFi connected (RSSI -47), IP address: 192.168.1.154, mem: 50848
[WiFi] Connected to server
[WiFi] Data sent, waiting for response ... 
HTTP/1.1 404 Not Found

Server: nginx

Date: Wed, 05 Jun 2019 23:31:31 GMT

Content-Type: text/html

Content-Length: 162

Connection: keep-alive



Disconnecting client, wifi, sleeping

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v7d5343a6

@earlephilhower
Collaborator

Using the base WiFiClient and only doing WiFiClient.connect() and WiFiClient.stop() in the loop (with all data transmission removed) also results in occasional WDTs.

...
WiFi connected (RSSI -43), IP address: 192.168.1.154, mem: 50024
[WiFi] Connected to server
Disconnecting client, wifi, sleeping
Connecting to NOBABIES .......
WiFi connected (RSSI -44), IP address: 192.168.1.154, mem: 50352
[WiFi] Connected to server
Disconnecting client, wifi, sleeping
Connecting to NOBABIES .......
WiFi connected (RSSI -45), IP address: 192.168.1.154, mem: 50024
[WiFi] Connected to server
Disconnecting client, wifi, sleeping

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v7d5343a6
~ld

@JiriBilek
Contributor Author

Thanks. That it is not related to SSL makes it even more annoying.
Is there any way to trace the WDT, i.e., a stack dump or any other information?

@earlephilhower
Collaborator

It's not in LWIP, either. It's in the core blob. I commented out everything related to the WiFiClient:

void wifiDisconnect(void) {
    // Disconnecting wifi
    Serial.print(F("Disconnecting client"));
    //wifiClient->stop();
    delete wifiClient;
    
    Serial.print(F(", wifi"));
    WiFi.disconnect();
    WiFi.mode(WIFI_OFF);
    delay(100);  // FIXME

    Serial.println(F(", sleeping"));
    WiFi.forceSleepBegin();  // turn off ESP8266 RF
    delay(100);  // FIXME

    lastSleep = millis();
}

boolean wifiSend(int8_t status) {
    // Check the wifi
    if (WiFi.status() != WL_CONNECTED) {
        Serial.println(F("[WiFi] Not connected to AP"));
        return false;
    }

#if 0
#ifndef USING_AXTLS
    //wifiClient->setInsecure();  // for testing ok
#endif    
   if (wifiClient->connect(SERVERNAME, PORT)) {
        Serial.println(F("[WiFi] Connected to server"));
    }
...
    }
#endif
    return true;
}

I still got a WDT overnight:

g to NOBABIES .......
WiFi connected (RSSI -49), IP address: 192.168.1.154, mem: 51464
Disconnecting client, wifi, sleeping
Connecting to NOBABIES .......
WiFi connected (RSSI -49), IP address: 192.168.1.154, mem: 51464
Disconnecting client, wifi, sleeping

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v7d5343a6
~ld
2.2.1(cfd48f3), BearSSL
Connecting to NOBABIES ......

There was another issue, same problem, but I can't seem to find it now.

The WDT is handled by the RTC block, so I don't think you can get any information on where it happened. To the CPU it looks like a simple reset.

@earlephilhower
Collaborator

Just to be clear, the actual guts of the WiFi power-off/power-on code are in the Espressif binary-only blob. So there is no way for us to debug it, and nothing we can do about it here.

@JiriBilek JiriBilek changed the title Random WDT after disconnecting Wifi. Occurs only with BearSSL, axTLS is fine Random WDT after periodically connecting and disconnecting Wifi. Jun 6, 2019
@JiriBilek
Contributor Author

@earlephilhower, thanks a lot for your effort. I changed the title of this issue, since the old one is misleading now.

@tbdltee

tbdltee commented Jun 9, 2019

Can you try changing
Serial.println(F(", sleeping"));
WiFi.forceSleepBegin(); // turn off ESP8266 RF
delay(100); // FIXME
to
Serial.println(F(", sleeping"));
delay(100); // FIXME
WiFi.forceSleepBegin(); // turn off ESP8266 RF

and see if that fixes it?
I suspect that Serial.print is still executing in the background when you turn off the RF. I added the delay to ensure that Serial.print has finished before the RF is turned off.

@JiriBilek
Contributor Author

Yesterday, I compiled the original test sketch with an old core (2.5.0-dev, commit 641c5cd) and BearSSL.
It ran for more than 18 hours without a problem. This confirms that the problem is not in the SSL library.

@Ogauy

Ogauy commented Jun 10, 2019 via email

@JiriBilek
Contributor Author

JiriBilek commented Jun 10, 2019

@tbdltee: I tried what you suggested. Unfortunately, the WDTs and even the exceptions are not gone, although they appear much less frequently (approximately once every 2 hours).
The testing setup was: the git version of library, BearSSL used to open a connection and send a GET request.

The exception stack I received:

Exception 2: InstructionFetchError: Processor internal physical address or data error during instruction fetch
Decoding 103 results
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x4010458c: lmacProcessAckTimeout at ?? line ?
0x4020fa23: new_linkoutput at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/glue-lwip/lwip-git.c line 235
0x4020fe14: ethernet_output at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/netif/ethernet.c line 312
0x40205593: loop_task(ETSEventTag*) at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_main.cpp line 140
0x40205670: loop_wrapper() at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_main.cpp line 124
0x4021703b: etharp_raw at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1161
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x40211524: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 233
0x40217236: etharp_request at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1202
0x4020fe68: do_memp_malloc_pool at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 254
0x402172a0: etharp_tmr at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 203
0x40211524: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 233
0x40211524: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 233
0x40211534: lwip_cyclic_timer at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 243
0x4020fece: memp_free at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 447
0x402116cc: sys_check_timeouts at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/timeouts.c line 390
0x40245974: ets_timer_handler_isr at ?? line ?
0x40245981: ets_timer_handler_isr at ?? line ?
0x402459c6: ets_timer_handler_isr at ?? line ?
0x40205593: loop_task(ETSEventTag*) at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_main.cpp line 140
0x40104980: call_user_start_local at ?? line ?
0x40104986: call_user_start_local at ?? line ?
0x4010000d: call_user_start at ?? line ?
0x40101482: pp_post at ?? line ?
0x40104877: lmacRxDone at ?? line ?
0x40102199: trc_NeedRTS at ?? line ?
0x4010236a: trc_NeedRTS at ?? line ?
0x401027aa: wDev_ProcessFiq at ?? line ?
0x40102544: wDev_ProcessFiq at ?? line ?
0x4021bc2c: br_sha2small_round at /home/earle/Arduino/hardware/esp8266com/esp8266/tools/sdk/ssl/bearssl/src/hash/sha2small.c line 101 (discriminator 2)
0x4021ba9c: br_sha2small_round at /home/earle/Arduino/hardware/esp8266com/esp8266/tools/sdk/ssl/bearssl/src/hash/sha2small.c line 85
0x40254f08: node_remove_from_list at ?? line ?
0x401037c5: lmacProcessTXStartData at ?? line ?
0x401037c2: lmacProcessTXStartData at ?? line ?
0x40101482: pp_post at ?? line ?
0x40104877: lmacRxDone at ?? line ?
0x40102199: trc_NeedRTS at ?? line ?
0x40105159: ets_timer_disarm at ?? line ?
0x4010236a: trc_NeedRTS at ?? line ?
0x401027aa: wDev_ProcessFiq at ?? line ?
0x40254fa8: node_remove_from_list at ?? line ?
0x40254df0: node_remove_from_list at ?? line ?
0x40105159: ets_timer_disarm at ?? line ?
0x4023f92f: pp_attach at ?? line ?
0x4023f97e: pp_attach at ?? line ?
0x4023fa8a: pp_attach at ?? line ?
0x40101482: pp_post at ?? line ?
0x4023ea27: ppTxPkt at ?? line ?
0x40231def: ieee80211_output_pbuf at ?? line ?
0x40104eff: wdt_feed at ?? line ?
0x40101482: pp_post at ?? line ?
0x40101482: pp_post at ?? line ?
0x40104877: lmacRxDone at ?? line ?
0x40101482: pp_post at ?? line ?
0x40101482: pp_post at ?? line ?
0x40104877: lmacRxDone at ?? line ?
0x40102199: trc_NeedRTS at ?? line ?
0x4010236a: trc_NeedRTS at ?? line ?
0x4010236a: trc_NeedRTS at ?? line ?
0x401027aa: wDev_ProcessFiq at ?? line ?
0x40102544: wDev_ProcessFiq at ?? line ?
0x40101482: pp_post at ?? line ?
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x40102199: trc_NeedRTS at ?? line ?
0x4010065c: _umm_free at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1304
0x40100aa4: free at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1764
0x401027aa: wDev_ProcessFiq at ?? line ?
0x40218ab8: mem_free at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c line 237
0x402136b6: pbuf_free_LWIP2 at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c line 786 (discriminator 1)
0x40218a90: mem_malloc at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c line 210
0x401008ac: malloc at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1677
0x40218a90: mem_malloc at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/mem.c line 210
0x4020fe68: do_memp_malloc_pool at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 254
0x4020fea4: memp_malloc at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/memp.c line 356
0x40217443: etharp_query at /home/gauchard/dev/esp8266/esp8266/tools/sdk/lwip2/builder/lwip2-src/src/core/ipv4/etharp.c line 1031
0x40253340: sleep_reset_analog_rtcreg_8266 at ?? line ?
0x40105159: ets_timer_disarm at ?? line ?
0x40105159: ets_timer_disarm at ?? line ?
0x4010018a: millis at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_wiring.cpp line 180
0x40100165: millis at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_wiring.cpp line 174
0x40100aa4: free at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266\umm_malloc/umm_malloc.cpp line 1764
0x401001c0: millis at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_wiring.cpp line 186
0x4020161a: ESP8266WiFiGenericClass::forceSleepBegin(unsigned int) at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\libraries\ESP8266WiFi\src/ESP8266WiFiGeneric.cpp line 484
0x40205601: esp_schedule at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_main.cpp line 102
0x40205699: loop_wrapper() at D:\Programy\arduino\hardware\esp8266com-git_455583b\esp8266\cores\esp8266/core_esp8266_main.cpp line 134

@tbdltee

tbdltee commented Jun 10, 2019

@JiriBilek, generally speaking, I used to hit WDT resets too, and I found that I need to call the ESP functions in a strict order; since then, a WDT reset has never happened to me again.
I also turn WiFi on and off very often. When the ESP wakes up from deep sleep, it executes setup(), acquires data, turns on WiFi, sends the data, and goes to sleep.

Here is my general approach:

  1. After Serial.print or Serial.println, wait at least 80 ms before calling any WiFi.xxxx command.
  2. Sequence to turn on WiFi, in this case your wifiConnect() function:

      void wifiConnect() {
          WiFi.forceSleepWake();
          delay(1);
          WiFi.mode(WIFI_STA);
          WiFi.begin(ssid, pass);
          ...
      }

  3. Sequence to turn off WiFi, in this case your wifiDisconnect() function:

      void wifiDisconnect(void) {
          Serial.print(F("Disconnecting client"));
          wifiClient.stop();
          Serial.print(F(", wifi"));
          delay(100);
          WiFi.disconnect();
          Serial.println(F(", sleeping"));
          delay(100);
          lastSleep = millis();
          WiFi.mode(WIFI_OFF);
          WiFi.forceSleepBegin();
          delay(1);
      }

The delay(1); must be added. I don't know why, but the ESP does not seem to work without it.
Actually, I'm using ESP.deepSleep(), so my code is a little bit different, but the sequence for turning WiFi on and off is the same.

See if it helps.

@JiriBilek
Contributor Author

@tbdltee: Thanks for the information, but unfortunately the proposed changes didn't work either. The device still triggers a WDT reset roughly every 1.5 hours.
The only solution I see now is to revert to the old core.

The relevant part of the code:

void wifiConnect(void) {
    WiFi.forceSleepWake();
    delay(100);

    WiFi.mode(WIFI_STA);
    //delay(100);

    WiFi.begin(SSID, PASSWORD);
    Serial.printf_P(PSTR("Connecting to %s "), SSID);

    while (WiFi.status() == WL_DISCONNECTED) {
        Serial.write('.');
        delay(500);
    }
    Serial.println();

    if (WiFi.status() == WL_CONNECTED) {
        Serial.printf_P(PSTR("WiFi connected (RSSI %d), IP address: %s, "), WiFi.RSSI(), WiFi.localIP().toString().c_str());
        Serial.printf_P(PSTR("mem: %d\r\n"), ESP.getFreeHeap());

        if (time(nullptr) < 100000000)
            readTime();
    }
}

void wifiDisconnect(void) {
    // Disconnecting wifi
    Serial.print(F("Disconnecting client"));
    wifiClient.stop();

    Serial.print(F(", wifi"));
    delay(100);  // FIXME

    WiFi.disconnect();

    Serial.println(F(", sleeping"));
    delay(100);  // FIXME
    
    lastSleep = millis();

    WiFi.mode(WIFI_OFF);
    WiFi.forceSleepBegin();  // turn off ESP8266 RF
    delay(1);
}

@tbdltee

tbdltee commented Jun 11, 2019

@JiriBilek, thanks for the update.
Currently, I'm using core 2.4.2 with SDK 2.2.1. Core 2.5.x also has a lot of problems for me. I'll wait a while until it's more stable.

@JiriBilek
Contributor Author

JiriBilek commented Jun 11, 2019

That said, it is the best setup so far for the new core (2.6.0-dev). After two WDT resets in the evening, it ran overnight without problems (8.5 hours).

Just for information: in the graph, a star is a reset and a gray circle is a successful transmission. The first two stars in the picture are resets caused by firmware uploads.
[Image: graph of resets (stars) and successful transmissions (gray circles) over time]

@12TA

12TA commented Jun 11, 2019

I can confirm this problem too. I spent ages trying everything. The issue is intermittent, and my 'solution' was simply to comment out my WiFi.disconnect(); line; after that it runs perfectly stably. I have not tested the effect on power consumption. Note that the crash site varies from crash to crash and is some lines after the disconnect call. As I tinkered, the problem got worse, not better, and removing that line made all the difference. I can't explain it, but it is a simple thing to try and report back on. PaulS. 12TA.

@Rob58329

Rob58329 commented Jun 14, 2019

@JiriBilek: I note that [earlephilhower] above said that the guts of the "WiFi.mode(WIFI_OFF)" and similar commands are actually in the Espressif SDK.

Also, the latest esp8266 GitHub version (e.g. v31May19) for the Arduino IDE has a "Generic ESP8266 Module" with options to select 3 different versions of the SDK, namely: "nonos-sdk 2.2.1 (legacy)=v2.1.0-10-g509eae8", "2.2.2-190313 (testing)=v2.2.1-61-gc7b580c" and "sdk pre-3 (known issues)=v2.2.0-28-g89920dc".

For my current sketch which has the "Hardware Watchdog being intermittently triggered by (shortly after) WiFi.mode(WIFI_OFF)" issue:

  • "nonos-sdk 2.2.1 (legacy)=v2.1.0-10-g509eae8" - causes intermittent Hardware Watchdog crashes.
  • "sdk pre-3 (known issues)=v2.2.0-28-g89920dc" - causes intermittent Hardware Watchdog crashes.
  • "2.2.2-190313 (testing)=v2.2.1-61-gc7b580c" - still causes intermittent Hardware Watchdog crashes.

So, since earlier GitHub versions using “sdk pre-3” work fine, I do not think the SDK version is causing my Hardware Watchdog crash issue.

Instead, I note that:

Update (28Jul19):
I note that my post below, which detailed the solution I found to my specific issue (intermittent Hardware WDT crashes after "WiFi.mode(WIFI_OFF)"), appears to have been hidden by an admin. Therefore, if anyone else has the same issue, the (“temporary”) solution that worked for me (and, I believe, also for @JiriBilek) was to revert the following commit:

Specifically, I edited the file “libraries/ESP8266WiFi/src/ESP8266WiFiGeneric.cpp” and commented out the 2 relevant lines so that they read:

// if (m != WIFI_STA && m != WIFI_AP_STA) // commented out as causing intermittent WDT
// wifi_station_dhcpc_stop(); // commented out as causing intermittent Hardware WDT

And this solved my intermittent Hardware WDT crashes (I have now been running 8 sensors connecting to a remote server every 10 minutes approx for over 4 weeks using the 31May19 github software without a single WDT crash).

I hope this info will be useful to someone.

@JiriBilek
Contributor Author

@Rob58329: I think I am using the legacy core in all my tests, but I'm not sure because I don't have my computer with me right now. I will be back in July.

@d-a-v
Collaborator

d-a-v commented Jun 18, 2019

WiFi.forceSleepBegin() must be accompanied by WiFi.forceSleepWake().

I could reproduce the bug, and it vanished with a call to WiFi.forceSleepWake() just before WiFi.begin().

Sketch (same as above, but without any form of WiFiClient, as @earlephilhower did), with the fix:

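#include <ESP8266WiFi.h>

#ifndef STASSID
#define STASSID "your-ssid"      // placeholder: set your own network name
#define STAPSK  "your-password"  // placeholder: set your own password
#endif

#define SSID STASSID
#define PASSWORD STAPSK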

static uint32_t MILLIS_TO_WAKE_UP = 30 * 1000; // wake up after 30 seconds
uint32_t lastSleep = 0;  // millis of the last time when forced to sleep

void setup() {
  WiFi.setAutoConnect(false);
  Serial.begin(115200);

  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, LOW);  // Start indicator

  // Wait for GPIO0 down as a start condition (we want to stop here after a wdt reset)
  /*    Serial.println(F("\nConnect GPIO0 to GND to start"));
      pinMode(0, INPUT_PULLUP);
      while (digitalRead(0) == HIGH)
          delay(100);
  */
  pinMode(LED_BUILTIN, INPUT);  // Back to default

  Serial.print(ESP.getSdkVersion());

  wifiConnect();
  wifiSend(3);
  wifiDisconnect();
}

void loop() {
  //Serial.print("\x08.");
  static int count = 0;
  if (((++count) % (1024 * 16)) == 0)
  {
    Serial.print("x");
    count = 0;
  }

  if (millis() - lastSleep > MILLIS_TO_WAKE_UP) {
    wifiConnect();
    wifiSend(2);
    wifiDisconnect();
  }
}

void wifiConnect(void) {
  Serial.println("start connecting");
  WiFi.mode(WIFI_STA);
  delay(100);
  WiFi.forceSleepWake();  // <============================================ FIX
  WiFi.begin(SSID, PASSWORD);
  Serial.printf_P(PSTR("Connecting to %s "), SSID);

  while (WiFi.status() == WL_DISCONNECTED) {
    Serial.write('.');
    delay(500);
  }
  Serial.println("not disconnected");

  if (WiFi.status() == WL_CONNECTED) {
    Serial.printf_P(PSTR("WiFi connected (RSSI %d), IP address: %s, "), WiFi.RSSI(), WiFi.localIP().toString().c_str());
    Serial.printf_P(PSTR("mem: %d\r\n"), ESP.getFreeHeap());

    if (time(nullptr) < 100000000)
      readTime();
  }
}

void wifiDisconnect(void) {
  // Disconnecting wifi

  Serial.print(F(", wifi"));
  WiFi.disconnect();
  WiFi.mode(WIFI_OFF);
  delay(100);  // FIXME

  Serial.println(F(", sleeping"));

  //WiFi.forceSleepBegin();  // turn off ESP8266 RF
  wifi_set_opmode_current(WIFI_OFF);
  //WiFi.forceSleepBegin(/*default*/0) equivalent:
  // sleep forever until wifi_fpm_do_wakeup() is called
  wifi_fpm_set_sleep_type(MODEM_SLEEP_T);
  wifi_fpm_open();
  wifi_fpm_do_sleep(0xFFFFFFF);

  delay(100);  // FIXME

  lastSleep = millis();
}

boolean wifiSend(int8_t status) {
  // Check the wifi
  if (WiFi.status() != WL_CONNECTED) {
    Serial.println(F("[WiFi] Not connected to AP"));
    return false;
  }
  return true;
}

void readTime(void) {
  if (WiFi.status() != WL_CONNECTED) {
    return;
  }

  Serial.print(F("Setting time using SNTP "));

  configTime(1 * 3600, 0, "tik.cesnet.cz", "pool.ntp.org");

  // Read time, wait 5 seconds
  uint32_t m = millis();
  time_t now = time(nullptr);
  while (now < 100000000 && millis() - m < 5000) {
    delay(100);
    Serial.write('.');
    now = time(nullptr);
  }
  Serial.println();

  if (now < 100000000) {
    Serial.println(F("Time was not set."));
  }
  else {
    Serial.print(F("Current time: "));
    Serial.println(ctime(&now));
  }
}

@earlephilhower
Collaborator

Sorry, my only experience w/PIO is through the ATOM IDE for building Marlin.

However, you can manually run GDB and get the same info:
From a command line, run GDB: xtensa-*-gdb
At the GDB prompt, run "file /full.path.to/sketch.elf"
Then just run "l *0x40....." (use the stack values, the OOM addresses and the reported PC) and GDB will give you the line of the error.

That's what the ESPExceptionDecoder is doing, anyway. A CLI utility might be handy if there really is no way to debug stack traces with PIO.

As for OOM, it doesn't actually crash the machine when a malloc/new fails, but with debugging enabled it logs the address of the caller. That way, if you later use the pointer without checking the new/malloc return value (it will be null), the resulting crash will be easier to debug.
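If a CLI helper is wanted, a first step could be as small as the sketch below. It is a hypothetical host-side tool, not part of the core or of ESPExceptionDecoder. It scans a pasted crash dump for 8-digit hex words starting with 40 (the range the decoded addresses earlier in this thread fall into, e.g. 0x401001c0 and 0x4020161a; treating that prefix as "code address" is an assumption) and prints one GDB "l *ADDR" command per unique address.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of the ESP8266 core or EspExceptionDecoder).
// Returns one GDB "l *0xADDR" command per unique word in the dump that looks like
// an ESP8266 code address. Assumption: such words are 8 hex digits starting with
// "40", written with or without a "0x" prefix.
std::vector<std::string> gdbListCommands(const std::string& dump) {
  static const std::regex addr(R"(\b(?:0x)?(40[0-9a-fA-F]{6})\b)");
  std::vector<std::string> cmds;
  std::set<std::string> seen;  // report each address once, in first-seen order
  for (std::sregex_iterator it(dump.begin(), dump.end(), addr), end; it != end; ++it) {
    std::string hex = (*it)[1].str();
    // Normalize to lower case so "40100AA4" and "40100aa4" count as the same address.
    std::transform(hex.begin(), hex.end(), hex.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (seen.insert(hex).second) {
      cmds.push_back("l *0x" + hex);
    }
  }
  return cmds;
}
```

The resulting lines can be pasted at the GDB prompt after "file /full.path.to/sketch.elf", as described above. Stack slots that merely hold data beginning with 40 will be listed too, so some lines may be noise.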

@earlephilhower
Collaborator

earlephilhower commented Jul 17, 2019

Example GDB session for one of the failing mallocs:

# xtensa-*gdb   (will need full path to the tools dir where xtensa-gcc and xtensa-gdb are stored)
file "/tmp/arduino_build/sketch.ino.elf"
l *0x4026F139
<gdb will print the offending line of code here>

And, of course, only the xtensa-*gdb distributed with the Arduino code can be used because your native gdb on Linux or Mac will only understand x86 instructions.

@TD-er
Contributor

TD-er commented Jul 17, 2019

The failure to allocate 620 bytes was when running this macro in my code:

typedef std::shared_ptr<ControllerSettingsStruct> ControllerSettingsStruct_ptr_type;
#define MakeControllerSettings(T) ControllerSettingsStruct_ptr_type ControllerSettingsStruct_ptr(new ControllerSettingsStruct());\
                                    ControllerSettingsStruct& T = *ControllerSettingsStruct_ptr;

This is called from:

        MakeControllerSettings(ControllerSettings);
        LoadControllerSettings(event->ControllerIndex, ControllerSettings);

That's something that may be happening now, with the core debug strings active.
At reboot it does then show a "Software Watchdog" reboot.
Good to know these can be an issue, but not really the problem here I guess.
I will at least add some check in this macro to see if the pointer is valid.
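One way to add that check, sketched so it also compiles on a PC: allocate with new (std::nothrow), which returns nullptr instead of throwing, and test the shared_ptr before dereferencing it. ControllerSettingsStruct here is a hypothetical 620-byte placeholder (the real ESPEasy struct has its own fields), and makeControllerSettings() and loadSettingsChecked() are illustrative names, not existing ESPEasy functions.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Hypothetical placeholder; the real ESPEasy struct has its own fields.
struct ControllerSettingsStruct {
  char data[620] = {};  // roughly the size of the failed allocation reported above
};

typedef std::shared_ptr<ControllerSettingsStruct> ControllerSettingsStruct_ptr_type;

// std::nothrow makes new return nullptr on failure, so the result can be checked
// instead of being dereferenced blindly as in the macro.
ControllerSettingsStruct_ptr_type makeControllerSettings() {
  return ControllerSettingsStruct_ptr_type(new (std::nothrow) ControllerSettingsStruct());
}

// Example caller (hypothetical): only touch the settings if the allocation succeeded.
bool loadSettingsChecked() {
  ControllerSettingsStruct_ptr_type ptr = makeControllerSettings();
  if (!ptr) {
    return false;  // out of memory: skip this round instead of crashing
  }
  ControllerSettingsStruct& settings = *ptr;
  settings.data[0] = 1;  // stand-in for LoadControllerSettings(...)
  return true;
}
```

A plain function replaces the macro so the null check has a natural place to live. Note that the shared_ptr control block is a separate small heap allocation, so on a heap this tight it can fail as well; std::make_shared would merge the two allocations, but it reports failure by throwing std::bad_alloc rather than returning null.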

@TD-er
Contributor

TD-er commented Jul 18, 2019

I really don't get it.
Yesterday I was sure I had it working: in every build I made, the node connected to WiFi on the first attempt.
Now I am using almost the same code, including the 1000 msec wait after WIFI_OFF, but with the debug stuff removed.
And now it fails to connect even the first time (waiting for WiFi connect => WDT reboot).

This is really frustrating.

The only things changed are:

  • Removed core & OOM debug
  • Changed a bit of totally unrelated code (moved the scope of a variable unrelated to the WiFi code)

@d-a-v
Collaborator

d-a-v commented Jul 19, 2019

What if you enable core & OOM debug back again ? (Heisenberg effect)

@JiriBilek
Contributor Author

I was curious whether OOM debug would show anything in my setup. You may be chasing a different bug: I ran my node for one day and no OOM messages appeared. There were WDT resets as usual, though.

@TD-er
Contributor

TD-er commented Jul 19, 2019

The OOM was happening on my system with the core debug enabled.
And that's indeed another issue, totally unrelated to what we're discussing here.
But I had to mention it, since OOM stuff may clutter the log reports.

@d-a-v I did enable CORE and OOM debug last night and was still not able to connect, so I was tracking down some of the other changes to get something which makes it reproducible. I stopped at 2am for obvious reasons :)
So the only changes now present are some that just change the scope of variables in loops totally unrelated to the WiFi code, and the removal of a String allocation that appeared not to be used. (I was running Cppcheck on my code and followed its suggestions.)
One of them may actually save some time when not yet performed, so indirectly may have an effect on WiFi connectivity. (reading settings from SPIFFS)

Just to be sure I am not missing anything else, I do a clean build for every attempt, so it may take a while to track it all down, even with a binary search over the checked-out files.

@d-a-v
Collaborator

d-a-v commented Jul 19, 2019

The workarhack I was using for the failing connection was to call WiFi.mode(WIFI_OFF) after a timeout, then delay(1000) and retry the connection. That is, until we can find a way to detect when a connection attempt is in a bad state.

@TD-er
Contributor

TD-er commented Jul 19, 2019

I do have the 1000 msec delay, but what do you mean by "after a timeout" ?

@d-a-v
Collaborator

d-a-v commented Jul 19, 2019

    if (!started)
        WiFi.mode(WIFI_STA)
        WiFi.begin; start=millis; started=true
    if (started and !connected and (millis-start>timeout))
        WiFi.mode(WIFI_OFF)
        delay(1000)
        started=false

(finished editing # 1) (that's not python :)
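A host-side C++ sketch of the same timeout-and-retry loop, written as a small state machine so it can be called on every loop() pass without blocking. The hooks startConnect, isConnected and radioOff are hypothetical stand-ins for WiFi.mode(WIFI_STA) plus WiFi.begin(...), WiFi.status() == WL_CONNECTED, and WiFi.mode(WIFI_OFF). The 1000 ms pause comes from the pseudocode above; tracking it with timestamps instead of delay(1000), and re-entering the loop when a link drops later, are my own design choices, not part of the original.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical non-blocking version of the workaround: start connecting; if not
// connected after timeoutMs, turn the radio off, pause offPauseMs, then start over.
class WifiRetry {
 public:
  struct Hooks {
    std::function<void()> startConnect;  // e.g. WiFi.mode(WIFI_STA); WiFi.begin(ssid, pass);
    std::function<bool()> isConnected;   // e.g. WiFi.status() == WL_CONNECTED
    std::function<void()> radioOff;      // e.g. WiFi.mode(WIFI_OFF);
  };

  WifiRetry(Hooks hooks, uint32_t timeoutMs, uint32_t offPauseMs = 1000)
      : h_(std::move(hooks)), timeoutMs_(timeoutMs), offPauseMs_(offPauseMs) {}

  // Call with the current millis(); returns true while connected.
  bool poll(uint32_t now) {
    switch (state_) {
      case State::Idle:  // "if (!started)": begin a new attempt
        h_.startConnect();
        since_ = now;
        state_ = State::Connecting;
        return false;
      case State::Connecting:
        if (h_.isConnected()) {
          state_ = State::Connected;
          return true;
        }
        // Unsigned subtraction stays correct across the 32-bit millis() rollover.
        if (now - since_ > timeoutMs_) {
          h_.radioOff();  // WIFI_OFF after the timeout
          since_ = now;
          ++retries_;
          state_ = State::Paused;
        }
        return false;
      case State::Paused:  // replaces the blocking delay(1000)
        if (now - since_ >= offPauseMs_) state_ = State::Idle;
        return false;
      case State::Connected:
        if (h_.isConnected()) return true;
        state_ = State::Idle;  // link lost: start a new attempt on the next poll
        return false;
    }
    return false;
  }

  unsigned retries() const { return retries_; }

 private:
  enum class State { Idle, Connecting, Paused, Connected };
  Hooks h_;
  uint32_t timeoutMs_;
  uint32_t offPauseMs_;
  uint32_t since_ = 0;
  unsigned retries_ = 0;
  State state_ = State::Idle;
};
```

On the device, loop() would call retry.poll(millis()) and only send data once it returns true.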

@TD-er
Contributor

TD-er commented Jul 19, 2019

I removed all code related to WIFI_OFF, and then it was able to connect with a very small custom build (only including a few plugins in ESPEasy), but the same code in a "normal build" (just more plugins) cannot connect to WiFi anymore. (The fewest reboots before it finally succeeded was 55.)
So I think things are now way too time critical to be useful.

Tomorrow I will strip all fancy WiFi related code and just use something related to WiFiMulti class (and will make a pull request to allow working with hidden SSIDs and allow to do wifi off between reconnects)
I am not sure what's going on here, but this just isn't usable anymore with it being Russian Roulette between builds whether wifi will connect.

@TD-er
Contributor

TD-er commented Jul 21, 2019

I also mentioned it here: platformio/platform-espressif8266#166 (comment)
But I guess this may be the more appropriate place to ask...

Just to be sure, since it does often result in WiFi connect issues.

  • Variables used in (wifi) event callback functions, do they need to be declared volatile? (tested with it and does not seem to make any difference)
  • Callback functions for WiFi events, do they need IRAM attributes? (not tested yet) And do all functions called from IRAM attr marked function also need to be marked as such?
  • Are there functions that should not be called from callback functions? (e.g. millis())

@devyte
Collaborator

devyte commented Jul 21, 2019

Variables used in (wifi) event callback functions, do they need to be declared volatile

Should not be needed

Callback functions for WiFi events, do they need IRAM attributes

No, they execute in SYS

do all functions called from IRAM attr marked function also need to be marked as such

Yes, the entire call tree needs to be in iram, which is why ISRs should be kept simple and isolated

Are there functions that should not be called from callback functions

Depends on the callback. Functions that execute in CONT are the most relaxed. Functions that execute in SYS, such as Ticker and the wifi events, can't call delay, yield, blocking functions, etc. and are subject to stricter timing requirements vs. CONT.
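The practical consequence of these rules is usually to keep SYS-context callbacks tiny: record what happened, and do the real work (Serial output, delay(), network calls such as readTime()) later from loop(), which runs in CONT. Below is a host-side sketch of that pattern. onGotIp(), onDisconnected() and serviceWifiEvents() are hypothetical names; on the ESP8266 the recorder would be registered with something like WiFi.onStationModeGotIP(), whose returned WiFiEventHandler must be kept alive (e.g. in a global) for the callback to stay registered. The timestamp is passed in as a parameter only so the sketch runs off-target.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flags written by SYS-context callbacks and consumed in loop() (CONT).
struct WifiEventFlags {
  bool gotIp = false;
  bool disconnected = false;
  uint32_t gotIpAt = 0;  // timestamp recorded by the callback
};

WifiEventFlags g_events;

// Stand-in for a WiFi event callback: no delay(), no yield(), no printing; just record.
void onGotIp(uint32_t nowMs) {
  g_events.gotIp = true;
  g_events.gotIpAt = nowMs;
}

void onDisconnected() { g_events.disconnected = true; }

// Called from loop(): performs the slow follow-up work outside the callback.
// The output vector stands in for Serial output or for calls such as readTime().
void serviceWifiEvents(std::vector<std::string>& out) {
  if (g_events.gotIp) {
    g_events.gotIp = false;  // consume the event so it is handled once
    out.push_back("got IP at " + std::to_string(g_events.gotIpAt) + " ms");
  }
  if (g_events.disconnected) {
    g_events.disconnected = false;
    out.push_back("disconnected");
  }
}
```

Plain bool flags are used here, in line with the answer above that volatile should not be needed for WiFi event callbacks.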

@TD-er
Contributor

TD-er commented Jul 21, 2019

Check.
So if millis() can be called from sys, then we're fine.
I was just wondering, but since the examples were also not showing the volatile attributes, I also had not used them.
Since it already took again lots of hours debugging today I just want to make sure not to waste more on just things not right in the examples.

@d-a-v
Collaborator

d-a-v commented Jul 29, 2019

Here's an attempt to work on a common basis #6356

adamm added a commit to adamm/esp8266-clockradio that referenced this issue Sep 4, 2019
@Rob58329

Just to confirm that the @d-a-v commit of 5Sept19 “Experimental: add new WiFi (pseudo) modes: WIFI_SHUTDOWN & WIFI_RESUME #6356” fixes my issue of “intermittent Hardware WDT crashes after "WiFi.mode(WIFI_OFF)". (I have now been running with the github version of 6Sep19 for nearly 3 weeks without a crash!)

Many thanks!

@d-a-v
Collaborator

d-a-v commented Sep 25, 2019

@Rob58329 Your instability was probably fixed by #6484, but thanks for testing it anyway.

@d-a-v
Collaborator

d-a-v commented Oct 1, 2019

Is this issue still relevant after #6484?
An installable snapshot including #6484 is available on d-a-v.github.io.
Closing; please create a new issue if needed.

@d-a-v d-a-v closed this as completed Oct 1, 2019
@JiriBilek
Contributor Author

Haven't been here for a while, sorry.
I checked out the git version of the library, and my device has now been running fine for 2 days. It seems to me the issue is fixed.
Thanks.
