SDK appears to randomly allocate >3K stack [solved, high TX pwr + cheap/slow flash chip = random crash] #6366

Closed
ChocolateFrogsNuts opened this issue Aug 1, 2019 · 109 comments

Comments

@ChocolateFrogsNuts
Contributor

I'm pretty sure this is an SDK problem, but I'm posting here in case I'm wrong or it's caused by something in the Arduino library, since I discovered and tested it using the Arduino IDE. I have also been looking on esp8266.com for similar issues.

Basic Infos

  • [x] This issue complies with the issue POLICY doc.
  • [x] I have read the documentation at readthedocs and the issue is not addressed there.
  • [x] I have tested that the issue is present in current master branch (aka latest git).
  • [x] I have searched the issue tracker for a similar issue.
  • [x] If there is a stack dump, I have decoded it.
  • [x] I have filled out all fields below.

Platform

  • Hardware: WEMOS D1 mini Pro v1
  • Core Version: latest git
  • Development Env: Arduino IDE
  • Operating System: Slackware Linux

Settings in IDE

  • Module: [Wemos D1 mini Pro]
  • Flash Mode: not sure - flashing from Arduino IDE over USB
  • Flash Size: 16MB
  • lwip Variant: v2 lower memory
  • Reset Method: [ck|nodemcu]
  • Flash Frequency: [40MHz]
  • CPU Frequency: [80MHz]
  • Upload Using: [SERIAL/USB]
  • Upload Speed: [921600] (serial upload only)

Problem Description

Exception 0 with a very long stack dump; sometimes more than 5500 bytes of stack are in use.
The dump contains a large area of stack (>3 KB) that has been allocated but never used (still set to feefeffe).

I tested with version 2.4.0 right up to git (1-Aug-2019) with the same results.
I also tried several D1 mini Pro boards with the same result.

It can be reliably reproduced by flashing the BareMinimum example once the WiFi has been configured to connect to an available network with the default sleep mode (MODEM_SLEEP_T).
With debugging on, it crashes within seconds of "pm open,type:2".

I decoded many stack dumps. As far as I can tell, the "randomly" allocated stack is being allocated in the SDK code, probably in a network interrupt handler: the trace generally leaves the Arduino library in esp_yield_within_cont() (which calls run_scheduled_recurrent_functions()) before the big chunk of stack is allocated, and re-enters the Arduino library at ethernet_input() after the big chunk is allocated.
Several stack frames don't get decoded, presumably because they belong to the SDK.

The sketch below contains some additional code I have been using to try and diagnose the issue, but as mentioned above, the BareMinimum example will trigger the problem reliably once the wifi is configured. #includes are as per Sketch->Include->ESP8266WIFI.

It may be worth noting that only including ESP8266WiFi.h and no other headers in this sketch results in a crash much sooner for me.

MCVE Sketch

#include <WiFiServerSecure.h>
#include <ESP8266WiFiType.h>
#include <WiFiClient.h>
#include <BearSSLHelpers.h>
#include <ESP8266WiFiGeneric.h>
#include <ESP8266WiFiScan.h>
#include <ESP8266WiFiSTA.h>
#include <WiFiClientSecureBearSSL.h>
#include <ESP8266WiFiAP.h>
#include <CertStoreBearSSL.h>
#include <WiFiUdp.h>
#include <ESP8266WiFi.h>
#include <WiFiServer.h>
#include <WiFiServerSecureAxTLS.h>
#include <WiFiClientSecureAxTLS.h>
#include <WiFiClientSecure.h>
#include <ESP8266WiFiMulti.h>
#include <WiFiServerSecureBearSSL.h>

// 0=WiFi off, doesn't crash at all
// 1=WiFi on, crashes every time
#define ENABLE_WIFI 1

extern "C"{
#include "user_interface.h"
}

unsigned int last;
int led;

void setup() {
#if ENABLE_WIFI
  //wifi_set_sleep_type(NONE_SLEEP_T); // takes longer but still crashes
  wifi_set_sleep_type(MODEM_SLEEP_T); // this is the default and crashes well
  WiFi.mode(WIFI_STA);
  WiFi.begin("apname","password");
#else
  WiFi.disconnect();
#endif

  pinMode(LED_BUILTIN, OUTPUT);
  Serial.begin(115200);
  Serial.println("\nHello");
  last=millis();
  led=0;
}


void loop() {
  // put your main code here, to run repeatedly:
  delay(10);
  
  if (millis() - last > 1000) { // subtraction stays correct when millis() wraps around
     Serial.write(".");
     digitalWrite(LED_BUILTIN, led & 1 ? HIGH : LOW);
     led++;
     if (led>=40) {
        Serial.write("\n");
        led=0;
     }
     last = millis();
     wdt_reset();
     stack_test();
  }

}

void stack_test() {
  
 #define STACK_END (unsigned int *)0x3FFFFFB0
 #define STACK_START (STACK_END - (4080/4))
 
 // Test stack for large areas of uninitialized data
 unsigned int *p=STACK_START;
 unsigned int *cleared_start=NULL;
 int max_cleared=-1;
 static int last_max_cleared=-1;
 
 while (p<STACK_END) {
   if (*p==0xFEEFEFFE) {
      if ((cleared_start==NULL) && (max_cleared>=0)) {
         cleared_start=p;
      }
   } else {
      if (max_cleared<0) {
         max_cleared=0; // first modified word
         if (p<(STACK_START+50)) {
            Serial.printf("WARNING: first modified word @ %p (within 200 bytes of stack limit)\n", p);
         }
      }
      if (cleared_start!=NULL) {
         int len = p-cleared_start;
         if (len>4) {
            Serial.printf("UNUSED STACK Block %d bytes @ %p\n", len*4, cleared_start);
            if (len>max_cleared) max_cleared=len;
         }
         cleared_start=NULL;
      }
   }
   p++;
 }
 
 if ((max_cleared>8) && (max_cleared != last_max_cleared)) {
    Serial.print("Largest mis-allocated stack block:");
    Serial.print(max_cleared);
    Serial.print(" words (");
    Serial.print(max_cleared*4);
    Serial.println(" bytes)");
    last_max_cleared = max_cleared;
 }
}
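
The scanning idea in stack_test() can also be tried off-device. What follows is only a minimal host-side sketch of that idea, not the code that ran on the board: `Run`, `kFill` and `largest_untouched_gap` are names made up for this illustration, and the "stack" is a plain vector ordered from lowest to highest address. As in the sketch above, the leading fill region (genuinely unused stack) is skipped, and a run of fill words only counts when a modified word follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill value that stack_test() above treats as "never written".
constexpr uint32_t kFill = 0xFEEFEFFE;

// Hypothetical result type: where a gap starts and how long it is.
struct Run {
    std::size_t start;  // index of the first fill word in the gap
    std::size_t words;  // gap length in 32-bit words
};

// Hypothetical helper: longest run of kFill words that lies between two
// modified words. `mem` is ordered from lowest to highest address.
Run largest_untouched_gap(const std::vector<uint32_t>& mem) {
    Run best{0, 0};
    std::size_t i = 0;

    // Skip the leading fill region: stack that was never reached at all.
    while (i < mem.size() && mem[i] == kFill) ++i;

    while (i < mem.size()) {
        if (mem[i] != kFill) { ++i; continue; }

        const std::size_t start = i;
        while (i < mem.size() && mem[i] == kFill) ++i;

        // Count the run only if a modified word follows it.
        if (i < mem.size() && i - start > best.words) {
            best = {start, i - start};
        }
    }

    assert(best.start + best.words <= mem.size());
    return best;
}
```

Multiplying `words` by 4 gives the gap size in bytes, which is the figure the sketch prints as "UNUSED STACK Block".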

Debug Messages

SDK:2.2.2-dev(38a443e)/Core:2.5.2-98-gd6973cd6=20502098/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af

Hello
.wifi evt: 2
.scandone
state: 0 -> 2 (b0)
.state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt 
.
connected with apname, channel 9
dhcp client start...
wifi evt: 0
.ip:192.168.10.60,mask:255.255.255.0,gw:192.168.10.1
wifi evt: 3
.......pm open,type:2 0
...Fatal exception 0(IllegalInstructionCause):
epc1=0x4022ce74, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Exception (0):
epc1=0x4022ce74 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3fffedb0 end: 3fffffb0 offset: 01a0
3fffef50:  40000f49 3fffdab0 3fffdab0 40000f49  
3fffef60:  40000e19 40001878 00000002 00000000  
3fffef70:  3fffff10 aa55aa55 0000000a 40104af1  
3fffef80:  40104af7 00000002 00000000 52ffe941  
3fffef90:  4010000d c11211c8 03542210 6800f00d  
3fffefa0:  40100dfc 3fffef3c 40100da9 3fffff68  
3fffefb0:  3fffffc0 00000000 00000000 feefeffe  
3fffefc0:  feefeffe feefeffe feefeffe feefeffe  
3fffefd0:  feefeffe feefeffe feefeffe feefeffe  
3fffefe0:  feefeffe feefeffe feefeffe feefeffe  
3fffeff0:  feefeffe feefeffe feefeffe feefeffe  
3ffff000:  feefeffe feefeffe feefeffe feefeffe  
3ffff010:  feefeffe feefeffe feefeffe feefeffe  
3ffff020:  feefeffe feefeffe feefeffe feefeffe  
3ffff030:  feefeffe feefeffe feefeffe feefeffe  
3ffff040:  feefeffe feefeffe feefeffe feefeffe  
3ffff050:  feefeffe feefeffe feefeffe feefeffe  
3ffff060:  feefeffe feefeffe feefeffe feefeffe  
3ffff070:  feefeffe feefeffe feefeffe feefeffe  
3ffff080:  feefeffe feefeffe feefeffe feefeffe  
3ffff090:  feefeffe feefeffe feefeffe feefeffe  
3ffff0a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff100:  feefeffe feefeffe feefeffe feefeffe  
3ffff110:  feefeffe feefeffe feefeffe feefeffe  
3ffff120:  feefeffe feefeffe feefeffe feefeffe  
3ffff130:  feefeffe feefeffe feefeffe feefeffe  
3ffff140:  feefeffe feefeffe feefeffe feefeffe  
3ffff150:  feefeffe feefeffe feefeffe feefeffe  
3ffff160:  feefeffe feefeffe feefeffe feefeffe  
3ffff170:  feefeffe feefeffe feefeffe feefeffe  
3ffff180:  feefeffe feefeffe feefeffe feefeffe  
3ffff190:  feefeffe feefeffe feefeffe feefeffe  
3ffff1a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff200:  feefeffe feefeffe feefeffe feefeffe  
3ffff210:  feefeffe feefeffe feefeffe feefeffe  
3ffff220:  feefeffe feefeffe feefeffe feefeffe  
3ffff230:  feefeffe feefeffe feefeffe feefeffe  
3ffff240:  feefeffe feefeffe feefeffe feefeffe  
3ffff250:  feefeffe feefeffe feefeffe feefeffe  
3ffff260:  feefeffe feefeffe feefeffe feefeffe  
3ffff270:  feefeffe feefeffe feefeffe feefeffe  
3ffff280:  feefeffe feefeffe feefeffe feefeffe  
3ffff290:  feefeffe feefeffe feefeffe feefeffe  
3ffff2a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff300:  feefeffe feefeffe feefeffe feefeffe  
3ffff310:  feefeffe feefeffe feefeffe feefeffe  
3ffff320:  feefeffe feefeffe feefeffe feefeffe  
3ffff330:  feefeffe feefeffe feefeffe feefeffe  
3ffff340:  feefeffe feefeffe feefeffe feefeffe  
3ffff350:  feefeffe feefeffe feefeffe feefeffe  
3ffff360:  feefeffe feefeffe feefeffe feefeffe  
3ffff370:  feefeffe feefeffe feefeffe feefeffe  
3ffff380:  feefeffe feefeffe feefeffe feefeffe  
3ffff390:  feefeffe feefeffe feefeffe feefeffe  
3ffff3a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff400:  feefeffe feefeffe feefeffe feefeffe  
3ffff410:  feefeffe feefeffe feefeffe feefeffe  
3ffff420:  feefeffe feefeffe feefeffe feefeffe  
3ffff430:  feefeffe feefeffe feefeffe feefeffe  
3ffff440:  feefeffe feefeffe feefeffe feefeffe  
3ffff450:  feefeffe feefeffe feefeffe feefeffe  
3ffff460:  feefeffe feefeffe feefeffe feefeffe  
3ffff470:  feefeffe feefeffe feefeffe feefeffe  
3ffff480:  feefeffe feefeffe feefeffe feefeffe  
3ffff490:  feefeffe feefeffe feefeffe feefeffe  
3ffff4a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff500:  feefeffe feefeffe feefeffe feefeffe  
3ffff510:  feefeffe feefeffe feefeffe feefeffe  
3ffff520:  feefeffe feefeffe feefeffe feefeffe  
3ffff530:  feefeffe feefeffe feefeffe feefeffe  
3ffff540:  feefeffe feefeffe feefeffe feefeffe  
3ffff550:  feefeffe feefeffe feefeffe feefeffe  
3ffff560:  feefeffe feefeffe feefeffe feefeffe  
3ffff570:  feefeffe feefeffe feefeffe feefeffe  
3ffff580:  feefeffe feefeffe feefeffe feefeffe  
3ffff590:  feefeffe feefeffe feefeffe feefeffe  
3ffff5a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff600:  feefeffe feefeffe feefeffe feefeffe  
3ffff610:  feefeffe feefeffe feefeffe feefeffe  
3ffff620:  feefeffe feefeffe feefeffe feefeffe  
3ffff630:  feefeffe feefeffe feefeffe feefeffe  
3ffff640:  feefeffe feefeffe feefeffe feefeffe  
3ffff650:  feefeffe feefeffe feefeffe feefeffe  
3ffff660:  feefeffe feefeffe feefeffe feefeffe  
3ffff670:  feefeffe feefeffe feefeffe feefeffe  
3ffff680:  feefeffe feefeffe feefeffe feefeffe  
3ffff690:  feefeffe feefeffe feefeffe feefeffe  
3ffff6a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff700:  feefeffe feefeffe feefeffe feefeffe  
3ffff710:  feefeffe feefeffe feefeffe feefeffe  
3ffff720:  feefeffe feefeffe feefeffe feefeffe  
3ffff730:  feefeffe feefeffe feefeffe feefeffe  
3ffff740:  feefeffe feefeffe feefeffe feefeffe  
3ffff750:  feefeffe feefeffe feefeffe feefeffe  
3ffff760:  feefeffe feefeffe feefeffe feefeffe  
3ffff770:  feefeffe feefeffe feefeffe feefeffe  
3ffff780:  feefeffe feefeffe feefeffe feefeffe  
3ffff790:  feefeffe feefeffe feefeffe feefeffe  
3ffff7a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff800:  feefeffe feefeffe feefeffe feefeffe  
3ffff810:  feefeffe feefeffe feefeffe feefeffe  
3ffff820:  feefeffe feefeffe feefeffe feefeffe  
3ffff830:  feefeffe feefeffe feefeffe feefeffe  
3ffff840:  feefeffe feefeffe feefeffe feefeffe  
3ffff850:  feefeffe feefeffe feefeffe feefeffe  
3ffff860:  feefeffe feefeffe feefeffe feefeffe  
3ffff870:  feefeffe feefeffe feefeffe feefeffe  
3ffff880:  feefeffe feefeffe feefeffe feefeffe  
3ffff890:  feefeffe feefeffe feefeffe feefeffe  
3ffff8a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff900:  feefeffe feefeffe feefeffe feefeffe  
3ffff910:  feefeffe feefeffe feefeffe feefeffe  
3ffff920:  feefeffe feefeffe feefeffe feefeffe  
3ffff930:  feefeffe feefeffe feefeffe feefeffe  
3ffff940:  feefeffe feefeffe feefeffe feefeffe  
3ffff950:  feefeffe feefeffe feefeffe feefeffe  
3ffff960:  feefeffe feefeffe feefeffe feefeffe  
3ffff970:  feefeffe feefeffe feefeffe feefeffe  
3ffff980:  feefeffe feefeffe feefeffe feefeffe  
3ffff990:  feefeffe feefeffe feefeffe feefeffe  
3ffff9a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9f0:  feefeffe feefeffe feefeffe feefeffe  
3ffffa00:  feefeffe feefeffe feefeffe feefeffe  
3ffffa10:  feefeffe feefeffe feefeffe feefeffe  
3ffffa20:  feefeffe feefeffe feefeffe feefeffe  
3ffffa30:  feefeffe feefeffe feefeffe feefeffe  
3ffffa40:  feefeffe feefeffe feefeffe feefeffe  
3ffffa50:  feefeffe feefeffe feefeffe feefeffe  
3ffffa60:  feefeffe feefeffe feefeffe feefeffe  
3ffffa70:  feefeffe feefeffe feefeffe feefeffe  
3ffffa80:  feefeffe feefeffe feefeffe feefeffe  
3ffffa90:  feefeffe feefeffe feefeffe feefeffe  
3ffffaa0:  feefeffe feefeffe feefeffe feefeffe  
3ffffab0:  feefeffe feefeffe feefeffe feefeffe  
3ffffac0:  feefeffe feefeffe feefeffe feefeffe  
3ffffad0:  feefeffe feefeffe feefeffe feefeffe  
3ffffae0:  feefeffe feefeffe feefeffe feefeffe  
3ffffaf0:  feefeffe feefeffe feefeffe feefeffe  
3ffffb00:  feefeffe feefeffe feefeffe feefeffe  
3ffffb10:  feefeffe feefeffe feefeffe feefeffe  
3ffffb20:  feefeffe feefeffe feefeffe feefeffe  
3ffffb30:  feefeffe feefeffe feefeffe feefeffe  
3ffffb40:  feefeffe feefeffe feefeffe feefeffe  
3ffffb50:  feefeffe feefeffe feefeffe feefeffe  
3ffffb60:  feefeffe feefeffe feefeffe feefeffe  
3ffffb70:  feefeffe feefeffe feefeffe feefeffe  
3ffffb80:  feefeffe feefeffe feefeffe feefeffe  
3ffffb90:  feefeffe feefeffe feefeffe feefeffe  
3ffffba0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbb0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbc0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbd0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbe0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbf0:  feefeffe feefeffe feefeffe feefeffe  
3ffffc00:  feefeffe feefeffe feefeffe feefeffe  
3ffffc10:  feefeffe feefeffe feefeffe feefeffe  
3ffffc20:  feefeffe feefeffe feefeffe feefeffe  
3ffffc30:  feefeffe feefeffe feefeffe feefeffe  
3ffffc40:  feefeffe feefeffe feefeffe feefeffe  
3ffffc50:  00000000 00000000 0000001f 40105295  
3ffffc60:  4000050c feefeffe feefeffe feefeffe  
3ffffc70:  400043e6 00000030 00000016 ffffffff  
3ffffc80:  400044ab 3fffc718 3ffffd70 08000000  
3ffffc90:  00000002 00000000 0000000a 00000000  
3ffffca0:  00000002 00000000 0000000a 00000000  
3ffffcb0:  4023b1c0 0000049c 003fd000 00000030  
3ffffcc0:  00000000 a0000000 00000000 0000001c  
3ffffcd0:  00002000 feefeffe 00002000 00000000  
3ffffce0:  3ffffe40 00000000 3ffffe40 4020a15e  
3ffffcf0:  0000a000 3ffffde3 3ffee630 00000000  
3ffffd00:  00000000 40203084 40205e6d 00000008  
3ffffd10:  3ffffe40 00000008 3ffffe40 4020a15e  
3ffffd20:  3ffffda0 3ffffddb 3ffffd50 00000000  
3ffffd30:  fffffffe 00000000 40101857 4020a098  
3ffffd40:  3ffffe40 3ffffddb 3ffffda0 40205f6c  
3ffffd50:  00000008 40226077 3ffef174 3ffecf1c  
3ffffd60:  3ffe8304 00000000 0000000a 4023b3a0  
3ffffd70:  3ffffde3 00000002 00000000 401006a8  
3ffffd80:  3ffef22e 3ffeece4 00000008 3ffe8704  
3ffffd90:  00000000 3ffe8703 3ffffe40 4020a56f  
3ffffda0:  00000000 ffffffff ffffffff 00000000  
3ffffdb0:  00000008 00000008 3f302064 00000000  
3ffffdc0:  00000005 00000000 00000020 40101712  
3ffffdd0:  3ffe8b45 401049eb 3ffec5d0 32303530  
3ffffde0:  401022b7 3ffec5d0 3ffed614 aa55aa55  
3ffffdf0:  fffffff4 00e449b6 3ffed000 40102494  
3ffffe00:  3ffe93e4 00000000 00000000 3ffe8304  
3ffffe10:  fffffff4 00e449b6 4010295a 00000100  
3ffffe20:  3ffe93e4 7fffffff 00000000 00000001  
3ffffe30:  00000001 00000080 00000000 00040000  
3ffffe40:  3ffe93e4 401038cc 00040000 00e449b6  
3ffffe50:  3ffe93f0 2c9f0300 4000050c 3fffc278  
3ffffe60:  4010267c 3fffc200 00000022 40203174  
3ffffe70:  4000e23d 00000030 00000014 ffffffff  
3ffffe80:  40105386 00000018 0001f400 00000007  
3ffffe90:  0000000a 0000cf08 000035af fffffffe  
3ffffea0:  ffffffff 3fffc6fc 00000001 0000000a  
3ffffeb0:  3ffee5f8 00000000 3ffede50 00000030  
3ffffec0:  ffffffff 3fffc6fc 00000036 00145ea0  
3ffffed0:  00000000 3fffdad0 3ffee598 00000030  
3ffffee0:  00000000 3fffdad0 3ffee598 00000030  
3ffffef0:  3ffe86ef 00000000 3ffe86ee 4020351a  
3fffff00:  3fffdad0 3fffff3c 3fffff48 3ffee598  
3fffff10:  3fffdad0 00000075 3ffee47c 402016fc  
3fffff20:  3ffe86ef 00000000 3ffe86ee 4020351a  
3fffff30:  40105155 004ff857 3ffe864a 3ffee598  
3fffff40:  40105202 3ffeceac 004ff857 00000000  
3fffff50:  401053db 005010f6 3ffee5f8 00000000  
3fffff60:  3ffede50 3ffee5f8 3ffe850c 3ffee5f8  
3fffff70:  3fffdad0 3ffee598 4020253b 3fffefa0  
3fffff80:  3ffee5f8 3fffdad0 0000000a 40202b3b  
3fffff90:  00000000 00000000 3ffee568 402011c0  
3fffffa0:  3fffdad0 00000000 3ffee568 40202688  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
vd6973cd6
~ld
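
The `sp:` / `end:` / `offset:` header of a dump can be turned into numbers with a few lines of host-side code. This is a rough sketch under an assumption inferred only from the dumps in this thread: the first printed row starts at `sp + offset`. The names `DumpHeader`, `parse_header`, `first_dumped_address` and `dumped_bytes` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical container for the values printed in the dump header.
struct DumpHeader {
    uint32_t sp;      // stack pointer reported in the header
    uint32_t end;     // end (highest address) of the stack region
    uint32_t offset;  // bytes between sp and the first dumped word (assumed)
};

// Parse a line such as "sp: 3fffedb0 end: 3fffffb0 offset: 01a0".
// Returns false if the line does not have that shape.
bool parse_header(const char* line, DumpHeader& h) {
    unsigned sp = 0, end = 0, off = 0;
    if (std::sscanf(line, "sp: %x end: %x offset: %x", &sp, &end, &off) != 3) {
        return false;
    }
    h = DumpHeader{sp, end, off};
    return true;
}

// Address of the first row in the dump, assuming it starts at sp + offset.
uint32_t first_dumped_address(const DumpHeader& h) {
    return h.sp + h.offset;
}

// Number of bytes covered by the dump (first dumped address up to end).
uint32_t dumped_bytes(const DumpHeader& h) {
    assert(h.end >= h.sp + h.offset);
    return h.end - (h.sp + h.offset);
}
```

For the header `sp: 3fffedb0 end: 3fffffb0 offset: 01a0`, this gives a first dumped address of 0x3fffef50, which matches the first row of the dump, and 4192 dumped bytes.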
@d-a-v
Collaborator

d-a-v commented Aug 1, 2019

I see no issue on my side

11:04:41.957 -> SDK:2.2.2-dev(38a443e)/Core:2.5.2-115-gc298f001=20502115/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af
11:04:41.957 -> 
11:04:41.957 -> Hello
11:04:41.957 -> 
11:04:41.957 -> sleep enable,type: 2
11:04:41.957 -> scandone
11:04:41.957 -> wifi evt: 2
11:04:42.090 -> scandone
11:04:42.090 -> state: 0 -> 2 (b0)
11:04:42.090 -> state: 2 -> 3 (0)
11:04:42.090 -> state: 3 -> 5 (10)
11:04:42.090 -> add 0
11:04:42.090 -> aid 1
11:04:42.090 -> cnt 
11:04:42.123 -> 
11:04:42.123 -> connected with xxxxxx, channel 1
11:04:42.123 -> dhcp client start...
11:04:42.123 -> wifi evt: 0
11:04:42.156 -> ip:10.0.1.166,mask:255.255.255.0,gw:10.0.1.254
11:04:42.156 -> wifi evt: 3
11:04:42.951 -> ..........pm open,type:2 0
11:04:52.993 -> ..............................
11:05:23.049 -> ........................................
11:06:03.175 -> ........................................
11:06:43.298 -> ........................................
11:07:23.420 -> ........................................
11:08:03.509 -> ...

Can you please update to master, run git submodule update --init, re-run ./get.py in tools, and check the power supply?
You can also:

  • provide the decoded stack
  • try to use legacy stack (with disable_extra4k_at_link_time(); anywhere in your code)

@ChocolateFrogsNuts
Contributor Author

Ran the commands requested:

mnix@Mike-Laptop:~/Arduino/hardware/esp8266com/esp8266$ cd tools
mnix@Mike-Laptop:~/Arduino/hardware/esp8266com/esp8266/tools$ ./get.py
Platform: x86_64-pc-linux-gnu
Tool python-placeholder.tar.gz already downloaded
Extracting dist/python-placeholder.tar.gz
Tool x86_64-linux-gnu.xtensa-lx106-elf-b40a506.1563313032.tar.gz already downloaded
Extracting dist/x86_64-linux-gnu.xtensa-lx106-elf-b40a506.1563313032.tar.gz
Tool x86_64-linux-gnu.mkspiffs-7fefeac.1563313032.tar.gz already downloaded
Extracting dist/x86_64-linux-gnu.mkspiffs-7fefeac.1563313032.tar.gz
Tool x86_64-linux-gnu.mklittlefs-7f77f2b.1563313032.tar.gz already downloaded
Extracting dist/x86_64-linux-gnu.mklittlefs-7f77f2b.1563313032.tar.gz
mnix@Mike-Laptop:~/Arduino/hardware/esp8266com/esp8266/tools$

It's plugged into a USB hub with a good 12 V power supply (several amps available).
I have now tried two separate externally powered USB hubs with separate power supplies and different USB ports on my laptop, as well as feeding 5 V from my bench supply (good for 5 A) directly to the board without a hub. Once the WiFi connects, the board draws around 550-600 mA at 5 V, with or without USB connected. I even added a 470 µF electrolytic capacitor at the board for good measure.
All of those tests crashed.
I also left the WiFi-disabled build running for about 6 hours on the same ESP8266 as a test, with no crashes.

I added disable_extra4k_at_link_time(); as the first line of setup() (and included coredecls.h). It has definitely changed the output. Debug output and stack decode follow.

Legacy Stack Debug output:

SDK:2.2.2-dev(38a443e)/Core:2.5.2-98-gd6973cd6=20502098/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af

Hello
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
UNUSED STACK Block 28 bytes @ 0x3ffffaf4
UNUSED STACK Block 28 bytes @ 0x3ffffc28
UNUSED STACK Block 60 bytes @ 0x3ffffc48
Largest mis-allocated stack block:15 words (60 bytes)
wifi evt: 2
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
UNUSED STACK Block 28 bytes @ 0x3ffffaf4
UNUSED STACK Block 24 bytes @ 0x3ffffc28
scandone
state: 0 -> 2 (b0)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 6
cnt 

connected with apname, channel 9
dhcp client start...
wifi evt: 0
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
ip:192.168.10.60,mask:255.255.255.0,gw:192.168.10.1
wifi evt: 3
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
pm open,type:2 0
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
.WARNING: first modified word @ 0x3fffefc0 (within 200 bytes of stack limit)
Fatal exception 0(IllegalInstructionCause):
epc1=0x4022bc74, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Exception (0):
epc1=0x4022bc74 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3ffffb40 end: 3fffffb0 offset: 01a0
3ffffce0:  400005e1 402030f8 00000000 00000030  
3ffffcf0:  4022bc74 00000033 00000010 ffffffff  
3ffffd00:  40104cf1 04000102 00000000 00000001  
3ffffd10:  fbf8ffff 04000002 3feffe00 00000100  
3ffffd20:  0000001a 00000018 04000102 40104cd0  
3ffffd30:  3fffc100 00000000 00000000 00000000  
3ffffd40:  00000009 00000009 0000004e 00000000  
3ffffd50:  0000ffff 00000008 00000016 ffffff60  
3ffffd60:  401038c7 00040000 00000000 00040000  
3ffffd70:  00000000 401038c4 00040000 00000030  
3ffffd80:  3ffeccc0 40102823 3ffef660 00000000  
3ffffd90:  402030f8 3fff03dc 000000e0 40100b8b  
3ffffda0:  3ffea0c8 2c9f0300 4000050c 3fffc278  
3ffffdb0:  40102674 3fffc200 00000022 401006a0  
3ffffdc0:  401008ae 00000030 00000010 ffffffff  
3ffffdd0:  40100977 3ffefc14 00000004 000000a5  
3ffffde0:  40104ceb 40104ce8 00000000 f7ffffff  
3ffffdf0:  400005e1 3fffc6fc 00000001 3ffefbf8  
3ffffe00:  40213080 00000030 00000010 00000030  
3ffffe10:  4021306c 3fff040c a5a5a5a5 00000004  
3ffffe20:  3fff04ec 0000002f 00000000 00000135  
3ffffe30:  00000035 3fffc6fc 00000001 3fffff60  
3ffffe40:  000000c9 00000000 000000cc 00000000  
3ffffe50:  00000001 00004208 3ffee2d8 00000000  
3ffffe60:  3ffe93d8 008e0fd2 3ffefd2c 016b6c18  
3ffffe70:  3ffe93e4 2c9f0300 4000050c 3fffc278  
3ffffe80:  40102674 3fffc200 00000022 3ffefd2c  
3ffffe90:  40000f68 00000030 00000011 ffffffff  
3ffffea0:  00000020 00000000 3ffef660 00000001  
3ffffeb0:  000000c9 40203088 00000020 40100b30  
3ffffec0:  000000e8 00000014 3ffef660 000000cc  
3ffffed0:  00000000 40202fdd 000000e8 40100b30  
3ffffee0:  00000000 000000dc 3fffff60 40202fdd  
3ffffef0:  4021867d 000000c9 3fffff60 4021306c  
3fffff00:  00004208 00c28001 92040e00 3ffed768  
3fffff10:  00000000 3fff03e4 0000001c 4020dd48  
3fffff20:  3ffec5d0 4021ccfc 3ffec5d0 3ffea0a4  
3fffff30:  3ffea0a4 000000ef 00000000 3fff0184  
3fffff40:  3fffdc80 3fffff64 3fffff60 4020c0f4  
3fffff50:  3ffea098 3fff00ec 3fff03e4 4022aedc  
3fffff60:  402264ca 3ffec5d0 00000000 3fffdcb0  
3fffff70:  40225e17 00000000 3fff03e4 4022cb9b  
3fffff80:  40000f49 3fffdab0 3fffdab0 40000f49  
3fffff90:  40000e19 40001878 00000002 00000000  
3fffffa0:  3fffff10 aa55aa55 0000000a 40104ae9  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
vd6973cd6
~ld

Legacy Stack Trace (yes, it's complete: nothing is decoded after the ethernet_input line):

PC: 0x4022bc74
EXCVADDR: 0x00000000

Decoding stack results
0x402030f8: calloc_loc(size_t, size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 134
0x402030f8: calloc_loc(size_t, size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 134
0x40100b8b: umm_calloc(size_t, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1716
0x401006a0: _umm_free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304
0x401008ae: check_poison_block(umm_block*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 819
0x40100977: check_poison_all_blocks() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 892
0x40213080: mem_malloc at core/mem.c line 221
0x4021306c: mem_malloc at core/mem.c line 210
0x40203088: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40100b30: umm_malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1685
0x40202fdd: malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 95
0x40100b30: umm_malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1685
0x40202fdd: malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 95
0x4021306c: mem_malloc at core/mem.c line 210
0x4020dd48: pbuf_alloc_LWIP2 at core/pbuf.c line 284
0x4020c0f4: esp2glue_alloc_for_recv at glue-lwip/lwip-git.c line 428
0x4022aedc: ethernet_input at glue-esp/lwip-esp.c line 352

As the decode didn't look helpful with legacy stack, I also reverted to my original code above...

Normal Stack debug output


Hello
wifi evt: 2
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt 

connected with apname, channel 9
dhcp client start...
wifi evt: 0
ip:192.168.10.60,mask:255.255.255.0,gw:192.168.10.1
wifi evt: 3
..........pm open,type:2 0
..............Fatal exception 0(IllegalInstructionCause):
epc1=0x4022bc6c, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Exception (0):
epc1=0x4022bc6c epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000000 depc=0x00000000

>>>stack>>>

ctx: sys
sp: 3fffecb0 end: 3fffffb0 offset: 01a0
3fffee50:  400005e1 00000608 3ffef154 00000030  
3fffee60:  402024d0 00000030 00000010 00000001  
3fffee70:  40000f49 3ffee598 00000000 3fffd9d0  
3fffee80:  00000000 00000000 00000000 fffffffe  
3fffee90:  ffffffff 3fffc6fc 00000000 3fffdab0  
3fffeea0:  00000000 3fffdad0 3ffee598 00000000  
3fffeeb0:  3ffeea38 00000000 3ffef69a 402117e5  
3fffeec0:  40218675 0000002a 3fffef30 40213064  
3fffeed0:  380aa8c0 00000000 9204ffff 3ffed768  
3fffeee0:  3ffef6a2 3ffef67c 00000160 401006a8  
3fffeef0:  3fffdc80 3ffef0bc 3ffef64c 3ffef15c  
3fffef00:  00000608 3ffeea38 3ffef67c 4020c2f8  
3fffef10:  3fffdc80 3ffef0bc 3ffef654 4020c113  
3fffef20:  4022af02 3ffef0bc 3ffef654 4022af13  
3fffef30:  3ffef68c 3ffef67c 00000000 3fffdcb0  
3fffef40:  40225e0f 00000000 3ffef654 4022cb93  
3fffef50:  40000f49 3fffdab0 3fffdab0 40000f49  
3fffef60:  40000e19 40001878 00000002 00000000  
3fffef70:  3fffff10 aa55aa55 000000c9 40104af1  
3fffef80:  40104af7 00000002 00000000 52ffe941  
3fffef90:  4010000d c11211c8 03542210 6800f00d  
3fffefa0:  40100dfc 3fffef3c 40100da9 3fffff68  
3fffefb0:  3fffffc0 00000000 00000000 feefeffe  
3fffefc0:  feefeffe feefeffe feefeffe feefeffe  
3fffefd0:  feefeffe feefeffe feefeffe feefeffe  
3fffefe0:  feefeffe feefeffe feefeffe feefeffe  
3fffeff0:  feefeffe feefeffe feefeffe feefeffe  
3ffff000:  feefeffe feefeffe feefeffe feefeffe  
3ffff010:  feefeffe feefeffe feefeffe feefeffe  
3ffff020:  feefeffe feefeffe feefeffe feefeffe  
3ffff030:  feefeffe feefeffe feefeffe feefeffe  
3ffff040:  feefeffe feefeffe feefeffe feefeffe  
3ffff050:  feefeffe feefeffe feefeffe feefeffe  
3ffff060:  feefeffe feefeffe feefeffe feefeffe  
3ffff070:  feefeffe feefeffe feefeffe feefeffe  
3ffff080:  feefeffe feefeffe feefeffe feefeffe  
3ffff090:  feefeffe feefeffe feefeffe feefeffe  
3ffff0a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff0f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff100:  feefeffe feefeffe feefeffe feefeffe  
3ffff110:  feefeffe feefeffe feefeffe feefeffe  
3ffff120:  feefeffe feefeffe feefeffe feefeffe  
3ffff130:  feefeffe feefeffe feefeffe feefeffe  
3ffff140:  feefeffe feefeffe feefeffe feefeffe  
3ffff150:  feefeffe feefeffe feefeffe feefeffe  
3ffff160:  feefeffe feefeffe feefeffe feefeffe  
3ffff170:  feefeffe feefeffe feefeffe feefeffe  
3ffff180:  feefeffe feefeffe feefeffe feefeffe  
3ffff190:  feefeffe feefeffe feefeffe feefeffe  
3ffff1a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff1f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff200:  feefeffe feefeffe feefeffe feefeffe  
3ffff210:  feefeffe feefeffe feefeffe feefeffe  
3ffff220:  feefeffe feefeffe feefeffe feefeffe  
3ffff230:  feefeffe feefeffe feefeffe feefeffe  
3ffff240:  feefeffe feefeffe feefeffe feefeffe  
3ffff250:  feefeffe feefeffe feefeffe feefeffe  
3ffff260:  feefeffe feefeffe feefeffe feefeffe  
3ffff270:  feefeffe feefeffe feefeffe feefeffe  
3ffff280:  feefeffe feefeffe feefeffe feefeffe  
3ffff290:  feefeffe feefeffe feefeffe feefeffe  
3ffff2a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff2f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff300:  feefeffe feefeffe feefeffe feefeffe  
3ffff310:  feefeffe feefeffe feefeffe feefeffe  
3ffff320:  feefeffe feefeffe feefeffe feefeffe  
3ffff330:  feefeffe feefeffe feefeffe feefeffe  
3ffff340:  feefeffe feefeffe feefeffe feefeffe  
3ffff350:  feefeffe feefeffe feefeffe feefeffe  
3ffff360:  feefeffe feefeffe feefeffe feefeffe  
3ffff370:  feefeffe feefeffe feefeffe feefeffe  
3ffff380:  feefeffe feefeffe feefeffe feefeffe  
3ffff390:  feefeffe feefeffe feefeffe feefeffe  
3ffff3a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff3f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff400:  feefeffe feefeffe feefeffe feefeffe  
3ffff410:  feefeffe feefeffe feefeffe feefeffe  
3ffff420:  feefeffe feefeffe feefeffe feefeffe  
3ffff430:  feefeffe feefeffe feefeffe feefeffe  
3ffff440:  feefeffe feefeffe feefeffe feefeffe  
3ffff450:  feefeffe feefeffe feefeffe feefeffe  
3ffff460:  feefeffe feefeffe feefeffe feefeffe  
3ffff470:  feefeffe feefeffe feefeffe feefeffe  
3ffff480:  feefeffe feefeffe feefeffe feefeffe  
3ffff490:  feefeffe feefeffe feefeffe feefeffe  
3ffff4a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff4f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff500:  feefeffe feefeffe feefeffe feefeffe  
3ffff510:  feefeffe feefeffe feefeffe feefeffe  
3ffff520:  feefeffe feefeffe feefeffe feefeffe  
3ffff530:  feefeffe feefeffe feefeffe feefeffe  
3ffff540:  feefeffe feefeffe feefeffe feefeffe  
3ffff550:  feefeffe feefeffe feefeffe feefeffe  
3ffff560:  feefeffe feefeffe feefeffe feefeffe  
3ffff570:  feefeffe feefeffe feefeffe feefeffe  
3ffff580:  feefeffe feefeffe feefeffe feefeffe  
3ffff590:  feefeffe feefeffe feefeffe feefeffe  
3ffff5a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff5f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff600:  feefeffe feefeffe feefeffe feefeffe  
3ffff610:  feefeffe feefeffe feefeffe feefeffe  
3ffff620:  feefeffe feefeffe feefeffe feefeffe  
3ffff630:  feefeffe feefeffe feefeffe feefeffe  
3ffff640:  feefeffe feefeffe feefeffe feefeffe  
3ffff650:  feefeffe feefeffe feefeffe feefeffe  
3ffff660:  feefeffe feefeffe feefeffe feefeffe  
3ffff670:  feefeffe feefeffe feefeffe feefeffe  
3ffff680:  feefeffe feefeffe feefeffe feefeffe  
3ffff690:  feefeffe feefeffe feefeffe feefeffe  
3ffff6a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff6f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff700:  feefeffe feefeffe feefeffe feefeffe  
3ffff710:  feefeffe feefeffe feefeffe feefeffe  
3ffff720:  feefeffe feefeffe feefeffe feefeffe  
3ffff730:  feefeffe feefeffe feefeffe feefeffe  
3ffff740:  feefeffe feefeffe feefeffe feefeffe  
3ffff750:  feefeffe feefeffe feefeffe feefeffe  
3ffff760:  feefeffe feefeffe feefeffe feefeffe  
3ffff770:  feefeffe feefeffe feefeffe feefeffe  
3ffff780:  feefeffe feefeffe feefeffe feefeffe  
3ffff790:  feefeffe feefeffe feefeffe feefeffe  
3ffff7a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff7f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff800:  feefeffe feefeffe feefeffe feefeffe  
3ffff810:  feefeffe feefeffe feefeffe feefeffe  
3ffff820:  feefeffe feefeffe feefeffe feefeffe  
3ffff830:  feefeffe feefeffe feefeffe feefeffe  
3ffff840:  feefeffe feefeffe feefeffe feefeffe  
3ffff850:  feefeffe feefeffe feefeffe feefeffe  
3ffff860:  feefeffe feefeffe feefeffe feefeffe  
3ffff870:  feefeffe feefeffe feefeffe feefeffe  
3ffff880:  feefeffe feefeffe feefeffe feefeffe  
3ffff890:  feefeffe feefeffe feefeffe feefeffe  
3ffff8a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff8f0:  feefeffe feefeffe feefeffe feefeffe  
3ffff900:  feefeffe feefeffe feefeffe feefeffe  
3ffff910:  feefeffe feefeffe feefeffe feefeffe  
3ffff920:  feefeffe feefeffe feefeffe feefeffe  
3ffff930:  feefeffe feefeffe feefeffe feefeffe  
3ffff940:  feefeffe feefeffe feefeffe feefeffe  
3ffff950:  feefeffe feefeffe feefeffe feefeffe  
3ffff960:  feefeffe feefeffe feefeffe feefeffe  
3ffff970:  feefeffe feefeffe feefeffe feefeffe  
3ffff980:  feefeffe feefeffe feefeffe feefeffe  
3ffff990:  feefeffe feefeffe feefeffe feefeffe  
3ffff9a0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9b0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9c0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9d0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9e0:  feefeffe feefeffe feefeffe feefeffe  
3ffff9f0:  feefeffe feefeffe feefeffe feefeffe  
3ffffa00:  feefeffe feefeffe feefeffe feefeffe  
3ffffa10:  feefeffe feefeffe feefeffe feefeffe  
3ffffa20:  feefeffe feefeffe feefeffe feefeffe  
3ffffa30:  feefeffe feefeffe feefeffe feefeffe  
3ffffa40:  feefeffe feefeffe feefeffe feefeffe  
3ffffa50:  feefeffe feefeffe feefeffe feefeffe  
3ffffa60:  feefeffe feefeffe feefeffe feefeffe  
3ffffa70:  feefeffe feefeffe feefeffe feefeffe  
3ffffa80:  feefeffe feefeffe feefeffe feefeffe  
3ffffa90:  feefeffe feefeffe feefeffe feefeffe  
3ffffaa0:  feefeffe feefeffe feefeffe feefeffe  
3ffffab0:  feefeffe feefeffe feefeffe feefeffe  
3ffffac0:  feefeffe feefeffe feefeffe feefeffe  
3ffffad0:  feefeffe feefeffe feefeffe feefeffe  
3ffffae0:  feefeffe feefeffe feefeffe feefeffe  
3ffffaf0:  feefeffe feefeffe feefeffe feefeffe  
3ffffb00:  feefeffe feefeffe feefeffe feefeffe  
3ffffb10:  feefeffe feefeffe feefeffe feefeffe  
3ffffb20:  feefeffe feefeffe feefeffe feefeffe  
3ffffb30:  feefeffe feefeffe feefeffe feefeffe  
3ffffb40:  feefeffe feefeffe feefeffe feefeffe  
3ffffb50:  feefeffe feefeffe feefeffe feefeffe  
3ffffb60:  feefeffe feefeffe feefeffe feefeffe  
3ffffb70:  feefeffe feefeffe feefeffe feefeffe  
3ffffb80:  feefeffe feefeffe feefeffe feefeffe  
3ffffb90:  feefeffe feefeffe feefeffe feefeffe  
3ffffba0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbb0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbc0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbd0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbe0:  feefeffe feefeffe feefeffe feefeffe  
3ffffbf0:  feefeffe feefeffe feefeffe feefeffe  
3ffffc00:  feefeffe feefeffe feefeffe feefeffe  
3ffffc10:  feefeffe feefeffe feefeffe feefeffe  
3ffffc20:  feefeffe feefeffe feefeffe feefeffe  
3ffffc30:  feefeffe feefeffe feefeffe feefeffe  
3ffffc40:  feefeffe feefeffe feefeffe feefeffe  
3ffffc50:  feefeffe feefeffe feefeffe feefeffe  
3ffffc60:  0000000a 40104d80 0000000a 00000000  
3ffffc70:  40001da0 0000000a feefeffe feefeffe  
3ffffc80:  40001db4 feefeffe feefeffe feefeffe  
3ffffc90:  00000002 00000000 0000000a 00000000  
3ffffca0:  00000002 00000000 0000000a 00000000  
3ffffcb0:  feefeffe feefeffe feefeffe feefeffe  
3ffffcc0:  00000000 a0000000 00000000 0000001c  
3ffffcd0:  00002000 80af1999 00002000 00000000  
3ffffce0:  3ffffe40 00000000 3ffffe40 4020a15e  
3ffffcf0:  0000a000 3ffffde3 3ffee630 00000000  
3ffffd00:  00000000 40203084 40205e6d 00000008  
3ffffd10:  3ffffe40 00000008 3ffffe40 4020a15e  
3ffffd20:  3ffffda0 3ffffddb 3ffffd50 00000000  
3ffffd30:  fffffffe 00000000 40101857 4020a098  
3ffffd40:  3ffffe40 3ffffddb 3ffffda0 40205f6c  
3ffffd50:  00000008 40226077 3ffef25c 3ffecf1c  
3ffffd60:  3ffe8304 00000000 0000000a 4023b3a0  
3ffffd70:  3ffffde3 00000002 00000000 00000008  
3ffffd80:  3ffef316 3ffeece4 00000008 3ffe8704  
3ffffd90:  00000000 3ffe8703 3ffffe40 4020a56f  
3ffffda0:  00000000 ffffffff ffffffff 00000000  
3ffffdb0:  00000008 00000008 3f302064 00000000  
3ffffdc0:  3ffede50 3ffed740 00000001 00000001  
3ffffdd0:  00000000 000035c6 3221d807 32303530  
3ffffde0:  00393930 4021de70 3ffed614 00000000  
3ffffdf0:  3ffed000 4021d77c 00000000 00000012  
3ffffe00:  00000005 00000000 00000020 40101712  
3ffffe10:  3ffe8b45 401049eb 3ffec5a8 3ffee630  
3ffffe20:  401022b7 3ffec5a8 0000008c 40100d36  
3ffffe30:  fffffff7 01634a7c 3ffed000 40102494  
3ffffe40:  3ffe93d8 00000000 00000000 ffff0208  
3ffffe50:  fffffff7 01634a7c 4010295a 00000100  
3ffffe60:  3ffe93d8 7fffffff 00000000 00000001  
3ffffe70:  00000001 00000080 4000050c 3fffc278  
3ffffe80:  3ffe93d8 00000030 00000010 01634a7c  
3ffffe90:  3ffe93e4 2c9f0300 4000050c 3fffc278  
3ffffea0:  4010267c 3fffc200 00000022 002e2e2e  
3ffffeb0:  40202671 00000030 00000010 ffffffff  
3ffffec0:  40100dc1 40100dbc 40202668 00000000  
3ffffed0:  00000000 00000000 00000000 fffffffe  
3ffffee0:  ffffffff 3fffc6fc 00000001 3ffe850c  
3ffffef0:  00000000 3fffdad0 3ffee598 00000030  
3fffff00:  3fffdad0 3fffff3c 3fffff48 3ffee598  
3fffff10:  3fffdad0 00000075 3ffee47c 402016fc  
3fffff20:  3ffe86ef 00000000 3ffe86ee 4020351a  
3fffff30:  40105155 0074bbf2 3ffe864a 3ffee598  
3fffff40:  40105202 3ffeceac 0074bbf2 00000000  
3fffff50:  401053db 0074c830 3ffee5f8 00000000  
3fffff60:  3ffede50 3ffee5f8 3ffe850c 3ffee5f8  
3fffff70:  3fffdad0 3ffee598 4020253b 3fffefa0  
3fffff80:  3ffee5f8 3fffdad0 0000000a 40202b3b  
3fffff90:  00000000 00000000 3ffee568 402011c0  
3fffffa0:  3fffdad0 00000000 3ffee568 40202688  
<<<stack<<<

 ets Jan  8 2013,rst cause:2, boot mode:(3,6)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
vf78ab66f
~ld

Normal Stack Decoded:

Exception 0: Illegal instruction
PC: 0x4022bc6c
EXCVADDR: 0x00000000

Decoding stack results
0x402024d0: loop_task(ETSEvent*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 144
0x402117e5: etharp_input at core/ipv4/etharp.c line 742
0x40213064: mem_malloc at core/mem.c line 210
0x401006a8: _umm_free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304
0x4020c2f8: ethernet_input_LWIP2 at netif/ethernet.c line 207
0x4020c113: esp2glue_ethernet_input at glue-lwip/lwip-git.c line 441
0x4022af02: ethernet_input at glue-esp/lwip-esp.c line 363
0x4022af13: ethernet_input at glue-esp/lwip-esp.c line 371
0x4020a15e: __ssputs_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 233
0x40203084: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40205e6d: _printf_i at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 194
0x4020a15e: __ssputs_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 233
0x4020a098: __ssputs_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 180
0x40205f6c: _printf_i at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf_i.c line 244
0x4020a56f: _svfprintf_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/nano-vfprintf.c line 660
0x40100d36: umm_realloc(void*, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1745
0x40202671: loop_wrapper() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 134
0x40202668: loop_wrapper() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 132
0x402016fc: HardwareSerial::write(unsigned char const*, unsigned int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/HardwareSerial.h line 158
0x4020351a: uart_write(uart_t*, char const*, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/uart.cpp line 498
0x4020253b: esp_yield() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 90
0x40202b3b: __delay(unsigned long) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_wiring.cpp line 54
0x402011c0: loop() at /home/mnix/Arduino/blank/blank.ino line 53
0x40202688: loop_wrapper() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 140

Also, any given build crashes consistently (i.e. at the same address with the same trace), although changing the code, building, then changing it back, rebuilding and flashing can give a different result.

I'm beginning to wonder if my build environment has an issue, although that would not explain why I have never had a problem using the Arduino IDE to develop for Arduinos, yet I see this issue with every release of the esp8266 tools since 2.4.0 (it may go back further, but that was as far as I tested).

If I hadn't already tested with several WEMOS D1 boards, which all crashed identically for a given build, I would be sure it was dodgy hardware.

@ChocolateFrogsNuts
Contributor Author

oops, can't read the display on my bench supply :-)
Actual power consumption of the board is 55-60mA @ 5v with wifi on and usb disconnected.

@d-a-v
Collaborator

d-a-v commented Aug 2, 2019

Please enable all debug options, so we may get more info.

@ChocolateFrogsNuts
Contributor Author

I can't see an "ALL" option, but the above were done with what appears to be the "everything enabled" option - the second-last in the list, with all the other items in it.
SSL+TLS_MEM+HTTP_CLIENT+HTTP_SERVER+CORE+WIFI+HTTP_UPDATE+UPDATER+OTA+OOM

@d-a-v
Collaborator

d-a-v commented Aug 2, 2019

Yes, this entry.

@ChocolateFrogsNuts
Contributor Author

Just to be clear, I had that option selected when I captured the output I posted above, so there's not much point in posting the same again...

If only we had access to the SDK source I could probably have a fix sorted in a couple of hours :-/

@d-a-v
Collaborator

d-a-v commented Aug 4, 2019

Can you post an archive with sketch source, selected options and the elf file ?

@ChocolateFrogsNuts
Contributor Author

blank.zip

Ok, that should be what you need - complete preferences.txt from ~/.arduino15, the sketch and the binary file that was generated, plus a matching set of debug and stack decodes.

Interestingly this particular build is crashing very early in the run - much earlier than before, even though the sketch has not been modified since my last set of logs.
As soon as the sketch started to run, there were two crashes and re-starts, with different exceptions but the same symptoms - extremely large stack with a large un-modified block in the middle (still set to feefeffe)

Hopefully you can work something out from that..
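
The "large unmodified block" symptom can be measured rather than eyeballed. Below is a minimal sketch, assuming the stack dump has already been parsed into 32-bit words (lowest address first); `FillRun` and `longest_fill_run` are hypothetical names for illustration, not part of the core or the Exception Decoder. It finds the longest run of the feefeffe fill pattern. By my count, the first dump in this thread has such a run from 3fffefbc up to 3ffffc5f, roughly 3.2 KB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill pattern seen in the untouched part of the stack dumps in this thread.
constexpr uint32_t kStackFill = 0xfeefeffe;

// Hypothetical helper type: position and length of a run of fill words.
struct FillRun {
    std::size_t first_word;  // index of the first fill word in the run
    std::size_t words;       // run length in 32-bit words
};

// Return the longest run of consecutive fill words in a dumped stack.
// An empty dump, or one without any fill word, yields a run of 0 words.
FillRun longest_fill_run(const std::vector<uint32_t>& stack) {
    FillRun best{0, 0};
    std::size_t run_start = 0;
    std::size_t run_len = 0;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (stack[i] == kStackFill) {
            if (run_len == 0) run_start = i;  // a new run begins here
            ++run_len;
            if (run_len > best.words) best = {run_start, run_len};
        } else {
            run_len = 0;  // run interrupted by a word that was written to
        }
    }
    return best;
}
```

Multiplying `words` by 4 gives the size in bytes of the region that sits inside the dumped stack but was never written to.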

@ChocolateFrogsNuts
Contributor Author

oops, just realised the bin file in that archive isn't the elf file you wanted... correct file attached...
blank-elf.zip

Apologies for calling the sketch "blank"; it kind of grew from the example :(

@spblinux

spblinux commented Aug 5, 2019

@ChocolateFrogsNuts wrote: If only we had access to the SDK source I could probably have a fix sorted in a couple of hours :-/

I read through this issue looking for the sources of SDK 2.2.1; some additional googling led to:
https://github.com/espressif/ESP8266_NONOS_SDK/tree/v2.2.1
(Just switch from branch:master to tag v2.2.1 in https://github.com/espressif/ESP8266_NONOS_SDK)

In the Arduino IDE, the version of the esp8266 board package seems to match the ESP8266 SDK version.

@ChocolateFrogsNuts
Contributor Author

@spblinux at a quick glance it appears to offer no more than the tools/sdk directory in the Arduino library... but it is late here, so I will have a closer look in the morning.

@spblinux

spblinux commented Aug 5, 2019

@ChocolateFrogsNuts The SDK example code and the sources of libdriver.a are stripped out of the Arduino IDE version of the SDK. The source code of the functions referenced in ld/eagle.rom.addr.v6.ld, however, is missing from both variants of the SDK. (In my case I needed the source code for a UART-controlled ESP8285 "wifi shield", which is in examples/at.)

@d-a-v
Collaborator

d-a-v commented Aug 5, 2019

@spblinux we don't use libdriver.a. It is a leftover that shouldn't be in this repository.
As for ld/eagle.rom.addr.v6.ld, the code it refers to is 1) closed source and 2) in ROM (on the silicon, not in a .a).

@d-a-v
Collaborator

d-a-v commented Aug 5, 2019

@ChocolateFrogsNuts

I'm sorry I still can't reproduce with your binary

SDK:2.2.2-dev(38a443e)/Core:2.5.2-99-gf78ab66f=20502099/lwIP:STABLE-2_1_2_RELEASE/glue:1.1-8-g2314329/BearSSL:89454af
...
22:37:46.757 -> ..scandone
22:37:48.485 -> no Newbicup found, reconnect after 1s
22:37:48.518 -> wifi evt: 1
22:37:48.518 -> STA disconnect: 201
22:37:48.584 -> reconnect
22:37:48.784 -> ..wifi evt: 2
22:37:50.778 -> .scandone
22:37:52.406 -> state: 0 -> 2 (b0)
22:37:52.406 -> .state: 2 -> 3 (0)
22:37:52.406 -> state: 3 -> 5 (10)
22:37:52.406 -> add 0
22:37:52.406 -> aid 1
22:37:52.406 -> cnt 
22:37:52.439 -> 
22:37:52.439 -> connected with Newbicup, channel 11
22:37:52.538 -> dhcp client start...
22:37:52.538 -> wifi evt: 0
22:37:53.402 -> ..ip:192.168.43.154,mask:255.255.255.0,gw:192.168.43.103
22:37:55.363 -> wifi evt: 3
22:37:55.429 -> .......pm open,type:2 0
22:38:02.442 -> .........................
22:38:27.646 -> ........................................
22:39:07.910 -> ........................................
22:39:48.188 -> ........................................
22:40:28.492 -> ........................................
22:41:08.789 -> ........................................
22:41:49.086 -> ........................................
22:42:29.389 -> ........................................
22:43:09.693 -> ........................................
22:43:49.975 -> ................................

That could be a worse version of the #2330 issue (per your comment)

  //wifi_set_sleep_type(NONE_SLEEP_T); // this doesn't seem to crash as much
  wifi_set_sleep_type(MODEM_SLEEP_T); // this is the default and crashes better

Can you try with your mobile phone as AP ? (that's what I did)

@ChocolateFrogsNuts
Contributor Author

Additional Info:
I just tried it on a different wifi network, one with zero traffic (an OKI printer as the AP), and it is not crashing! This explains why you can't get the issue to happen for you: it's triggered by certain data packets on the wifi.
Further testing, connected to a Samsung Galaxy Tab S4 tablet with the hotspot turned on, suggests it happens on any wifi with internet access. With the hotspot on but mobile data off and no connection to my normal wifi, there are no crashes at all. (The Tab S4 can share another wifi network via its hotspot, something I would never have known!) The second the hotspot device gets an internet connection, via mobile data or by sharing my normal wifi, the ESP code starts crashing, and when I disable that internet connection the ESP stops crashing.
Also, if the hotspot has mobile data enabled, but does not have phone signal there are no crashes.

When it crashes on my normal wifi network, I don't see any traffic from the ESP chip other than the usual DHCP after the stack dump has finished - my AP is not the gateway so I can get that with a 'tcpdump -s 1500 -X ether host xx:xx:xx:xx:xx:xx' on the gateway.

For some reason it still crashes when I drop the internet connection from my gateway though - I can only assume the ESP code still thinks it has an internet connection, so still does whatever it is that is the problem.

Yay! this is progress - I can now control when it crashes, which hopefully narrows down why a little.

@ChocolateFrogsNuts
Contributor Author

Hmm, looks like it could in fact be related to #2330 -
Connected to the Tab S4 hotspot with no internet connection, it did not crash... until...

Using an app called Network Utilities, I can ping the ESP without any issues. However, if I use the "IP Discovery" tool in that app, I get an instant crash every time. Not only that, it seems to cause a whole series of crashes (probably from multiple ARP requests).

I will go looking in the lwip arp code later today... I have something else I have to tend to this morning :-)

@TD-er
Contributor

TD-er commented Aug 6, 2019

And now this issue became even more interesting :)
Do you have a link to that specific app?

@ChocolateFrogsNuts
Contributor Author

Ok, further testing and I am increasingly convinced the stack is being used up by the binary blob...
I can't be certain it isn't a bad or missing piece of initialisation by the Arduino code... yet... but every time I decode a stack trace and locate the decoded calls in the trace I see the same thing:

  • The stack appears normal, with around 500+ bytes in use.
  • The CPU then jumps out of the expected execution path.
  • Further on there is a call to ethernet_input. At this point the stack is already oversized.

So I tried several of the LWIP options - all crashed in the same way, including the "v1.4 Compile from source" option.

With that option selected (easiest way to re-compile lwip), I then added one line of code to tools/sdk/lwip/src/netif/etharp.c
Line 1317 - straight after the local variables in ethernet_input()

if (&p < (0x3FFFFFB0 - 4000)) goto free_and_return;

That one line, which has no effect unless the stack is already full, seems to make it almost immune to crashes no matter how many ARP requests I send it! Unfortunately it is also almost completely deaf to IP traffic, but at least we know why ARP is behaving badly - it looks like every time network traffic is received, a "random" amount of stack is allocated!
I suppose it's still possible there is memory corruption somewhere else as the root cause, but that just doesn't sound right - I can't think of any sane code that would allocate stack space based on a size read from somewhere, it's usually small variables of fixed size.
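
The arithmetic behind that one-line guard can be checked on a host machine. This is only a sketch: the 0x3FFFFFB0 top-of-stack value is taken from the "sp: 3fffecb0 end: 3fffffb0" line of the dump, the 4000-byte budget is the threshold from the patch (not an SDK constant), and `stack_depth` and `over_budget` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Top of the "sys" stack as reported by the "end:" field of the dump.
constexpr uint32_t kSysStackEnd = 0x3fffffb0;
// Budget used in the one-line patch; an arbitrary threshold, not an SDK value.
constexpr uint32_t kStackBudget = 4000;

// The stack grows downwards, so depth is (top of stack) - (current sp).
constexpr uint32_t stack_depth(uint32_t sp, uint32_t stack_end = kSysStackEnd) {
    return stack_end - sp;
}

// Same comparison as the patch: true once more than the budget is in use.
constexpr bool over_budget(uint32_t sp) {
    return sp < kSysStackEnd - kStackBudget;
}
```

With the sp from the dump (0x3fffecb0), `stack_depth` gives 0x1300 = 4864 bytes, so the guard would fire for that crash.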

The app I was using on the tablet is
https://play.google.com/store/apps/details?id=com.myprog.netutils

@d-a-v
Collaborator

d-a-v commented Aug 6, 2019

If stack is a concern, please do your tests with this additional call:

    disable_extra4k_at_link_time(); // anywhere in the sketch

(This allocates the user stack in user heap space, which is the SDK default, rather than within the system stack space. The latter is a hack that is not compatible with WPS, for example; for the record, that case is handled automatically.)

@ChocolateFrogsNuts
Contributor Author

ChocolateFrogsNuts commented Aug 8, 2019

I've had disable_extra4k_at_link_time() back in while I try to work out the details of the stack.
Seeing a lot of Illegal Instruction exceptions, and the PC is at odd locations (ie not aligned with instructions)
The Esp Exception Decoder has been of limited help - it only decodes parts of the stack as it relies on gdb. I've been working on something that reads the objdump of the elf and produces a more complete stack trace (in one case 10 functions are on the stack, but gdb can only name 3 of them)
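
A first step for any such tool is separating stack words that could be code addresses from plain data. The sketch below assumes the commonly documented ESP8266 address ranges (mask ROM at 0x40000000, IRAM at 0x40100000, memory-mapped flash at 0x40200000, data RAM from 0x3FFE8000); those ranges and the names `Region`, `classify` and `looks_like_code` are assumptions for illustration, not taken from the decoder described here.

```cpp
#include <cassert>
#include <cstdint>

// Coarse memory regions, using assumed ESP8266 address ranges.
enum class Region { Rom, Iram, Flash, Dram, Other };

// Classify one 32-bit word from a stack dump by the range it falls into.
Region classify(uint32_t w) {
    if (w >= 0x40000000u && w < 0x40010000u) return Region::Rom;    // mask ROM
    if (w >= 0x40100000u && w < 0x40108000u) return Region::Iram;   // instruction RAM
    if (w >= 0x40200000u && w < 0x40300000u) return Region::Flash;  // mapped flash
    if (w >= 0x3ffe8000u && w < 0x40000000u) return Region::Dram;   // data RAM
    return Region::Other;
}

// A word is a candidate return address if it points into executable memory.
bool looks_like_code(uint32_t w) {
    Region r = classify(w);
    return r == Region::Rom || r == Region::Iram || r == Region::Flash;
}
```

A candidate is only that: stale return addresses left over from earlier calls pass this test too, which is why a plain address scan can list calls that are not actually on the live call chain.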

I still don't like the look of the stacks I am seeing.... for example:

Exception 0: Illegal instruction
PC: 0x40228a98
EXCVADDR: 0x00000000

Decoding stack results
0x40207669: _vsnprintf_r at /home/earle/src/esp-quick-toolchain/repo/newlib/newlib/libc/stdio/vsnprintf.c line 73
0x40100870: check_poison_block(umm_block*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 846
0x40202f9c: calloc_loc(size_t, size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 134
0x40100cc1: umm_calloc(size_t, size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1716
0x4010067c: _umm_free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1304
0x40100ea4: free(void*) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1764
0x40222f7e: pbuf_free at core/pbuf.c line 752
0x40202f28: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40100c56: umm_malloc(size_t) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/umm_malloc/umm_malloc.cpp line 1685
0x40202f28: malloc_loc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 126
0x40100274: pvPortMalloc(size_t, char const*, int) at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/heap.cpp line 68
0x4022311d: pbuf_alloc at core/pbuf.c line 388
0x40222096: ethernet_input at netif/etharp.c line 1412
0x401000b8: app_entry() at /home/mnix/Arduino/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 263

There is no way umm_malloc should call malloc_loc then back to pbuf_free.
This tells me the stack pointer has been messed with and these are old calls... but figuring out just where this is happening is doing my head in :-(
Note that the PC was off in user_uart_wait_tx_fifo_empty(), so this still looks like a corrupted return address on the stack. Most likely execution is being sent off into some random function, and when that function returns, its different-sized stack frame loads a "random" stack pointer and PC.

This could be anywhere... next job is to get more debugging output from the arduino libraries - there still seems to be stuff that is not enabled

@ChocolateFrogsNuts
Contributor Author

Note that I am leaning towards a block of heap being freed but the pointer being left in play somewhere as the root cause... something has somehow caused check_poison_block to be called on a block that has already been freed at some point...

@earlephilhower
Collaborator

One more thing you can try is enabling gdb. A crash like this goes into a gdb break, and you can try the stack unwinder or look at system memory. I've got a PR that does a core dump which gdb plus a custom app can load, like loading a core file on Unix, but the stack unwinding is more foolproof on a live system.

@ChocolateFrogsNuts
Contributor Author

Hmm, today I completed my own stack decoder that does a better job than the ESP Exception Decoder tool, which simply prints out the line of code associated with every user code address found in the stack, if gdb can provide it.
The decoder I wrote actually attempts to walk the stack frames (which are not always the same size) in both directions to establish the integrity of the stack. It uses objdump on the elf file and gets not only the function addresses but the size of the stack frame that function creates.
This allows it to walk all the firmware calls on the stack. In addition it loads the linker file with the ROM function addresses (rom_8266.ld) allowing me to identify those functions - but not the size of their stack frames. This all gives me a really good solid stack dumper that knows for sure if the stack is valid.
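
The frame-walking idea can be sketched as follows. This is not the Perl script itself: it assumes a simplified model in which every function has a fixed frame size and saves its return address at a known offset inside that frame, and it only walks in one direction (from the crash point towards the top of the stack). `FuncInfo`, `StackDump`, `find_func` and `walk_frames` are hypothetical names, and the per-function data would have to come from something like an objdump listing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry per function; all values are hypothetical and build-specific.
struct FuncInfo {
    std::string name;
    uint32_t start;       // first address of the function
    uint32_t end;         // one past the last address
    uint32_t frame_size;  // bytes the prologue subtracts from the stack pointer
    uint32_t ra_offset;   // offset in the frame where the return address is saved
};

struct StackDump {
    uint32_t base;                // address of words[0]
    std::vector<uint32_t> words;  // stack contents, lowest address first

    bool contains(uint32_t addr) const {
        return addr >= base && addr % 4 == 0 && (addr - base) / 4 < words.size();
    }
    uint32_t at(uint32_t addr) const { return words[(addr - base) / 4]; }
};

// Linear search is enough for a sketch; a real tool would sort and bisect.
const FuncInfo* find_func(const std::vector<FuncInfo>& funcs, uint32_t pc) {
    for (const FuncInfo& f : funcs)
        if (pc >= f.start && pc < f.end) return &f;
    return nullptr;
}

// Walk from (pc, sp) towards the top of the stack and return the names found.
// Stops at the first pc outside every known function, at a frameless function
// (its return address is not on the stack), or when the saved return address
// would lie outside the dump.
std::vector<std::string> walk_frames(const std::vector<FuncInfo>& funcs,
                                     const StackDump& dump,
                                     uint32_t pc, uint32_t sp) {
    std::vector<std::string> trace;
    while (const FuncInfo* f = find_func(funcs, pc)) {
        trace.push_back(f->name);
        if (f->frame_size == 0) break;
        uint32_t ra_slot = sp + f->ra_offset;
        if (!dump.contains(ra_slot)) break;
        pc = dump.at(ra_slot);  // caller's pc
        sp += f->frame_size;    // caller's sp
    }
    return trace;
}
```

If the walk ends long before the top of the stack, or lands on a pc that belongs to no known function, the stack between that point and the top cannot be trusted, which is the integrity check described above.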

So what have I worked out about the stack on these crashes?
Well, firstly, even with disable_extra4k_at_link_time() the dump is of the "sys" stack, not the "cont" stack (and yes, I did try a deliberate exception in my code to make sure the "cont" stack is dumped when appropriate).
Secondly, the stack dumps I am getting are trash :-( they are almost entirely the left over data that would normally be beyond the stack pointer, and not dumped at all - except it is dumped because the stack pointer is way off. We can't trust anything on the stack to tell us where it was.
I have also tried dumping the heap with umm_info(); so far it appears to be intact.

At this point, trying to get gdb to do anything with a stack or core dump is going to be fairly useless I think :( I need a way to step through the live code and see what is happening, or I need to get more debug output from all the code in the form of text out the serial port.

It's not going to be easy to tell exactly where the thing is when it initially goes wrong.

@TD-er
Contributor

TD-er commented Aug 8, 2019

Hmm, that's bug hunting for pros :)

I would love to see good stack decoding utils for this platform.

@ChocolateFrogsNuts
Contributor Author

@earlephilhower one of the things I found today was espressif's gdbstub which can be compiled in with your code and apparently allows gdb to step through the live code on chip. I haven't ruled out trying it out yet.....

I have at least managed to get lwip to start dumping massive amounts of debug info to Serial, but wouldn't you know it, now that I might be able to see what happens, hitting it with ARP requests from my tablet no longer crashes it. $#@%*@^#$$ computers.....

Fortunately it still crashes on my normal network... and so far there has been a gap of around 1 second between the last network traffic and the exception, and one exception happened before any network traffic was logged.....

Time to find more debugging options to enable :-/

Oh, and @TD-er don't get your hopes up - my stack decoder is a fairly rough perl script for the command line - nothing like the exception decoder tool for the IDE, although I may do a conversion/upgrade/shove the two together some day.

@earlephilhower
Collaborator

@ChocolateFrogsNuts don't use the Espressif version, it doesn't support sharing the UART and has a custom format that's incompatible with the plain-GNU toolchain we've been using.

How to use the included one is in here: https://github.com/esp8266/Arduino/blob/master/doc/gdb.rst

It has full GDB support, including single stepping/breakpoint/Ctrl-C interrupt/etc. on the live system. We have no ELF or source for the blob or ROMs, though, so don't expect to be able to get anything other than assembly inside non-open source code.

@TD-er
Contributor

TD-er commented Aug 8, 2019

> Oh, and @TD-er don't get your hopes up - my stack decoder is a fairly rough perl script for the command line - nothing like the exception decoder tool for the IDE, although I may do a conversion/upgrade/shove the two together some day.

Well that's the nice thing with open source.
No matter how ugly the hack is to make something work, there is always someone out there to polish it, if it is based on a good idea.

@devyte
Collaborator

devyte commented Aug 8, 2019

@ChocolateFrogsNuts please continue pursuit, you may be on to a stability issue. Also, in case you haven't realized it already, @earlephilhower has done a lot of work related to what you're looking at, so please discuss with him.

@ChocolateFrogsNuts
Contributor Author

ChocolateFrogsNuts commented Aug 9, 2019

thanks @earlephilhower I'll keep it in mind if the current line of attack fails...

Currently I'm doing my own build of lwip2. I've got debugging output from the glue layer, and it tells me plenty, but the glue layer isn't doing anything for about a second before the crash. There does seem to be one packet repeated several times over the previous 10 seconds, so I will find out what that is.

I'm working on getting more output by enabling more debugging from lwip2 itself to see if anything is happening in there.
At least it's still crashing consistently on my network.... while there's crash, there's hope :-)

EDIT: got the lwip2 debugging on... I turned everything on... This will take some time to analyse :)

@ChocolateFrogsNuts ChocolateFrogsNuts changed the title SDK appears to randomly allocate >3K stack [solved, TX pwr too high] SDK appears to randomly allocate >3K stack [solved, high TX pwr + cheap/slow flash chip = random crash] Sep 24, 2019
@d-a-v
Collaborator

d-a-v commented Sep 24, 2019

I wonder why d1 (mini, mini pro) defaults to DIO and not QIO, maybe @wemos can tell ?

@ChocolateFrogsNuts
Contributor Author

> I wonder why d1 (mini, mini pro) defaults to DIO and not QIO, maybe @wemos can tell ?

Not sure it's wemos that selected that default.. it's set in boards.txt (from tools/boards.txt.py)

@d-a-v
Collaborator

d-a-v commented Sep 24, 2019

@wemos is indeed the one user who defined these boards
(https://github.com/esp8266/Arduino/pulls?q=author%3Awemos)

@earlephilhower
Collaborator

https://wiki.wemos.cc/_media/products:d1:sch_d1_mini_v3.0.0.pdf

There's only DO and DI on the schematic, so maybe legit boards' flash chips don't have the add'l D2/D3 pins?

@d-a-v
Collaborator

d-a-v commented Sep 24, 2019

It seems sdio_data_2/3 are gpio9/10 and are connected

@earlephilhower
Collaborator

Yes, that's the actual IC pinout. But the flash chip needs them connected as well, and it has to support QIO. I imagine the flash chips Wemos selected did not have a QIO mode, or they couldn't route it properly on the 2-layer PCB they had.

@ChocolateFrogsNuts obviously has a really weird fake with different chips and evidently a different schematic. Seems they just stole the silkscreen layer from WeMos. :(

@TD-er
Contributor

TD-er commented Sep 24, 2019

I don't think any so-called "Wemos" board you can buy now is actually made by Wemos/Lolin.
They stopped making them over a year ago, so the "original" Wemos boards are long since sold out.
Also, the quality of these clones seems to be getting worse every week.
The voltage regulators these boards now ship with can only handle 150 mA max, and even when they use parts rated for more, those parts cannot shed the heat in a SOT23 package.

I'm no longer using the "Wemos D1 mini" form factor in my projects. They have become unreliable since you are not sure what you'll get when ordering them.

@ChocolateFrogsNuts
Contributor Author

That reads to me like WP and HP on the flash double as D2 and D3 for QIO.
I will test QIO with the genuine boards (winbond flash) when I get a chance.

@ChocolateFrogsNuts
Contributor Author

QIO works for both types of D1 board that I have. The other question is whether the early versions had D2/D3 connected at all - that schematic is marked as rev 3.0.0.
The v2.x.x D1 mini boards used ESP-12S modules, whereas v3.x.x D1 mini used the ESP8266 directly. The D1 mini PRO boards all used the ESP8266 directly too.
From what I can find, the ESP-12S probably didn't have the connections for QIO, only DIO, hence the default.

@JAndrassy
Contributor

JAndrassy commented Sep 25, 2019

Two slightly off-topic comments :-)

What is faster? 40 MHz DIO or 26 MHz QIO?

With all these variations of just one of the many esp8266 boards, it is really hard to troubleshoot on your own bench, here on GitHub, or to help on forums or Stack Exchange.

@ChocolateFrogsNuts
Contributor Author

Theoretically (without benchmarking) DIO is 2-bit and QIO is 4-bit, so (26/40)*(4/2) = 1.3 - 26MHz QIO is 1.3 times the speed of 40MHz DIO... adjust for overheads.
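
The estimate above can be sketched in a few lines. This is only a back-of-envelope model: it assumes raw bandwidth is simply clock times the number of data lines and ignores command, address and dummy cycles. The function name and the lookup table are made up for illustration.

```python
def raw_bus_bandwidth(clock_mhz, data_lines):
    """Idealised peak rate in Mbit/s: one bit per data line per clock cycle."""
    return clock_mhz * data_lines

# Data lines used during the data phase in each flash mode.
DATA_LINES = {"DOUT": 2, "DIO": 2, "QIO": 4}

qio_26 = raw_bus_bandwidth(26, DATA_LINES["QIO"])  # 104 Mbit/s
dio_40 = raw_bus_bandwidth(40, DATA_LINES["DIO"])  # 80 Mbit/s
ratio = qio_26 / dio_40

print(f"26MHz QIO / 40MHz DIO = {ratio:.2f}")
```

Real transfers carry fixed per-command overhead, so measured ratios will be smaller than this ideal figure.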

In practice, I just ran the following tests on the genuine (Winbond chip) D1 mini Pro,
using 100KB of /dev/urandom data and copying it 10 times per test with a 1K block size:

26MHz DIO   37559ms  rate=27.263772 bytes/ms (==K/s)
26MHz QIO   33187ms  rate=30.855455 bytes/ms (==K/s)
40MHz DIO   31451ms  rate=32.558583 bytes/ms (==K/s)
40MHz QIO   29813ms  rate=34.347432 bytes/ms (==K/s)
80MHz QIO   26317ms  rate=38.910210 bytes/ms (==K/s)

Conclusion: surprisingly little difference, with only about a 26% improvement from 26MHz QIO to 80MHz QIO despite 3x the clock.
It's also a surprise that 40MHz DIO is faster than 26MHz QIO, but I don't think I'll be stressing about the 5% difference.
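
For anyone who wants to recompute the rate column, here is a minimal sketch that puts the measured gain next to the clock ratio. It assumes the total transferred is 100 * 1024 bytes copied 10 times, which is the value that matches the rates listed; the helper name is illustrative.

```python
TOTAL_BYTES = 100 * 1024 * 10  # assumption: 100 KiB file, copied 10 times

# Elapsed times in ms from the results above (80MHz CPU).
elapsed_ms = {
    "26MHz DIO": 37559,
    "26MHz QIO": 33187,
    "40MHz DIO": 31451,
    "40MHz QIO": 29813,
    "80MHz QIO": 26317,
}

# Rate in bytes/ms, the same unit as the table.
rates = {name: TOTAL_BYTES / ms for name, ms in elapsed_ms.items()}

def gain_percent(slow, fast):
    """Throughput gain of `fast` over `slow`, in percent."""
    return (rates[fast] / rates[slow] - 1.0) * 100.0

for name, rate in rates.items():
    print(f"{name}: {rate:.6f} bytes/ms")

clock_ratio = 80 / 26
print(f"clock ratio 80/26 = {clock_ratio:.2f}x")
print(f"26MHz QIO -> 80MHz QIO: +{gain_percent('26MHz QIO', '80MHz QIO'):.1f}%")
print(f"26MHz QIO -> 40MHz DIO: +{gain_percent('26MHz QIO', '40MHz DIO'):.1f}%")
```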

Test code pushed to DebugTools/SPIFFS.ino at https://github.com/ChocolateFrogsNuts/ESP-DebugTools

@ChocolateFrogsNuts
Contributor Author

for completeness...

80MHz DIO   28840ms   rate=35.506241 bytes/ms
20MHz DOUT  49556ms   rate=20.663492 bytes/ms

and with a 160MHz CPU clock

26MHz QIO  28724ms   rate=35.649631 bytes/ms
40MHz DIO  27425ms   rate=37.338195 bytes/ms
80MHz QIO  21416ms   rate=47.814718 bytes/ms

so unless you're running 160MHz CPU with 80MHz QIO flash, don't stress about the flash mode and clock too much :-)

@ChocolateFrogsNuts
Contributor Author

Aaaannd after realising the main limiting factor was the write speed of the flash which has nothing to do with read speed, here are the read-only results....

CPU, Flash, Time, Rate (bytes/ms)
 80, 20 DOUT, 1255ms, 815.936255
 80, 26 DIO, 1066ms, 960.600375
 80, 40 DIO, 919ms, 1114.254625
 80, 26 QIO, 893ms, 1146.696529
 80, 40 QIO, 804ms, 1273.631841
 80, 80 DIO, 787ms, 1301.143583
 80, 80 QIO, 730ms, 1402.739726
160, 40 DIO, 635ms, 1612.598425
160, 26 QIO, 600ms, 1706.666667
160, 80 QIO, 445ms, 2301.123596

Interesting that 26MHz QIO actually is faster than 40MHz DIO for reads, but again 3% isn't anything to be excited/worried about.
Much the same conclusions though - unless you're running a 160MHz CPU with 80MHz QIO flash, don't stress about the flash mode and speed too much.

@ChocolateFrogsNuts
Contributor Author

For the people about to say "you just benchmarked SPIFFS with all its overheads"....
Results for ESP.flashRead direct call, same amount of data.

CPU, Flash, Time, Rate (bytes/ms)
 80, 20 DOUT, 376ms, 2832.340426
 80, 26 DIO, 285ms, 3736.701754
 80, 40 DIO, 214ms, 4976.448598
 80, 26 QIO, 200ms, 5324.800000
160, 40 DIO, 184ms, 5787.826087
160, 26 QIO, 165ms, 6454.303030
 80, 40 QIO, 157ms, 6783.184713
 80, 80 DIO, 150ms, 7099.733333
 80, 80 QIO, 122ms, 8729.180328
160, 80 QIO, 92ms, 11575.652174

Looking more like we might expect, but only increases the difference between 40MHz DIO and 26MHz QIO up to 6% on 80MHz CPU, or 11% on 160MHz CPU.
Still need to run 80MHz flash to get any real jump in performance.
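
A side note on the percentages: the figure you get depends on whether you compare throughput or elapsed time. A small sketch using only the times from the table above shows both views (the function name is illustrative):

```python
# flashRead times in ms from the table above, keyed by (CPU MHz, flash setting).
time_ms = {
    (80, "40 DIO"): 214,
    (80, "26 QIO"): 200,
    (160, "40 DIO"): 184,
    (160, "26 QIO"): 165,
}

def compare(cpu_mhz):
    """Return (rate gain %, time saved %) of 26MHz QIO over 40MHz DIO."""
    dio = time_ms[(cpu_mhz, "40 DIO")]
    qio = time_ms[(cpu_mhz, "26 QIO")]
    rate_gain = (dio / qio - 1.0) * 100.0   # same data in less time = higher rate
    time_saved = (1.0 - qio / dio) * 100.0  # reduction in elapsed time
    return rate_gain, time_saved

for cpu in (80, 160):
    rate_gain, time_saved = compare(cpu)
    print(f"{cpu}MHz CPU: +{rate_gain:.1f}% rate, -{time_saved:.1f}% time")
```

Either way the gap stays small next to what 80MHz flash gives.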

Test code in DebugTools/flash.ino in the above repository.

@ChocolateFrogsNuts
Contributor Author

Hmm, XMC chips default to 75% drive on their outputs, but can be switched up to 100% by setting SR3:5,6 to 1,1. It's likely this will get them operating at full speed... I just need to find the correct way to send the SPI commands needed to the chip.

@ChocolateFrogsNuts
Contributor Author

ChocolateFrogsNuts commented Sep 27, 2019

Well that was easier than I thought....
Currently testing an XMC flash that was unstable at 40MHz, but is now stable at 80MHz with 100% drive on the outputs.

To identify an XMC chip, run 'esptool.py flash_id'
and look for 'Manufacturer: 20'
Then run 'esptool.py read_flash_status --bytes 3'
and you should see 'Status value: 0x400000'
which means the chip is configured for the default 75% drive on outputs.

To enable 100% drive until power is removed:
esptool.py write_flash_status --bytes 3 0x600000

To enable 100% drive permanently:
esptool.py write_flash_status --non-volatile --bytes 3 0x600000

And there you have it, your XMC flash chip should be just as good as the Winbond ones.
As a bonus, you have the option to leave it at 75% and run the lower clock speed if you want to save on power.
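
To interpret the status values above, here is a minimal sketch. It assumes the 3-byte value printed by esptool packs SR1 in the low byte, SR2 in the middle byte and SR3 in the high byte, and that the drive strength sits in SR3 bits 5 and 6 as described earlier in this thread. Only the two settings mentioned here are decoded; anything else is reported as unknown rather than guessed. All names are illustrative.

```python
SR3_DRIVE_MASK = 0x60  # SR3 bits 5 and 6 (assumed drive-strength field)

# Only the two settings reported in this thread are decoded.
KNOWN_DRIVE = {0x40: "75%", 0x60: "100%"}

def split_status(value):
    """Split a 3-byte status value into (SR1, SR2, SR3)."""
    sr1 = value & 0xFF
    sr2 = (value >> 8) & 0xFF
    sr3 = (value >> 16) & 0xFF
    return sr1, sr2, sr3

def drive_strength(value):
    """Describe the output drive setting encoded in SR3."""
    _, _, sr3 = split_status(value)
    return KNOWN_DRIVE.get(sr3 & SR3_DRIVE_MASK, "unknown")

def with_full_drive(value):
    """Return the status value with both drive bits set, other bits untouched."""
    return value | (SR3_DRIVE_MASK << 16)

for status in (0x400000, 0x600000):
    print(f"0x{status:06x}: drive {drive_strength(status)}")
print(f"value to write: 0x{with_full_drive(0x400000):06x}")
```

OR-ing the bits in, rather than writing a fixed constant, would preserve any other bits that happen to be set on a given chip.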

EDIT: the XMC chips are rated for 104MHz :-)

@d-a-v
Collaborator

d-a-v commented Sep 27, 2019

Wow thanks !
This deserves a nice python script that can be triggered from the IDE

@ChocolateFrogsNuts
Contributor Author

hehe, I was kinda thinking of making it automatically set 100% output if an XMC chip at >26MHz is found at boot. By only setting it in the volatile register, it will automatically save power or not as required. Setting it permanently would still work too.
I need to look at how esptool does it though as SPI_read/write_status in ROM only do 2 registers, not 3.
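
The boot-time decision described above could look roughly like this. It is a sketch of the logic only, in Python for brevity; real code would live in the core's C/C++ startup path. It leaves out the register write itself, since how to do that from sketch code is still open. The constant and function name are hypothetical.

```python
XMC_MANUFACTURER_ID = 0x20  # 'Manufacturer: 20' as reported by esptool.py flash_id

def wants_full_drive(manufacturer_id, flash_mhz):
    """True if the flash is an XMC part clocked faster than 26MHz."""
    return manufacturer_id == XMC_MANUFACTURER_ID and flash_mhz > 26

# Example: an XMC chip at 40MHz would get 100% drive, at 26MHz it would not.
print(wants_full_drive(0x20, 40))
print(wants_full_drive(0x20, 26))
```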

@devyte
Collaborator

devyte commented Sep 27, 2019

Guys, this is super interesting, but how about opening a new issue and discussing there? This one was already addressed, is closed, and is super long.

@ChocolateFrogsNuts a thought before you jump in: there may be other flash chips that could benefit. In @igrr's words, what you found could be described as a flash chip "quirk", similar to the puya quirk, which requires special code to get correct operation. It may make sense to structure this type of code in a way that minimizes invasiveness and allows easier additions in the future, whether to add other manufacturers to a quirk or add other quirks.

@ChocolateFrogsNuts
Contributor Author

erk! 2 problems :

  • It can't be made permanent in the chip - it seems that status register exists only as a volatile register (reset at next power on).
  • So far, I haven't been able to replicate what esptool does to set the register. I'm doing the same thing with all the registers, but I'm unable to read SR3.

It's not like the stub esptool uses is initialising much either. It does call the ROM functions SelectSpiFunction() and spi_flash_attach(), but I would think that has already been done by the time sketch code is running. Calling them again results in a crash further down the code, which I expected...

So close and yet so far :-(
