
panic() called from ISR causes Hardware WDT #6283

Closed
6 tasks done
mhightower83 opened this issue Jul 11, 2019 · 5 comments · Fixed by #6288

Comments

@mhightower83
Contributor

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in the current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-12
  • Core Version: [5a47cab]
  • Development Env: Arduino IDE
  • Operating System: Ubuntu

Settings in IDE

  • Module: Adafruit HUZZAH
  • Flash Mode: qio
  • Flash Size: 4MB
  • lwip Variant: v2 Lower Memory
  • Reset Method: nodemcu
  • Flash Frequency: 40MHz
  • CPU Frequency: 80MHz
  • Upload Using: SERIAL
  • Upload Speed: 115200 (serial upload only)

Problem Description

I have found that some actions an ISR can take can unexpectedly result in a "Hardware WDT". Calls made from an ISR to panic(), abort(), assert(), or __unhandled_exception() produce no message and no stack trace, just:

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset

Obviously, it would be best to avoid these in our ISRs. However, it might not be obvious that we are making these calls from an ISR. Take the case of a build with just "Debug: Serial" selected in the IDE.
This enables the UMM_POISON build flag. If an ISR steps on the poison (overwrites the guard bytes around an allocation) and later calls free(), a message is printed (with a non-IRAM printf() function), and then umm_malloc.cpp attempts to call panic(). When I tested this, the umm poison message printed; however, the panic() details never appeared. After a long period, the "Hardware WDT" event appeared.
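
To make "stepping on the poison" concrete, here is a minimal host-side C++ sketch of a guard-byte allocator in the spirit of UMM_POISON. It is not umm_malloc's actual layout: the names poisonAlloc, poisonFree and PoisonBlock, the 4-byte guard size and the 0xA5 pattern are all assumptions for illustration. The check in poisonFree() is the point where, in the scenario above, umm_malloc reports the corruption and then calls panic().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical guard-byte ("poison") allocator in the spirit of UMM_POISON.
// Guard size and pattern are assumptions, not umm_malloc's actual values.
constexpr std::size_t kGuard = 4;        // guard bytes on each side of the block
constexpr std::uint8_t kPoison = 0xA5;   // pattern written into the guards

struct PoisonBlock {
    std::uint8_t* raw;  // guard + user region + guard
    std::size_t size;   // user-visible size in bytes
    std::uint8_t* user() const { return raw + kGuard; }
};

PoisonBlock poisonAlloc(std::size_t size) {
    auto* raw = static_cast<std::uint8_t*>(std::malloc(size + 2 * kGuard));
    if (raw != nullptr) {
        std::memset(raw, kPoison, kGuard);                  // leading guard
        std::memset(raw + kGuard + size, kPoison, kGuard);  // trailing guard
    }
    return PoisonBlock{raw, size};
}

// Checks both guards, then frees. Returns false if either guard was
// overwritten. In the scenario above, this is where umm_malloc prints its
// message and then calls panic(), which from an ISR ends in a hardware WDT.
bool poisonFree(PoisonBlock b) {
    if (b.raw == nullptr) {
        return true;  // nothing was allocated, nothing to check
    }
    bool intact = true;
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (b.raw[i] != kPoison || b.raw[kGuard + b.size + i] != kPoison) {
            intact = false;
        }
    }
    std::free(b.raw);
    return intact;
}
```

Writing one byte past the end of an 8-byte block lands in the trailing guard, so poisonFree() returns false for it while an in-bounds write passes.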

What I think I am seeing is that calls from an ISR that reach panic(), abort(), assert(), or __unhandled_exception() eventually arrive at raise_exception() in core_postmortem.cpp, and then at the inline assembly for "syscall". After a long period of silence, I see the "Hardware WDT" event. My test method at the time was less than ideal because I was using a printf function that was not in IRAM. However, it printed everything up to the "syscall" before the WDT appeared, and nothing from __wrap_system_restart_local().

Interestingly, some "Hardware WDT" resets are not caused by code being stuck in a loop.

In Arduino Core 2.5.0 and up, a failed "new" without (std::nothrow) in an ISR calls abort(), resulting in a hardware WDT. I am concerned that there may be many more cases like this.
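
For code that must allocate on a path that might run in an ISR, the nothrow form of new turns an allocation failure into a nullptr the caller can check, instead of an exception or abort(). A minimal sketch, assuming allocation on that path cannot be avoided altogether; tryAllocBuffer and fillBuffer are hypothetical helper names, and this only removes the abort() path, not any other ISR restriction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>  // std::nothrow

// Hypothetical helper: the nothrow form of new returns nullptr on failure
// instead of throwing (or, per the issue, calling abort() on core 2.5.0+).
std::uint8_t* tryAllocBuffer(std::size_t n) {
    return new (std::nothrow) std::uint8_t[n];
}

// Hypothetical caller that reports failure rather than escalating it.
bool fillBuffer(std::size_t n, std::uint8_t value) {
    std::uint8_t* buf = tryAllocBuffer(n);
    if (buf == nullptr) {
        return false;  // caller decides how to recover; no abort()/panic() path
    }
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = value;
    }
    delete[] buf;
    return true;
}
```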

This one is unconfirmed, but it looks worth verifying:

extern "C" void __yield() {
    if (cont_can_yield(g_pcont)) {
        esp_schedule();
        esp_yield_within_cont();
    }
    else {
        panic();
    }
}
extern "C" void yield(void) __attribute__ ((weak, alias("__yield")));

bool ICACHE_RAM_ATTR cont_can_yield(cont_t* cont) {
    return !ETS_INTR_WITHINISR() &&
           cont->pc_ret != 0 && cont->pc_yield == 0;
}

It looks like a call from an ISR that reaches yield() would call panic(), and the panic message would never be seen.
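
The decision in __yield() quoted above can be modeled on a host to see which inputs lead to panic(). This is a sketch only: ContModel is a hypothetical stand-in for cont_t with just the two fields the check reads, and the withinIsr flag stands in for ETS_INTR_WITHINISR().

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of the decision in __yield()/cont_can_yield() quoted above.
// ContModel is a hypothetical stand-in for cont_t; only the two fields the
// check reads are modeled, and withinIsr stands in for ETS_INTR_WITHINISR().
struct ContModel {
    std::uintptr_t pc_ret;
    std::uintptr_t pc_yield;
};

enum class YieldOutcome { Yields, Panics };

bool contCanYieldModel(bool withinIsr, const ContModel& cont) {
    return !withinIsr && cont.pc_ret != 0 && cont.pc_yield == 0;
}

// Same branch structure as __yield(): yield if allowed, otherwise panic().
YieldOutcome yieldModel(bool withinIsr, const ContModel& cont) {
    return contCanYieldModel(withinIsr, cont) ? YieldOutcome::Yields
                                               : YieldOutcome::Panics;
}
```

Whatever the cont state, withinIsr == true always selects the panic() branch, which is the path that, per this issue, ends in a silent hardware WDT.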

I could say more, but I'll wait for your feedback in case I made a wrong turn in my assessment of this issue.

MCVE Sketch

#include <Arduino.h>
#include <ESP8266WiFi.h>
#include <interrupts.h>
#include <assert.h>

#define CC4US_DELAY(us)   ((us) * clockCyclesPerMicrosecond())

void ICACHE_RAM_ATTR timer0_handler(void) {
  timer0_detachInterrupt();
  // From an ISR context these calls will fail with a hardware WDT
  // No event related information or stack trace is printed.
  panic();
  // abort();
  // assert(1==0);
}

void setup(void) {
  Serial.begin(115200);
  delay(20);
  WiFi.mode(WIFI_OFF);
  Serial.printf("\n\nUp and running.\n");
  {
    AutoInterruptLock(15);
    timer0_isr_init();
    timer0_attachInterrupt(&timer0_handler);
    timer0_write(ESP.getCycleCount() + CC4US_DELAY(150));
  }
}

void loop() {
}
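
The timer0 deadline in the sketch is plain 32-bit cycle arithmetic. A host-side check of it, assuming clockCyclesPerMicrosecond() evaluates to F_CPU / 1000000 (80 at the 80 MHz setting above); clockCyclesPerMicrosecondModel and timer0Deadline are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: clockCyclesPerMicrosecond() evaluates to F_CPU / 1000000,
// i.e. 80 with the 80 MHz CPU frequency from the IDE settings above.
constexpr std::uint32_t kCpuHz = 80000000u;

constexpr std::uint32_t clockCyclesPerMicrosecondModel() {
    return kCpuHz / 1000000u;
}

#define CC4US_DELAY(us) ((us) * clockCyclesPerMicrosecondModel())

// Deadline handed to timer0_write(): current cycle count plus the delay.
// Unsigned 32-bit addition wraps modulo 2^32, like the CCOUNT register.
std::uint32_t timer0Deadline(std::uint32_t cycleCount, std::uint32_t us) {
    return cycleCount + CC4US_DELAY(us);
}
```

So CC4US_DELAY(150) is 12000 cycles at 80 MHz, and a cycle count close to 2^32 simply wraps to a small deadline.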

Debug Messages

Up and running.

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v5a47cab7
~ld
@d-a-v
Collaborator

d-a-v commented Jul 11, 2019

After a quick look, it seems the asm("syscall") line does nothing, and the WDT is triggered by the following line, which is supposed to be unreachable.

static void raise_exception() {
    __asm__ __volatile__ ("syscall");
    while (1); // never reached, needed to satisfy "noreturn" attribute
}

@d-a-v
Collaborator

d-a-v commented Jul 11, 2019

Replacing __asm__ __volatile__ ("syscall"); with *((int*)0)=0; works better, although an illegal-address exception is displayed before the panic message.

I have never dug into this code before.
I am now trying to understand where the exception vector invoked by syscall is set up.

(edit0: output from ::printf("x1\n") or ets_printf("x2\n") placed after syscall and before while(1) is shown)
(edit1: back in core-2.3.0, that was how the exception was raised)
(edit2: #4482 changed it to asm("syscall"))
(edit3: @igrr maybe you can tell where the exception vector called by syscall is supposed to be set up, or where it points to)
(edit4: all of the above applies only when called from an ISR)

@mhightower83
Contributor Author

FWIW, this is the logic flow I think I see at raise_exception():

For ps.intlevel = 0 in a non-ISR context:

  • syscall - has no visible effect; it is called and returns.
  • while(1) - sits and waits for the soft WDT.
  • soft WDT hits - the system passes control on to __wrap_system_restart_local().
  • postmortem prints; everyone is happy.

For ps.intlevel != 0 and from a hardware ISR context:

  • syscall - has no visible effect; it is called and returns.
  • while(1) - sits and waits for the soft WDT.
  • the hardware WDT strikes; __wrap_system_restart_local() is never called.
  • postmortem never prints; everyone is sad.

Assumption: given ps.intlevel != 0 and the hardware interrupt priority, the soft WDT
cannot be serviced. Yes, I tried forcing ps.intlevel to 0 to see.
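
The two flows above can be summarized as a tiny host-side model of the hypothesis. It is an illustration, not the SDK's actual logic: ResetOutcome, softWdtCanRun and afterRaiseException are hypothetical names, and treating either a non-zero ps.intlevel or hardware ISR priority as enough to starve the soft WDT is my reading of the assumption stated above.

```cpp
#include <cassert>

// Host-side model of the hypothesis above, not the SDK's actual logic.
enum class ResetOutcome {
    SoftWdtWithPostmortem,  // __wrap_system_restart_local() runs and prints
    SilentHardwareWdt       // soft WDT starved; reset with no output
};

// Assumption (a reading of the comment): the soft WDT can be serviced only
// when ps.intlevel == 0 and the code is not running at hardware ISR priority.
bool softWdtCanRun(int psIntLevel, bool inHardwareIsr) {
    return psIntLevel == 0 && !inHardwareIsr;
}

// raise_exception(): "syscall" returns without effect, then while(1) spins
// until one of the watchdogs fires.
ResetOutcome afterRaiseException(int psIntLevel, bool inHardwareIsr) {
    return softWdtCanRun(psIntLevel, inHardwareIsr)
               ? ResetOutcome::SoftWdtWithPostmortem
               : ResetOutcome::SilentHardwareWdt;
}
```

Under this model, a non-ISR caller holding an rsil(15) lock (psIntLevel == 15, inHardwareIsr == false) also ends in a silent hardware WDT.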

@d-a-v
Collaborator

d-a-v commented Jul 12, 2019

@mhightower83 #6172 is a first fix for this issue.
At least it fixes the missing messages for user exceptions raised from an ISR.
It also works for the important hardware-exception cases with gdb (cont/user or sys contexts).
There may be room for improvement, at least on the gdb side.

@mhightower83
Contributor Author

@d-a-v Thanks! I think this is going to help out a lot of people.
I think this also applies to the non-ISR case, where a lock (e.g. rsil(15)) is active and a user exception happens.
