[ESP8266] WDT reset issue #392

Closed
1 task
SvenLuebke opened this issue Nov 5, 2022 · 34 comments
Assignees
Labels
bug Something isn't working fixed dev fixed

Comments

@SvenLuebke

SvenLuebke commented Nov 5, 2022

Platform

ESP8266

Model name

LoLin NodeMCU V3 (AliExpr.) 4MB

nRF24L01+ Module

nRF24L01+ plus

Antenna

external antenna

Power Stabilization

nothing

Connection diagram

Connection diagram I used:

nRF24L01+ Pin    ESP8266/32 GPIO
Pin 1 GND []     GND
Pin 2 +3.3V      +3.3V
Pin 3 CE         GPIO4 CE
Pin 4 CSN        GPIO15 CS
Pin 5 SCK        GPIO14 SCLK
Pin 6 MOSI       GPIO13 MOSI
Pin 7 MISO       GPIO12 MISO
Pin 8 IRQ        GPIO5 IRQ

Connection picture

  • I will attach/upload an Image of my wiring

Version

0.5.28

Github Hash

2e08ee0

Build & Flash Method

ESP Tools (flash)

Desktop

Linux

Setup

Device Host Name

- Device Name: AHOY-DTU

WiFi

- SSID: YOUR_WIFI_SSID *don't paste here*
- Password: YOUR_WIFI_PWD *don't paste here*

Inverter

Inverter 0

- Address: 116181853696
- Name: HM-1500
- Active Power Limit: 65535
- Active Power Limit Control Type: no powerlimit
- Max Module Power (Wp): 410
- Module Name: link, rech

General

- Interval [s]: 10
- Max retries per Payload: 5

NTP Server

- NTP Server / IP: pool.ntp.org (tried also fritz.box)
- NTP Port: 123

MQTT

- Broker / Server IP: 
- Port: 1883
- Username (optional): 
- Password (optional): 
- Topic: inverter

System Config

Pinout (Wemos)

- CS: D8 (GPIO15)
- CE: D2 (GPIO4)
- IRQ: D1 (GPIO5)

Radio (NRF24L01+)

- Amplifier Power Level: LOW

Serial Console

- print inverter data: [x]
- Serial Debug: [x]
- Interval [s]: 2
  • Reboot device after successful save: [x]
  • SAVE

Debug Serial Log output

I: procPyld: cmd:  11
I: procPyld: txid: 0x95
I: Payload (62): 00 01 00 09 00 01 00 01 00 00 00 00 00 00 00 C8 00 00 00 03 00 00 00 00 01 71 00 02 00 04 00 09 00 10 00 00 00 59 00 00 2A 1D 00 03 02 E3 09 37 13 88 00 00 00 D9 00 00 00 00 00 C2 00 10 
I: resetPayload: id: 0
I: Requesting Inverter SN 116181853696
I: enqueuedCmd: 11
I: sendTimePacket
I: TX 27B Ch40 | 15 81 85 36 96 81 72 89 27 80 0B 00 63 66 53 6E 00 00 00 10 00 00 00 00 28 13 74 
I: RX 27B Ch3 | 95 81 85 36 96 81 85 36 96 01 00 01 00 09 00 01 00 01 00 00 00 00 00 00 00 C8 54 
I: RX 27B Ch3 | 95 81 85 36 96 81 85 36 96 02 00 00 00 03 00 00 00 00 01 71 00 02 00 04 00 09 EB 
I: RX 27B Ch3 | 95 81 85 36 96 81 85 36 96 84 13 8A 00 00 00 D5 00 00 00 00 00 C1 00 10 82 13 1D 
W: while retrieving data: Frame 3 missing: Request Retransmit
I: TX 11B Ch61 | 15 81 85 36 96 81 72 89 27 83 6F 
I: RX 27B Ch3 | 95 81 85 36 96 81 85 36 96 03 00 10 00 00 00 59 00 00 2A 1D 00 03 02 E3 09 20 23 
I: RX 27B Ch23 | 95 81 85 36 96 81 85 36 96 03 00 10 00 00 00 59 00 00 2A 1D 00 03 02 E3 09 20 23 

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 3460, room 16 
tail 4
chksum 0xcc
load 0x3fff20b8, len 40, room 4 
tail 4
chksum 0xc9
csum 0xc9
v0006a2b0
~ld
I: resetPayload: id: 0
I: resetPayload: id: 0
I: resetPayload: id: 0
I: resetPayload: id: 0
I: connect to network 'MYWIFINETWORK' ...
...........
I: 

----------------------------------------
I: Welcome to AHOY!
I: 
point your browser to http://192.168.1.56
I: to configure your device
I: ----------------------------------------

I: RF24 Amp Pwr: RF24_PA_I: LOW
I: Radio Config:
SPI Frequency= 1 Mhz
Channel= 3 (~ 2403 MHz)
Model= nRF24L01+
RF Data Rate= 250 KBPS
RF Power Amplifier= PA_LOW
RF Low Noise Amplifier= Enabled
CRC Length= 16 bits
Address Length= 5 bytes
Static Payload Length= 32 bytes
Auto Retry Delay= 250 microseconds
Auto Retry Attempts= 0 maximum
Packets lost on
    current channel= 0
Retry attempts made for
    last transmission= 5
Multicast= Disabled
Custom ACK Payload= Disabled
Dynamic Payloads= Enabled
Auto Acknowledgment= Disabled
Primary Mode= RX
TX address= 0xdeadbeef01
pipe 0 (closed) bound= 0xdeadbeef01
pipe 1 ( open ) bound= 0x2789728101
pipe 2 (closed) bound= 0xc3
pipe 3 (closed) bound= 0xc4
pipe 4 (closed) bound= 0xc5
pipe 5 (closed) bound= 0xc6
I: [NTP]: 2022-11-05 12:13:45 UTC
I: enqueued cmd failed/timeout
I: Inverter #0 I: no Payload received! (retransmits: 0)
I: resetPayload: id: 0
I: Requesting Inverter SN 116181853696
I: enqueuedCmd: 11
I: enqueuedCmd: 1
I: enqueuedCmd: 5
I: sendTimePacket
I: TX 27B Ch40 | 15 81 85 36 96 81 72 89 27 80 0B 00 63 66 53 79 00 00 00 00 00 00 00 00 1B 39 6A
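
The `TX`/`RX` lines in this log can be checked mechanically. The sketch below parses them and reports which frame numbers are absent from a reply. It assumes, based only on the frames shown here, that byte 9 carries the frame number and that the high bit (`0x80`) marks the last frame. `parse_radio_line` and `missing_frames` are hypothetical helper names, not part of AHOY.

```python
import re

# Matches AHOY serial log lines such as "I: RX 27B Ch3 | 95 81 85 ..."
LINE_RE = re.compile(r"^I: (TX|RX) (\d+)B Ch(\d+) \| ((?:[0-9A-Fa-f]{2}\s?)+)$")


def parse_radio_line(line):
    """Return (direction, channel, payload) or None for non-radio lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    direction, length, channel, hexdump = match.groups()
    payload = bytes.fromhex(hexdump)
    # The log prints the byte count; use it as a sanity check on the dump.
    if len(payload) != int(length):
        raise ValueError(f"expected {length} bytes, got {len(payload)}")
    return direction, int(channel), payload


def missing_frames(rx_payloads):
    """List frame numbers not yet received, or None if the last frame is unknown.

    ASSUMPTION: byte 9 is the frame number and bit 0x80 flags the final frame.
    This is inferred from the log in this issue, not from a specification.
    """
    seen = set()
    total = None
    for payload in rx_payloads:
        frame_id = payload[9]
        if frame_id & 0x80:
            total = frame_id & 0x7F
            seen.add(total)
        else:
            seen.add(frame_id)
    if total is None:
        return None
    return sorted(set(range(1, total + 1)) - seen)
```

Fed with the three `RX` lines received before the warning, `missing_frames` returns `[3]`, which matches the `Frame 3 missing: Request Retransmit` message in the log.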

Error description

About once an hour, my ESP8266 running AhoyDTU resets itself (see the log above). Could this be a hardware issue? Is anyone else experiencing this? Apart from that, the software runs quite nicely and I am very satisfied!

Thanks!

@SvenLuebke SvenLuebke added the bug Something isn't working label Nov 5, 2022
@lumapu
Owner

lumapu commented Nov 5, 2022

Do you have a stack trace for the WDT reset? I can't see any cause in the attached log.
Have you tried using a different pin for the IRQ?

As far as I can see in your settings, both intervals are shorter than the defaults. Can you increase them to check whether they are causing the WDT reset?

@SvenLuebke
Author

Unfortunately I have no stack trace! I remember seeing one in another project. Shouldn't it be printed by default? Could it be the wrong baud rate?

I tried some things to get rid of this issue. I just now changed the IRQ pin again...let's see. The change of the intervals was something I did to check whether a buffer overrun appeared. I reverted the values back to original.

I saw yesterday, that the system survives more than 9h in night time mode, without any SPI and reduced serial traffic, so hardware looks OK.

@lumapu
Owner

lumapu commented Nov 6, 2022

If the log in your initial post is complete, then there is no software issue.
The baud rate was OK; all the logs are printed at the same baud rate.
Hopefully the IRQ pin change helps.

@SvenLuebke
Author

Of course it's not the complete log, which would be ~800 KB of data, but it is complete for the period around the reset (before and after). I hope that is fine. I remember that I had selected "Printable output" for session logging in PuTTY. I have now changed that to "All session output" to see whether any other output (at a different baud rate) is generated.

Regarding the baud rate: I remember that the ESP8266 switches to 74880 baud directly after a reset. But I guess the missing stack output is not a core function but some kind of user software function.
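
The boot ROM banner in the initial log (`rst cause:4, boot mode:(3,7)`) already carries some information even without a stack trace. Below is a minimal sketch for decoding it. The cause table covers only the values listed in Espressif's reset-cause documentation as far as recalled here (1, 2 and 4); anything else is reported as unknown. `decode_boot_banner` is a hypothetical helper name.

```python
import re

# Reset causes printed by the ESP8266 boot ROM.
# ASSUMPTION: only the codes known with confidence are listed; others map to "unknown".
RST_CAUSES = {
    1: "power-on reboot",
    2: "external reset or wake-up from deep sleep",
    4: "hardware watchdog reset",
}

BANNER_RE = re.compile(r"rst cause:\s*(\d+),\s*boot mode:\s*\((\d+),\s*(\d+)\)")


def decode_boot_banner(line):
    """Extract the reset cause and boot mode from a boot ROM banner line."""
    match = BANNER_RE.search(line)
    if match is None:
        return None
    cause = int(match.group(1))
    return {
        "cause": cause,
        "meaning": RST_CAUSES.get(cause, "unknown"),
        "boot_mode": (int(match.group(2)), int(match.group(3))),
    }
```

For the banner in the initial post, cause 4 decodes to a hardware watchdog reset, which is consistent with the `wdt reset` line that follows it.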

The issue just happened again, so change of IRQ pin didn't help.

Just for fun I will try flashing my own build, but I don't think this will change anything...

@lumapu
Owner

lumapu commented Nov 6, 2022

Are you using MQTT? As far as I can see in your settings, MQTT does not seem to be configured.
I ran into an issue during testing yesterday and fixed it in the latest development build.
Can you try installing that version on your ESP?

Commit 64fb587 from the build Action will be firmware version 0.5.32.

@stefan123t
Collaborator

@lumapu we had several (at least two or three on the Discord) reports from users without MQTT who repeatedly experienced such WDT issues in versions prior to 0.5.32. So if you fixed anything in this regard, I would say this is a strong case for @SvenLuebke and others to retry with the latest development build / release. Thanks!

@lumapu
Owner

lumapu commented Nov 7, 2022

@stefan123t yes, during development I saw an issue regarding MQTT. It happened directly at boot and ended in a boot loop.
Maybe version 0.5.32 and later will help others get a more stable system.

@SvenLuebke
Author

Unfortunately this proposed version didn't help. I also activated MQTT, which also didn't help. The WDT resets were still happening. After flashing the 0.5.32 I tried
https://github.com/lumapu/ahoy/commits/4093be7
which seems to be stable now: Uptime: 4 Days, 12:16:05

@lumapu
Owner

lumapu commented Nov 20, 2022

So could this be closed now? Can you verify the release version?

@SvenLuebke
Author

SvenLuebke commented Nov 22, 2022

Let's wait another day. I had flashed 4c52e9c before, which seems to be stable, but I wasn't able to update the system to dec333f via web update. It just said "failed" and 0.5.40 started again (although no reboot happened according to the uptime).

I flashed it via USB serial and it seems to be working for now (Uptime: 0 Days, 04:47:02). Previously, the WDT rebooted the system only when there was NRF24L01 traffic.

@SvenLuebke
Author

dec333f restarted yesterday at ~10 PM (when there was no traffic) and again a few seconds ago. 4c52e9c was more stable for some reason. But are there really that many differences between them? I guess not, right?

@roku133
Contributor

roku133 commented Nov 24, 2022

I cannot confirm stability issues using dec333f.
My ESP8266-based DTU (with CE and IRQ swapped, however) has been stable for more than four days now.
👍
Perhaps it makes sense to change the power supply. Are you using a capacitor to stabilize the 3.3 V supply?

@stefan123t
Collaborator

@SvenLuebke do you have the option to change the power source and/or the Micro USB cable?
It has been reported that the power supply is a major cause of WDT resets on ESPs in general.

Here is a blog post from a Makerlab in Hannover about tracing the ESP power supply using an oscilloscope with revealing results:
https://arduino-hannover.de/2018/07/25/die-tuecken-der-esp32-stromversorgung/

@roku133
Contributor

roku133 commented Nov 24, 2022

@SvenLuebke A power bank providing a USB 5 V output may also be helpful to check power adapter issues.

@stefan123t stefan123t added the stale closed to no response / progress label Jan 12, 2023
@stefan123t
Collaborator

@SvenLuebke can you give an update on stability with the latest development or release version?

@SvenLuebke
Author

SvenLuebke commented Jan 27, 2023

Hi!

@stefan123t I exchanged

  • the ESP8266 mainboard (keeping the type of board the same)
  • power source
  • cable

and soldered a 2200µF capacitor to the 3.3V power pins. The device still reboots as soon as there is SPI traffic. Really strange! This happens with all the versions I tested, up to 0.5.76.

@stefan123t
Collaborator

Why do you use a 2200µF capacitor? We encourage the use of a 10µF to 100µF cap for smoothing voltage ripple and sustaining the 3.3V at the NRF module.
Yours is at least 22 times as large. Could this be the reason too?

@stefan123t stefan123t removed the stale closed to no response / progress label Jan 27, 2023
@stefan123t stefan123t reopened this Jan 27, 2023
@SvenLuebke
Author

To be honest this capacitor was available in my box. Do you think a 2200µF cap will smooth the voltage worse than a 100µF one? It might be a little bit slower. I thought it's for stabilizing the 3.3 V power of the ESP8266. I'll try to find a 10µF one...and will also attach a ceramic cap.
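
As a rough way to reason about this question, the droop of an ideal capacitor that has to supply a short load burst on its own is ΔV = I · Δt / C. The sketch below evaluates that formula for the three capacitor values mentioned in this thread. The burst current (0.1 A) and duration (1 ms) are illustrative assumptions, not measurements from this setup, and the model ignores ESR, the regulator's contribution and inrush current at power-up. `droop_volts` is a hypothetical helper name.

```python
def droop_volts(load_current_a, burst_s, capacitance_f):
    """Voltage drop of an ideal capacitor supplying a constant current burst alone."""
    return load_current_a * burst_s / capacitance_f


if __name__ == "__main__":
    # ASSUMPTION: 0.1 A for 1 ms is an illustrative burst, not a measured value.
    burst_current_a = 0.1
    burst_duration_s = 1e-3
    for label, farads in (("10 uF", 10e-6), ("100 uF", 100e-6), ("2200 uF", 2200e-6)):
        dv = droop_volts(burst_current_a, burst_duration_s, farads)
        print(f"{label:>8}: {dv:.3f} V droop")
```

Under these assumptions the droop shrinks in proportion to the capacitance, so in this simplified model the larger capacitor does not smooth worse. The model says nothing about start-up behaviour, where a large capacitor draws more charging current.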

@SvenLuebke
Author

I just saw a new reboot_reason (copied the rest for some system information):

sdk: 2.2.2-dev(38a443e)
cpu_freq: 80
heap_free: 16720
sketch_used: 486
version: 0.5.66
wifi_rssi: -53
ts_uptime: 31
esp_type: ESP8266
core_version: 3.0.2
flash_size: 4096
heap_frag: 14
max_free_blk: 7080
reboot_reason: Software/System restart
Radio: nrf24l01+ is connected
Datarate: 250 kbps
Power Level: MIN

I didn't trigger the "Software/System restart". What is the reason for that? I tried to open the "live" website and then it restarted.

@Argafal
Contributor

Argafal commented Feb 26, 2023

Software/system restart could be an indication of a NullPointerException or OOM. I had also seen both with 0.5.66 and documented them in other bug reports. I believe most of the issues I documented are fixed in 0.5.92. Have you tried it already?

Having said all that, without a stack trace I think it's just guessing. What does your serial output look like when the reboot happens?

@SvenLuebke
Author

SvenLuebke commented Mar 2, 2023

Hey @Argafal
Thanks for your message! I tried different versions after 0.5.66, and they behaved even more strangely:
After some uptime, nearly all pages could no longer be displayed. The menu bar on the left contained only one entry (I don't remember which one) and the rest had vanished. A page refresh often took more than 10 s. I tried a few things and then flashed back to 0.5.66, which restarts ~3 times a day but doesn't show these vanishing pages.

I just installed 0.5.92. It looks much better, but I have to wait for the sun.

BTW: I noticed that the WiFi connection between the ESP8266 and my router is not stable (my laptop has the same problem). There are more than 20 reconnects a day. Could that lead to the reset behaviour I reported?

@Argafal
Contributor

Argafal commented Mar 3, 2023

The first thing you describe about the webUI sounds like issue #660. Is that what it looks like? This should be much better again in 0.5.92/93.

I would hope that an unstable wifi connection would not cause random reboots of ahoy. I don't think it does. But without a stack trace it is pure guess work. So I think you need to find a way to record a stack trace if you want to look into this further. For that I would connect the esp via USB to a computer, that might be the easiest way.
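
Once a serial session has been captured to a file, the resets can be located without reading the whole log. The following is a minimal sketch that finds every boot ROM banner (`rst cause:`) and reports whether a crash dump precedes it. The markers `Soft WDT reset`, `Exception (` and `>>>stack>>>` are the ones the ESP8266 Arduino core prints before a dump. `find_resets` and the default window of 40 lines are hypothetical choices for illustration.

```python
CRASH_MARKERS = ("Soft WDT reset", "Exception (", ">>>stack>>>")


def find_resets(lines, context=40):
    """Locate boot banners and note whether a crash dump appears just before them."""
    events = []
    for index, line in enumerate(lines):
        if "rst cause:" not in line:
            continue
        before = lines[max(0, index - context):index]
        has_dump = any(marker in prev for prev in before for marker in CRASH_MARKERS)
        events.append({
            "line": index,              # zero-based index of the banner line
            "banner": line.strip(),
            "has_crash_dump": has_dump,
            "before": before,           # context to inspect by hand
        })
    return events


if __name__ == "__main__":
    import sys
    # Usage: python find_resets.py captured_session.log
    with open(sys.argv[1], errors="replace") as handle:
        for event in find_resets(handle.read().splitlines()):
            print(event["line"], event["banner"], "dump" if event["has_crash_dump"] else "no dump")
```

A banner with no dump in front of it, as in the log from the initial post, suggests that nothing was printed before the reset, so the captured context lines are the only remaining clue.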

@SvenLuebke
Author

SvenLuebke commented Mar 3, 2023

Yes, that was exactly my issue! I didn't want to create another issue for this, because I thought, I'm the only one having this issue. Nice, thank you!

I now think these reconnect messages were just a consequence of the hourly resets, because with 0.5.92 the disconnects have vanished. Yes, it really looks promising: Uptime: 0 Days, 14:37:37

@SvenLuebke
Author

Yippee! Uptime: 1 Day, 16:31:07...never saw "1 Day" before...if it reaches 4 I guess we can close the issue.

@lumapu
Owner

lumapu commented Mar 8, 2023

Did it reach 4?

@SvenLuebke
Author

SvenLuebke commented Mar 9, 2023

Yes, it reached 4 days and ~18 hours, then it reset again, but that's long enough for me. After 3 days I saw behaviour similar to #660 again. I had to press refresh two or three times and then it worked again.

Shall I close the ticket?

@lumapu
Owner

lumapu commented Mar 9, 2023

Cool, it seems that we fixed something. I will close this issue with the next release.

@lumapu lumapu added the fixed dev fixed label Mar 9, 2023
@lumapu lumapu assigned lumapu and unassigned stefan123t Mar 9, 2023
@SvenLuebke
Author

But I'm still thinking about why I was - more or less - the only one with this issue:
Are some versions of the ESP8266 less stable? Are some RAM cells (in some memory area...for example at the end) not stable or dead? Are some PCBs less stable? Currently I don't have an explanation for that.

@Argafal
Contributor

Argafal commented Mar 12, 2023

I don't think you are the only one. I have opened a few issues reporting reboots and/or exceptions running ahoy on ESP8266. As to why that doesn't happen to everyone on an ESP8266, I don't know.

My current status: With the current dev 0.5.98, ahoy runs stable for me as long as I don't use the WebUI. If I use the WebUI it occasionally reboots.

@lumapu
Owner

lumapu commented Mar 13, 2023

Do you have a capacitor in your circuit? I had a very unstable ESP8266 which became stable the moment I placed a capacitor next to its 3.3V pin.

@SvenLuebke
Author

> My current status: With the current dev 0.5.98, ahoy runs stable for me as long as I don't use the WebUI. If I use the WebUI it occasionally reboots.

Same here...it didn't survive a day.

> Do you have a capacitor in your circuit?

I have two of them connected to 3.3V, one small and one big one. But that didn't change anything regarding the reset behaviour.

@SvenLuebke
Author

I tried 0.5.104 yesterday and it didn't survive 20 minutes. So for now, 0.5.96 is the most recent stable version for me.

@lumapu
Owner

lumapu commented Mar 24, 2023

@SvenLuebke
Can you please briefly list your setup (number of inverters, ESP type, capacitor, whether the web interface is used or not, heap fragmentation)?
Are there any indications as to why the ESP gives up the ghost?

@Argafal
Contributor

Argafal commented Mar 24, 2023

@SvenLuebke And can you please also mention whether you use MQTT or not, at which interval the inverters are polled (see settings), and at which interval MQTT messages are sent (see settings)? Thanks.

@lumapu lumapu closed this as completed Mar 27, 2023

5 participants