WDT reset in ws2812fx.service() #151

Closed
sprior opened this issue Jan 2, 2019 · 39 comments

@sprior

sprior commented Jan 2, 2019

mqtt_neopixel.zip

The attached sketch intermittently experiences wdt resets during the ws2812fx.service() call, as indicated by Serial.println calls just before and after it. It'll run for an hour or more and then experience the issue.

This sketch runs ArduinoOTA, a web server, MQTT publish/subscribe, and ws2812fx. When it boots it sends an MQTT message to request initialization. Here are the messages it receives (one message per line).

[{"args":{"b":50},"f":"setBrightness"}]
[{"args":{"color":16776960,"mode":11,"n":0,"reverse":"F","speed":1000,"start":0,"stop":0},"f":"setSegment"}]
[{"args":{"color":255,"mode":42,"n":1,"reverse":"F","speed":1000,"start":1,"stop":8},"f":"setSegment"}]
[{"args":{"color":65535,"mode":42,"n":2,"reverse":"T","speed":1500,"start":9,"stop":20},"f":"setSegment"}]
[{"args":{"color":65280,"mode":42,"n":3,"reverse":"F","speed":2000,"start":21,"stop":36},"f":"setSegment"}]
[{"args":{"color":16776960,"mode":42,"n":4,"reverse":"T","speed":4000,"start":37,"stop":60},"f":"setSegment"}]
[{"args":{"color":16711680,"mode":42,"n":5,"reverse":"F","speed":6000,"start":61,"stop":92},"f":"setSegment"}]
[{"f":"start"}]
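The color values in these messages are easier to read once unpacked. Here is a minimal sketch, assuming the sketch passes the "color" field straight through as a packed 0xRRGGBB value (the WS2812FX convention) and that "start" and "stop" are inclusive LED indices. The helper names are hypothetical.

```cpp
#include <cstdint>

// One color split into its three 8-bit channels.
struct Rgb {
  std::uint8_t r, g, b;
};

// Unpack a 0xRRGGBB integer such as the "color" field in the messages above.
constexpr Rgb unpackColor(std::uint32_t packed) {
  return Rgb{static_cast<std::uint8_t>((packed >> 16) & 0xFF),
             static_cast<std::uint8_t>((packed >> 8) & 0xFF),
             static_cast<std::uint8_t>(packed & 0xFF)};
}

// Number of LEDs in a segment; "start" and "stop" are both inclusive.
constexpr unsigned segmentLength(unsigned start, unsigned stop) {
  return stop - start + 1;
}

// 16776960 == 0xFFFF00: full red + full green, no blue (yellow).
static_assert(unpackColor(16776960).r == 255 &&
              unpackColor(16776960).g == 255 &&
              unpackColor(16776960).b == 0, "segment 0 is yellow");
// The outermost ring spans indices 61..92.
static_assert(segmentLength(61, 92) == 32, "segment 5 has 32 LEDs");
```

Decoded this way, the six segments are yellow, blue, cyan, green, yellow, and red, and together they cover 93 LEDs (indices 0 to 92).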

And these settings run for quite a while before it fails.

So it is an extended amount of time between the receipt of the last MQTT message and when the WDT reset happens.

The ESP8266 module and the Neopixel rings are powered by an external 5A regulated power supply and not via the USB connection, so I don't think power is the issue (but not 100% sure).

@sprior
Author

sprior commented Jan 2, 2019

I discussed this with a friend and he suggested that my current main loop has no calls to sleep() and I may be overheating the CPU as a result. If true that would certainly explain why this is an intermittent issue. I'll experiment with adding a minimal sleep() at the end of the loop and see if this makes a difference.

@moose4lord
Collaborator

I loaded your sketch into my NodeMCU test setup and tweaked it a bit to use my MQTT broker at CloudMQTT. I see the same intermittent WDT resets as you :(. So I don't think there's anything wrong with your hardware.

The serial port doesn't show a stack trace when the WDT triggers, so no help there. I see the MQTT client disconnecting, then reconnecting, periodically for no particular reason, and when this happens the ESP8266 freezes for a couple seconds (the ws2812 LEDs stop animating). It may be that occasionally the MQTT disconnect/reconnect sequence (along with everything else going on in the loop() section) is taking too long, causing the WDT to trigger.

I'll try sprinkling a few yield(); statements in the loop() section to see if that helps.
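One way to test the "loop() occasionally takes too long" theory without flooding the serial port is to record only the worst gap between iterations and report it rarely. The class below is a hypothetical helper, not part of WS2812FX or the ESP8266 core. In a sketch, the argument to tick() would be millis().

```cpp
#include <cstdint>

// Tracks the longest gap between consecutive loop() iterations.
class LoopStallTracker {
 public:
  // Call once per loop() iteration with the current time in milliseconds.
  // Returns the gap since the previous call (0 on the first call).
  std::uint32_t tick(std::uint32_t nowMs) {
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    const std::uint32_t gap = started_ ? nowMs - lastMs_ : 0;
    started_ = true;
    lastMs_ = nowMs;
    if (gap > worstMs_) {
      worstMs_ = gap;
    }
    return gap;
  }

  // Longest gap seen so far, in milliseconds.
  std::uint32_t worstMs() const { return worstMs_; }

 private:
  bool started_ = false;
  std::uint32_t lastMs_ = 0;
  std::uint32_t worstMs_ = 0;
};
```

Printing worstMs() once a minute, or publishing it over MQTT, costs far less than logging every iteration, so it is less likely to change the timing being measured.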

@sprior
Author

sprior commented Jan 3, 2019

Thanks for all the work! Adding a simple sleep(30) at the end of my loop did not solve the problem.

Just for some background on the code, I started with the Octoprint Monitor code by Qrome: https://github.com/Qrome/printer-monitor

which I initially liked as a template for future code, but decided to refactor to break into more understandable chunks. Then I decided to build the Neopixel Christmas Tree (https://www.nutsvolts.com/magazine/article/build-a-neopixel-led-tree), but with code based on my refactored template. I've already got other projects based on this template that run LCD/OLED screens as well as temperature sensors. I'm also planning to use this for other NeoPixel projects.

@sprior
Author

sprior commented Jan 3, 2019

I previously added calls to ESP.wdtFeed() between the major functions called in loop(), but that didn't solve the problem. The other thing that I've noticed is that sometimes the device reboots and can reinitialize and keep going, and sometimes it just hangs (NeoPixels frozen and it does not respond to web page requests) so the CPU is stuck, not just the NeoPixels.

@moose4lord
Collaborator

Well the added yield(); statements didn't help. :(
I also see the two failure modes you describe. Sometimes the WDT triggers, then the ESP8266 reboots and starts running the sketch again. Other times the WDT triggers and the CPU just freezes.

Not sure what to try next. I'll have to give it some thought.

@sprior
Author

sprior commented Jan 3, 2019

At least I'm not crazy (or this isn't evidence of same).

@sprior
Author

sprior commented Jan 3, 2019

One thought was a memory leak somewhere, but I would think that would be less random than this appears to be.

@sprior
Author

sprior commented Jan 7, 2019

I want to reverse my previous statement that adding sleep(30) to the bottom of the loop didn't help - it appears to be much more stable now. I have seen it hang once or twice a few minutes after a power up, but I haven't seen it spontaneously reboot at all.

@moose4lord
Collaborator

This one has me stumped. I've been looking into it off and on all week, but haven't really figured out what's going on. I thought it might be heat related too, but tried blasting it with a hair dryer and that didn't make it any worse, so I've abandoned that idea.

The only way I've been able to get your sketch to run overnight, without a wdt reset, is to turn off Wifi with these statements:

WiFi.mode( WIFI_OFF ); // disable WiFi
WiFi.forceSleepBegin();
delay(1);

Otherwise, sooner or later, after several minutes or several hours, it throws a wdt reset.
Sorry I couldn't be more help.

@sprior
Author

sprior commented Jan 8, 2019

One thing I'm curious about that may be related: I've noticed that with my sketch the blue LED on the ESP8266 module itself (not the extra LED on the NodeMCU part of the module) stays lit continuously, but any other sketch which uses all the same code EXCEPT the NeoPixel part does not light up that LED. I don't believe the pin I'm using to drive the NeoPixels is the same one that's connected to that LED, so I've thought it strange.

On your test rig are you seeing that same behavior? It would indicate that something unexpected is accessing that LED.

I haven't tried running my NeoPixel tree overnight, but in the last few days since I added the sleep(30) in my loop I haven't seen a reset over several hours of runtime.

@moose4lord
Collaborator

The LED on the ESP module is wired to GPIO2, which is the same as the "D4" pin definition used in your sketch. The nodemcu pin definitions are defined here. So the pin used to drive your WS2812s is also driving the LED. That shouldn't be a problem though.

@sprior
Author

sprior commented Jan 22, 2019

I hooked up the ESP8266 to a Raspberry Pi to act as a serial logger to try to diagnose this. I did catch one reboot which seems to have taken place during the call to Adafruit_NeoPixel::show() when interrupts were disabled, but unless the pixel config was getting corrupted that didn't make much sense to me. I also had to slow down the esp8266 serial port speed to 19200 and I haven't caught a crash for several days. I'm doing experiments on other devices for more data.

Since you had a test rig set up and recreating the issue I'd be interested to see if you find a difference with a lower serial port data rate.

@moose4lord
Collaborator

I have been tinkering with this problem as well. Like you, I couldn't get the Serial port to provide any meaningful debug info. It would either swamp my computer with data and lock up the port, or slow down the loop() to the point where the wdt resets disappeared. I wired up a 7-segment display thinking I could latch data into it and use it to narrow down the spot in the code that was triggering the wdt, but its serial interface was too slow and the wdt resets disappeared. :(

I thought I was on to something when I realized removing the first LED (the one that runs mode 11) from the setup caused the wdt resets to disappear. That mode refreshes the LED strip a lot, to the point where the LED strip updates with every loop() iteration. I thought "Eureka! That's it! The sketch is spending too much time (with interrupts disabled) updating the strip." If that were true, then the problem should go away by simply increasing the speed parameter for that one LED, slowing down the strip refresh rate and allowing the ESP8266 time to do other things besides refreshing the LEDs. Well, that didn't work. I cranked the speed parameter for that one LED to 25000 and it still threw wdt resets. Bah!
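The "too much time with interrupts disabled" idea can also be checked with some arithmetic. The sketch below assumes the usual 800 kHz WS2812 timing (1.25 µs per bit, 24 bits per RGB pixel) and the 93 LEDs implied by the segment list earlier in the thread. The function names are made up for illustration, and the latch gap between frames is ignored.

```cpp
#include <cstdint>

constexpr std::uint32_t kBitsPerPixel = 24;  // 8 bits each for G, R and B
constexpr std::uint32_t kBitTimeNs = 1250;   // one bit period at 800 kHz

// Time to clock out one full frame, in microseconds (latch gap excluded).
constexpr std::uint32_t frameTimeUs(std::uint32_t ledCount) {
  return ledCount * kBitsPerPixel * kBitTimeNs / 1000;
}

// Upper bound on full-strip refreshes per second implied by the frame time.
// Returns 0 for an empty strip to avoid dividing by zero.
constexpr std::uint32_t maxFramesPerSecond(std::uint32_t ledCount) {
  return ledCount == 0 ? 0 : 1000000 / frameTimeUs(ledCount);
}

// The 93-LED tree: roughly 2.8 ms of bit-banging per refresh.
static_assert(frameTimeUs(93) == 2790, "93 LEDs take 2790 us per frame");
// For comparison, the 1800-LED installation mentioned later in the thread.
static_assert(frameTimeUs(1800) == 54000, "1800 LEDs take 54 ms per frame");
```

At about 2.8 ms per frame, a single refresh of the tree is far shorter than a watchdog timeout measured in seconds, which fits the observation that slowing the refresh rate did not make the resets go away.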

So there's something else going on that I don't understand. I think my next step is to wire up a high speed shift register that I can use to latch data when the sketch is running at full speed. Hopefully that will trap when the wdt reset occurs and provide some useful debug info. I don't spend a lot of time on this, but I do come back to it periodically when I have some free time.

@mabe42

mabe42 commented Jan 23, 2019

Well, I saw that you are using pubsubclient. I remember that I had similar problems with a different project. As far as I remember, the problem is related to mqtt.loop() not being called often enough. This fits also with the disconnections from the broker.

However, I just don't remember where I found the solution. Maybe just somewhere around pubsubclient or perhaps on stackoverflow.

Good luck!

@sprior
Author

sprior commented Jan 23, 2019

@mabe42 If you happen to come across that solution I'd be grateful if you post it here. I looked at other MQTT implementations and none of them seem as stable/up to date as pubsubclient. It's interesting if mqtt.loop() isn't called often enough in my case because this main loop pretty much calls the 4 various service routines (OTA, web, pubsub, ws2812fx) and does nothing else. Unless I call mqtt.loop() more than once per main loop I don't see how to call it more often.

@mabe42

mabe42 commented Jan 23, 2019

Well, I'm sorry. I think I can't help you further. I looked through some notes but apparently I have not kept one on this topic. Well, I also use the pubsubclient library and for my little projects it works ok. I'm only having trouble in places where my WiFi is weak.

I had a quick look if I could find the loop topic again. Maybe it was this one:
knolleary/pubsubclient#372

However, there seems to be contradictory information:
knolleary/pubsubclient#508

Hope you find the solution to your nasty problem!

@sprior
Author

sprior commented Jan 26, 2019

Maybe a clue, but I've also noticed that some of my other devices which don't have NeoPixels are logging a fair number of reconnects to my MQTT server. The MQTT client id's I use are the MAC address of the device plus a small random suffix, so an example from my Mosquitto log:

1548473889: Socket error on client 2c:3a:e8:0b:7a:cb-973d, disconnecting.
1548473894: New connection from 192.168.0.165 on port 1883.
1548473894: New client connected from 192.168.0.165 as 2c:3a:e8:0b:7a:cb-ae6 (c1, k15).

I came across this:
knolleary/pubsubclient#417
which mentions a 15 second timeout.

I don't know if this is the cause of the issue, but it feels like I might be getting closer.
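For a feel of the timing involved: the "k15" in the Mosquitto log line appears to be the keepalive in seconds (an assumption based on the log format), and MQTT 3.1.1 allows a broker to drop a client it has not heard from within one and a half times that interval. The sketch below works out the resulting deadlines. The function names are hypothetical and network latency is ignored.

```cpp
#include <cstdint>

// Broker-side deadline: 1.5x the keepalive interval, in milliseconds.
constexpr std::uint32_t brokerDeadlineMs(std::uint32_t keepAliveSec) {
  return keepAliveSec * 1500;
}

// Longest the sketch can go without servicing the MQTT client before the
// broker's deadline passes, assuming the client would otherwise send its
// ping right at the end of one keepalive interval of silence.
constexpr std::uint32_t maxStallMs(std::uint32_t keepAliveSec) {
  return brokerDeadlineMs(keepAliveSec) - keepAliveSec * 1000;
}

static_assert(brokerDeadlineMs(15) == 22500, "k15 gives a 22.5 s deadline");
static_assert(maxStallMs(15) == 7500, "about 7.5 s of slack");
```

Under these assumptions, a stall of several seconds in the main loop at the wrong moment would be enough for the broker to report a socket error and for the client to reconnect.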

@sprior
Author

sprior commented Jan 26, 2019

Browsing the pubsubclient source...

I see MQTT_SOCKET_TIMEOUT used in two places, both in while (!_client->available()) loops: one in readByte and one in connect. But I only see a call to yield() in the loop in readByte. If it's necessary there, I don't understand why it isn't also necessary in the other one.
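The general shape of such a wait loop, with a yield on every pass, might look like the sketch below. This is an illustration only, not PubSubClient's actual code: the function name is hypothetical, and the data check, clock, and yield are injected as callbacks so the example can run off-device. On the ESP8266 the yield callback would be yield() and the clock would be millis().

```cpp
#include <cstdint>
#include <functional>

// Waits until dataAvailable() returns true or timeoutMs has elapsed.
// yieldFn runs on every pass so background work (for example the WiFi
// stack) gets a chance to run while the caller is blocked.
// Returns true if data arrived, false on timeout.
bool waitForData(const std::function<bool()>& dataAvailable,
                 const std::function<std::uint32_t()>& nowMs,
                 const std::function<void()>& yieldFn,
                 std::uint32_t timeoutMs) {
  const std::uint32_t start = nowMs();
  while (!dataAvailable()) {
    yieldFn();
    // Unsigned subtraction keeps the comparison valid across counter wrap.
    if (nowMs() - start >= timeoutMs) {
      return false;
    }
  }
  return true;
}
```

Without the yield, a loop like this can spin for the full timeout without giving the rest of the system any time, which is the concern raised about the loop in connect.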

@sprior
Author

sprior commented Jan 26, 2019

Keeping my fingers crossed that maybe it's fixed.

knolleary/pubsubclient#566

@sprior
Author

sprior commented Jan 29, 2019

I was so hopeful that patch would be it, but my tree is still crashing. I do think the patch makes the code more correct though.

@moose4lord
Collaborator

After some more experimenting I think the wdt reset is being triggered during the call to interrupts() in Adafruit_NeoPixel::show(). That's the call that reenables interrupts after the NeoPixel strip has been updated. From looking at the Arduino.h file, interrupts() just runs xt_rsil(0) to set the interrupt level to 0 (the corresponding noInterrupts() function runs xt_rsil(15) to set the interrupt level to 15). Seems simple enough.

I thought maybe enabling all interrupts with xt_rsil(0), instead of only reenabling interrupts that were enabled before show() was called, may have been a problem. But I changed the noInterrupts()/interrupts() code to this form (as mentioned in Arduino.h):

uint32_t savedPS = xt_rsil(15); // disable interrupts
// do work here
xt_wsr_ps(savedPS); // restore interrupts

But those changes did not solve the wdt reset problem. Bah!
Back to the drawing board.
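Even though it did not cure the resets, the save-and-restore form matters when critical sections nest. The sketch below demonstrates the pattern against a simulated interrupt-level variable so it can run on a PC. Everything here (fakeRaiseLevel, fakeRestoreLevel, ScopedInterruptMask) is a hypothetical stand-in; on the ESP8266 the real operations are xt_rsil() and xt_wsr_ps().

```cpp
#include <cstdint>

// Simulated interrupt level: 0 means all interrupts enabled, 15 means masked.
static std::uint32_t g_level = 0;

// Stand-in for xt_rsil(level): set a new level and return the previous one.
std::uint32_t fakeRaiseLevel(std::uint32_t level) {
  const std::uint32_t previous = g_level;
  g_level = level;
  return previous;
}

// Stand-in for xt_wsr_ps(saved): put back whatever level was saved.
void fakeRestoreLevel(std::uint32_t saved) { g_level = saved; }

// Masks interrupts for the lifetime of the object, then restores the
// previous state rather than unconditionally enabling everything.
class ScopedInterruptMask {
 public:
  ScopedInterruptMask() : saved_(fakeRaiseLevel(15)) {}
  ~ScopedInterruptMask() { fakeRestoreLevel(saved_); }
  ScopedInterruptMask(const ScopedInterruptMask&) = delete;
  ScopedInterruptMask& operator=(const ScopedInterruptMask&) = delete;

 private:
  std::uint32_t saved_;
};
```

With an unconditional "enable all" at the end of an inner section, interrupts would be switched back on while an outer section still expected them to be masked. Restoring the saved level avoids that.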

@haganwalker

I also want to bring attention to this. I’m in the middle of a 200 ft installation of WS2812 LEDs on a bridge and am using this awesome library. However, the WDT pops up every now and then and completely freezes whatever is going on. If not a solution yet, is there a workaround to at least reset the device altogether to get things going again? Secondly, I think I might need an ESP32 for more memory. Has anyone used an ESP32 with this library, and if so, does this same WDT error occur?

@moose4lord
Collaborator

Wow, 200 ft, that's impressive. How many LEDs?

I have not come up with a foolproof way to recover from this kind of WDT reset. Sometimes the WDT does its job and restarts the ESP, but other times the WDT causes the ESP to freeze and become unresponsive. :( You might want to consider wiring a timer (like a 555) to the RST pin, and use a GPIO pin to constantly reset the timer. If the ESP freezes, it will stop resetting the timer and the timer will eventually fire, triggering a hard reset. Something similar to this.
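The software half of that external watchdog could be as small as the sketch below: a helper that decides, from the current time, what level the heartbeat pin should have. The class name and half-period are assumptions for illustration. In a sketch, loop() would call update(millis()) and pass the result to digitalWrite() on the chosen GPIO.

```cpp
#include <cstdint>

// Generates a square wave for an external watchdog timer. If loop() stops
// running, update() stops being called and the pin stops toggling.
class Heartbeat {
 public:
  explicit Heartbeat(std::uint32_t halfPeriodMs)
      : halfPeriodMs_(halfPeriodMs) {}

  // Call from loop() with the current time in milliseconds.
  // Returns the level the heartbeat pin should be driven to.
  bool update(std::uint32_t nowMs) {
    if (nowMs - lastToggleMs_ >= halfPeriodMs_) {
      level_ = !level_;
      lastToggleMs_ = nowMs;
    }
    return level_;
  }

 private:
  std::uint32_t halfPeriodMs_;
  std::uint32_t lastToggleMs_ = 0;
  bool level_ = false;
};
```

One design point worth checking on the hardware side: the timer should be retriggered by edges rather than by a level, so that a pin frozen high or low still lets the timer expire.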

I tried to get my ESP32 working a few months ago, but had issues unrelated to the WDT problem. See #41.

@sprior
Author

sprior commented Feb 9, 2019

I haven't looked at how difficult it would be, but I'm starting to wonder if it would be worth trying the ws2812fx code ported to run on top of a different NeoPixel library besides the Adafruit one to see if that made any difference. I believe there are a couple of alternatives.

@moose4lord
Collaborator

moose4lord commented Feb 9, 2019

Debashish Sahu created NeoAnimationFX to merge WS2812FX with Michael Miller's NeoPixelBus library. This has a variety of methods (bit bang/DMA/UART) to create the pulse stream that updates the LEDs. That might be something worth experimenting with.

There's also the ws2812fx_dma sketch in the WS2812FX examples folder that uses DMA, instead of bit banging, to communicate with the LEDs.

@haganwalker

It’s right at 1800 LEDs. It’s going to be a heck of a project. The city is supposed to be drilling holes through the concrete this upcoming week for wiring. I’m confident in getting it wired up correctly. My background is in EE, but I’m not so confident on the software side, and it worries me that I’m going to get this thing installed and have daily issues. Have y’all thought about porting to NeoPixelBus? I was surprised to see Ladyada from Adafruit asking for updates on some bug fixes for the ESP32 on the main NeoPixel git, going on several months now. This makes me think some significant rewrite might be needed to resolve it.

@sprior
Author

sprior commented Feb 9, 2019

Adafruit seems to tend towards using CircuitPython for projects, and I've gotten the impression that the ESP8266 isn't their top-priority controller.

@haganwalker

So, I believe the DMA example will solve my problem, except 1) I can't get it to compile, and 2) I can't figure out how to reference the ESP32 instead of the ESP8266.

This is the issue as-is:

Arduino: 1.8.8 (Windows 10), Board: "Heltec_WIFI_LoRa_32, 80MHz, 921600"

ws2812fx_dma:47:1: error: 'NeoEsp8266Dma800KbpsMethod' does not name a type
 NeoEsp8266Dma800KbpsMethod dma = NeoEsp8266Dma800KbpsMethod(LED_COUNT, 3);
 ^
C:\Users\HAGANW~1\AppData\Local\Temp\arduino_modified_sketch_478657\ws2812fx_dma.ino: In function 'void setup()':
ws2812fx_dma:55:3: error: 'dma' was not declared in this scope
   dma.Initialize();
   ^
C:\Users\HAGANW~1\AppData\Local\Temp\arduino_modified_sketch_478657\ws2812fx_dma.ino: In function 'void myCustomShow()':
ws2812fx_dma:72:6: error: 'dma' was not declared in this scope
   if(dma.IsReadyToUpdate()) {
      ^
exit status 1
'NeoEsp8266Dma800KbpsMethod' does not name a type

And looking at the header file in NeoPixelBus, it looks like I need to change "NeoEsp8266Dma800KbpsMethod" to "NeoEsp32I2sMethod" but that doesn't work. Can anyone point me in the right direction?

@moose4lord
Collaborator

@haganwalker for the ESP32 you would want to change the dma class to this:

//NeoEsp8266Dma800KbpsMethod dma = NeoEsp8266Dma800KbpsMethod(LED_COUNT, 3);
NeoEsp32I2s0800KbpsMethod dma = NeoEsp32I2s0800KbpsMethod(2, LED_COUNT, 3);

Note NeoEsp32I2s0800KbpsMethod takes three parameters, not two. The first parameter is the GPIO pin that you're using to drive the LED strip. DO NOT set it to the same GPIO pin used for the WS2812FX class.

@sprior
Author

sprior commented Feb 14, 2019

Up until now I have been using version 2.4.2 of the ESP8266 support library, but I see now that 2.5.0 has been released so I'll upgrade. Has anyone tried the new version yet to see if it improved the WDT resets?

@moose4lord
Collaborator

I have upgraded to v2.5.0 of the ESP8266 core package. It does not fix the wdt reset issue. It also suffers from a nasty network bug that is being discussed here. You might want to stick with v2.4.2 for now.

@haganwalker

Using the DMA method, I can confirm that the WDT reset issue no longer occurs. This seems to suggest that something upstream from the NeoPixel library is, in fact, the root cause. A workaround is to use the DMA method with the NeoPixelBus library, declaring either
NeoEsp8266Dma800KbpsMethod dma = NeoEsp8266Dma800KbpsMethod(LED_COUNT, 3);
for an ESP8266, or
NeoEsp32I2s0800KbpsMethod dma = NeoEsp32I2s0800KbpsMethod(17, LED_COUNT, 3);
for an ESP32.

@sprior
Author

sprior commented Feb 16, 2019

Does the DMA method interfere with the use of the web server function or ArduinoOTA? There does seem to be a conflict with using the serial port (for debug info).

@tobi01001

No, it doesn't. On the contrary, it separates writing the LEDs from other tasks (AFAIK at the cost of doubling the memory required for the LED array, but the ESP8266 has a lot to work with).

Anyway, you will be very happy with the DMA method. Speed is nearly only limited by the number of leds because of the time required to write the complete frame...

And there is no direct conflict with the serial port. It needs to run on the rx pin for sure but I am debugging all the time with the micro USB without any problems.

Be aware that during flashing all LEDs may light up at full-brightness white. So either disconnect the data line beforehand or be sure your power supply and wires can handle it. However, if OTA is running properly this drawback goes away as well... I implemented a few things to ensure OTA will be possible at least after booting, but that's another story...

One limitation is that you can only use one strip on one data line. But as far as I remember that's the case for ws2812fx anyway.

@sprior
Author

sprior commented Feb 19, 2019

Thanks for explaining that, tobi01001! Now that you point it out, if DMA uses the pin used for rx and you only care about using tx for outputting debug info, then you won't care. I do use OTA mostly now, but needed to know about any issues in case I needed to go back after screwing up something more than usual.

@sprior
Author

sprior commented Feb 22, 2019

A day after I switched the ws2812fx code to use DMA and NeopixelBus it hasn't crashed since, but has had periodic MQTT reconnects - not the end of the world, but I'll probably want to figure out why eventually. I'm guessing that some of the MQTT ping packets are getting delayed past their deadline which is causing it to think the connection was lost.

@generic-beat-detector

generic-beat-detector commented Feb 14, 2021

WORKING:SOLVED Thanks to this discussion, I was able to solve the following issue:

Scenario

Using pubsubclient.h for MQTT messaging, and a custom mode routine via the setCustomMode(F("Custom Func"), myFunc) wrapper to effect smooth blend-and-fade color transitions using this solution

The MQTT messages are sent by a remote program upon beat detection during music playback, causing the LED strip to change color accordingly. The setup is a NodeMCU v1.0 with ArduinoOTA support.

Issue

ws2812fx::service() crashed frequently, most of the time leaving the ESP8266 completely unresponsive: WiFi disconnected and the watchdog timer did not reset/restart the ESP8266 even after explicitly specifying ESP.wdtFeed() in loop().

This freezing occurred quite randomly, but especially after long idle periods (e.g. > 10min) between MQTT (beat info) messaging (i.e. long idle periods between music playlists).

For the record, a previous version of the sketch used the standard FX_MODE_STATIC when toggling between colors. The ESP8266 would reset occasionally but never freeze completely.

Solution

As y'all suggested, the solution is to use DMA. I used the Arduino IDE's File >> Examples >> Examples from Custom Libraries >> WS2812FX >> ws2812fx_dma as a template. The only changes to my original sketch were:

  • Install the NeoPixelBus library by Makuna and add #include <NeoPixelBus.h>
  • #define LED_PIN 3 (i.e. RX/D9) for ESP8266 DMA; previously was #define LED_PIN 5 (i.e. D2)
  • NeoPixelBus instantiation via NeoPixelBus<NeoGrbFeature, NeoEsp8266Dma800KbpsMethod> strip(LED_COUNT)
  • In setup(), specified strip.Begin() (immediately) after ws2812fx.init() to allow proper init for GPIO3, then strip.Show() followed by ws2812fx.setCustomShow(myCustomShow) before calling ws2812fx.setCustomMode()
  • Got rid of the ESP.wdtFeed() call in loop()
  • The myCustomShow() function from the ws2812fx_dma.ino template was used AS-IS.

Been testing for 7+ days straight -- with intermittent and long (30min to 20+ hour) idle intervals between MQTT messaging. So far, so good -- no freezes, no resets. It's nice!

Regards

@generic-beat-detector

FWIW,

  • Arduino IDE v1.8.13
  • ESP8266 Core v2.7.4
  • WS2812FX v1.3.1
  • PubSubClient v2.8.0
  • NeoPixelBus v2.6.0
  • ArduinoOTA v1.0.5

@sprior
Author

sprior commented Jun 28, 2022

I just saw that this is still open. I have also found this to be long term stable when using DMA.

sprior closed this as completed Jun 28, 2022