WDT reset in ws2812fx.service() #151
Comments
I discussed this with a friend and he suggested that my current main loop has no calls to sleep() and I may be overheating the CPU as a result. If true that would certainly explain why this is an intermittent issue. I'll experiment with adding a minimal sleep() at the end of the loop and see if this makes a difference. |
I loaded your sketch into my NodeMCU test setup and tweaked it a bit to use my MQTT broker at CloudMQTT. I see the same intermittent WDT resets as you :(. So I don't think there's anything wrong with your hardware. The serial port doesn't show a stack trace when the WDT triggers, so no help there. I see the MQTT client disconnecting, then reconnecting, periodically for no particular reason, and when this happens the ESP8266 freezes for a couple seconds (the ws2812 LEDs stop animating). It may be that occasionally the MQTT disconnect/reconnect sequence (along with everything else going on in the loop() section) is taking too long, causing the WDT to trigger. I'll try sprinkling a few |
Thanks for all the work! Adding a simple sleep(30) at the end of my loop did not solve the problem. Just for some background on the code, I started with the Octoprint Monitor code by Qrome: https://github.com/Qrome/printer-monitor which I initially liked as a template for future code, but decided to refactor to break into more understandable chunks. Then I decided to build the Neopixel Christmas Tree (https://www.nutsvolts.com/magazine/article/build-a-neopixel-led-tree), but with code based on my refactored template. I've already got other projects based on this template that run LCD/OLED screens as well as temperature sensors. I'm also planning to use this for other NeoPixel projects. |
I previously added calls to ESP.wdtFeed() between the major functions called in loop(), but that didn't solve the problem. The other thing that I've noticed is that sometimes the device reboots and can reinitialize and keep going, and sometimes it just hangs (NeoPixels frozen and it does not respond to web page requests) so the CPU is stuck, not just the NeoPixels. |
Well the added Not sure what to try next. I'll have to give it some thought. |
At least I'm not crazy (or this isn't evidence of same). |
One thought was a memory leak somewhere, but I would think that would be less random than this appears to be. |
I want to reverse my previous statement that adding sleep(30) to the bottom of the loop didn't help - it appears to be much more stable now. I have seen it hang once or twice a few minutes after a power up, but I haven't seen it spontaneously reboot at all. |
This one has me stumped. I've been looking into it off and on all week, but haven't really figured out what's going on. I thought it might be heat related too, but I tried blasting it with a hair dryer and that didn't make it any worse, so I've abandoned that idea. The only way I've been able to get your sketch to run overnight without a wdt reset is to turn off WiFi with these statements:
Otherwise, sooner or later, after several minutes or several hours, it throws a wdt reset. |
One thing I'm curious about that may be related - I've noticed that with my sketch the blue LED on the ESP8266 module itself (not the extra LED on the NodeMCU part of the module) stays lit continuously, but on any other sketch which uses all the same code EXCEPT the NeoPixel part does not light up that LED. I don't believe the pin I'm using to drive the NeoPixels is the same one that's connected to that LED so I've thought it strange. On your test rig are you seeing that same behavior? It would indicate that something unexpected is accessing that LED. I haven't tried running my NeoPixel tree overnight, but in the last few days since I added the sleep(30) in my loop I haven't seen a reset over several hours of runtime. |
The LED on the ESP module is wired to GPIO2, which is the same as the "D4" pin definition used in your sketch. The nodemcu pin definitions are defined here. So the pin used to drive your WS2812s is also driving the LED. That shouldn't be a problem though. |
I hooked up the ESP8266 to a Raspberry Pi to act as a serial logger to try to diagnose this. I did catch one reboot which seems to have taken place during the call to Adafruit_NeoPixel::show() when interrupts were disabled, but unless the pixel config was getting corrupted that didn't make much sense to me. I also had to slow down the esp8266 serial port speed to 19200 and I haven't caught a crash for several days. I'm doing experiments on other devices for more data. Since you had a test rig set up and recreating the issue I'd be interested to see if you find a difference with a lower serial port data rate. |
I have been tinkering with this problem as well. Like you, I couldn't get the Serial port to provide any meaningful debug info. It would either swamp my computer with data and lock up the port, or slow down the loop() to the point where the wdt resets disappeared. I wired up a 7-segment display thinking I could latch data into it and use it to narrow down the spot in the code that was triggering the wdt, but its serial interface was too slow and the wdt resets disappeared. :( I thought I was on to something when I realized removing the first LED (the one that runs mode 11) from the setup caused the wdt resets to disappear. That mode refreshes the LED strip a lot, to the point where the LED strip updates with every loop() iteration. I thought "Eureka! That's it! The sketch is spending too much time (with interrupts disabled) updating the strip." If that were true, then the problem should go away by simply increasing the speed parameter for that one LED, slowing down the strip refresh rate and allowing the ESP8266 time to do other things besides refreshing the LEDs. Well, that didn't work. I cranked the speed parameter for that one LED to 25000 and it still threw wdt resets. Bah! So there's something else going on that I don't understand. I think my next step is to wire up a high speed shift register that I can use to latch data when the sketch is running at full speed. Hopefully that will trap when the wdt reset occurs and provide some useful debug info. I don't spend a lot of time on this, but I do come back to it periodically when I have some free time. |
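For a rough sense of how long a bit-banged refresh keeps interrupts disabled, here is a back-of-the-envelope sketch in plain C++ (not Arduino code). It assumes the usual 800 kHz WS2812 data rate and 3 bytes per pixel, and it ignores the latch gap at the end of a frame. The function name `frameTimeMicros` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed WS2812 data rate: 800 kbit/s, i.e. 1.25 us per bit.
constexpr std::uint32_t kBitsPerSecond = 800000;

// Time to clock out one full frame, in microseconds.
// bytes_per_pixel is 3 for RGB strips (assumption; RGBW strips use 4).
std::uint32_t frameTimeMicros(std::uint32_t pixel_count,
                              std::uint32_t bytes_per_pixel = 3) {
    assert(bytes_per_pixel == 3 || bytes_per_pixel == 4);
    const std::uint64_t bits =
        static_cast<std::uint64_t>(pixel_count) * bytes_per_pixel * 8;
    return static_cast<std::uint32_t>(bits * 1000000ULL / kBitsPerSecond);
}
```

With the 93 pixels implied by the segment ranges in the issue (indices 0 to 92), `frameTimeMicros(93)` comes out at 2790 µs per frame, so a single refresh is short by itself. That fits the observation above that slowing the refresh rate alone did not make the resets go away.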
Well, I saw that you are using pubsubclient. I remember that I had similar problems with a different project. As far as I remember, the problem is related to mqtt.loop() not being called often enough. This fits also with the disconnections from the broker. However, I just don't remember where I found the solution. Maybe just somewhere around pubsubclient or perhaps on stackoverflow. Good luck! |
@mabe42 If you happen to come across that solution I'd be grateful if you post it here. I looked at other MQTT implementations and none of them seem as stable/up to date as pubsubclient. It's interesting if mqtt.loop() isn't called often enough in my case because this main loop pretty much calls the 4 various service routines (OTA, web, pubsub, ws2812fx) and does nothing else. Unless I call mqtt.loop() more than once per main loop I don't see how to call it more often. |
Well, I'm sorry. I think I can't help you further. I looked through some notes but apparently I have not kept one on this topic. Well, I also use the pubsubclient library and for my little projects it works ok. I'm only having trouble in places where my WiFi is weak. I had a quick look if I could find the loop topic again. Maybe it was this one: However, there seems to be contradictory information: Hope you find the solution to your nasty problem! |
Maybe a clue, but I've also noticed that some of my other devices which don't have NeoPixels are logging a fair number of reconnects to my MQTT server. The MQTT client IDs I use are the MAC address of the device plus a small random suffix, so here is an example from my Mosquitto log: 1548473889: Socket error on client 2c:3a:e8:0b:7a:cb-973d, disconnecting. I came across this: I don't know if this is the cause of the issue, but it feels like I might be getting closer. |
Browsing the pubsubclient source... I see MQTT_SOCKET_TIMEOUT used in two places, in while (!_client->available()) loops in readByte and one in connect. But I only see a call to yield() in the loop in readByte. I don't understand why if it's necessary there that it isn't also necessary in the other one. |
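To make the point about the missing yield() concrete, here is a minimal sketch of a timeout loop that yields on every pass. This is plain C++ and is not pubsubclient code: `waitForData` and its parameters are hypothetical, and `std::this_thread::yield()` only stands in for the Arduino `yield()` call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: poll `available` until it returns true or `timeout`
// elapses, yielding on every pass so other work gets a chance to run.
bool waitForData(const std::function<bool()>& available,
                 std::chrono::milliseconds timeout) {
    assert(available && "available callback must be set");
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!available()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // timed out without data
        }
        std::this_thread::yield();  // stand-in for the Arduino yield() call
    }
    return true;
}
```

The idea is that any loop that can spin for the length of a socket timeout should give control back on each iteration, whichever function it lives in.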
Keeping my fingers crossed that maybe it's fixed. |
I was so hopeful that patch would be it, but my tree is still crashing. I do think the patch makes the code more correct though. |
After some more experimenting I think the wdt reset is being triggered during the call to interrupts() in Adafruit_NeoPixel::show(). That's the call that reenables interrupts after the NeoPixel strip has been updated. From looking at the Arduino.h file, interrupts() just runs I thought maybe enabling all interrupts with rsil(0), instead of only reenabling interrupts that were enabled before show() was called, may have been a problem. But I changed the noInterrupts()/interrupts() code to this form (as mentioned in Arduino.h):
But those changes did not solve the wdt reset problem. Bah! |
I also want to bring attention to this. I’m in the middle of a 200 ft installation of WS2812 LEDs on a bridge and am using this awesome library. However, the WDT pops up every now and then and completely freezes whatever is going on. If not a solution yet, is there a workaround to at least reset the device altogether to get things going again? Secondly, I think I might need an ESP32 for more memory. Has anyone used an ESP32 with this library, and if so, does this same WDT error occur? |
Wow, 200 ft, that's impressive. How many LEDs? I have not come up with a foolproof way to recover from these kinds of WDT resets. Sometimes the WDT does its job and restarts the ESP, but other times the WDT causes the ESP to freeze and become unresponsive. :( You might want to consider wiring a timer (like a 555) to the RST pin, and use a GPIO pin to constantly reset the timer. If the ESP freezes, it will stop resetting the timer and the timer will eventually fire, triggering a hard reset. Something similar to this. I tried to get my ESP32 working a few months ago, but had issues unrelated to the WDT problem. See #41. |
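The external-timer idea can be modelled in a few lines. The sketch below is plain C++ that only simulates the logic; the class name `ExternalWatchdog`, the timeout value and the millisecond timestamps are all assumptions for illustration. On real hardware `kick()` would be a GPIO pulse that restarts the timer, and expiry would pull RST low.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an external hardware watchdog: the firmware must call
// kick() more often than timeout_ms, otherwise expired() reports that the
// timer would have fired and reset the board.
class ExternalWatchdog {
public:
    explicit ExternalWatchdog(std::uint32_t timeout_ms, std::uint32_t now_ms = 0)
        : timeout_ms_(timeout_ms), last_kick_ms_(now_ms) {
        assert(timeout_ms > 0);
    }

    // Called from the main loop; corresponds to pulsing the timer-reset GPIO.
    void kick(std::uint32_t now_ms) { last_kick_ms_ = now_ms; }

    // Unsigned subtraction keeps the comparison correct across counter rollover.
    bool expired(std::uint32_t now_ms) const {
        return static_cast<std::uint32_t>(now_ms - last_kick_ms_) >= timeout_ms_;
    }

private:
    std::uint32_t timeout_ms_;
    std::uint32_t last_kick_ms_;
};
```

The point of doing this outside the chip is that the reset still happens when the firmware is frozen and can no longer run any recovery code of its own.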
I haven't looked at how difficult it would be, but I'm starting to wonder if it would be worth trying the ws2812fx code ported to run on top of a different NeoPixel library besides the Adafruit one to see if that made any difference. I believe there are a couple of alternatives. |
Debashish Sahu created NeoAnimationFX to merge WS2812FX with Michael Miller's NeoPixelBus library. This has a variety of methods (bit bang/DMA/UART) to create the pulse stream that updates the LEDs. That might be something worth experimenting with. There's also the ws2812fx_dma sketch in the WS2812FX examples folder that uses DMA, instead of bit banging, to communicate with the LEDs. |
It’s right at 1800 LEDs - it’s going to be a heck of a project. The city is supposed to be drilling holes through the concrete this upcoming week for wiring. I’m confident in getting it wired up correctly. My background is in EE, but I’m not so confident on the software side, and it worries me that I’m going to get this thing installed and have daily issues. Have y’all thought about porting to NeoPixelBus? I was surprised to see Ladyada from Adafruit asking for updates on some bug fixes for the ESP32 on the main NeoPixel repository, going on several months - this makes me think some significant rewrite might be needed to resolve it. |
Adafruit seems to tend towards using CircuitPython for projects, and I've gotten the impression that the ESP8266 isn't their top priority controller. |
So, I believe the DMA example will solve my problem, except 1) I can't get it to compile, and 2) I can't figure out how to reference the ESP32 instead of the ESP8266. This is the issue as-is:
And looking at the header file in NeoPixelBus, it looks like I need to change "NeoEsp8266Dma800KbpsMethod" to "NeoEsp32I2sMethod" but that doesn't work. Can anyone point me in the right direction? |
@haganwalker for the ESP32 you would want to change the dma class to this:
//NeoEsp8266Dma800KbpsMethod dma = NeoEsp8266Dma800KbpsMethod(LED_COUNT, 3);
NeoEsp32I2s0800KbpsMethod dma = NeoEsp32I2s0800KbpsMethod(2, LED_COUNT, 3);
Note NeoEsp32I2s0800KbpsMethod takes three parameters, not two. The first parameter is the GPIO pin that you're using to drive the LED strip. DO NOT set it to the same GPIO pin used for the WS2812FX class. |
Up until now I have been using version 2.4.2 of the ESP8266 support library, but I see now that 2.5.0 has been released so I'll upgrade. Has anyone tried the new version yet to see if it improved the WDT resets? |
I have upgraded to v2.5.0 of the ESP8266 core package. It does not fix the wdt reset issue. It also suffers from a nasty network bug that is being discussed here. You might want to stick with v2.4.2 for now. |
Using the DMA method, I can confirm that the WDT_Reset issue no longer occurs. This seems to suggest that something upstream from the NeoPixel library is, in fact, the root cause. A workaround would be to use the DMA method and use either |
Does the DMA method interfere with the use of the web server function or ArduinoOTA? There does seem to be a conflict with using the serial port (for debug info). |
No it doesn't. On the contrary, it separates writing the LEDs from other tasks (AFAIK at the cost of doubling the memory required for the LED array, but the ESP8266 has a lot to work with). Anyway, you will be very happy with the DMA method. Speed is limited almost entirely by the number of LEDs, because of the time required to write the complete frame... And there is no direct conflict with the serial port. It needs to run on the RX pin for sure, but I am debugging all the time over the micro USB without any problems. Be aware that during flashing all LEDs may light up at full-brightness white. So either disconnect the data line beforehand or be sure your power supply and wires are able to handle it. However, if OTA is running properly this drawback is gone as well... I implemented a few things to ensure OTA will be possible at least after booting, but that's another story... One thing is that you can only use one strip on one data line. But as far as I remember that's the case for ws2812fx anyway. |
Thanks for explaining that tobi1001! Now that you point it out if DMA uses the pin used for rx and you only care about using tx for outputting debug info then you won't care. I do use OTA mostly now, but needed to know any issues in case I needed to go back after screwing up something more than usual. |
A day after I switched the ws2812fx code to use DMA and NeopixelBus it hasn't crashed since, but has had periodic MQTT reconnects - not the end of the world, but I'll probably want to figure out why eventually. I'm guessing that some of the MQTT ping packets are getting delayed past their deadline which is causing it to think the connection was lost. |
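On the guess about delayed pings: MQTT 3.1.1 allows the broker to drop a client that has been silent for more than one and a half times the keep-alive interval. A small sketch of that arithmetic follows, in plain C++ with hypothetical function names. The 15 second keep-alive used in the example is an assumption about the client configuration, not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Grace period the broker grants before it may drop a silent client:
// 1.5x the keep-alive interval, returned in milliseconds.
std::uint32_t brokerGraceMillis(std::uint16_t keep_alive_s) {
    assert(keep_alive_s > 0);  // a keep-alive of 0 disables the mechanism
    return static_cast<std::uint32_t>(keep_alive_s) * 1500u;
}

// True when the broker would be allowed to treat the connection as dead.
bool brokerMayDisconnect(std::uint16_t keep_alive_s, std::uint32_t silent_ms) {
    return silent_ms > brokerGraceMillis(keep_alive_s);
}
```

With an assumed keep-alive of 15 seconds the grace period is 22500 ms, so a ping would have to be late by several seconds before the broker is entitled to drop the connection.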
WORKING:SOLVED
Thanks to this discussion, I was able to solve the following issue:
Scenario
Using The MQTT messages are sent by a remote program upon beat detection during music playback, causing the LED strip to change color accordingly. The setup is a NodeMCU v1.0 with ArduinoOTA support.
Issue
This freezing occurred quite randomly, but especially after long idle periods (e.g. > 10min) between MQTT (beat info) messaging (i.e. long idle periods between music playlists). For the record, a previous version of the sketch used the standard
Solution
As y'all suggested, the solution is to use DMA. I used the Arduino IDE's
Been testing for 7+ days straight -- with intermittent and long (30min to 20+ hour) idle intervals between MQTT messaging. So far, so good -- no freezes, no resets. It's nice! Regards |
I just saw that this is still open. I have also found this to be long term stable when using DMA. |
mqtt_neopixel.zip
The attached sketch intermittently experiences wdt resets during the ws2812fx.service() call, as indicated by Serial.println's just before and after the call. It'll run for an hour or more and then experience the issue.
This sketch is running ArduinoOTA, webserver, MQTT publish/subscribe, and ws2812fx. When it boots it sends an MQTT message to request initialization. Here are the messages it receives (one message per line).
[{"args":{"b":50},"f":"setBrightness"}]
[{"args":{"color":16776960,"mode":11,"n":0,"reverse":"F","speed":1000,"start":0,"stop":0},"f":"setSegment"}]
[{"args":{"color":255,"mode":42,"n":1,"reverse":"F","speed":1000,"start":1,"stop":8},"f":"setSegment"}]
[{"args":{"color":65535,"mode":42,"n":2,"reverse":"T","speed":1500,"start":9,"stop":20},"f":"setSegment"}]
[{"args":{"color":65280,"mode":42,"n":3,"reverse":"F","speed":2000,"start":21,"stop":36},"f":"setSegment"}]
[{"args":{"color":16776960,"mode":42,"n":4,"reverse":"T","speed":4000,"start":37,"stop":60},"f":"setSegment"}]
[{"args":{"color":16711680,"mode":42,"n":5,"reverse":"F","speed":6000,"start":61,"stop":92},"f":"setSegment"}]
[{"f":"start"}]
And these settings run for quite a while before it fails.
So it is an extended amount of time between the receipt of the last MQTT message and when the WDT reset happens.
The ESP8266 module and the Neopixel rings are powered by an external 5A regulated power supply and not via the USB connection, so I don't think power is the issue (but not 100% sure).
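For readers decoding the setSegment messages above, here is a small sketch in plain C++. It assumes the "color" fields are packed 0xRRGGBB integers and that "start" and "stop" are both inclusive pixel indices. The names `Rgb`, `unpackColor` and `segmentLength` are made up for illustration and are not part of the sketch attached to the issue.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Split a packed 0xRRGGBB integer (assumed layout of the "color" fields)
// into its three channels.
Rgb unpackColor(std::uint32_t packed) {
    assert(packed <= 0xFFFFFFu);
    return Rgb{static_cast<std::uint8_t>(packed >> 16),
               static_cast<std::uint8_t>((packed >> 8) & 0xFFu),
               static_cast<std::uint8_t>(packed & 0xFFu)};
}

// Number of pixels in a segment whose start and stop indices are inclusive.
std::uint16_t segmentLength(std::uint16_t start, std::uint16_t stop) {
    assert(stop >= start);
    return static_cast<std::uint16_t>(stop - start + 1);
}
```

Under those assumptions 16776960 is 0xFFFF00 (yellow), 255 is blue, 65535 is cyan, 65280 is green and 16711680 is red. The six segments cover 1, 8, 12, 16, 24 and 32 pixels, for 93 pixels in total.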