EMS-ESP status disconnected #998
Replies: 41 comments
-
Have a look at the BUS status and uptime to see if your EMS-ESP is losing network connectivity (your RSSI is very low at -86, see https://www.metageek.com/training/resources/understanding-rssi/) or if your EMS-ESP is restarting due to out-of-memory (your available free memory, at 83, and your available block size are also very low).
-
I do not understand why the RSSI is that low, there is a WiFi mesh module nearby. But that might be related to the fact that I can't see the EMS-ESP as a device connected to the network... Working on that. The question is why it is restarting that often. It looks like the restart frequency has increased since I installed EMS-ESP-3_5_0b12-ESP32 and higher. In the last two graphs you see the uptime. It is strange that I have 2 different uptime fields from EMS-ESP, one before and one after the upgrade. Looks like the UoM has changed... When I look at the Free Memory graph I get the impression there is enough free memory. But maybe the system did not get enough time to report the low memory.
-
I have done a factory reset and configured everything again, now with a fixed IP. Right now the RSSI is -47, so it connected to the nearby DECO M5. But I doubt that the RSSI was causing the device to restart each time.
-
And again the EMS-ESP restarted.
-
Keeps restarting. Is there a way to capture the reason for the restart? E.g. the reset reason is "Software reset CPU", so maybe a more detailed reason can be given for the root cause. Or maybe a dump can be stored somewhere with the needed details.
I'm using this version: https://bbqkees-electronics.nl/product/gateway-e32-ethernet-edition/
"System Info": {
-
hmm, it could be memory related. Your free mem and max alloc are very low (mine is 133/75, yours 83/39). The first thing I would do is change the Log Buffer max size. I updated the wiki with some other things to try out: https://emsesp.github.io/docs/Troubleshooting/#ems-esp-sometimes-crashes-and-restarts
Sorry about this. I know how annoying it can be. It's hard to simulate everyone's setup but we're working on it.
-
In the settings I disabled all the options (in my case Telnet Console and Analog Sensors).
Question: why do we need a Buffer Size? If we keep the screen open we keep all the messages. Would Buffer Size = 0 be an option, or will that not save much memory?
Shortly after restart I have:
A few minutes later when all the entities are found:
With NTP and MQTT disabled:
With MQTT enabled and NTP and OTA disabled:
So not much saving by disabling services. Any suggestion how to free up memory?
000+00:00:00.000 I 0: [emsesp] Last system reset reason Core0: Software reset CPU, Core1: Software reset CPU
-
It's so you can go to the Log page and always see the last logs, which is useful when EMS-ESP is booting up. I'm going to try and simulate your setup by adding dummy devices so I can trace what is happening.
-
I don't want to be too optimistic, but the EMS-ESP hasn't restarted since I disabled...
-
That's good but equally strange, since Telnet/Analog/OTA don't really use a lot of heap memory. 100/41 is just on the edge. Michael and I are going to do some memory optimizations in 3.6.0 as soon as the current 3.5.x is out. Out of interest, are you compiling yourself or taking the firmware .bin files from the repo? The reason I'm asking is that there was a pio library update a few days ago that uses espressif32 5.3.0 by default, which adds a lot more mem
-
No, I'm using your dev build from https://github.com/emsesp/EMS-ESP32/releases. I have installed the one of Dec 28th. I see there is one of yesterday, will install that one. Didn't help, 106 KB / 40 KB.
-
@MichaelDvP what do you think? 100+ KB free heap seems normal for 3 devices (boiler and 2 thermostats) with a total of 180 entities. My system has 78, so it's hard to compare. I'm planning to make some changes to test.cpp so we can simulate adding devices on a standalone ESP32 and watch how EMS-ESP handles the memory.
-
Yes, in #857 the system is E32, boiler, heatsource, thermostat, 2x mixer, solar, and also ~100k heap and 38k block. The heap is imo not critical, it's the fragmentation. I tried to malloc(40k) in emsesp-start, and in dashboard, customization and api I free the buffer, alloc the response and, after send, realloc the 40k buffer. Seems to work well. The worst that could happen is that the 40k realloc fails, and then it's normal emsesp again. Now I'm testing to malloc another extra 20k just to reduce free heap. Also unlimited 100 message log buffer.
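To put a number on "it's the fragmentation", one common heuristic is the share of free heap that is not part of the largest allocatable block. The sketch below is only an illustration and not something EMS-ESP computes itself: the function name `fragmentation` is hypothetical, and it assumes the "free / max alloc" pairs quoted in this thread are both in KB.

```python
def fragmentation(free_kb: float, max_alloc_kb: float) -> float:
    """Return the share of free heap outside the largest free block (0.0 to 1.0)."""
    if free_kb <= 0:
        raise ValueError("free_kb must be positive")
    if not 0 <= max_alloc_kb <= free_kb:
        raise ValueError("max_alloc_kb must be between 0 and free_kb")
    return 1.0 - max_alloc_kb / free_kb


# (free heap, max alloc) pairs quoted in this thread, assumed to be in KB.
readings = {
    "133/75": (133, 75),
    "83/39": (83, 39),
    "100/41": (100, 41),
    "106/40": (106, 40),
}

for label, (free_kb, max_alloc_kb) in readings.items():
    print(f"{label}: {fragmentation(free_kb, max_alloc_kb):.0%} of free heap is outside the largest block")
```

A higher value means a large single allocation is more likely to fail even though the total free heap looks sufficient, which is the situation described above.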
-
I was too optimistic... Restarted twice today, at 04:00 and 10:00.
-
Another thing to try is using the customization web page and disabling about 30-50 of the entities, or things you're not interested in seeing (from the 180 you have). And then use http://ems-esp.local/api/system/info to see the "free mem" and "max alloc".
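A small script can make that check repeatable after each change. This is a sketch under assumptions: the URL and the key names "free mem" and "max alloc" come from this thread, but the exact nesting of the JSON response is not shown here, so the helper `find_value` (a hypothetical name) simply searches the whole document for each key.

```python
import json
from urllib.request import urlopen


def find_value(node, key):
    """Depth-first search for `key` in nested dicts/lists; return None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_value(child, key)
        if found is not None:
            return found
    return None


def heap_summary(info):
    """Pick the two heap figures discussed in this thread out of the info document."""
    return {key: find_value(info, key) for key in ("free mem", "max alloc")}


def fetch_info(url="http://ems-esp.local/api/system/info", timeout=5):
    """Download and decode the system info JSON from the device."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage on a machine that can reach the device:
#     print(heap_summary(fetch_info()))
```

Running it before and after disabling entities gives two comparable readings without opening the web UI.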
-
mDNS disabled Heap (Free / Max Alloc)
-
Ok, let us know if it still crashes. We're doing some memory optimizations in the background (#869)
-
The highest uptime since Dec 20th... So looks promising.
-
See #894 Maybe something to add to the It may be memory related in I will check if these extra KB are enough to prevent the restarts. Heap (Free / Max Alloc) disabled OTA disabled ethernet |
-
Thanks Hans for all the time you put into benchmarking the scenarios. It's been really helpful. #869 will help us figure out how much heap and max-alloc we can safely live with before the ESP32 implodes. I'm working on some optimizations in a separate branch, one that includes this max-alloc in the heartbeat MQTT topic so I can see if it starts getting smaller over time. |
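Once max-alloc is published in the heartbeat, checking whether it "starts getting smaller over time" can be done with a simple least-squares slope over the collected samples. The sketch below is an illustration only: the function name `slope_per_hour` is hypothetical, the sample values are made up for the example, and it assumes each sample is an (uptime in hours, max alloc in KB) pair.

```python
def slope_per_hour(samples):
    """Least-squares slope of max alloc (KB) against uptime (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("need at least two distinct timestamps")
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return numerator / denominator


# Made-up heartbeat samples: (uptime in hours, max alloc in KB).
example = [(0, 41), (1, 40), (2, 39), (3, 38)]
print(f"max alloc trend: {slope_per_hour(example):+.2f} KB per hour")
```

A slope that stays clearly negative across restarts would point at a leak or growing fragmentation, while a slope near zero suggests the largest block is stable.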
-
Do you want to keep this one open or shall we close it? We have #869 in place to improve the heap memory, and on top of that we have #891, which allows entities to be excluded completely if they are not relevant for someone's configuration. After #869 I can try to enable functionality again.
-
I'd prefer we keep it open as a reference and reminder. I'm making progress on #869: using that build I've seen an increase in the max alloc buffer of 24% on my live environment, and in heap of 8%. It's still not enough though, loading 200+ entities crashes and I'm almost 100% sure it's related to MQTT publishing. I'll have more time in a few weeks to pick this up again, as I'm traveling extensively for work this month.
-
Do I understand that (on a high level):
If the above is true, then is the problem related to the speed of sending the MQTT messages, or to how quickly the buffer is freed up? On top of that, if the setting is QoS0 then there is a risk that messages will get lost. If someone doesn't want that, then QoS1 or QoS2 should be used. Don't get me wrong, finding and solving the root cause is better. But in the meantime we can try to avoid the restarts by using such tricks.
-
This is what I will need to investigate, now that I can reproduce the crash. The fix to prevent the restarts is to prevent over-allocation of buffers, and to report loss of messages/data as a log ERR. Then we can optimize the memory allocation.
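The idea of refusing to over-allocate and logging the loss instead can be sketched as a queue with a hard cap. This is a language-neutral illustration written in Python, not the EMS-ESP implementation: the class name `BoundedQueue`, the logger name and the cap value are all hypothetical.

```python
import logging
from collections import deque

log = logging.getLogger("bounded_queue")


class BoundedQueue:
    """FIFO queue that rejects new items when full instead of growing."""

    def __init__(self, max_items: int):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._items = deque()
        self._max_items = max_items
        self.dropped = 0  # running count of rejected messages

    def push(self, message) -> bool:
        """Queue a message; return False and log an error if the queue is full."""
        if len(self._items) >= self._max_items:
            self.dropped += 1
            log.error(
                "queue full (%d items), dropping message; %d dropped so far",
                self._max_items,
                self.dropped,
            )
            return False
        self._items.append(message)
        return True

    def pop(self):
        """Return the oldest queued message, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


queue = BoundedQueue(max_items=2)
for payload in ("boiler", "thermostat", "mixer"):
    queue.push(payload)  # the third push is rejected and logged
```

The trade-off is explicit: memory use stays bounded and the device keeps running, at the cost of dropped messages that are at least visible in the log.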
-
I have disconnection issues: the s32 drops to AP mode and disconnects from the network. I don't know whether it's related to this issue, but @proddy asked me to mention it here. I will attach my config and the screenshot of the HA Free memory and EMS Status around the time of the last disconnect.
-
I just manually rebooted HA, and it disconnected MQTT and it won't reconnect.
-
So you're hosting a Mosquitto broker on HA and restarting HA every night, and EMS-ESP doesn't reconnect, is that correct? If so, please create a new issue and include the support info. I don't think it's related to this one. I was wrong, sorry.
-
I disabled the automation, but yes, that's what happened last time. MQTT Broker is also running on HA. I will open an issue later.
-
Closing this for now, as we know it's memory related and workarounds have been published.
-
Question
I sometimes see "gaps" in some measurements in HA. Checking the EMS-ESP status, I see that around those times the status was disconnected. However, when checking the logs I do not see anything strange on the EMS-ESP.
I'm not sure yet where the problem is. Can we, based on the attached log file, exclude that the issue is in EMS-ESP?
Screenshots
Device information
emsesp_info.txt
Additional context
log 20221228_153827 to 20221229_072137.zip