
Memory optimisations ESP32 #869

Closed
7 of 13 tasks
proddy opened this issue Dec 31, 2022 · 29 comments
Labels: technical (Technical enhancement, or tech-debt issue)

Comments


proddy commented Dec 31, 2022

Look at ways to optimize memory usage, reduce heap and fragmentation

Things to investigate and try out:

  • Allocate fixed memory buffers for the large DynamicJsonDocuments needed for reading the config and sending MQTT or Web Dashboard values, to avoid fragmentation
  • Replace JSON with CBOR for the Settings stored internally in the File System. This will save us flash only.
  • Support PSRAM, using it for the JSON buffers when generating the payloads (see the sketch after this list)
  • Replace the MessagePack JSON buffers sent to the Web with a binary format, possibly CBOR too?
  • Thread-safe uuid libraries (upgrade) (Upgrade uuid console, telnet and log services #809)
  • See if upgrading to espressif 3.5.0 helps (Upgrade to platform-espressif32 6.0.0 (after 5.3.0) #862)
  • Optimize the way we build up JSON buffers, maybe try using shallowCopy - decided this won't help
  • Replace functions that return std::string with const char * where possible
  • MALLOC_CAP_32BIT flags https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/mem_alloc.html#bit-accessible-memory
  • Replace heap buffers that are malloc'd/free'd with static buffers
  • Look at using pre-allocated memory pools for JSON
  • Look at replacing std::function with bind() or going back to traditional function pointers
  • Look at IRAM and using it for fixed/slow data elements
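
On the PSRAM point above, a minimal sketch of how it could work (assuming ArduinoJson 6.x and a board that actually has PSRAM fitted; this is not current EMS-ESP code): BasicJsonDocument accepts a custom allocator, so the big documents can live in external RAM while the internal heap stays free for everything else.

#include <Arduino.h>
#include <ArduinoJson.h>
#include <esp_heap_caps.h>

// Allocator that puts JsonDocument memory in external PSRAM instead of the internal heap
struct SpiRamAllocator {
    void * allocate(size_t size) {
        return heap_caps_malloc(size, MALLOC_CAP_SPIRAM);
    }
    void deallocate(void * pointer) {
        heap_caps_free(pointer);
    }
    void * reallocate(void * ptr, size_t new_size) {
        return heap_caps_realloc(ptr, new_size, MALLOC_CAP_SPIRAM);
    }
};

using SpiRamJsonDocument = BasicJsonDocument<SpiRamAllocator>;

void build_payload() {
    SpiRamJsonDocument doc(16 * 1024); // large payload buffer lives in PSRAM
    doc["free_heap"] = ESP.getFreeHeap();
    // ... fill with entity values and serialize to MQTT / the Web as usual
}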

Default reference config:

  • standalone ESP32 4MB, WiFi only, compiled with -DEMSESP_DEBUG
  • using the same build each time, as we're only comparing differences
  • using default factory settings, except that it's connected to WiFi and the MQTT settings are present

Test with:

  • first enable/disable any settings via the WebUI
  • close any WebUI browser windows (so it doesn't reconnect)
  • run python3 scripts/run_memory_test.py -i <IP> -w 30 -t <token>

The default test is called general; it loads a boiler and a thermostat with 64 entities and waits 30 seconds before taking a measurement.

proddy added the "enhancement" label (New feature or request) on Dec 31, 2022
proddy added this to the v3.6.0 milestone on Dec 31, 2022
proddy added the "technical" label (Technical enhancement, or tech-debt issue) and removed the "enhancement" label on Dec 31, 2022


proddy commented Jan 3, 2023

@MichaelDvP we could experiment with reducing the UART buffer in EMSuart::start() and making the stack 1024 instead of 2048, like

        xTaskCreate(uart_event_task, "uart_event_task", 1024, NULL, configMAX_PRIORITIES - 1, NULL);

which gives 1 KB back in heap. I tested this and it seems to work, but it's risky. What do you think is the correct value?


MichaelDvP commented Jan 4, 2023

I've added a LOG_DEBUG("free stack: %d", uxTaskGetStackHighWaterMark(NULL)); to incoming_telegram and found only 88 bytes of free stack for this task. I don't think it will work reliably with less than a 2k stack size.
We can remove the -D CONFIG_UART_ISR_IN_IRAM from platformio.ini, freeing a bit of IRAM.
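
For anyone following along, this is the pattern being used to judge the stack size (a generic sketch of the FreeRTOS call, not the exact EMS-ESP code): the high-water mark is the minimum free stack the task has ever had, so a value near zero means the 2048-byte stack is barely enough.

#include <Arduino.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>

void uart_event_task(void * pvParameters) {
    for (;;) {
        // ... wait for and process UART events ...

        // smallest amount of stack this task has ever had left (in bytes on ESP-IDF);
        // 88 bytes of headroom means a 1024-byte stack would almost certainly overflow
        UBaseType_t high_water = uxTaskGetStackHighWaterMark(NULL);
        Serial.printf("free stack: %u\n", (unsigned)high_water);

        vTaskDelay(pdMS_TO_TICKS(1000)); // pacing only, for this sketch
    }
}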


proddy commented Jan 4, 2023

Not sure if it's related, but when I reduced the task stack to 1024 bytes, OTA stopped working at around 8% of the upload (on an E32). So I guess we leave it at 2048; it's safe and also the same as in the example library.

It seems memory is becoming an issue again, and all my ESP8266 nightmares of writing my own internal std libraries and doing code optimizations into the night are coming back! Future gateway boards will have more RAM and PSRAM to play with, but that doesn't help the thousands of existing users on the 4MB ESP32.

We can throttle and limit the WebUI and MQTT like you did, by checking the free memory before creating those large DynamicJson buffers. We could also do this dynamically by first calculating the size of the buffer from a count of the entities and then allocating that much memory for the JSON buffer - just a thought, and I'm not sure if this can be done in real time. What do you think?
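
A rough sketch of that "count first, then allocate" idea (the helper name and the per-entity slack are made up for illustration): ArduinoJson's JSON_OBJECT_SIZE macro gives a capacity estimate, and the largest free block can be checked before committing to the DynamicJsonDocument.

#include <Arduino.h>
#include <ArduinoJson.h>

// Hypothetical helper: only build the big document if the heap can take it
bool publish_entities(size_t num_entities) {
    // one key/value pair per entity, plus some slack for copied strings
    const size_t capacity = JSON_OBJECT_SIZE(num_entities) + num_entities * 24;

    if (ESP.getMaxAllocHeap() < capacity + 4096) {
        return false; // not enough contiguous heap right now - throttle or skip this publish
    }

    DynamicJsonDocument doc(capacity);
    // ... fill doc from the device values and serialize to MQTT / the Web ...
    return true;
}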

What keeps me awake at night are these crashes.

Heap on a cold start with no devices/entities is 160 KB, and drops to 100 KB with a few devices and 180 entities, which seems like more than enough headroom. I haven't seen signs of memory leakage anymore. BTW, I didn't know the mDNS library takes up 5 KB; it's a beast.

I think the max allocation/available block is what is hurting us. All the string functions (sprintf/std::string concatenation) are causing fragmentation and reducing the size of that block. I think anything < 40 KB is a problem.

The other thing I've been working on is making it more thread-safe, especially when the console, syslog and web log are all running. The ESP32 is dual-core and not all functions are single-threaded. Although I don't think that is related to the memory problems or crashes.

@MichaelDvP

Could you take a look at the POST of the weblog settings? Every time I change a setting I reproducibly get an rx error, but the UART rx is a completely different task, and it only happens on the weblog page. The settings on the weblog page are different from the others: there is no send button and the POST seems to be handled differently.

For the weblog buffer I've tested only resetting log_message_id_tail_ and letting the event source send the buffered messages one by one. Then we don't need the large MessagePack JSON document to send the whole buffer at once.

void WebLogService::fetchLog(AsyncWebServerRequest * request) {
    log_message_id_tail_ = 0; // rewind the tail so the event source re-sends the buffered messages one by one
    request->send(200);       // acknowledge; no large MessagePack document is built here
}

This works. We only need to start the id from 1 (i.e. emplace_back(++log_message_id_, ...)) to get the first message.


proddy commented Jan 5, 2023

I've created a new branch tech-upgrade for testing these changes and monitoring memory usage and performance. These changes will most likely end up in EMS-ESP 3.6.0.


proddy commented Jan 5, 2023

Another thing to look at (potentially) is the filesystem, which uses 28K of flash. Compressing it (JSON MessagePack) or using something completely different like CBOR (Concise Binary Object Representation, see https://www.rfc-editor.org/rfc/rfc7049 and https://github.com/ssilverman/libCBOR) would halve that, and we could use the space for more Web code and translations.
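
ArduinoJson can already emit MessagePack, so halving the on-flash size wouldn't necessarily need a separate CBOR library. A sketch, assuming the settings are read/written as a JsonDocument and stored on LittleFS (the path here is made up):

#include <ArduinoJson.h>
#include <LittleFS.h>

// Sketch: persist the settings document as binary MessagePack instead of plain JSON text
bool save_settings(const JsonDocument & doc) {
    File f = LittleFS.open("/config/settings.msgpack", "w"); // hypothetical path
    if (!f) {
        return false;
    }
    serializeMsgPack(doc, f); // typically noticeably smaller than serializeJson()
    f.close();
    return true;
}

bool load_settings(JsonDocument & doc) {
    File f = LittleFS.open("/config/settings.msgpack", "r");
    if (!f) {
        return false;
    }
    DeserializationError err = deserializeMsgPack(doc, f);
    f.close();
    return err == DeserializationError::Ok;
}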


proddy commented Jan 5, 2023

> I've created a new branch tech-upgrade for testing these changes and monitoring memory usage and performance. These changes will most likely end up in EMS-ESP 3.6.0.

I've made some small changes in the new branch. I should have done a PR, but I forgot I was committing to upstream, so apologies if the changes are difficult to track. One rule I applied: any buffer < 1024 bytes uses a StaticJsonDocument instead of a DynamicJsonDocument, which allocates on the stack rather than the heap, so in theory there is less fragmentation.
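
In miniature, the rule looks like this (illustrative only, not the actual EMS-ESP functions): small fixed-size payloads go on the stack, only genuinely large ones stay on the heap.

#include <Arduino.h>
#include <ArduinoJson.h>

void publish_small_status() {
    // under ~1 KB: StaticJsonDocument lives on the task stack, so no heap churn at all
    StaticJsonDocument<256> doc;
    doc["free_heap"] = ESP.getFreeHeap();

    char payload[256];
    serializeJson(doc, payload, sizeof(payload));
    // ... publish payload ...
}

void publish_dashboard() {
    // large, variable-size payloads still use the heap
    DynamicJsonDocument doc(8192);
    // ... fill with all the entity values and serialize ...
}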

proddy changed the title from "Memory optimizations" to "Memory optimisations ESP32" on Jan 6, 2023
proddy added a commit that referenced this issue on Jan 7, 2023

proddy commented Jan 7, 2023

I moved some std::strings over to const char *'s in this branch. I also added the ability to run and debug in Visual Studio Code without an ESP32, which will help memory-profile the crashes @HansRemmerswaal is experiencing.


proddy commented Jan 7, 2023

@MichaelDvP for shallowCopy() see bblanchon/ArduinoJson#1849

@MichaelDvP

Yes, the shallowCopy will help to reduce the large single-block buffers on the heap. But the crashes are another matter; they should not happen. I think a larger buffer cannot be allocated and returns nullptr. Json never returns nullptr, only a document capacity of zero, which is safe. I think it's maybe somewhere in MQTT: the boiler_data is large and needs a serialization buffer. Or thermostat_data - Hans has 2 thermostats, both published in a single payload. The MQTT 'single' format option could help.


proddy commented Jan 8, 2023

I think it's in the MQTT code too. I'm running some tests and can reproduce the crash - I just need to figure out where.

I'm using the new tech-upgrade branch, building with -DEMSESP_DEBUG, flashing to a bare ESP32, then from the Console running test memory, which replicates Hans' setup with 2 thermostats and the boiler and adds all 216 entities with dummy data. With MQTT enabled and mDNS and NTP disabled, show system comes back with "Free heap/Max alloc: 106 KB / 43 KB" (note: I also have the console and web open at this point, and the console eats up 2 KB of heap as well).

Then I turn on MQTT Discovery and ka-boom

Also, without MQTT show system gives me 114/51 (free/max alloc). When MQTT is enabled it drops slightly to 112/46, which means the MQTT code is quite optimized and efficient with its buffers.
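
For reference, the two numbers quoted here ("free" and "max alloc") come straight from the Arduino-ESP32 heap API; a tiny sketch of logging them, which is handy for watching fragmentation creep in over time:

#include <Arduino.h>

void log_heap_stats() {
    // total free heap vs. the largest single block that can still be allocated;
    // a growing gap between the two is a sign of fragmentation
    uint32_t free_heap = ESP.getFreeHeap();
    uint32_t max_alloc = ESP.getMaxAllocHeap();
    Serial.printf("Free heap/Max alloc: %u KB / %u KB\n", free_heap / 1024, max_alloc / 1024);
}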



bbqkees commented Jan 9, 2023

I currently have at least one customer who also experiences crashes when they specifically turn on MQTT.
"I can now reproduce this error too. Every time I switch MQTT on, the gateway keeps losing the connection until at some point it stops working altogether."


proddy commented Jan 9, 2023

It's something in the MQTT code for sure. I changed the test to only load the boiler (160 entities)

 without MQTT: 138/79 (heap/max) <--- Works fine
 with MQTT   : 113/39 (heap/max) <--- *CRASH*

Which points to a fragmentation problem in the MQTT code, with the max allocation block not being big enough. Investigation continues...


@MichaelDvP

Do you know what process tries to malloc such a large memory block?
I think this is the real issue. The 20k JSON is only for the web (Dashboard/Custom/Log) and the API, but MQTT only allocates small messages.
In the test build I've seen that MQTT discovery queues 160 messages, the heap goes down to 60/38, then the messages are sent and the heap goes back up to 110/38. After the last message: crash. If I reduce the max queue to 10, a lot of messages are dropped, but the crash still happens.

I think there is a char * with missing termination, and something like a serialization buffer reads memory until it finds a null.
But I cannot find it.


proddy commented Jan 12, 2023

I found a similar pattern and can simulate the crash without HA (I've been working on the automated test script), but I haven't had time to debug where it is happening. I think it's also related to a nullptr somewhere, as I can't imagine malloc() crashing the system like it does. There is also a memory leak somewhere - I've seen the heap get smaller over a period of days. I'll find time to test this all out soon.


proddy commented Jan 13, 2023

I've been running some memory tests over the last 24 hours, looking at how services affect the free heap and the max alloc block. Some interesting results; I'll summarize what I've found so far:

the good news:

  • enabling MQTT with those 29+6 entities takes up 10KB of heap, which sounds right, with hardly any change to max alloc, so there is no fragmentation problem there
  • disabling AP frees up only 1KB, so that's fine
  • MQTT QoS of 1 or 2 hardly makes a difference to memory usage
  • Telnet doesn't use any memory when enabled, but opening a session takes 6KB heap and 2KB max alloc, which doesn't always get cleaned up on exit. Not a big problem, and acceptable

the bad news:

  • adding a boiler (29 entities) and a thermostat (6 entities) takes up 28KB of free heap and 53KB of the max alloc block. That's 50% of the max alloc to start with (!!), so some fragmentation is happening when adding device entities that needs looking into
  • enabling Discovery takes up an additional 20KB heap and 12KB max alloc, so there is a fragmentation problem there (probably the strings) that needs looking into

things we can't fix but should be aware of and document:

  • Disabling OTA frees up 10KB heap and 10KB max alloc buffer so that library could do with some optimizing
  • Enabling mDNS uses 6KB of heap and 6KB max alloc buffer

I haven't tested SysLog or the Web Log yet.

@MichaelDvP

@MichaelDvP

> adding a boiler (29 entities) and a thermostat (6 entities) takes up 28KB of free heap and 53KB of the max alloc block. That's 50% of the max alloc to start with (!!), so some fragmentation is happening when adding device entities that needs looking into

This splits into adding the devices, a lot of register_telegram calls, and the 35 register_device_value calls. Are the entities all the registered ones, or only the ones with values? A register_device_value is around 100 bytes, e.g. 3.5k for the entities and 24k for the telegrams and devices.


proddy commented Jan 15, 2023

> This splits into adding the devices, a lot of register_telegram calls, and the 35 register_device_value calls. Are the entities all the registered ones, or only the ones with values? A register_device_value is around 100 bytes, e.g. 3.5k for the entities and 24k for the telegrams and devices.

It's not the free memory I'm worried about - I think we have enough. It's the max allocation block that goes from 107KB to 54KB and stays there after adding the 35 entities/device values. That is what I'm going to look into when I'm back next week. The fragmentation is mainly due to the std::string concatenations and snprintf() calls.
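
As an illustration of the kind of change that avoids this churn (generic code, not taken from EMS-ESP): building a topic or value string into a caller-provided stack buffer instead of concatenating std::strings removes the repeated small heap allocations that eat into the max-alloc block.

#include <cstdio>
#include <string>

// heap-churning version: each += may trigger a small heap (re)allocation
std::string make_topic_heap(const std::string & base, const std::string & device, const std::string & entity) {
    std::string topic = base;
    topic += "/";
    topic += device;
    topic += "/";
    topic += entity;
    return topic;
}

// fragmentation-friendly version: one caller-provided stack buffer, no heap traffic
void make_topic_stack(char * out, size_t out_len, const char * base, const char * device, const char * entity) {
    snprintf(out, out_len, "%s/%s/%s", base, device, entity);
}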

One other memory test we can add is simply calling malloc() somewhere in setup() to force the available heap down to, say, 70KB (leaving max alloc high at 100KB), and seeing how it operates.
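
A minimal version of that test hook, clearly debug-only (the 70 KB target comes from the comment above): reserve a block of heap and never free it, so everything else runs under artificial memory pressure.

#include <Arduino.h>

static void * low_heap_ballast = nullptr;

// call from setup(), after the normal services have started
void reserve_heap_for_testing() {
    const uint32_t target_free = 70 * 1024; // leave roughly 70 KB of heap available
    uint32_t       free_now    = ESP.getFreeHeap();
    if (free_now <= target_free) {
        return;
    }
    uint32_t to_reserve = free_now - target_free;
    // a single block that big may not exist; clamp to the largest free block,
    // leaving a little room for the allocator's own overhead
    if (to_reserve > ESP.getMaxAllocHeap()) {
        to_reserve = ESP.getMaxAllocHeap() - 512;
    }
    low_heap_ballast = malloc(to_reserve); // intentionally never freed
}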


proddy commented Jan 21, 2023

I'll also set OTA to be disabled by default to save 6KB of heap and alloc block. Any issues with this, @MichaelDvP?

@MichaelDvP

I see no issues. IMHO most people use the web upload; if someone really wants to use OTA they can enable it. I think we can also set this as the default in v3.5.


proddy commented Jan 22, 2023

> I see no issues. IMHO most people use the web upload; if someone really wants to use OTA they can enable it. I think we can also set this as the default in v3.5.

added to 3.5.0-dev-16


proddy commented Feb 6, 2023

@MichaelDvP I've implemented #911 (show a message in the WebUI when there are unsaved changes) and flash usage is up to 99.8% on 4MB, which breaks OTA (not enough space). I haven't looked into it yet, but it should fit the partition?


MichaelDvP commented Feb 6, 2023

The upload checks the file size, which is always a bit bigger than the reported flash size. I think this is the image header, see here: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/app_image_format.html
Maybe change this line

if (Update.begin(fsize)) {

to

if (Update.begin()) {

Then it writes the partition and does not compare the partition size with the file size. This should also be safe: if the new data does not fit in the partition, this

if (Update.write(data, len) != len) {

will give an error.
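
For context, the general shape of the suggestion using the ESP32 Update API (a simplified sketch, not the actual EMS-ESP upload handler):

#include <Update.h>

// Simplified upload handler: let Update size itself from the partition rather
// than pre-checking the reported file size against it.
bool handle_firmware_chunk(uint8_t * data, size_t len, bool final) {
    if (!Update.isRunning()) {
        if (!Update.begin()) { // defaults to UPDATE_SIZE_UNKNOWN: write until the partition is full
            return false;
        }
    }
    if (Update.write(data, len) != len) { // fails once the image no longer fits the partition
        Update.abort();
        return false;
    }
    if (final) {
        return Update.end(true); // true = finish even though the total size was never announced
    }
    return true;
}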

proddy added the "enhancement" label (New feature or request) on Feb 18, 2023

proddy commented Feb 23, 2023

I think I'm done here. I've profiled the tech-upgrade build to death, and the key conclusions are:

  • AP, MQTT, MQTT Discovery and Telnet take up little heap memory and cause little fragmentation
  • Disabling OTA and mDNS, and switching off ETH (when using WiFi only), give back a large amount of memory
  • Adding device entities takes up most of the heap and causes a lot of fragmentation, but there's nothing I can code to make this more efficient.


proddy commented Feb 23, 2023

In the tech-upgrade build I made some other modifications along the way, half of which I've forgotten with my Covid brain fog. One worth noting, however, is that I removed DEBUG from the log. It is only for developers who want to debug when building (with the -DEMSESP_DEBUG flag) and is not part of the production build.


proddy commented Feb 24, 2023

And @MichaelDvP, I've added all your changes from your dev branch to tech-upgrade. I'll keep testing for a few more days and then, if you agree, push it to dev as 3.6.0-dev-0.

proddy removed the "enhancement" label (New feature or request) on Feb 24, 2023
proddy removed this from the v3.6.0 milestone on May 25, 2023
proddy closed this as completed on Nov 19, 2023