
Memory optimisations ESP32 #869

Closed
7 of 13 tasks
proddy opened this issue Dec 31, 2022 · 29 comments
Labels: technical (Technical enhancement, or tech-debt issue)

Comments


proddy commented Dec 31, 2022

Look at ways to optimize memory usage, reduce heap and fragmentation

Things to investigate and try out:

  • Allocate fixed memory buffers for the large DynamicJsonDocuments needed for reading the config and sending MQTT or Web Dashboard values, to avoid fragmentation
  • Replace JSON with CBOR for the Settings stored internally in the File System. This will save us flash only.
  • Support PSRAM, using it for the JSON buffers when generating the payloads (see the sketch after this list)
  • Replace the MessagePack JSON buffers sent to the Web with a binary format, possibly CBOR too?
  • Thread-safe uuid libraries (upgrade) (Upgrade uuid console, telnet and log services #809)
  • See if upgrading to espressif 3.5.0 helps (Upgrade to platform-espressif32 6.0.0 (after 5.3.0) #862)
  • Optimize the way we build up JSON buffers, maybe try using shallowCopy - decided this won't help
  • Replace functions that return std::string with const char * where possible
  • MALLOC_CAP_32BIT flags https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/mem_alloc.html#bit-accessible-memory
  • Replace heap buffers that are malloc'd/free'd with static buffers
  • Look at using pre-allocated memory pools for JSON
  • Look at replacing std::function with bind() or going back to traditional function pointers
  • Look at IRAM and using it for fixed/slow data elements
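
On the PSRAM point above, a minimal sketch of how it could work (assuming ArduinoJson 6.x and a board that actually has PSRAM fitted; this is not current EMS-ESP code): BasicJsonDocument accepts a custom allocator, so the big documents can live in external RAM while the internal heap stays free for everything else.

#include <Arduino.h>
#include <ArduinoJson.h>
#include <esp_heap_caps.h>

// Allocator that puts JsonDocument memory in external PSRAM instead of the internal heap
struct SpiRamAllocator {
    void * allocate(size_t size) {
        return heap_caps_malloc(size, MALLOC_CAP_SPIRAM);
    }
    void deallocate(void * pointer) {
        heap_caps_free(pointer);
    }
    void * reallocate(void * ptr, size_t new_size) {
        return heap_caps_realloc(ptr, new_size, MALLOC_CAP_SPIRAM);
    }
};

using SpiRamJsonDocument = BasicJsonDocument<SpiRamAllocator>;

void build_payload() {
    SpiRamJsonDocument doc(16 * 1024); // large payload buffer lives in PSRAM
    doc["free_heap"] = ESP.getFreeHeap();
    // ... fill with entity values and serialize to MQTT / the Web as usual
}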

Default reference config:

  • standalone ESP32 4MB, WiFi only, compiled with -DEMSESP_DEBUG
  • using the same build each time, as we're only comparing differences
  • using default factory settings, except that it's connected to WiFi and the MQTT settings are present

Test with:

  • first enable/disable any settings via the WebUI
  • close any WebUI browser windows (so it doesn't reconnect)
  • run python3 scripts/run_memory_test.py -i <IP> -w 30 -t <token>

The default test is called general; it loads a boiler and a thermostat with 64 entities and waits 30 seconds before taking a measurement.

proddy added the "enhancement" label (New feature or request) on Dec 31, 2022
proddy added this to the v3.6.0 milestone on Dec 31, 2022
proddy added the "technical" label (Technical enhancement, or tech-debt issue) and removed the "enhancement" label on Dec 31, 2022


proddy commented Jan 3, 2023

@MichaelDvP we could experiment with reducing the UART buffer in EMSuart::start() and making the stack 1024 instead of 2048, like

        xTaskCreate(uart_event_task, "uart_event_task", 1024, NULL, configMAX_PRIORITIES - 1, NULL);

which gives 1 KB back in heap. I tested this and it seems to work, but it's risky. What do you think is the correct value?


MichaelDvP commented Jan 4, 2023

I've added a LOG_DEBUG("free stack: %d", uxTaskGetStackHighWaterMark(NULL)); to incoming_telegram and found only 88 bytes of free stack for this task. I don't think it will work reliably with less than a 2k stack size.
We can remove the -D CONFIG_UART_ISR_IN_IRAM from platformio.ini, freeing a bit of IRAM.
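
For anyone following along, this is the pattern being used to judge the stack size (a generic sketch of the FreeRTOS call, not the exact EMS-ESP code): the high-water mark is the minimum free stack the task has ever had, so a value near zero means the 2048-byte stack is barely enough.

#include <Arduino.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>

void uart_event_task(void * pvParameters) {
    for (;;) {
        // ... wait for and process UART events ...

        // smallest amount of stack this task has ever had left (in bytes on ESP-IDF);
        // 88 bytes of headroom means a 1024-byte stack would almost certainly overflow
        UBaseType_t high_water = uxTaskGetStackHighWaterMark(NULL);
        Serial.printf("free stack: %u\n", (unsigned)high_water);

        vTaskDelay(pdMS_TO_TICKS(1000)); // pacing only, for this sketch
    }
}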


proddy commented Jan 4, 2023

Not sure if it's related, but when I reduced the task stack to 1024 bytes, OTA stopped working at around 8% of the upload (on an E32). So I guess we leave it at 2048; it's safe and also the same as in the example library.

It seems memory is becoming an issue again, and all my ESP8266 nightmares of writing my own internal std libraries and doing code optimizations into the night are coming back! Future gateway boards will have more RAM and PSRAM to play with, but that doesn't help the thousands of existing users on the 4MB ESP32.

We can throttle and limit the WebUI and MQTT like you did, by checking the free memory before creating those large DynamicJson buffers. We could also do this dynamically by first calculating the size of the buffer from a count of the entities and then allocating that much memory for the JSON buffer - just a thought, and I'm not sure if this can be done in real time. What do you think?
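
A rough sketch of that "count first, then allocate" idea (the helper name and the per-entity slack are made up for illustration): ArduinoJson's JSON_OBJECT_SIZE macro gives a capacity estimate, and the largest free block can be checked before committing to the DynamicJsonDocument.

#include <Arduino.h>
#include <ArduinoJson.h>

// Hypothetical helper: only build the big document if the heap can take it
bool publish_entities(size_t num_entities) {
    // one key/value pair per entity, plus some slack for copied strings
    const size_t capacity = JSON_OBJECT_SIZE(num_entities) + num_entities * 24;

    if (ESP.getMaxAllocHeap() < capacity + 4096) {
        return false; // not enough contiguous heap right now - throttle or skip this publish
    }

    DynamicJsonDocument doc(capacity);
    // ... fill doc from the device values and serialize to MQTT / the Web ...
    return true;
}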

What keeps me awake at night are these crashes.

Heap on a cold start with no devices/entities is 160 KB, and drops to 100 KB with a few devices and 180 entities, which seems like more than enough headroom. I haven't seen signs of memory leakage anymore. BTW, I didn't know the mDNS library takes up 5 KB; it's a beast.

I think the max allocation/available block is what is hurting us. All the string functions (sprintf/std::string concatenation) are causing fragmentation and reducing the size of that block. I think anything < 40 KB is a problem.

The other thing I've been working on is making it more thread-safe, especially when the console, syslog and web log are all running. The ESP32 is dual-core and not all functions are single-threaded. Although I don't think that is related to the memory problems or crashes.

@MichaelDvP

Could you take a look at the POST of the weblog settings? Every time I change a setting I reproducibly get an rx error, but the UART rx is a completely different task, and it only happens on the weblog page. The settings on the weblog page are different from the others: there is no send button and the POST seems to be handled differently.

For the weblog buffer I've tested only resetting log_message_id_tail_ and letting the event source send the buffered messages one by one. Then we don't need the large MessagePack JSON document to send the whole buffer at once.

void WebLogService::fetchLog(AsyncWebServerRequest * request) {
    log_message_id_tail_ = 0; // rewind the tail so the event source re-sends the buffered messages one by one
    request->send(200);       // acknowledge; no large MessagePack document is built here
}

This works. We only need to start the id from 1 (i.e. emplace_back(++log_message_id_, ...)) to get the first message.


proddy commented Jan 5, 2023

I've created a new branch tech-upgrade for testing these changes and monitoring memory usage and performance. These changes will most likely end up in EMS-ESP 3.6.0.


proddy commented Jan 5, 2023

Another thing to look at (potentially) is the filesystem, which uses 28K of flash. Compressing it (JSON MessagePack) or using something completely different like CBOR (Concise Binary Object Representation, see https://www.rfc-editor.org/rfc/rfc7049 and https://github.com/ssilverman/libCBOR) would halve that, and we could use the space for more Web code and translations.
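
ArduinoJson can already emit MessagePack, so halving the on-flash size wouldn't necessarily need a separate CBOR library. A sketch, assuming the settings are read/written as a JsonDocument and stored on LittleFS (the path here is made up):

#include <ArduinoJson.h>
#include <LittleFS.h>

// Sketch: persist the settings document as binary MessagePack instead of plain JSON text
bool save_settings(const JsonDocument & doc) {
    File f = LittleFS.open("/config/settings.msgpack", "w"); // hypothetical path
    if (!f) {
        return false;
    }
    serializeMsgPack(doc, f); // typically noticeably smaller than serializeJson()
    f.close();
    return true;
}

bool load_settings(JsonDocument & doc) {
    File f = LittleFS.open("/config/settings.msgpack", "r");
    if (!f) {
        return false;
    }
    DeserializationError err = deserializeMsgPack(doc, f);
    f.close();
    return err == DeserializationError::Ok;
}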


proddy commented Jan 5, 2023

> I've created a new branch tech-upgrade for testing these changes and monitoring memory usage and performance. These changes will most likely end up in EMS-ESP 3.6.0.

I've made some small changes in the new branch. I should have done a PR, but I forgot I was committing to upstream, so apologies if the changes are difficult to track. One rule I applied: any buffer < 1024 bytes uses a StaticJsonDocument instead of a DynamicJsonDocument, which allocates on the stack rather than the heap, so in theory there is less fragmentation.
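
In miniature, the rule looks like this (illustrative only, not the actual EMS-ESP functions): small fixed-size payloads go on the stack, only genuinely large ones stay on the heap.

#include <Arduino.h>
#include <ArduinoJson.h>

void publish_small_status() {
    // under ~1 KB: StaticJsonDocument lives on the task stack, so no heap churn at all
    StaticJsonDocument<256> doc;
    doc["free_heap"] = ESP.getFreeHeap();

    char payload[256];
    serializeJson(doc, payload, sizeof(payload));
    // ... publish payload ...
}

void publish_dashboard() {
    // large, variable-size payloads still use the heap
    DynamicJsonDocument doc(8192);
    // ... fill with all the entity values and serialize ...
}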

proddy changed the title from "Memory optimizations" to "Memory optimisations ESP32" on Jan 6, 2023
proddy added a commit that referenced this issue on Jan 7, 2023

proddy commented Jan 7, 2023

I moved some std::strings over to const char *'s in this branch. I also added the ability to run and debug in Visual Studio Code without an ESP32, which will help memory-profile the crashes @HansRemmerswaal is experiencing.


proddy commented Jan 7, 2023

@MichaelDvP for shallowCopy() see bblanchon/ArduinoJson#1849

@MichaelDvP

Yes, the shallowCopy will help to reduce the large single-block buffers on the heap. But the crashes are another matter; they should not happen. I think a larger buffer cannot be allocated and returns nullptr. Json never returns nullptr, only a document capacity of zero, which is safe. I think it's maybe somewhere in MQTT: the boiler_data is large and needs a serialization buffer. Or thermostat_data - Hans has 2 thermostats, both published in a single payload. The MQTT 'single' format option could help.


proddy commented Jan 8, 2023

I think it's in the MQTT code too. I'm running some tests and can reproduce the crash - I just need to figure out where.

I'm using the new tech-upgrade branch, building with -DEMSESP_DEBUG, flashing to a bare ESP32, then from the Console running test memory, which replicates Hans' setup with 2 thermostats and the boiler and adds all 216 entities with dummy data. With MQTT enabled and mDNS and NTP disabled, show system comes back with "Free heap/Max alloc: 106 KB / 43 KB" (note: I also have the console and web open at this point, and the console eats up 2 KB of heap as well).

Then I turn on MQTT Discovery and ka-boom

Also, without MQTT show system gives me 114/51 (free/max alloc). When MQTT is enabled it drops slightly to 112/46, which means the MQTT code is quite optimized and efficient with its buffers.
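
For reference, the two numbers quoted here ("free" and "max alloc") come straight from the Arduino-ESP32 heap API; a tiny sketch of logging them, which is handy for watching fragmentation creep in over time:

#include <Arduino.h>

void log_heap_stats() {
    // total free heap vs. the largest single block that can still be allocated;
    // a growing gap between the two is a sign of fragmentation
    uint32_t free_heap = ESP.getFreeHeap();
    uint32_t max_alloc = ESP.getMaxAllocHeap();
    Serial.printf("Free heap/Max alloc: %u KB / %u KB\n", free_heap / 1024, max_alloc / 1024);
}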



bbqkees commented Jan 9, 2023

I currently have at least one customer who also experiences crashes when they specifically turn on MQTT.
"I can now reproduce this error too. Every time I switch MQTT on, the gateway keeps losing the connection until at some point it stops working altogether."


proddy commented Jan 9, 2023

It's something in the MQTT code for sure. I changed the test to only load the boiler (160 entities)

 without MQTT: 138/79 (heap/max) <--- Works fine
 with MQTT   : 113/39 (heap/max) <--- *CRASH*

Which points to a fragmentation problem in the MQTT code, with the max allocation block not being big enough. Investigation continues...


@MichaelDvP

Do you know what process tries to malloc such a large memory block?
I think this is the real issue. The 20k JSON is only for the web (Dashboard/Custom/Log) and the API, but MQTT only allocates small messages.
In the test build I've seen that MQTT discovery queues 160 messages, the heap goes down to 60/38, then the messages are sent and the heap goes back up to 110/38. After the last message: crash. If I reduce the max queue to 10, a lot of messages are dropped, but the crash still happens.

I think there is a char * with missing termination, and something like a serialization buffer reads memory until it finds a null.
But I cannot find it.


proddy commented Jan 12, 2023

I found a similar pattern and can simulate the crash without HA (I've been working on the automated test script), but I haven't had time to debug where it is happening. I think it's also related to a nullptr somewhere, as I can't imagine malloc() crashing the system like it does. There is also a memory leak somewhere - I've seen the heap get smaller over a period of days. I'll find time to test this all out soon.


proddy commented Jan 13, 2023

I've been running some memory tests over the last 24 hours, looking at how services affect the free heap and the max alloc block. Some interesting results; I'll summarize what I've found so far:

the good news:

  • enabling MQTT with those 29+6 entities takes up 10KB of heap, which sounds right, with hardly any change to max alloc, so there is no fragmentation problem there
  • disabling AP frees up only 1KB, so that's fine
  • MQTT QoS of 1 or 2 hardly makes a difference to memory usage
  • Telnet doesn't use any memory when enabled, but opening a session takes 6KB heap and 2KB max alloc, which doesn't always get cleaned up on exit. Not a big problem, and acceptable

the bad news:

  • adding a boiler (29 entities) and a thermostat (6 entities) takes up 28KB of free heap and 53KB of the max alloc block. That's 50% of the max alloc to start with (!!), so some fragmentation is happening when adding device entities that needs looking into
  • enabling Discovery takes up an additional 20KB heap and 12KB max alloc, so there is a fragmentation problem there (probably the strings) that needs looking into

things we can't fix but should be aware of and document:

  • Disabling OTA frees up 10KB heap and 10KB max alloc buffer so that library could do with some optimizing
  • Enabling mDNS uses 6KB of heap and 6KB max alloc buffer

I haven't tested SysLog or the Web Log yet.

@MichaelDvP

@MichaelDvP

> adding a boiler (29 entities) and a thermostat (6 entities) takes up 28KB of free heap and 53KB of the max alloc block. That's 50% of the max alloc to start with (!!), so some fragmentation is happening when adding device entities that needs looking into

This splits into adding the devices, a lot of register_telegram calls, and the 35 register_device_value calls. Are the entities all the registered ones, or only the ones with values? A register_device_value is around 100 bytes, e.g. 3.5k for the entities and 24k for the telegrams and devices.


proddy commented Jan 15, 2023

> This splits into adding the devices, a lot of register_telegram calls, and the 35 register_device_value calls. Are the entities all the registered ones, or only the ones with values? A register_device_value is around 100 bytes, e.g. 3.5k for the entities and 24k for the telegrams and devices.

It's not the free memory I'm worried about - I think we have enough. It's the max allocation block that goes from 107KB to 54KB and stays there after adding the 35 entities/device values. That is what I'm going to look into when I'm back next week. The fragmentation is mainly due to the std::string concatenations and snprintf() calls.
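
As an illustration of the kind of change that avoids this churn (generic code, not taken from EMS-ESP): building a topic or value string into a caller-provided stack buffer instead of concatenating std::strings removes the repeated small heap allocations that eat into the max-alloc block.

#include <cstdio>
#include <string>

// heap-churning version: each += may trigger a small heap (re)allocation
std::string make_topic_heap(const std::string & base, const std::string & device, const std::string & entity) {
    std::string topic = base;
    topic += "/";
    topic += device;
    topic += "/";
    topic += entity;
    return topic;
}

// fragmentation-friendly version: one caller-provided stack buffer, no heap traffic
void make_topic_stack(char * out, size_t out_len, const char * base, const char * device, const char * entity) {
    snprintf(out, out_len, "%s/%s/%s", base, device, entity);
}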

One other memory test we can add is simply calling malloc() somewhere in setup() to force the available heap down to, say, 70KB (leaving max alloc high at 100KB), and seeing how it operates.
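
A minimal version of that test hook, clearly debug-only (the 70 KB target comes from the comment above): reserve a block of heap and never free it, so everything else runs under artificial memory pressure.

#include <Arduino.h>

static void * low_heap_ballast = nullptr;

// call from setup(), after the normal services have started
void reserve_heap_for_testing() {
    const uint32_t target_free = 70 * 1024; // leave roughly 70 KB of heap available
    uint32_t       free_now    = ESP.getFreeHeap();
    if (free_now <= target_free) {
        return;
    }
    uint32_t to_reserve = free_now - target_free;
    // a single block that big may not exist; clamp to the largest free block,
    // leaving a little room for the allocator's own overhead
    if (to_reserve > ESP.getMaxAllocHeap()) {
        to_reserve = ESP.getMaxAllocHeap() - 512;
    }
    low_heap_ballast = malloc(to_reserve); // intentionally never freed
}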


proddy commented Jan 21, 2023

I'll also set OTA to be disabled by default to save 6KB of heap and alloc block. Any issues with this, @MichaelDvP?

@MichaelDvP

I see no issues. IMHO most people use the web upload; if someone really wants to use OTA they can enable it. I think we can also set this as the default in v3.5.


proddy commented Jan 22, 2023

> I see no issues. IMHO most people use the web upload; if someone really wants to use OTA they can enable it. I think we can also set this as the default in v3.5.

added to 3.5.0-dev-16


proddy commented Feb 6, 2023

@MichaelDvP I've implemented #911 (show a message in the WebUI when there are unsaved changes) and flash usage is up to 99.8% on 4MB, which breaks OTA (not enough space). I haven't looked into it yet, but it should fit the partition?


MichaelDvP commented Feb 6, 2023

The upload checks the file size, which is always a bit bigger than the reported flash size. I think this is the image header, see here: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/app_image_format.html
Maybe change this line

if (Update.begin(fsize)) {

to

if (Update.begin()) {

Then it writes the partition and does not compare the partition size with the file size. This should also be safe: if the new data does not fit in the partition, this

if (Update.write(data, len) != len) {

will give an error.
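
For context, the general shape of the suggestion using the ESP32 Update API (a simplified sketch, not the actual EMS-ESP upload handler):

#include <Update.h>

// Simplified upload handler: let Update size itself from the partition rather
// than pre-checking the reported file size against it.
bool handle_firmware_chunk(uint8_t * data, size_t len, bool final) {
    if (!Update.isRunning()) {
        if (!Update.begin()) { // defaults to UPDATE_SIZE_UNKNOWN: write until the partition is full
            return false;
        }
    }
    if (Update.write(data, len) != len) { // fails once the image no longer fits the partition
        Update.abort();
        return false;
    }
    if (final) {
        return Update.end(true); // true = finish even though the total size was never announced
    }
    return true;
}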

proddy added the "enhancement" label (New feature or request) on Feb 18, 2023

proddy commented Feb 23, 2023

I think I'm done here. I've profiled the tech-upgrade build to death, and the key conclusions are:

  • AP, MQTT, MQTT Discovery and Telnet take up little heap memory and cause little fragmentation
  • Disabling OTA and mDNS, and switching off ETH (when using WiFi only), give back a large amount of memory
  • Adding device entities takes up most of the heap and causes a lot of fragmentation, but there's nothing I can code to make this more efficient.


proddy commented Feb 23, 2023

In the tech-upgrade build I made some other modifications along the way, half of which I've forgotten with my Covid brain fog. One worth noting, however, is that I removed DEBUG from the log. It is only for developers who want to debug when building (with the -DEMSESP_DEBUG flag) and is not part of the production build.


proddy commented Feb 24, 2023

And @MichaelDvP, I've added all your changes from your dev branch to tech-upgrade. I'll keep testing for a few more days and then, if you agree, push it to dev as 3.6.0-dev-0.

proddy removed the "enhancement" label (New feature or request) on Feb 24, 2023
proddy removed this from the v3.6.0 milestone on May 25, 2023
proddy closed this as completed on Nov 19, 2023