Website breaks while ahoy keeps running #660

Argafal · 2023-02-07T11:28:25Z

Running 0.5.80 on an ESP8266. This problem was also present in 0.5.78 (and maybe before).

Error description: the website stops working. There is no obvious user interaction or inverter condition that I can identify as a possible cause of the problem. The problem shows like this:

MQTT data keeps coming, so AHOY seems to be running just fine. I do not get any error messages related to the webserver on the serial console. This might be related to #645, however in #645 it is reported that the website is not reachable at all, whereas I still see parts of it.

Argafal · 2023-02-07T13:31:18Z

MQTT has stopped reported production values now and only reports uptime. Here are two pages of recent serial output. Right at half way through the log the pattern of messages changes, related? I don't know the ahoy code well enough to understand what causes the change in output.

I have now rebooted ahoy.

I: Payload (62): 00 01 01 63 01 43 01 3B 04 7B 04 5E 00 04 65 72 00 03 F8 15 01 1E 01 19 01 70 01 BB 01 F2 06 5B 07 26 00 04 75 6A 00 04 7C 83 01 64 01 8F 09 4D 13 8A 15 3B 00 DD 00 E4 03 E7 00 F7 00 02
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch61 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 43 D7 00 00 00 01 00 00 00 00 DB 0D 45
W: (#2) nothing received
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch75 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 43 F5 00 00 00 01 00 00 00 00 7A 8D 5A
W: (#0) nothing received
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch3 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 44 13 00 00 00 02 00 00 00 00 C4 9A 1E
RX whatHappened Ch61 001
RX whatHappened Ch61 001
RX whatHappened Ch40 001
RX whatHappened Ch40 001
I: RX 27B Ch61 | 95 74 40 35 30 74 40 35 30 01 00 01 01 64 01 3F 01 37 04 6D 04 51 00 04 65 75 D0
I: RX 27B Ch61 | 95 74 40 35 30 74 40 35 30 02 00 03 F8 18 01 21 01 1C 01 70 01 B5 01 EC 06 47 20
I: RX 27B Ch40 | 95 74 40 35 30 74 40 35 30 03 07 12 00 04 75 6E 00 04 7C 87 01 68 01 93 09 4B DA
I: RX 27B Ch40 | 95 74 40 35 30 74 40 35 30 84 13 8A 14 FD 00 DC 00 E2 03 E7 00 F6 00 02 AC 21 C2
I: procPyld: cmd:  0xb
I: procPyld: txid: 0x95
I: Payload (62): 00 01 01 64 01 3F 01 37 04 6D 04 51 00 04 65 75 00 03 F8 18 01 21 01 1C 01 70 01 B5 01 EC 06 47 07 12 00 04 75 6E 00 04 7C 87 01 68 01 93 09 4B 13 8A 14 FD 00 DC 00 E2 03 E7 00 F6 00 02
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch23 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 44 31 00 00 00 01 00 00 00 00 65 5E 49
W: (#2) nothing received
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch40 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 44 4F 00 00 00 01 00 00 00 00 C7 77 A0
W: (#0) nothing received
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch61 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 44 6D 00 00 00 02 00 00 00 00 66 B3 EB
RX whatHappened Ch23 001
RX whatHappened Ch3 001
RX whatHappened Ch3 001
RX whatHappened Ch3 001
I: RX 27B Ch23 | 95 74 40 35 30 74 40 35 30 01 00 01 01 63 01 40 01 38 04 71 04 54 00 04 65 78 B3
I: RX 27B Ch3 | 95 74 40 35 30 74 40 35 30 02 00 03 F8 1B 01 24 01 1F 01 70 01 B6 01 ED 06 4B 2B
I: RX 27B Ch3 | 95 74 40 35 30 74 40 35 30 03 07 16 00 04 75 72 00 04 7C 8B 01 6C 01 97 09 4B CE
I: RX 27B Ch3 | 95 74 40 35 30 74 40 35 30 84 13 87 15 0B 00 DC 00 E2 03 E7 00 F7 00 02 45 E0 11
I: procPyld: cmd:  0xb
I: procPyld: txid: 0x95
I: Payload (62): 00 01 01 63 01 40 01 38 04 71 04 54 00 04 65 78 00 03 F8 1B 01 24 01 1F 01 70 01 B6 01 ED 06 4B 07 16 00 04 75 72 00 04 7C 8B 01 6C 01 97 09 4B 13 87 15 0B 00 DC 00 E2 03 E7 00 F7 00 02
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch75 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 44 8B 00 00 00 01 00 00 00 00 02 15 DF
W: (#2) nothing received
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch3 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 44 A9 00 00 00 01 00 00 00 00 A3 95 C0
W: (#0) nothing received
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch23 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 44 C7 00 00 00 02 00 00 00 00 C1 35 60
RX whatHappened Ch61 001
RX whatHappened Ch61 001
RX whatHappened Ch40 001
I: RX 27B Ch61 | 95 74 40 35 30 74 40 35 30 02 00 03 F8 1D 01 26 01 21 01 70 01 B6 01 ED 06 48 12
I: RX 27B Ch61 | 95 74 40 35 30 74 40 35 30 03 07 14 00 04 75 76 00 04 7C 90 01 70 01 9C 09 4E C1
I: RX 27B Ch40 | 95 74 40 35 30 74 40 35 30 84 13 89 14 F4 00 E2 00 E1 03 E7 00 F7 00 02 60 50 49
W: Frame 1 missing: Request Retransmit
I: TX 11B Ch40 | 15 74 40 35 30 86 99 51 75 81 9E
RX whatHappened Ch75 001
I: RX 27B Ch75 | 95 74 40 35 30 74 40 35 30 01 00 01 01 62 01 3C 01 38 04 60 04 52 00 04 65 7A DB
I: procPyld: cmd:  0xb
I: procPyld: txid: 0x95
I: Payload (62): 00 01 01 62 01 3C 01 38 04 60 04 52 00 04 65 7A 00 03 F8 1D 01 26 01 21 01 70 01 B6 01 ED 06 48 07 14 00 04 75 76 00 04 7C 90 01 70 01 9C 09 4E 13 89 14 F4 00 E2 00 E1 03 E7 00 F7 00 02
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch61 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 44 E5 00 00 00 01 00 00 00 00 60 F1 37
W: (#2) nothing received
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch75 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 45 03 00 00 00 01 00 00 00 00 C1 42 DE
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch3 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 45 21 00 00 00 02 00 00 00 00 60 86 95
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 50)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch23 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 45 3F 00 00 00 01 00 00 00 00 C0 43 FE
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch40 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 45 5D 00 00 00 01 00 00 00 00 A2 F2 53
I: enqueued cmd failed/timeout
I: (#1) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch61 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 45 7B 00 00 00 02 00 00 00 00 C3 04 EE
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch75 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 45 99 00 00 00 01 00 00 00 00 67 90 2C
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch3 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 45 B7 00 00 00 01 00 00 00 00 C6 45 6A
I: enqueued cmd failed/timeout
I: (#1) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch23 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 45 D5 00 00 00 02 00 00 00 00 A4 B0 93
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch40 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 45 F3 00 00 00 01 00 00 00 00 C5 46 32
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch61 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 46 11 00 00 00 01 00 00 00 00 6E 66 44
I: enqueued cmd failed/timeout
I: (#1) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket
I: TX 27B Ch75 | 15 74 40 35 30 86 99 51 75 80 0B 00 63 E2 46 2F 00 00 00 02 00 00 00 00 0F 3A 4B
I: enqueued cmd failed/timeout
I: (#2) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 2
I: (#2) Requesting Inv SN 1161XXX4
I: (#2) enqueuedCmd: 0xb
I: (#2) sendTimePacket
I: TX 27B Ch3 | 15 74 40 42 54 86 99 51 75 80 0B 00 63 E2 46 4D 00 00 00 01 00 00 00 00 6D CF AE
I: enqueued cmd failed/timeout
I: (#0) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 0
I: (#0) Requesting Inv SN 1141XXX9
I: (#0) enqueuedCmd: 0xb
I: (#0) sendTimePacket
I: TX 27B Ch23 | 15 72 22 17 79 86 99 51 75 80 0B 00 63 E2 46 6B 00 00 00 01 00 00 00 00 0C 7D 47
I: enqueued cmd failed/timeout
I: (#1) I: no Payload received! (retransmits: 0)
I: resetPayload: id: 1
I: (#1) Requesting Inv SN 1161XXX0
I: (#1) enqueuedCmd: 0xb
I: (#1) sendTimePacket

lumapu · 2023-02-07T20:51:54Z

I think we have some kind of memory problem. The website issue comes up because the API only returns null. There is no recover from this state and happens to me from time to time.
The log shows that there is data received RX but suddenly there will be no new data.

Argafal · 2023-02-07T22:01:57Z

Yes, I agree. I documented an OOM in issue #644, may or may not be related.

How can I help to narrow this down further? Is building the esp8266-debug target the right way of getting more debug output? (Building this target currently fails for me, so I can't try it yet, see issue #653)

lumapu · 2023-02-07T23:02:33Z

Saw both of the related issues. Don't know about that, maybe digging deeper the next days.
I try to reduce code which dynamically allocates memory, so yesterday I removed another new. I think it cannot be solved with one commit, it's more like a process over several of commits.
You can try to review the code and check for memory leaks. Maybe you find some section where the code can be optimized to reduce memory.
We use several subroutines which enlarges the stack once they're called. Maybe we have to think about a desing which is more flat without a deep hierachy.

beegee3 · 2023-02-08T12:28:28Z

@Argafal in short:
dont't know if it is related to the problem, but I saw two strange things in your log file:

in the first half of the log inverter no. 1 sends a complete payload (no frame missing), what should lead to an output on MQTT or the web page. But next line is "enqueued cmd failed/timeout". I'm wondering, how this can happen? I don't see this possibility in the program code
you set the number of retransmits to 50. This is too much. In particular combined with short intervals. Retransmits are for the case, that frames are missing and are send immediately to the inverter. With a number of 50 you may flood the inverter and it will stop answering for some time. Hence a retransmit value of 5 should be sufficient and perhaps an interval value not lower than 10 s. The latter means that you should get updates from all of your 3 inverters within 30 seconds. If you need a lower interval, please decrease it starting with 10 and take a look into the serial log to see if it's going worse.

Argafal · 2023-02-08T16:44:18Z

@beegee3 Much appreciate that you commented on the log file. The first point you make is most curious. Maybe @lumapu understands it better?

About the second point, I recently changed retransmits from 5 to 50. The reason was: I noticed that if any inverter didn't come online within the first 10-15 minutes after boot it never seems to come online. The only thing that helps is a reboot. I assumed that this means 5 connection attempts failed and ahoy has permanently given up on that inverter. Increasing the number of attempts to infinite (or in my case 50) seemed the smart thing to do: I know the inverter is there and producing, thus: "Ahoy, please keep on trying." Do I misunderstand what retransmits actually does/means?

stefan123t · 2023-02-08T19:11:37Z

@beegee3 your point 1. is the same I was about to make after looking thoroughly at the log file.
The inverter sent a complete payload and Ahoy should process that

iv->addValue(i, payload, rec); 
...
iv->doCalculations();
notify(mPayload[iv->id].txCmd);
...
iv->setQueuedCmdFinished();

decode and add the values to the internal inverter structure using memcpy
do some calculations on various composite metrics
and notify other parts of the code about the changes it received and processed.

But then instead of finishing the enqueuedCmd with success,
it complains about the command being failed/timeout ?

See here:
https://github.com/lumapu/ahoy/blob/development03/src/hm/hmPayload.h#L261-L322

@Argafal regarding Retransmits & Retries, it has been shown in the earliest HackRF traces between Hoymiles DTU WLite and HM-600 inverters that the NRF24L01+ in the inverter is only sending 15 Retries of the same packet in sequence.
But I guess you have modified the Retransmits, which is the number of times the DTU will send another 0x8N packet to retrieve the Nth packet of the complete message. So here you will remain in the same Send loop for 50 iterations and try on receiving any of the up to 4 packets each complete message / payload can be made up.
According to the logfiles you posted via the Discord / Pastebin I have seen that one of your inverters has some difficulty getting the message over without Retransmits.
I am a bit indifferent whether this is a smart / bad thing to do, if it helps 👍
However there might be difficulties in the order of the packets that the Hoymiles inverter expects in a new communication with the DTU. We may not have the correct order yet, so this may contribute to the success of reaching the inverter.
In the past we used 0x0B RealTimeRunData_Debug (4 packets) first and now we ask for 0x01 InverterDevInform_Simple (should be only 1) in the beginning.

tastendruecker123 · 2023-02-08T22:40:39Z

I have experienced this as well. For what it's worth, I was able to reboot ahoy remotely by calling /reboot in this state. Since it does look like a memory allocation issue, perhaps we should add an automatic reboot as a temporary workaround. Maybe the httpd portion of the code could keep an eye on its own responses and trigger a reboot if needed.

beegee3 · 2023-02-08T22:47:06Z

@Argafal @stefan123t delving deeper into the programming I found the explanation for 1.:
the timeout message belongs to the next inverter (no. 2). It's here: https://github.com/lumapu/ahoy/blob/development03/src/hm/hmPayload.h#L125-L129
Inverter no. 1 enqueuedCmd ended successfully and the process on inverter no. 2 started.
This one hasn't answered before (cf. a few lines above "(# 2) nothing received").

The timeout message is confusing because of the missing inverter no. and some hint like "last time ...",
but I think @lumapu liked to replace lines 125 and 126 with 127 - 129 and just forgot to delete the first two lines 😆

Sorry for wasting your time 😒

stefan123t · 2023-02-08T23:28:09Z

@beegee3 @Argafal I gave the timestamps in the sendTimePacket(s) some more attention:

# 63 E2 43 D7
$ date --date="@$(echo 'ibase=16; 63E243D7'|bc)" '+%F %H:%M:%S'
2023-02-07 13:28:07
# 63 E2 44 E5
$ date --date="@$(echo 'ibase=16; 63E244E5'|bc)" '+%F %H:%M:%S'
2023-02-07 13:32:37

But this seems to have been somewhere around lunch time, so the inverters should have eventually received enough sunshine.

Argafal · 2023-02-09T09:37:07Z

I can rule out to little sun light as a part of the problem. Super sunny the last two days.

However, I think you are onto something with the retransmits/intervals. With retransmits=5, interval=30s I waited a long time for the inverters and sometimes never caught them. With retransmits=50, interval=30s it was probably the same. Maybe worse, hard to measure. I now use retransmits=5, interval=10s, and boy that seems to have made for a big improvement. Inverters are now found much more reliably and quickly upon boot-up. Is there a reason to have a 30s default? @knickohr what is your interval time?

Related question: does ahoy ever give up on an inverter entirely, and if so, when?

On the negative side, whilst I now quickly find the inverters the webserver breaks away much more quickly than before. I don't make it through even a few minutes now before the website becomes an empty template (screenshot above). If there is anything to test or any additional debug to activate to narrow the issue down, I now seem to reproduce the problem easily.

Argafal · 2023-02-09T13:09:40Z

Additional symptom: I notice that the website occasionally recovers after a minute or two.

knickohr · 2023-02-09T13:14:07Z

I‘m still at default 30s interval time.

beegee3 · 2023-02-09T15:20:02Z

@Argafal interval describes the time period between two requests. Since a loop iterates over the inverters it needs
(number of inverters) * interval seconds that the same inverter is requested again (with your current setting 3 * 10 = 30 seconds).
The default setting interval=30 comes from older times where the logic was different in a way.
retransmits is the maximum number of single requests if a frame is missing. That is the inverter has a counter in his answers (=frames) starting with 1. In the last frame 0x80 is added to the counter. That way it is easy to detect missing frames. These will be requested again immediately (up to x times with x = your retransmits value).
If there is no answer at all there is no retransmit.

lumapu · 2023-02-15T19:23:11Z

@Argafal please check again with latest version 0.5.89

Argafal · 2023-02-17T14:06:10Z

I confirm that this issue still occurs with 0.5.89.

tastendruecker123 · 2023-02-17T14:15:53Z

@Argafal How are you using Ahoy? Is more than one client accessing the website (maybe browser + a script collecting values from the API) and are you writing limit to the inverter using either MQTT or the API?

Argafal · 2023-02-17T14:29:13Z

AHOY sends inverter data on MQTT, whenever available. No interval specified.
I do not set any settings/limits via MQTT.
I occasionally refresh the ahoy website for the purpose of our current debugging.
I have exactly one tab open. I refresh a single time and wait for the answer before sending the next refresh. Does my refresh maybe sometimes coincide with the automatic refresh of the ahoy webserver homepage, which causes twice the load?
I have enabled serial output and read the serial log.
I did not enable further debugging when building.

tastendruecker123 · 2023-02-17T15:09:28Z

I have run into the same issue with a single inverter and multiple clients, but you don't seem to be hammering the website nearly as much as I was at the time.

It might not be a memory leak as such, but more an issue with memory fragmentation that can occur if one is creating and destroying larger String objects. Dynamically allocating and releasing blocks of memory will lead to 'holes' in the heap that are simply too small the accommodate the Strings that you're trying to put in there. I'm guessing that's also why the web server is returning 'null'. It can't allocate a large enough block of memory to hold the response. That would also explain why the site sometimes receivers, if a larger block of memory is released elsewhere.

Quick primer on the subject:
https://cpp4arduino.com/2018/11/06/what-is-heap-fragmentation.html

To diagnose the problem, the same site links to a code fragment that's able to check the size of the largest available block of memory and calculate the level of fragmentation on and ESP8266. It may be wise to include this piece of code to get a better idea of what's causing the issue.

As a 'quick fix' that would be very quick to implement, we may want to check whether it's possible to allocate a large enough block of memory and reset the micro if that's not the case.

With a bit more time and effort we could use the String.reserve() function for suspicious Strings to avoid much of the dynamic memory allocation.

To properly fix the problem permanently, we may have to avoid dynamically allocated String altogether.

beegee3 · 2023-02-17T17:24:22Z

@tastendruecker123: the latest version 0.5.89 worked on the problem you describe. It seems to be fine now🤞

@lumapu: because of @Argafal's comment

I have enabled serial output and read the serial log

I looked at the code for creating the serial log output (web.h). Maybe there is a possible issue in the function serialCb where the serial buffer mSerialBuf of size WEB_SERIAL_BUF_SIZE is filled. There a two lines if((...) <= WEB_SERIAL_BUF_SIZE) {...} to avoid overflow errors. So it is possible that the buffer can be filled completely and does not contain a '0' indicating the end of the buffer, i.e. an overflow error occurs on sending it to the website.
In summary the two if clauses should be corrected to if((...) < WEB_SERIAL_BUF_SIZE) {...} (delete the = sign).

Argafal · 2023-02-18T16:11:27Z

I have changed the serial buffer check as outlined by @beegee3 above, in two places. Code base 0.5.89.
I changed from a computer USB to a USB power supply to see if power could be an issue.
The webserver still dies. MQTT continues. Same symptoms as before.
I think the serial buffer change is still a good fix to include. Maybe it helps with Memory leak - Out of Memory Exception #644, for example.
What happens if serial debug is activated but no one is listening? Does a buffer fill up? Or do messages get discarded?

lumapu · 2023-02-19T10:02:26Z

the messages at discarded or not collected. I don't think the serial buffer will lead to a problem

niekniek89 · 2023-02-22T09:28:17Z

I have the same problem as mentioned above.
I'm on the latest developer version.
after +- 24 hours of working properly, a reboot takes place.
Before that, the web server is half/unreachable.

reduced maximum allowed JSON size for API to 6000Bytes #660 small fix: output command at `prepareDevInformCmd` #692 improved inverter handling for MQTT #671

lumapu · 2023-03-09T21:46:11Z

should be solved with current dev version eg. 0.5.96

Argafal · 2023-03-09T22:29:49Z

So cool, well done. I'd love to hear what the solution was in the end. Do you care to share? :)

lumapu · 2023-03-09T23:24:58Z

main issue were strtok functions. It generates huge heap fragmentation which leads to the problems.
Never thought that before, functionality is now the same but working around strtok

Argafal · 2023-03-10T08:49:17Z

Great work, @lumapu!

Argafal · 2023-03-25T11:21:34Z

Problem ist scheinbar doch wieder aufgetaucht.... Schade.

Könnten Betroffene (und auch Nicht-Betroffene) bitte kurz diese Fragen beantworten:

ESP8266 oder ESP32?
Version?
Wie viele Inverter und welche?
Inverter-Abfrage Interval und Retries aus den Einstellungen? (Default: 30s und 5 retries)
MQTT Server konfiguriert: ja/nein?
Falls ja, auf welcher Hardware läuft der MQTT Server? (z.B. richtiger Rechner? oder Raspberry Pi? welche Variante?)
MQTT-Interval aus den Einstellungen? (Default: 0)
WebUI aktiv benutzt: ja/nein?
Typische heap_fragmentation Wert nach 20 Minuten Laufzeit? Siehe /system
Regelmässige Probleme dass die WebUI kaputt geht, ja/nein? (siehe screenshot weiter oben)

theunknownstarman · 2023-03-25T17:31:35Z

ESP8266
0.5.92
1 Hoymiles 1500
30 Sekunden
MQTT: ja
Homeassistant
MQTT Default auf "0"
WEB UI aktiv
10

Absturz wäre der falsche Begriff:
WLAN Verbindung für WEB UI geht nicht mehr (MQTT Datentransport funktioniert weiterhin).
Nach WLAN reconnect (via WLAN Überwachung wird der AHOY zum reconnect gezwungen) ist WEB UI wieder normal bis zum nächsten "Problem" erreichbar.

Meiner Einschätzung nach existiert ein Problem mit dem Port des Webservers der UI.

This was referenced Feb 7, 2023

2W at night #575

Closed

Web Interface tot aber MQTT funkt Daten #557

Closed

This was referenced Feb 20, 2023

Improve version information #701

Closed

Version 0.5.66 MQTT logout #702

Closed

lumapu mentioned this issue Feb 24, 2023

HW Watchdog und Exceptions #564

Closed

lumapu added a commit that referenced this issue Feb 25, 2023

improved html and navi, navi is visible even when API dies #660

4f0d365

reduced maximum allowed JSON size for API to 6000Bytes #660 small fix: output command at `prepareDevInformCmd` #692 improved inverter handling for MQTT #671

This was referenced Mar 3, 2023

[ESP8266] WDT reset issue #392

Closed

Memory leak - Out of Memory Exception #644

Closed

lumapu added the fixed dev fixed label Mar 9, 2023

lumapu self-assigned this Mar 9, 2023

niekniek89 mentioned this issue Mar 10, 2023

ahoydtu regularly unavailable #680

Closed

Argafal closed this as completed Mar 10, 2023

Argafal mentioned this issue Mar 13, 2023

Bug Offical 5.66 - WEB Oberfläche nicht mehr erreichbar #765

Closed

1 task

Argafal mentioned this issue Mar 25, 2023

High Heap Fragmentation with 2 inverters #768

Closed

Argafal reopened this Mar 25, 2023

lumapu closed this as completed in 9ef2df2 Mar 27, 2023

Argafal mentioned this issue Mar 29, 2023

Webgui almost empty after update, instable #828

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Website breaks while ahoy keeps running #660

Website breaks while ahoy keeps running #660

Argafal commented Feb 7, 2023

Argafal commented Feb 7, 2023 •

edited

Loading

lumapu commented Feb 7, 2023

Argafal commented Feb 7, 2023 •

edited

Loading

lumapu commented Feb 7, 2023

beegee3 commented Feb 8, 2023

Argafal commented Feb 8, 2023

stefan123t commented Feb 8, 2023

tastendruecker123 commented Feb 8, 2023

beegee3 commented Feb 8, 2023

stefan123t commented Feb 8, 2023

Argafal commented Feb 9, 2023

Argafal commented Feb 9, 2023

knickohr commented Feb 9, 2023

beegee3 commented Feb 9, 2023

lumapu commented Feb 15, 2023

Argafal commented Feb 17, 2023

tastendruecker123 commented Feb 17, 2023

Argafal commented Feb 17, 2023 •

edited

Loading

tastendruecker123 commented Feb 17, 2023

beegee3 commented Feb 17, 2023

Argafal commented Feb 18, 2023 •

edited

Loading

lumapu commented Feb 19, 2023

niekniek89 commented Feb 22, 2023

lumapu commented Mar 9, 2023

Argafal commented Mar 9, 2023

lumapu commented Mar 9, 2023

Argafal commented Mar 10, 2023

Argafal commented Mar 25, 2023 •

edited

Loading

theunknownstarman commented Mar 25, 2023 •

edited

Loading

Website breaks while ahoy keeps running #660

Website breaks while ahoy keeps running #660

Comments

Argafal commented Feb 7, 2023

Argafal commented Feb 7, 2023 • edited Loading

lumapu commented Feb 7, 2023

Argafal commented Feb 7, 2023 • edited Loading

lumapu commented Feb 7, 2023

beegee3 commented Feb 8, 2023

Argafal commented Feb 8, 2023

stefan123t commented Feb 8, 2023

tastendruecker123 commented Feb 8, 2023

beegee3 commented Feb 8, 2023

stefan123t commented Feb 8, 2023

Argafal commented Feb 9, 2023

Argafal commented Feb 9, 2023

knickohr commented Feb 9, 2023

beegee3 commented Feb 9, 2023

lumapu commented Feb 15, 2023

Argafal commented Feb 17, 2023

tastendruecker123 commented Feb 17, 2023

Argafal commented Feb 17, 2023 • edited Loading

tastendruecker123 commented Feb 17, 2023

beegee3 commented Feb 17, 2023

Argafal commented Feb 18, 2023 • edited Loading

lumapu commented Feb 19, 2023

niekniek89 commented Feb 22, 2023

lumapu commented Mar 9, 2023

Argafal commented Mar 9, 2023

lumapu commented Mar 9, 2023

Argafal commented Mar 10, 2023

Argafal commented Mar 25, 2023 • edited Loading

theunknownstarman commented Mar 25, 2023 • edited Loading

Argafal commented Feb 7, 2023 •

edited

Loading

Argafal commented Feb 7, 2023 •

edited

Loading

Argafal commented Feb 17, 2023 •

edited

Loading

Argafal commented Feb 18, 2023 •

edited

Loading

Argafal commented Mar 25, 2023 •

edited

Loading

theunknownstarman commented Mar 25, 2023 •

edited

Loading