Website breaks while ahoy keeps running #660
Comments
MQTT has now stopped reporting production values and only reports uptime. Here are two pages of recent serial output. About halfway through the log the pattern of messages changes. Is that related? I don't know the ahoy code well enough to understand what causes the change in output. I have now rebooted ahoy.
|
I think we have some kind of memory problem. The website issue comes up because the API only returns |
Saw both of the related issues. Don't know about that, maybe digging deeper the next days. |
@Argafal in short:
|
@beegee3 Much appreciate that you commented on the log file. The first point you make is most curious. Maybe @lumapu understands it better? About the second point, I recently changed retransmits from 5 to 50. The reason was: I noticed that if any inverter didn't come online within the first 10-15 minutes after boot it never seems to come online. The only thing that helps is a reboot. I assumed that this means 5 connection attempts failed and ahoy has permanently given up on that inverter. Increasing the number of attempts to infinite (or in my case 50) seemed the smart thing to do: I know the inverter is there and producing, thus: "Ahoy, please keep on trying." Do I misunderstand what retransmits actually does/means? |
@beegee3 your point 1. is the same I was about to make after looking thoroughly at the log file.
iv->addValue(i, payload, rec);
...
iv->doCalculations();
notify(mPayload[iv->id].txCmd);
...
iv->setQueuedCmdFinished();
decode and add the values to the internal inverter structure using memcpy. But then instead of finishing the enqueuedCmd with success, See here:
@Argafal regarding Retransmits & Retries, it has been shown in the earliest HackRF traces between Hoymiles DTU WLite and HM-600 inverters that the NRF24L01+ in the inverter is only sending 15 Retries of the same packet in sequence. |
I have experienced this as well. For what it's worth, I was able to reboot ahoy remotely by calling /reboot in this state. Since it does look like a memory allocation issue, perhaps we should add an automatic reboot as a temporary workaround. Maybe the httpd portion of the code could keep an eye on its own responses and trigger a reboot if needed. |
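The self-monitoring idea above could look roughly like the following. This is only a sketch under assumptions: `ResponseWatchdog`, `report()` and the failure threshold are hypothetical names that do not exist in ahoy, and the actual reboot call is left to the caller. It counts consecutive bad responses, so a single transient failure does not trigger a reboot.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ahoy): counts consecutive failed
// HTTP/API responses and signals when a reboot would be due.
class ResponseWatchdog {
public:
    explicit ResponseWatchdog(std::uint8_t maxFailures)
        : mMaxFailures(maxFailures) {}

    // Record the outcome of one response.
    // Returns true once maxFailures consecutive failures were seen.
    bool report(bool responseOk) {
        if (responseOk) {
            mFailures = 0;  // any good response resets the counter
            return false;
        }
        if (mFailures < mMaxFailures)
            ++mFailures;    // saturate instead of overflowing
        return mFailures >= mMaxFailures;
    }

    std::uint8_t failures() const { return mFailures; }

private:
    std::uint8_t mMaxFailures;
    std::uint8_t mFailures = 0;
};
```

The caller would feed `report()` with something like "the JSON body was not null" and reboot when it returns true. The threshold would need tuning so that short glitches do not cause needless reboots.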
@Argafal @stefan123t delving deeper into the programming I found the explanation for 1.: The timeout message is confusing because of the missing inverter no. and some hint like "last time ...", Sorry for wasting your time 😒 |
@beegee3 @Argafal I gave the timestamps in the sendTimePacket(s) some more attention:
# 63 E2 43 D7
$ date --date="@$(echo 'ibase=16; 63E243D7'|bc)" '+%F %H:%M:%S'
2023-02-07 13:28:07
# 63 E2 44 E5
$ date --date="@$(echo 'ibase=16; 63E244E5'|bc)" '+%F %H:%M:%S'
2023-02-07 13:32:37
But this seems to have been somewhere around lunch time, so the inverters should have eventually received enough sunshine. |
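For anyone who wants to repeat the timestamp check without `bc`, here is a minimal C++ sketch of the same conversion. It assumes the four payload bytes form a big-endian Unix timestamp, which is how the shell commands above read them. The function names are made up for illustration and are not taken from ahoy.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Combine four payload bytes (most significant first) into a
// 32-bit Unix timestamp. Big-endian order is an assumption that
// matches how the hex digits were concatenated in the shell example.
std::uint32_t decodeTimestamp(std::uint8_t b0, std::uint8_t b1,
                              std::uint8_t b2, std::uint8_t b3) {
    return (static_cast<std::uint32_t>(b0) << 24) |
           (static_cast<std::uint32_t>(b1) << 16) |
           (static_cast<std::uint32_t>(b2) << 8) |
            static_cast<std::uint32_t>(b3);
}

// Format a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
std::string formatUtc(std::uint32_t ts) {
    std::time_t t = static_cast<std::time_t>(ts);
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", std::gmtime(&t));
    return std::string(buf);
}
```

`formatUtc(decodeTimestamp(0x63, 0xE2, 0x43, 0xD7))` gives `2023-02-07 12:28:07` in UTC. The shell output is one hour later, which would be consistent with a UTC+1 local timezone. The two packets are 270 seconds apart.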
I can rule out too little sunlight as part of the problem. It has been super sunny the last two days. However, I think you are onto something with the retransmits/intervals. With retransmits=5, interval=30s I waited a long time for the inverters and sometimes never caught them. With retransmits=50, interval=30s it was probably the same. Maybe worse, hard to measure. I now use retransmits=5, interval=10s, and boy, that seems to have made for a big improvement. Inverters are now found much more reliably and quickly upon boot-up. Is there a reason to have a 30s default? @knickohr what is your interval time? Related question: does ahoy ever give up on an inverter entirely, and if so, when? On the negative side, whilst I now quickly find the inverters, the webserver breaks down much more quickly than before. I don't make it through even a few minutes now before the website becomes an empty template (screenshot above). If there is anything to test or any additional debug to activate to narrow the issue down, I now seem to reproduce the problem easily. |
Additional symptom: I notice that the website occasionally recovers after a minute or two. |
I‘m still at default 30s interval time. |
@Argafal interval describes the time period between two requests. Since a loop iterates over the inverters it needs |
@Argafal please check again with latest version |
I confirm that this issue still occurs with 0.5.89. |
@Argafal How are you using Ahoy? Is more than one client accessing the website (maybe browser + a script collecting values from the API) and are you writing limit to the inverter using either MQTT or the API? |
|
I have run into the same issue with a single inverter and multiple clients, but you don't seem to be hammering the website nearly as much as I was at the time. It might not be a memory leak as such, but more an issue with memory fragmentation that can occur if one is creating and destroying larger String objects. Dynamically allocating and releasing blocks of memory will lead to 'holes' in the heap that are simply too small to accommodate the Strings that you're trying to put in there. I'm guessing that's also why the web server is returning 'null'. It can't allocate a large enough block of memory to hold the response. That would also explain why the site sometimes recovers, if a larger block of memory is released elsewhere. Quick primer on the subject: To diagnose the problem, the same site links to a code fragment that's able to check the size of the largest available block of memory and calculate the level of fragmentation on an ESP8266. It may be wise to include this piece of code to get a better idea of what's causing the issue. As a 'quick fix' that would be very quick to implement, we may want to check whether it's possible to allocate a large enough block of memory and reset the micro if that's not the case. With a bit more time and effort we could use the String.reserve() function for suspicious Strings to avoid much of the dynamic memory allocation. To properly fix the problem permanently, we may have to avoid dynamically allocated Strings altogether. |
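The fragmentation check and the 'quick fix' described above can be sketched as two small functions. This is an illustration under assumptions, not the fragment from the linked site. The function names and the idea of a fixed `requiredBytes` are hypothetical. On an ESP8266 the inputs would presumably come from the Arduino core's `ESP.getFreeHeap()` and `ESP.getMaxFreeBlockSize()`. The code below takes plain numbers so it also compiles off-target.

```cpp
#include <cstddef>

// Share of the free heap (in percent) that is NOT inside the largest
// contiguous free block. 0 means the free memory is one single block;
// values close to 100 mean the heap is badly fragmented.
unsigned fragmentationPercent(std::size_t freeHeap, std::size_t largestBlock) {
    if (freeHeap == 0)
        return 0;                    // nothing free, nothing to fragment
    if (largestBlock > freeHeap)
        largestBlock = freeHeap;     // guard against inconsistent readings
    return static_cast<unsigned>(100 - (largestBlock * 100) / freeHeap);
}

// 'Quick fix' check: can a response of requiredBytes still be placed
// in one piece? If not, the caller could decide to reset the micro.
// requiredBytes is a hypothetical, application-specific estimate.
bool canAllocateResponse(std::size_t largestBlock, std::size_t requiredBytes) {
    return largestBlock >= requiredBytes;
}
```

With 20000 bytes free and a largest block of 5000 bytes, `fragmentationPercent` returns 75. A 6000-byte response would then fail to allocate even though plenty of memory is nominally free, which matches the 'null' symptom described above.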
@tastendruecker123: the latest version 0.5.89 worked on the problem you describe. It seems to be fine now🤞 @lumapu: because of @Argafal's comment
I looked at the code for creating the serial log output ( |
|
the messages are discarded or not collected. I don't think the serial buffer will lead to a problem |
I have the same problem as mentioned above. |
should be solved with current dev version eg. |
So cool, well done. I'd love to hear what the solution was in the end. Do you care to share? :) |
main issue were |
Great work, @lumapu! |
The problem seems to have resurfaced after all... a pity. Could those affected (and also those not affected) please briefly answer these questions:
|
"Crash" would be the wrong term: in my assessment there is a problem with the port of the UI's web server. |
Running 0.5.80 on an ESP8266. This problem was also present in 0.5.78 (and maybe before).
Error description: the website stops working. There is no obvious user interaction or inverter condition that I can identify as a possible cause of the problem. The problem shows like this:
MQTT data keeps coming, so AHOY seems to be running just fine. I do not get any error messages related to the webserver on the serial console. This might be related to #645, however in #645 it is reported that the website is not reachable at all, whereas I still see parts of it.