Wifi issues -never ending story- go back to non event based wifi? #1302
I really can't speak to this from a programming level, but from what I have been seeing, with the exception of the static IP address issue, the wifi seems to work fine when setting up a "brand new" unit. I have not seen any connection issues with "fresh" installs of the latest firmwares. Web pages load fast and the entire thing seems fast and responsive. It's when you try to upgrade that most of these issues seem to happen. It seems like there is a corruption issue when upgrading to a newer firmware. I also notice that a lot of user-compiled firmwares seem to have wifi issues. Just from reading through all these issue posts I get that impression. I could be completely wrong about that though. I am not trying to state that as a fact, just as a possibility. I can't speak to MQTT because I don't use it. Just my 2 cents worth..... |
If you are leaning towards option 3 I support you fully. I'd hate to see us drop the improvements your event based WiFi has given us. Core 2_4_x might be easier to revert/go to upstream? |
From the perspective of a user: |
Don't forget, core 2.4.x fixes some problems: |
As I see it, core 2_4_x will happen, but it may not be necessary right now. We made a bad decision when we went ahead with the core update and the event based wifi approach at the same time. We should have done them one after another. When, at the same time, we also had an update in the global settings, the problem got exceptionally hard to pinpoint. I strongly support the idea of going back to 2_3_0 while fixing wifi stability and the settings corruption. After that we can hopefully release v2.1.0 and then focus on getting core 2_4_x stable for v2.2.0 |
After clearing the settings and uploading the version from 22.04, everything is working so far. At least for now :) The only problem is that free memory is too low, even in NORMAL. We'll see how it goes. |
I have to agree with @Budman1758 and @melwinek : I also found that starting from a clean unit there are no problems at all with Wifi, static IP and settings. |
I guess we should not forget that officially we're still in the process of going from stable R120 to stable 2.1.0, and settings will not be converted between these two releases, so you need to start from scratch anyway. What we did with the update to core 2_4_x was to make a "break point" yet again. If we can live with that then it's not a problem. I agree that a clean install is really stable (at least on NORMAL, which I test most frequently). And NORMAL is the only build which will actually be in the release; test and dev are only in the development nightly releases anyway. |
What I mean is: if the current developed firmware works and is stable on a clean setup, then it means that there is nothing wrong with it. I wouldn't go back to 2.3 or to old wifi. |
Yes I hear you and I kinda agree. The only thing is that we create another break point which I guess is okay since it's still beta. |
Although it is a step back that I do not like, I'm afraid that it would be really better to go back to core 2_3_0 for now as I think some strange issues may happen due to lack of free memory on 2_4_0. |
@giig1967g I agree with you there. I do believe there are some corruption issues going on though. That might be what's screwing up the wifi, rather than the wifi itself having a lot of inherent problems. |
There are still options to get memory usage to an acceptable point. I will think about it today, what we should do, so please add more suggestions/arguments :) |
@TD-er You're right with SWITCH. |
It is also handling stuff very specific to MQTT and/or Domoticz. That should not be part of the plugin. |
Did you take a look at how wifi is implemented in other projects (e.g. Tasmota)? About memory: I told you 😄 If possible, the core should first be cleaned of those rarely used features (or they should be turned into plugins) and then optimized. Also, one could think about additional interfaces for plugins, to allow moving more functionality out of the core. |
@M0ebiu5 Agree. And one thing I learned is to ask twice about what is observed, what should be observed and what version is used. That will make things a lot more clear and lead to less mistakes. Also plugins should be just plugins to interface a sensor to some output values.
But such a redesign will take quite some effort. |
@TD-er you are right, but I would make the changes in small steps, because most parts are working stably and big changes could put this stability at risk. New interfaces to the core are one possible way. They will not influence the current behavior and only new or heavily changed plugins will use them. It will take more time to transform to a clean architecture, but with a lower risk, and the effort will also be spread over time. |
I agree that these changes should be done at ease. |
However, the node running the version from 22.04 has lost the connection. |
Nope, last night I saw the problem (in the code and happening at my own units). |
related: #1064 |
I just flashed 6 devices with the current version ESP_Easy_mega-20180425_test_ESP8266_4096.bin. I think with this version we have reached an absolute low point. |
Very strange indeed, since I wonder how you can see the web page at all with such IP config. |
no, connected via network directly. ping and http works without issues, speedy, (see the client IP is 10.0.0.10 which is my laptop, internal net is 10.0.0.0/16)... yes, quite strange though.. could it be related to the fact, that I only enabled one controller (FHEM) and no MQTT enabled controller? I saw a lot of mqtt code in the sources, outside of the controller plugin... just a guess... but I think it's not really related... |
Maybe dhcp expired and not renewed? |
could very well be... probably the network stack still has an active IP, but if the renew fails the corresponding config gets zeroed... not sure how the DHCP code works though, but it could explain the state I'm seeing. also I found that when the server (in my case fhem) fails to respond fast enough a number of times, the units start to reboot after some time. could be a problem in the plugin code or the underlying tcp stack... I did some performance tweaking on the server, since then the units run much more stable (some have uptimes over 48h now).. |
I've already seen 10+ days of connection-uptimes (uptime without any reconnect) with the latest versions. |
Today one node stopped responding again. It disconnected from wifi. I could not connect to the "esp" network. It stopped sending data to the controller. I had to reboot it. Maybe a watchdog would be a good solution: if, for example, the node has been disconnected from wifi for an hour, it reboots. Or maybe it can be done with rules, but I do not know how :) |
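The watchdog idea from the comment above could be sketched roughly as follows. This is only an illustration of the logic, not ESPEasy code: the class name `DisconnectWatchdog` and its `update()` method are made up for this example, and the one-hour timeout is simply the figure suggested in the comment.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ESPEasy): tracks how long the node has
// been without wifi and reports when a reboot would be due.
class DisconnectWatchdog {
public:
    explicit DisconnectWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call periodically with the current uptime in milliseconds and the
    // current link state. Returns true once the node has been disconnected
    // for at least timeoutMs without interruption.
    bool update(uint32_t nowMs, bool connected) {
        if (connected) {
            disconnected_ = false;  // link is up, forget any earlier outage
            return false;
        }
        if (!disconnected_) {
            disconnected_ = true;   // outage starts now
            sinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct when the ms counter wraps.
        return static_cast<uint32_t>(nowMs - sinceMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t sinceMs_ = 0;
    bool disconnected_ = false;
};
```

On a real node one would presumably feed this from the main loop with `millis()` and the wifi status, and call something like `ESP.restart()` when it returns true. Whether a forced reboot is acceptable is a separate question, as later comments in this thread show.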
Today I experienced a lot of Watchdog actions while debugging a plugin. Is it possible that hanging node of yours was never rebooted after flashing? (press reset or power cycle) |
It is possible that there was no reboot after flashing. But it was a flashing via www, not a serial. |
OK, then it shouldn't matter, if you flashed OTA. |
Well, after struggling with many stability issues and strange wifi troubles with the latest firmware releases, I had to go back to earlier versions in the end. For instance, until a recent power outage, one old ESP12E node with mega-20180311dev had been working for 70 days, sending temperature data to ThingSpeak. |
There is not much anyone can do if the issues are not reliably reproducible. |
I know, but I prefer a stable node without scheduled reboots. I don't know if the stability decreased significantly due to switching to core 2.4.1 (which may not be mature enough yet) or if it's related to the ESP Easy redesign, but it happened despite the maximal effort of all ESP Easy contributors. I really appreciate the hard work of all of you, but currently I can't use the latest ESP Easy releases anymore. |
I think it is also related to the plugin used, or maybe the combination of plugins. Last week I looked into the effects of timings and I am sure it will have a significant effect on time-critical tasks. I just looked at some of my nodes, all running official builds:
Binary filename: ESP_Easy_mega-20180513_normal_ESP8266_4096.bin
Unit 3
Unit 5
Unit 6
Binary filename: ESP_Easy_mega-20180619_test_ESP8266_4096.bin
Unit 7
About 6 days ago, I had some issues with one of my WiFi access points, which I had to restart. Unit 6 is connected to the same access point as units 3 & 7, but it has a lot more reconnects. The only difference between them is that the one with more reconnects has the Senseair sensor. Could you give a list of plugins used? |
I always use the official builds, as I am not able to set up and maintain the development environment for these devices. |
I don't know if it is a wifi issue (it does not look like one): I have managed to set static IP addresses for wifi, but ESPEasy still fetches an address via DHCP and sets a different one.
|
@uzi18 Have you set all fields for static IP config? If so, then I am afraid it is a known issue (to me), where there is some previous session stored in a region where we don't (yet) erase data at a factory reset. |
@TD-er Yes, all data filled - as you see in log. |
@TD-er: Two thoughts on that:
|
As has been described a few times, and as shown in a screenshot here: letscontrolit#1302 (comment), it looks like a DHCP request may fail, resulting in a cleared IP setup. The web server then still replies to requests, but no new connections can be made. This patch should detect such a situation, then reset the wifi and make a new connection.
Would be nice if someone with rather unstable wifi could test this PR: #1562 |
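For readers who want to picture the check described above, here is a minimal sketch. It is not the code from PR #1562. The type alias `Ipv4` and the functions `isUnset()` and `needsWifiReset()` are hypothetical names, and the assumption is simply that "cleared IP setup" means the station address reads 0.0.0.0 while the link still reports as connected.

```cpp
#include <array>
#include <cstdint>

// Assumed representation of an IPv4 address for this sketch only.
using Ipv4 = std::array<uint8_t, 4>;

// True when every octet is zero, i.e. the address is 0.0.0.0.
inline bool isUnset(const Ipv4& ip) {
    return ip[0] == 0 && ip[1] == 0 && ip[2] == 0 && ip[3] == 0;
}

// Hypothetical check: the link claims to be connected, yet the IP config
// has been cleared. In that state a wifi reset and reconnect is warranted.
inline bool needsWifiReset(bool linkConnected, const Ipv4& ip) {
    return linkConnected && isUnset(ip);
}
```

On the ESP8266 Arduino core the inputs would presumably come from `WiFi.status()` and `WiFi.localIP()`. The actual patch may look at more fields than just the local address.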
@TD-er just stumbled upon esp8266/Arduino#4718 |
I always use the latest GIT Version from the esp8266.. that's why I probably don't see the 0.0.0.0 issue anymore... |
Yesterday one node again lost contact with the network after the router was restarted. To make this easier to deal with in the future, I modified the rules:
Now life will be simpler :)) |
I do reboot every 24h. This revives one node with a firmware from 4 weeks ago twice a week. |
Is this still an issue? If so please reopen. |
Our longest thread on the issue list.... |
As a lot of you have noticed the last few weeks, there have been lots of issues with the wifi.
This all started when I changed the way wifi operates to be event based.
Some of these errors are core version related, and the update to core 2.4.0 does introduce lots of other issues.
And then there is the problem with corrupted settings, which also occurred in this period. That wasn't related to the event based wifi connect, but it made me look for a lot of other issues that were not really issues at all, just corrupted settings.
So at the moment the wifi state machine I wrote is overly complex due to the many fixes that were no fixes, because things were not broken.
And still there are other real issues, either caused by core 2.4.0 or still open wifi issues.
So now we have to choose:
Core 2.3.0 does seem to give far fewer issues and leaves more free memory.
So I guess that's my preferred base.
This means that for event based wifi, there is still some issue with respect to loading the setup page when initial config is needed.
Anyway, this has to stop now and get stable again.
There are currently way too many issues at hand that are quite hard to see as separate issues.
Any other suggestion?
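To illustrate what "event based" means in this discussion, a deliberately small state machine is sketched below. It is not the state machine from ESPEasy, which the post above describes as considerably more complex. The enum names and the `nextState()` function are assumptions made for this example, and the events only loosely mirror the station events an ESP8266 can report (connected, got IP, disconnected, DHCP timeout).

```cpp
// Hypothetical states and events for a minimal event-driven wifi handler.
enum class WifiState { Disconnected, Connected, GotIp };
enum class WifiEvent { Connected, GotIp, Disconnected, DhcpTimeout };

// Pure transition function: the next state depends only on the current
// state and the incoming event, which keeps the logic easy to test.
inline WifiState nextState(WifiState s, WifiEvent e) {
    switch (e) {
        case WifiEvent::Connected:
            // Association only matters when we were disconnected.
            return s == WifiState::Disconnected ? WifiState::Connected : s;
        case WifiEvent::GotIp:
            // An IP is only accepted after association; stale events are ignored.
            return s == WifiState::Connected ? WifiState::GotIp : s;
        case WifiEvent::Disconnected:
        case WifiEvent::DhcpTimeout:
            // Either event means the node has no usable connection.
            return WifiState::Disconnected;
    }
    return s;  // unreachable for valid events, keeps compilers quiet
}
```

Instead of polling the wifi status in the main loop, the firmware reacts to each event as it arrives. Keeping the transition table this small is one way to avoid piling up the kind of fixes for non-problems mentioned above.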