WiFi.status() is not reflecting the true state #7432
Comments
For those that may have access to the (closed source) SDK or at least knowledge of what's happening in there. My hypothesis: The enum values somewhat suggest the order of how events should happen:
typedef enum {
    STATION_IDLE = 0,
    STATION_CONNECTING,
    STATION_WRONG_PASSWORD,
    STATION_NO_AP_FOUND,
    STATION_CONNECT_FAIL,
    STATION_GOT_IP
} station_status_t;
What if the events of Also, different builds of the SDK can introduce some extra delays somewhere. And now for the possible fix. |
Can we get some info/help here? |
There are #6680 and #7391 pending. |
Well, I'm not entirely sure that will be the magic fix also, as it is the closed source part that's reporting the wrong state and perhaps some parts in there also use that wrong state. As a matter of fact, there are more bugs hidden in there, which I'm not yet able to fully detect, but I know they are there. So it is all very good to have a more uniform interface to "network" regardless of the physical interface, but I am afraid it won't fix the WiFi part here as it does appear to have fundamental issues in the closed source part. |
What would be nice is to read the commented firmware output in debug mode.
Because TCP buffers data.
That can be fixed with the above PRs in which an event is triggered when an IP is assigned. |
But how does LWIP know the IP has assigned? The events do seem to work OK, or at least more reliable compared to the WiFi status. |
The logic is: Link layer (driver) calls a lwIP function when link is up (
It's nonos-sdk. But we can add our own callbacks (open source, full control) and use them. What has to be done will be clearer after #6680 is merged (so we can make/fix things for any kind of interface). |
That's a sensible goal :) |
Have noticed the same issue while testing out what happens after the following on a Linux-based AP:
Should the status do the same as the current LwipIntfDev (for the various ethernet devices) does and simply check IP availability? FWIW the PRs above are merged. As a workaround... For example, the SDK does send the disconnection event pretty reliably, e.g.
WiFi.persistent(false);
WiFi.setAutoConnect(false);
WiFi.setAutoReconnect(false);
// ... do the setup for the ssid & pass ...
static auto disconnected = WiFi.onStationModeDisconnected([](const auto&) {
    notifyDisconnected(); // <-- connection is dead, reconnect
});
By doing the above, I've noticed that station IP settings become un-set (localIP(), gatewayIP(), subnetMask(), but not dnsIP() for some reason) and the associated lwip's Also re. above, the NONOS pdf specifically mentions that wifi_station_get_connect_status depends on the event system & autoreconnect / reconnection policy setting, and it seems it simply tracks the latest event it understands, but only for the connection routine, so we should only track it when there is an actual connection in progress. And without disabling the reconnection policy, the SDK would try to be helpful with reconnections in the background, so any manual setup / loop that checks connectivity needs to disable both of those. |
I was just about to post this same issue and then found this post. This is still an issue even in 3.0.2. The workaround I came up with is simple, and similar to what @mcspr proposed above:
void OnWiFiDisconnectedEvent(const WiFiEventStationModeDisconnected & event)
{
    (void)event;
    WiFi.disconnect();
}
// Keep the returned handler alive; the core drops the subscription once the handler is released.
static WiFiEventHandler disconnectedHandler = WiFi.onStationModeDisconnected(OnWiFiDisconnectedEvent);
Regardless of the inner workings of the core libraries, this makes isConnected and getStatus correctly return the connection loss! This particular bug has been a huge annoyance and significant time-waster to me before I finally isolated and tracked it down the other day, and now I have a couple hundred WiFi lights to update. A band-aid would have been greatly preferable in my opinion; I would never even have noticed the issue. |
Do you have any idea how often this disconnect event happens in succession? And I do feel your pain in lost hours, as I've spent well over 1000 hours on this, maybe even over 2000 hours, as this has been an issue since 2.4.x (for sure 2.5.x, but I've got the feeling it has been bugging me for longer). |
@TD-er I'm experiencing the exact same problem, and the disconnections happen roughly every 5 or 6 hours. |
5 or 6 hours sounds like a regular interval. |
Yes, the problem is that we cannot detect it. |
@TD-er I still have some disconnections that I can't detect: my MQTT client enters the reconnection loop and does not know that it can't connect because the WiFi is disconnected. Any suggestions would be really appreciated. :) Thank you! |
What I do is that I count the number of failed connection attempts. |
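A minimal sketch of that counting idea, written as plain C++ so it can be compiled off-device. `FailureCounter` and the limit passed to it are hypothetical names and values; the comment above does not say what happens once the count is reached, so that decision is left to the caller.

```cpp
// Hypothetical helper: counts consecutive failed connection attempts
// (e.g. MQTT or TCP connects) and reports when a limit is reached.
class FailureCounter {
public:
    explicit FailureCounter(unsigned limit) : limit_(limit) {}

    // Any successful connection clears the streak.
    void recordSuccess() { failures_ = 0; }

    // Returns true once `limit` consecutive failures have been seen,
    // then starts counting from zero again.
    bool recordFailure() {
        if (++failures_ < limit_) {
            return false;
        }
        failures_ = 0;
        return true;
    }

private:
    unsigned limit_;
    unsigned failures_ = 0;
};
```

In a sketch, `recordFailure()` would be called whenever a client connect call fails; a `true` result could be the cue to stop trusting `WiFi.status()` and restart the WiFi connection.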
@TD-er Holy crap, 2000 hours?? That is a LOT of hours. Update: Relying on the event does not work for all disconnections! It seems it's possible for the WiFi stack to hang up / get stuck in an inconsistent state in some situations. But I have found a different workaround that has worked perfectly for me for a couple of weeks now. @sblantipodi When WiFi is in this bad state (thinks it's connected but it has actually been disconnected), WiFi.RSSI() returns a positive value!! In fact, it returns 31 instead of the negative values we usually see. So, I now check RSSI() and if it's positive, I call WiFi.disconnect(false), then I wait for a second (actually I let other things in the loop run) and then I attempt to reconnect. So far no sustained connection losses. They always come back on their own now. |
There are more situations where you might get a positive value for RSSI. You can also get this value during a connection process and even when performing a scan, or when running in AP only mode with no client connected to it. The key ingredients of why it is now working in your test setup are the explicit disconnect and a wait. |
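The "explicit disconnect, then wait, then reconnect" sequence can be sketched as a small non-blocking state machine. Everything named here (`RecoverySequencer`, `Action`, `rssiLooksBogus`, the wait time) is hypothetical and only illustrates the ordering described in this thread. Because a positive RSSI also shows up while connecting, while scanning, or in AP-only mode, the check below only treats it as suspicious when the status claims to be connected, and even that is an assumption rather than a proven detector.

```cpp
#include <cstdint>

enum class Action { None, Disconnect, Reconnect };

// Assumed heuristic: a positive RSSI is only suspicious while the
// status still claims an established connection.
inline bool rssiLooksBogus(bool statusSaysConnected, int rssi) {
    return statusSaysConnected && rssi > 0;
}

// Hypothetical sequencer: request a disconnect, let the loop run for
// waitMs, then request a reconnect. It never blocks.
class RecoverySequencer {
public:
    explicit RecoverySequencer(std::uint32_t waitMs) : waitMs_(waitMs) {}

    // Call once per loop() pass with the current millis() value.
    Action tick(bool suspicious, std::uint32_t nowMs) {
        if (!waiting_) {
            if (!suspicious) {
                return Action::None;
            }
            waiting_ = true;
            sinceMs_ = nowMs;
            return Action::Disconnect;
        }
        // Unsigned subtraction stays correct across millis() wrap-around.
        if (nowMs - sinceMs_ < waitMs_) {
            return Action::None;
        }
        waiting_ = false;
        return Action::Reconnect;
    }

private:
    std::uint32_t waitMs_;
    std::uint32_t sinceMs_ = 0;
    bool waiting_ = false;
};
```

On the device, `Action::Disconnect` would map to `WiFi.disconnect(false)` as in the comment above, and `Action::Reconnect` to whatever reconnect call the sketch already uses.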
Is there any update on this? In my application, I have built an ensureWiFi() that pings the gateway (WiFi.gatewayIP()) and, if it cannot reach it after a few tries, the WiFi is retried, then disconnected and retried, etc. Does anyone see anything wrong with this process? It seems to go completely around the ESP's knowledge of the WiFi stack and directly tries to reach the WLAN. I'd love to hear better ideas, etc. Is there any type of "Layer 2 ping" that could see if the WiFi AP's MAC address is reachable without going to Layer 3? |
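The escalation described for ensureWiFi() might look roughly like the following. This is a sketch under assumptions, not the poster's code: `GatewayCheck`, `Step`, and the tries-per-level value are hypothetical, and the ping itself is left out so the logic can be compiled and tested off-device.

```cpp
enum class Step { Nothing, Retry, DisconnectAndRetry };

// Hypothetical escalation ladder fed with gateway ping results.
// First run of failures: ask for a plain retry.
// Further runs of failures: ask for a disconnect followed by a retry.
class GatewayCheck {
public:
    explicit GatewayCheck(unsigned triesPerLevel)
        : triesPerLevel_(triesPerLevel) {}

    Step onPingResult(bool reachable) {
        if (reachable) {
            // Gateway answered: clear the streak and de-escalate.
            failures_ = 0;
            escalated_ = false;
            return Step::Nothing;
        }
        if (++failures_ < triesPerLevel_) {
            return Step::Nothing;
        }
        failures_ = 0;
        if (!escalated_) {
            escalated_ = true;
            return Step::Retry;
        }
        return Step::DisconnectAndRetry;
    }

private:
    unsigned triesPerLevel_;
    unsigned failures_ = 0;
    bool escalated_ = false;
};
```

Keeping the ladder separate from the ping and from the WiFi calls makes it easy to tune how many failures are tolerated before each step.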
I would not consider checking on layer-2 only, as it could still lead to crashes if attempting to connect to some host where the IP stack is not ready. |
This has been an issue for quite some time and also lots of issues have been reported which are probably related to this incorrect state of the WiFi.
So this is merely a collection issue to gather all insights and link topics, as I keep finding my own replies in lots of those topics over and over again, but still feeling lost in this problem.
Related issues:
And lots more.
In essence all calls that may check the WiFi.status() and base their actions on it may run into these problems.
First let's have a look at the enum-mapping performed here:
Note that the case of STATION_CONNECTING results in WL_DISCONNECTED.
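To make that fall-through concrete, here is a self-contained sketch of such a mapping. The enums are local stand-ins for the SDK and core types, and `mapStationStatus` is a hypothetical name. Only the `STATION_CONNECTING` to `WL_DISCONNECTED` result is taken from this issue; the other branches are assumptions for illustration and may differ between core versions.

```cpp
// Local stand-ins so the sketch compiles off-device.
enum StationStatus {
    STATION_IDLE = 0,
    STATION_CONNECTING,
    STATION_WRONG_PASSWORD,
    STATION_NO_AP_FOUND,
    STATION_CONNECT_FAIL,
    STATION_GOT_IP
};

// Stand-in for the Arduino-side status; numeric values are not meaningful here.
enum WlStatus {
    WL_IDLE_STATUS,
    WL_NO_SSID_AVAIL,
    WL_CONNECTED,
    WL_CONNECT_FAILED,
    WL_WRONG_PASSWORD,
    WL_DISCONNECTED
};

// Hypothetical mapping function. STATION_CONNECTING has no case of its
// own, so it lands in the default branch and reads as WL_DISCONNECTED.
inline WlStatus mapStationStatus(StationStatus s) {
    switch (s) {
        case STATION_GOT_IP:         return WL_CONNECTED;
        case STATION_NO_AP_FOUND:    return WL_NO_SSID_AVAIL;
        case STATION_CONNECT_FAIL:   return WL_CONNECT_FAILED;
        case STATION_WRONG_PASSWORD: return WL_WRONG_PASSWORD;
        case STATION_IDLE:           return WL_IDLE_STATUS;
        default:                     return WL_DISCONNECTED;
    }
}
```

With a mapping like this, a caller polling the Arduino-level status cannot tell "still connecting" apart from "disconnected", which matters when the SDK state is stuck at `STATION_CONNECTING`.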
What I'm observing on some nodes (really hard to reproduce on some and happening almost always on others) is this:
Initial attempt to connect is stuck forever, as the WiFi status never gets to WL_CONNECTED.
I checked by calling wifi_station_get_connect_status() and see the state is stuck at STATION_CONNECTING.
However the web server may serve pages and the WiFiEventStationModeGotIP event has fired.
So all seems to be working already, but the state is not updated.
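Since the events can fire while the polled status stays stuck, one option is to keep a separate view of the connection that is driven by events and compare it with the polled status. The sketch below is plain C++ with hypothetical names (`StationState`, `disagreesWith`); on the device its methods would be called from handlers registered via `WiFi.onStationModeConnected`, `WiFi.onStationModeGotIP`, and `WiFi.onStationModeDisconnected`. As noted elsewhere in this thread, events do not cover every disconnect either, so this is a cross-check and not a replacement.

```cpp
// Hypothetical event-driven view of the station state.
struct StationState {
    bool associated = false;
    bool hasIp = false;

    // Call from the "connected" event handler.
    void onConnected() { associated = true; }

    // Call from the "got IP" event handler; having an IP implies association.
    void onGotIp() {
        associated = true;
        hasIp = true;
    }

    // Call from the "disconnected" event handler.
    void onDisconnected() {
        associated = false;
        hasIp = false;
    }

    // Usable for IP traffic according to the events seen so far.
    bool usable() const { return associated && hasIp; }

    // True when the polled status and the event-driven view disagree,
    // i.e. the node may be in the limbo state described in this issue.
    bool disagreesWith(bool polledStatusConnected) const {
        return polledStatusConnected != usable();
    }
};
```

A sketch could log or count `disagreesWith(WiFi.status() == WL_CONNECTED)` to see how often the two views diverge on a given node.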
In one issue it was mentioned to call WiFi.setAutoReconnect(true); to fix this, but that's not the magic fix here.
My work-around for this is to keep track of how long it takes to get a successful connection and if that times out, I call my own resetWiFi() function.
The initWiFi() is also called as one of the first functions in my setup().
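The timeout part of that work-around can be sketched as a small helper. This is not the code from the issue: `ConnectWatchdog` and the timeout value are hypothetical, and the time is passed in as a parameter (it would be `millis()` on the device) so the logic compiles off-device.

```cpp
#include <cstdint>

// Hypothetical helper: tracks how long a connection attempt has been running.
class ConnectWatchdog {
public:
    explicit ConnectWatchdog(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call when a connection attempt starts.
    void start(std::uint32_t nowMs) {
        startedMs_ = nowMs;
        running_ = true;
    }

    // Call once the connection is considered established.
    void stop() { running_ = false; }

    // True while an attempt is running and has exceeded the timeout.
    // Unsigned subtraction keeps this correct across millis() wrap-around.
    bool expired(std::uint32_t nowMs) const {
        return running_ && (nowMs - startedMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t startedMs_ = 0;
    bool running_ = false;
};
```

In `loop()`, a check such as `if (watchdog.expired(millis())) resetWiFi();` would then trigger the reset function mentioned above, followed by a fresh `start()`.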
The WiFi status is also incorrect when the unit gets disconnected.
For example when the ESP node is kicked from the access point (MikroTik AP allows you to disconnect a specific client via the web interface) or whatever other reason there may be to disconnect a node.
This is the code I use to detect if I have an IP-address:
N.B. the CORE_POST_2_5_0 define is set by me when compiling with a specific core version.
Sometimes, when the node gets disconnected, the WiFiEventStationModeDisconnected event is fired, but the WiFi state and/or the presence of the IP-address remains.
The only way to get out of this is to call my resetWiFi() function and start over to create a connection.
For some reason, TCP/IP traffic is not causing crashes in this WiFi limbo state, but UDP is causing crashes.
So it would be really helpful if we could either fix this or at least explain it, so we can use work-arounds which don't feel like "don't know why but it makes issues harder to reproduce", which has been the main modus operandi for the last 2 years with these WiFi issues.