Reconnect WiFi (scan for strongest AP) #731

Open
hkhaka opened this issue May 13, 2020 · 41 comments
Labels
enhancement New feature or request integration: wifi
Milestone

Comments

@hkhaka

hkhaka commented May 13, 2020

Describe the problem you have/What new integration you would like

I would like to have a function that disconnects the WiFi and then performs a new scan and connects to the strongest AP found.

Please describe your use case for this integration and alternatives you've tried:

I have an ESP32 that rides along my robot lawnmower and keeps track of some things.
I have 3 APs to cover my house and garden, but the ESP is "sticky" and stays connected to whichever AP it picks first.
I know it is possible to set the "reboot_timeout" to something quite low, but it seems unnecessary to reboot the whole ESP and lose its internal state when all I really want is to retry the connection against a better-positioned AP.

Additional context

@randybb

randybb commented May 13, 2020

But that would not be the cleanest solution. A better fit is probably a function that checks the RSSI and, if it falls below a threshold, initiates a scan and reconnects to a stronger AP.
For the proper implementation of roaming we have been waiting for years... https://esp32.com/viewtopic.php?t=3885 / espressif/esp-idf#3671
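
The threshold check suggested here can be sketched as a small helper. This is a hedged illustration only: the class name `RssiRescanTrigger`, the threshold, and the "several weak samples in a row" rule are assumptions made for this sketch and are not part of ESPHome.

```cpp
// Hypothetical helper: decides when a weak signal should trigger a rescan.
// It requires several consecutive weak samples so that a single bad reading
// does not cause a reconnect.
class RssiRescanTrigger {
 public:
  RssiRescanTrigger(int threshold_dbm, int required_samples)
      : threshold_dbm_(threshold_dbm), required_samples_(required_samples) {}

  // Feed one RSSI sample (in dBm). Returns true when a rescan should start.
  bool update(int rssi_dbm) {
    if (rssi_dbm >= threshold_dbm_) {
      weak_count_ = 0;  // signal recovered, start counting again
      return false;
    }
    ++weak_count_;
    if (weak_count_ < required_samples_) return false;
    weak_count_ = 0;  // re-arm: the next rescan needs a fresh run of weak samples
    return true;
  }

 private:
  int threshold_dbm_;
  int required_samples_;
  int weak_count_ = 0;
};
```

Fed with periodic signal-strength readings, `update()` would return true only after the signal has stayed below the threshold for the configured number of samples in a row.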

@brandond

brandond commented May 13, 2020

It'd be nice if the ESP-IDF would support 802.11r (fast roaming) but they do not, as far as I know. Without fast roaming you have to go through the whole disassociate/scan/reassociate process. @randybb linked to the open esp-idf issue tracking fast roaming support.

If they're just now working on adding it to ESP32 I can't imagine we'll ever see it for esp8266.

@hkhaka
Author

hkhaka commented May 14, 2020

Agree. Proper roaming would for sure be the best. But as that does not seem to be happening any time soon, this is a "something" (hopefully more feasible) rather than "nothing", in the meantime.
Even with proper, powerful devices and a really good roaming implementation, wifi is wifi, and the connection will be lost or interrupted now and then. Communication IS interrupted while scanning for other APs, since there is only one radio chip and antenna. There is no physical way around that, and the more channels there are to scan, the longer it takes. With the right implementation a scan could still be really quick (scan one channel, talk again for a bit, scan the next channel, talk again, and so on), but an ESP does have limited capabilities. Good WiFi applications should always assume that wifi can drop, and buffer or resend data, but that can also be tricky to implement.
For me, a hard "reconnect" would at least give better control than a full reboot.

@mmiller7

mmiller7 commented Jul 22, 2020

I would like something that would periodically rescan too.

I have several ESPHome smart plugs (S31) and 3 WiFi APs thru my house to provide sufficient coverage. Sometimes I will have to reboot an AP (for firmware updates, reconfigure channel/security/add-VLAN/etc, troubleshooting, power-failure not all are on UPSs) and then they will connect back to whichever happens to come up first. In some cases, this could be connecting to the far end of the house and latch on forever.

I have tried setting multiple networks with the BSSID set for priority to the expected nearest one but this still doesn't work well if the AP boots up slower than the S31 (which is most times). It would be much better if it could properly support roaming in some way.

Using a full reboot just to scan for WiFi is very annoying if it's connected to a light and worse if it's a TV, Radio, or other device that can not tolerate a brief blip without doing a full "reboot cycle" and probably not good for the relay contacts if it's a higher current load such as a washing-machine, dishwasher, etc.

An additional problem: as the plugs boot up, it seems to pick the first AP it sees (by lowest channel number?) to connect to, regardless of signal strength. This complicates the issue further... my two networks specifying a BSSID with higher priority help, but that is still a poor workaround. Proper periodic scanning and selecting by signal would correct this problem too.

@sergeolkhovik

Vote for this too, I'd like to have any such functionality as well!

@glmnet
Member

glmnet commented Jul 22, 2020

Remember to upvote FR using the 👍 on OP

@OttoWinter
Member

I mean the current priority system already does do that, no?

If you have multiple matching networks with the same priority, they will automatically be chosen in a round-robin fashion because on each disconnect the previous network gets a -1 priority penalty.

If a device connects to a wrong network, it will stay there as long as it's still connected. But as soon as the wifi connection drops it will automatically choose the best network again.
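
The round-robin effect of the -1 penalty can be illustrated with a simplified model. This is a sketch of the behaviour as described in this comment, not ESPHome's actual code; `KnownNetwork`, `pick_network`, and `on_disconnect` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Simplified model of the described priority system, not ESPHome's real code.
struct KnownNetwork {
  std::string name;
  float priority;
};

// Index of the network with the highest priority (the first one wins ties).
// Returns -1 for an empty list.
int pick_network(const std::vector<KnownNetwork> &nets) {
  int best = -1;
  for (int i = 0; i < static_cast<int>(nets.size()); ++i) {
    if (best < 0 || nets[i].priority > nets[best].priority) best = i;
  }
  return best;
}

// Apply the -1 penalty to the network that just dropped, then choose again.
int on_disconnect(std::vector<KnownNetwork> &nets, int dropped) {
  nets[dropped].priority -= 1.0f;
  return pick_network(nets);
}
```

Starting from three networks with equal priority, successive disconnects walk through all of them before returning to the first. Note that in this model the rotation only happens on a disconnect, so a device that stays connected to a weak network never re-evaluates its choice.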

@mmiller7

mmiller7 commented Jul 25, 2020

> I mean the current priority system already does do that, no?

No, it does not.

Critical distinction, "strongest AP" is NOT the same as "strongest network".
You can have many APs (BSSIDs) with one network (SSID). This is increasingly common with mesh networks and prosumer infrastructure.

> If you have multiple matching networks with the same priority, they will automatically be chosen in a round-robin fashion because on each disconnect the previous network gets a -1 priority penalty.

This doesn't help when they are the same network, but multiple access points (same SSID, many BSSID to provide redundancy, distribute load, and improved coverage) and it locks onto a weaker BSSID when a stronger one is available and it won't ever switch to the stronger one. In my experience, on bootup the ESP also frequently does not pick the strongest BSSID for the specified SSID either (seems to go by lowest channel number not signal for which BSSID on a given SSID???)

> If a device connects to a wrong network, it will stay there as long as it's still connected. But as soon as the wifi connection drops it will automatically choose the best network again.

That is part of the problem, especially with multiple BSSIDs on the same SSID. It should scan periodically (or when the signal gets below a threshold) and reconnect to the strongest one. If it doesn't pick the strongest BSSID to begin with, it never tries again either.

There is already a way to have it report back its signal strength; there should be a way to have that drop below a threshold and trigger a scan/reconnect without rebooting (which interrupts whatever it's doing, for example cycling the relay on a smart plug and making your TV/radio/whatever reboot too). At least then it would be possible to design something that makes it retry periodically with an automation, without having to reboot.

Additionally, if it is insisting on using an AP with a poor signal (say at the far end of your house as I often observe) and causing many re-transmitted packets due to low (but not enough to disconnect) signal quality, that introduces significant performance overhead and impact not only to the ESP device but all other devices on the same access point.
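
The SSID/BSSID distinction drawn in this comment can be made concrete with a small sketch. It contrasts picking the strongest AP for a given SSID with the lowest-channel-first behaviour that is only suspected in this thread. The struct and function names are hypothetical, and the BSSIDs in any sample data would be placeholders.

```cpp
#include <string>
#include <vector>

// One AP seen during a scan. Several entries can share the same SSID.
struct ScanEntry {
  std::string ssid;
  std::string bssid;  // MAC address of the AP radio
  int channel;
  int rssi_dbm;       // closer to 0 is stronger
};

// Returns the index of the strongest AP broadcasting `ssid`, or -1 if none.
int pick_strongest(const std::vector<ScanEntry> &scan, const std::string &ssid) {
  int best = -1;
  for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
    if (scan[i].ssid != ssid) continue;
    if (best < 0 || scan[i].rssi_dbm > scan[best].rssi_dbm) best = i;
  }
  return best;
}

// The behaviour suspected in this thread (an assumption, not confirmed):
// take the matching AP on the lowest channel, ignoring signal strength.
int pick_lowest_channel(const std::vector<ScanEntry> &scan, const std::string &ssid) {
  int best = -1;
  for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
    if (scan[i].ssid != ssid) continue;
    if (best < 0 || scan[i].channel < scan[best].channel) best = i;
  }
  return best;
}
```

With one SSID served by a weak AP on channel 1 and a strong AP on channel 11, the two functions return different entries, which is exactly the mismatch described above.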

@glmnet
Member

glmnet commented Nov 18, 2020

I'm suffering from this now. Has anyone dug into the required changes already?

@gknops

gknops commented Nov 26, 2020

Tasmota fixed this a couple of years ago, but it is disabled by default... (SetOption56/SetOption57, see arendst/Tasmota#3173). Absolutely needed for ESPHome, without it the devices are nearly guaranteed to do the wrong thing on multi-AP networks. That is causing me big headaches currently.

@gknops

gknops commented Nov 26, 2020

Ah, workaround for static setups (not the riding lawnmower mentioned above): define the bssid in the WiFi settings. Not as elegant as an automatic scan though, plus it requires one to guess in edge cases.

@gknops

gknops commented Nov 26, 2020

Sorry, that was nonsense. The BSSID steers the network, not the specific AP.

@glmnet
Member

glmnet commented Nov 27, 2020

I expect the bssid thing to work, however, it will stick to one AP
will test it

@mmiller7

mmiller7 commented Nov 27, 2020

> Sorry, that was nonsense. The BSSID steers the network, not the specific AP.

BSSID is what steers the AP (it's the wireless radio's MAC address) the SSID is the network.

Workaround for non-moving is to set the BSSID in network settings BUT if the access point goes offline you still need it to scan for any SSID as a backup (lower priority network). Then I had to make switches, sensors, and automations to reboot the ESPs any time an AP goes offline to force them to re-scan for the one they are supposed to be on.
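
The intent of this two-entry workaround (a pinned BSSID with high priority plus an any-AP fallback with lower priority) can be sketched as follows. This models the intended outcome only; the selection logic inside ESPHome may differ, and all names here are hypothetical.

```cpp
#include <string>
#include <vector>

// One configured network entry; an empty bssid means "any AP with this SSID".
struct NetworkEntry {
  std::string ssid;
  std::string bssid;
  float priority;
};

// One AP seen during a scan.
struct SeenAp {
  std::string ssid;
  std::string bssid;
  int rssi_dbm;
};

// True if the configured entry accepts this AP.
bool matches(const NetworkEntry &cfg, const SeenAp &ap) {
  return cfg.ssid == ap.ssid && (cfg.bssid.empty() || cfg.bssid == ap.bssid);
}

// Returns the index into `scan` of the AP to join, or -1 if nothing matches.
// A higher configured priority wins; RSSI only breaks ties.
int choose_ap(const std::vector<NetworkEntry> &cfg, const std::vector<SeenAp> &scan) {
  int best = -1;
  float best_prio = 0.0f;
  for (int i = 0; i < static_cast<int>(scan.size()); ++i) {
    for (const NetworkEntry &entry : cfg) {
      if (!matches(entry, scan[i])) continue;
      bool better = best < 0 || entry.priority > best_prio ||
                    (entry.priority == best_prio && scan[i].rssi_dbm > scan[best].rssi_dbm);
      if (better) {
        best = i;
        best_prio = entry.priority;
      }
    }
  }
  return best;
}
```

While the pinned AP is visible it wins even against a stronger neighbour; when it is offline the fallback entry picks the strongest remaining AP. Nothing in this model re-runs the choice once the pinned AP comes back, which is why the workaround still needs a forced reboot or rescan.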

@glmnet
Member

glmnet commented Nov 27, 2020

Anyway, what Otto said should work, so this is a bug and not a feature request. Just making that clear. I cannot debug it now: no time, and my problematic nodes are hard to serial debug for now.

@Pack3tL0ss

As @randybb mentioned, the ideal would be a typical roaming implementation, but in case that's too heavy a lift while we are waiting for upstream, here are some thoughts.

-- 1 --
Perhaps adding a setting for minimum RSSI. AP sees multiple BSSIDs (same SSID).

For example, one of mine is currently connected to an AP with an RSSI of -90 dBm, while there is an AP far closer at -65 dBm. If I were able to set a minimum, the -90 dBm AP would always be ignored during the scan (unless it is the only one available, in which case it should be logged).

That would likely be easier than the avg user determining the BSSID.

-- 2 --
Second thought is if the ESP is not selecting the strongest BSSID during initial SCAN... I would hope that's something that can be rectified. I have about 30 of these now, each client that has a weak connection (so lower data rates) slows the network for all clients on that channel...

If that's really the case (esp doesn't select BSSID based on signal) we should really add an alert/note in the wifi component documentation to warn users with multi-AP networks.

-- 3 --
Third: An action to wifi.rescan that can be called by on_value from the signal strength sensor.

Again the most ideal is a traditional scan/thresholds and proper roaming support.
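
The minimum-RSSI idea from suggestion 1 can be sketched like this. It assumes the candidates arrive in whatever order the device would otherwise try them, skips those below the minimum, and falls back to the strongest AP only when nothing qualifies. The names and the fallback flag are illustrative assumptions, not an existing ESPHome option.

```cpp
#include <vector>

struct Choice {
  int index;           // position in the input list, -1 if the list is empty
  bool below_minimum;  // true when the fallback had to be used (worth logging)
};

// `rssi_dbm` holds the signal of each matching BSSID in the device's trial order.
Choice choose_with_min_rssi(const std::vector<int> &rssi_dbm, int min_rssi_dbm) {
  int strongest = -1;
  for (int i = 0; i < static_cast<int>(rssi_dbm.size()); ++i) {
    if (rssi_dbm[i] >= min_rssi_dbm) return {i, false};  // first acceptable AP
    if (strongest < 0 || rssi_dbm[i] > rssi_dbm[strongest]) strongest = i;
  }
  // Nothing met the minimum: use the strongest AP seen, if there was one.
  return {strongest, strongest >= 0};
}
```

With the figures from the comment (-90 dBm and -65 dBm) and a hypothetical minimum of -80 dBm, the weak AP is skipped even if it is tried first.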

@justlikeef

Running into this issue also. Did the initial flash in the house, then moved esp32 to garage where there is enough signal to see the original AP, but the connection is very unreliable. Even when the esp32 completely falls off the network, it won't try to connect to the AP that's in the garage because it can still see the one inside. I had to take my laptop to the garage and do the initial programming on a different esp32 in the garage to get it to use the AP in there.

Forcing it to a specific BSSID will not resolve the issue because if that AP ever fails and it tries to connect to one of the APs furthest away, I'll be right back in the same situation.

@randybb

randybb commented Feb 14, 2021

The esp8266 has somewhat better WiFi reception with PCB antennas than the esp32. Indoors I don't have problems (I have an AP for every 50 m²), but outside it is another story. I have been using ESP32s with external antennas (ESP32-WROOM-32U) without any problems in places where an ESP32 with a PCB antenna would have problems.

@mmiller7

> As @randybb mentioned, the ideal would be a typical roaming implementation, but in case that's too heavy a lift while we are waiting for upstream, here are some thoughts.
>
> -- 1 --
> Perhaps adding a setting for minimum RSSI. AP sees multiple BSSIDs (same SSID).
>
> For example, one of mine is currently connected to an AP with an RSSI of -90 dBm, while there is an AP far closer at -65 dBm. If I were able to set a minimum, the -90 dBm AP would always be ignored during the scan (unless it is the only one available, in which case it should be logged).
>
> That would likely be easier than the avg user determining the BSSID.
>
> -- 2 --
> Second thought is if the ESP is not selecting the strongest BSSID during initial SCAN... I would hope that's something that can be rectified. I have about 30 of these now, each client that has a weak connection (so lower data rates) slows the network for all clients on that channel...
>
> If that's really the case (esp doesn't select BSSID based on signal) we should really add an alert/note in the wifi component documentation to warn users with multi-AP networks.
>
> -- 3 --
> Third: An action to wifi.rescan that can be called by on_value from the signal strength sensor.
>
> Again the most ideal is a traditional scan/thresholds and proper roaming support.

I have observed #2 with my Sonoff S31 plugs on ESPHome...they go by the lowest channel number they see regardless of signal strength...which in some cases means picking the worst signal.

Really hope something can be figured out better than the mess of improvised BSSID and manually rebooting when it "might" be on a different AP...

@micronen

micronen commented Mar 5, 2021

I have the same problem: 3 APs in a mesh configuration and 1 AP from the internet provider. The mesh does a periodic reset every night, so the ESP devices connect to the internet provider's AP, which is a much less stable connection.
The Priority system complicates things further - an AP that undergoes a reset cycle gets its priority lowered...
So even if on the first day the unit is connected to the best AP, on the next day it won't return to it. Only after a few connect/disconnect cycles the best AP can be selected again. The nodes will connect to every available AP before returning to the first one.

I tried to implement a background scan on the ESP8266; as mentioned above, it interrupts the connection. Each channel takes about 100 ms to scan, so I thought that scanning one channel every 10 s would be a reasonable compromise: a small delay for events to be sent and a full scan every few minutes (14 channels × 10 s/channel = 140 s for a full scan; 100 ms / 10 s = 1% of the time, with events delayed by up to 100 ms).
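
The arithmetic behind this interleaved scan can be written down as a tiny helper, which makes it easy to try other intervals. The struct and function names are made up for this sketch; the figures are the ones given in the paragraph above.

```cpp
// Cost of scanning one channel at a time between normal traffic.
struct ScanBudget {
  double full_scan_seconds;  // time until every channel has been visited once
  double airtime_fraction;   // share of time the radio is away scanning
};

ScanBudget interleaved_scan_budget(int channels, double interval_s, double dwell_s) {
  return {channels * interval_s, dwell_s / interval_s};
}
```

For 14 channels, one channel every 10 s, and 0.1 s per channel, this gives 140 s for a full pass and a scanning share of 0.01 (1%). Halving the interval to 5 s would halve the pass time to 70 s and double the share to 2%.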

I thought that the TCP/IP stack would handle a small interruption in the link, but the system became unstable.
The small interruption in the WiFi link caused the TCP/IP link to disconnect, probably because the HW reports some problem when an operation is cancelled or refused during the scan.
Each disconnect leads to a heap allocation, and the stack waits about 10 minutes before releasing the old connection's memory. According to the TCP/IP docs that is a feature and not a bug, but for the ESP8266 it is catastrophic, because the heap is too small for this.

The same happens if full scans are initiated too frequently. The current implementation supports re-scanning, but it must be used sparingly.

Why does the API use TCP/IP? Why not use UDP? It uses fewer resources, and the API messages are small, so an event-driven datagram approach is simpler than the streaming approach used by TCP/IP.
Use a periodic PING to verify that the connection is alive, and an ACK to make sure the data is received (if required by the sensor).

@mmiller7

mmiller7 commented Mar 6, 2021

UDP would complicate other things, it is not a reliable protocol ("best effort" is often reliable enough in a quiet network, but if the packet is lost it will never retry...would need to rewrite applications to keep retrying and checking if it worked...much more effort). If it was controlling a light show with a nonstop stream of commands, UDP would be better but for simple on off and button press TCP is the reasonable option.

I'm curious what "unstable" meant - maybe there are some timeouts or buffers that can be adjusted somewhere? I would have thought it would "look" like high latency (which should be fine, and can naturally exist).

I wouldn't even be that upset if it couldn't do full-on roaming if there was a way to do a scheduled reboot say middle of the night and it properly found the best AP...but as it currently stands they seem to prefer the first by channel # rather than signal strength.

@micronen

micronen commented Mar 6, 2021

See esp8266/Arduino#4213 or http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
~150 bytes is not a lot, but when the heap is small it gets fragmented, and that may eventually lead to a malloc failure.

I might be wrong about why the heap is running out - but I got a strong correlation between network errors and heap running out. It sometimes comes back, if no network error occurs for a long time - so it isn't a normal memory leak

@jhenkens

Seems like a super reasonable desired feature, but with some technical limitations. Perhaps attempting it via a rescan, only available as an action that can be called at first?

@tobox

tobox commented Oct 7, 2021

As mentioned before, this has been solved by tasmota years ago. I am running 22 ESP8266 and 4 Mesh-APs successfully with tasmota and I am looking into migrating to ESPhome. But without some working "roaming" support it will definitely not work, because I know the troubles I had with tasmota until the "poor man's wifi roaming" was implemented.

It is probably not a very clean solution, but it works rock stable for years. All ESPs are always connected to the best AP, after AP reboot they switch to the next best AP, and after a fixed timespan (22 minutes iirc) they move back to the original AP if the RSSI is significantly better.
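
The "move back only if the RSSI is significantly better" rule described here is a form of hysteresis. A minimal sketch, assuming a fixed margin in dB; the function name and the margin value are hypothetical and are not taken from Tasmota's source.

```cpp
// Decide whether to leave the current AP for a candidate found by a periodic scan.
// The margin keeps the device from flapping between two APs of similar strength.
bool should_switch(int current_rssi_dbm, int candidate_rssi_dbm, int margin_db) {
  return candidate_rssi_dbm - current_rssi_dbm >= margin_db;
}
```

With a hypothetical margin of 10 dB, a device at -80 dBm would move to an AP seen at -60 dBm, but a device at -70 dBm would stay put if the alternative is only at -65 dBm.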

@pug306d

pug306d commented Nov 22, 2021

I was checking the signal strength of some of my devices, and they were not connected to the nearest AP even though the signal was poor. I assumed they would reconnect to the best AP every 5 minutes like Tasmota does, but then I discovered this request!

Even a few reboots did not help; ESPHome still connected to the distant AP even though there was a much closer one. I had to put fast_connect: off in the config (which I believe should be the default), and now it's connected to the nearest AP.

Will be good to get this feature in ESPHome, I was migrating from Tasmota for a few devices but need to pause that now...

@rradar rradar added the enhancement New feature or request label Mar 12, 2022
@0x3333

0x3333 commented Mar 16, 2022

Any news on this? Anyone with a workaround or something?

@micronen

For a workaround, using API callable services, you can try something like this:

api:
  services:
    - service: scan_wifi
      then:
        - lambda: |-
            wifi::global_wifi_component->start_scanning();
    - service: scan_reset
      then:
        - lambda: |-
            // Reset old priorities for known networks
            for (auto &scan : wifi::global_wifi_component->get_scan_result()) {
              if (wifi::global_wifi_component->has_sta_priority(scan.get_bssid())) {
                wifi::global_wifi_component->set_sta_priority(scan.get_bssid(), 0);
              }
            }

The priority of a network drops every time it disconnects from the AP. In my setup, the mesh routers reset every night (don't ask), so the priority score is useless and just causes the unit to choose the wrong AP.

I've tried multiple ways of calling start_scanning periodically to keep track of the AP with the strongest RSSI, but sometimes the scan causes the link to Home Assistant to disconnect. And sometimes even causes ESPHome reboot.

For now I've added the following patch to handle the nightly AP resets.

time:
  - platform: homeassistant
    id: homeassistant_time
    on_time:
      # Every 30 minutes, at early morning
      - seconds: 0
        minutes: /30
        hours: 4-6
        then:
          - lambda: |-
              if (wifi::global_wifi_component->wifi_rssi() < -60) {
                // PATCH: Reset old priorities for known networks
                for (auto &scan : wifi::global_wifi_component->get_scan_result()) {
                  if (wifi::global_wifi_component->has_sta_priority(scan.get_bssid())) {
                    wifi::global_wifi_component->set_sta_priority(scan.get_bssid(), 0);
                  }
                }
                // Rescan
                wifi::global_wifi_component->start_scanning();
              }

@ThisIsTheOnlyUsernameAvailable

It looks like 802.11k & v are now supported, and I believe this would help.

Does the linked sample code aid in integrating into ESPHome?

@rradar rradar added this to the Top Requested milestone Jun 30, 2022
@nagyrobi
Member

nagyrobi commented Jul 1, 2022

esphome/esphome#3600

@Kilowatt-W

👍

@HausnerR

@micronen hey, I have a different (and I think better) solution for you, and maybe for others who come here too.

I have a similar problem (in my case an unreliable router), and I made an ESPHome device that restarts this one mesh point by cutting its power for a couple of seconds if its 2.4 GHz network is unavailable.

While the closest mesh point is unavailable, the ESP connects to any mesh point and periodically checks whether the closest one is available again.
In my case I want to go back to the closest one ASAP, so I scan every minute while it is unavailable, but you can adjust the scan interval to something longer (the ESP can't be connected and scan simultaneously).

Here is the YAML:

esphome:
  name: router_restarter
  platform: ESP8266
  board: esp8285

wifi:
  networks:
  - ssid: !secret wifi_ssid
    password: !secret wifi_password
    bssid: 0A:8E:BC:81:25:1A
    priority: 1000000
  - ssid: !secret wifi_ssid
    password: !secret wifi_password
  reboot_timeout: 3min
  id: w

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption: !secret encryption_key
  reboot_timeout: 0s
  id: a

# Enable OTA updates
ota:
  password: !secret ota_password

binary_sensor:
  - platform: gpio
    pin:
      number: GPIO0
      mode: INPUT_PULLUP
      inverted: true
    id: b
    filters:
      - delayed_on_off: 10ms
    on_press:
      - switch.turn_off: r

sensor:
  - platform: template
    id: wifi_disconnected
    update_interval: 1s
    lambda: |-
      const uint8_t target_bssid[6] = { 0x0A, 0x8E, 0xBC, 0x81, 0x25, 0x1A };
      static int time_disconnected = 0;

      if (id(w).is_connected() && memcmp(id(w).wifi_bssid().data(), target_bssid, 6) == 0) {
        time_disconnected = 0;
      } else {
        time_disconnected++;

        // Every minute force reconnect to look for main network. 5 sec earlier because it takes time to start reconnect
        if (time_disconnected % 60 == 55) id(w).retry_connect();

        // After first 3 minutes restart Guest router
        if (time_disconnected > 180 && time_disconnected % 180 == 15) id(r).turn_off();
      }

      return time_disconnected;

switch:
  - platform: gpio
    pin: GPIO12
    id: r
    restore_mode: ALWAYS_OFF
    on_turn_off:
      - delay: 3s
      - switch.turn_on: r

status_led:
  id: led
  pin:
    number: GPIO13
    inverted: true

In my case it's a sensor that floods the logs every second, but I see no problem with turning it into a periodic automation.
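
The timing encoded in the two modulo conditions of the lambda above is easy to misread, so here is the same logic as a standalone function. `TickActions` and `actions_for_tick` are names introduced for this sketch; the conditions are copied from the lambda, and one tick corresponds to one run of the 1 s sensor update.

```cpp
// Which actions fire for a given value of the disconnect counter.
struct TickActions {
  bool retry_connect;       // corresponds to id(w).retry_connect()
  bool power_cycle_router;  // corresponds to id(r).turn_off()
};

TickActions actions_for_tick(int time_disconnected) {
  TickActions a;
  a.retry_connect = (time_disconnected % 60 == 55);
  a.power_cycle_router = (time_disconnected > 180 && time_disconnected % 180 == 15);
  return a;
}
```

Reconnect attempts therefore happen at counts 55, 115, 175, and so on, while the first router power cycle happens at count 195 (not 180) and then repeats every 180 counts.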

@NODeeJay

NODeeJay commented Jul 6, 2023

Aside from the new feature (since 2023.06) to disable and re-enable WiFi for a moment if the RSSI drops under a certain value, which may achieve this with out-of-the-box features, also for Arduino, there is another option to satisfy your exact use case from the antenna side, but you must be able to install OpenWRT and DAWN on the antennas. Depending on the ESP used, this ranges from gently asking the client to roam to refusing AUTH and ASSOC, so that less cooperative clients are forced to roam.

I had the issue with robots and devices that move around constantly: they always stuck to the same antenna until they lost the connection completely for a couple of seconds, and the device was not reachable for ~20 seconds. If there is interest I can post my DAWN config.

@mmiller7

> Aside from the new feature (since 2023.06) to disable and re-enable WiFi for a moment if the RSSI drops under a certain value, which may achieve this with out-of-the-box features, also for Arduino, there is another option to satisfy your exact use case from the antenna side, but you must be able to install OpenWRT and DAWN on the antennas. Depending on the ESP used, this ranges from gently asking the client to roam to refusing AUTH and ASSOC, so that less cooperative clients are forced to roam.

In my case, with this original problem, ESPHome would scan and then keep trying to connect to the lowest channel number, which in some rooms was below that RSSI threshold. It then ended up fighting with the AP sending reassociation packets and would fail to connect at all while sitting directly under another AP with full signal on a higher channel number.

@NODeeJay

I cannot confirm that. In my case they take a moment (up to 10 minutes), but then they finally move over to the other antenna. ESP32s seem to perform better here than the 8266/8285. I restarted Balder2 15 minutes before I took the screenshot; all ESPs switched to Balder1, but those with a better RSSI at Balder2 (which does not necessarily mean they are close to that antenna) were kicked several times, and the PROBE request did the rest.
image

my DAWN part:

config metric 'global'
	option min_probe_count '3'
	option bandwidth_threshold '60'
	option use_station_count '0'
	option max_station_diff '1'
	option eval_probe_req '1'
	option eval_auth_req '0'
	option eval_assoc_req '0'
	option kicking '3'
	option kicking_threshold '15'
	option deny_auth_reason '1'
	option deny_assoc_reason '17'
	option min_number_to_kick '3'
	option chan_util_avg_period '3'
	option set_hostapd_nr '1'
	option duration '0'
	option rrm_mode 'pat'

I have a battery-powered ESP; I'll check over the next few days and wander around with it. Let's see what the OpenWrt logs say.

@clinta

clinta commented Jul 20, 2023

I'm having the same issue: ESPs constantly connect to distant access points. I tried prioritizing the closest access point by using multiple network definitions and giving the BSSID of the closest AP the highest priority, but it still doesn't connect to its closest AP.

wifi:
  networks:
  - ssid: !secret 'wifi_ssid'
    password: !secret 'wifi_password'
    priority: 0.0
  - ssid: !secret 'wifi_ssid'
    bssid: <closest AP mac address>
    password: !secret 'wifi_password'
    priority: 10.0
  fast_connect: false
  domain: .iot.<my-domain>
  power_save_mode: NONE
  ap:
    ssid: chicken-coop-light-hs
    ap_timeout: 1min
  reboot_timeout: 15min
  output_power: 20.0
  passive_scan: false
  enable_on_boot: true
  use_address: chicken-coop-light.iot.<my-domain>

One note is that the AP it is connecting to is on channel 1, the AP I want it to connect to is on channel 11. I wonder if it's prioritizing the lower channel, rather than prioritizing the bssid I want it to use.

@popy2k14

Same here on all of my ESPHome devices whenever I update or restart one of my APs.
I think this should be fixed in the framework rather than worked around with rescanning.

@tbrasser

tbrasser commented Jul 8, 2024

well, seems almost another year has passed, any updates or good workarounds?

@popy2k14

popy2k14 commented Jul 8, 2024

+1

@strumf666

AP roaming is needed indeed.

@vumaddibly

As little seems to be happening to this feature request, I've just created FR #2981 which would add a workaround to the ESPHome dashboard.

@pug306d

pug306d commented Jan 3, 2025

Any update on this, is it in the roadmap at all to be added?
