WiFi client machines failing to (re)connect to RPi 4 internal hotspot w/ passwords in South Africa [hostapd generates EAPOL errors; usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] #2696
Comments
A new batch of data from @fisherbryan — problems appeared when the 4th tablet tried to connect to the RPi 4 IIAB's WiFi hotspot:
His Android 6.0 phone shows "Denied access to network" |
These are something I haven't seen yet from 2DjFUm?en:
https://www.spinics.net/lists/linux-wireless/msg182294.html suggests that "PSM's watchdog has fired!" is a trap for an error |
These look normal, apart from eth0 getting an IP address late, but clients should still be able to reach box.lan via the WiFi hotspot.
I count 7
That is interesting, but it lacks context: is that after the 7 above are connected, or before? |
FYI @fisherbryan added these 2 lines to the RPi 4 IIAB's
Based on https://stackoverflow.com/questions/32205140/hostapd-debug-level-configuration/32209729#32209729 |
This (might) be completely unrelated but FYI someone with the new 5.10 kernel is experiencing WiFi problems on a Raspberry Pi 3 B+: raspberrypi/linux#2522 (comment) |
Interesting, but that is as a client. ap0 should already be in promiscuous mode as part of the bridge. Don't enable promiscuous mode on wlan0 if ap0 is present: I did a quick test, and that will break ap0.
|
Thanks to @fisherbryan and @jvonau who successfully tested 20 simultaneous WiFi client devices, as also confirmed on the server (RPi 4 IIAB) with command:
These WiFi client devices are all kicked off intermittently, approx every 30-60min (?), typically right after this appears in dmesg output:
However sometimes the situation appears worse, e.g. with
Does this mean the WiFi firmware crashed and/or the SDIO bus is busy? In any case, it's now one of those times, so I'm pasting in a new batch of log files from @fisherbryan's RPi 4 IIAB:
|
Thanks to @jvonau who's added extra hostapd logging using
As compared to the original here: https://github.com/iiab/iiab/blob/master/roles/network/templates/hostapd/hostapd.service.j2 |
Normal
Crash point
Blocked?
|
EAPOL-Key timeout is mentioned at https://www.spinics.net/lists/linux-wireless/msg181537.html |
Maybe we really should try out the latest hostapd 2.9 (1.5 years old) after all? It's prepackaged as part of Ubuntu 20.10+ on RPi 4 if it's too hard to force 2.9 onto RaspiOS? (Whereas RaspiOS's current hostapd 2.7 is 2.25 years old, and potentially suspicious if we believe the above posting!) |
2.9 seems to work fine on Ubuntu here, but I don't have the clients to do the needed testing. I would think this is more of a firmware/driver issue; see https://www.spinics.net/lists/linux-wireless/msg181417.html (same thread).
It suggests that a client "going out of range" can be a trigger. Note that was for a 43430 chip, not the 43455, but it is easily testable anyway: just walk away from the AP with a phone connected and see what occurs. |
@jvonau do you know if it's possibly relevant that hostapd 2.8 "fixed PTK rekeying with FILS and FT"?
FT = fast transition a.k.a. fast roaming (https://en.wikipedia.org/wiki/IEEE_802.11r-2008) from July 2008
FILS = fast initial link setup (https://en.wikipedia.org/wiki/IEEE_802.11ai) from June 2017 |
FYI this same problem seems to have occurred 17min after boot with a different RPi 4, running a fresh copy of IIAB on 64-bit Ubuntu Server 20.10 (with hostapd 2.9, on kernel 5.8). The number of connected WiFi clients fell suddenly from 20 to 1...and then a couple minutes later to zero connected WiFi clients:
Enhanced logging is needed to better understand what is happening when this is reproduced. |
After putting this in /etc/systemd/system/hostapd.service : (Line 18, on this new RPi 4 = Bryan_RPI4_64_bit_Ubuntu_Server_20.10)
I then ran:
WiFi clients immediately started connecting. This leveled off at 16. Then, within about 2-3 minutes of 16 WiFi clients connecting, some kind of crash occurred, with almost all WiFi clients being kicked off:
Here's the new logging from file |
For now I've added the following to the bottom of Bryan_RPI4_64_bit_Ubuntu_Server_20.10's /etc/hostapd/hostapd.conf for additional logging:
And then rebooted (soon after). |
copied syslog to ~/syslog1
sudo grep "d8:a2:5e:96:18:8d" syslog1
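The grep above follows a single client. To look at every client in the copied syslog at once, a minimal Python sketch along these lines could group lines by MAC address. The function name `lines_by_mac` is hypothetical, and the pattern simply matches any colon-separated six-byte address, so it assumes nothing about hostapd's message wording (it may also pick up address-like fragments of other fields).

```python
import re
from collections import defaultdict

# Matches a colon-separated MAC address such as d8:a2:5e:96:18:8d.
# Assumption: client addresses are logged in this form.
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)


def lines_by_mac(lines):
    """Map each MAC address (lowercased) to the log lines that mention it."""
    grouped = defaultdict(list)
    for line in lines:
        # A line may mention several MACs; file it once under each of them.
        for mac in {m.lower() for m in MAC_RE.findall(line)}:
            grouped[mac].append(line.rstrip("\n"))
    return dict(grouped)
```

For example, `lines_by_mac(open("syslog1", errors="replace"))` would return a dictionary keyed by client address, which makes it easier to see whether all clients drop at the same moment or one by one.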
|
Notes for the future, https://www.spinics.net/lists/linux-wireless/msg208259.html mentions unaligned data on the sdio bus while |
Anybody wanting to change (Likely the situation is exacerbated by high-capacity firmware put in place by |
Should somebody want to pursue the firmware debugging further as root (sudo won't cut it) |
We should also try to find out whether we can break RPi passworded hotspots with a single WiFi client machine. If so, further isolating the conditions that reproduce this failure will be extremely easy (and enlightening!), as outlined here: (Similar to Bryan in South Africa, who can break his RPi 4 passworded hotspots within ~2min every time by actively browsing on just a couple of WiFi client devices... quite possibly it's not at all necessary to have 7-to-20 WiFi clients simultaneously connected?!) |
I have, with no breakage; you are free to duplicate the effort.
Summarizing the net effect without gathering the brcmfmac debug data does nothing to advance our understanding of the root cause. |
raspberrypi/firmware#1522: does a non-MMC OS root partition (i.e. USB boot) correct slow WiFi access? Easy to test: remove the SD card, insert it into an SD-card-to-USB3 adapter, and boot from USB. |
Enable brcmfmac debugging from cmdline.txt and list of options for debugging |
Theory "brcmfmac: brcmf_sdio_bus_txdata deferring pktq len" should increment before crash presents itself. |
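One way to check this theory against captured logs is sketched below in Python. It is only a sketch under assumptions: it assumes brcmfmac debugging is enabled so the "deferring pktq len" lines are actually logged, that a decimal queue length follows that text, and that the crash is marked by the "PSM's watchdog has fired!" line quoted in the ticket title. The function names are hypothetical.

```python
import re

# Assumption: the debug line ends with a decimal queue length.
PKTQ_RE = re.compile(r"brcmf_sdio_bus_txdata.*deferring pktq len (\d+)")
WATCHDOG_MARK = "PSM's watchdog has fired!"


def pktq_before_watchdog(lines, window=20):
    """For each watchdog line, return the last `window` pktq lengths seen before it."""
    recent = []   # pktq lengths seen since the previous watchdog event
    results = []
    for line in lines:
        match = PKTQ_RE.search(line)
        if match:
            recent.append(int(match.group(1)))
        elif WATCHDOG_MARK in line:
            results.append(recent[-window:])
            recent = []
    return results


def is_rising(values):
    """True if the values never decrease and end higher than they started."""
    if len(values) < 2:
        return False
    never_drops = all(a <= b for a, b in zip(values, values[1:]))
    return never_drops and values[-1] > values[0]
```

If the theory holds, `is_rising(...)` should be true for most of the lists returned by `pktq_before_watchdog(...)` when it is fed the dmesg or syslog lines from a failing run.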
FYI in a separate South African city with a nearly identical RPi 4 IIAB, hotspot passwords were turned off (
Output below cleaned up from
|
I see a couple of reboots with logging being out of order
With a reboot of IIAB afterwards, right? I'm going to assume the IIAB was shut down to move it. After the change to no passwords before shipping, were the clients told to "forget network" and then re-select the desired SSID, and was everything working OK? Given this is a different location, how does the RF spectrum differ? Known-good power supply? Capturing 'the why' is always the hardest, and without being at a console I can't poke for info... got to go for now, I'm being a bad hockey fan... |
Good catch. And as a result of /etc/cron.hourly/fake-hwclock auto-saving the clock to disk 17min past every hour, time records very often (appear to) begin 17min past the hour. @fisherbryan: can you please keep test machines plugged into the Internet whenever possible? So their clocks get the correct time (from the Internet, right after boot). Which will make WiFi logging far more meaningful & understandable. |
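To spot where a log was affected by this, a small Python sketch can flag every point where the syslog timestamp jumps backwards, which is what a reboot without Internet time would look like if the clock is restored from the last fake-hwclock save. It assumes the traditional "Mon DD HH:MM:SS" syslog prefix with English month names; since that prefix carries no year, the `year` parameter is an arbitrary placeholder (a leap year, so that Feb 29 parses). The function names are hypothetical.

```python
from datetime import datetime


def parse_stamp(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp; None if absent."""
    parts = line.split()
    if len(parts) < 3:
        return None
    try:
        # The year is a placeholder: classic syslog lines do not record it.
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def backward_jumps(lines, year=2000):
    """Return (line_number, previous_time, new_time) wherever time goes backwards."""
    jumps = []
    prev = None
    for number, line in enumerate(lines, start=1):
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable timestamp
        if prev is not None and stamp < prev:
            jumps.append((number, prev, stamp))
        prev = stamp
    return jumps
```

Any jump reported by `backward_jumps(...)` marks a spot where the ordering of the surrounding WiFi log entries should not be trusted.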
A direct report from the woman running the training in South Africa yesterday afternoon, to clarify the participants' problematic experience:
Background: all tablets were "Point of View (China) with Android 10". |
FYI others (including @fisherbryan in South Africa) attempted over the past week, but have been unable to reproduce the issue of many tablets losing their WiFi connections to RPi 4's internal hotspot (when
|
This might be only very tangentially related, but here is @fisherbryan's kolibri.txt log file from an IIAB on its way to deployment, which might shed a bit of light on one or the other of those WiFi issues we were working on: (on-ticket above, 2-3 weeks ago) |
We need more logs from @fisherbryan (the output of dmesg, and the pastebin resulting from iiab-diagnostics).
So far we have what appears to be /var/log/syslog:
We suspect these facts are not relevant to the problem:
host_country_code: ZA was set in /etc/iiab/local_vars.yml
iiab_gateway_enabled: True was set in /etc/iiab/local_vars.yml
These aspects however (might) be relevant:
/etc/hostapd/hostapd.conf: originally the password contained an '@', but FYI even after removing that special character, the problems persisted
dmesg | grep brcm
Last week's output might (or might not) be related:
./iiab-network )
Refs: #823, #1737, #2610, PR #2686