Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFi client machines failing to (re)connect to RPi 4 internal hotspot w/ passwords in South Africa [hostapd generates EAPOL errors; usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] #2696

Closed
holta opened this issue Mar 1, 2021 · 34 comments

Comments

@holta
Copy link
Member

holta commented Mar 1, 2021

We need more logs from @fisherbryan (the output of dmesg, and the pastebin resulting from iiab-diagnostics).

So far we have what appears to be /var/log/syslog :

1 Android v6.0
1 -Tablet,
Then Adding another up to 10 [tablets..]
http://sprunge.us/2DjFUm?en

Result, some tablets are connecting and some [are] not...my desktop ubuntu machine also can't connect to wifi

We suspect these facts are not relevant to the problem:

  • host_country_code: ZA was set in /etc/iiab/local_vars.yml
  • iiab_gateway_enabled: True was set in /etc/iiab/local_vars.yml
  • RaspiOS includes hostapd 2.7 — hostapd 2.9 could easily be tested using Ubuntu on RPi 4 is nec

These aspects however (might) be relevant:

  • A wpa2 password is being used in /etc/hostapd/hostapd.conf — originally the password contained an '@' — but FYI even after removing that special character, the problems persisted
  • RaspiOS recently apt updated from Linux kernel 5.4 LTS to 5.10 LTS
  • RPi 4 IIAB is running WiFi firmware 7.45.18, as shown by dmesg | grep brcm
  • The very large number tablets (typically 10-to-20 per school) are "Point of View (China) with Android 10"
  • RaspiOS runs hostapd 2.7 from 2018-12-02 (conceivably hostapd 2.9 from 2019-08-07 might do better here?)

Last week's output might (or might not) be related:

Refs: #823, #1737, #2610, PR #2686

@holta holta added the bug label Mar 1, 2021
@holta holta added this to the 7.2 milestone Mar 1, 2021
@holta holta changed the title WiFi client machines failing to connect in South Africa WiFi client machines failing to connect to RPi 4 in South Africa Mar 1, 2021
@holta
Copy link
Member Author

holta commented Mar 1, 2021

A new batch of data from @fisherbryan — problems appeared when the 4th tablet tried to connect to the RPi 4 IIAB's WiFi hotspot:

His Android 6.0 phone shows "Denied access to network"

@jvonau
Copy link
Contributor

jvonau commented Mar 1, 2021

These are something I haven't seen yet from 2DjFUm?en:

7 Mar 1 12:28:31 box vmunix: [ 331.363179] ieee80211 phy0:brcmf_psm_watchdog_notify: PSM's watchdog has fired!
41 Mar 1 12:30:12 box vmunix: [ 431.679481] ieee80211 phy0: brcmf_netdev_wait_pend8021x: Timed out waiting for no pending 802.1x packets

https://www.spinics.net/lists/linux-wireless/msg182294.html suggests PSM's watchdog has fired! is a trap for an error
brcmf_netdev_wait_pend8021x is used in send_key_to_dongle in cfg80211.c, timeout might be with the sdio bus or firmware.

@jvonau
Copy link
Contributor

jvonau commented Mar 1, 2021

A new batch of data from @fisherbryan — problems appeared when the 4th tablet tried to connect to the RPi 4 IIAB's WiFi hotspot:

* http://sprunge.us/VcwbJj?en iiab-diagnostics

* http://sprunge.us/COIGTW?en dmesg (336 lines)

* http://sprunge.us/6WOORl?en /var/log/syslog (45 lines)

These look normal, sans for the late showing of eth0 with an ip address, but should be able to reach box.lan via the WiFi hotspot.

Bryans-MBP
HUAWEI_P8_lite
android-5f881822683a4de1
android-71b7d41439ab028c
android-5f881822683a4de1
android-27e5ffc9a563f0a8
android-c9152637b0fe8000

I count 7

His Android 6.0 phone shows "Denied access to network"

That is interesting, without context, is that after the 7 above are connected or before?
Doing some diffs against resent hostaptd.conf info looks like we differ in auth_algs=3 and wmm_enabled=1 is now the default, Might what to try some combination of auth_algs=3 and/or wmm_enabled=0.

@holta
Copy link
Member Author

holta commented Mar 1, 2021

FYI @fisherbryan added these 2 lines to the RPi 4 IIAB's /etc/hostapd/hostapd.conf about 1-2 hours ago: (for more verbose logging)

logger_syslog_level=0
logger_stdout_level=0

Based on https://stackoverflow.com/questions/32205140/hostapd-debug-level-configuration/32209729#32209729

@holta holta changed the title WiFi client machines failing to connect to RPi 4 in South Africa WiFi client machines failing to connect to RPi 4 in South Africa [espec when 7+ WiFi clients try to connect?] Mar 1, 2021
@holta
Copy link
Member Author

holta commented Mar 1, 2021

This (might) be completely unrelated but FYI someone with the new 5.10 kernel is experiencing WiFi problems on a Raspberry Pi 3 B+: raspberrypi/linux#2522 (comment)

@jvonau
Copy link
Contributor

jvonau commented Mar 1, 2021

This (might) be completely unrelated but FYI someone with the new 5.10 kernel is experiencing WiFi problems on a Raspberry Pi 3 B+: raspberrypi/linux#2522 (comment)

Interesting, that is as a client, ap0 should already be in promiscuous mode as part of the bridge, and don't enable promiscuous mode on wlan0 if ap0 is present, did a quick test that will break ap0

Feb 28 17:13:15 box vmunix: [ 7.998114] br0: port 1(ap0) entered blocking state
Feb 28 17:13:15 box vmunix: [ 7.998125] br0: port 1(ap0) entered disabled state
Feb 28 17:13:15 box vmunix: [ 7.998286] device ap0 entered promiscuous mode
Feb 28 17:13:15 box vmunix: [ 8.175959] 8021q: 802.1Q VLAN Support v1.8
Feb 28 17:13:15 box vmunix: [ 8.632707] br0: port 1(ap0) entered blocking state
Feb 28 17:13:15 box vmunix: [ 8.632719] br0: port 1(ap0) entered forwarding state

@holta holta changed the title WiFi client machines failing to connect to RPi 4 in South Africa [espec when 7+ WiFi clients try to connect?] WiFi client machines failing to connect to RPi 4 in South Africa [usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] Mar 1, 2021
@jvonau
Copy link
Contributor

jvonau commented Mar 1, 2021

iiab-admin@box:~ $ sudo cat /sys/kernel/debug/ieee80211/phy0/forensics
00391.274 ampdu_dbg: cwcur0-3 1f 1f 1f 1f bslots cur/0-3 12 0 0 0 0 ifs_boff 0
000391.274 ampdu_dbg: again1 ifsstat 0xaf nav_stat 0x0
000391.274 ampdu_dbg: again2 ifsstat 0xaf nav_stat 0x0
000391.274 wl0: wlc_ampdu_watchdog: cleaning up ini tid 6 due to no progress for 2 secs tx_in_transit 3
000391.274 wl0: wlc_ampdu_tx_send_delba: tid 6 initiator 1 reason 39
000392.270 ampdu_dbg: wl0.2 dead_cnt 2 tx_in_transit 28 fifordy 0x0 frmcnt 0x400 fifosel 0x7
000392.270 ampdu_dbg: ifsstat 0x9ea0 nav_stat 0x0 txop 4294958686
000392.270 ampdu_dbg: pktpend: 0 25 0 0 0 ap 1
000392.270 ampdu_dbg: txall 10 txbcn 10 txrts 0 rxcts 0 rsptmout 0 rxstrt 592
000392.270 ampdu_dbg: cwcur0-3 1f 1f 1f 1f bslots cur/0-3 12 0 0 0 0 ifs_boff 0
000392.270 ampdu_dbg: again1 ifsstat 0xaf nav_stat 0x0
000392.270 ampdu_dbg: again2 ifsstat 0xaf nav_stat 0x0
000392.270 wl0: wlc_ampdu_watchdog: cleaning up ini tid 4 due to no progress for 2 secs tx_in_transit 28
iiab-admin@box:~ $ _ampdu_tx_send_delba: tid 4 initiator 1 reason 39

@holta
Copy link
Member Author

holta commented Mar 1, 2021

Thanks to @fisherbryan and @jvonau who successfully tested 20 simultaneous WiFi client devices as also confirmed on the server (RPi 4 IIAB) with command: hostapd_cli -i ap0 list_sta | wc

These WiFi client devices are all kicked off intermittently, approx every 30-60min (?) typically right after this appears in dmesg output: ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Right afterwards, WiFi clients can (sometimes) all be made to reconnect automatically, if one does either systemctl restart hostapd or reboots to RPi 4 IIAB.

However sometimes the situation appears worse, e.g. with -110 appearing in dmesg output, and the following command taking more than 5 seconds to return to the prompt:

root@box:~# hostapd_cli -i ap0 list_sta | wc
      1       1      18

Does this mean the WiFi firmware crashed and/or sdio bus is busy? In any case, it's now one of those times, so I'm pasting in a new batch of log files from @fisherbryan's RPi 4 IIAB:

@holta holta changed the title WiFi client machines failing to connect to RPi 4 in South Africa [usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] WiFi client machines failing to (re)connect to RPi 4 in South Africa [usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] Mar 1, 2021
@holta
Copy link
Member Author

holta commented Mar 1, 2021

Thanks to @jvonau who's added extra hostapd logging using -dd verbose logging and -f to a custom log file location — see /etc/systemd/system/hostapd.service as follows:

ExecStart=/usr/sbin/hostapd -T -t -dd -f /var/log/hostapd.log -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf

As compared to the original here: https://github.com/iiab/iiab/blob/master/roles/network/templates/hostapd/hostapd.service.j2

@jvonau
Copy link
Contributor

jvonau commented Mar 1, 2021

Normal

Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 IEEE 802.11: associated
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: event 1 notification
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: start authentication
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 IEEE 802.1X: unauthorizing port
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: received EAPOL-Key frame (2/4 Pairwise)
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: sending 3/4 msg of 4-Way Handshake
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: received EAPOL-Key frame (4/4 Pairwise)
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 IEEE 802.1X: authorizing port
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 RADIUS: starting accounting session 553A0624A8D6D983
Mar 1 23:23:29 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: pairwise key handshake completed (RSN)
Mar 1 23:23:29 box dnsmasq-dhcp[1775]: DHCPREQUEST(br0) 172.18.100.131 ae:b3:45:92:04:08
Mar 1 23:23:29 box dnsmasq-dhcp[1775]: DHCPACK(br0) 172.18.100.131 ae:b3:45:92:04:08 HUAWEI_P_smart_S-d8a68ca9
Mar 1 23:25:03 box hostapd: ap0: STA 0e:be:30:cd:78:d7 IEEE 802.11: associated

Crash point

Mar 1 23:25:03 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: event 1 notification
Mar 1 23:25:03 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: start authentication
Mar 1 23:25:03 box hostapd: ap0: STA 0e:be:30:cd:78:d7 IEEE 802.1X: unauthorizing port
Mar 1 23:25:03 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: received EAPOL-Key frame (2/4 Pairwise)
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: invalid MIC in msg 2/4 of 4-Way Handshake
Mar 1 23:25:05 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:05 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:06 box vmunix: [ 899.719833] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: received EAPOL-Key frame (2/4 Pairwise)
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: invalid MIC in msg 2/4 of 4-Way Handshake
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: received EAPOL-Key 2/4 Pairwise with unexpected replay counter
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: PTKSTART: Retry limit 4 reached
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: event 3 notification
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 IEEE 802.1X: unauthorizing port
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 MLME: MLME-DEAUTHENTICATE.indication(0e:be:30:cd:78:d7, 2)
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 MLME: MLME-DELETEKEYS.request(0e:be:30:cd:78:d7)
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 IEEE 802.11: disassociated
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 WPA: event 2 notification
Mar 1 23:25:07 box hostapd: ap0: STA 0e:be:30:cd:78:d7 IEEE 802.1X: unauthorizing port

Blocked?

Mar 1 23:25:48 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 1 23:25:48 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: event 1 notification
Mar 1 23:25:48 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: start authentication
Mar 1 23:25:48 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.1X: unauthorizing port
Mar 1 23:25:48 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: EAPOL-Key timeout
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: received EAPOL-Key frame (2/4 Pairwise)
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: sending 3/4 msg of 4-Way Handshake
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: ignore retransmitted EAPOL-Key 2/4 Pairwise - SNonce did not change
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: received EAPOL-Key frame (4/4 Pairwise)
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.1X: authorizing port
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session C38345CC50BDD903
Mar 1 23:25:49 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 1 23:25:52 box dnsmasq-dhcp[1775]: DHCPDISCOVER(br0) d8:a2:5e:96:18:8d
Mar 1 23:25:52 box dnsmasq-dhcp[1775]: DHCPOFFER(br0) 172.18.120.164 d8:a2:5e:96:18:8d
Mar 1 23:25:52 box dnsmasq-dhcp[1775]: DHCPDISCOVER(br0) d8:a2:5e:96:18:8d
Mar 1 23:25:52 box dnsmasq-dhcp[1775]: DHCPOFFER(br0) 172.18.120.164 d8:a2:5e:96:18:8d
Mar 1 23:26:39 box hostapd: ap0: STA 2c:fd:ab:aa:13:8a IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 2c:fd:ab:aa:13:8a WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 2c:fd:ab:aa:13:8a IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 2c:fd:ab:aa:13:8a IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA fa:06:ab:80:3d:a2 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA fa:06:ab:80:3d:a2 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA fa:06:ab:80:3d:a2 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA fa:06:ab:80:3d:a2 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA aa:de:7a:59:6e:cd IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA aa:de:7a:59:6e:cd WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA aa:de:7a:59:6e:cd IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA aa:de:7a:59:6e:cd IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 1a:a0:ea:06:e4:47 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 1a:a0:ea:06:e4:47 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 1a:a0:ea:06:e4:47 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 1a:a0:ea:06:e4:47 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA a6:1b:2d:06:5b:80 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA a6:1b:2d:06:5b:80 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA a6:1b:2d:06:5b:80 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA a6:1b:2d:06:5b:80 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA be:6d:ac:34:c7:f8 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA be:6d:ac:34:c7:f8 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA be:6d:ac:34:c7:f8 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA be:6d:ac:34:c7:f8 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 5a:98:07:d2:7d:f6 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 5a:98:07:d2:7d:f6 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 5a:98:07:d2:7d:f6 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 5a:98:07:d2:7d:f6 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 0a:5f:fb:30:da:49 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 0a:5f:fb:30:da:49 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 0a:5f:fb:30:da:49 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 0a:5f:fb:30:da:49 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA b2:f5:85:be:6f:2d IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA b2:f5:85:be:6f:2d WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA b2:f5:85:be:6f:2d IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA b2:f5:85:be:6f:2d IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 22:94:c8:bd:bd:d6 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 22:94:c8:bd:bd:d6 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 22:94:c8:bd:bd:d6 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 22:94:c8:bd:bd:d6 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA d2:43:fe:6a:d6:ce IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA d2:43:fe:6a:d6:ce WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA d2:43:fe:6a:d6:ce IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA d2:43:fe:6a:d6:ce IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 96:3b:ff:56:2a:84 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 96:3b:ff:56:2a:84 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 96:3b:ff:56:2a:84 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 96:3b:ff:56:2a:84 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA ca:7b:ed:d9:fe:32 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA ca:7b:ed:d9:fe:32 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA ca:7b:ed:d9:fe:32 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA ca:7b:ed:d9:fe:32 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 46:0c:5d:2b:73:2a IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA 46:0c:5d:2b:73:2a WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA 46:0c:5d:2b:73:2a IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA 46:0c:5d:2b:73:2a IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA c2:f7:24:05:a6:17 IEEE 802.11: disassociated
Mar 1 23:26:39 box hostapd: ap0: STA c2:f7:24:05:a6:17 WPA: event 2 notification
Mar 1 23:26:39 box hostapd: ap0: STA c2:f7:24:05:a6:17 IEEE 802.1X: unauthorizing port
Mar 1 23:26:39 box hostapd: ap0: STA c2:f7:24:05:a6:17 IEEE 802.11: disassociated
Mar 1 23:26:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 1 23:26:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: event 2 notification
Mar 1 23:26:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.1X: unauthorizing port
Mar 1 23:26:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 1 23:26:44 box hostapd: ap0: STA ae:b3:45:92:04:08 IEEE 802.11: disassociated
Mar 1 23:26:44 box hostapd: ap0: STA ae:b3:45:92:04:08 WPA: event 2 notification
Mar 1 23:26:44 box hostapd: ap0: STA ae:b3:45:92:04:08 IEEE 802.1X: unauthorizing port
Mar 1 23:26:44 box hostapd: ap0: STA ae:b3:45:92:04:08 IEEE 802.11: disassociated
Mar 1 23:26:44 box hostapd: ap0: STA b4:69:21:60:69:d1 IEEE 802.11: disassociated
Mar 1 23:26:44 box hostapd: ap0: STA b4:69:21:60:69:d1 WPA: event 2 notification
Mar 1 23:26:44 box hostapd: ap0: STA b4:69:21:60:69:d1 IEEE 802.1X: unauthorizing port
Mar 1 23:26:44 box hostapd: ap0: STA b4:69:21:60:69:d1 IEEE 802.11: disassociated

@jvonau
Copy link
Contributor

jvonau commented Mar 2, 2021

Mar 1 23:25:03 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: received EAPOL-Key frame (2/4 Pairwise)
Mar 1 23:25:04 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: invalid MIC in msg 2/4 of 4-Way Handshake
Mar 1 23:25:05 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:05 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: EAPOL-Key timeout
Mar 1 23:25:06 box hostapd: ap0: STA 0e:be:30cd78:d7 WPA: sending 1/4 msg of 4-Way Handshake
Mar 1 23:25:06 box vmunix: [ 899.719833] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!

EAPOL-Key timeout mentioned https://www.spinics.net/lists/linux-wireless/msg181537.html

@holta
Copy link
Member Author

holta commented Mar 2, 2021

https://www.spinics.net/lists/linux-wireless/msg181537.html

Maybe we really should try out the latest hostapd 2.9 (1.5 years old) after all?

It's prepackaged as part of Ubuntu 20.10+ on RPi 4 if it's too hard to force 2.9 onto RaspiOS?

(Whereas RaspiOS's current hostapd 2.7 is 2.25 years old, and potentially suspicious if we believe the above posting!)

@jvonau
Copy link
Contributor

jvonau commented Mar 2, 2021

Maybe we really should try out the latest hostapd 2.9 (1.5 years old) after all?
It's prepackaged as part of Ubuntu 20.10+ on RPi 4 if it's too hard to force 2.9 onto RaspiOS?

2.9 seems to work fine on ubuntu here, but I don't have the clients to do the needed testing but this more of a firmware/driver issue I would think, https://www.spinics.net/lists/linux-wireless/msg181417.html same thread

Looking at the time the messages (at 589095.542690) are produced, what
seems to cause this is when I head out with my LineageOS (TI WiLink
based) phone to the car and drive off. I've suspected that's the case
with all the previous iterations of this problem too.

Suggests that a client "going out of range" can be a trigger, noted that was for a 43430 chip and not 43455 but is easily testable anyway, just walk away from the AP with a phone connected and see what occurs.

@holta
Copy link
Member Author

holta commented Mar 2, 2021

@jvonau do you know if it's possibly relevant that hostapd 2.8 "fixed PTK rekeying with FILS and FT" ?

FT = fast transition a.k.a. fast roaming (https://en.wikipedia.org/wiki/IEEE_802.11r-2008) from July 2008

FILS = fast initial link setup (https://en.wikipedia.org/wiki/IEEE_802.11ai) from June 2017

@holta
Copy link
Member Author

holta commented Mar 2, 2021

FYI this same problem seems to have occurred 17min after boot with a different RPi 4, running a fresh copy of IIAB on 64-bit Ubuntu Server 20.10 (with hostapd 2.9, on kernel 5.8).

The number of connected WiFi clients fell suddenly from 20 to 1...and then a couple minutes later to zero connected WiFi clients:

Enhanced logging is needed to understand more what is happening when reproducing this.

@holta
Copy link
Member Author

holta commented Mar 2, 2021

After putting this in /etc/systemd/system/hostapd.service : (Line 18, on this new RPi 4 = Bryan_RPI4_64_bit_Ubuntu_Server_20.10)

ExecStart=/usr/sbin/hostapd -T -t -dd -f /var/log/hostapd.log -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf

I then ran:

systemctl daemon-reload
systemctl restart hostapd

WiFi immediately started connecting. This leveled off at 16 — then within about 2-3 minutes of 16 WiFi clients connecting — some kind of crash occurred — with most all WiFi clients being kicked off:

root@box:~# hostapd_cli -i ap0 list_sta | wc -l
16
root@box:~# hostapd_cli -i ap0 list_sta | wc -l
2
root@box:~# hostapd_cli -i ap0 list_sta | wc -l    # This 1 WiFi client device (appears to be) holding on for many minutes.
1
root@box:~# hostapd_cli -i ap0 list_sta | wc -l    # Several minutes later
0

Here's the new logging from file /var/log/hostapd.log :

http://sprunge.us/KSfC6j?en

@holta
Copy link
Member Author

holta commented Mar 2, 2021

https://stackoverflow.com/questions/32205140/hostapd-debug-level-configuration/32209729#32209729

For now I've added the following to the bottom of Bryan_RPI4_64_bit_Ubuntu_Server_20.10's /etc/hostapd/hostapd.conf for additional logging:

logger_syslog=-1
logger_syslog_level=0
logger_stdout=-1
logger_stdout_level=0

And then rebooted (soon after).

@jvonau
Copy link
Contributor

jvonau commented Mar 2, 2021

copied syslog to ~/syslog1

Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:42:06 box vmunix: [ 836.856920] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:42:06 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:42:06 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session E7F44C181EB136C2
Mar 2 15:42:06 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:42:07 box vmunix: [ 838.257378] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:42:08 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 IEEE 802.11: associated
Mar 2 15:42:09 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 RADIUS: starting accounting session AFD8BEB77DC47060
Mar 2 15:42:09 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 WPA: pairwise key handshake completed (RSN)
Mar 2 15:42:09 box vmunix: [ 840.138270] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:42:13 box dnsmasq-dhcp[2591]: DHCPDISCOVER(br0) e2:2a:d0:e9:e6:96
Mar 2 15:42:13 box dnsmasq-dhcp[2591]: DHCPOFFER(br0) 172.18.101.48 e2:2a:d0:e9:e6:96
Mar 2 15:42:13 box dnsmasq-dhcp[2591]: DHCPDISCOVER(br0) e2:2a:d0:e9:e6:96
Mar 2 15:42:13 box dnsmasq-dhcp[2591]: DHCPOFFER(br0) 172.18.101.48 e2:2a:d0:e9:e6:96
Mar 2 15:42:16 box dnsmasq-dhcp[2591]: DHCPDISCOVER(br0) e2:2a:d0:e9:e6:96
Mar 2 15:42:16 box dnsmasq-dhcp[2591]: DHCPOFFER(br0) 172.18.101.48 e2:2a:d0:e9:e6:96
Mar 2 15:42:16 box dnsmasq-dhcp[2591]: DHCPREQUEST(br0) 172.18.101.48 e2:2a:d0:e9:e6:96
Mar 2 15:42:16 box dnsmasq-dhcp[2591]: DHCPACK(br0) 172.18.101.48 e2:2a:d0:e9:e6:96 android-860f6081e796aa71
Mar 2 15:42:44 box hostapd: ap0: STA 3a:02:c9:65:32:b0 IEEE 802.11: associated
Mar 2 15:42:44 box hostapd: ap0: STA 3a:02:c9:65:32:b0 RADIUS: starting accounting session 04AF94D257FD69F2
Mar 2 15:42:44 box hostapd: ap0: STA 3a:02:c9:65:32:b0 WPA: pairwise key handshake completed (RSN)
Mar 2 15:42:49 box dnsmasq-dhcp[2591]: DHCPDISCOVER(br0) 3a:02:c9:65:32:b0
Mar 2 15:42:49 box dnsmasq-dhcp[2591]: DHCPOFFER(br0) 172.18.113.217 3a:02:c9:65:32:b0
Mar 2 15:42:49 box dnsmasq-dhcp[2591]: DHCPREQUEST(br0) 172.18.113.217 3a:02:c9:65:32:b0
Mar 2 15:42:49 box dnsmasq-dhcp[2591]: DHCPACK(br0) 172.18.113.217 3a:02:c9:65:32:b0 android-ac5cc2255e559ecb
Mar 2 15:42:51 box vmunix: [ 881.406873] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:43:04 box vmunix: [ 894.384167] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:43:04 box vmunix: [ 894.828082] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:43:44 box vmunix: [ 934.966822] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:43:45 box vmunix: [ 935.957811] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:43:47 box vmunix: [ 938.049938] ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!
Mar 2 15:44:47 box hostapd: ap0: STA 3a:02:c9:65:32:b0 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 3a:02:c9:65:32:b0 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA e2:2a:d0:e9:e6:96 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA aa:de:7a:59:6e:cd IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA aa:de:7a:59:6e:cd IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 96:51:11:c0:e6:55 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 96:51:11:c0:e6:55 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA ce:3a:fa:23:1d:6f IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA ce:3a:fa:23:1d:6f IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 1e:40:31:dc:83:31 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 1e:40:31:dc:83:31 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA ba:9d:1f:ff:78:2a IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA ba:9d:1f:ff:78:2a IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA d2:43:fe:6a:d6:ce IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA d2:43:fe:6a:d6:ce IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA d6:aa:e7:90:94:dd IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA d6:aa:e7:90:94:dd IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 22:94:c8:bd:bd:d6 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 22:94:c8:bd:bd:d6 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA b2:f5:85:be:6f:2d IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA b2:f5:85:be:6f:2d IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 0a:5f:fb:30:da:49 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 0a:5f:fb:30:da:49 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 26:a6:7b:f2:31:77 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 26:a6:7b:f2:31:77 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA a6:1b:2d:06:5b:80 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA a6:1b:2d:06:5b:80 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 2c:fd:ab:aa:13:8a IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 2c:fd:ab:aa:13:8a IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA c2:f7:24:05:a6:17 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA c2:f7:24:05:a6:17 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 5a:98:07:d2:7d:f6 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 5a:98:07:d2:7d:f6 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 1a:a0:ea:06:e4:47 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 1a:a0:ea:06:e4:47 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 96:3b:ff:56:2a:84 IEEE 802.11: disassociated
Mar 2 15:44:47 box hostapd: ap0: STA 96:3b:ff:56:2a:84 IEEE 802.11: disassociated
Mar 2 15:49:00 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated due to inactivity
Mar 2 15:49:01 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

sudo grep "d8:a2:5e:96:18:8d" syslog1

Mar 2 15:29:01 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:29:01 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session 244A454A0603CD58
Mar 2 15:29:01 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:29:04 box dnsmasq-dhcp[2591]: DHCPDISCOVER(br0) d8:a2:5e:96:18:8d
Mar 2 15:29:04 box dnsmasq-dhcp[2591]: DHCPOFFER(br0) 172.18.120.164 d8:a2:5e:96:18:8d
Mar 2 15:29:04 box dnsmasq-dhcp[2591]: DHCPDISCOVER(br0) d8:a2:5e:96:18:8d
Mar 2 15:29:04 box dnsmasq-dhcp[2591]: DHCPOFFER(br0) 172.18.120.164 d8:a2:5e:96:18:8d
Mar 2 15:29:05 box dnsmasq-dhcp[2591]: DHCPREQUEST(br0) 172.18.120.164 d8:a2:5e:96:18:8d
Mar 2 15:29:05 box dnsmasq-dhcp[2591]: DHCPACK(br0) 172.18.120.164 d8:a2:5e:96:18:8d Bryans-MBP
Mar 2 15:32:11 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:32:11 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:32:11 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:32:14 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:32:14 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session 779D4000B0F42750
Mar 2 15:32:14 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:33:46 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:33:46 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:33:46 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:33:50 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:33:50 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session D2B3566A28752B70
Mar 2 15:33:50 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:35:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:35:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:35:44 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:35:47 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:35:47 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session E65F2BD29B0555CD
Mar 2 15:35:47 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:36:14 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:36:14 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:36:14 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:36:17 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:36:18 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session F349122AD1FDDE4D
Mar 2 15:36:18 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:42:03 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 15:42:06 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 15:42:06 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session E7F44C181EB136C2
Mar 2 15:42:06 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 15:49:00 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated due to inactivity
Mar 2 15:49:01 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Mar 2 16:15:18 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 16:15:21 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 16:15:21 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session 3338B79B1CA3596C
Mar 2 16:15:21 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 16:15:51 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 16:15:51 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 16:15:51 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated
Mar 2 16:15:54 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: associated
Mar 2 16:15:55 box hostapd: ap0: STA d8:a2:5e:96:18:8d RADIUS: starting accounting session BC5D6DC890D31A83
Mar 2 16:15:55 box hostapd: ap0: STA d8:a2:5e:96:18:8d WPA: pairwise key handshake completed (RSN)
Mar 2 16:15:58 box dnsmasq-dhcp[8084]: DHCPREQUEST(br0) 172.18.120.164 d8:a2:5e:96:18:8d
Mar 2 16:15:58 box dnsmasq-dhcp[8084]: DHCPACK(br0) 172.18.120.164 d8:a2:5e:96:18:8d Bryans-MBP
Mar 2 16:21:22 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: disassociated due to inactivity
Mar 2 16:21:23 box hostapd: ap0: STA d8:a2:5e:96:18:8d IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

@holta
Copy link
Member Author

holta commented Mar 2, 2021

  1. While I don't understand the reason(s), FYI unlike the prior boots (where WiFi client devices were kicked off within 5-10 min every time, and also after systemctl restart hostapd gave them another chance to reconnect, see (2) below) this current boot (the 5th boot or restart of hostapd) of Bryan_RPI4_64_bit_Ubuntu_Server_20.10 (booted 147+ min ago) has suddenly proved VERY Reliable.

    It is holding 19-to-23 WiFi client devices (and correctly renewing their DHCP leases[*] perfectly every hour, thanks to dnsmasq) for the entire duration. Not that the WiFi client's browsers are actively pulling a lot of traffic, but still this current "rock solid" steady state is something impressive and new:

    root@box:/opt/iiab/iiab# hostapd_cli -i ap0 list_sta | wc -l
    20
    

    [*] Thanks to the 1-hour lease time hard-coded in /etc/dnsmasq.d/iiab.conf as follows: (Line 17)

    dhcp-range=172.18.100.1,172.18.126.254,1h

    ...per https://github.com/iiab/iiab/blob/master/roles/network/templates/network/dnsmasq.conf.j2#L21 -- which matches the dnsmasq default, explained by @jvonau.

  2. Earlier boots show what @jvonau suspects is a regression in recent kernel or similar, speculating "Why the handshake failed is up for discussion.. Slow bus or slow firmware." [7.45.18 in this case]

    The failure happens when "there's a timeout present in the same line that there is issues" e.g. EAPOL lines in /var/log/hostapd.log and /var/log/syslog? @jvonau explains: "Gets four chances to make the handshake and then gives up."

@jvonau
Copy link
Contributor

jvonau commented Mar 2, 2021

Notes for the future, https://www.spinics.net/lists/linux-wireless/msg208259.html mentions unaligned data on the sdio bus while
https://www.spinics.net/lists/linux-wireless/msg208028.html is the fix for the brcmsmac driver, but we use brcmfmac, off to check the source for brcmfmac

@holta
Copy link
Member Author

holta commented Mar 3, 2021

Anybody wanting to change hostapd_secure: False in /etc/iiab/local_vars.yml, please see this WARNING about RPi internal hotspot passwords in general: PR #2697

(Likely the situation is exacerbated by high-capacity firmware put in place by wifi_hotspot_capacity_rpi_fix: True also in /etc/iiab/local_vars.yml — keeping in mind that the vast majority of IIAB schools/libraries/etc prefer to have no-password [hostapd_secure: False] on their high-capacity [wifi_hotspot_capacity_rpi_fix: True] internal hotspots — which is just a reminder that these 2 community defaults are set as they are to service this predominant use case, by popular demand and for very good reason!)

@holta holta changed the title WiFi client machines failing to (re)connect to RPi 4 in South Africa [usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] WiFi client machines failing to (re)connect to RPi 4 internal hotspot w/ passwords in South Africa [usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output; hostapd generates EAPOL errors] Mar 3, 2021
@holta holta changed the title WiFi client machines failing to (re)connect to RPi 4 internal hotspot w/ passwords in South Africa [usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output; hostapd generates EAPOL errors] WiFi client machines failing to (re)connect to RPi 4 internal hotspot w/ passwords in South Africa [hostapd generates EAPOL errors; usually happens right after "ieee80211 phy0: brcmf_psm_watchdog_notify: PSM's watchdog has fired!" in dmesg output] Mar 3, 2021
@jvonau
Copy link
Contributor

jvonau commented Mar 3, 2021

Should somebody want to pursue the firmware debugging further as root (sudo won't cut it) echo 100006 > /sys/module/brcmfmac/parameters/debug on a running machine one can gather an overwhelming amount of data into syslog.

@holta
Copy link
Member Author

holta commented Mar 3, 2021

We should also try to find out if we can break RPi passworded hotspots with a single WiFi client machine.

i.e. further isolating what conditions reproduce this failure will be extremely easy (and enlightening!) if so — as outlined here:

#2697 (comment)

(Similar to Bryan in South Africa who can break his RPi 4 passworded hotspots within ~2min every time, by actively browsing on just a couple WiFi client devices...quite possibly it's not at all necessary to have 7-to-20 WiFi clients simultaneously connected?!)

@jvonau
Copy link
Contributor

jvonau commented Mar 3, 2021

We should also try to find out if we can break RPi passworded hotspots with a single WiFi client machine.

I have with no breakage, you are free to duplicate the effort.

i.e. further isolating what conditions reproduce this failure will be extremely easy (and enlightening!) if so — as outlined here:

#2697 (comment)

(Similar to Bryan in South Africa who can break his RPi 4 passworded hotspots within ~2min every time, by actively browsing on just a couple WiFi client devices...quite possibly it's not at all necessary to have 7-to-20 WiFi clients simultaneously connected?!)

Summarizing the net effect without gathering the brcmfmac debug data does nothing to advance what is the root cause.

@jvonau
Copy link
Contributor

jvonau commented Mar 4, 2021

raspberrypi/firmware#1522 non mmc os root partition (ie usb boot) corrects slow wifi access? Easy to test, remove sdcard and insert into a sdcard to usb3 adapter and boot from usb.

@jvonau
Copy link
Contributor

jvonau commented Mar 4, 2021

Enable brcmfmac debugging from cmdline.txt and list of options for debugging

@jvonau
Copy link
Contributor

jvonau commented Mar 4, 2021

Theory "brcmfmac: brcmf_sdio_bus_txdata deferring pktq len" should increment before crash presents itself.

@holta
Copy link
Member Author

holta commented Mar 4, 2021

FYI in a separate South African city with a nearly identical RPi 4 IIAB, hotspot passwords were turned off (hostapd_secure: False was put in /etc/iiab/local_vars.yml then ./iiab-network was run). During this live training event earlier this afternoon, 10 WiFi client devices (tablets) were attempted over several hours. We're told that only 3 of those 10 WiFi clients/connections to the RPi 4 proved reliable — and there's some partial evidence for that further below.

Output below cleaned up from grep asso /var/log/syslog | sort -k8 shows 7 total WiFi client machines were involved:

  • 4 devices with WiFi connections of 40-min-of-less
  • 3 devices with WiFi connections of about-2-hours
Mar 4 00:54:18 box hostapd: ap0: STA 02:e9:29:f7:2b:72 IEEE 802.11: associated    [for 0m54s]
Mar 4 00:55:12 box hostapd: ap0: STA 02:e9:29:f7:2b:72 IEEE 802.11: disassociated
Mar 4 00:55:12 box hostapd: ap0: STA 02:e9:29:f7:2b:72 IEEE 802.11: disassociated

Mar 4 01:17:26 box hostapd: ap0: STA 12:2b:53:40:1a:6f IEEE 802.11: associated    [for 40m02s]
Mar 4 01:57:28 box hostapd: ap0: STA 12:2b:53:40:1a:6f IEEE 802.11: disassociated
Mar 4 01:57:28 box hostapd: ap0: STA 12:2b:53:40:1a:6f IEEE 802.11: disassociated
Mar 4 01:57:52 box hostapd: ap0: STA 12:2b:53:40:1a:6f IEEE 802.11: associated    [for 24m46s]
Mar 4 02:22:38 box hostapd: ap0: STA 12:2b:53:40:1a:6f IEEE 802.11: disassociated

Mar 4 01:06:22 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: disassociated
Mar 4 01:06:22 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: disassociated
Mar 4 01:06:38 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: associated    [for 15m43s]
Mar 4 01:22:21 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: disassociated
Mar 4 01:22:21 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: disassociated
Mar 4 01:57:56 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: associated    [for 25m09s]
Mar 4 02:23:05 box hostapd: ap0: STA 36:f8:c2:ea:2e:71 IEEE 802.11: disassociated

Mar 4 00:10:08 box hostapd: ap0: STA 9e:8e:ec:6b:b3:78 IEEE 802.11: associated    [for 116m20s total?]
Mar 4 01:33:29 box hostapd: ap0: STA 9e:8e:ec:6b:b3:78 IEEE 802.11: associated
Mar 4 02:06:28 box hostapd: ap0: STA 9e:8e:ec:6b:b3:78 IEEE 802.11: disassociated
Mar 4 02:06:28 box hostapd: ap0: STA 9e:8e:ec:6b:b3:78 IEEE 802.11: disassociated

Mar 4 00:09:58 box hostapd: ap0: STA ba:e4:53:73:5f:9b IEEE 802.11: associated    [for 119m30s total?]
Mar 4 01:36:34 box hostapd: ap0: STA ba:e4:53:73:5f:9b IEEE 802.11: associated
Mar 4 02:09:28 box hostapd: ap0: STA ba:e4:53:73:5f:9b IEEE 802.11: disassociated
Mar 4 02:09:28 box hostapd: ap0: STA ba:e4:53:73:5f:9b IEEE 802.11: disassociated

Mar 4 00:11:40 box hostapd: ap0: STA dc:89:83:c3:ec:57 IEEE 802.11: associated    [for 17m45s]
Mar 4 00:29:25 box hostapd: ap0: STA dc:89:83:c3:ec:57 IEEE 802.11: disassociated
Mar 4 00:29:25 box hostapd: ap0: STA dc:89:83:c3:ec:57 IEEE 802.11: disassociated

Mar 4 00:09:44 box hostapd: ap0: STA fe:45:5f:72:88:e6 IEEE 802.11: associated    [for 133m48s total?]
Mar 4 01:33:29 box hostapd: ap0: STA fe:45:5f:72:88:e6 IEEE 802.11: associated
Mar 4 02:23:32 box hostapd: ap0: STA fe:45:5f:72:88:e6 IEEE 802.11: disassociated

@jvonau
Copy link
Contributor

jvonau commented Mar 5, 2021

I see a couple of reboots with logging being out of order

Mar 4 01:17:12 box vmunix: [ 0.000000] Booting Linux on physical CPU 0x0
Mar 4 02:17:13 box vmunix: [ 0.000000] Booting Linux on physical CPU 0x0

FYI in a separate South African city with a nearly identical RPi 4 IIAB, hotspot passwords were turned off (hostapd_secure: False was put in /etc/iiab/local_vars.yml then ./iiab-network was run).

With a reboot of iiab afterwards right? I'm going to assume the iiab was shutdown to move it.After the change to no passwords before shipping the client's used were to told to "forget network" and then re-select the desired SSID and everything was working OK? Given this is a different location how does the RF spectrum differ? Known good power supply? Capturing 'the why' is always the hardest and without out being at a console I can't poke for info... got to go for now I being a bad hockey fan...

@holta
Copy link
Member Author

holta commented Mar 5, 2021

I see couple of reboots with logging being out of order

Good catch. And as a result of /etc/cron.hourly/fake-hwclock auto-saving the clock to disk 17min past every hour, time records very often (appear to) begin 17min past the hour.

@fisherbryan: can you please keep test machines plugged into the Internet whenever possible? So their clocks get the correct time (from the Internet, right after boot). Which will make WiFi logging far more meaningful & understandable.

@holta
Copy link
Member Author

holta commented Mar 5, 2021

A direct report from the woman running the training in South Africa yesterday afternoon, to clarify the participants' problematic experience:

  • 8 of 10 tablets worked initially, after she rebooted tablets repeatedly trying to get their WiFi connections to the RPi 4 IIAB to work (also to clear our each tablet's prior password to that SSID, as they ran yesterday's training without the RPi 4's usual hotspot password, i.e. by setting hostapd_secure: False in the RPi 4 IIAB server and then running ./iiab-network).
  • 3 of those 8 tablets soon stopped connecting to the RPi 4's hotspot, leaving 5 of 10 operational for the duration of the training.
  • She had also opened another box of 10 tablets to see if that might help establish more stable WiFi connections (always turning off all tablets that were not in use, to avoid any risk of overloading the RPi 4's hotspot) but these other tablets did not work any better.
  • "Couldn't Load Web Page" is the error message she mentioned repeatedly.

Background: all tablets were "Point of View (China) with Android 10".

@holta
Copy link
Member Author

holta commented Mar 12, 2021

FYI others (including @fisherbryan in South Africa) attempted over the past week, but have been unable to reproduce the issue of many tablets losing their WiFi connections to RPi 4's internal hotspot (when hostapd_secure: False).

  1. The original issue of RPi 4 internal hotspots kicking off all WiFi clients (when hostapd_secure: True) remains however. As of yet, it's quite unclear what conditions cause this (several outside of South Africa have been completely unable to reproduce this, even with 20 WiFi client machines, so more on-location specifics are critical).

    One Possible Pattern: As reborted by @fisherbryan 10 days ago, clicking on certain Kolibri links (IIAB links or actual Internet links) appears to possibly lead rapidly to the failure of RPi 4's internal hotspot?

  2. Separately: there is some issue with iiab_gateway_enabled: True appearing very slow (4mbit/sec or less), intermittently and/or not working at times (or with certain client machines?) As RPi 4's WiFi client devices...try to browse Public Internet web site(s?) @fisherbryan: please provide cause-and-effect context, including:

    1. Which tablets and how many are connected to the RPi 4's hotspot during testing?
    2. How many work (earlier you showed me 4 mbit/sec with an Internet bandwidth test) and how many do not work?
    3. Run "ping 8.8.8.8" from a NUC-or-laptop WiFi client device-to-RPi4IIAB-to Internet, and share the output (times in milliseconds?)
    4. Speed of files downloaded from IIAB-to-WiFi client device? (in mbit/sec)
    5. Speed of LTE router-DIRECTLY-to-WiFi client device, tested repeatedly? (in mbit/sec)
    6. Speed of LTE router-to-RPi4IIAB-to-WiFi client device, tested repeatedly? (in mbit/sec)

@holta
Copy link
Member Author

holta commented Mar 19, 2021

This might be only very tangentially-related, but here is @fisherbryan's kolibri.txt log file from an IIAB on its way to deployment, that might/maybe shed a bit of light on one-or-the-other of those WiFi issues we were working on: (on-ticket above, 2-3 weeks ago)

https://paste.ubuntu.com/p/tfjBd525n6/

@jvonau
Copy link
Contributor

jvonau commented Jul 21, 2021

close

@holta holta closed this as completed Jul 21, 2021
@holta holta modified the milestones: 8.0, 7.2 Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants