Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DHCPDISCOVER every 5 minutes, ignores server lease time #7020

Open
4 of 6 tasks
brownleeb opened this issue Jan 17, 2020 · 13 comments
Open
4 of 6 tasks

DHCPDISCOVER every 5 minutes, ignores server lease time #7020

brownleeb opened this issue Jan 17, 2020 · 13 comments

Comments

@brownleeb
Copy link

brownleeb commented Jan 17, 2020

Basic Infos

  • This issue complies with the issue POLICY doc.
  • I have read the documentation at readthedocs and the issue is not addressed there.
  • I have tested that the issue is present in current master branch (aka latest git).
  • I have searched the issue tracker for a similar issue.
  • If there is a stack dump, I have decoded it.
  • I have filled out all fields below.

Platform

  • Hardware: ESP-8266
  • Core Version: 2.6.1
  • Development Env: N/A
  • Operating System: N/A

Settings in IDE

  • N/A

Problem Description

I was sent here from the Tasmoda wiki.
I am still investigating, but figured I would post while I worked on it.

I am using Tasmota-lite (v8.1) on a Gosund WP3. When I clear settings (RESET 5), the device gets a DHCP address, and seems to follow the server lease time (24h in my case). However, after the first lease renewal approx 12h later, it requests a lease approx every 5 minutes. Rebooting does not go back to the server config lease time, it immediately starts with the 5 minute renewals...

One weird thing may be that it send DHCPREQUEST and immediately a DHCPDISCOVER Then the handshake proceeds. It looks like it might be requesting a lease renewal (without going through the whole DORA process) but it doesnt wait long enough for the server to answer the initial request. The server ACKs the request, but the device has already sent the DISCOVER.

I have verified with wireshark that the lease time being sent to the device is correct. The device just seems to ignore it after the first renewal...

I am using dnsmasq (v2.80) on RaspberryPi running arch (Linux dns 4.19.76-1-ARCH #1 SMP PREEMPT Tue Oct 8 03:34:09 UTC 2019 armv7l GNU/Linux). My Tasmoda devices are the only one exhibiting this behavior. All over devices wait the standard "one half of lease time" to start requesting again. I have verified with wireshark that the devices are being given the proper initial lease time.
I suspect that not a lot of people are checking their DHCP logs (probably on a networking device) so not sure how many even know this is happening...

MCVE Sketch

#include <Arduino.h>

void setup() {

}

void loop() {

}

Debug Messages

Debug messages go here
@TD-er
Copy link
Contributor

TD-er commented Jan 22, 2020

This is interesting, as I am also looking for DHCP issues here on the network.
Do you happen to know what timeout the ESP may use for answers to a DHCP request?

The reason I am asking this is that the Ziggo Connect box I have here does seem to reply quite slow to DHCP renewals (lease time set to 3600 sec here) and it does clearly show that the ESP reacts terribly slow on some other network requests to the ESP itself. (5 seconds longer than usual)
Also the Connect box does not list the ESP node in the list of active leases although this ESP is still connected and reacting to calls to its IP.
So if this DHCP is somewhat blocking (or maybe something else is happening here, like lost ARP cache?) then that would explain why it is terribly slow for some requests.

N.B. the DHCP server on this Connect box is a bit strange/buggy, as some other devices like my Windows laptop sometimes cannot get an IP.

Edit:
I am pretty sure the delay I am seeing here is not related to ARP cache, as the added delay is 5 seconds for almost any new URL I am accessing, even within seconds from each other.
The Connect box still doesn't list the ESP in its DHCP table.

@brownleeb
Copy link
Author

This is what I am seeing in my dnsmasq log:
Jan 22 11:24:38 dns dnsmasq-dhcp[10869]: DHCPREQUEST(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:24:38 dns dnsmasq-dhcp[10869]: DHCPACK(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0 sonoff-3280
Jan 22 11:24:38 dns dnsmasq-dhcp[10869]: DHCPDISCOVER(eth0) 2c:f4:32:a3:2c:d0
Jan 22 11:24:38 dns dnsmasq-dhcp[10869]: DHCPOFFER(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:24:38 dns dnsmasq-dhcp[10869]: DHCPREQUEST(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:24:38 dns dnsmasq-dhcp[10869]: DHCPACK(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0 sonoff-3280
Jan 22 11:29:48 dns dnsmasq-dhcp[10869]: DHCPDISCOVER(eth0) 2c:f4:32:a3:2c:d0
Jan 22 11:29:48 dns dnsmasq-dhcp[10869]: DHCPOFFER(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:29:48 dns dnsmasq-dhcp[10869]: DHCPREQUEST(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:29:48 dns dnsmasq-dhcp[10869]: DHCPACK(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0 sonoff-3280
Jan 22 11:34:55 dns dnsmasq-dhcp[10869]: DHCPREQUEST(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:34:55 dns dnsmasq-dhcp[10869]: DHCPACK(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0 sonoff-3280
Jan 22 11:34:55 dns dnsmasq-dhcp[10869]: DHCPDISCOVER(eth0) 2c:f4:32:a3:2c:d0
Jan 22 11:34:55 dns dnsmasq-dhcp[10869]: DHCPOFFER(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:34:55 dns dnsmasq-dhcp[10869]: DHCPREQUEST(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0
Jan 22 11:34:55 dns dnsmasq-dhcp[10869]: DHCPACK(eth0) 192.168.75.33 2c:f4:32:a3:2c:d0 sonoff-3280

This is a renewal request, note the DHCPREQUEST that includes the address the devices wishes to renew. The server responds back with the DNSACK, but the devices starts back on a full request. It is important that when I although in the log, the server ACKs before the device resends the DISCOVER, in a PCAP, I can see that the device resends the DISCOVER just a moment before the server sends the ACK. By this, I am only guessing that the device is not waiting long enough for its renewal ACK to come back.

I am reviewing the code trying to follow the processing thread. So far, I dont see where there are any problems...

@SynAckFin
Copy link

Does your DHCP server set the T1 and T2 values in the response?
If it does and it sets them to the same value then that might cause the client to send a discover immediately after the request.

@brownleeb
Copy link
Author

To clarify, it is only my ESP8266 devices that have this issue. All of the other devices on my network follow the renewal process correctly....
Perhaps simplest is to show this, to get an idea of the timeframe we are talking here:
Untitled1

Both the initial ACK (to the first REQUEST) and the final OFFER set the following:
Lease Time: 10 Days (correct)
Renewal Time (T1): 5 Days
Rebinding Time (T2): 8.75 Days

@SynAckFin
Copy link

Other devices running different code might change or ignore T1 and T2 values passed by the server resulting in different behaviour for the ESP8266 devices.

The DHCP REQUEST is being broadcast from 0.0.0.0 which should only happen in response to an OFFER. When sent to renew a lease (after T1) it should be sent from its own IP address directly to the DHCP server that gave it the lease. When sent to rebind (after T2) it should be broadcast from its own IP address.

I've had issues with devices running tasmota that would reboot every 5 minutes and produce similar output in the DHCP logs. I moved the devices to a different AP on the same network and the problem disappeared. I suspect that I was hitting some sort of limit on the original AP, due to the number of WiFi devices I had connected, and this was causing the ESP device to reboot.

@SawKyrom
Copy link

Similar issue here with NodeMCU running WLED that uses user_interface.h. No other ESP8266 on network has this issue and I'm assuming it may be related.

image

Issue continues with Node despite blank bin flash and different sketch not using extern "C" {
#include <user_interface.h>}. The CWMODE was retained as either AP or both. I was able to change the mode with wifi setup "WiFi.mode(WIFI_STA);" command to get the device to stop broadcasting an SSID. Since then, it appears to have stopped the boot loop and maintains a connection. However, I don't know if this is related or even a solution.

@dwillmore
Copy link

I ran into this issue with an WLED device. My situation may be different, so I'll outline it here.

I started with a network with an ISC DHCP server using a 24h TTL for leases. I needed to reallocate address spaces in my network, so I set the TTL to 300s (10m) and left it for over a day to ensure old leases were either expired or active hosts were on the 10m cycle. After reallocating the address spaces, I monitored the network to ensure everyone moved to their new addresses. Once that was complete, I changed the TTL to 24h and monitored the logs to ensure everyone transitioned to the new TTL.

It was at this point that the WLED device started to misbehave. It started to ask for a lease renewal every 300s. Neither a warm restart nor a cold restart fixed the bahavior. The eventual workaround was to configure the WLED device in its GUI to a static allocation and then back to a dynamic allocation. At that point, the device did a DISCOVER and accepted the lease it was given--with the 24h TTL--and hasn't reverted to the 300s renewal timing. I will continue to monitor it.

As an asside, there are two other ESP8266 devices on the network. One is a simple clock written in the arduino environment (with some older version of this code) and the other is running esp-link v3.0.14-g963ffbb. Neither of these devices exibited this issue.

It's unfortunate that I picked 10m as my 'short' TTL value as that means I can't tell if the 5m time is a product of my TTL or something innate in the library like the original issue poster observed.

@richard-scott
Copy link

richard-scott commented Jan 22, 2023

I have WLED, Tasmota and WThermostatBeca devices and all 3 suffer the same problem of dropping from WiFi when using DHCP. ESP32 seems to suffer this less, but the ESP8266 is more problematic.

Things seem to have got worse since upgrading my APs to WiFi 6E ones.
I've not tried static IP's, I'll look at that.

@richard-scott
Copy link

I am thinking that having a slow DHCP server may be part of it. I've moved my DHCP service from the very slow NAS box it was on to a different (more powerful) system. So far things are looking more stable but only time will tell.

Maybe the part of Arduino code that asks for and waits for a DHCP response may be part of the issue? Does anyone know if there Is there a short timeout in that area? If it's not the same code used for WiFi & Ethernet DHCP clients, for me the issue is on WiFi and not Ethernet.

@d-a-v
Copy link
Collaborator

d-a-v commented Jan 22, 2023

Lease time is defined by your DHCP server.
Rebind timeout is a percentage of this lease time.

Try to start one of your devices with enabled debug output so more informative messages can help understand.

@richard-scott
Copy link

@d-a-v I wasn't talking about the lease time, but the time between a dhcp client asking for a lease and it getting sent one from the dhcp server. Surely there must be a timeout for that?

Anyway, I've figured out that my devices only like a 2min lease time. Anything higher than that and they disconnect after a while. It is strange that they manage the first few lease renewals, and then at some point during the day, they will drop off the network.

This has only started happening since I upgraded to a WiFi 6E mesh network. I have TP-Link Deco XE75s, and I've also tried Eero 6E devices with the same problems. Currently I have the XE75s with Beamforming off, and Fastroam off, and a 2m dhcp lease time and this is the most stable I can get it.

@SynAckFin
Copy link

The DHCP RFC is here.
The basic communication is:
DHCPDISCOVER(client) -> DHCPOFFER(server) -> DHCPREQUEST(client) -> DHCPACK(server).

The DHCPREQUEST and DHCPACK are repeated when it comes to renewal time. By default the renewal time (known as T1) is half the lease time but can be specified by the server. If it hasn't renewed by time T2 (usually 0.875 * lease time) then it will start broadcasting DHCPDISCOVER requests.

If you can capture the traffic and you see this pattern then your DHCP is working.

What you may be experiencing is the WiFi connection dropping for a couple of seconds forcing the device to reset the interface and causing the DHCP to start again.

If I am remembering correctly then the WiFi library on the ESP devices used to save some metrics for the WiFi connection so as to improve the connection and help with stability. Of course, if you change the AP then the old metrics are no longer valid and could maybe cause instability (I don't know enough to be sure). On the devices that allow you, you could try wiping the permanent storage and reconfiguring from scratch to see if that resolves the issue.

 --------                               -------
|        | +-------------------------->|       |<-------------------+
| INIT-  | |     +-------------------->| INIT  |                    |
| REBOOT |DHCPNAK/         +---------->|       |<---+               |
|        |Restart|         |            -------     |               |
 --------  |  DHCPNAK/     |               |                        |
    |      Discard offer   |      -/Send DHCPDISCOVER               |
-/Send DHCPREQUEST         |               |                        |
    |      |     |      DHCPACK            v        |               |
 -----------     |   (not accept.)/   -----------   |               |
|           |    |  Send DHCPDECLINE |           |                  |
| REBOOTING |    |         |         | SELECTING |<----+            |
|           |    |        /          |           |     |DHCPOFFER/  |
 -----------     |       /            -----------   |  |Collect     |
    |            |      /                  |   |       |  replies   |
DHCPACK/         |     /  +----------------+   +-------+            |
Record lease, set|    |   v   Select offer/                         |
timers T1, T2   ------------  send DHCPREQUEST      |               |
    |   +----->|            |             DHCPNAK, Lease expired/   |
    |   |      | REQUESTING |                  Halt network         |
    DHCPOFFER/ |            |                       |               |
    Discard     ------------                        |               |
    |   |        |        |                   -----------           |
    |   +--------+     DHCPACK/              |           |          |
    |              Record lease, set    -----| REBINDING |          |
    |                timers T1, T2     /     |           |          |
    |                     |        DHCPACK/   -----------           |
    |                     v     Record lease, set   ^               |
    +----------------> -------      /timers T1,T2   |               |
               +----->|       |<---+                |               |
               |      | BOUND |<---+                |               |
  DHCPOFFER, DHCPACK, |       |    |            T2 expires/   DHCPNAK/
   DHCPNAK/Discard     -------     |             Broadcast  Halt network
               |       | |         |            DHCPREQUEST         |
               +-------+ |        DHCPACK/          |               |
                    T1 expires/   Record lease, set |               |
                 Send DHCPREQUEST timers T1, T2     |               |
                 to leasing server |                |               |
                         |   ----------             |               |
                         |  |          |------------+               |
                         +->| RENEWING |                            |
                            |          |----------------------------+
                             ----------
          Figure 5:  State-transition diagram for DHCP clients

@TD-er
Copy link
Contributor

TD-er commented Jan 23, 2023

If you suspect this to be related to newer APs, which do in some way support WiFi mesh (doesn't have to be enabled), then you should really try connecting the ESP using 802.11g mode and not 802.11n.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants