First UDP packet always lost #3481

Closed
AndreKR opened this issue Jul 31, 2017 · 34 comments · Fixed by #5978

@AndreKR

AndreKR commented Jul 31, 2017

Hardware: ESP-12F
Core Version: 2.3.0

I'm sending four UDP packets, with payload "aaa", "bbb", "ccc" and "ddd".

The first of those packets never appears on the receiving machine. This happens every single time: the first packet is always missing.

void setup() 
{ 
  Serial.begin(115200);
  Serial.println("Start");
  Serial.println("Re-setting Wifi credentials");
  WiFi.begin(ssid, password); 

  while(WiFi.status() != WL_CONNECTED)
    delay(50);
  
  Serial.println("My IP address:");
  Serial.println(WiFi.localIP());
  Serial.println("Sending");
  Serial.println(Udp.beginPacket("192.168.10.2", 7827));
  Serial.println(Udp.print("aaa"));
  Serial.println(Udp.endPacket());
  Serial.println(Udp.beginPacket("192.168.10.2", 7827));
  Serial.println(Udp.print("bbb"));
  Serial.println(Udp.endPacket());
  Serial.println(Udp.beginPacket("192.168.10.2", 7827));
  Serial.println(Udp.print("ccc"));
  Serial.println(Udp.endPacket());
  Serial.println(Udp.beginPacket("192.168.10.2", 7827));
  Serial.println(Udp.print("ddd"));
  Serial.println(Udp.endPacket());
  delay(100); // TODO: find out when sending is finished
  goToSleep();
}

Serial output:

Start
Re-setting Wifi credentials
My IP address:
192.168.10.196
Sending
1
3
1
1
3
1
1
3
1
1
3
1
Going to sleep

TCP dump on the receiving machine:

21:27:29.474470 IP 192.168.10.196.4097 > 192.168.10.2.7827: UDP, length 3
        0x0000:  4500 001f 0003 0000 8011 a4b4 c0a8 0ac4  E...............
        0x0010:  c0a8 0a02 1001 1e93 000b 76ca 6262 6200  ..........v.bbb.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
21:27:29.476750 IP 192.168.10.196.4097 > 192.168.10.2.7827: UDP, length 3
        0x0000:  4500 001f 0004 0000 8011 a4b3 c0a8 0ac4  E...............
        0x0010:  c0a8 0a02 1001 1e93 000b 74c9 6363 6300  ..........t.ccc.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
21:27:29.477491 IP 192.168.10.196.4097 > 192.168.10.2.7827: UDP, length 3
        0x0000:  4500 001f 0005 0000 8011 a4b2 c0a8 0ac4  E...............
        0x0010:  c0a8 0a02 1001 1e93 000b 72c8 6464 6400  ..........r.ddd.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
@luffykesh
Contributor

I have also faced this issue. At first I thought the problem was in my code, but it appears to be in the library or the WiFi chip.

@hridder

hridder commented Sep 6, 2017

As a test, try putting a 1 sec delay between UDP sends. It's possible that the library buffers (or drops) the packet until the ARP request/response completes in order to find the receiver's MAC address. Note that by definition, UDP is unreliable and packet loss should be expected. It might be surprising, but the library might technically be correct even if it dropped packets until it gets an ARP reply...

@AndreKR
Author

AndreKR commented Sep 6, 2017

It seems that even a delay(1) after endPacket() can fix it.

@hridder

hridder commented Sep 6, 2017

Which suggests it's the delay required for the ARP response. If so, then there's probably a race between getting the response and your sketch making the next call (either beginPacket() or endPacket(), you'd have to look at the code). Note that if the receiving system is slow, or the network is busy, you might find a 1 ms delay isn't quite enough... and you'll lose a packet again. TCP is there to avoid these kinds of issues.

In any case, it's probably not a bug.

@liebman
Contributor

liebman commented Sep 6, 2017

In order to get more accurate timestamps with NTP, I have taken to "pinging" the IP address I'm sending to first, to force the ARP request.

@AndreKR
Author

AndreKR commented Sep 6, 2017

Unfortunately the Espressif SDK is not open source, so we can't just look at what's going on. I would have expected endPacket() to block until the packet is sent. Can it be related to having to yield the CPU to the firmware?

@suculent
Contributor

suculent commented Sep 6, 2017 via email

@herrold
Contributor

herrold commented Sep 6, 2017

@AndreKR said:
--- Unfortunately the Espressif SDK is not open source

I thought that their SDK, which seems to be based on the 15 year old 'lwIP' stack
https://savannah.nongnu.org/projects/lwip/

was here:
https://github.com/espressif/ESP8266_RTOS_SDK

While it purports to have a 'this hardware only' MIT License, I don't really see how that would not be a derivative work from the plain, non-restricted lwIP one

@AndreKR
Author

AndreKR commented Sep 7, 2017

@herrold
The ARP resolution happens in this netif->output() function and if I see this correctly that comes from libnet80211.a which is binary and no one knows where it comes from.

@suculent It's unclear to me how to use it with Arduino.

@devyte
Collaborator

devyte commented Nov 13, 2017

@AndreKR is this still valid with latest git? with latest git and lwip2? (you can select lwip2 from the Arduino IDE menu).

@devyte added the "waiting for feedback" label Nov 13, 2017
@AndreKR
Author

AndreKR commented Nov 17, 2017

How do I use the git version?
I deleted
%LOCALAPPDATA%\Arduino15\packages\esp8266\hardware\esp8266\2.3.0
and instead downloaded the master branch and put it in %LOCALAPPDATA%\Arduino15\packages\esp8266\hardware\esp8266com\esp8266
as described in the README. Then I cd'ed into tools and ran python get.py which completed without errors.
When I now start the Arduino IDE, there are no ESP8266 boards available in Tools -> Boards.

@igrr
Member

igrr commented Nov 17, 2017 via email

@AndreKR
Author

AndreKR commented Nov 18, 2017

Yes, this issue still happens with the current git master and "lwIP Variant" set to "v2", which was the default.

@d-a-v
Collaborator

d-a-v commented Nov 18, 2017

On current master, I don't have the issue with lwip2.
endPacket() returns 0 if the packet is not sent. Logs below.

However it indeed does not work with lwip1.4.
Lots of packets are lost and endPacket() always returns 1. Only the last two or three packets are received. It is as if new packets overwrite earlier ones that have not been sent yet.

Here is my complete sketch.
Can you double check with lwip2, and, if the problem persists, post your complete sketch ?

#include <ESP8266WiFi.h>
#include <WiFiUdp.h>

WiFiUDP udp;

void setup()
{
  Serial.begin(115200);
  Serial.println("Start");
  WiFi.begin("my", "wifi");

  while (WiFi.status() != WL_CONNECTED)
    delay(50);

  udp.begin(1234);

  Serial.println("My IP address:");
  Serial.println(WiFi.localIP());
  Serial.println("Sending");

  char data[] = "xxx"; // writable buffer; modifying a string literal is undefined behavior

  for (int i = 0; i < 10; i++)
  {
    data[0] = 'a' + i;
    Serial.print(data);
    Serial.print(" ");
    Serial.print(udp.beginPacket("192.168.1.8", 7827));
    Serial.print(" ");
    Serial.print(udp.print(data));
    Serial.print(" ");
    Serial.print(udp.endPacket());
    Serial.println(" ");
  }
  //delay(100); // TODO: find out when sending is finished
  //goToSleep();
}

void loop() {
  // put your main code here, to run repeatedly:

}

Serial output:

Start
My IP address:
192.168.1.239
Sending
axx 1 3 1 
bxx 1 3 1 
cxx 1 3 1 
dxx 1 3 1 
exx 1 3 1 
fxx 1 3 1 
gxx 1 3 1 
hxx 1 3 1 
ixx 1 3 0 
jxx 1 3 0

Dump:

15:58:16.882875 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 0007 0000 ff11 377f c0a8 01ef  E.........7.....
        0x0010:  c0a8 0108 04d2 1e93 000b 7eb2 6178 7800  ..........~.axx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.883610 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 0008 0000 ff11 377e c0a8 01ef  E.........7~....
        0x0010:  c0a8 0108 04d2 1e93 000b 7db2 6278 7800  ..........}.bxx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.884651 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 0009 0000 ff11 377d c0a8 01ef  E.........7}....
        0x0010:  c0a8 0108 04d2 1e93 000b 7cb2 6378 7800  ..........|.cxx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.889014 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 000a 0000 ff11 377c c0a8 01ef  E.........7|....
        0x0010:  c0a8 0108 04d2 1e93 000b 7bb2 6478 7800  ..........{.dxx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.889689 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 000b 0000 ff11 377b c0a8 01ef  E.........7{....
        0x0010:  c0a8 0108 04d2 1e93 000b 7ab2 6578 7800  ..........z.exx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.897363 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 000c 0000 ff11 377a c0a8 01ef  E.........7z....
        0x0010:  c0a8 0108 04d2 1e93 000b 79b2 6678 7800  ..........y.fxx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.903304 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 000d 0000 ff11 3779 c0a8 01ef  E.........7y....
        0x0010:  c0a8 0108 04d2 1e93 000b 78b2 6778 7800  ..........x.gxx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
15:58:16.905036 IP 192.168.1.239.1234 > 192.168.1.8.7827: UDP, length 3
        0x0000:  4500 001f 000e 0000 ff11 3778 c0a8 01ef  E.........7x....
        0x0010:  c0a8 0108 04d2 1e93 000b 77b2 6878 7800  ..........w.hxx.
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............

@AndreKR
Author

AndreKR commented Nov 18, 2017

Sketch:

#include <ESP8266WiFi.h>
#include <WiFiUdp.h>
#include "config.h"

WiFiUDP Udp;

void goToSleep() {
  Serial.println("Going to sleep");
  ESP.deepSleep(2 * 60 * 1000000);
}

void setup()
{
  Serial.begin(115200);
  Serial.println("Start");
  Serial.println("Re-setting Wifi credentials");
  WiFi.begin(SSID, PASSWORD); 

  while(WiFi.status() != WL_CONNECTED)
    delay(50);

  Serial.println("My IP address:");
  Serial.println(WiFi.localIP());
  Serial.println("Sending");
  Serial.println(Udp.beginPacket("192.168.1.100", 1234));
  Serial.println(Udp.print("aaa"));
  Serial.println(Udp.endPacket());
  Serial.println(Udp.beginPacket("192.168.1.100", 1234));
  Serial.println(Udp.print("bbb"));
  Serial.println(Udp.endPacket());
  Serial.println(Udp.beginPacket("192.168.1.100", 1234));
  Serial.println(Udp.print("ccc"));
  Serial.println(Udp.endPacket());
  Serial.println(Udp.beginPacket("192.168.1.100", 1234));
  Serial.println(Udp.print("ddd"));
  Serial.println(Udp.endPacket());
  delay(100); // TODO: find out when sending is finished
  goToSleep();
}

void loop() {}

Serial output (without bootup gibberish):

Start
Re-setting Wifi credentials
My IP address:
192.168.1.3
Sending
1
3
1
1
3
1
1
3
1
1
3
1
Going to sleep

Dump:

listening on enp2s0, link-type EN10MB (Ethernet), capture size 9999 bytes
22:45:56.675077 IP 192.168.1.3.49154 > 192.168.1.100.1234: UDP, length 3
        0x0000:  4500 001f 0006 0000 ff11 3810 c0a8 0103  E.........8.....
        0x0010:  c0a8 0164 c002 04d2 000b f2e8 6262 623f  ...d........bbb?
        0x0020:  aa0a 8be3 17f7 1fdd f4d8 ef00 7793       ............w.
22:45:56.675913 IP 192.168.1.3.49154 > 192.168.1.100.1234: UDP, length 3
        0x0000:  4500 001f 0007 0000 ff11 380f c0a8 0103  E.........8.....
        0x0010:  c0a8 0164 c002 04d2 000b f0e7 6363 6386  ...d........ccc.
        0x0020:  6b33 dc69 0272 1002 84b9 4900 0204       k3.i.r....I...
22:45:56.676941 IP 192.168.1.3.49154 > 192.168.1.100.1234: UDP, length 3
        0x0000:  4500 001f 0008 0000 ff11 380e c0a8 0103  E.........8.....
        0x0010:  c0a8 0164 c002 04d2 000b eee6 6464 6457  ...d........dddW
        0x0020:  545a 72cf 8463 4bc2 e288 7300 1703       TZr..cK...s...

@d-a-v
Collaborator

d-a-v commented Nov 18, 2017

@AndreKR it works for me. I receive aaa with lwip2.
I also confirm that I receive only the three last with lwip1.4.
Please add #include <lwip/init.h> and Serial.println(LWIP_VERSION_MAJOR); in your sketch (it shows 1 or 2). So you can verify you indeed run with lwip2.

@AndreKR
Author

AndreKR commented Nov 18, 2017

Strange. I added

  Serial.println("LwIP version:");
  Serial.println(LWIP_VERSION_MAJOR);

Output:

LwIP version:
2

But still no "aaa" in the dump.

@d-a-v
Collaborator

d-a-v commented Nov 18, 2017

Agreed, strange. Can you try my sketch ?

@AndreKR
Author

AndreKR commented Nov 18, 2017

Output:

Start
My IP address:
192.168.1.3
Sending
axx 1 3 1 
bxx 1 3 1 
cxx 1 3 1 
dxx 1 3 1 
exx 1 3 1 
fxx 1 3 1 
gxx 1 3 1 
hxx 1 3 1 
ixx 1 3 1 
jxx 1 3 1 

Dump:

23:45:09.571951 IP 192.168.1.3.1234 > 192.168.1.100.1234: UDP, length 3
        0x0000:  4500 001f 000c 0000 ff11 380a c0a8 0103  E.........8.....
        0x0010:  c0a8 0164 04d2 04d2 000b 9203 6878 78c2  ...d........hxx.
        0x0020:  37b3 ae20 45f4 bbe4 5b44 4d00 9ded       7...E...[DM...
23:45:09.573466 IP 192.168.1.3.1234 > 192.168.1.100.1234: UDP, length 3
        0x0000:  4500 001f 000d 0000 ff11 3809 c0a8 0103  E.........8.....
        0x0010:  c0a8 0164 04d2 04d2 000b 9103 6978 7830  ...d........ixx0
        0x0020:  57f2 41b0 d3ff a9a9 d785 d500 1703       W.A...........
23:45:09.575220 IP 192.168.1.3.1234 > 192.168.1.100.1234: UDP, length 3
        0x0000:  4500 001f 000e 0000 ff11 3808 c0a8 0103  E.........8.....
        0x0010:  c0a8 0164 04d2 04d2 000b 9003 6a78 7866  ...d........jxxf
        0x0020:  ea8d 413b 934f da62 317f b400 1603       ..A;.O.b1.....

Yes, that's the complete dump. :)

@d-a-v
Collaborator

d-a-v commented Nov 18, 2017

I have no clue. I sent you a mail to do further tests.

Can someone else try this with master/lwip2 and report back ?
edit: I just redownloaded a brand new master with tools again, and it is still working.
edit2: I am on linux and tried on my son's windows gaming pc from scratch, works too.

@igrr
Member

igrr commented Nov 19, 2017

The issue might be caused by the link layer dropping the packet after netif output function has returned OK. The fact that this happens in one setup (@AndreKR's) and doesn't happen in the other (@d-a-v's) might be caused by different WiFi data rates, for example (send queue overflows in one case but not the other).

@AndreKR
Author

AndreKR commented Nov 19, 2017

@d-a-v I responded to your mail.

By the way, when I add yield() at the end of the loop, I get the packets fxx, gxx, hxx, ixx and jxx, so 2 more. When I instead add delay(1) at the end of the loop, I get all 10 packets.
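
A minimal sketch of the pacing workaround described above, written as plain C++ so the logic can be checked off-device. The helper name `sendPaced` and its two callbacks are hypothetical and not part of the ESP8266 core. On a real board, `send` would presumably wrap `beginPacket()`, `print()` and `endPacket()`, and `pauseMs` would call `delay()`. The retry on a zero return value only helps where `endPacket()` actually reports the failure (as @d-a-v saw with lwip2). In the logs above it returned 1 even for packets that never arrived, so the pause after each send is the part that matters.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Sends each payload, pauses after every attempt, and retries a send that
// reports failure (returns 0). Returns how many payloads were accepted.
// `send` and `pauseMs` are hypothetical hooks supplied by the caller.
int sendPaced(const std::vector<std::string>& payloads,
              const std::function<int(const std::string&)>& send,
              const std::function<void(unsigned)>& pauseMs,
              unsigned gapMs = 1,
              int maxAttempts = 3) {
    assert(maxAttempts >= 1);
    int accepted = 0;
    for (const std::string& payload : payloads) {
        for (int attempt = 0; attempt < maxAttempts; ++attempt) {
            const bool ok = send(payload) != 0;
            pauseMs(gapMs);  // give the network stack time to run
            if (ok) {
                ++accepted;
                break;
            }
        }
    }
    return accepted;
}
```

As noted earlier in the thread, a fixed 1 ms gap may not be enough on a slow or busy network, so `gapMs` is left as a parameter rather than hard-coded.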

@d-a-v
Collaborator

d-a-v commented Nov 19, 2017

@AndreKR @igrr Thanks. I misled myself into believing that the sketch could work in any situation. I had answered in some other issues that UDP must not be used like this, but I neglected to apply that rule to myself :]

@devyte added this to the 2.6.0 milestone Jan 7, 2018
@devyte added the "component: network" label and removed the "waiting for feedback" label Jan 7, 2018
@devyte
Collaborator

devyte commented Jan 7, 2018

@d-a-v there has long been the question of why some UDP packets are dropped when they are sent too fast. It is understood that some UDP packets may get dropped in transit, but we shouldn't be dropping them before they even get out the door.
There was a suggestion by @igrr somewhere about checking the pbuf chain length or something to try and figure out if there were queued outgoing packets. I don't remember the details, and I can't find it after a short while searching.
Setting this to 2.6.0 to provide some time to investigate.

@LechnerRobert

LechnerRobert commented Mar 13, 2018

Is this related to #3095?

I have three ESP8266 devices and "both" problems.

The original problem was that after a while (some hours), the first UDP packet was not received.
So I send a "ping" every 10 seconds, which made it "more stable".

But (maybe since a WLAN router firmware update) it got worse.

Sorry to ask, but can a resolution be expected?

I also think that a ping from my NAS (my DHCP server), which does not need ARP,
takes about 1 second (500 to 1500 ms) when the ESP is not reachable from other PCs,
while it takes 4 ms when the devices work "normally".

@d-a-v
Collaborator

d-a-v commented Mar 15, 2018

Can you try with WiFi.setSleepMode(WIFI_NONE_SLEEP); in your setup() ?

@mikep01

mikep01 commented Feb 12, 2019

This still seems to be a problem in 2.5. If multiple UDP packets are sent just after the WiFi connection is established, only the last one is actually transmitted. Adding a delay(1000) after the first packet allows it to go out, and then any number of following packets will also work without needing any delay. (My guess is that the required delay depends on the chip or transmit speed, as 500 does not work for me.)

It works fine in 2.4.2 without a delay, all packets are sent.

@d-a-v
Collaborator

d-a-v commented Apr 7, 2019

Some update,

It happens that when sent too quickly (D=0 below), udp packets are not reaching the output queue (except for the last).
(I use netdump)

#include <ESP8266WiFi.h>
#include <WiFiUdp.h>

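WiFiUDP udp;
#define D 100 // delay in ms before each packet; set to 0 to reproduce the loss

#ifndef STASSID
#define STASSID "your-ssid"     // placeholder: replace with your network name
#define STAPSK  "your-password" // placeholder: replace with your network password
#endif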

#include <NetDump.h>
#include <lwipopts.h> // get global handler phy_capture

void dump (int netif_idx, const char* data, size_t len, int out, int success) {
  (void)success;
  Serial.print(out ? F("out ") : F(" in "));
  Serial.printf("%d ", netif_idx);

  // optional filter example: if (netDump_is_ARP(data))
  {
    netDump(Serial, data, len);
    //netDumpHex(Serial, data, len);
  }
}

void setup()
{

  Serial.begin(115200);
  phy_capture = dump;
  Serial.println("Start");
  Serial.println("Re-setting Wifi credentials");
  WiFi.mode(WIFI_STA);
  WiFi.begin(STASSID, STAPSK);

  while (WiFi.status() != WL_CONNECTED)
    delay(50);

  Serial.println("My IP address:");
  Serial.println(WiFi.localIP());
  Serial.println("Sending");

  delay(D);
  Serial.println(udp.beginPacket(IPAddress(10, 0, 1, 7), 7827));
  Serial.println(udp.print("aaa"));
  Serial.println(udp.endPacket());
  delay(D);
  Serial.println(udp.beginPacket(IPAddress(10, 0, 1, 7), 7827));
  Serial.println(udp.print("bbb"));
  Serial.println(udp.endPacket());
  delay(D);
  Serial.println(udp.beginPacket(IPAddress(10, 0, 1, 7), 7827));
  Serial.println(udp.print("ccc"));
  Serial.println(udp.endPacket());
  delay(D);
  Serial.println(udp.beginPacket(IPAddress(10, 0, 1, 7), 7827));
  Serial.println(udp.print("ddd"));
  Serial.println(udp.endPacket());
}

void loop ()
{
}

@aboulfad

aboulfad commented Apr 10, 2019

@d-a-v Small progress. I am posting here instead of in my issue #5955 as it's more relevant here.

Observation: ARP behaviour post boot is different between 2.4.2 and master (lwip 2.0.3 & 2.1.2)

In 2.4.2, post esp reset, ARP queries my OTA host roughly 10s after reset. In master (and r2.5.0, 2.5.0b3), there is never an ARP request to my OTA host post boot.

When the OTA host initiates the OTA handshake using UDP FLASH command, esp immediately ARP queries the OTA host and gets a reply, BUT there is never the UDP reply OK back to the OTA host causing OTA to fail first time. This is repeatable 100% and confirmed using wireshark and netdump.

I think @hridder alluded to this in his comment above almost two years ago. It may be that UDP or other packet types are (or should be) dropped if there's no ARP entry for the destination address. Some systems implement one-packet queues.

I can think of a few solutions:

  1. Figure out the ARP behaviour in 2.4.2 and see if that could be reused
  2. Do an ARP send/query on the client before attempting to do any other IP comms
  3. Use reliable transport (TCP instead of UDP)

@aboulfad

aboulfad commented Apr 11, 2019

ARP_QUEUEING

Looks promising, it was changed to 0 on Nov 27, 2018 in commit d-a-v/esp82xx-nonos-linklayer@ea83a83#diff-1b7c800f71ea90dc45a57dc398b8bf24

The original lwip distros always had ARP_QUEUEING set to 0, but it was set to 1 in v2.4.2 and is 0 in current master lwipopts.h
@d-a-v any ideas why? I will set it to 1 to see if that's what caused the ARP behaviour change. Yes it works 👍
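
For reference, the change being tested amounts to something like the following excerpt of lwipopts.h. Only the two option names and their values come from this thread (ARP_QUEUE_LEN 3 is quoted in a later comment). The comments are added for illustration, and the layout of the real file may differ.

```c
/* lwipopts.h (excerpt, illustrative) */

/* 1: queue several outgoing packets per unresolved address while ARP is pending.
   0: the value found in master when this comment was written. */
#define ARP_QUEUEING 1

/* Maximum number of packets queued per pending ARP entry when ARP_QUEUEING is 1. */
#define ARP_QUEUE_LEN 3
```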

@d-a-v
Collaborator

d-a-v commented Apr 11, 2019

> @d-a-v any ideas why?

When switching from lwIP-2.0 to lwIP-2.1 there were lots of changes in lwipopts.h. I reworked the whole file to be able to keep track of our configuration and future changes (using diffs or meld as you may have seen). ARP_QUEUEING change is an omission. Thanks for investigating. If this fixes OTA or "First UDP packet always lost", then feel free to PR on lwip2 repository !

@aboulfad

aboulfad commented Apr 11, 2019

> If this fixes OTA or "First UDP packet always lost", then feel free to PR on lwip2 repository !

It indeed fixes the OTA issue #5955 which is one use case for the "First UDP packet always lost".

Using your torture test above with D=0 and ARP_QUEUEING=0, only the last packet makes it. With ARP_QUEUEING=1, the last three make it because of #define ARP_QUEUE_LEN 3. I don't think it's wise to increase this, especially on a resource-limited ESP. I did a test with ARP_QUEUE_LEN=10, and all four packets made it through even with D=0.

So in conclusion, a client that transmits more than ARP_QUEUE_LEN UDP packets before the destination IP address is resolved will always lose the first one(s).

A PR on lwip2 repo was created, all that work for one line change, typical ;-)
PS: Very interesting read from RFC 1122 section 2.3.2.2:

     2.3.2.2  ARP Packet Queue

        The link layer SHOULD save (rather than discard) at least
        one (the latest) packet of each set of packets destined to
        the same unresolved IP address, and transmit the saved
        packet when the address has been resolved.

        DISCUSSION:
             Failure to follow this recommendation causes the first
             packet of every exchange to be lost.  Although higher-
             layer protocols can generally cope with packet loss by
             retransmission, packet loss does impact performance.
             For example, loss of a TCP open request causes the
             initial round-trip time estimate to be inflated.  UDP-
             based applications such as the Domain Name System are
             more seriously affected.
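
The queue behaviour reported in this comment can be illustrated with a toy model in plain C++. This is not lwIP code: the class `PendingArpEntry` and the function `survivors` are hypothetical names, and the model only assumes what the tests above observed, namely that a full queue drops its oldest packet and that ARP_QUEUEING=0 behaves like a queue of length 1.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model of one unresolved ARP entry. queueLen plays the role of
// ARP_QUEUE_LEN; ARP_QUEUEING=0 is modelled as queueLen = 1.
class PendingArpEntry {
public:
    explicit PendingArpEntry(std::size_t queueLen) : queueLen_(queueLen) {
        assert(queueLen >= 1);
    }

    // Called for every packet sent while the MAC address is still unknown.
    void enqueue(const std::string& payload) {
        if (queue_.size() == queueLen_) {
            queue_.pop_front();  // queue full: the oldest packet is dropped
        }
        queue_.push_back(payload);
    }

    // Called when the ARP reply arrives: everything still queued goes out.
    std::vector<std::string> flush() {
        std::vector<std::string> sent(queue_.begin(), queue_.end());
        queue_.clear();
        return sent;
    }

private:
    std::size_t queueLen_;
    std::deque<std::string> queue_;
};

// Sends the payloads back to back before ARP resolves and returns the
// ones that would reach the wire in this model.
std::vector<std::string> survivors(std::size_t queueLen,
                                   const std::vector<std::string>& payloads) {
    PendingArpEntry entry(queueLen);
    for (const std::string& p : payloads) {
        entry.enqueue(p);
    }
    return entry.flush();
}
```

With the four payloads from the torture test, `survivors(1, ...)` keeps only "ddd", `survivors(3, ...)` keeps "bbb", "ccc" and "ddd", and `survivors(10, ...)` keeps all four, which matches the three results reported above.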

@token47

token47 commented Nov 9, 2020

I understand the question of the number of packets queued per the ARP_QUEUE_LEN parameter, and also the problem in general (a packet is overwritten if the previous one has not been sent yet because the ARP request was not answered). But none of this seems to really fix the issue. If I understand right, having ARP_QUEUE_LEN=1 only moves the problem to the second packet, but we still have the problem.

How can I make sending the first packet block until the ARP reply comes back? Or is there a flag that I can check to avoid sending the next packet until the previous one was really sent out? (This way I can do other stuff and only block on the network.)

In my tests, this happens in the middle of communications too. It is just a matter of the ARP cache entry expiring, and you lose more packets again.

@token47

token47 commented Nov 9, 2020

I see there is a function to get entries from the arp cache:

http://www.nongnu.org/lwip/2_0_x/etharp_8c.html#a8038c9e819e16bfdb4c94e2dcdd66784

Maybe there is a way to check if the entry is not there and block? (I can't find a good example of how to use this function).
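
One possible shape for such a check, as a sketch only: poll a condition with a timeout instead of blocking forever. The helper `waitUntil` and its callbacks are hypothetical names. On the device, the `resolved` callback could wrap the lwIP function linked above (`etharp_find_addr()`, documented as returning the table index when an entry exists and -1 otherwise), assuming the lwIP headers and the active netif are reachable from the sketch, which this thread does not confirm. Something still has to trigger the ARP request in the first place, for example a throwaway first packet.

```cpp
#include <cassert>
#include <functional>

// Polls `resolved` until it returns true or roughly `timeoutMs` has elapsed.
// Returns true if the condition became true, false on timeout.
// `resolved` and `pauseMs` are hypothetical hooks supplied by the caller.
bool waitUntil(const std::function<bool()>& resolved,
               const std::function<void(unsigned)>& pauseMs,
               unsigned timeoutMs,
               unsigned stepMs = 1) {
    assert(stepMs >= 1);
    for (unsigned waited = 0; ; waited += stepMs) {
        if (resolved()) {
            return true;   // e.g. the ARP entry is now present
        }
        if (waited >= timeoutMs) {
            return false;  // give up; the caller decides whether to send anyway
        }
        pauseMs(stepMs);   // on the ESP8266 this would be delay(stepMs)
    }
}
```

Because ARP entries expire, a check like this would have to run before every burst of packets, not only before the first one.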
