Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server closes connection after client shuts down the sending side of its socket #2569

Closed
chschu opened this issue Oct 2, 2016 · 48 comments · Fixed by #7096
Closed

Server closes connection after client shuts down the sending side of its socket #2569

chschu opened this issue Oct 2, 2016 · 48 comments · Fixed by #7096
Assignees
Labels
component: network type: troubleshooting waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.
Milestone

Comments

@chschu
Copy link
Contributor

chschu commented Oct 2, 2016

Basic Infos

Hardware

Hardware: ESP-12F on NodeMCU clone (Geekcreit Doit)
Core Version: 2.3.0

Description

A WiFiClient instance (or the underlying TCP/IP lib) obtained from WiFiServer::available() seems to close the TCP connection when it receives a FIN packet. According to the TCP spec, it should go to the CLOSE_WAIT state, which would still allow data to be sent in the opposite direction.

This happens if a client connects to a server running on the ESP, sends some data and shuts down the writing side of the socket before receiving a response. The client socket's reading side is still open in this case, and the server should be able to send a response to the client.

This issue also occurs with ESP8266WebServer, because it uses WiFiServer/Client.

Settings in IDE

Module: NodeMCU 1.0 (ESP-12E Module)
Flash Size: 4MB (3M SPIFFS)
CPU Frequency: 80Mhz
Upload Using: SERIAL

Sketch

#include "ESP8266WiFi.h"

#define WIFI_SSID "..."
#define WIFI_PASS "..."

WiFiServer server(1400);

void setup() {
  Serial.begin(115200);
  delay(10);

  WiFi.begin(WIFI_SSID, WIFI_PASS);
  Serial.print("Connecting to WiFi.");
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.print(".");
  }
  Serial.println("OK");

  server.begin();
}

void loop() {
  WiFiClient client = server.available();
  if (client) {
    Serial.printf("status before delay: %d\n", client.status());
    delay(1000);
    client.write("Hello, World!\n");
    Serial.printf("status after delay: %d\n", client.status());
    client.stop();
    Serial.printf("status after stop: %d\n", client.status());
  }
}

Debug Messages

Client that shuts down the sending side immediately

Shell command:

sleep 0 | nc 192.168.178.42 1400

(sending side is shut down immediately, because input pipe of netcat is closed after sleep 0)

Client output:

<nothing>

Serial output:

status before delay: 4
status after delay: 0
status after stop: 0

(4 is ESTABLISHED, 0 is CLOSED)

Wireshark capture:
screen shot 2016-10-02 at 15 03 01

Client that waits for the server to close the connection (works correctly)

Shell command:

sleep 2 | nc 192.168.178.42 1400

(this gives the server enough time to respond and close the connection)

Client output:

Hello, World!

Serial output:

status before delay: 4
status after delay: 4
status after stop: 0

(4 is ESTABLISHED, 0 is CLOSED)

Wireshark capture:
screen shot 2016-10-02 at 15 02 01

@chschu
Copy link
Contributor Author

chschu commented Oct 2, 2016

Here comes a small C client that connects to the server running on the ESP (change the IP address accordingly), shuts down the sending side of its socket (unless SHUTDOWN_WRITE is 0) and then receives the response:

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/errno.h>
#include <unistd.h>

#define SHUTDOWN_WRITE 1

int main() {

    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        fprintf(stderr, "Failed to create socket: %s\n", strerror(errno));
        return 1;
    }

    struct sockaddr_in serv_addr;
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr("192.168.178.42");
    serv_addr.sin_port = htons(1400);
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
        fprintf(stderr, "Failed to connect to host: %s\n", strerror(errno));
        return 1;
    }

#if SHUTDOWN_WRITE
    if (shutdown(sockfd, SHUT_WR) < 0) {
        fprintf(stderr, "Failed to shut down writing side of socket: %s\n", strerror(errno));
        return 1;
    }
#endif

    char response[256];
    while (1) {
        memset(response, 0, sizeof(response));
        int n = read(sockfd, response, sizeof(response) - 1);
        if (!n) {
            // EOF (server closed connection)
            break;
        }
        if (n < 0) {
            fprintf(stderr, "Failed to read from socket: %s\n", strerror(errno));
            return 1;
        }
        printf("%s", response);
    }

    close(sockfd);

    return 0;
}

If the shutdown is enabled, the call to read fails with the error Connection reset by peer, indicating that the server closed the connection. If the shutdown is disabled, the response is received correctly.

@chschu
Copy link
Contributor Author

chschu commented Oct 2, 2016

Looks like the issue is caused by the receive callback recv_ret_t ClientContext::_recv(tcp_pcb* pcb, pbuf* pb, err_t err), which is called by lwIP with pb == 0 if the remote side sends a FIN. Because this function returns ERR_ABRT, lwIP will abort the whole connection.

@igrr
Copy link
Member

igrr commented Oct 2, 2016

This function returns ERR_ABRT because it first calls tcp_abort for the PCB. I think I did this on purpose as a workaround for some issue, but I don't remember what the issue was.

@me-no-dev
Copy link
Collaborator

it's there because that is the only indication we have that the connection is closed from the remote end. Else we will have pcb objects around and fill the maximum of 5 quite easily.
As a matter of fact, I'm currently fighting this same problem on ESP32.
I had the impression that CLOSE_WAIT is to be set on connections when we are the closing side (we send the firs FIN)

@chschu
Copy link
Contributor Author

chschu commented Oct 4, 2016

In the "active close" (we're the closing side) case, the socket will be go to the FIN_WAIT1 state. CLOSE_WAIT is used only for the "passive close" (they're the closing side) case.

After a "passive full close" (both directions closed), the remote side will respond with RST packets to anything we're sending.

After "passive half close" (them->us closed by them, us->them open), the remote side will respond with the normal ACKs.

I don't think there is a way to distinguish the two without sending something to the remote side and handling the RST in case of a "passive full close".

@me-no-dev
Copy link
Collaborator

me-no-dev commented Oct 4, 2016

Thanks for clarifying :)
Now do you have a real use case? Because getting that to work would require changes probably in LwIP, so proper disconnect is correctly handled. If we comment those lines now, every connection that we do not explicitly close, will stay open indefinitely.
And there is no way for us to know when the client actually closed the sockets receiving end.
We can't just send probing data as that would interfere with any protocol.
How does linux handle this?

@chschu
Copy link
Contributor Author

chschu commented Oct 12, 2016

I'm currently building both client and server, so I can just keep both directions of the socket open until the response has been received. The browsers and tools I tested also seemed to behave that way.

Strictly speaking, the server must close the socket explicitly once it is done with the client. If a client dies (power button, network cable), it might never send a FIN, so the server will probably run out of resources if it doesn't close the socket itself.

Most of the time, sockets are closed automatically in the destructor of some higher-level class. If they're not, I'd rather consider it a bug in that higher-level class (or in the application) than in the socket implementation.

I think linux just keeps the socket half-open indefinitely. Running out of resources is not much of an issue there.

@me-no-dev
Copy link
Collaborator

we can have a total of 4 or 5 client connections at any one time on ESP8266. Keepnig unnecessary connections open is really not an option here. In AsyncTCP I have implemented timeouts and so on that can coop with sudden loss of connection, but imagine this: You have 4 connections already open, one drops and then tries to reconnect. It will be stopped in SYN because the ESP has all connections open (I have been there many times)

@devyte
Copy link
Collaborator

devyte commented Oct 12, 2017

@chschu @me-no-dev @igrr
is there anything that can be done here, or should I just close the issue?

@devyte devyte added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Oct 12, 2017
@me-no-dev
Copy link
Collaborator

This is where it happens: https://github.com/esp8266/Arduino/blob/master/libraries/ESP8266WiFi/src/include/ClientContext.h#L463

A setting could be introduced to not close the connection, but it would be user's responsibility to do so. That is the only solution I can come up with.

@igrr
Copy link
Member

igrr commented Oct 12, 2017 via email

@devyte
Copy link
Collaborator

devyte commented Feb 28, 2018

@igrr @me-no-dev code has advanced, including changes in ClientContext. Is there anything further to look into for this one?

@coddicted
Copy link

coddicted commented Apr 28, 2018

By not sending response to client, client.status() remains 4 but client data doesn't get printed after that. Is it mandatory to close connection after every query by a client?
In AT command mode, i am able to send data from client continuously until the server goes down after a minute or so on its own.

@devyte
Copy link
Collaborator

devyte commented Nov 10, 2019

@d-a-v do you know anything about this?

@d-a-v d-a-v added this to the 2.7.0 milestone Nov 10, 2019
@d-a-v d-a-v self-assigned this Nov 10, 2019
@d-a-v d-a-v added component: network type: troubleshooting and removed waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. labels Nov 10, 2019
@kirillandy
Copy link

Hi all! This has been a very useful thread, thanks for bringing this up @chschu!
One thing I'd like to clarify (in the scenario where we call shutdown) is: does ESP effectively move into the CLOSED state upon receiving a FIN? So it's like it ignores one of the TCP state flows (CLOSE_WAIT -> LAST_ACK) and jumps to CLOSED, which means that calling client.stop will send that RST that @chschu showed in his WireShark capture. I didn't see any FIN and ACK exchange after the first (FIN,ACK) sent by the PC app, so CLOSE_WAIT and LAST_ACK never happened.
As a result, is there any reason for me to even do a client.close in the ESP code? Ivan mentioned that memory deallocation would occur once the client object goes out of scope, so why should I bother with it and also create a useless RST message in the network? Or is that RST actually helpful and it will prevent my Client app on my PC from hanging in the active_close branch (because it was the one that sent the first FIN)
Is my reasoning solid here?

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 18, 2020

I didn't try to follow your reasoning, I have first a simple question for you @kirillandy,
Did you have the issue too ?
Are you able to reproduce it ?
When OP was reported, the core was using lwIP-v1.4. We are now using lwIP-v2.1.2 and that kind of low-level bug is supposed to be managed (and fixed) by the TCP/IP layer. It's now a long time since we didn't have such issue and the latests we had have been fixed in other issues.

@d-a-v d-a-v added the waiting for feedback Waiting on additional info. If it's not received, the issue may be closed. label Feb 18, 2020
@kirillandy
Copy link

kirillandy commented Feb 18, 2020

Hi! Yes, I was able to recreate it and it was exactly as @chschu described it. I monitored the whole thing in WireShark just like he did. Although this was a couple months ago when I asked the question. It was a little bit of an inconvenient problem but I managed to change my project concept to bypass the RST packet, but it would be helpful to get a clear explanation of what the issue was about. I believe that the core at the time of my post here was already using LWIP 2 (not sure which version exactly), based on the verbose output of the compiler and linker.
If necessary, I can recreate my experiment and post the source code here (I will also get a chance to check if the issue is still actually there after these 2 months). Will that be convenient?

EDIT: also, looking at my installed Arduino platforms in the Arduino IDE, I've got ESP8266 2.5.2, whereas the lastest one is 2.6.3. Should I update that as well?

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 18, 2020

You can update, but lwIP is I believe the exact same version stable-2.1.2.
There has not been much changes on this side since core-2.5.2.
So yes, it will be convenient to have an MCVE ! Thanks

@kirillandy
Copy link

So here is a simple check. It is somewhat similar to what@chschu did in the original test.
Hardware: A UNO + ESP8266 board (https://robotdyn.com/uno-wifi-r3-atmega328p-esp8266-32mb-flash-usb-ttl-ch340g-micro-usb.html)
Core Version: 2.6.3

IDE Settings
Module: Generic ESP8266 Module
Flash size: 1MB
CPU Freq: 80 Mhz
Upload using SERIAL

Sketch for ESP8266 Server

WiFiServer cicadaServer(1);
WiFiClient serverClient;
void setup()
{
	WiFiClient::setDefaultNoDelay(true);
	WiFiClient::setDefaultSync(true);
	delay(5000);
	Serial.begin(115200);
	WiFi.begin(router_ssid, router_pass, 6);
	while (WiFi.status() != WL_CONNECTED) {
		Serial.print(".");
		delay(500);
	}
	Serial.println("Connected!");
	Serial.print("IP address, gateway: ");
	Serial.println(WiFi.localIP());
	Serial.println(WiFi.gatewayIP());
	cicadaServer.begin();
	while (cicadaServer.status() != LISTEN) {
		delay(100);
	}
}
void loop()
{
Mretry:
	serverClient = cicadaServer.available();
	while (!serverClient) {
		delay(100);
		goto Mretry;
	}
	Serial.println("Got connection!");
	//serverClient.stop();
	while (serverClient.status() != CLOSED) {
		delay(100);
	}
	Serial.println("Closed connection");
	goto Mretry;
}

There are 2 versions of the sketch. 1) where the client is the first to disconnect and 2) where the server is the first to disconnect. To change them, simply uncomment the serverClient.stop() line in the loop() function (commented = sketch 1, uncommented = sketch 2)
I plug in my ESP8266 server and try connecting to it using a netcat client (on the Windows Subsystem for Linux, in case that makes any difference). I also have WireShark running to monitor what is going on with the TCP session.
My PC IP address is 192.168.0.30, my ESP8266 address is 192.168.0.27:1

Closing the connection on the client side
Using sketch 1, I execute echo "one" | nc -q 5 192.168.0.27 1. The server waits until the connection is closed, while the nc client disconnects after about 5 seconds. WireShark sees this activity and logs it. When the client sends a FIN packet, we can see a RST being sent back. This doesn't really go along with the TCP flow diagram
ClientClosesConnWireshark

Closing the connection on the server side
Using sketch 2, I execute sleep 4 | nc -q 5 192.168.0.27 1 to guarantee that the server on my ESP initiates the disconnect. WireShark notices a FIN coming from the server, and from there everything is great, just as it should be according to the TCP specification
ServerClosesConnWireshark

To clarify, my Linux command for this experiment is different because if I use the echo "one" | nc -q 5 192.168.0.27 1 command, I get an understandable RST due to me closing the connection on the server before sending out an ACK for the received data from the "echo"
ServerClosesConnWireshark2
I needed to create a situation where there would be no pending ACKs, so I used "sleep" just like @chschu

@JAndrassy
Copy link
Contributor

could you try to read all the data sent by the client?

@kirillandy
Copy link

You mean to try a serverClient.read() on my ESP after the connection has been terminated (so after the "while serverClient.status() != CLOSED" loop)?

@JAndrassy
Copy link
Contributor

JAndrassy commented Feb 18, 2020

You mean to try a serverClient.read() on my ESP after the connection has been terminated (so after the "while serverClient.status() != CLOSED" loop)?

no. before it.

@kirillandy
Copy link

Oh, oops, I actually did everything before seeing your reply @JAndrassy, but there are some things to note, nonetheless. I don't see where I can put a read before the connection closes. That's just a normal read in the ESTABLISHED state...
So I added a small read after the exchange and there were all my bytes waiting for me to read them!
If I recall correctly, at the time of writing my initial question, a RST would free the read buffer and any bytes which were stored in the server's reception buffer would be lost, which was the problem that gave me real trouble, but now it looks like the RST doesn't free any resources on the ESP side and I can read anything that I still haven't.
Here's the code:

void setup()
{
	WiFiClient::setDefaultNoDelay(true);
	WiFiClient::setDefaultSync(true);

	delay(5000);
	Serial.begin(115200);
	WiFi.begin(router_ssid, router_pass, 6);
	while (WiFi.status() != WL_CONNECTED) {
		Serial.print(".");
		delay(500);
	}
	Serial.println("Connected!");
	Serial.print("IP address, gateway: ");
	Serial.println(WiFi.localIP());
	Serial.println(WiFi.gatewayIP());

	cicadaServer.begin();
	while (cicadaServer.status() != LISTEN) {
		delay(100);
	}
}

void loop()
{
	unsigned int work1;
	unsigned char buf[10];
Mretry:
	serverClient = cicadaServer.available();
	while (!serverClient) {
		delay(100);
		goto Mretry;
	}

	Serial.println("Got connection!");
	//serverClient.stop();
	while (serverClient.status() != CLOSED) {
		delay(100);
	}
	Serial.println("Closed connection");

	delay(5000)
	work1 = serverClient.available();
	if (work1 != 0) {
		if (work1 > 10)
			work1 = 10;
		Serial.println("There are ");
		Serial.print(work1);
		Serial.print(" bytes in the buffer!");
		serverClient.readBytes(buf, serverClient.available());
		Serial.println();
		Serial.write(buf, work1);
	}
	else {
		Serial.println("No bytes in reception buffer!");
	}
	
	goto Mretry;
}

And here is a screenshot showing a couple of tests.
ClientClosesConnWithBufferWireshark

I sent 2 separate messages (so I connected to the server twice). WireShark at the top only shows what happened for the 2nd message (I forgot to open it :)). I read everything in the reception buffer after the connection is aborted. I also added a long delay(5000) to be extra sure.
The RST is still there but as you can see in the Serial comm monitor to the right, I managed to read the message and send it to my PC via Serial. This is great!
So this is similar to what the TCP protocol implies: if the client closes the connection by sending a FIN, the server still has access to bytes that it hasn't read from the reception buffer. The RST is still there but it isn't a big problem now, at least nothing is being lost.
Are there any things I should keep in mind about this change? Like possible memory overflows? For example, when will the read buffer be freed? Do I still need to do serverClient.stop() on my ESP even after the RST happens and the client disconnects? Or am I free to use serverClient together with cicadaServer.available() again without worrying about cleaning things up?

@JAndrassy
Copy link
Contributor

JAndrassy commented Feb 18, 2020

you can read the available data even the peer closed the connection. they are buffered in esp8266. but after stop() you can't read data.

for some time now in ESP8266WiFi library connected() for not secure connection returns false even if data are available. #6701

@kirillandy
Copy link

This didn't use to be the case. I strongly remember the server losing any unread data after the client closes the connection with a FIN which is why I had to change my algorithm so that the server always returned 4 predefined bytes to indicate that it had gotten and read everything from the client. I guess this was changed by version 2.6.3 and I was using 2.5.2. (Sidenote: I couldn't close the connection on the server side (active close) because then it would take a few seconds for me to be able to connect to the server again. I think this is because the server stays in the TIME_WAIT state for 2xMSL. This was inconvenient because I needed the server to be available asap after a session closes, so I could only close connections on the client side).
Also, @JAndrassy, I don't typically use connected(), I prefer to use available() to check data availability and status() for the TCP state, so no problem, all's good.
Just in case, to fully simulate my use case, I will try and repeat this experiment on 2 ESPs (without using netcat).
Another thing to keep in mind, as @chschu mentioned, is if the client terminates its write connection, then he gets a "Connection Reset" error. This is/was also true for Winsock 2 (at the time of posting my original question). Calling shutdown() with the SD_SEND flag (which I assume is exactly the same as the SHUT_WR flag in the C standard library) causes a FIN to be sent (so we move to the PASSIVE CLOSE stage in the TCP diagram, see Remarks at https://docs.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-shutdown). As we have seen, this makes the server on the ESP reply with an RST causing a "Connection Reset" type error (WSAECONNRESET 10054 in Winsock 2). The only trouble this could cause is if you expect the server to send something back after you execute this type of shutdown. However, what we have seen here is very similar to what RFC1122 4.2.2.13 mentions about a "half-duplex" close sequence, where calling CLOSE on the server when there is data in the reception buffer should be followed by an RST. The only difference I can see is that I don't need to actually call "CLOSE" (or stop(), which I assume is the same thing) on my server myself to cause this behavior. It is done automatically the very instant a FIN is received. Maybe the reason for this is because the ESP library doesn't really have a shutdown()-close() function pair like Winsock or Standard C lib. It only has stop() and doesn't provide additional functionality like the SD_SEND or SD_RECEIVE flags. Not to say this is good/bad, the library is awesome, just trying to figure out the difference :)

@kirillandy
Copy link

So, to summarize, I carried out the same experiment with 2 ESPs and got the same result. If the client calls stop(), the server will reset but any unread data will stay in the server reception buffer. Which is fine by me.
The initial problem reported by @chschu might still be there: doing a partial "write" shutdown with the special sockets flags available in Standard C Sockets or Winsock2 will not allow you to receive any data back from the server and will cause a reset, freeing the reception buffer on the client side as well. I don't know if this is the case but it seems highly likely because the actual exchange that we can see in the WireShark trace is exactly the same as it has always been and the reaction we get from Winsock/Std C will probably still be the same.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 19, 2020

This is linked to #6701.

Also, @JAndrassy, I don't typically use connected(), I prefer to use available() to check data availability and status() for the TCP state, so no problem, all's good.

@JAndrassy this is not forgotten by maintainers and is likely to be changed back for core-v3.0.0.

@kirillandy WiFiClient::stop() calls lwIP's tcp_close()=tcp_close_shutdown() (=both sides). But lwIP also has tcp_shutdown(shutdown_rx, shutdown_tx) that we don't use.

Since you have a fine view of the initialization and de-initialization of a TCP link, you might find a way to improve the logic under the hood, or propose new WiFiClient method to half-close the link when relevant.

@kirillandy
Copy link

@d-a-v I can't say I'll be able to offer anything worth considering. On the one hand, this fact does make writing error-checking code using different socket libraries a bit awkward (where a "reset" error is actually not a real error and should be ignored because that's just how the ESP8266 is programmed to work). On the other hand it's not that big of a problem for me and I do have some breathing space.
I am not familiar with the underlying logic and I am a bit reluctant to offer fixes for code that is unknown to me. Regardless, I will give it a try and see how far I can go, but I can't promise anything. I'm just a simple user, and it will take a bit of time researching everything for me to able to contribute anything. Thanks for the offer anyway! I'll keep it in mind.

@kirillandy
Copy link

If I may, I really don't want to flood this issue with posts, but I have just encountered another side-effect of this problem, which I think is the original reason why I posted here in the first place. Even though I have established that the receive buffer is not freed by the server after the client disconnects, I thought that this was just too simple, it was not exactly the problem which I had 2-3 months ago. Apologies for my erratic change in opinion.
I have just done another experiment with my 2 ESPs. Both of them are deliberately blocked by me. Their code will only execute once they receive at least 1 byte over Serial (which I send using a Serial port monitor program on my PC).
The experiment is to see what happens when a client calls .connect() but the server never manages to get the client data using .available().

  1. I send my client 1 byte over Serial (which it uses to choose a message from an array, it could be simpler than this)
  2. The client connects, sends the message and disconnects
  3. I send my server 1 byte over Serial
  4. My sever tries to call server.available() to get the client
  5. It fails to find anything because available returns 0

So this is what I was frustrated about back in November. And it doesn't seem to fit in with what we have discovered these past 2 days: if a client disconnecting does NOT free the server-side buffer AFTER the server.available() was done, why would you free the buffer BEFORE I even execute available?
Here is my code. It is a bit clunky to reproduce so I have also included detailed labelled images of my Serial monitor output. I hope this makes sense. The bottom line is, from where I'm standing, this is EXACTLY the same situation as was demonstrated earlier, but the algorithm is different simply because I didn't call server.available() when the client was connected. But if the underlying logic ALLOWED the connection to go through AND data to be transferred, why not make them available on the server EVEN IF I haven't yet called server.available()? Just pass me the (already closed) Client object, with status == CLOSED and I'll read the data and discard it myself.

Code

//GLOBALS
IPAddress cicadaIP(0x1000A8C0);
WiFiServer cicadaServer(1);
WiFiClient serverClient;
char m1[] = "message1";
char m2[] = "note2";
char m3[] = "a formal invitation 3";
char m4[] = "a strange hieroglyph 4";
char* m[] = { m1,m2,m3,m4 };

//I'm using DHCP, so change the server IP in the client code accordingly
//CLIENT SIDE CODE//
void setup()
{
	WiFiClient::setDefaultNoDelay(true);
	WiFiClient::setDefaultSync(true);

	delay(5000);
	Serial.begin(115200);
	WiFi.begin(router_ssid, router_pass, 6);
	while (WiFi.status() != WL_CONNECTED) {
		Serial.print(".");
		delay(500);
	}
	Serial.println("Connected!");
	Serial.print("IP address, gateway: ");
	Serial.println(WiFi.localIP());
	Serial.println(WiFi.gatewayIP());
}

void loop()
{
	unsigned char buf[10];
Mretry:
	while (Serial.available() == 0) { //Only begin execution when I tell you to by sending a byte over Serial
		delay(200);
		goto Mretry;
	}
	
	//The byte is a symbolic index into the 'm' array of messages (see above)
	//It must be a HEX value from 0x30-0x33, otherwise you'll jump back to the wait loop above
	Serial.println("Got a command over Serial!");
	Serial.read((char*)buf, 1);
	buf[0] ^= 0x30;
	if (buf[0] > 3) { 
		Serial.println("Invalid command!");
		goto Mretry;
	}

	//1) Connect to the server
	//2) Send the chosen message
	//3) Disconnect and go to the Serial wait loop above
	if (!serverClient.connect(cicadaIP, 1)) {
		goto Mretry;
	}
	Serial.println("Connected to server!");

	serverClient.write(m[buf[0]]);

	Serial.println("Wrote bytes!");

	serverClient.stop();

	while (serverClient.status() != CLOSED) {
		delay(100);
	}
	Serial.println("Closed connection");
	
	goto Mretry;
}

//SERVER SIDE CODE
void setup()
{
	WiFiClient::setDefaultNoDelay(true);
	WiFiClient::setDefaultSync(true);

	delay(5000);
	Serial.begin(115200);
	WiFi.begin(router_ssid, router_pass, 6);
	while (WiFi.status() != WL_CONNECTED) {
		Serial.print(".");
		delay(500);
	}
	Serial.println("Connected!");
	Serial.print("IP address, gateway: ");
	Serial.println(WiFi.localIP());
	Serial.println(WiFi.gatewayIP());

	cicadaServer.begin();
	while (cicadaServer.status() != LISTEN) {
		delay(100);
	}
}

void loop()
{
	unsigned int work1;
	unsigned char buf[10]; //A receive buffer, just 10 bytes long
Mretry:
	if (Serial.available() == 0) { //wait until I send a byte over Serial
		delay(500);
		goto Mretry;
	}
	Serial.readBytes(buf, 1);

	Serial.println("waiting for connections...");
Mretry2:
	serverClient = cicadaServer.available(); //Try getting a connected client
	if (!serverClient) {
		delay(100);
		goto Mretry2;
	}

	Serial.println("Got connection!");
	//serverClient.stop();
	while (serverClient.status() != CLOSED) { //Wait until the connection status is closed
		delay(100);
	}
	Serial.println("Closed connection");

	delay(5000);
	work1 = serverClient.available(); //Check receive buffer and place first 10 bytes into our buffer. Send bytes over Serial
	if (work1 != 0) {
		if (work1 > 10)
			work1 = 10;
		Serial.println("There are ");
		Serial.print(work1);
		Serial.print(" bytes in the buffer!");
		serverClient.readBytes(buf, serverClient.available());
		Serial.println();
		Serial.write(buf, work1);
	}
	else {
		Serial.println("No bytes in reception buffer!");
	}
	
	serverClient.stop();

	goto Mretry; //Go back to the Serial wait loop above
}

And here are two screenshots of the exchange
ServerClientExperiment2_1_Label
ServerClientExperiment2_2_Label

@JAndrassy
Copy link
Contributor

and what is recorded in Wireshark?

@kirillandy
Copy link

kirillandy commented Feb 20, 2020

You had me worried there,@JAndrassy, I always forget an obvious part of every experiment I do and things turn out to be simpler than I think they are. Unfortunately, this wasn't the case 🤔
Here is what WireShark caught. I'm using netcat this time, because WireShark can't catch the 2 ESPs communicating with each other (promiscuous mode doesn't help much...).
ServerClientExperiment2_3_Label

Same exact situation. The connection is set up, the data is transmitted, then a RST happens. We've seen it all before. But just because I didn't call Server.available() all information on the server-side is lost and the server has no idea that anything happened. So we've got 2 completely different behaviors for the exact same scenario: if .available() WAS called, then the rec buffer is saved, whereas if .available() WASN'T called, the rec buffer vanishes. But in either case the connection is allowed to go through and data transmission occurs.
Wouldn't it be much more convenient and proper to either return the already closed client from the .available() function with its rec buffer intact, or to simply make Client.connect() return an error on the client side and NOT allow the connection to go through at all if the server didn't call .available()?

EDIT: (Personally, I suppose the first is much more logical and preferable than the second :))

@JAndrassy
Copy link
Contributor

I am afraid that in the scenario where you call available(), data larger then some internal buffer (or one tcp packet?) would not be preserved. So it is the same scenario. With RST esp8266 indicates that it is 'not happy' with the situation? (connection closed by the client too early)

@kirillandy
Copy link

I do understand the worries about overflow and discarding the buffer if we have more data than some segment size. But we're not talking about data that is larger than an internal buffer... I'm sending a measly 5-8 bytes in all my experiments :)
Why wouldn't ESP discard my 5 message bytes even after .available() was called? Either discard them or not. Why is the available() call so important? The low-level logic already handled the whole TCP exchange in both cases! Except, in one case, .before .available(), the buffer is always discarded, whereas in the other case, it is sometimes discarded (depending on the size, if I understood you correctly @JAndrassy.

Here's a better explanation of what bothers me.
In both experiments my client does connect->write->stop and none of those functions return any sort of error. That's what we mean by "identical scenario" The difference is only in the server code. It either managed to call available() or never got the chance to do that because it was busy with something else. With that in mind...
When I write my client logic I can only do 1 of 2 things after connect->write->stop:

  1. assume that everything was read by the server (actually taken out of the rec buffer) and continue doing my work based on this assumption. This is correct for the "after .available() case because the server retains the rec buffer and reads the data after the connection was closed, but wrong for the "before .available()" case because the buffer vanishes and the server gets nothing.
  2. I can be paranoid and say that the server didn't get my message (which is silly, but I can do it). This is correct for the "before .available()" case because the server didn't get anything, but wrong for the "after available()" case because the server got everything and reads the data from the saved rec buffer.
    So, in effect, for the EXACT SAME logic on the client (connect->write->stop), I can get 2 different, opposite reactions on the server, and I have NO WAY of knowing which one actually occurred. I don't see how this is possible to solve "algorithmically". What should the client do next? Execute code for the "data was received" case or execute code for the "data was not received" case? It all revolves around the fact if the server called .available(). The client has NO WAY of knowing this. All it knows is that if the handshake was successful and an ACK was received for the data bytes (see Wireshark), then all's good. But it isn't... sometimes.

Finally, there are 2 possible solutions I can think of to avoid this ambiguity:

  1. Let available() return the already closed client with the data that was sent. This means that the server will always be able to read the data that the lower level logic received and prepared for me, regardless of whether .available() was called. If the received data is larger than the buffer, then discard it, just like you already do for "after .available()" cases.
  2. Do not let the client connect and transmit data until available() is called, in effect making it a similar feature to the "so_conditional_accept" flag in Winsock (https://docs.microsoft.com/ru-ru/windows/win32/api/winsock/nf-winsock-setsockopt). The connect function on the client will then return false.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 20, 2020

We have ::flush() which I believe has not been tried so far: It only waits until the send buffer gets empty. It doesn't wait for the matching ack though.
It is not automatically called on destructor. Can you try to call it before getting rid of the sending client ?
(I have yet to do my own tests to understand better what is happening)

@kirillandy
Copy link

Sorry for the slight delay. I've got SetDefaultSync(true) in the setup() function on both sides which is the same, basically, isn't it? According to the docs: https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/client-class.html

I also remember using flush() but it used to immediately return and didn't help. Either way, what will it prove? If it only assures that the data left the send buffer (as you claim), then that isn't much help at all. If it guarantees that the ACK for all the sent data was received, then we still have the ambiguity problem which I described in my previous post.

@devyte
Copy link
Collaborator

devyte commented Feb 20, 2020

Some thoughts (@d-a-v please correct me if I'm wrong):

  • WiFiServer::available() "accepts" an incoming connection, and builds a WiFiClient to actually receive data. The other side of the connection shouldn't be able to send data until this new WiFiClient is created. If that is somehow not holding true, then either the other side is doing something wrong (e.g.: not waiting for the connection to be ready, or calling write(), but it fails and the error isn't being checked, etc), or there really is some bug.
  • WiFiClient::available() means exactly that: is there data available for reading out of the WiFiClient object. It has nothing to do with the connection status.
    • The normal case is when there is a connection open, and there is data available (available() returns true, connected() returns true)
    • It is possible to have the connection open, but no data available for reading (available() returns false, but connected() returns true). This means data is being read faster than it is coming in (i.e.: the other side stopped sending, but the connection is still there, so more data could come in later).
    • It is possible to have the connection dropped on the other side, but there is still data pending to be read out of the WiFiClient object buffers (available() returns true, but connection() is false).
      -WiFiClient::connected() means exactly that: is the connection established. It has nothing to do with whether there is data to be read.
      -Given the above:
    • Both available() and connected() should be used on the user side. How they are used depends on the user application's needs/protocol/data exchange semantics/etc, there are several ways to handle this, some simpler than others. For example, a small messaging model data exchange with tiny few-byte messages could have simpler handling compared to a large data exchange that requires data reassembly in the user's application, or compared to a permanent connection case such as a websocket implementation.
    • available() || connected() is the condition to use to check for incoming data during the lifetime of the connection => if this condition becomes false, it means data can't come in anymore.
    • connected() is the condition to use for writing to this side of the connection. In this case, flush() should be used after writing and before closing/destroying the WiFiClient.
  • I don't think you need to worry about overflowing a WiFiClient's internal buffers. Handling of the case should happen automatically inside the internals, i.e.: the other side of the connection shouldn't be able to send more data until the application side reads some data from this side's WiFiClient, thereby clearing space to receive more. This has been tested by a bandwidth stress test.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 20, 2020

@kirillandy

TBH I don't really understand the Arduino's Client::flush() documentation while Serial::flush()'s one is very clear and fixed (Prior ...) (they are both Stream::) and that's what esp8266-arduino's WiFiClient::flush() tries to do. It however does not wait for an ack, and that because we can't be sure an ack will come because they come every other packet. It will always come at closing time though, but ::flush() is not ::stop().

About the ambiguity you raised, I agree with 1) but 2), while being wrong today, is doable even though I don't like it because 1) is possible and more async, more fluent.
There is no possibility of flooding the server :

  • A client can't send too much data because TCP will prevent it to send more than two packets before receiving an ack (edit: that is true with git master only since the backlog-limit PR)
  • The very recent PR about TCP backlog-limit prevents to have too many clients connected with the right to send before one of them is sketch-acknowledged with WiFiServer::available().

(I have still not made the test myself)

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 21, 2020

I traced data within lwIP.
All data received when pcb->recv is not set (which is the case until WiFiServer::available() is called) are cancelled (tcp_recved() acknowleges data (= mark them as read), then pbuf is released without having been given to user because there is no way to do that).

@kirillandy
Copy link

Thanks for looking into it David! I see. So, if I'm reading this right, are we talking about a lower-level problem involving the specifics of the lwIP code, and it's more of a question for the devs of that library?

@JAndrassy
Copy link
Contributor

Thanks for looking into it David! I see. So, if I'm reading this right, are we talking about a lower-level problem involving the specifics of the lwIP code, and it's more of a question for the devs of that library?

I see it as a problem of mapping the LWIP API to Arduino API. Arduino API is not assync so the sketch must poll server and clients as all examples show it.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 21, 2020

@JAndrassy The underlying TCP management is async.

@kirillandy Assuming you are working with latest git, can you try to change false to true in WiFIServer.cpp L202 and comment L128 (//_unclaimed->acceptPCB();) ?

With that I always get "note2" and I can't explain right now why you didn't get it (before the backlog limit PR).
I however have a segfault on server when client sends data without server, then server is enabled.

edit:
@kirillandy please try with this around L128 (no more segfault):

        //_unclaimed->acceptPCB();
        if (_unclaimed->getPCB())
            tcp_backlog_accepted(_unclaimed->getPCB());

That's because when client is trying to make connections but no server is listening, lwIP seems to be recording the attemps, then when server is started the attemps are registering then immediately closed (closed: no pcb: segfault). That's odd, still under investigation.

update: that's not odd since the server is server::begin() in the sketch, only the connections are not accepted (server::available()) (well, they are tcp-accepted but not arduino-accepted yet and tcp-closed in the meantime)

update2: This fixes a bug in tcp backlog use.
But it does not fix the initial issue in which data sent by peer are lost if peer closes its connection before server::available() is called. But in that case, we cannot indefinitely store data of thousands connection saying hi before sketch decides to check server::available() (under investigation).

@kirillandy
Copy link

In other words, @JAndrassy, you mean I need to write my code to maximize the probability of my server getting the Client object from .available() during the brief period when my client is connected?

The most obvious idea would be to implement a "server .stop()s first" pattern where the server disconnects first, triggering a normal "active close" in the TCP diagram. I can't really use this because it takes time for my clients to be able to connect to the server again, because it goes into the TIME_WAIT state. Not saying it's terrible, it might be ideal for other projects, to each their own.

Otherwise, if I stick to my "client .stop()s first" preference, then I must write my server code in a way for it to almost always get Clients from .available() during the period when the client executes its connect->write->stop code. I agree this might work well with the examples you mentioned (you're referring to the examples in the docs of this repository, right? https://github.com/esp8266/Arduino/tree/0a58172c6af6c96b8caa1fe5aa4aaa44963d1bc3/doc/esp8266wifi)
But if my server ESP also has a lot of other work to do, then there is a good chance that I will miss this connect->write->close window, fail to call .available() and lose the data transmitted...

@kirillandy
Copy link

I'm on it @d-a-v
It might take a while, since I need to go through the Win 10 Git installation process (described in the docs) and I might not have time to get back to you today. Thanks again for taking the time, I hope this works :)

@JAndrassy
Copy link
Contributor

JAndrassy commented Feb 21, 2020

@JAndrassy The underlying TCP management is async.

But it does not fix the initial issue in which data sent by peer are lost if peer closes its connection before server::available() is called. But in that case, we cannot indefinitely store data of thousands connection saying hi before sketch decides to check server::available() (under investigation).

that is what I attempted to say: async api behind 'poll' api problem

@kirillandy
Copy link

kirillandy commented Feb 21, 2020

@d-a-v
re: update2
What's the upper limit of open receive buffers?
Yes, keeping the data received from an already closed connection for the server to read is easily exploited by flooding the server with simple connect->write->close, that was a glaring weakness but shouldn't that fact be left to the user's discretion? I can optimize my code to avoid this problem if the amount of allocated buffers is large enough. On average, I expect 3-5 separate devices to be able to connect to the server in such a fashion. And I doubt some evil madman will want to spend his time flooding my server this way on a local net, so personally, I'm not fazed by the problem. Still, it's up to the developers to decide what to include in the API.
My messages aren't what you'd call 'hi' messages. They're usually various commands that don't warrant a reply from the server, so I can immediately disconnect on the client and continue my work.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 21, 2020

My messages aren't what you'd call 'hi' messages. They're usually various commands that don't warrant a reply from the server, so I can immediately disconnect on the client and continue my work.

Of course. And TCP is here to guarantee to the client that the message has been acknowledged by server. In that case server should get it. If for any reason that cannot be honored, then client should not be able to connect in the first place.

@d-a-v
Copy link
Collaborator

d-a-v commented Feb 21, 2020

@kirillandy
You can try and use instead the beta release (v0.0.1) from https://d-a-v.github.io (having git is always better though) and modify sources from this version.

@kirillandy
Copy link

Thanks. No, Git is fine. It's about time I got it all set up. Haven't had to do this type of fine-tuning work yet, so it's all a beneficial learning process for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: network type: troubleshooting waiting for feedback Waiting on additional info. If it's not received, the issue may be closed.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants