Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close the tcp connection, when server doesn't end the websocket gracefully (IDFGH-9274) #10657

Open
3 tasks done
ankit-thealchemist opened this issue Jan 30, 2023 · 10 comments
Open
3 tasks done
Assignees
Labels
Awaiting Response awaiting a response from the author Status: In Progress Work is in progress Type: Bug bugs in IDF

Comments

@ankit-thealchemist
Copy link

ankit-thealchemist commented Jan 30, 2023

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v4.4.3

Operating System used.

Linux

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

esp32 devtkit c v4

Power Supply used.

USB

What is the expected behavior?

We are developing the EV charger, using the OCPP protocol 1.6 which needed to be used over the websocket. A instance of a server having the issue, it sometimes closes the websocket without closing the TCP connection, as you can see in the attacched log.
After that our websocket connection doesn't work. It just donot connect to the server again. I tried manually disconnecting the websocket and try to reconnect it, but it doesn't work.

On the other hand other linux based devices doesn't have the same issue, so its hard for me to convince the server team about the websocket server fault.

As a workaround, whenever we receive such error, we need to disconnect and reconnect the wifi. This seems to resolve the issue, but it takes some time in time critical event.

Is there is a solution to reset the TCP connection without resetting the wifi?

What is the actual behavior?

image

Steps to reproduce.

\

Debug Logs.

No response

More Information.

No response

@ankit-thealchemist ankit-thealchemist added the Type: Bug bugs in IDF label Jan 30, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 30, 2023
@github-actions github-actions bot changed the title Close the tcp connection, when server doesn't end the websocket gracefully Close the tcp connection, when server doesn't end the websocket gracefully (IDFGH-9274) Jan 30, 2023
@ESP-YJM
Copy link
Collaborator

ESP-YJM commented Feb 1, 2023

@ankit-thealchemist Could you please offer the simple code that when your websocket client disconnected from the server, how do you re-connect? If you can offer some log, it may be more helpful.

@ankit-thealchemist
Copy link
Author

@ESP-YJM Thanks for the speedy response!
If I gave you the simple code, it won't work, because it gives error on my particular server. You can take the simple websocket example as the code.
I just then disconnect the wifi by esp_wifi_disconnect(), then reconnect the wifi, it resovles the problem in reconnecting the websocket.

As far as I can understand about the problem is that, In the normal circumstances, if we close the websocket, we terminate the websocket connection by sending the close frame and then close the underlying TCP connection as a norm. In my case, server send the close frame, but donot send the request to close the TCP connection. Esp wait for closing the Tcp connection, and its wait forever to close the tcp connection.
When I disconnect from wifi connection, it destroy the whole netif stack associated with the wifi adapter.It destroy all the tcp connection and due to this, the underlying tcp connection closes.
After reconnecting the wifi, the whole tcp connection stack is recreated, which solves the problem, of reconnection with the web socket connection.

Please let me know if you need anything more.

@ESP-YJM
Copy link
Collaborator

ESP-YJM commented Feb 1, 2023

@ankit-thealchemist I check the code, ESP not wait forever to close the TCP connection, it will wait about 1s and then close TCP itself. You can refer to code https://github.com/espressif/esp-idf/blob/v4.4.3/components/esp_websocket_client/esp_websocket_client.c#L700-L703, it will break the while cycle, and call esp_transport_close(client->transport) to close TCP. And then you only need call esp_websocket_client_start, it may connect websocket server. You can add some log to track.

@chipweinberger
Copy link
Contributor

chipweinberger commented Feb 3, 2023

ESP-IDF only closest sockets using LRU purge.

this is generally what you want.

if you want some other behavior, you can convert the request object that you get in the HTTP handler into fd. see httpd_req_Xxx() functions

you can then close that socket whenever you like.

I also recommend taking a look at the http code. it's not that complicated. see here #10594 (comment)

@ESP-YJM
Copy link
Collaborator

ESP-YJM commented Feb 6, 2023

Hi @chipweinberger , If I understand @ankit-thealchemist correct, this is not an http server issue, this is an esp_websocket_client component issue.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Jun 14, 2023
@ankit-thealchemist
Copy link
Author

ankit-thealchemist commented Jul 28, 2023

@ESP-YJM

@ankit-thealchemist I check the code, ESP not wait forever to close the TCP connection, it will wait about 1s and then close TCP itself. You can refer to code https://github.com/espressif/esp-idf/blob/v4.4.3/components/esp_websocket_client/esp_websocket_client.c#L700-L703, it will break the while cycle, and call esp_transport_close(client->transport) to close TCP. And then you only need call esp_websocket_client_start, it may connect websocket server. You can add some log to track.

image

yes it will break the while cycle and clear tcp layer and delete the task. The problem it never call out the 'WEBSOCKET_EVENT_CLOSED' event(when ret = 0, and unfortunately thats my case) and I will never be able to catch this event in the websocket event handler in main application . Due to this I never able to start the websocket connection again.
Please provide the alternative resolution to this.

@ankit-thealchemist
Copy link
Author

@ESP-YJM Any update on the issue?

@ESP-YJM
Copy link
Collaborator

ESP-YJM commented Aug 16, 2023

@ankit-thealchemist You can remove break when ret is 0 and try.

@espressif-bot espressif-bot added the Awaiting Response awaiting a response from the author label Oct 18, 2023
@hitecSmartHome
Copy link

hitecSmartHome commented Jun 27, 2024

I have this exact same problem. I got a close frame from the server and it just drops the connection. My ESP stucks in this state and can not reconnect. Also it does not see any close so it thinks its still connected and every other send fails.

void WebsocketClient::handler(void* event_handler_arg, esp_event_base_t event_base, int32_t event_id, void* event_data) {
    esp_websocket_event_data_t* data = (esp_websocket_event_data_t*)event_data;
    
    switch (event_id) {
        case WEBSOCKET_EVENT_CONNECTED:
            if (connectCB) { connectCB(); }
            return;
        
        case WEBSOCKET_EVENT_DISCONNECTED:
            if (disconnectCB) { disconnectCB(); }
            return;
        
        case WEBSOCKET_EVENT_ERROR:
            if (errorCB) { errorCB(); }
            return;
        
        case WEBSOCKET_EVENT_DATA:
            handleWebSocketData(data);
            return;
        
        default:
            return;
    }
}

void WebsocketClient::handleWebSocketData(esp_websocket_event_data_t* data) {
    if (data->op_code == 0x08 && data->data_len == 2) {
        printf("[WebsocketClient] - Got Close data\n");
        return;
    }

    // Check for ping data
    if (data->op_code == 0x09) {
        printf("[WebsocketClient] - Got Ping data\n");
        return;
    }

    // Check for pong data
    if (data->op_code == 0x0A) {
        printf("[WebsocketClient] - Got Pong data\n");
        return;
    }

}

This is how I got here. I can't reinit. esp_websocket_client_close and esp_websocket_client_destroy can not be called from the event handler and they also does nothing outside of the handler either.

I also can't reinit the wifi because I'm using wired ethernet.

@adityachd
Copy link

Hi. I was having the same issue. The websocket server sends opcode 8 but does not close the tcp connection (after 1 hour as configured on the server I used). The server did not close the TCP connection and ESP would wait for TCP connection to get closed, which would never happen. To solve this, I configured the server to close the TCP connection after sending ws close request, opcode 8. Now ESP retries to make a websocket connection. For anyone else who has the same issue, I also achieved this same functionality by tweaking the esp IDF code a bit.
Here is the function(inside esp_websocket_client.c) that had to be changed for achieving reconnection in such a case,

static void esp_websocket_client_task(void *pv)
{
    const int lock_timeout = portMAX_DELAY;
    esp_websocket_client_handle_t client = (esp_websocket_client_handle_t) pv;
    client->run = true;

    //get transport by scheme
    client->transport = esp_transport_list_get_transport(client->transport_list, client->config->scheme);

    if (client->transport == NULL) {
        ESP_LOGE(TAG, "There are no transports valid, stop websocket client");
        client->run = false;
    }
    //default port
    if (client->config->port == 0) {
        client->config->port = esp_transport_get_default_port(client->transport);
    }

    client->state = WEBSOCKET_STATE_INIT;
    xEventGroupClearBits(client->status_bits, STOPPED_BIT | CLOSE_FRAME_SENT_BIT);
    int read_select = 0;
    while (client->run) {
        if (xSemaphoreTakeRecursive(client->lock, lock_timeout) != pdPASS) {
            ESP_LOGE(TAG, "Failed to lock ws-client tasks, exiting the task...");
            break;
        }
        switch ((int)client->state) {
            case WEBSOCKET_STATE_INIT:
                if (client->transport == NULL) {
                    ESP_LOGE(TAG, "There are no transport");
                    client->run = false;
                    break;
                }
                if (esp_transport_connect(client->transport,
                                          client->config->host,
                                          client->config->port,
                                          client->config->network_timeout_ms) < 0) {
                    ESP_LOGE(TAG, "Error transport connect");
                    esp_websocket_client_abort_connection(client);
                    break;
                }
                ESP_LOGD(TAG, "Transport connected to %s://%s:%d", client->config->scheme, client->config->host, client->config->port);

                client->state = WEBSOCKET_STATE_CONNECTED;
                client->wait_for_pong_resp = false;
                esp_websocket_client_dispatch_event(client, WEBSOCKET_EVENT_CONNECTED, NULL, 0);

                break;
            case WEBSOCKET_STATE_CONNECTED:
                if ((CLOSE_FRAME_SENT_BIT & xEventGroupGetBits(client->status_bits)) == 0) { // only send and check for PING
                                                                                                          // if closing hasn't been initiated
                    if (_tick_get_ms() - client->ping_tick_ms > client->config->ping_interval_sec*1000) {
                        client->ping_tick_ms = _tick_get_ms();
                        ESP_LOGD(TAG, "Sending PING...");
                        esp_transport_ws_send_raw(client->transport, WS_TRANSPORT_OPCODES_PING | WS_TRANSPORT_OPCODES_FIN, NULL, 0, client->config->network_timeout_ms);

                        if (!client->wait_for_pong_resp && client->config->pingpong_timeout_sec) {
                            client->pingpong_tick_ms = _tick_get_ms();
                            client->wait_for_pong_resp = true;
                        }
                    }

                    if ( _tick_get_ms() - client->pingpong_tick_ms > client->config->pingpong_timeout_sec*1000 ) {
                        if (client->wait_for_pong_resp) {
                            ESP_LOGE(TAG, "Error, no PONG received for more than %d seconds after PING", client->config->pingpong_timeout_sec);
                            esp_websocket_client_abort_connection(client);
                            break;
                        }
                    }
                }

                if (read_select == 0) {
                    ESP_LOGV(TAG, "Read poll timeout: skipping esp_transport_read()...");
                    break;
                }
                client->ping_tick_ms = _tick_get_ms();

                if (esp_websocket_client_recv(client) == ESP_FAIL) {
                    ESP_LOGE(TAG, "Error receive data");
                    esp_websocket_client_abort_connection(client);
                    break;
                }
                break;
            case WEBSOCKET_STATE_WAIT_TIMEOUT:

                if (!client->config->auto_reconnect) {
                    ESP_LOGE(TAG, "Auto reconnect off after timeout");
                    client->run = false;
                    break;
                }
                if (_tick_get_ms() - client->reconnect_tick_ms > client->wait_timeout_ms) {
                    client->state = WEBSOCKET_STATE_INIT;
                    client->reconnect_tick_ms = _tick_get_ms();
                    ESP_LOGD(TAG, "Reconnecting...");
                }
                break;
            case WEBSOCKET_STATE_CLOSING:
                // if closing not initiated by the client echo the close message back
                if ((CLOSE_FRAME_SENT_BIT & xEventGroupGetBits(client->status_bits)) == 0) {
                    ESP_LOGD(TAG, "Closing initiated by the server, sending close frame");
                    esp_transport_ws_send_raw(client->transport, WS_TRANSPORT_OPCODES_CLOSE | WS_TRANSPORT_OPCODES_FIN, NULL, 0, client->config->network_timeout_ms);
                    xEventGroupSetBits(client->status_bits, CLOSE_FRAME_SENT_BIT);
                }
                break;
            default:
                ESP_LOGD(TAG, "Client run iteration in a default state: %d", client->state);
                break;
        }
        xSemaphoreGiveRecursive(client->lock);
        if (WEBSOCKET_STATE_CONNECTED == client->state) {
            read_select = esp_transport_poll_read(client->transport, 1000); //Poll every 1000ms
            if (read_select < 0) {
                ESP_LOGE(TAG, "Network error: esp_transport_poll_read() returned %d, errno=%d", read_select, errno);
                esp_websocket_client_abort_connection(client);
            }
        } else if (WEBSOCKET_STATE_WAIT_TIMEOUT == client->state) {
            // waiting for reconnecting...
            vTaskDelay(client->wait_timeout_ms / 2 / portTICK_RATE_MS);
        } else if (WEBSOCKET_STATE_CLOSING == client->state &&
                  (CLOSE_FRAME_SENT_BIT & xEventGroupGetBits(client->status_bits))) {
            ESP_LOGD(TAG, " Waiting for TCP connection to be closed by the server");
            // int ret = esp_transport_ws_poll_connection_closed(client->transport, 1000);
            int ret = esp_websocket_client_abort_connection(client);
            vTaskDelay(client->wait_timeout_ms / 2 / portTICK_RATE_MS);
            client->state = WEBSOCKET_STATE_INIT;
            client->reconnect_tick_ms = _tick_get_ms();
            ESP_LOGD(TAG, "Reconnecting..., run = %d, transport =%d",client->run,client->transport==NULL?0:1); 
            if (ret == 0) {
                // still waiting
                // break;
            }
            if (ret < 0) {
                ESP_LOGW(TAG, "Connection terminated while waiting for clean TCP close");
            }
            // client->run = false;
            // client->state = WEBSOCKET_STATE_UNKNOW;
            esp_websocket_client_dispatch_event(client, WEBSOCKET_EVENT_CLOSED, NULL, 0);
            // break;
        }
    }

    esp_transport_close(client->transport);
    xEventGroupSetBits(client->status_bits, STOPPED_BIT);
    client->state = WEBSOCKET_STATE_UNKNOW;
    vTaskDelete(NULL);
}

Hope this helps someone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Response awaiting a response from the author Status: In Progress Work is in progress Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

6 participants