
handle ConnectionReset errors on windows UDP sockets #1448

Closed
manikawnth opened this issue Aug 9, 2022 · 8 comments · Fixed by #1662
Labels
bug Something isn't working os:windows

Comments

@manikawnth

manikawnth commented Aug 9, 2022

Note: I'm a novice in both Rust and QUIC.

I'm attempting to open a send stream from the QUIC server, which periodically sends some data back to the client, which in turn accepts the receive stream. However, when the client terminates (abruptly or cleanly), the server gets no notification of the connection error on the stream API. Instead, the server panics with

A fatal IO error occurred (ConnectionReset): An existing connection was forcibly closed by the remote host.

What's the right way to handle such errors?

Server side code:

while let Some(connection) = server.accept().await {
    // spawn a new task for the connection
    println!("Debug: new connection from {}", connection.remote_addr()?);
    tokio::spawn(async move {
        uni_send_mode(connection).await;
        //uni_recv_mode(connection).await;
    });
}


async fn uni_send_mode(mut connection: Connection) {
    let mut stream = connection.open_send_stream().await.expect("something wrong");

    eprintln!("Debug: unidi send stream created - {}", stream.id());
    tokio::spawn(async move {
        let mut scheduler = tokio::time::interval(std::time::Duration::from_secs(5));
        let mut counter = 0;

        'sendloop: loop {
            scheduler.tick().await;
            counter += 1;

            let b = Bytes::from(format!("server tick: {}\n", counter));
            match stream.send(b).await {
                Ok(()) => eprintln!("Debug: data: {} sent on stream - {}", counter, stream.id()),
                // Expected to be reached when the client goes away, but it never is
                Err(_) => { eprintln!("stream error"); break 'sendloop; }
            }
        }
    });
}

Client side code:

    //open a receive stream
    if let Some(mut st2) =  connection.accept_receive_stream().await? {
        eprintln!("Debug: unidi stream accepted, id - {}", st2.id());
        tokio::spawn(async move {
            let mut stdout = tokio::io::stdout();        
            let _ = tokio::io::copy(&mut st2, &mut stdout).await;
        });
    }

    while let _ = tokio::io::stdin() {}; // keep the client process alive
@camshaft
Contributor

camshaft commented Aug 9, 2022

That message shouldn't be printing in normal execution. It appears that the OS is telling s2n-quic that the UDP socket has been reset, which we currently consider as a fatal error and shut down the server.

What platform are you on? Not sure what your stdin while loop is doing either. What happens if you remove that?

@manikawnth
Author

What platform are you on? Not sure what your stdin while loop is doing either. What happens if you remove that?

I'm on Windows, and the stdin loop is just to keep the client-side terminal running without terminating. The behavior is the same even without the loop. s2n_quic::stream::Error even has a ConnectionError variant, but it's never returned in pattern matching (at least for the send functions).

My original statement that "the server panics" could be wrong. The error seems to come from here. It looks like the event loop ends, the error message is printed, and the guard is dropped.

But the same errors are caught on the Connection APIs. Because the accept_* stream calls are in a loop and the connection API returns the error, we can either handle it or shut down gracefully. The issue is only with the open_* stream APIs, where we never see those errors.

@camshaft
Contributor

camshaft commented Aug 9, 2022

You're correct that the message is being printed on that line. When that line prints an error, it's considered fatal and will shut down the entire s2n-quic endpoint, which is why your application code isn't notified of the error.

The source of the error is the underlying socket call. It seems that Windows returns an error if the peer UDP socket is closed, which is a bit surprising. I think we'll need to add an exception to the error handling code to filter out ConnectionReset errors and basically just ignore them.

@camshaft camshaft added bug Something isn't working os:windows labels Aug 9, 2022
@camshaft camshaft changed the title How to handle connection errors on stream api handle ConnectionReset errors on windows UDP sockets Aug 9, 2022
@manikawnth
Author

Update:
The error is coming from rx.rx instead of tx. This is Windows behavior, as documented for the recvfrom API.
For WSAECONNRESET, it's documented:

The virtual circuit was reset by the remote side executing a hard or abortive close. The application should close the socket; it is no longer usable. On a UDP-datagram socket this error indicates a previous send operation resulted in an ICMP Port Unreachable message.

The behavior can be disabled, but the WSAIoctl Win32 API would need to be exposed somewhere in the standard library. Ref

Tokio is also exposing the ICMP rejections, and I confirmed from Wireshark that the rx errors are indeed because of ICMP rejections (for connected UDP sockets).

To circumvent this for now, I did as you suggested and filtered out the connection-reset error codes so they don't break the event loop.

// in std.rs
fn connection_reset(&self) -> bool {
    self.kind() == std::io::ErrorKind::ConnectionReset
}

// inside the rx function
Err(err) if err.connection_reset() => {
    break;
}

Beyond this, I had to either flush() after send() on the stream, or wait for the timeout, to get the right error back in my app.
I don't believe there's any way to receive the rx error immediately other than waiting for the timeout.
Is my understanding correct?

@camshaft
Contributor

I believe the correct path forward is to disable this behavior, as it seems from the documentation that the socket becomes unusable after receiving the error?

In terms of your fix, I don't think you want to break but instead just do count += 1 and continue to receive from the socket. Otherwise the batch will be cut short and end up waking up immediately after anyway.

Beyond this I had to either flush() after I send() on the stream or wait for timeout so that I get the right error back from my app.

If your client is closing before the server has been able to fully ACK the data, you need to block the client after you've connected to the server with wait_idle. This stops the main async task from exiting too soon and shutting down the tokio runtime.

@manikawnth
Author

Thanks. Incrementing the count is just working fine.

If you client is closing before the server is able to fully ACK the data

Mine is not that case, as I'm sending (send()) from the server. The client initiates the connection, but the server opens a send-only stream to the client (like a publisher) and the client accepts the Recv stream. I have to assume I have no control over how the client behaves.

One other thing I noticed: is this supposed to publish a PlatformTxError instead of an Rx one?

@PeteAudinate
Contributor

I think I'm seeing this issue too, and judging by the comments and the code, I don't think it was actually fixed.
Or was it fixed some other way and I'm missing something?

@camshaft
Contributor

No I don't think it was fixed
