Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP socket leaks #608

Open
tubzby opened this issue Aug 28, 2024 · 15 comments
Open

UDP socket leaks #608

tubzby opened this issue Aug 28, 2024 · 15 comments

Comments

@tubzby
Copy link

tubzby commented Aug 28, 2024

I have a program working as a proxy to convert RTSP stream to WebRTC stream so I can visit my cameras on the browser, normally a connection is closed when I close the tab, and I'm monitoring the ICE state to close PeerConnection, every time I call PeerConnection.close().await it complains:

Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  INFO avlib::pc_wrapper: close
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  WARN webrtc::peer_connection::peer_connection_internal: Failed to accept RTCP SessionSRTP has been closed
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  WARN webrtc::peer_connection::peer_connection_internal: Failed to accept RTP SessionSRTP has been closed
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  WARN webrtc_ice::agent::agent_internal: [controlled]: Failed to close candidate tcp4 host 192.168.0.28:9: the agent is closed
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  WARN webrtc_ice::agent::agent_internal: [controlled]: Failed to close candidate udp4 host 192.168.0.28:63422: the agent is closed
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  WARN webrtc_ice::agent::agent_internal: [controlled]: Failed to close candidate udp4 srflx 58.250.221.39:63422 related 192.168.0.28:63422: the agent is closed
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  WARN webrtc_ice::agent::agent_internal: [controlled]: Failed to close candidate udp4 relay 112.74.41.31:64537 related 58.250.221.39:63422: the agent is closed
Aug 28 14:46:56 debian10.toybrick rtsp2webrtc[23297]:  INFO webrtc_ice::agent::agent_internal: [controlled]: Setting new connection state: Closed

and all the UDP socket is never closed, I have confirmed that by lsof -n -p $pid.

@tubzby
Copy link
Author

tubzby commented Aug 28, 2024

img_v3_02e6_b526a383-bfb3-49eb-8450-68d8298c727g

From the output of tokio-console, there are 2 tasks in warning state: "2 tasks have lost their waker", this might be the cause of leaks.

---EDIT 1-------

After adding some debug log, it seems there are an orphan task that never quit:

https://github.com/webrtc-rs/webrtc/blob/master/dtls/src/conn/mod.rs#L322

@tubzby
Copy link
Author

tubzby commented Aug 29, 2024

I'm stuck, but with some findings:

  1. turn/src/client/mod.rs: Client created in gather_candiates_relay was never dropped.
  2. PeerConnection Agent AgentInternal were dropped.
  3. Candidate.close() was called, but for UdpSocket, it does nothing, it has to be dropped.

It might related to the screenshot above, for some reason, the task created at agent_gather.rs:773 lost its waker, so the UdpSocket never released.

@tubzby
Copy link
Author

tubzby commented Aug 30, 2024

Update:

I have been using Arc::strong_count to monitor Arc<CandidateBase> and find out that the count is 6.

I created a tokio task to spy on it, so the actual reference count is 5 which causing this problem.

@tubzby
Copy link
Author

tubzby commented Aug 31, 2024

Finally, It leads to one possible line:

async fn close(&self) -> std::result::Result<(), util::Error> {

AgentConn::checklist should be cleared on close.

But there is still 1 leaking UdpSocket.

@rainliu
Copy link
Member

rainliu commented Aug 31, 2024

Great catch, could you submit a PR to fix it? Thanks

@tubzby
Copy link
Author

tubzby commented Aug 31, 2024

No problem, will submit a PR if there are no more leaks.

@tubzby
Copy link
Author

tubzby commented Aug 31, 2024

Anyway, I was doing Arc::strong_count to tackle this, is there a better way to do that?
image

@tubzby
Copy link
Author

tubzby commented Aug 31, 2024

Addition to: #608 (comment)

AgentConn.selected_pair should also be cleared.

@tubzby
Copy link
Author

tubzby commented Aug 31, 2024

//TODO: hook up _close_query_signal_tx to Agent or Candidate's Close signal?

mdns.query is never quit.

@rainliu There's a TODO comment line here, what's your suggestion?

tubzby pushed a commit to tubzby/webrtc-rs that referenced this issue Aug 31, 2024
@tubzby
Copy link
Author

tubzby commented Aug 31, 2024

image

DTLSConn task stucked:

tokio::spawn(async move {

@tubzby
Copy link
Author

tubzby commented Sep 11, 2024

I'm quite confused about why we should release these structures manually, is there a reference loop for Arc?

@dbidwell94
Copy link

I'm also seeing some odd behavior here. I have to manually impl Drop for my struct so the connections can be dropped. But once a connection has been facilitated, this code never returns and is stuck forever on waiting for the RTCPeerConnection to finish its call to close()

impl<'a> Drop for P2PConnection<'a> {
    fn drop(&mut self) {
        futures::executor::block_on(async move {
            let _ = self.data_channel.close().await;
            println!("Data Channel has been closed");
            let _ = self.connection.close().await;
            println!("Connection has been closed");
        });
    }
}

@dbidwell94
Copy link

This is the code in question that I believe is causing hangups in my project: https://github.com/dbidwell94/rust_p2p/blob/master/src/p2p_connection.rs#L187-L196

And the test in question is here: https://github.com/dbidwell94/rust_p2p/blob/master/src/p2p_connection.rs#L241

You should be able to easily pull the code down and run cargo test -- --no-capture to see the log outputs. Notice on the test in question, it drops the data_channel, but it never finishes dropping the RTCPeerConnection.

@mutexd
Copy link
Contributor

mutexd commented Nov 19, 2024

@dbidwell94 I see that you're using futures::executor:block_on in the drop. The catch is that futures::executor::block_on would block the current caller until the future has completed. In the test, there is only one thread unless it's specified. You could use #[tokio::test(flavor = "multi_thread")] in your test and see if the problem persists.

@dbidwell94
Copy link

@dbidwell94 I see that you're using futures::executor:block_on in the drop. The catch is that futures::executor::block_on would block the current caller until the future has completed. In the test, there is only one thread unless it's specified. You could use #[tokio::test(flavor = "multi_thread")] in your test and see if the problem persists.

Confirmed, the tests now pass. Thanks for that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants