TCP RST on Stale Socket #65

Open
simonlg opened this issue Mar 22, 2021 · 2 comments

Comments

@simonlg

simonlg commented Mar 22, 2021

Setup

Alice uses TCP to communicate with the VIP (virtual IP) of Bob.

Scenario

  1. Execute a SIP call.
  2. A TCP socket is established between Alice and Bob.
  3. The VIP fails over to a new Bob instance. The new Bob instance isn't aware of the currently open sockets.
  4. Execute another SIP call.
  5. Alice reuses the previously opened TCP socket and receives a TCP RST.
  6. No retries are sent.
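
The effect in step 5 can be approximated locally without a VIP. The sketch below is only a simulation under stated assumptions: instead of a failover, the accepting side aborts the connection with `SO_LINGER` set to 0, which makes `close()` send an RST rather than a FIN. The class and method names are hypothetical and nothing here uses JAIN-SIP.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class StaleSocketSketch {

    /**
     * Opens a loopback connection, aborts it on the accepting side, then
     * tries to use it from the connecting side. Returns true if that use
     * fails with an IOException.
     */
    static boolean useAfterPeerAbortFails() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket alice = new Socket(server.getInetAddress(), server.getLocalPort())) {

            Socket bob = server.accept();
            bob.setSoLinger(true, 0); // with linger 0, close() aborts the connection (RST)
            bob.close();

            try {
                // Alice still holds a socket object that looks usable.
                alice.getOutputStream().write("OPTIONS".getBytes(StandardCharsets.US_ASCII));
                alice.getInputStream().read();
                return false; // no error surfaced on this platform
            } catch (IOException e) {
                return true; // the reset (or a broken pipe) surfaced as an IOException
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // setup failure, not the effect under study
        }
    }

    public static void main(String[] args) {
        System.out.println("use after peer abort failed: " + useAfterPeerAbortFails());
    }
}
```

The point of the sketch is that the failure only becomes visible when Alice actually writes to or reads from the socket, which matches the log below, where the reset is reported right after `sendBytes`.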

Logs

[2021-02-26 13:55:18.486] DEBUG [CmdProcessor-14] enableTimeoutTimer gov.nist.javax.sip.stack.SIPClientTransactionImpl@ebea8133 tickCount 64 currentTickCount = -1 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.486] DEBUG [CmdProcessor-14] sendBytes TCP local inAddr 10.136.20.138 remote inAddr 10.136.20.139 port = 5055 length = 1876 isClient true - (unknown.jul.logger:243)
[2021-02-26 13:55:18.492] DEBUG [TCPMessageChannelThread] IOException closing sock java.net.SocketException: Connection reset - (unknown.jul.logger:243)
[2021-02-26 13:55:18.493] DEBUG [TCPMessageChannelThread] Closing socket tcp:10.136.20.139:5055 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.493] DEBUG [PipelineThread-1] Received CRLF - (unknown.jul.logger:243)
[2021-02-26 13:55:18.493] DEBUG [PipelineThread-1] KeepAlive Double CRLF received, sending single CRLF as defined per RFC 5626 Section 4.4.1 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.493] DEBUG [PipelineThread-1] ~~~ setting isPreviousLineCRLF=false - (unknown.jul.logger:243)
[2021-02-26 13:55:18.495] DEBUG [TCPMessageChannelThread] Closing my parser gov.nist.javax.sip.parser.PipelinedMsgParser@4fba0b56 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.495] DEBUG [TCPMessageChannelThread] Closing pipelinedmsgparser gov.nist.javax.sip.parser.PipelinedMsgParser@4fba0b56 threadname PipelineThread-1 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] Closing client output stream java.net.SocketOutputStream@4b859679 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] Closing TCP socket 10.136.20.139:5055 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] removed Socket and Semaphore for key 10.136.20.139:5055 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] Closing message Channel (key = tcp:10.136.20.139:5055)gov.nist.javax.sip.stack.TCPMessageChannel@29e6bf37 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] Thread[TCPMessageChannelThread,5,CmdProcessor] removing tcp:10.136.20.139:5055 for processor /10.136.20.138:14466/tcp - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] Thread[TCPMessageChannelThread,5,CmdProcessor] Removing incoming channel tcp:10.136.20.139:5055 for processor /10.136.20.138:14466/tcp - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [TCPMessageChannelThread] Closing pipelinedmsgparser gov.nist.javax.sip.parser.PipelinedMsgParser@4fba0b56 threadname PipelineThread-1 - (unknown.jul.logger:243)
[2021-02-26 13:55:18.496] DEBUG [PipelineThread-1] thread ending for threadname PipelineThread-1 - (unknown.jul.logger:243)

The IOException is handled by ConnectionOrientedMessageChannel within the catch (IOException ex) at line 595.

Experiments

We've disabled gov.nist.javax.sip.CACHE_CLIENT_CONNECTIONS. It helps for new dialogs, but we still have the problem on an existing dialog, where the SIP OPTIONS request is sent to a stale TCP socket.
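
For readers who want to repeat this experiment, here is a minimal sketch of how the property might be set. It only builds the `Properties` object that would normally be passed to `SipFactory.createSipStack(...)`; the stack name `alice` and the class and method names are assumptions made for illustration.

```java
import java.util.Properties;

public class StackConfigSketch {

    /** Builds stack properties with client connection caching switched on or off. */
    static Properties stackProperties(String stackName, boolean cacheClientConnections) {
        Properties props = new Properties();
        // Mandatory JAIN-SIP property; the value is an arbitrary example.
        props.setProperty("javax.sip.STACK_NAME", stackName);
        // The property discussed above; "false" disables reuse of cached client connections.
        props.setProperty("gov.nist.javax.sip.CACHE_CLIENT_CONNECTIONS",
                Boolean.toString(cacheClientConnections));
        return props;
    }

    public static void main(String[] args) {
        Properties props = stackProperties("alice", false);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```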

We've switched to NIO and we have the same behavior.

How Should We Handle It?

In SIPTransactionImpl, retransmission is only activated when the transport is UDP. When using TCP, the error is caught silently. I see 2 options:

  • User starts a retransmission timer
  • Jain-SIP retries when socket is RST.

Should it be fixed in Jain-SIP? What should we do?

@vladimirralev
Collaborator

Hi,

I think JSIP used to make reconnect attempts, but a local-side disconnect can force a massive number of sockets to start reconnecting with each retransmission, which causes sudden havoc because it takes time to recycle sockets on some OSes. I think it makes sense to have this behind a flag, but I am not sure how hard it is to test properly.

If I understand correctly, you are sending out a request on a stale socket. This transaction should time out (after 32 seconds by default if you are running a transaction-stateful or dialog-stateful stack; this is configurable). At the user level you should be able to handle the transaction timeout, and at that point it's up to you whether to send the request again.
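
A rough sketch of what such user-level handling could look like is below. It is deliberately library-free: the `Outcome` enum and the `attempt` supplier are hypothetical stand-ins for whatever the application learns from its listener callbacks (in JAIN-SIP a timeout is reported through `SipListener.processTimeout`). In a real application each attempt would presumably need a fresh client transaction rather than a resend of the old one.

```java
import java.util.function.Supplier;

public class RetryOnTimeoutSketch {

    /** Hypothetical result of one attempt, as seen by the application. */
    enum Outcome { RESPONSE, TIMEOUT }

    /**
     * Runs the attempt until it yields a response or maxAttempts is reached.
     * Returns the number of attempts used, or -1 if every attempt timed out.
     */
    static int sendWithRetry(Supplier<Outcome> attempt, int maxAttempts) {
        for (int n = 1; n <= maxAttempts; n++) {
            if (attempt.get() == Outcome.RESPONSE) {
                return n;
            }
            // Timed out: loop around and try again with a new attempt.
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated peer: the first attempt hits the stale socket, the second succeeds.
        int used = sendWithRetry(
                () -> ++calls[0] < 2 ? Outcome.TIMEOUT : Outcome.RESPONSE, 3);
        System.out.println("attempts used: " + used);
    }
}
```

Bounding the number of attempts matters here, given the concern above about many sockets reconnecting at once.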

Many applications send periodic OPTIONS requests as a heartbeat. You can also try sending periodic TCP keepalives (an OS-level setting), but their success varies by OS, and I haven't tested this in years.
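
As an illustration of the keepalive option, the sketch below enables `SO_KEEPALIVE` on a plain Java socket using the standard `Socket.setKeepAlive` call. It assumes the application owns the socket; how to reach the sockets that JAIN-SIP manages internally is not covered here, and the class and method names are hypothetical. The probe timing itself still comes from OS settings (on Linux, `net.ipv4.tcp_keepalive_time` defaults to 7200 seconds), so a stale connection can go unnoticed between probes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {

    /** Connects to a throwaway loopback listener and enables SO_KEEPALIVE on the client socket. */
    static boolean keepAliveEnabledOnLoopback() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(server.getInetAddress(), server.getLocalPort())) {
            socket.setKeepAlive(true);    // ask the OS to probe an idle connection
            return socket.getKeepAlive(); // read the option back to confirm it was set
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + keepAliveEnabledOnLoopback());
    }
}
```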

@simonlg
Author

simonlg commented Mar 22, 2021

Because a new instance of Bob is listening on the port, it sends a TCP RST immediately. At the user level, the transaction will time out.

Ideally, the user level code should be independent of the transport. On UDP, we don't have to retry after a transaction timeout because we know that the retransmissions were sent. On TCP, the user level code would have to behave differently and retry.

We tried TCP keepalives at the OS level, but there is still a gap in which the problem can happen.

kpouer pushed a commit to kpouer/jsip that referenced this issue Jan 8, 2023
(cherry picked from commit 3c97c4cb577d4e857ee9d0ca0266ffc85ba8a080)
kpouer pushed a commit to kpouer/jsip that referenced this issue Jan 15, 2023
(cherry picked from commit 3c97c4cb577d4e857ee9d0ca0266ffc85ba8a080)