Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bug: initial state of an epoll eid for listener socket not set #2444

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

ethouris
Copy link
Collaborator

@ethouris ethouris commented Aug 25, 2022

Problem 1: if you make a listener socket, but delay with subscribing it to the epoll eid, and the first incoming connection happens before this subscription, the listener's readiness for accept will not be seen in the EID. Generally, it's the epoll flag updating problem.

Problem 2: Incorrectly handled the case of accepting the group connection as the first accepted in various corner cases concerning the multiple pending connections: the idea was wrong to simply qualify the first pending connection as the only one that should make the application get the group connection. This is only working with the optimistic precondition that this connection will not break in the meantime before being accepted.

Changes applied here:

  1. In addEPoll for a socket there's an additional check for events done for a case when it's a listener socket. The SRT_EPOLL_IN event is set only if there are any queued sockets, and if they are group members, it's also checked if the group isn't already connected and accept-extracted
  2. The accept-extracted state is defined through the new m_bPending flag in the group structure. This is true by default for a group created as a mirror group on the listener side for the very first member connection. The group stops being pending only at the moment when it was extracted through the srt_accept call.
  3. If there are multiple pending sockets, also in multiple listener sockets, all listener sockets that contain such pending connection have the SRT_EPOLL_IN flag set (even though only one of them can provide the group connection through srt_accept).
  4. When a group connection is accepted, all listener sockets in the whole application are checked if they contain any sockets that are members of the group being currently accepted. All sockets that belong to the currently accepted group are removed from the queue, and if that made the queue empty, the IN state is also cleared on that listener socket. NOTE: This might potentially be a performance issue. Consider creating a separate container, for example a list with pointers, which will keep all listener sockets.
  5. The documentation for srt_accept has been updated and all these above rules described.

Several changed rules are:

Previously: when there were multiple incoming connections for the same group, especially coming in over different listener sockets, only the first ever reporting in the group was set the SRT_EPOLL_IN flag and only from that one it was possible to accept the group connection, others were always handled in the background. Also the SRT_EPOLL_UPDATE flag was not set properly if the subscription for the listener socket was done after the connection pending state.

Now: when there are multiple incoming connections for the same group, all listeners are set SRT_EPOLL_IN flag and the application can accept the group off any listener that reports incoming connection. Once the group is extracted through calling srt_accept on any of the listeners, all pending connections for the same group in any other listeners are withdrawn, handled in the background, and SRT_EPOLL_IN flag is cleared if there are no more pending connections.

The reasoning: theoretically it doesn't matter from which listener socket you will accept the group, so whether it was only one, or all of them, it shouldn't matter. But the problem could be if this first connection gets broken before the application does accept, while other connections from the same group are still pending. In such a situation the old solution could be in a weird kind of state with some kind of "forever connection" seen at the caller side, but never accepted at the listener side.

@ethouris ethouris added this to the v1.6.0 milestone Oct 11, 2023
@ethouris ethouris added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels Oct 11, 2023
srtcore/api.cpp Fixed Show fixed Hide fixed
@codecov-commenter
Copy link

codecov-commenter commented Nov 1, 2023

Codecov Report

Attention: Patch coverage is 94.90446% with 16 lines in your changes missing coverage. Please review.

Project coverage is 67.83%. Comparing base (36260c3) to head (461def5).
Report is 43 commits behind head on master.

Files with missing lines Patch % Lines
test/test_bonding.cpp 90.76% 6 Missing ⚠️
srtcore/api.cpp 91.11% 4 Missing ⚠️
srtcore/epoll.cpp 42.85% 4 Missing ⚠️
test/test_epoll.cpp 98.88% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2444      +/-   ##
==========================================
+ Coverage   65.08%   67.83%   +2.75%     
==========================================
  Files         101      101              
  Lines       17655    18229     +574     
==========================================
+ Hits        11490    12365     +875     
+ Misses       6165     5864     -301     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maxsharabayko maxsharabayko modified the milestones: v1.5.4, v1.6.0 Aug 23, 2024
@ethouris ethouris marked this pull request as ready for review September 10, 2024 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants